CIS 25 - Notes 0009

Virtual Memory is a technique that allows the execution of processes that may not be completely in memory. One major advantage of this scheme is that programs can be larger than physical memory.

Introduction

Virtual Memory is the separation of user logical memory from physical memory. This separation allows an extremely large virtual memory to be provided for programmers when only a smaller physical memory is available.

Virtual Memory is commonly implemented by demand paging.

Demand Paging

Logical memory is divided up into pages. Physical memory is divided up into frames. A page table links the two together. A page table is an array of page references (along with other info).

Executing processes only see the huge logical memory. The system (hardware) takes the logical addresses and references the page table. If the page table reference is valid, the system (hardware) continues to reference the physical memory. If the page table reference is invalid, the system (hardware) causes a page fault.

When a page fault occurs, the operating system verifies that the page exists. If it does not exist, it displays a friendly "Page Fault" to the user (or "Blue Screen of Death" under Windows 9x).

If however the OS verifies that the page exists, the page needs to be brought into memory and mapped to the frame via a page table. After that is done, the instruction that caused the page fault needs to be restarted.

Unused (or unreferenced) pages are normally stored on the hard disk (or some other fast storage media) until they are needed (or until the system page-faults to bring those pages in). Thus, page-faults are usually very expensive to do because they reference the relatively slow IO device such as a disk (slow compared to RAM).

Page Replacement

What happens when all the frames are filled up and a page fault occurs? One of the mapped pages has to be kicked out of physical memory and another one brought in.

A lot of the research in demand paging concentrates on various algorithms used to select the victim page to kick out.

The goal of any page replacement algorithm is to reduce the number of page faults.

FIFO Page Replacement

FIFO page replacement is First In First Out for pages. A page that is brought into physical memory first is the first to be kicked out.

Refer to section 10.4.2 in the book for an illustration.

Optimal Page Replacement

There is an optimal page replacement algorithm, but just like with SJF in scheduling, it's not very useful for real life implementation. It is however useful to compare other algorithms to the best case.

Optimal algorithm: Replace the page that will not be used for the longest period of time.

Notice that it's not practical since we can't predict the future (just like with SJF).

LRU Page Replacement

Imagine taking the optimal algorithm and reversing it. In memory accesses, future is very much like the past. So, instead of selecting the page that will not be used the longest, lets select a page that has not been used the longest.

The least-recently-used (LRU) algorithm needs to figure out when and how much a particular page has been used. Knowing that, we can then select a page that hasn't been active recently, in a hope that it won't be active in the near future.

This seemingly good algorithm is not very efficient, since it's expensive to store counts for every page table reference, etc.

LRU Approximation

Since it's expensive to implement direct LRU, we can create an algorithm that approximates the performance of LRU.

While most systems don't provide the expensive counter variable on page table references that is needed to implement LRU, some systems (including Intel's Pentium line) provide a referenced bit. The bit is set if the page has been referenced.

Additional-Reference-Bits Algorithm

Instead relying on the hardware for some counter, we can simulate one. Imagine keeping an 8 bit field with each page reference, and then at some sequential time periods (100 milliseconds or something) we shift the referenced bit onto the most significant part of that 8 bit field. After a few iterations, we get a value that can be compared to others and that indicates how often the page has been accessed in relation to other pages. Refer to section 10.4.5.1 for a more detailed discussion.

Second-Chance Algorithm

A second-chance algorithm gives accessed pages a second chance before replacing them. This also needs a reference bit on the page table.

When a victim frame has been selected, if its reference bit is set, the page gets a second chance. We clear the bit, and move onto the next page. If a page is found with a cleared bit, we simply replace it.

Often used pages will not be replaced because their reference bit will be set to 1 often.

Notice that 2nd chance will degrade to FIFO is all bits are set.

Enhanced Second Chance

We can make several enhancements to the second chance algorithm. Instead of just looking at the referenced bit, we can also look at the modified bit (many systems including Intel's Pentiums have this bit). For example, if the page has been modified, we require extra work of writing the page out to disk (so we may avoid doing that if we can find a page that wasn't modified).

Counting-Based Page Replacement

By keeping a counter, we generally can design many algorithms to deal with page replacement.

LRU - Least Recently Used (talked about earlier)

MFU - Most Frequently Used is based on the argument that the page with the smallest count was probably just brought in and has yet to be used.

Page-Buffering Algorithm

No matter what scheme we use for locating the victim frame, we can implement other enhancements to the process.

We can have a pool (buffer) of free frames, such that instead of looking specifically to replace a page, we simply give it a free frame, and kick out some page whenever the system becomes a bit free.

Allocation of Frames

Frames are actual pieces of physical memory (RAM). The way we allocate them has a significant impact on our system.

Minimum Number of Frames

There is always a certain minimum for number of frames system-wide and per-process. A system needs to have some minimum in order to function, similarly, each process needs to have a certain minimum.

The reason for a minimum comes in when some instructions may access several frames (or may themselves span several frames). Thus, for example, if a process only has 2 frames, and some instruction is trying to access 3 pages, the instruction causes a page-fault, the system loads those pages, and restarts the instruction, which again causes that same page fault. Clearly, we need some minimum number of frames in order to get anywhere.

Allocation Algorithms

There are basically two ways of approaching allocation. Doing it on the global scale or on the local scale. On the global scale means that every frame request goes to the global pool of frames. On a local level, frame needs of some process are only fulfilled by the frames allocated to that process.

We can allocate the same number of pages for each process in the system (which would mean that large important processes are getting about as much memory as small simple programs).

We can also take the size of the process in considering how many frames to allocate to it. For example, a process that's 100k will need more frames than a process that's only 10k.

Again, it's fairly difficult to predict the amount of memory a process will require. A small program may allocate and use a LOT of memory, while a seemingly large program will use very little.

Thrashing

Thrashing is when paging takes over the main activity of the system. It is when your system does nothing but replace pages.

Thrashing is usually caused by high multiprogramming and too few frames to handle the job. Ironically, in some cases, as page faults increase, CPU utilization goes down, and the system brings in more processes to utilize the CPU, which causes even more thrashing.

If frame allocation is global, than thrashing usually spreads to the entire system. The whole system begins to thrash. If on the other hand, frame allocation is local, then only some particular process begins to thrash; at which point, the OS might give it more frames.

Working Set Model

The working set model uses the accessed bit of the page table to get a sense of the set of pages/frames the process is actually using. The idea is that processes have a certain locality to them. Functions use local variables often, but once you return from a function, those variables are no longer used. The working set model tries to determine those pages/frames which are currently being used.

This working set can then be used to provide the process exactly the number of frames it needs, thus, reducing thrashing. If the working set is larger than the currently available supply of frames, then it's probably best to swap out the entire process (since if we let it execute then it will thrash).

Prepaging

There are also a bunch of other techniques that can be utilized to improve system performance. Prepaging is bringing in most needed pages into memory when we swap in a process. It may also mean saving currently working set of pages before swapping out the process, and then restoring them all when we swap in the process (instead of letting each page cause its own page fault).

Page Size

The actual page size plays a major role in the system. If the page size is too small, then the page table itself takes up a significant amount of memory (and may itself be paged out). If the page size is too big, we may be wasting a lot of memory (if a process only needs to use 10k of memory, we don't want to provide 4MB for it). There are arguments either way, but the general trend seems to be towards larger page sizes (memory is cheap).

Program Structure

The idea of virtual memory is such that the user/programmer does not have to concern themselves (or even know about) how memory is managed by the system. Programmers are just given this huge chunk of memory.

A lot more efficient programs and algorithms can be implemented if the programmer and the compiler are aware of paging. For one, just because you can logically address 4 Gigs of RAM doesn't mean that your program should do that.

The order you access memory in also has an impact. Consider code from section 10.8.5 of your book. The way you structure your loop affects your program performance.

Other Stuff

There are other things when implementing virtual memory, like page locking (you might want to lock some pages in memory). Other things are how you might or might not use paging with real time systems, etc., because paging can introduce arbitrary delays into your system, etc.

© 2006, Particle