COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
Chapter 9
Main Memory
Yunmin Go
School of CSEE
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture
Main Memory - 2
Background
◼ Program must be brought (from disk) into memory and placed within
a process for it to be run
◼ Main memory and registers are only storage CPU can access directly
◼ Memory unit only sees a stream of:
◼ addresses + read requests, or
◼ address + data and write requests
◼ Register access is done in one CPU clock (or less)
◼ Main memory can take many cycles, causing a stall
◼ Cache sits between main memory and CPU registers
◼ Protection of memory required to ensure correct operation
Main Memory - 3
Basic Hardware
◼ Protection from other processes
◼ Each process has a separate memory space
◼ Can be implemented by base register and limit register
◼ CPU H/W compares every address generated in user mode against these registers
◼ Base/limit registers can be loaded by privileged instruction
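The base/limit check described above can be sketched as follows; the register values are assumed for illustration:

```python
# Hedged sketch of the hardware base/limit check: every address generated
# in user mode is compared against the two privileged registers.
BASE = 300040    # base register: smallest legal physical address (assumed)
LIMIT = 120900   # limit register: size of the legal range (assumed)

def legal(addr):
    # Real hardware traps to the OS on an out-of-range address;
    # here we simply return False instead of raising a trap.
    return BASE <= addr < BASE + LIMIT

print(legal(300040))   # True: first legal address
print(legal(420940))   # False: one past the last legal address
```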
Main Memory - 4
Address Binding
◼ Most systems allow a user process to reside in
any part of the physical memory
◼ Representation of address
◼ Source program: symbolic address
◼ Ex) variable count
◼ Compiler: symbolic address → relocatable address
◼ Ex) 14 bytes from beginning of the module
◼ Linkage editor or loader: relocatable address
→ absolute address
◼ Ex) 0x74014
◼ Address binding: mapping one address space
to another
Main Memory - 5
Address Binding Time
◼ Compile time: If memory location known a priori,
absolute code can be generated; must recompile
code if starting location changes
◼ Load time: Must generate relocatable code if
memory location is not known at compile time
◼ Execution time: Binding delayed until run time if
the process can be moved during its execution
from one memory segment to another
◼ Need hardware support for address maps (e.g., base
and limit registers)
◼ Most operating systems use this method
Main Memory - 6
Logical vs. Physical Address Space
◼ Address
◼ Logical address: generated by the CPU; also referred to as virtual address
◼ Physical address: address seen by the memory unit
◼ Logical and physical addresses are the same in compile-time and load-time
address-binding schemes
◼ Logical and physical addresses differ in execution-time address-binding
scheme
Main Memory - 7
Logical vs. Physical Address Space
◼ Address space
◼ Logical address space: the set of all logical addresses generated by a
program
◼ Physical address space: the set of all physical addresses corresponding to
these logical addresses
◼ Memory-management unit: Hardware device that maps virtual
address to physical address at run time
Main Memory - 8
Logical vs. Physical Address Space
◼ A simple method of address mapping using relocation register (base
register)
◼ The value in the relocation register is added to every address generated by a
user process at the time it is sent to memory
◼ The user program deals with logical addresses; it never sees the real physical
addresses
Main Memory - 9
Dynamic Loading
◼ If the entire program and data must be in physical memory, the size
of process is limited by that of physical memory
◼ With dynamic loading
◼ A routine is not loaded until it is called
◼ When a routine is called …
◼ Calling routine checks whether the routine is loaded or not
◼ If not, relocatable linking loader loads the routine and updates the program’s address tables
◼ Then the control is passed to the routine
◼ Advantage
◼ Unused routine is never loaded → better memory-space utilization
◼ Does not require special support from OS
◼ However, OS can support by providing library routines
Main Memory - 10
Dynamic Linking
◼ Static linking: system libraries and program code combined by the
loader into the binary program image
◼ Dynamic linking: linking is postponed until execution time
◼ Stub: a small piece of code that indicates …
◼ How to locate the appropriate library routine
◼ How to load the library if it is not ready
◼ After a library routine is dynamically linked, that routine is executed directly
◼ Stub replaces itself with the address of the routine and executes it
◼ Advantages of dynamic linking
◼ Library code can be shared among processes
◼ Library update doesn’t require re-linking
◼ Dynamic linking requires help from OS
Main Memory - 11
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture
Main Memory - 12
Contiguous Memory Allocation
◼ Main memory must support both OS and user processes
◼ Limited resource, must allocate efficiently
◼ Contiguous allocation is one early method
◼ Main memory is usually divided into two partitions:
◼ Operating system: usually held in low memory
◼ User processes: held in high memory
◼ Each process contained in single contiguous section of memory
Main Memory - 13
Memory Mapping and Protection
◼ Relocation registers are used to protect user processes from each
other, and from changing operating-system code and data
◼ Base register contains value of smallest physical address
◼ Limit register contains range of logical addresses
◼ each logical address must be less than the limit register
◼ MMU maps logical address dynamically
◼ Relocation and limit registers are
loaded by dispatcher during
context switching
Main Memory - 14
Variable Partition
◼ Variable partition scheme
◼ Hole: block of available memory (holes are scattered throughout memory)
◼ When a process arrives, it is allocated memory from a hole large enough to
accommodate it
◼ When a process exits, it frees its partition; adjacent free partitions are combined
◼ Operating system maintains information about:
◼ Allocated partitions
◼ Free partitions (hole)
Main Memory - 15
Dynamic Storage Allocation Problem
◼ Dynamic storage allocation strategies
◼ First-fit: Allocate the first hole that is big enough
◼ Best-fit: Allocate the smallest hole that is big enough
◼ We must search the entire list, unless the list is sorted by size
◼ It produces the smallest leftover hole
◼ Worst-fit: Allocate the largest hole
◼ We must also search the entire list, unless the list is sorted by size
◼ It produces the largest leftover hole
◼ Which is the best?
◼ First-fit and best-fit are better than worst-fit in terms of speed and storage
utilization
◼ It is not clear whether first-fit is better than best-fit, but first-fit is generally
faster than best-fit
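The three strategies above can be sketched as scans over a list of (start, length) holes; the hole list and request size below are assumed for illustration:

```python
# Each hole is (start_address, length). Returns the start of the chosen
# hole, or None if no hole is large enough.

def first_fit(holes, size):
    for start, length in holes:        # stop at the first sufficient hole
        if length >= size:
            return start
    return None

def best_fit(holes, size):
    fits = [h for h in holes if h[1] >= size]
    return min(fits, key=lambda h: h[1])[0] if fits else None  # smallest fit

def worst_fit(holes, size):
    fits = [h for h in holes if h[1] >= size]
    return max(fits, key=lambda h: h[1])[0] if fits else None  # largest fit

holes = [(0, 100), (200, 500), (800, 200)]   # assumed free list
print(first_fit(holes, 150))   # 200: first hole big enough
print(best_fit(holes, 150))    # 800: smallest sufficient hole
print(worst_fit(holes, 150))   # 200: largest hole
```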
Main Memory - 16
Fragmentation
◼ External Fragmentation: total memory space exists to satisfy a
request, but it is not contiguous
◼ First-fit and best-fit strategies suffer from external fragmentation
◼ Sometimes it is impossible to allocate a large contiguous block of memory
even though the total size of free memory is larger than the required
memory
◼ 50-percent rule (analysis of first fit)
◼ Given N allocated blocks, another 0.5N blocks will be
lost to fragmentation (1/3 of memory may be unusable)
◼ Fragmentation of main memory also affects backing store
Main Memory - 17
Fragmentation
◼ Memory allocated to a process may be slightly larger than the requested
memory
◼ Ex) allocating 18462 bytes from hole whose size is 18464 bytes
◼ Avoids the overhead of keeping track of very small holes
◼ Generally, physical memory can be broken into fixed-size blocks
◼ Internal fragmentation: difference between the amounts of allocated
memory and the requested memory
Main Memory - 18
Fragmentation
◼ Remedy of external fragmentation: compaction
◼ Possible only if relocation is dynamic and is done at execution time
◼ Expensive
◼ Alternative remedy: permit logical address space of the processes to be
noncontiguous
◼ Paging
◼ Segmentation
Main Memory - 19
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture
Main Memory - 20
Paging
◼ Paging: a memory-management scheme
that permits the physical address space
of a process to be noncontiguous
◼ Physical memory: consists of frames
(fixed-size blocks)
◼ Keep track of all free frames
◼ Logical memory: consists of pages
(fixed-size blocks)
◼ Pages are mapped with frames through
page table → translate logical to physical address
Main Memory - 21
Address Translation Scheme
◼ Every address generated by CPU is divided into:
◼ Page number (p): used as an index into a per-process page table which
contains base address of each page in physical memory
◼ Page offset (d): location in the frame being referenced
→ The base address of the frame is combined with the page offset to define the
physical memory address that is sent to the memory unit
page number (p): m − n bits | page offset (d): n bits
◼ Size of logical address space: 2^m
◼ Page size: 2^n
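With a page size of 2^n, the split comes down to a shift and a mask; the values of m and n below are assumed:

```python
M, N = 16, 10   # assumed: 2^16-byte logical space, 2^10-byte pages

def split(addr):
    p = addr >> N                # page number: high m - n bits
    d = addr & ((1 << N) - 1)    # page offset: low n bits
    return p, d

print(split(0x1234))   # (4, 564): page 4, offset 564
```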
Main Memory - 22
Paging Hardware
Main Memory - 23
Paging Model
◼ Paging model of logical and physical memory
Main Memory - 24
Paging Example
◼ Logical address: n = 2 and m = 4
◼ Using a page size of 4 bytes and a physical memory of 32 bytes (8 pages)
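A sketch of the translation for this example; the page-to-frame mapping below is assumed for illustration (any 4-entry table would do):

```python
page_table = [5, 6, 1, 2]     # assumed mapping: page i -> frame page_table[i]
PAGE_SIZE = 4                 # 2^n with n = 2

def translate(logical):
    p, d = divmod(logical, PAGE_SIZE)       # split into page number, offset
    return page_table[p] * PAGE_SIZE + d    # frame base + offset

print(translate(0))    # page 0, offset 0 -> frame 5 -> physical address 20
print(translate(13))   # page 3, offset 1 -> frame 2 -> physical address 9
```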
Main Memory - 25
Paging Example
◼ Calculating internal fragmentation
◼ Page size = 2,048 bytes
◼ Process size = 72,766 bytes
◼ 35 pages + 1,086 bytes
◼ Internal fragmentation of 2,048 - 1,086 = 962 bytes
◼ Worst case fragmentation = 1 frame – 1 byte
◼ On average fragmentation = 1 / 2 frame size
So small page sizes are desirable?
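The internal-fragmentation arithmetic above can be checked directly:

```python
PAGE_SIZE = 2048
process_size = 72766
frames = -(-process_size // PAGE_SIZE)       # ceiling division -> 36 frames
used_in_last = process_size % PAGE_SIZE      # 1,086 bytes in the last frame
internal_frag = PAGE_SIZE - used_in_last     # 962 bytes wasted
print(frames, used_in_last, internal_frag)   # 36 1086 962
```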
Main Memory - 26
Overhead of Paging
◼ Each page table entry takes memory to track
◼ This overhead is reduced as the size of the page increases
◼ Disk I/O is more efficient when the amount of data being transferred
is larger (Chapter 11)
◼ Page sizes have grown over time
◼ Today, pages are typically either 4KB or 8KB
◼ Some CPUs and operating systems support multiple page sizes
◼ Windows 10 supports page sizes of 4KB and 2MB
◼ Linux supports two page sizes: 4KB (default) and huge pages (architecture-dependent)
Main Memory - 27
Free Frames
◼ OS keeps information about
frames in a frame table
◼ # of total frames
◼ Which frames are allocated
◼ Which frames are available
◼ etc.
◼ OS maintains list of free frames
(Figure: free-frame list before and after allocation)
Main Memory - 28
Hardware Support
◼ Page table is stored in main memory
◼ Page-table base register (PTBR) points to the page table
◼ Page-table length register (PTLR) indicates size of the page table
◼ In this scheme every data/instruction access requires two memory accesses
◼ One for the page table and one for the data/instruction
◼ The two memory access problem can be solved by the use of a special fast-
lookup hardware cache called translation look-aside buffers (TLBs) (also
called associative memory)
Main Memory - 29
Hardware Support
◼ Translation look-aside buffer (TLB): small fast look up H/W cache
◼ Associative, high-speed memory, consists of pairs (key, value)
◼ If page number of logical address is in TLB, delay is less than 10% of unmapped memory
access
◼ Otherwise (TLB miss), access page table in main memory
◼ Insert (page number, frame number) into TLB
◼ If the TLB is full, the OS selects one entry for replacement
◼ Some TLBs store address-space identifiers (ASIDs) in each TLB entry
◼ ASID uniquely identifies each process to provide address-space protection for that process
◼ If the TLB does not support separate ASIDs, TLB must be flushed at every context switch
◼ TLBs typically small (64 to 1,024 entries)
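A minimal software model of a TLB in front of an in-memory page table; the capacity, LRU replacement policy, and page-table contents are assumed (real TLBs are hardware associative memories, and replacement may be managed by hardware):

```python
from collections import OrderedDict

class TLB:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()    # (page number -> frame number) pairs

    def lookup(self, page):
        if page in self.entries:        # TLB hit
            self.entries.move_to_end(page)
            return self.entries[page]
        return None                     # TLB miss

    def insert(self, page, frame):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[page] = frame

page_table = {0: 5, 1: 6, 2: 1, 3: 2}   # assumed in-memory page table
tlb = TLB()

def translate(page):
    frame = tlb.lookup(page)
    if frame is None:                   # miss: walk the page table, fill TLB
        frame = page_table[page]
        tlb.insert(page, frame)
    return frame

print(translate(3))    # miss, then page table gives frame 2
print(translate(3))    # hit: served from the TLB
```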
Main Memory - 30
Hardware Support
◼ Associative memory – parallel search over (page #, frame #) pairs
◼ Address translation for (p, d)
◼ If p is in an associative register, get frame # out
◼ Otherwise get frame # from page table in memory
Main Memory - 31
Paging Hardware with TLB
Main Memory - 32
Effective Access Time
◼ Hit ratio: percentage of times that a page number is found in the TLB
◼ An 80% hit ratio means that we find the desired page number in the TLB 80%
of the time
◼ Suppose that it takes 10 nanoseconds to access memory
◼ If we find the desired page in the TLB, a mapped-memory access takes 10 ns
◼ Otherwise we need two memory accesses, so it takes 20 ns
◼ Effective Access Time (EAT)
EAT = 0.80 x 10 + 0.20 x 20 = 12 nanoseconds
implying 20% slowdown in access time
◼ Consider a more realistic hit ratio of 99%,
EAT = 0.99 x 10 + 0.01 x 20 = 10.1ns
implying only 1% slowdown in access time
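The EAT computation above generalizes to any hit ratio:

```python
def eat(hit_ratio, mem_ns=10):
    # hit: one memory access; miss: page-table access + data access
    return hit_ratio * mem_ns + (1 - hit_ratio) * 2 * mem_ns

print(round(eat(0.80), 1))   # 12.0 ns -> 20% slowdown
print(round(eat(0.99), 1))   # 10.1 ns -> 1% slowdown
```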
Main Memory - 33
Memory Protection
◼ Each frame has protection bits
◼ Read-write / read-only
◼ Attempt to write to a read-only page causes a H/W trap
→ Extension: read-only, read-write, execute-only, …
◼ Or a combination of them
◼ Each entry of page table has valid-invalid bit
◼ “valid” indicates that the associated page is in the process’ logical address
space, and is thus a legal page
◼ “invalid” indicates that the page is not in the process’ logical address space
◼ OS does not allow access to the page
◼ Rarely does a process use all of its address range
◼ A full-range page table wastes memory → use page-table length register (PTLR)
Main Memory - 34
Valid-Invalid Bit
◼ Valid (v) or Invalid (i) bit in a page table
Main Memory - 35
Shared Pages
◼ Shared code
◼ One copy of read-only (reentrant) code shared among processes (e.g., text
editors, compilers, window systems)
◼ Similar to multiple threads sharing the same process space
◼ Also useful for inter-process communication if sharing of read-write pages is
allowed
◼ Private code and data
◼ Each process keeps a separate copy of the code and data
◼ The pages for the private code and data can appear anywhere in the logical
address space
Main Memory - 36
Shared Pages Example
Main Memory - 37
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture
Main Memory - 38
Structure of the Page Table
◼ Memory structures for paging can get huge using straightforward
methods
◼ Consider a 32-bit logical address space as on modern computers
◼ Page size of 4 KB (2^12)
◼ Page table would have 1 million entries (2^32 / 2^12)
◼ If each entry is 4 bytes → each process needs 4 MB of physical memory for
the page table alone
◼ Don’t want to allocate that contiguously in main memory
◼ One simple solution is to divide the page table into smaller units
◼ Hierarchical Paging
◼ Hashed Page Tables
◼ Inverted Page Tables
Main Memory - 39
Hierarchical Page Tables
◼ Hierarchical page tables: Break up the logical address space into
multiple page tables
Two-level page table
◼ We then page the page table
◼ Ex) On 32-bit machine with 4KB page size,
a logical address is divided into:
◼ a page number consisting of 20 bits
◼ a page offset consisting of 12 bits
◼ since the page table is paged,
the page number is further divided into:
◼ a 10-bit page number
◼ a 10-bit page offset
◼ Known as a forward-mapped page table
◼ p1 is an index into the outer page table,
p2 is the displacement within the page of the inner page table
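The 10/10/12 split described above comes down to shifts and masks:

```python
# 32-bit logical address, 4 KB pages: 10-bit p1 | 10-bit p2 | 12-bit d
def split_two_level(addr):
    d  = addr & 0xFFF            # low 12 bits: offset within the page
    p2 = (addr >> 12) & 0x3FF    # next 10 bits: index into inner page table
    p1 = addr >> 22              # top 10 bits: index into outer page table
    return p1, p2, d

print(split_two_level(0xDEADBEEF))   # (890, 731, 3823)
```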
Main Memory - 40
Hierarchical Page Tables
◼ Address-Translation Scheme
Main Memory - 41
Hierarchical Page Tables
◼ 64-bit logical address space
◼ Even two-level paging scheme not sufficient
◼ If page size is 4 KB (2^12)
◼ Then page table has 2^52 entries
◼ If two-level scheme, inner page tables could
be 2^10 4-byte entries
◼ Outer page table has 2^42 entries or 2^44 bytes
◼ One solution is to add a 2nd outer page table
◼ But in the following example the 2nd outer
page table is still 2^34 bytes in size
◼ And possibly 4 memory accesses to get
to one physical memory location
Main Memory - 42
Hashed Page Table
◼ Hashed page tables
◼ Common in large address spaces (> 32 bits)
◼ The virtual page number is hashed into a hash table
◼ Each entry in the hash table contains a chain of elements hashing to the same location
◼ Each element contains
◼ The virtual page number
◼ The value of the mapped page frame
◼ A pointer to the next element
◼ Virtual page numbers are compared
in this chain searching for a match
◼ If a match is found, the corresponding
physical frame is extracted
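A sketch of the chained lookup; the bucket count is assumed, and Python's built-in hash stands in for the hardware hash function:

```python
class HashedPageTable:
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, vpn, frame):
        # Each chain element holds (virtual page number, mapped frame);
        # a real entry would also hold a pointer to the next element.
        self.buckets[hash(vpn) % len(self.buckets)].append((vpn, frame))

    def lookup(self, vpn):
        # Walk the chain, comparing virtual page numbers until a match
        for entry_vpn, frame in self.buckets[hash(vpn) % len(self.buckets)]:
            if entry_vpn == vpn:
                return frame
        return None    # no match: a real system would raise a page fault

hpt = HashedPageTable()
hpt.insert(0x12345, 7)
print(hpt.lookup(0x12345))   # 7
```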
Main Memory - 43
Inverted Page Table
◼ Inverted page table: Rather than each process having a page table
and keeping track of all possible logical pages, track all physical
pages
◼ Only one inverted page table exists
in the whole system
◼ Each entry is for real page of memory
◼ Each entry has address-space identifier
◼ Entry of inverted page table consists of
◼ <process-id, page number>
◼ Logical address consists of
◼ <process-id, page-number, offset>
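A sketch of the system-wide table and its linear search; the frame count, process id, and frame size are assumed:

```python
# One system-wide table indexed by physical frame number.
# Each entry records which (pid, page) currently occupies that frame.
FRAME_SIZE = 4096                 # assumed
inverted = [None] * 8             # 8 physical frames (assumed)
inverted[3] = (42, 0)             # frame 3 holds page 0 of process 42
inverted[5] = (42, 1)             # frame 5 holds page 1 of process 42

def translate(pid, page, offset):
    # Linear search over all frames -- the cost that motivates putting
    # a hash table or TLB in front of an inverted page table.
    for frame, entry in enumerate(inverted):
        if entry == (pid, page):
            return frame * FRAME_SIZE + offset
    raise LookupError("page fault")

print(translate(42, 1, 100))   # frame 5 -> 5*4096 + 100 = 20580
```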
Main Memory - 44
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture
Main Memory - 45
Swapping
◼ Swapping: a process can be swapped temporarily out of memory to
a backing store, and then brought back into memory for continued
execution
◼ It makes it possible for the total memory
space of all processes to exceed
physical memory
◼ Backing store: fast disk large enough to
accommodate copies of all memory images
for all users; must provide direct access
to these memory images
◼ System maintains a ready queue of
ready-to-run processes which have
memory images on disk
Main Memory - 46
Context Switch Time including Swapping
◼ If the next process to be put on the CPU is not in memory, need to swap
out a process and swap in target process
◼ Context switch time can then be very high
◼ Major part of swap time is transfer time
◼ Total transfer time is directly proportional to the amount of memory swapped
◼ Ex) size of process = 100MB
hard disk transfer rate = 50MB/sec
total context switch including swapping
= swap out time (2000ms) + swap in time (2000ms) = 4000ms (4sec)
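The transfer-time arithmetic for this example:

```python
size_mb = 100                       # process size
rate_mb_per_s = 50                  # hard disk transfer rate
one_way_ms = size_mb / rate_mb_per_s * 1000
total_ms = 2 * one_way_ms           # swap out + swap in
print(one_way_ms, total_ms)         # 2000.0 4000.0
```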
Main Memory - 47
Context Switch Time and Swapping
◼ Standard swapping involves moving entire processes between main
memory and backing store
◼ It allows physical memory to be oversubscribed, so that the system can
accommodate more processes than physical memory can hold
◼ However, standard swapping is not used in modern operating systems
◼ The amount of time required to move entire processes is prohibitive
◼ Modified versions of swapping are found on many systems (e.g.,
UNIX, Linux, and Windows)
◼ Swapping normally disabled
◼ Started if more than threshold amount of memory allocated
◼ Disabled again once memory demand reduced below threshold
Main Memory - 48
Swapping with Paging
Main Memory - 49
Swapping on Mobile Systems
◼ Not typically supported
◼ Flash memory based
◼ Small amount of space
◼ Limited number of write cycles
◼ Poor throughput between flash memory and CPU on mobile platform
◼ Instead use other methods to free memory if low
◼ iOS asks apps to voluntarily relinquish allocated memory
◼ Read-only data thrown out and reloaded from flash if needed
◼ Failure to free can result in termination
◼ Android terminates apps if low free memory, but first writes application state
to flash for fast restart
◼ Both OSes support paging
Main Memory - 50
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture
Main Memory - 51
The Intel 32 and 64-bit Architectures
◼ Dominant industry chips
◼ Pentium CPUs are 32-bit and called IA-32 architecture
◼ Current Intel CPUs are 64-bit, known as the x86-64 (Intel 64) architecture
◼ Many variations in the chips; we cover the main ideas here
Main Memory - 52
The Intel IA-32 Architecture
◼ Logical to physical address translation in IA-32
◼ Segmentation
Main Memory - 53
IA-32 Segmentation
◼ Supports both segmentation and segmentation with paging
◼ Each segment can be 4GB
◼ Up to 16K segments per process
◼ Divided into two partitions
◼ First partition of up to 8K segments are private to process (kept in local descriptor table
(LDT))
◼ Second partition of up to 8K segments shared among all processes (kept in global
descriptor table (GDT))
Main Memory - 54
IA-32 Segmentation
◼ CPU generates logical address
◼ Selector given to segmentation unit
◼ Which produces linear addresses
◼ Linear address given to paging unit
◼ Which generates physical address in main memory
◼ Paging units form equivalent of MMU
◼ Pages sizes can be 4 KB or 4 MB
Main Memory - 55
IA-32 Paging
◼ IA-32 allows a page size of either 4KB
or 4MB
◼ For 4KB page, IA-32 uses a two-level
paging scheme
◼ If Page_Size flag is set, the page-directory
entry points directly to a 4MB page
◼ Bypassing the inner page table
Main Memory - 56
IA-32 Page Address Extensions
◼ 32-bit address limits led Intel to create page address extension (PAE), allowing
32-bit apps access to more than 4GB of memory space
◼ Paging went to a 3-level scheme
◼ Top two bits refer to a page directory pointer table
◼ Page-directory and page-table entries moved to 64-bits in size
◼ Net effect is increasing address space to 36 bits – 64GB of physical memory
Main Memory - 57
Intel x86-64
◼ Current generation Intel x86 architecture
◼ 64 bits is ginormous (> 16 exabytes)
◼ In practice only 48-bit addressing is implemented
◼ Page sizes of 4 KB, 2 MB, 1 GB
◼ Four levels of paging hierarchy
◼ Can also use PAE so virtual addresses are 48 bits and physical
addresses are 52 bits
Main Memory - 58