Chapter 9. Main Memory (Updated)

This document discusses main memory and memory management techniques. It begins by explaining how programs must be loaded into main memory to execute and discusses the hardware protection of separate memory spaces for different processes. It then covers contiguous memory allocation and issues like fragmentation that can occur. Paging and page tables are introduced as solutions to these problems by mapping logical to physical addresses dynamically. The concepts of logical versus physical address spaces and dynamic loading and linking of programs are also summarized.

COMPUTER ORGANIZATION AND DESIGN

The Hardware/Software Interface

Chapter 9
Main Memory

Yunmin Go

School of CSEE
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture

Main Memory - 2
Background
◼ Program must be brought (from disk) into memory and placed within
a process for it to be run
◼ Main memory and registers are the only storage the CPU can access directly
◼ Memory unit only sees a stream of:
◼ addresses + read requests, or
◼ address + data and write requests
◼ Register access is done in one CPU clock (or less)
◼ Main memory can take many cycles, causing a stall
◼ Cache sits between main memory and CPU registers
◼ Protection of memory required to ensure correct operation

Main Memory - 3
Basic Hardware
◼ Protection from other processes
◼ Each process has a separate memory space
◼ Can be implemented by base register and limit register
◼ CPU H/W compares every address generated
◼ Base/limit registers can be loaded by privileged instruction

Main Memory - 4
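The base/limit check above can be modeled in software (a sketch only; in reality this comparison happens in CPU hardware on every access, and the register values below are made up for illustration):

```python
def check_access(logical_address, base, limit):
    """Model the hardware base/limit check: a logical address must be
    less than the limit register; the base register is then added to
    relocate it into the process's region of physical memory."""
    if logical_address < 0 or logical_address >= limit:
        # In hardware this raises a trap to the OS (addressing error)
        raise MemoryError("trap: address outside base/limit range")
    return base + logical_address

# Illustrative register values (not from the slides)
print(check_access(100, base=30000, limit=12090))   # 30100
```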
Address Binding
◼ Most systems allow a user process to reside in
any part of the physical memory
◼ Representation of address
◼ Source program: symbolic address
◼ Ex) variable count
◼ Compiler: symbolic address → relocatable address
◼ Ex) 14 bytes from beginning of the module
◼ Linkage editor or loader: relocatable address
→ absolute address
◼ Ex) 0x74014
◼ Address binding: mapping one address space
to another
Main Memory - 5
Address Binding Time
◼ Compile time: If memory location known a priori,
absolute code can be generated; must recompile
code if starting location changes
◼ Load time: Must generate relocatable code if
memory location is not known at compile time
◼ Execution time: Binding delayed until run time if
the process can be moved during its execution
from one memory segment to another
◼ Need hardware support for address maps (e.g., base
and limit registers)
◼ Most operating systems use this method

Main Memory - 6
Logical vs. Physical Address Space
◼ Address
◼ Logical address: generated by the CPU; also referred to as virtual address
◼ Physical address: address seen by the memory unit
◼ Logical and physical addresses are the same in compile-time and load-time
address-binding schemes
◼ Logical and physical addresses differ in execution-time address-binding
scheme

Main Memory - 7
Logical vs. Physical Address Space
◼ Address space
◼ Logical address space: the set of all logical addresses generated by a
program
◼ Physical address space: the set of all physical addresses corresponding to
these logical addresses
◼ Memory-management unit: Hardware device that maps virtual
address to physical address at run time

Main Memory - 8
Logical vs. Physical Address Space
◼ A simple method of address mapping using relocation register (base
register)
◼ The value in the relocation register is added to every address generated by a
user process at the time it is sent to memory
◼ The user program deals with logical addresses; it never sees the real physical
addresses

Main Memory - 9
Dynamic Loading
◼ If the entire program and data must be in physical memory, the size
of process is limited by that of physical memory
◼ With dynamic loading
◼ A routine is not loaded until it is called
◼ When a routine is called …
◼ Calling routine checks whether the routine is loaded or not
◼ If not, the relocatable linking loader loads the routine and updates the program’s address tables
◼ Then the control is passed to the routine
◼ Advantage
◼ Unused routine is never loaded → better memory-space utilization
◼ Does not require special support from OS
◼ However, OS can support by providing library routines
Main Memory - 10
Dynamic Linking
◼ Static linking: system libraries and program code combined by the
loader into the binary program image
◼ Dynamic linking: linking is postponed until execution time
◼ Stub: a small piece of code that indicates …
◼ How to locate the appropriate library routine
◼ How to load the library if it is not ready
◼ After a library routine is dynamically linked, that routine is executed directly
◼ The stub replaces itself with the address of the routine and executes it
◼ Advantages of dynamic linking
◼ Library code can be shared among processes
◼ Library update doesn’t require re-linking
◼ Dynamic linking requires help from OS
Main Memory - 11
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture

Main Memory - 12
Contiguous Memory Allocation
◼ Main memory must support both OS and user processes
◼ Limited resource, must allocate efficiently
◼ Contiguous allocation is one early method
◼ Main memory is usually divided into two partitions:
◼ Operating system: usually held in low memory
◼ User processes: held in high memory
◼ Each process contained in single contiguous section of memory

Main Memory - 13
Memory Mapping and Protection
◼ Relocation registers are used to protect user processes from each
other, and from changing operating-system code and data
◼ Base register contains value of smallest physical address
◼ Limit register contains range of logical addresses
◼ each logical address must be less than the limit register
◼ MMU maps logical address dynamically
◼ Relocation and limit registers are
loaded by dispatcher during
context switching

Main Memory - 14
Variable Partition
◼ Variable partition scheme
◼ Hole: block of available memory (holes are scattered throughout memory)
◼ When a process arrives, it is allocated memory from a hole large enough to
accommodate it
◼ Process exiting frees its partition and adjacent free partitions can be combined
◼ Operating system maintains information about:
◼ Allocated partitions
◼ Free partitions (hole)

Main Memory - 15
Dynamic Storage Allocation Problem
◼ Dynamic storage allocation strategies
◼ First-fit: Allocate the first hole that is big enough
◼ Best-fit: Allocate the smallest hole that is big enough
◼ We must search entire list, unless the list is sorted by size
◼ It produces the smallest leftover hole
◼ Worst-fit: Allocate the largest hole
◼ We must also search entire list, unless the list is sorted by size
◼ It produces the largest leftover hole
◼ Which is the best?
◼ First-fit and best-fit are better than worst-fit in terms of speed and storage
utilization
◼ It is not clear whether first fit is better than best fit or not, but first fit is generally
faster than best fit
Main Memory - 16
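The three strategies can be sketched as searches over a free-hole list (the hole sizes and request below are made up for illustration):

```python
def first_fit(holes, size):
    """Return the index of the first hole large enough, else None."""
    for i, h in enumerate(holes):
        if h >= size:
            return i
    return None

def best_fit(holes, size):
    """Return the index of the smallest hole large enough (scans the whole list)."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return min(fits)[1] if fits else None

def worst_fit(holes, size):
    """Return the index of the largest hole (scans the whole list)."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= size]
    return max(fits)[1] if fits else None

holes = [100, 500, 200, 300, 600]   # free-hole sizes, illustrative
print(first_fit(holes, 212), best_fit(holes, 212), worst_fit(holes, 212))  # 1 3 4
```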
Fragmentation
◼ External Fragmentation: total memory space exists to satisfy a
request, but it is not contiguous
◼ First-fit and best-fit strategies suffer from external fragmentation
◼ Sometimes it is impossible to allocate a large contiguous block of memory
even though the total size of free memory is larger than the required memory
◼ 50-percent rule (analysis of first fit)
◼ Given N allocated blocks, another 0.5N blocks will be
lost to fragmentation (1/3 of memory may be unusable)
◼ Fragmentation of main memory also affects backing store

Main Memory - 17
Fragmentation
◼ Memory allocated to a process may be slightly larger than the requested
memory
◼ Ex) allocating 18462 bytes from hole whose size is 18464 bytes
◼ Reduce overhead to keep small holes
◼ Generally, physical memory can be broken into fixed-size blocks
◼ Internal fragmentation: difference between the amounts of allocated
memory and the requested memory

Main Memory - 18
Fragmentation
◼ Remedy for external fragmentation: compaction
◼ Possible only if relocation is dynamic and is done at execution time
◼ Expensive
◼ Alternative remedy: permit the logical address space of the
processes to be noncontiguous
◼ Paging
◼ Segmentation
Main Memory - 19
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture

Main Memory - 20
Paging
◼ Paging: a memory-management scheme
that permits the physical address space
of a process to be noncontiguous
◼ Physical memory: consists of frames
(fixed-size blocks)
◼ Keep track of all free frames
◼ Logical memory: consists of pages
(fixed-size blocks)
◼ Pages are mapped with frames through
page table → translate logical to physical address

Main Memory - 21
Address Translation Scheme
◼ Every address generated by CPU is divided into:
◼ Page number (p): used as an index into a per-process page table which
contains base address of each page in physical memory
◼ Page offset (d): location in the frame being referenced

→ The base address of the frame is combined with the page offset to define the
physical memory address that is sent to the memory unit
page number | page offset
p (m - n bits) | d (n bits)

◼ Size of logical address space: 2^m
◼ Page size: 2^n
Main Memory - 22
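The split-and-combine step can be written with shift and mask operations; the page-table contents below are assumed for illustration (pages 0–3 mapped to frames 5, 6, 1, 2, as in a typical textbook figure):

```python
def translate(logical, page_table, n):
    """Split a logical address into page number p and offset d
    (page size 2**n), then combine the frame base with the offset."""
    p = logical >> n                 # high m-n bits: page number
    d = logical & ((1 << n) - 1)     # low n bits: page offset
    frame = page_table[p]            # page-table lookup
    return (frame << n) | d          # physical address

# n = 2, i.e. 4-byte pages (matches the small example on the next slide)
page_table = {0: 5, 1: 6, 2: 1, 3: 2}
print(translate(3, page_table, n=2))   # logical 3 = page 0, offset 3 -> 5*4+3 = 23
```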
Paging Hardware

Main Memory - 23
Paging Model
◼ Paging model of logical and physical memory

Main Memory - 24
Paging Example
◼ Logical address: n = 2 and m = 4
◼ Using a page size of 4 bytes and a physical memory of 32 bytes (8 pages)

Main Memory - 25
Paging Example
◼ Calculating internal fragmentation
◼ Page size = 2,048 bytes
◼ Process size = 72,766 bytes
◼ 35 pages + 1,086 bytes
◼ Internal fragmentation of 2,048 - 1,086 = 962 bytes
◼ Worst case fragmentation = 1 frame – 1 byte
◼ On average fragmentation = 1 / 2 frame size

So small page sizes are desirable?

Main Memory - 26
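The worked example above in a few lines of Python:

```python
import math

page_size = 2048        # bytes
process_size = 72766    # bytes

# 35 full pages + 1,086 bytes -> 36 frames must be allocated
frames_needed = math.ceil(process_size / page_size)
# The last frame is only partly used: 2048 - 1086 = 962 bytes wasted
internal_frag = frames_needed * page_size - process_size
print(frames_needed, internal_frag)   # 36 962
```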
Overhead of Paging
◼ Each page table entry takes memory to track
◼ This overhead is reduced as the size of the page increases
◼ Disk I/O is more efficient when the amount of data being transferred
is larger (Chapter 11)
◼ Page sizes have grown over time
◼ Today, pages are typically either 4KB or 8KB
◼ Some CPUs and operating systems support multiple page sizes
◼ Windows 10 supports page sizes of 4KB and 2MB
◼ Linux supports two page sizes: 4KB (default) and huge pages (architecture-dependent)

Main Memory - 27
Free Frames
◼ OS keeps information about
frames in a frame table
◼ # of total frames
◼ Which frames are allocated
◼ Which frames are available
◼ etc.
◼ OS maintains list of free frames

Before allocation After allocation

Main Memory - 28
Hardware Support
◼ Page table is stored in main memory
◼ Page-table base register (PTBR) points to the page table
◼ Page-table length register (PTLR) indicates size of the page table
◼ In this scheme every data/instruction access requires two memory accesses
◼ One for the page table and one for the data/instruction
◼ The two memory access problem can be solved by the use of a special fast-
lookup hardware cache called translation look-aside buffer (TLB) (also
called associative memory)

Main Memory - 29
Hardware Support
◼ Translation look-aside buffer (TLB): small, fast-lookup H/W cache
◼ Associative, high-speed memory, consists of pairs (key, value)
◼ If page number of logical address is in TLB, delay is less than 10% of unmapped memory
access
◼ Otherwise (TLB miss), access page table in main memory
◼ Insert (page number, frame number) into TLB
◼ If TLB is full, OS select one for replacement
◼ Some TLBs store address-space identifiers (ASIDs) in each TLB entry
◼ ASID uniquely identifies each process to provide address-space protection for that process
◼ If the TLB does not support separate ASIDs, TLB must be flushed at every context switch
◼ TLBs typically small (64 to 1,024 entries)

Main Memory - 30
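A toy software model of the TLB behavior described above: bounded capacity, a simple FIFO replacement when full, and a flush for the no-ASID context-switch case. (A real TLB does the key comparison in parallel hardware; this dict-based sketch only mimics the logic.)

```python
from collections import OrderedDict

class TLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = OrderedDict()          # page number -> frame number

    def lookup(self, page):
        return self.entries.get(page)         # None models a TLB miss

    def insert(self, page, frame):
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry (FIFO)
        self.entries[page] = frame

    def flush(self):
        # Required on every context switch if entries carry no ASIDs
        self.entries.clear()

tlb = TLB(capacity=2)
tlb.insert(0, 5)
tlb.insert(1, 6)
tlb.insert(2, 1)                      # TLB full: page 0 is evicted
print(tlb.lookup(0), tlb.lookup(2))   # None 1
```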
Hardware Support
◼ Associative memory – parallel search
Page # | Frame #
◼ Address translation (p, d)


◼ If p is in associative register, get frame # out
◼ Otherwise get frame # from page table in memory

Main Memory - 31
Paging Hardware with TLB

Main Memory - 32
Effective Access Time
◼ Hit ratio: percentage of times that a page number is found in the TLB
◼ An 80% hit ratio means that we find the desired page number in the TLB 80%
of the time
◼ Suppose it takes 10 nanoseconds to access memory
◼ If we find the desired page in the TLB, then a mapped-memory access takes 10 ns
◼ Otherwise we need two memory accesses, so it is 20 ns
◼ Effective Access Time (EAT)
EAT = 0.80 x 10 + 0.20 x 20 = 12 nanoseconds
implying 20% slowdown in access time
◼ Consider a more realistic hit ratio of 99%,
EAT = 0.99 x 10 + 0.01 x 20 = 10.1ns
implying only 1% slowdown in access time
Main Memory - 33
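The EAT arithmetic from this slide:

```python
def effective_access_time(hit_ratio, mem_ns=10):
    """A TLB hit costs one memory access; a miss costs two
    (page-table access, then the data access)."""
    return hit_ratio * mem_ns + (1 - hit_ratio) * 2 * mem_ns

print(effective_access_time(0.80))   # ~12.0 ns (20% slowdown)
print(effective_access_time(0.99))   # ~10.1 ns (1% slowdown)
```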
Memory Protection
◼ Each frame has associated protection bits
◼ Read-write / read only
◼ Attempt to write to a read-only page causes H/W trap
→ Extension: read-only, read-write, execute-only, …
◼ Combination of them
◼ Each entry of page table has valid-invalid bit
◼ “valid” indicates that the associated page is in the process’ logical address
space, and is thus a legal page
◼ “invalid” indicates that the page is not in the process’ logical address space
◼ OS does not allow access to the page
◼ Rarely does a process use all of its address range
◼ Page table can waste memory → page-table length register (PTLR)
Main Memory - 34
Valid-Invalid Bit
◼ Valid (v) or Invalid (i) bit in a page table

Main Memory - 35
Shared Pages
◼ Shared code
◼ One copy of read-only (reentrant) code shared among processes (e.g., text
editors, compilers, window systems)
◼ Similar to multiple threads sharing the same process space
◼ Also useful for inter-process communication if sharing of read-write pages is
allowed
◼ Private code and data
◼ Each process keeps a separate copy of the code and data
◼ The pages for the private code and data can appear anywhere in the logical
address space

Main Memory - 36
Shared Pages Example

Main Memory - 37
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture

Main Memory - 38
Structure of the Page Table
◼ Memory structures for paging can get huge using straightforward
methods
◼ Consider a 32-bit logical address space as on modern computers
◼ Page size of 4 KB (2^12)
◼ Page table would have 1 million entries (2^32 / 2^12 = 2^20)
◼ If each entry is 4 bytes → each process needs 4 MB of physical address space for
the page table alone
◼ Don’t want to allocate that contiguously in main memory
◼ One simple solution is to divide the page table into smaller units
◼ Hierarchical Paging
◼ Hashed Page Tables
◼ Inverted Page Tables

Main Memory - 39
Hierarchical Page Tables
◼ Hierarchical page tables: Break up the logical address space into
multiple page tables
Two-level page table
◼ We then page the page table
◼ Ex) On 32-bit machine with 4KB page size,
a logical address is divided into:
◼ a page number consisting of 20 bits
◼ a page offset consisting of 12 bits
◼ since the page table is paged,
the page number is further divided into:
◼ a 10-bit page number
◼ a 10-bit page offset

◼ Known as forward-mapped page table
◼ p1 is an index into the outer page table,
p2 is the displacement within the page of the inner page table
Main Memory - 40
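The 10/10/12 split described above can be expressed with shifts and masks (field widths taken from the slide):

```python
def split_two_level(addr):
    """Decompose a 32-bit logical address into (p1, p2, d):
    10-bit outer index, 10-bit inner index, 12-bit page offset."""
    d  = addr & 0xFFF            # low 12 bits: page offset
    p2 = (addr >> 12) & 0x3FF    # next 10 bits: inner-table index
    p1 = (addr >> 22) & 0x3FF    # top 10 bits: outer-table index
    return p1, p2, d

# Build an address with known fields, then recover them
addr = (3 << 22) | (7 << 12) | 0x123
print(split_two_level(addr))     # (3, 7, 291)
```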
Hierarchical Page Tables
◼ Address-Translation Scheme

Main Memory - 41
Hierarchical Page Tables
◼ 64-bit logical address space
◼ Even two-level paging scheme not sufficient
◼ If page size is 4 KB (2^12)
◼ Then page table has 2^52 entries
◼ If two-level scheme, inner page tables could
be 2^10 4-byte entries
◼ Outer page table has 2^42 entries or 2^44 bytes
◼ One solution is to add a 2nd outer page table
◼ But in the following example the 2nd outer
page table is still 2^34 bytes in size
◼ And possibly 4 memory accesses to get
to one physical memory location

Main Memory - 42
Hashed Page Table
◼ Hashed page tables
◼ Common in large address spaces (> 32 bits)
◼ The virtual page number is hashed into a page table
◼ This page table contains a chain of elements hashing to the same location
◼ Each element contains
◼ The virtual page number
◼ The value of the mapped page frame
◼ A pointer to the next element
◼ Virtual page numbers are compared
in this chain searching for a match
◼ If a match is found, the corresponding
physical frame is extracted

Main Memory - 43
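A minimal sketch of the chained lookup: each bucket holds (virtual page number, frame) pairs. The table size and the trivial modulo hash function are assumptions, not anything the slides prescribe.

```python
TABLE_SIZE = 16
table = [[] for _ in range(TABLE_SIZE)]   # each bucket is a collision chain

def insert(vpn, frame):
    table[vpn % TABLE_SIZE].append((vpn, frame))

def lookup(vpn):
    for v, frame in table[vpn % TABLE_SIZE]:   # walk the chain
        if v == vpn:                           # compare virtual page numbers
            return frame
    return None                                # no mapping found

insert(5, 42)
insert(21, 7)        # 21 % 16 == 5: collides with vpn 5, chained behind it
print(lookup(21))    # 7
```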
Inverted Page Table
◼ Inverted page table: Rather than each process having a page table
and keeping track of all possible logical pages, track all physical
pages
◼ Only one inverted page table exists
in whole system
◼ Each entry is for real page of memory
◼ Each entry has address-space identifier
◼ Entry of inverted page table consists of
◼ <process-id, page number>
◼ Logical address consists of
◼ <process-id, page-number, offset>

Main Memory - 44
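A sketch of inverted-page-table translation: one system-wide list whose i-th entry describes physical frame i. The table contents and page size are illustrative only.

```python
# entry i describes frame i as <process-id, page-number>
inverted = [("P1", 0), ("P2", 0), ("P1", 1)]

def translate(pid, page, offset, page_size=4096):
    for frame, entry in enumerate(inverted):   # linear search is the main cost
        if entry == (pid, page):
            return frame * page_size + offset
    raise LookupError("no mapping: page fault")

print(translate("P1", 1, 10))   # frame 2 -> 2*4096 + 10 = 8202
```

The linear search is why real systems pair an inverted table with a hash table (and a TLB) to bound lookup time.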
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture

Main Memory - 45
Swapping
◼ Swapping: a process can be swapped temporarily out of memory to
a backing store, and then brought back into memory for continued
execution
◼ It makes it possible for the total physical
memory space of processes to exceed
physical memory
◼ Backing store: fast disk large enough to
accommodate copies of all memory images
for all users; must provide direct access
to these memory images
◼ System maintains a ready queue of
ready-to-run processes which have
memory images on disk
Main Memory - 46
Context Switch Time including Swapping
◼ If the next process to be put on the CPU is not in memory, we need to swap
out a process and swap in the target process
◼ Context switch time can then be very high
◼ Major part of swap time is transfer time
◼ Total transfer time is directly proportional to the amount of memory swapped
◼ Ex) size of process = 100MB
hard disk transfer rate = 50MB/sec
total context switch including swapping
= swap out time (2000ms) + swap in time (2000ms) = 4000ms (4sec)

Main Memory - 47
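The transfer-time arithmetic from the example:

```python
process_mb = 100       # process size (MB), from the slide
rate_mb_per_s = 50     # hard disk transfer rate (MB/sec)

swap_out_ms = process_mb / rate_mb_per_s * 1000   # 2000 ms
total_ms = 2 * swap_out_ms                        # swap out + swap in = 4000 ms
print(swap_out_ms, total_ms)   # 2000.0 4000.0
```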
Context Switch Time and Swapping
◼ Standard swapping involves moving entire processes between main
memory and backing store
◼ It allows physical memory to be oversubscribed, so that the system can
accommodate more processes than physical memory can hold
◼ However, standard swapping is not used in modern operating systems
◼ The amount of time required to move entire processes is prohibitive
◼ Modified versions of swapping are found on many systems (e.g.,
UNIX, Linux, and Windows)
◼ Swapping normally disabled
◼ Started if more than threshold amount of memory allocated
◼ Disabled again once memory demand reduced below threshold

Main Memory - 48
Swapping with Paging

Main Memory - 49
Swapping on Mobile Systems
◼ Not typically supported
◼ Flash memory based
◼ Small amount of space
◼ Limited number of write cycles
◼ Poor throughput between flash memory and CPU on mobile platform
◼ Instead use other methods to free memory if low
◼ iOS asks apps to voluntarily relinquish allocated memory
◼ Read-only data thrown out and reloaded from flash if needed
◼ Failure to free can result in termination
◼ Android terminates apps if low free memory, but first writes application state
to flash for fast restart
◼ Both OSes support paging

Main Memory - 50
Agenda
◼ Background
◼ Contiguous Memory Allocation
◼ Paging
◼ Structure of the Page Table
◼ Swapping
◼ Example: The Intel 32 and 64-bit Architecture

Main Memory - 51
The Intel 32 and 64-bit Architectures
◼ Dominant industry chips

◼ Pentium CPUs are 32-bit and called IA-32 architecture

◼ Current Intel CPUs are 64-bit and use the x86-64 (Intel 64) architecture

◼ Many variations in the chips, cover the main ideas here

Main Memory - 52
The Intel IA-32 Architecture
◼ Logical to physical address translation in IA-32

◼ Segmentation

Main Memory - 53
IA-32 Segmentation
◼ Supports both segmentation and segmentation with paging
◼ Each segment can be 4GB
◼ Up to 16K segments per process
◼ Divided into two partitions
◼ First partition of up to 8K segments are private to process (kept in local descriptor table
(LDT))
◼ Second partition of up to 8K segments shared among all processes (kept in global
descriptor table (GDT))

Main Memory - 54
IA-32 Segmentation
◼ CPU generates logical address
◼ Selector given to segmentation unit
◼ Which produces linear addresses

◼ Linear address given to paging unit


◼ Which generates physical address in main memory
◼ Paging units form equivalent of MMU
◼ Pages sizes can be 4 KB or 4 MB

Main Memory - 55
IA-32 Paging
◼ IA-32 allows a page size of either 4KB
or 4MB
◼ For 4KB page, IA-32 uses a two-level
paging scheme

◼ If the Page_Size flag is set, the page
directory points directly to a 4MB page
◼ Bypassing the inner page table

Main Memory - 56
IA-32 Page Address Extensions
◼ 32-bit address limits led Intel to create page address extension (PAE), allowing
32-bit apps access to more than 4GB of memory space
◼ Paging went to a 3-level scheme
◼ Top two bits refer to a page directory pointer table
◼ Page-directory and page-table entries moved to 64-bits in size
◼ Net effect is increasing address space to 36 bits – 64GB of physical memory

Main Memory - 57
Intel x86-64
◼ Current generation Intel x86 architecture
◼ 64 bits is ginormous (> 16 exabytes)
◼ In practice only implement 48 bit addressing
◼ Page sizes of 4 KB, 2 MB, 1 GB
◼ Four levels of paging hierarchy
◼ Can also use PAE so virtual addresses are 48 bits and physical
addresses are 52 bits

Main Memory - 58
