W4118 Operating Systems
Instructor: Junfeng Yang
Logistics
Homework 5 due 3:09pm this Thursday
2
Last lecture: LRU page
replacement
Least-Recently-Used: throw out a page that
has not been referenced for the longest time
With locality, approximate OPT
However, difficult to implement efficiently
Clock: efficient, approximate LRU
Clock extension: bias toward replacing clean
pages
Problem with LRU
Doesn’t work well repeated scans with data set larger
than memory
Doesn’t consider frequency
• LFU
• LRU-k and LRU-2Q
3
Today
Linux Memory Management
Address Space Translation
• How does Linux translate address?
Address Space Layout
• What does a process Address Space look like?
• What does the kernel Address Space look like?
• How to implement Copy-on-Write (COW)?
Dynamic Memory Allocation
• How to allocate pages?
• How to allocate objects?
Virtual Memory
• How to replace pages?
• How to load page from disk?
4
Address Translation
Linux uses paging to translate logical
addresses to physical addresses. Linux
does not use segmentation
More portable since some RISC architectures
don’t support segmentation
Hierarchical paging is flexible enough
5
Recall: x86 segmentation and
paging hardware
CPU generates logical address (seg, offset)
Given to segmentation unit
• Which produces linear addresses
Linear address given to paging unit
• Which generates physical address in main memory
• Paging units form equivalent of MMU
6
Segmentation
Since x86 segmentation hardware cannot be
disabled, Linux just uses NULL mappings
Set segment base to 0x00000000, limit to 0xffffffff
segment offset == linear addresses
arch/i386/kernel/head.S
Linux defines four segments
User code (__USER_CS)
User data (__USER_DS)
Kernel code (__KERNEL_CS)
Kernel data (__KERNEL_DATA)
Segment Protection
Current Privilege level (CPL) specifies
privileged mode or user mode
Stored in current code segment descriptor
User code segment: CPL = 3
Kernel code segment: CPL = 0
Descriptor Privilege Level (DPL) specifies
protection
Only accessible if CPL <= DPL
Switch between user mode and kernel mode
(e.g. system call and return)
Hardware load the corresponding segment selector
(__USER_CS or __KERNEL_CS) into register cs
8
Paging
Linux uses 4-level hierarchical paging
A linear address is split into five parts, to
seamlessly handle a range of different addressing
modes
Page Global Dir
Page Upper Dir
Page Middle Dir
Page Table
Page Offset
Example: 32-bit address space, 4KB page without
physical address extension
Page Global dir: 10 bits
Page Upper dir and Page Middle dir are not used
Page Table: 10 bits
Page Offset: 12 bits
9
Page Table Operations
Linux provides data structures and
operations to create, delete, read and write
page directories
include/asm-i386/pgtable.h
arch/i386/mm/hugetlbpage.c
Naming convention
pgd: Page Global Directory
pmd: Page Middle Directory
pud: Page Upper Directory
pte: Page Table Entry
Example: mk_pte(p, prot)
10
TLB Operations
x86 uses hardware TLB
OS does not manage TLB
Only operation: flush TLB entries
include/asm-i386/tlbflush.h
load cr3: flush all TLB entries
invlpg addr: flush a single TLB entry
• Much more efficient than flushing all TLB entries
11
Today
Linux Memory Management
Address Space Translation
• How does Linux translate address?
Address Space Layout
• What does a process Address Space look like?
• What does the kernel Address Space look like?
• How to implement Copy-on-Write (COW)?
Dynamic Memory Allocation
• How to allocate pages?
• How to allocate objects?
Virtual Memory
• How to replace pages?
• How to load page from disk?
12
Recall: Linux Process Address
Space
0xFFFFFFFF Kernel data
kernel mode structures
Kernel space
0xC0000000 User-mode stack-area Kernel space is
also mapped into
user space
User space from user mode
to kernel mode,
user mode
no need to switch
Shared runtime-libraries address spaces
Task’s code and data
0
Kernel Address Space
Kernel Address Space [3G, 4G)
[0, 896 MB) Physical Memory Mapping
• Kernel uses this mapping to manage physical memory
• physical address = logical address – 3G
The rest 128 MB are used for
• Accessing High Memory
• Non-contiguous pages
• Fix-mapped Linear Addresses
Persistent
Physical Fix-mapped
vmalloc …vmalloc High
Memory Linear
area area Memory
Mapping addresses
Mappings
0xC0000000 0xFFFFFFFF
14
A Cool Trick: Copy-on-Write
In fork(), parent and child often share
significant amount of memory
Expensive to do copy all pages
COW Idea: exploit VA to PA indirection
Instead of copying all pages, share parent pages
in child address space
If either process writes to shared pages, only then
is the page copied
How to detect page write?
• Mark pages as read-only in both parent and child
address space
• On write, page fault occurs
15
Sharing Pages
copy_process() in kernel/fork.c
copy_mm()
dup_mmap() // copy page tables
copy_page_range() in mm/memory.c
copy_pud_range()
copy_pmd_range()
copy_pte_range()
copy_one_pte() // mark
readonly
16
Copy Page on Page Fault
do_page_fault in arch/i386/mm/fault.c
cr2 stores faulting virtual address
handle_mm_fault in mm/memory.c
handle_pte_fault in mm/memory.c
if(write_access)
do_wp_page()
17
Today
Linux Memory Management
Address Space Translation
• How does Linux translate address?
Address Space Layout
• What does a process Address Space look like?
• What does the kernel Address Space look like?
• How to implement Copy-on-Write (COW)?
Dynamic Memory Allocation
• How to allocate pages?
• How to allocate objects?
Virtual Memory
• How to replace pages?
• How to load page from disk?
18
Dynamic Memory Allocation
How to allocate pages?
Data structures for page allocation
Buddy algorithm for page allocation
How to allocate objects?
Slab allocation
19
Page Descriptor
Keep track of the status of each page
frame
struct page, include/linux/mm.h
All stored in mem_map array
Simple mapping between a page and its
descriptor
Nth page’s descriptor is mem_map[N]
virt_to_page
page_to_pfn
Memory Zone
Keep track of pages in different zones
struct zone, include/linux/mmzone.h
ZONE_DMA: <16MB
ZONE_NORMAL: 16MB-896MB
ZONE_HIGHMEM: >896MB
Page Allocator
Linux use a buddy allocator for page allocation
Buddy Allocator: Fast, simple allocation for blocks that
are 2^n bytes [Knuth 1968]
Data structure: free lists of blocks of pages of size
2^0, 2^1, …
Allocation restrictions: 2^n pages
Allocation of 2^n pages:
Search free lists (n, n+1, n+2, …) for appropriate size
• Recursively divide larger blocks until reach block of correct
size
• Insert “buddy” blocks into free lists
Free
Recursively coalesce block with buddy if buddy free
Example: allocate a 256-page block
mm/page_alloc.c
Advantages and Disadvantages of
Buddy Allocator
Advantages
Fast and simple compared to general dynamic
memory allocation
Avoid external fragmentation by keeping free
physical pages contiguous
• Can use paging, but three problems:
– DMA bypasses paging
– Modifying page table leads to TLB flush
– Cannot use “super page” to increase TLB coverage
Disadvantages
Internal fragmentation
• Allocation of block of k pages when k != 2^n
Slab Allocator
For objects smaller than a page
Implemented on top of page allocator
Each memory region is called a cache
Two types of slab allocator
Fixed-size slab allocator: cache contains objects of
same size
• for frequently allocated objects
General-purpose slab allocator: caches contain
objects of size 2^n
• for less frequently allocated objects
• For allocation of object with size k, round to nearest 2^n
mm/slab.c
Advantages and Disadvantages of slab
allocator
Advantages
Reduce internal fragmentation: many objects in
one page
Fast: no need to allocate and free page frames
• Allocation: no search of objects with the right size
for fixed-size allocator; simple search for general-
purpose allocator
• Free: no merge with adjacent free blocks
Disadvantages
Memory overhead for bookkeeping
Internal fragmentation for general-purpose slab
allocator
Today
Linux Memory Management
Address Space Translation
• How does Linux translate address?
Address Space Layout
• What does a process Address Space look like?
• What does the kernel Address Space look like?
• How to implement Copy-on-Write (COW)?
Dynamic Memory Allocation
• How to allocate pages?
• How to allocate objects?
Virtual Memory
• How to replace pages?
• How to load page from disk?
26
Data Structures for Page
Replacement
Two lists in struct zone
active_list: hot pages
inactive_list: cold pages
Two bits in struct page
PG_active: is page on active list?
PG_referenced: has page been referenced
recently?
27
Functions for Page Replacement
lru_cache_add*(): add to inactive or active
list
mark_page_accessed(): called twice to
move a page from inactive to active
page_referenced(): test if a page is
referenced
refill_inactive_zone(): move pages from
active to inactive
When to Replace Pages?
Usually when free_more_memory() in
fs/buffer.c called
try_to_free_pages in mm/vmscan.c
shrink_caches
shrink_zone
refill_inactive_zone
shrink_cache
shrink_list
pageout()
29
How to Load Page?
do_page_fault() in arch/i386/mm/fault.c
cr2 stores faulting virtual address
handle_mm_fault() in mm/memory.c
handle_pte_fault()
if(!pte_present(entry))
do_no_page() // demand paging
do_file_page() // file mapped
page
do_swap_page() // swapped out
page
30