6/8/2014
OS Lecture #22
Operating Systems
Start Lecture #22
Remark: Lab 4 is available and is due in 2 weeks (23 April 2009).
3.7 Segmentation
Up to now, the virtual address space has been contiguous. In segmentation the virtual address space is
divided into a number of variable-size segments. One can view the designs we have studied so far as having
just one segment, the entire process.
Among other issues this makes memory management difficult when there are more that two
dynamically growing regions.
With two regions you start them on opposite sides of the virtual space as we did before.
Better is to have many virtual address spaces each starting at zero.
This split up is user visible. So a segment is a logical split up of the address space. Unlike with (userinvisible) paging, segment boundaries occur at logical point, e.g., at the end of a procedure.
Imagine a system with several large, dynamically-growing, data structures. The same problem we
mentioned for the OS when there are more than two growing regions, occurs as well for user
programs. The user (or some user-mode tool) must decide how much virtual space to leave between
the different tables. With segmentation
Eases flexible protection and sharing: One places in a single segment a unit that is logically shared. This
would be the natural method to implement shared libraries.
When shared libraries are implemented on paging systems, the design essentially mimics segmentation
by treating a collection of pages as a segment. This is more complicated since one must ensure that the
end of the unit to be shared occurs on a page boundary (this is done by padding).
Without segmentation (equivalently said with just one segment) all procedures are packed together so,
if one changes in size, all the virtual addresses following this procedure are changed and the program
must be re-linked. With each procedure in a separate segment this relinking would be limited to the
symbols defined or used in the modified procedure.
Homework: Explain the difference between internal fragmentation and external fragmentation. Which one
occurs in paging systems? Which one occurs in systems using pure segmentation?
** Two Segments
Late PDP-10s and TOPS-10
Each process has one shared text segment, that can also contain shared (normally read only) data. As
the name indicates, all process running the same executable share the same text segment.
The process also contains one (private) writable data segment.
Permission bits define for each segment.
** Three Segments
Traditional (early) Unix had three segments as shown on the right.
http://www.cs.nyu.edu/courses/spring09/V22.0202-002/lectures/lecture-22.html
1/8
6/8/2014
OS Lecture #22
1. Shared text marked execute only.
2. Data segment (global and static variables).
3. Stack segment (automatic variables).
Since the text doesn't grow, this was sometimes treated as 2 segments by
combining text and data into one segment. But then the text could not be
shared.
** General (Not Necessarily Demand) Segmentation
Segmentation is a user-visible division of a process into multiple variablesize segments, whose sizes change dynamically during execution. It enables
fine-grained sharing and protection. For example, one can share the text
segment as done in early unix.
With segmentation, the virtual address has two components: the segment
number and the offset in the segment.
Segmentation does not mandate how the program is stored in memory.
One possibility is that the entire program must be in memory in order to run it. Use whole process
swapping. Early versions of Unix did this.
Can also implement demand segmentation (see below).
More recently, segmentation is combined with demand paging (done below).
Any segmentation implementation requires a segment table with one entry for each segment.
A segment table is similar to a page table.
Entries are called STEs, Segment Table Entries.
Each STE contains the base address of the segment and the limit value (the size of the segment).
Why is there no limit value in a page table?
Answer: All pages are the same size so the limit is obvious.
The address translation for segmentation is
(seg#, offset) --> if (offset<limit) base+offset else error.
3.7.1 Implementation of Pure Segmentation
Pure Segmentation means segmentation without paging.
Segmentation, like whole program swapping, exhibits external fragmentation (sometimes called
checkerboarding). (See the treatment of OS/MVT for a review of external fragmentation and whole
program swapping). Since segments are smaller than programs (several segments make up one program), the
external fragmentation is not as bad as with whole program swapping. But it is still a serious problem.
As with whole program swapping, compaction can be employed.
** Demand Segmentation
Same idea as demand paging, but
applied to segments.
Consideration
Programmer aware
http://www.cs.nyu.edu/courses/spring09/V22.0202-002/lectures/lecture-22.html
Demand
Paging
No
Demand
Segmentation
Yes
2/8
6/8/2014
OS Lecture #22
How many addr spaces
1
Many
If a segment is loaded, base
and limit are stored in the
VA size > PA size
Yes
Yes
STE and the valid bit is set in
Protect individual
No
Yes
the STE.
procedures separately
The STE is accessed for each
Accommodate elements
No
Yes
memory reference (not really,
with changing sizes
there is probably a TLB).
Ease user sharing
No
Yes
If the segment is not loaded,
let the VA size
Sharing, Protection,
the valid bit is unset. The base Why invented
exceed the PA size independent addr spaces
and limit as well as the disk
address of the segment is
Internal fragmentation
Yes
No, in principle
stored in the an OS table.
External fragmentation
No
Yes
A reference to a non-loaded
Placement question
No
Yes
segment generate a segment
Replacement question
Yes
Yes
fault (analogous to page fault).
To load a segment, we must solve both the placement question and the replacement question (for
demand paging, there is no placement question).
Pure segmentation was once implemented by Burroughs in the B5500. I believe the implementation
was in fact demand segmentation but I am not sure. Demand segmentation is not used in modern
systems.
The table on the right compares demand paging with demand segmentation. The portion above the double
line is from Tanenbaum.
** 3.7.2 and 3.7.3 Segmentation With (Demand) Paging
These two sections of the book cover segmentation combined with demand paging in two different systems.
Section 3.7.2 covers the historic Multics system of the 1960s (it was coming up at MIT when I was an
undergraduate there). Multics was complicated and revolutionary. Indeed, Thompson and Richie developed
(and named) Unix partially in rebellion to the complexity of Multics. Multics is no longer used.
Section 3.7.3 covers the Intel Pentium hardware, which offers a segmentation+demand-paging scheme that
is not used by any of the current operating systems (OS/2 used it in the past). The Pentium design permits one
to "convert" the system into a pure damand-paging scheme and that is the common usage today.
I will present the material in the following order.
1. Describe segmentation+paging (not demand paging) generically, i.e. not tied to any specific hardware
or software.
2. Note the possibility of using demand paging (again generically).
3. Give some details of the Multics implementation.
4. Give some details of the Pentium hardware, especially how it can emulate straight demand paging.
** Segmentation With (non-demand) Paging
One can combine segmentation and paging to get advantages of both at a cost in complexity. In particular,
user-visible, variable-size segments are the most appropriate units for protection and sharing; the addition of
(non-demand) paging eliminates the placement question and external fragmentation (at the smaller average
cost of 1/2-page internal fragmentation per segment).
http://www.cs.nyu.edu/courses/spring09/V22.0202-002/lectures/lecture-22.html
3/8
6/8/2014
OS Lecture #22
The basic idea is to employ (non-demand) paging on each segment. A segmentation plus paging scheme has
the following properties.
A virtual address becomes a triple: (seg#, page#, offset).
Each segment table entry (STE) points to the page table for that segment. Compare this with a
multilevel page table.
The physical size of each segment is a multiple of the page size (since the segment consists of pages).
The logical size is not; instead we keep the exact size in the STE (limit value) and terminate the process
if it references beyond the limit. In this case the last page of each segment is partially wasted (internal
fragmentation).
The page# field in the address gives the entry in the chosen page table and the offset gives the offset in
the page.
From the limit field, one can easily compute the size of the segment in pages (which equals the size of
the corresponding page table in PTEs).
A straightforward implementation of segmentation with paging would requires 3 memory references
http://www.cs.nyu.edu/courses/spring09/V22.0202-002/lectures/lecture-22.html
4/8
6/8/2014
OS Lecture #22
(STE, PTE, referenced word) so a TLB is crucial.
Some books carelessly say that segments are of fixed size. This is wrong. They are of variable size with
a fixed maximum and with the requirement that the physical size of a segment is a multiple of the page
size.
Keep protection and sharing information on segments. This works well for a number of reasons.
1. A segment is variable size.
2. Segments and their boundaries are user-visible
3. Segments are shared by sharing their page tables. This eliminates the problem mentioned above
with shared pages.
Since we have paging, there is no placement question and no external fragmentation.
The problems are the complexity and the resulting 3 memory references for each user memory
reference. The complexity is real. The three memory references would be fatal were it not for TLBs,
which considerably ameliorate the problem. TLBs have high hit rates and for a TLB hit there is
essentially no penalty.
Although it is possible to combine segmentation with non-demand paging, I do not know of any system that
did this.
Homework: 36.
Homework: Consider a 32-bit address machine using paging with 8KB pages and 4 byte PTEs. How many
bits are used for the offset and what is the size of the largest page table? Repeat the question for 128KB
pages. So far this question has been asked before. Repeat both parts assuming the system also has
segmentation with at most 128 segments.
Homework: Consider a system with 36-bit addresses that employs both segmentation and paging. Assume
each PTE and STE is 4-bytes in size
1. Assume the system has a page size of 8K and each process can have up to 256 segments. How large
in bytes is the largest possible page table? How large in pages is the largest possible segment?
2. Assume the system has a page size of 4K and each segment can have up to 1024 pages. What is the
maximum number of segments a process can have? How large in bytes is the largest possible segment
table? How large in bytes is the largest possible process.
3. Assume the largest possible segment table is 213 bytes and the largest possible page table is 216 bytes.
How large is a page? How large in bytes is the largest possible segment?
** Segmentation With Demand Paging
There is very little to say. The previous section employed (non-demand) paging on each segment. For the
present scheme, we employ demand paging on each segment, that is we perform fetch-on-demand for the
pages of each segment.
The Multics Scheme
Multics was the first system to employ segmentation plus demand paging. The implementation was as
described above with just a few wrinkles, some of which we discuss now together with some of the
parameter values.
http://www.cs.nyu.edu/courses/spring09/V22.0202-002/lectures/lecture-22.html
5/8
6/8/2014
OS Lecture #22
The Multics hardware (GE-645) was word addressable, with 36-bit words (the 645 predates bytes).
Each virtual address was 34-bits in length and was divided into three parts as mentioned above. The
seg# field was the high-order 18 bits; the page# field was the next 6 bits; and the offset was the loworder 10 bits.
The actual implementation was more complicated and the full 34-bit virtual address was not present in
one place in an instruction.
Thus the system supported up to 218=256K segments, each of size up to 26=64 pages. Each page is
of size 210 (36-bit) words.
Since the segment table can have 256K STEs (called descriptors), the table itself can be large and was
itself demand-paged.
Multics permits some segments to be demand-paged while other segments are not paged; a bit in each
STE distinguishes the two cases.
The Pentium Scheme
The Pentium design implements a trifecta: Depending on the setting of a various control bits the Pentium
scheme can be pure demand-paging, pure segmentation, or segmentation with demand-paging.
The Pentium supports 214=16K segments, each of size up to 232 bytes.
This would seem to require a 14+32=46 bit virtual address, but that is not how the Pentium works.
The segment number is not part of the virtual address found in normal instructions.
Instead separate instructions are used to specify which are the currently active "code segment" and
"data segment" (and other less important segments). Technically, the CS register is loaded with the
"selector" of the active code segment and the DS register is loaded with the "selector" of the active
data register.
When the selectors are loaded, the base and limit values are obtained from the corresponding STEs
(called descriptors).
There are actually two flavors of segments, some are private to the process; others are system
segments (including the OS itself), which are addressable (but not necessarily accessible) by all
processes.
Once the 32-bit segment base and the segment limit are determined, the 32-bit address from the instruction
itself is compared with the limit and, if valid, is added to the base and the sum is called the 32-bit "linear
address". Now we have three possibilities depending on whether the system is running in pure segmentation,
pure demand-paging, or segmentation plus demand-paging mode.
1. In pure segmentation mode the linear address is treated as the physical address and memory is
accessed.
2. In segmentation plus demand-paging mode, the linear address is broken into three parts since the
system implements 2-level-paging. That is, the high-order 10 bits are used to index into the 1st-level
page table (called the page directory). The directory entry found points to a 2nd-level page table and
the next 10 bits index that table (called the page table). The PTE referenced points to the frame
containing the desired page and the lowest 12 bits of the linear address (the offset) finally point to the
referenced word. If either the 2nd-level page table or the desired page are not resident, a page fault
occurs and the page is made resident using the standard demand paging model.
3. In pure demand-paging mode all the segment bases are zero and the limits are set to the maximum.
Thus the 32-bit address in the instruction become the linear address without change (i.e., the
http://www.cs.nyu.edu/courses/spring09/V22.0202-002/lectures/lecture-22.html
6/8
6/8/2014
OS Lecture #22
segmentation part is effectively) disabled. Then the (2-level) demand paging procedure just described
is applied.
Current operating systems for the Pentium use this last mode.
3.8 Research on Memory Management
Skipped
3.9 Summary
Read
Some Last Words on Memory Management
We have studied the following concepts.
Segmentation / Paging / Demand Loading (fetch-on-demand).
Each is a yes or no alternative.
This gives 8 possibilities.
Placement and Replacement.
Internal and External Fragmentation.
Page Size and locality of reference.
Multiprogramming level and medium term scheduling.
Chapter 4 File Systems
There are three basic requirements for file systems.
1. Size: Store very large amounts of data.
2. Persistence: Data survives the creating process.
3. Concurrent Access: Multiple processes can access the data concurrently.
High level solution: Store data in files that together form a file system.
4.1 Files
4.1.1 File Naming
Very important. A major function of the file system is to supply uniform naming. As with files themselves,
important characteristics of the file name space are that it is persistent and concurrently accessible.
Unix-like operating systems extend the file name space to encompass devices as well
Does each file have a unique name?
Answer: Often no. We will discuss this below when we study links.
File Name Extensions
The extensions are suffixes attached to the file names and are intended to in some why describe the high-level
http://www.cs.nyu.edu/courses/spring09/V22.0202-002/lectures/lecture-22.html
7/8
6/8/2014
OS Lecture #22
structure of the file's contents.
For example, consider the ".html" extension in "class-notes.html", the name of the file we are viewing.
Depending on the system and application, these extensions can have little or great significance. The extensions
can be
1. Conventions just for humans. For example letter.teq (my convention) signifies to me that this letter is
written using the troff text-formatting language and employs the tbl preprocessor to handle tables and
the eqn preprocessor to handle mathematical equations. Neither linux, troff, tbl, nor equ place any
significance in the .teq extension.
2. Conventions giving default behavior for some programs.
The emacs editor by default edits .html files in html mode. However, emacs can edit such files in
any mode and can edit any file in html mode. It just needs to be told to do so during the editing
session.
The firefox browser assumes that an .html extension signifies that the file is written in the html
markup language. However, having <html> ... </html> inside the file works as well.
The gzip file compressor/decompressor appends the .gz extension to files it compresses, but
accepts a --suffix flag to specify another extension.
3. Default behaviors for the operating system or window manager or desktop environment.
Click on .xls file in windows and excel is started.
Click on .xls file in nautilus under linux and openoffice is started.
4. Required for certain programs.
The gnu C compiler (and probably other C compilers) requires C programs be have the .c (or
.h) extension, and requires assembler programs to have the .s extension.
5. Required by the operating system
MS-DOS treats .com files specially.
Case Sensitive?
Should file names be case sensitive. For example, do x.y, X.Y, x.Y all name the same file? There is no clear
answer.
Unix-like systems employ case sensitive file names so the three names given above are distinct.
Windows systems employ case insensitive file names to the three names given above are equivalent.
Mathematicians (and others) often "consider an element x of a set X" so use case sensitive naming.
Normal English (and other natural language) usage often employs case insensitivity (e.g. capitalizing a
word at the beginning of a sentence does not change the word).
http://www.cs.nyu.edu/courses/spring09/V22.0202-002/lectures/lecture-22.html
8/8