Introduction to Unix, Part 2




CS 167                III–1   Copyright © 2006 Thomas W. Doeppner. All rights reserved.




File-System I/O

                    • Concerns
                      – getting data to and from the device
                          - disk architecture
                      – caching in primary storage
                      – simultaneous I/O and computation








  To understand how the file system operates, we certainly need to know a bit about disk
architectures and how we can most effectively utilize them. We cover this at length later on
in the course, but for now we’ll discuss the general concerns. What’s also important is how
the contents of files are cached in primary storage, and how this cache is used so that I/O
and computation can proceed at the same time.




Disk Architecture
(Figure: a disk drive; labels show a sector, a track, a cylinder, and the disk heads on top and bottom of each platter.)





The Buffer Cache


(Figure: a user process passing a buffer to and from the buffer cache.)





  File I/O in Unix, and in most operating systems, is not done directly to the disk drive, but
through an intermediary, the buffer cache.
  The buffer cache has two primary functions. The first, and most important, is to make
possible concurrent I/O and computation within a Unix process. The second is to insulate
the user from physical block boundaries.
  From a user thread’s point of view, I/O is synchronous. By this we mean that when the I/O
system call returns, the system no longer needs the user-supplied buffer. For example, after
a write system call, the data in the user buffer has either been transmitted to the device or
copied to a kernel buffer—the user can now scribble over the buffer without affecting the
data transfer. Because of this synchronization, from a user thread’s point of view, no more
than one I/O operation can be in progress at a time. Thus user-implemented multibuffered
I/O is not possible (in a single-threaded process).
  The buffer cache provides a kernel implementation of multibuffered I/O, and thus
concurrent I/O and computation are possible even for single-threaded processes.
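These synchronous semantics can be seen at user level. The sketch below (file name and data are made up for illustration) writes a buffer, scribbles over it immediately, and checks that the file still holds the original data: once write returns, the kernel no longer needs the user buffer.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 0 if the file data survived the scribbling, -1 on error. */
int sync_write_demo(void) {
    char buf[16] = "original data!!";
    char check[16];
    int fd = open("/tmp/sync_demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* When write returns, the kernel has already copied buf
       (to a kernel buffer or to the device) ... */
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        return -1;

    /* ... so the user may immediately scribble over the buffer. */
    memset(buf, 'X', sizeof(buf));

    if (pread(fd, check, sizeof(check), 0) != (ssize_t)sizeof(check))
        return -1;
    close(fd);
    unlink("/tmp/sync_demo");
    return memcmp(check, "original data!!", sizeof(check)) == 0 ? 0 : -1;
}
```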




Multi-Buffered I/O


(Figure: a process calling read( … ); block i-1 is the previous block, block i the current block, and block i+1 the probable next block.)








  The use of read-ahead and write-behind makes simultaneous I/O and computation
possible: if the block currently being fetched is block i and the previous block fetched
was block i-1, then block i+1 is also fetched. Modified blocks are normally not written
out synchronously; instead they are written asynchronously, sometime after they were
modified.
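The sequential-access heuristic can be sketched in a few lines of C. This is a toy model of the decision, not actual kernel code; the per-file state and function name are made up.

```c
/* Hypothetical per-file state: the block most recently requested.
   Initialized so that no first request looks sequential. */
static long last_block = -2;

/* On a request for block i, return the block to prefetch:
   i+1 if the access pattern looks sequential, -1 otherwise. */
long block_to_prefetch(long i) {
    long ahead = (i == last_block + 1) ? i + 1 : -1;
    last_block = i;
    return ahead;
}
```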




Maintaining the Cache

(Figure: two lists of buffers. The aged list, ordered oldest first, holds probably free buffers and receives returns of no-longer-active buffers. The LRU list, ordered oldest to youngest, holds probably active buffers and receives returns of active buffers. Buffer requests are satisfied from these lists.)





   In the typical Unix system, active buffers are maintained in least-recently-used (LRU) order
in the system-wide LRU list. Thus after a buffer has been used (as part of a read or write
system call), it is returned to the end of the LRU list. The system also maintains a separate
list of “free” buffers called the aged list. Included in this list are buffers holding no-longer-
needed blocks, such as blocks from files that have been deleted.
   Fresh buffers are taken from the aged list. If this list is empty, then a buffer is obtained
from the LRU list as follows. If the first buffer (least recently used) in this list is clean (i.e.,
contains a block that is identical to its copy on disk), then this buffer is taken. Otherwise
(i.e., if the buffer is dirty), it is written out to disk asynchronously and, when written, is
placed at the end of the aged list. The search for a fresh buffer continues on to the next
buffer in the LRU list, etc.
   When a file is deleted, any buffers containing its blocks are placed at the head of the aged
list. Also, when I/O into a buffer results in an I/O error, the buffer is placed at the head of
the aged list.
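The allocation policy just described can be sketched in C. This is a toy model: the lists are bare singly linked lists, and the asynchronous write-back is modeled as completing immediately; real kernel buffers carry block numbers, flags, and hash-queue links as well.

```c
#include <stddef.h>

/* Toy buffer descriptor. */
struct buf {
    int dirty;              /* differs from its copy on disk? */
    struct buf *next;
};

struct buf *aged_head = NULL;   /* "probably free" buffers */
struct buf *lru_head  = NULL;   /* "probably active", least recently used first */

/* Stand-in for the asynchronous write-back: mark the buffer clean
   and put it on the aged list (modeled as completing at once). */
static void write_back(struct buf *b) {
    b->dirty = 0;
    b->next = aged_head;
    aged_head = b;
}

/* Get a fresh buffer: prefer the aged list; otherwise walk the LRU
   list from its least recently used end, taking the first clean
   buffer and scheduling dirty ones for write-back. */
struct buf *get_buffer(void) {
    struct buf *b;
    if (aged_head != NULL) {
        b = aged_head;
        aged_head = b->next;
        return b;
    }
    while (lru_head != NULL) {
        b = lru_head;
        lru_head = b->next;
        if (!b->dirty)
            return b;        /* clean: take it */
        write_back(b);       /* dirty: write it out, keep searching */
    }
    /* Every LRU buffer was dirty; the write-backs have refilled
       the aged list. */
    if (aged_head != NULL) {
        b = aged_head;
        aged_head = b->next;
        return b;
    }
    return NULL;             /* no buffers at all */
}
```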




Address Translation
(Figure: virtual memory, divided into pages, mapped via the address map to real memory, divided into page frames.)





  References to virtual memory are mapped to real memory. In this lecture we assume that
address translation is done via paging. We use the terminology that virtual memory is divided
into fixed-size pieces called pages, while real memory is divided into identically sized pieces
called page frames.
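Under paging, translating an address amounts to splitting it into a page number (looked up in the address map) and an offset carried over unchanged into the page frame. A minimal sketch, assuming 4-KB pages (an assumption; real page sizes vary):

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed here; query the system in real code */

/* The page number indexes the address map (page table). */
uint64_t page_number(uint64_t va) { return va / PAGE_SIZE; }

/* The offset within the page is used as-is in the page frame. */
uint64_t page_offset(uint64_t va) { return va % PAGE_SIZE; }
```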




Multiprocess Address Translation

(Figure: Process 1’s and Process 2’s virtual memories, each mapped to the same real memory.)





  Multiple processes compete for real memory. While many page frames appear only in one
process, others are shared by multiple processes.




Can we reimplement read and write so
         that data isn’t copied from the kernel
         buffer cache to the user buffer, but
         remapped to it?








No …

                       (But we can do something almost as good.)








  Why not? It’s because user buffers aren’t necessarily either multiples of a page size in
length or aligned on page boundaries.
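The constraint is easy to state in code: a user buffer could be remapped rather than copied only if both its address and its length fall on page boundaries. This predicate is for illustration, not an actual kernel interface.

```c
#include <stddef.h>
#include <stdint.h>

/* A buffer can be remapped (rather than copied) only if it starts
   on a page boundary and is a whole number of pages long. */
int remappable(const void *buf, size_t len, size_t pagesize) {
    return ((uintptr_t)buf % pagesize == 0) && (len % pagesize == 0);
}
```

Typical user buffers (a `char` array on the stack, say) satisfy neither condition, which is why read and write must copy.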




Mapped Files

                    • Traditional File I/O
                       char buf[BigEnough];
                       fd = open(file, O_RDWR);
                       for (i=0; i<n_recs; i++) {
                          read(fd, buf, sizeof(buf));
                          use(buf);
                       }
                    • Mapped File I/O
   char *MappedFile;
                       fd = open(file, O_RDWR);
                       MappedFile = mmap(... , fd, ...);
                       for (i=0; i<n_recs; i++)
                          use(MappedFile[i]);





  Traditional I/O involves explicit calls to read and write, which in turn means that data is
accessed via a buffer; in fact, two buffers are usually employed: data is transferred
between a user buffer and a kernel buffer, and between the kernel buffer and the I/O
device.
  An alternative approach is to map a file into a process’s address space: the file provides
the data for a portion of the address space and the kernel’s virtual-memory system is
responsible for the I/O. A major benefit of this approach is that data is transferred directly
from the device to where the user needs it; there is no need for an extra system buffer.




Traditional File I/O


(Figure: data moves between a user buffer, a kernel buffer, and the disk.)





Mapped File I/O


(Figure: virtual memory, mapped via the address map to real memory, which is backed directly by the disk.)





Mmap System Call
                     void *mmap(
                       void *addr,
                         // where to map file (0 if don’t care)
                       size_t len,
                         // how much to map
                       int prot,
                         // memory protection (read, write, exec.)
                       int flags,
                         // shared vs. private, plus more
                       int fd,
                         // which file
                       off_t off
                         // starting from where
                       )






   Mmap maps the file given by fd, starting at position off, for len bytes, into the caller’s
address space starting at location addr:
         – len is rounded up to a multiple of the page size
         – off must be page-aligned
         – if addr is zero, the kernel assigns an address
         – if addr is positive, it is a suggestion to the kernel as to where the mapped file
           should be located (it will usually be aligned to a page); however, if flags includes
           MAP_FIXED, then addr is not modified by the kernel (and if its value is not
           reasonable, the call fails)
         – the call returns the address of the beginning of the mapped file
   The flags argument must include either MAP_SHARED or MAP_PRIVATE (but not both). If
it’s MAP_SHARED, then the mapped portion of the caller’s address space contains the
current contents of the file; when the mapped portion of the address space is modified by the
process, the corresponding portion of the file is modified.
   However, if flags includes MAP_PRIVATE, then the idea is that the mapped portion of the
address space is initialized with the contents of the file, but that changes made to the
mapped portion of the address space by the process are private and not written back to the
file. The details are a bit complicated: as long as the mapping process does not modify any of
the mapped portion of the address space, the pages contained in it contain the current
contents of the corresponding pages of the file. However, if the process modifies a page, then
that particular page no longer contains the current contents of the corresponding file page,
but contains whatever modifications are made to it by the process. These changes are not
written back to the file and not shared with any other process that has mapped the file. It’s
unspecified what the situation is for other pages in the mapped region after one of them is
modified. Depending on the implementation, they might continue to contain the current
contents of the corresponding pages of the file until they, themselves, are modified. Or they
might also be treated as if they’d just been written to and thus no longer be shared with
others.
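The MAP_PRIVATE behavior can be demonstrated directly. In this sketch (the file name is made up; error handling is abbreviated) a store into a private mapping triggers copy-on-write, so the file itself is left unchanged.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if the file is unchanged after a store into a
   MAP_PRIVATE mapping of it, -1 otherwise. */
int private_demo(void) {
    char check[5];
    int fd = open("/tmp/private_demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || write(fd, "hello", 5) != 5)
        return -1;

    char *p = mmap(0, 5, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 'H';   /* copy-on-write: this page is now private */

    if (pread(fd, check, 5, 0) != 5)
        return -1;
    munmap(p, 5);
    close(fd);
    unlink("/tmp/private_demo");
    return memcmp(check, "hello", 5) == 0 ? 0 : -1;   /* file untouched */
}
```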



Share-Mapped File I/O


(Figure: two virtual memories, mapped via address maps to the same real memory, backed by the disk.)





Example
            int main( ) {
              int fd;
              char* buffer;
              off_t size;

                fd = open("file", O_RDWR);
                size = lseek(fd, 0, SEEK_END);
                buffer = (char *)mmap(0, size,
                    PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
                if (buffer == MAP_FAILED) {
                  perror("mmap");
                  exit(1);
                }

                // buffer points to region of memory containing the
                // contents of the file

                ...

            }




  Here we map the entire contents of a file into the caller’s address space, allowing it both
read and write access. Note that mapping the file into memory does not cause any immediate
I/O to take place; the operating system will perform the I/O when necessary, according to its
own rules.
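A short continuation of the example (again with a hypothetical file name) shows the MAP_SHARED semantics in action: a store into the mapping, once flushed with msync, is visible through ordinary file I/O.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if a store into a MAP_SHARED mapping shows up in the
   file itself, -1 otherwise. */
int shared_demo(void) {
    char check[5];
    int fd = open("/tmp/shared_demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || write(fd, "hello", 5) != 5)
        return -1;

    char *p = mmap(0, 5, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 'H';               /* modify the file through the mapping */
    msync(p, 5, MS_SYNC);     /* force the dirty page out to the file */

    if (pread(fd, check, 5, 0) != 5)
        return -1;
    munmap(p, 5);
    close(fd);
    unlink("/tmp/shared_demo");
    return memcmp(check, "Hello", 5) == 0 ? 0 : -1;
}
```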




What if you do I/O both traditionally and
     via mmap to the same file simultaneously?








Integrated VM and Buffer Cache
(Figure: Process 1’s and Process 2’s virtual memories and the kernel’s buffer cache, all mapped to the same real memory, backed by the disk.)








  To integrate virtual memory (and hence I/O via mmap) with traditional I/O, we must make
certain that there is at most one copy of any page from a file in primary storage. This copy (in
a page frame) is mapped both into the buffer cache and into all processes that have mmaped
that page of the file.
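On a system with such an integrated cache (any modern Unix), that single copy makes traditional and mapped I/O coherent: a write() to a file is immediately visible through a shared mapping of it, because both go through the same page frame. A sketch, with a hypothetical file name:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if a write() to the file is immediately visible
   through a shared mapping of it, -1 otherwise. */
int coherence_demo(void) {
    int fd = open("/tmp/coherence_demo", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || write(fd, "aaaa", 4) != 4)
        return -1;

    char *p = mmap(0, 4, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Modify the file via traditional I/O ... */
    if (pwrite(fd, "b", 1, 0) != 1)
        return -1;

    /* ... and the mapping sees it at once: same page frame. */
    int ok = (p[0] == 'b');
    munmap(p, 4);
    close(fd);
    unlink("/tmp/coherence_demo");
    return ok ? 0 : -1;
}
```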




