Chapter 2
Operating System: File System Management
PREPARED BY: NUTAN S BORSE
File-System Structure
► File structure
► Logical storage unit
► Collection of related information
► File system resides on secondary storage (disks)
► Provided user interface to storage, mapping logical to physical
► Provides efficient and convenient access to disk by allowing data to be stored, located retrieved easily
► Disk provides in-place rewrite and random access
► I/O transfers performed in blocks of sectors (usually 512 bytes)
► File control block – storage structure consisting of information about a file
► Device driver controls the physical device
► File system organized into layers
Layered File System
File System Layers
► Device drivers manage I/O devices at the I/O control layer
► Given commands like “read drive1, cylinder 72, track 2, sector 10, into memory location 1060” outputs
low-level hardware specific commands to hardware controller
► Basic file system given command like “retrieve block 123” translates to device driver
► Also manages memory buffers and caches (allocation, freeing, replacement)
► Buffers hold data in transit
► Caches hold frequently used data
► File organization module understands files, logical address, and physical blocks
● Translates logical block # to physical block #
● Manages free space, disk allocation
File System Layers (Cont.)
► Logical file system manages metadata information
► Translates file name into file number, file handle, location by maintaining file control blocks (inodes in
UNIX)
► Directory management
► Protection
► Layering useful for reducing complexity and redundancy, but adds overhead and can decrease
performanceTranslates file name into file number, file handle, location by maintaining file control
blocks (inodes in UNIX)
► Logical layers can be implemented by any coding method according to OS designer
File System
► File Concept: A file is a collection of similar records. The file is treated as a single
entity by users and applications and may be referred by name. Files have unique
file names and may be created and deleted. Restrictions on access control usually
apply at the file level.
► A file is a container for a collection of information. The file manager provides a
protection mechanism to allow users administrator how processes executing on
behalf of different users can access the information in a file. File protection is a
fundamental property of files because it allows different people to store their
information on a shared computer.
► File represents programs and data. Data files may be numeric, alphabetic, binary or
alpha numeric. Files may be free form, such as text files. In general, file is
sequence of bits, bytes, lines or records.
A common technique for implementing file types is to include the type as part of the file name. the
name is split into two parts a name and an extension. Various file types are shown in the following
table.
File Type Usual extension Function
Executable exe, com, bin read to run machine language program
Object obj, O compiled, machine language, not linked
source Code c, cc, java source code in various languages
Batch bat, sh commands to the command interpreter
Text txt, doc textual data, documents
word processor wp, tex, rtf, doc various word-processor formats
Library lib, dll, mpeg, mov libraries of routines for programmers
print or view arc, zip, tar ASCII or binary file in a format for printing
or viewing
Archieve arc, zip, tar related files grouped into one file,
sometimes
compressed for archiving or storage
Multimedia mpeg, mov, rm binary file containing audio or
A/V
information
File Attributes:
► A file has certain attributes which vary from one operating system to another.
Name: Every file has a name by which it is referred.
Identifier: It is unique number that identifies the file within the file system.
Type: This information is needed for those systems that support different types of files.
Location: It is a pointer to a device & to the location of the file on that device
Size: It is the current size of a file in bytes, words or blocks.
Protection: It is the access control information that determines who can read, write &
execute a file.
Time, date & user identification: It gives information about time of creation or last
modification & last use.
File Operations:
► Any file system provides not only a means to store data organized as files, but a collection
of functions that can be performed on files. Typical operations include the following:
Creating files: Two steps are necessary to create a file. First, space must be found for the file
in the file system. Secondly, an entry must be made in the directory for the new file.
Reading a file: Data & read from the file at the current position. The system must keep a read
pointer to know the location in the file from where the next read is to take place. Once the read
has been taken place, the read pointer is updated.
Writing a file: Data are written to the file at the current position. The system must keep a
write pointer to know the location in the file where the next write is to take place. The write
pointer must be updated whenever a write occurs.
File Operations(cont.):
Repositioning within a file: The directory is searched for the appropriate entry & the current
file position is set to a given value. After repositioning data can be read from or written into
that position. Repositioning within a file does not need to involve any actual I/O. this file
operation is also known as a file seek.
Deleting a file: To delete a file, we search the directory for the required file. After deletion,
the space is released so that it can be reused by other files.
Truncating a file: the user may want to erase the contents of a file but keep its attribute.
Rather than forcing the user to delete the file and then recreate it, this function allows all
attributes to remain unchanged- except for file length- but lets the file be reset to length zero
and its file space released
File Access Methods:
► Files store information.When it is used, this information must be accessed and read into
computer memory.
► The information in the file can be accessed in several ways.
Sequential Access:
❖ It is the simplest access method. Information in the file is processed in order i.e. one
record after another.
❖ A process can read all the data in a file in order starting from beginning but can’t skip &
read arbitrarily from any location. Sequential files can be rewound. It is convenient when
storage medium was magnetic tape rather than disk.
❖ Eg : A file consisting of 100 records, the current position of read/write head is 45th
record, suppose we want to read the 75th record then, it access sequentially from 45, 46,
47 …….. 74, 75. So the read/write head traverse all the records between 45 to 75.
File Access Methods(cont.):
► Eg : A file consisting of 100 records, the current position of read/write head is
45th record, suppose we want to read the 75th record then, it access sequentially
from 45, 46, 47 …….. 74, 75. So the read/write head traverse all the records
between 45 to 75.
Beginning Current Position Target Record End
0 45 75 100
Fig 1: sequential access file
File Access Methods(cont.):
Direct Access:
► A file is made up of fixed length-logical records that allow programs to read &
write records rapidly in no particular order.
► This method can be used when disk are used for storing files.
► This method is used in many applications e.g. database systems.
► If an airline customer wants to reserve a seat on a particular flight, the reservation
program must be able to access the record for that flight directly without reading
the records before it. In a direct access file, there is no restriction in the order of
reading or writing.
► For example, we can read block 14, then read block 50 & then write block 7 etc.
Direct access files are very useful for immediate access to large amount of
information.
File Access Methods(cont.):
Indexed Access:
► In this method an index is created which contains a key field and pointers to the
various blocks. To find an entry in the file for a key value, we first search the
index and then use the pointer to directly access a file and find the desired entry.
► With large files, the index file itself may become too large to be keep in memory.
One solution is to create an index for the index file. The primary index file would
contain pointers to secondary index files, which would point to the actual data
items. Figure 2 shows a situation as implemented by VMS (Virtual Memory
Storage) index and relative files.
File Access Methods(cont.):
Fig 2: Example of index and relative files
File Directories:
► Sometimes the file system consisting of millions of files, at that situation it is very
hard to manage the files.
► To manage these files grouped these files and load one group into one partition.
Each partition is called a directory.
► A directory structure provides a mechanism for organizing many files in the file
system.
Operation on the directories:
The various operations that can be performed on directory are:
► Searching for a file: we need to search a directory structure to find the entry for a particular file.
Since files have symbolic names in a user-readable form and similar names may indicate a
relationship between files. We may want to able to find all files whose names match a particular
pattern.
► Create a file: new files are created and added to the directory.
► Delete a file: when we do not require to use a particular file, it is removed from the directory.
► List a directory: when we want to list all the files in a particular directory, and the contents of
directory entry for each file in the list.
► Rename a file: Whenever we need to change the name of the file, we can change the name.
► Traverse the file system: in a directory structure, we may wish to access every directory, and every
file within a directory structure.
Directory Structure:
The most common schemes for defining the structure of the directory are:
1) Single-level Directory: it is the simplest directory structure. All files are contained in the same
directory which is easy to support and understand (figure 3)
1) Single-level Directory(cont.)
► Advantages:
► Since it is a single directory, so its implementation is very easy.
► If the files are smaller in size, searching will become faster.
► The operations like file creation, searching, deletion, updating is very easy in such a directory structure.
► Disadvantages:
► There may chance of name collision because two files can not have the same name.
► Searching will become time taking if the directory is large.
► The same type of files cannot be grouped together.
2) Two-level Directory
The disadvantage of single level directory is the confusion of files names between different users. The
solution for this problem is to create a directory for each user as shown in figure 4.
► In the two-level directory structure, each user has his own user files directory (UFD). Each user has
similar structure but lists only the files of a single user.
► When user login, the system’s master file directory (MFD) is searched. The master file directory is
indexed by user name or account number and each entry points to the user directory for that user.
► When users refer to a particular file, only their own user file directory is searched.
Thus different users may have files with the same name, as long as all file names within each user file
directory are unique.
2) Two-level Directory(cont.)
2) Two-level Directory(cont.)
Advantages:
► We can give full path like /User-name/directory-name.
► Different users can have same directory as well as file name.
► Searching of files become more easy due to path name and user-grouping.
Disadvantages:
► A user is not allowed to share files with other users.
► Searching will become time taking if the directory is large.
► Two files of the same type cannot be grouped together in the same user.
3)Tree- Structured Directory
► The tree-structure allows user to create their own sub- directories and organize
their files accordingly.
► The tree has a root directory. Every file in the systems has a unique path name. A
path is the path from the root through all the sub- directories to a specified file.
► A directory contains a set of files and or sub-directories.
3)Tree- Structured Directory
3)Tree- Structured Directory(cont.)
Advantages:
User can access other user’s files by specifying the path name.
User can create his own sub-directories.
Searching becomes very easy; we can use both absolute path as
well as relative.
Disadvantages:
Every file does not fit into the hierarchical model; files may
be saved into multiple directories.
We cannot share files.
It is inefficient, because accessing a file may go under
multiple directories.
4)Acyclic Graph Directories:
A shared directory or file will exist in the file system in two or more places
at once.
A shared directory or file is not the same as two copies of the file. With a
shared file there is only one actual field and any changes made by one
person would be immediately visible to the other.
An acyclic graph allows directories to have shared sub-directories and
files.
The same file or sub-directory may be in two or more process exists in th file
system at a time.
An acyclic graph directory structure is more flexible than a simple structure but it is
also more complex.
4)Acyclic Graph Directories:
4)Acyclic Graph Directories:
Advantages:
We can share files.
Searching is easy due to different-different paths.
Allow multiple directories to contain same file.
Disadvantages:
We share the files via linking; in case of deleting it may create the problem.
Need to be cautions of dangling pointers when files are deleted.
Allocation Methods - Contiguous
► An allocation method refers to how disk blocks are allocated for files:
► Contiguous allocation – each file occupies set of contiguous blocks
► Best performance in most cases
► Simple – only starting location (block #) and length (number of blocks) are required
► Problems include finding space for file, knowing file size, external fragmentation, need
for compaction off-line (downtime) or on-line
Contiguous Allocation
► Mapping from logical to
physical
Allocation Methods - Linked
► Linked allocation – each file a linked list of blocks
► File ends at nil pointer
► No external fragmentation
► Each block contains pointer to next block
► No compaction, external fragmentation
► Free space management system called when new block needed
► Improve efficiency by clustering blocks into groups but increases internal fragmentation
► Reliability can be a problem
► Locating a block can take many I/Os and disk seeks
Allocation Methods – Linked (Cont.)
FAT (File Allocation Table) variation
► Beginning of volume has table, indexed by block number
► Much like a linked list, but faster on disk and cacheable
► New block allocation simple
Linked Allocation
► Each file is a linked list of disk blocks: blocks may be scattered anywhere on the
disk
block = pointer
Allocation Methods - Indexed
► Indexed allocation
► Each file has its own index block(s) of pointers to its data blocks
► Logical view
Example of Indexed Allocation
Indexed Allocation (Cont.)
► Need index table
► Random access
► Dynamic access without external fragmentation, but have overhead
of index block
Performance
► Best method depends on file access type
► Contiguous great for sequential and random
► Linked good for sequential, not random
► Declare access type at creation -> select either contiguous or linked
► Indexed more complex
► Single block access could require 2 index block reads then data block read
► Clustering can help improve throughput, reduce CPU overhead
Free-Space Management
► File system maintains free-space list to track available blocks/clusters
► (Using term “block” for simplicity)
1) Bit vector or bit map (n blocks)
0 1 2 n-1
…
1 ⇒ block[i] free
bit[i] =
0 ⇒ block[i] occupied
Block number calculation
(number of bits per word) *
(number of 0-value words) +
offset of first 1 bit
CPUs have instructions to return offset within word of first “1” bit
Free-Space Management (Cont.)
► Bit map requires extra space
► Example:
block size = 4KB = 212 bytes
disk size = 240 bytes (1 terabyte)
n = 240/212 = 228 bits (or 32MB)
if clusters of 4 blocks -> 8MB of memory
► Easy to get contiguous files
Linked Free Space List on Disk
● Linked list (free list)
● Cannot get contiguous space easily
● No waste of space
● No need to traverse the entire list (if # free
blocks recorded)
Free-Space Management (Cont.)
► Grouping
► Modify linked list to store address of next n-1 free blocks in first free block, plus a pointer to next block that
contains free-block-pointers (like this one)
► Counting
► Because space is frequently contiguously used and freed, with contiguous-allocation allocation, extents, or
clustering
► Keep address of first free block and count of following free blocks
► Free space list then has entries containing addresses and counts
Free-Space Management (Cont.)
► Space Maps
► Used in ZFS
► Consider meta-data I/O on very large file systems
► Full data structures like bit maps couldn’t fit in memory -> thousands of I/Os
► Divides device space into metaslab units and manages metaslabs
► Given volume can contain hundreds of metaslabs
► Each metaslab has associated space map
► Uses counting algorithm
► But records to log file rather than file system
► Log of all block activity, in time order, in counting format
► Metaslab activity -> load space map into memory in balanced-tree structure, indexed by offset
► Replay log into that structure
► Combine contiguous free blocks into single entry
Efficiency and Performance
► Efficiency dependent on:
► Disk allocation and directory algorithms
► Types of data kept in file’s directory entry
► Pre-allocation or as-needed allocation of metadata structures
► Fixed-size or varying-size data structures