FILE-SYSTEM
MODULE 5
INTRODUCTION TO FILE CONCEPTS
• File is a contiguous logical address space
• Basic Types:
• data
• numeric
• character
• binary
• Program
• Other types such as text file, source file, executable file, etc
FILE ATTRIBUTES
• Name – only information kept in human-readable form
• Identifier – unique tag (number) identifies file within file system
• Type – needed for systems that support different types
• Location – pointer to file location on device
• Size – current file size
• Protection – controls who can do reading, writing, executing
• Time, Date, and User Identification – data for protection, security, and
usage monitoring
• Information about files are kept in the directory structure, which is
maintained on the disk.
• File control Block(FCB) stores all the information about a file in a
directory structure.
FILE OPERATIONS
• CREATE: make a new file • APPEND: like write, but only at
the end of the file
• DELETE: remove an existing file
• SEEK: move the “current”
• OPEN: prepare a file to be
pointer elsewhere in the file
accessed
• GET ATTRIBUTES: retrieve
• CLOSE: indicate that a file is no
attribute information
longer being accessed
• SET ATTRIBUTES: modify
• READ: get data from a file
attribute information
• WRITE: put data to a file
• RENAME: change a file’s name
4
FILE TYPES – NAME, EXTENSION
FILE-SYSTEM STRUCTURE
• File structure
• logical storage unit
• collection of related information
• File system resides on secondary storage (disks)
• provided user interface to storage, mapping logical to physical
• provides efficient and convenient access to disk by allowing data to be
stored, located retrieved easily
• disk provides in-place rewrite and random access
• i/o transfers performed in blocks of sectors (usually 512 bytes)
• File control block – storage structure consisting of information about a file
• Device driver controls the physical device
FILE SHARING
• SHARING OF FILES ON MULTI-USER SYSTEMS IS DESIRABLE
• SHARING MAY BE DONE THROUGH A PROTECTION SCHEME
• ON DISTRIBUTED SYSTEMS, FILES MAY BE SHARED ACROSS A
NETWORK
• NETWORK FILE SYSTEM (NFS) IS A COMMON DISTRIBUTED FILE-
SHARING METHOD
• IF MULTI-USER SYSTEM
• USER IDS IDENTIFY USERS, ALLOWING PERMISSIONS AND
PROTECTIONS TO BE PER-USER
GROUP IDS ALLOW USERS TO BE IN GROUPS, PERMITTING GROUP
ACCESS RIGHTS
• OWNER OF A FILE / DIRECTORY
• GROUP OF A FILE / DIRECTORY
PROTECTION
• FILE OWNER/CREATOR SHOULD BE ABLE TO CONTROL:
• WHAT CAN BE DONE
• BY WHOM
• TYPES OF ACCESS
• READ
• WRITE
• EXECUTE
• APPEND
• DELETE
• LIST
FILE ACCESS METHODS
• The information stored in the file can be
accessed using
1. Sequential Access
2. Direct Access
3. Index Access
1. SEQUENTIAL FILE ACCESS
• This is the simplest access method where file is processed in order, one
record after the other.
• A read operation read next reads the next record of the file and
advances the file pointer.
• The write operation write next appends the record to the end of the file.
• These files can only be reset to the beginning of the file.
• It is used in batch processing applications.
read next
write next
reset
no read after last write
(rewrite)
SEQUENTIAL ACCESS
• EX: IF THE CURRENT POSITION OF THE READ WRITE HEAD IS 45 THEN
TO READ THE 75TH RECORD, THE WE NEED TO REACH IT SEQUENTIALLY
ONLY. 45,46,47…..73,74,75
DIRECT ACCESS
• Direct Access method is also called as relative access.
• File is viewed as a numbered sequence of blocks or records.
• Records are read and written randomly.
• The relative block is numbered from 0 and the next block is 1 and so
on.
• Given a logical record of length L, inorder to access record N, it is
searched at the location L*N.
INDEXED METHOD
• Data is organized into records that are stored in data files.
• An Index is constructed for a file, the index contains pointers to the
various records.
• Retrieving records is quick, need not search entire data set.
OPERATIONS PERFORMED ON DIRECTORY
• SEARCH FOR A FILE
• CREATE A FILE
• DELETE A FILE
• LIST A DIRECTORY
• RENAME A FILE
• TRAVERSE THE FILE SYSTEM
ORGANIZE THE DIRECTORY
• EFFICIENCY – LOCATING A FILE QUICKLY
• NAMING – CONVENIENT TO USERS
• TWO USERS CAN HAVE SAME NAME FOR DIFFERENT FILES
• THE SAME FILE CAN HAVE SEVERAL DIFFERENT NAMES
• GROUPING – LOGICAL GROUPING OF FILES BY PROPERTIES,
(E.G., ALL JAVA PROGRAMS, ALL GAMES, …)
SINGLE-LEVEL DIRECTORY
• A SINGLE DIRECTORY FOR ALL USERS
• All files are stored in the same directory.
• When the system has more users then all their files cannot be stored
at the same level.
• When no. of files increase, then it is difficult to retrieve and maintain
the files.
TWO-LEVEL DIRECTORY
• In a two level directory,
Master file directory (MFD)
User File directory (UFD)
• MFD is indexed by the user name or account number.
• To create or delete a user file, the operating system searches within
the UFD.
• It is a tree with height 2, MFD is the root, UFD is the next level files
are stored in the next level.
TREE-STRUCTURED DIRECTORIES
• A two level directory structure can be extended to a tree structure of
any arbitrary height.
• Users can create subdirectories to store the files.
• Every file has a unique path name from the root to a specified file.
Eg. C:/Documents/MCA/ISEM/OS/notes.pdf
• An absolute path begins with the root directory and follows the path
down to the file.
• A relative path defines the path of the specified from the current
directory.
• Current directory specifies the current location.
TREE-STRUCTURED DIRECTORIES
ACCESS LISTS AND GROUPS
• MODE OF ACCESS: READ, WRITE, EXECUTE
• THREE CLASSES OF USERS: RWX
A) OWNER ACCESS 7 111
RWX
B) GROUP ACCESS 6 110
RWX
C) PUBLIC ACCESS 1 001
EG:
• ASK MANAGER TO CREATE A GROUP (UNIQUE NAME), SAY G,
AND ADD SOME USERS TO THE GROUP.
• FOR A PARTICULAR FILE (SAY GAME) OR SUBDIRECTORY,
DEFINE AN APPROPRIATE ACCESS.
owner group public
• ATTACH A GROUP TO A FILE
chmod 761 game
chgrp G game
DIRECTORY IMPLEMENTATION
• Linear list of file names with pointer to the data blocks
• Simple to program
• Time-consuming to execute
• linear search time
• could keep ordered alphabetically via linked list
• Hash table – linear list with hash data structure
• decreases directory search time
• collisions – situations where two file names hash to the same
location
• only good if entries are fixed size, or use chained-overflow method
ALLOCATION METHODS
CONTIGUOUS ALLOCATION
• An Allocation method refers to how disk blocks are
allocated for files:
1. Contiguous Allocation – each file occupies set of
contiguous blocks
• best performance in most cases
• simple – only starting location (block #) and length
(number of blocks) are required
• problems include finding space for file, knowing file size,
external fragmentation, need for compaction.
CONTIGUOUS ALLOCATION
• MAPPING FROM LOGICAL TO PHYSICAL MEMORY
ALLOCATION METHODS
LINKED ALLOCATION
2. linked allocation – each file a linked list of blocks which can be
scatted anywhere.
• A write operation searches for a free block, performs a write
operation and links to the end of the list. A file ends at NIL pointer.
• Directory contain a pointer to the first and the last blocks of the file.
Each block contains a pointer to the next block, etc
• There is no compaction, external fragmentation
• The free space management system is called when new block
needed
• It improve efficiency by clustering blocks into groups but increases
internal fragmentation.
• Locating a block can take many i/os and disk seeks.
LINKED ALLOCATION
Mapping
LINKED ALLOCATION
• FAT (File Allocation Table) – VARIATION OF LINKED ALLOCATION
• Beginning of volume(C: D: , ) has table, indexed by block
number
• Unused blocks are indicated by a 0, to allocate a new block
to a file find 0 valued table entry and replace the previous
end of file with a new block.
block = pointer
• Faster on disk and cacheable
• New Block allocation is simple
• Each file is a linked list of disk blocks: blocks may be
scattered anywhere on the disk
FILE-ALLOCATION TABLE
ALLOCATION METHODS - INDEXED
3. INDEXED ALLOCATION
• Each file has its own INDEX BLOCK(S) of pointers to its
data blocks
• Indexed allocation solves the problem of scattered blocks by
bringing all the pointers together in one location called index
block.
• ith entry in the index block points to the ith block of the file.
• When a new file is created, all pointers in the index block are
set to nil.
• Size of the index block is determined by Linked scheme,
Multilevel Index scheme or combined scheme.
EXAMPLE OF INDEXED ALLOCATION
MULTILEVEL INDEX
COMBINED SCHEME
4K bytes per block, 32-bit addresses
More index blocks than can be addressed with 32-bit file pointer
PERFORMANCE
• Best method depends on file access type
• Contiguous great for sequential and random
• Linked good for sequential, not random
• Declare access type at creation -> select either contiguous or
linked
• Indexed more complex
• Single block access could require 2 index block reads then data block
read
• Clustering can help improve throughput, reduce cpu overhead
FREE-SPACE MANAGEMENT
• FILE SYSTEM MAINTAINS FREE-SPACE LIST TO TRACK
AVAILABLE BLOCKS/CLUSTERS
• (USING TERM “BLOCK” FOR SIMPLICITY)
• BIT VECTOR OR BIT MAP (N BLOCKS)
0 1 2 n-1
1 block[i] free
bit[i] =
0 block[i] occupied
Block number calculation
(number of bits per word) *(number of 0-value words) + offset of first 1 bit
CPUs have instructions to return offset within word of first “1” bit
FREE-SPACE MANAGEMENT (CONT.)
• BIT MAP REQUIRES EXTRA SPACE
• EXAMPLE:
BLOCK SIZE = 4KB = 212 BYTES
DISK SIZE = 240 BYTES (1 TERABYTE)
N = 240/212 = 228 BITS (OR 256 MB)
IF CLUSTERS OF 4 BLOCKS -> 64MB OF MEMORY
• EASY TO GET CONTIGUOUS FILES
• LINKED LIST (FREE LIST)
• CANNOT GET CONTIGUOUS SPACE EASILY
• NO WASTE OF SPACE
• NO NEED TO TRAVERSE THE ENTIRE LIST (IF # FREE BLOCKS RECORDED)
LINKED FREE SPACE LIST ON DISK
FREE-SPACE MANAGEMENT (CONT.)
• Grouping
• Modify linked list to store address of next n-1 free blocks
in first free block, plus a pointer to next block that contains
free-block-pointers (like this one)
• Counting
• Because space is frequently contiguously used and freed,
with contiguous-allocation allocation, extents, or clustering
• Keep address of first free block and count of
following free blocks
• Free space list then has entries containing addresses
and counts
PROTECTION
• FILE OWNER/CREATOR SHOULD BE ABLE TO CONTROL:
• WHAT CAN BE DONE
• AND BY WHOM
• TYPES OF ACCESS:
• READ
• WRITE
• EXECUTE
• APPEND
• DELETE
• LIST
A. Frank - P. Weisberg
RECOVERY
• CONSISTENCY CHECKING – COMPARES DATA IN DIRECTORY
STRUCTURE WITH DATA BLOCKS ON DISK, AND TRIES TO FIX
INCONSISTENCIES
• CAN BE SLOW AND SOMETIMES FAILS
• USE SYSTEM PROGRAMS TO BACK UP DATA FROM DISK TO
ANOTHER STORAGE DEVICE (MAGNETIC TAPE, OTHER
MAGNETIC DISK, OPTICAL)
• RECOVER LOST FILE OR DISK BY RESTORING DATA FROM
BACKUP
DISK MANAGEMENT
DISK STRUCTURE
DISK SCHEDULING
OBJECTIVES
• Seek time: it is the time taken for the disk arm to move
the heads to the cylinder containing the desired sector.
• Rotational latency: it is the time spent waiting for the
disk to rotate the desired sector to the disk head.
• Disk bandwidth =
total no. Of bytes transferred .
Total time b/w first request and completion
• Maximize the throughput
• Minimize the response time
FCFS
FIRST COME FIRST SERVE
The DISK starts from 0 and ends at 199.
98
53 183
37 122
14
124
65
67
EFFICIENCY OF THE SCHEDULING
Total head movement =
(53 to 98) + (98 to 183) + (183 to 37) + (37 to 122) +
(122 to 14) + (14 to 124) + (124 to 65) + (65 to 67)
= 45+85+146+85+108+110+59+2
=640 cylinders
Average head movement=
= Total / no. of requests
= 640/8
= 80 head movements per request.
SSTF
SHORTEST SEEK TIME FIRST
Total head
movement =
12+2+30+23+
53 84+24+2+59
65 =236
67 Average head
37
14 movement=
98 =29.5
122
124
183
SCAN
Total head
movement =
53
16+23+14+65
+2+ 31 + 24+
14 37 +2+59
=236
65
Average
67 head
movement=
98 122
=29.5
124 183
Scans end to end
C-SCAN
Total head
movement =
53 146+37
65 =183
67
98
122
Average head
124 183 movement=
=22.875
14
37
Scans end to end but circular
LOOK
Total head
53
movement
65 = 130+169
67 =305
98
122 Average
head
124 movement=
37 183 =38.125
14
Scans farthest end to farthest end
C-LOOK
Total head
movement =
53
65
67 =153
98 122 Average head
movement=
124 =19.125
14 183
37
Scans farthest end to farthest end but circular