The UNIX File System
1 Ordinary file—The most common file type, containing a stream of data. This type of file can be
further divided into two types:
• Text file
• Binary file
A text file contains only printable characters, along with whitespace such as tabs and newlines. All C
and Java program sources, and shell and Perl scripts, are text files. A text file contains lines where
each line is terminated with the linefeed (LF) character, also known as newline. When you press
[Enter] while inserting text in a text editor like vi or pico, the LF character is appended to the end of
the line. You won’t see this character normally, but the cat -e command makes it visible (by showing
a $ at the end of each line). Also, the od command (for instance, od -c) makes all characters in a file
visible.
A binary file, on the other hand, contains both printable and nonprintable characters that cover the
entire range of byte values (0 to 255), well beyond the 7-bit ASCII set (0 to 127). Most UNIX
commands are binary files, and the object code and
executables that you produce by compiling C programs are also binary files. Picture, sound, and video
files are binary files as well (with few exceptions). Displaying such files with a simple cat command
produces unreadable output and may even disturb your terminal’s settings.
2 Directory file—A folder containing the names of other files and subdirectories as well as a number
associated with each name.
A directory contains no data as such, but maintains some details of the files and subdirectories that it
contains. The UNIX file system is organized with a number of directories and subdirectories. You can
also create them when you need to group a set of files pertaining to a specific application. A
directory file contains an entry for every file and subdirectory that it houses. If you have 20 files in a
directory, there will be 20 entries in the directory (plus the special entries . and .., which every
directory has). Each entry has two components:
• The filename
• A unique identification number for the file or directory (called the inode number)
If a directory bar contains an entry for a file foo, we commonly (and loosely) say that the directory
bar contains the file foo. Though we’ll often be using the phrase “contains the file” rather than
“contains the filename,” we must not interpret the statement literally. A directory contains the
filename and not the file’s contents. We can’t write to a directory file directly, but we can perform
actions that make the kernel write to it. For instance, when we create or remove a file,
the kernel automatically updates its corresponding directory by adding or removing the entry (inode
number and filename) associated with the file.
3 Device file—This represents a device or peripheral. To read or write a device, you have to perform
these operations on its associated file.
We’ll also be printing files, installing software from DVD-ROMs, or backing up files to tape. All of
these activities are performed by reading or writing the file representing the device. For instance,
when we restore files from tape, we read the file associated with the tape drive.
A device file is indeed special; it’s not really a stream of characters. In fact, it doesn’t contain
anything at all. Every file has some attributes that are not stored in the file but are stored elsewhere
on disk. The attributes of a device file entirely govern the operation of the device. The kernel
identifies a device from its attributes and then uses them to operate the device.
There are other file types as well, such as symbolic links, named pipes (FIFOs), and sockets. We need
to distinguish between file types because the
significance of a file’s attributes often depends on its type. Execute permission for an ordinary file
means something quite different from that for a directory. We can’t directly put something into a
directory file, and a device file isn’t really a stream of characters. Some commands work with all file
types, but some don’t.
Naming of files
On most UNIX systems today, a filename can consist of up to 255 characters. Files may or may not
have extensions, and a filename can contain practically any ASCII character except the / and the null
character (ASCII value 0). As a general rule, you should avoid using unprintable characters in
filenames. Further, since the shell treats characters like $, `, ?, *, and & (among others) specially,
it is recommended that only the following characters be used in filenames:
• Alphabetic characters and numerals
• The period (.), hyphen (-), and underscore (_)
UNIX imposes no restrictions on the extension, if any, that a file should have. A shell script doesn’t
need to have the .sh extension, even though it helps in identification. But the C compiler expects C
program files to have the .c extension, and the Java compiler expects .java. DOS/Windows users must
also keep these two points in mind:
• A filename can comprise multiple embedded dots; a.b.c.d.e is a perfectly valid filename. Moreover,
a filename can also begin with a dot or end with one.
• UNIX is sensitive to case; chap01, Chap01 and CHAP01 are three different filenames, and it’s
possible for them to coexist in the same directory.
File System Hierarchy
All files in UNIX are organized in a hierarchical (inverted tree) structure. This hierarchy has a top
called root, which serves as the reference point for all files. root is actually a directory that is
represented by a / (forward slash). Don’t mix up the root directory with the user-id root, which is used
by the system administrator to log in. In this text, we’ll be using both the name “root” and the
symbol / to represent the root directory. The root directory (/) has a number of subdirectories under
it. These subdirectories have more subdirectories and other files under them. For instance, home is a
directory under root, and romeo is yet another directory under home. login.sql is presumably an
ordinary file under romeo. Every hierarchy contains parent-child relationships, and we can
conveniently say that romeo is the parent of login.sql, home is the parent of romeo, and / (root) is
the parent of home. We can specify the relationship login.sql has with root by a pathname:
/home/romeo/login.sql. The first / represents the root directory and the remaining /s act as
delimiters of the pathname components. This pathname is appropriately referred to as an absolute
pathname because by using root as the ultimate reference point we can specify a file’s location in an
absolute manner.
To begin with, we’ll confine ourselves to the directories described below. It helps, from the
administrative point of view at least, to view the entire file system as comprising two groups. The
first group contains the files that are made available during system installation:
• /bin and /usr/bin These are the directories where all the commonly used UNIX commands
(binaries, hence the name bin) are found. Note that the PATH variable normally includes these
directories in its list.
• /sbin and /usr/sbin If there’s a command that we can’t execute but the system administrator can,
then it would probably be in one of these directories. You won’t be able to execute most commands
in these directories. Only the system administrator’s PATH shows these directories.
• /etc This directory contains the configuration files of the system. We can change a very important
aspect of system functioning by editing a text file in this directory. Our login name is stored in
/etc/passwd, and our password (in encrypted form) is stored in /etc/shadow on most modern systems.
• /dev This directory contains all device files. These files don’t occupy space on disk. There could be
more subdirectories like pts, dsk, and rdsk in this directory.
• /lib and /usr/lib These directories contain all library files in binary form. We need to link our C
programs with files in these directories.
• /usr/include This directory contains the standard header files used by C programs. The statement
#include <stdio.h> used in most C programs refers to the file stdio.h in this directory.
• /usr/share/man This is where the man pages are stored. There are separate subdirectories here
(like man1, man2, etc.) that contain the pages for each section. For instance, the man page of ls can
be found in /usr/share/man/man1, where the 1 in man1 represents Section 1 of the UNIX manual.
These subdirectories may have different names on your system (like sman1, sman2, etc., in Solaris).
Users also work with their own files; they write programs, send and receive mail, and create
temporary files. These files reside in the second group of directories:
• /tmp The directory where users are allowed to create temporary files. These files are wiped away
regularly by the system.
• /var The variable part of the file system. It holds, among other things, our print jobs and our
outgoing and incoming mail.
• /home On many systems, users’ home directories are located here. romeo would have his home directory in
/home/romeo. However, our system may use a different location for home directories.