Linux System Programming
Part 3 - Filesystem and Files
IBA Bulgaria
2018
Files in Linux
● “Everything is a file.”
● inode - data structure that describes a filesystem object.
● Files are always opened from user space by a name.
● A name and inode pair is called a link.
● Regular files - bytes of data.
● Directories - mapping between filenames and inodes (links).
● Hard links - multiple links map different names to the same inode.
● Symbolic links - special files whose contents are the pathname (absolute or relative) of the linked-to file.
● Special files - block device files, character device files, named pipes, Unix domain sockets.
Filesystems and namespaces
● Linux provides a global and unified namespace of files and directories (with a root ‘/’).
● A filesystem is a collection of files and directories in a formal and valid hierarchy.
● Filesystems may be individually added (mounted) to and removed (unmounted) from the global
namespace of files and directories.
● Examples of special filesystems: ‘/dev’ (device files) and ‘/proc’ (process and kernel information).
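The filesystems currently mounted into the global namespace can be listed by reading /proc/self/mounts, a file provided by the special /proc filesystem. A small sketch, assuming glibc's (or musl's) <mntent.h> interface; list_mounts() is a hypothetical name.

```c
#include <mntent.h>
#include <stdio.h>

/* Hypothetical helper: print each mounted filesystem (mount point and type)
 * to out and return how many were found, or -1 on error. */
int list_mounts(FILE *out)
{
    FILE *mounts = setmntent("/proc/self/mounts", "r");
    if (mounts == NULL) {
        perror("setmntent");
        return -1;
    }
    int count = 0;
    struct mntent *entry;
    while ((entry = getmntent(mounts)) != NULL) {
        /* mnt_dir: where it is attached in the tree; mnt_type: e.g. ext4, proc */
        fprintf(out, "%-30s %s\n", entry->mnt_dir, entry->mnt_type);
        count++;
    }
    endmntent(mounts);
    return count;
}
```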
Working with Files in C
● Before a file can be read from or written to, it must be opened.
● Each open instance of a file is given a unique file descriptor (fd).
● File descriptors are represented by the C int type.
● A single file can be opened more than once, by a different or even the same process.
● open() / fopen() - opens a file; open() returns a file descriptor, fopen() returns a FILE * stream.
● read() / fread() - reads data from a file.
● write() / fwrite() - writes data into a file.
● fflush() - flushes a stream.
● close() - unmaps the file descriptor from the associated file.
● lseek() / fseek() - sets the file position of a file descriptor (or stream) to a given value.
● fcntl() - manipulates a file descriptor, for example for locking.
● errno - number of the last error, set by a failing call.
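The points about file descriptors and errno can be seen by opening the same file twice: each successful open() yields a distinct descriptor, and a failing open() returns -1 with errno set. A minimal sketch; open_twice() and report() are hypothetical names. errno is saved before printing because stdio calls are allowed to change it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Print a failed call's errno without losing it for the caller. */
static void report(const char *what, const char *path)
{
    int saved = errno;                 /* fprintf() may overwrite errno */
    fprintf(stderr, "%s %s: %s (errno %d)\n", what, path, strerror(saved), saved);
    errno = saved;
}

/* Hypothetical helper: open path read-only twice. Each open() creates a new
 * open instance with its own descriptor and its own file offset.
 * Returns 0 and stores both descriptors, or -1 with errno set. */
int open_twice(const char *path, int *fd_a, int *fd_b)
{
    *fd_a = open(path, O_RDONLY);
    if (*fd_a == -1) {
        report("open", path);
        return -1;
    }
    *fd_b = open(path, O_RDONLY);
    if (*fd_b == -1) {
        report("open", path);
        int saved = errno;
        close(*fd_a);                  /* do not leak the first descriptor */
        errno = saved;
        return -1;
    }
    return 0;
}
```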
Buffered vs unbuffered streams
● A stream is a representation of a flow of data from one side to another, e.g. from disk to memory and
from memory to disk.
● A file is a representation of data stored on disk. Files use streams to store and load data.
● Buffer is (often) used to hold stream data temporarily.
● Characters written to or read from an unbuffered stream are transmitted individually to or from
the file as soon as possible.
● Characters written to or read from a fully buffered stream are transmitted to or from the file in
blocks of arbitrary size.
● Characters written to a line buffered stream are transmitted to the file in blocks when a newline
character is encountered.
Buffered vs unbuffered streams
Low-level file routines (unbuffered):
● open(), read(), write(), lseek(), etc.
● Declared in <unistd.h> (open() in <fcntl.h>).
● Work with file descriptors of type int.
● Treat the input/output as binary data.
● Contents are available to the program immediately.
High-level file routines (buffered streams):
● fopen(), fread(), fwrite(), fseek(), etc.
● Part of the <stdio.h> library.
● Use the object type FILE.
● Treat the input/output as text streams.
● Characters are accumulated into a buffer, whose contents become available to the program when the buffer is full, the stream is closed, the program terminates, a newline is written (line-buffered streams) or the buffer is flushed.
● The read/write of the accumulated buffer can be forced with fflush().
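Basic file processing
● int open(const char *name, int flags); or int open(const char *name, int flags, mode_t mode);
● The flags parameter must contain one of O_RDONLY, O_WRONLY or O_RDWR, bitwise-OR-ed with any of the following bits:
O_APPEND - The file will be opened in append mode.
O_ASYNC - SIGIO is generated when the file becomes readable or writable.
O_CREAT - If the file doesn't exist, create it.
O_DIRECT - The file is opened for direct I/O.
O_DIRECTORY - If name is not a directory, open() will fail.
O_EXCL - If O_CREAT is set and the file exists, open() will fail.
O_LARGEFILE - Allows files larger than 2 GB to be opened.
O_NOCTTY - If name refers to a terminal, it does not become the process's controlling terminal (rarely used).
O_NOFOLLOW - If name is a symbolic link, open() will fail.
O_NONBLOCK - If possible, the file is opened in nonblocking mode.
O_SYNC - The file will be opened for synchronous I/O.
O_TRUNC - If the file exists and is opened for writing, truncate it to zero length.
● If flags has O_CREAT set, mode is a bit mask of:
S_IRWXU (owner: read, write, execute), S_IRUSR, S_IWUSR, S_IXUSR
S_IRWXG (group: read, write, execute), S_IRGRP, S_IWGRP, S_IXGRP
S_IRWXO (others: read, write, execute), S_IROTH, S_IWOTH, S_IXOTH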
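As an illustration of combining flags and mode, the sketch below creates a file only if it does not exist yet: O_CREAT | O_EXCL makes the second attempt fail. The mode S_IRUSR | S_IWUSR | S_IRGRP (rw-r-----) is applied only when the file is created and is further reduced by the process umask. create_new_file() is a hypothetical name.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>   /* close() and unlink() for the caller */

/* Hypothetical helper: create path for writing, failing if it already exists.
 * Returns the new file descriptor, or -1 on error. */
int create_new_file(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL,
                  S_IRUSR | S_IWUSR | S_IRGRP);
    if (fd == -1)
        perror(path);                  /* e.g. "File exists" on a second call */
    return fd;
}
```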
openclose.c
Open and close a file
Initialize the variables
Open the file as read-only and get the fd
If the fd is -1, print error message
Otherwise, print the value of fd (success)
Close the file
If the close failed, print error message
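The steps of openclose.c above could look like this. It is a sketch, not the original file: the logic is wrapped in a hypothetical function open_and_close() so it can be called with any path.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open path read-only, print the descriptor, then close it.
 * Returns 0 on success, -1 if open() or close() failed. */
int open_and_close(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {                                    /* -1 means failure */
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }
    printf("opened %s as fd %d\n", path, fd);          /* success */

    if (close(fd) == -1) {
        fprintf(stderr, "close fd %d: %s\n", fd, strerror(errno));
        return -1;
    }
    return 0;
}
```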
Reading file contents
● Each call reads up to len bytes into buf from the current file offset of
the file referenced by fd.
● On success, the number of bytes written into buf is returned (possibly fewer than len; 0 means end of file), and the file offset advances by that amount.
● On error, the call returns -1 and errno is set.
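The three possible return values of read() (a positive count, 0 at end of file, -1 on error) drive the usual reading loop. A minimal sketch; count_bytes() is a hypothetical name.

```c
#include <unistd.h>

/* Hypothetical helper: count the bytes readable from fd until end of file.
 * Returns the total, or -1 on error (errno set by read()). */
long count_bytes(int fd)
{
    char buf[512];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            return total;      /* end of file */
        if (n == -1)
            return -1;         /* error: errno describes it */
        total += n;            /* n may be less than sizeof buf */
    }
}
```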
readfile.c
Read and print a file
Initialize the variables
Open ‘./readfile.c’, if failed print error & exit
While read() returns a length other than 0 (end of file):
If length is -1 and errno is EINTR,
try to read again
If length is -1 and errno is not EINTR,
print error and finish reading
Otherwise print the buffer
Close the file and notify on error
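The readfile.c steps above could be sketched as follows, wrapped in a hypothetical function print_file(). Because the buffer is not NUL-terminated, exactly len bytes are printed with fwrite() rather than printf("%s").

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Print the contents of path to standard output using read().
 * Returns 0 on success, -1 on error. */
int print_file(const char *path)
{
    char buf[4096];
    int status = 0;

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }

    ssize_t len;
    while ((len = read(fd, buf, sizeof buf)) != 0) {   /* 0 means end of file */
        if (len == -1) {
            if (errno == EINTR)
                continue;                              /* interrupted: read again */
            fprintf(stderr, "read %s: %s\n", path, strerror(errno));
            status = -1;
            break;                                     /* real error: stop */
        }
        fwrite(buf, 1, (size_t)len, stdout);           /* print exactly len bytes */
    }

    if (close(fd) == -1) {
        fprintf(stderr, "close %s: %s\n", path, strerror(errno));
        status = -1;
    }
    return status;
}
```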
Create and write into a file
● Writes up to count bytes starting at buf to the current file position of the file
referenced by the file descriptor fd.
● On success, the number of bytes written is returned (it may be less than count), and the file position is
advanced by that amount.
● On error, the call returns -1 and errno is set.
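Since write() may write fewer bytes than requested, a common pattern is to loop until the whole buffer has been written. A minimal sketch; write_all() is a hypothetical name, and retrying on EINTR is an assumption about the desired behavior.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all count bytes from buf to fd, calling write()
 * again for any remainder. Returns count, or -1 with errno set. */
ssize_t write_all(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t left = count;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            return -1;
        }
        p += n;                    /* skip the part already written */
        left -= (size_t)n;
    }
    return (ssize_t)count;
}
```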
writefile.c
Write sentences into file
Initialize the variables
Create empty ‘./sentences.txt’, if failed
print error & exit
Do 100 times
Call getSentence() to get a new
text into the buffer and its length
Write the buffer into the file and get
the number of bytes written
If the number of written bytes is -1,
print error and stop writing.
Close the file and notify the error or success
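A sketch of the writefile.c steps above. getSentence() is not shown on the slide, so make_sentence() below is a hypothetical stand-in that simply numbers the sentences; write_sentences() is also a hypothetical name, and the file name is passed in as a parameter.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for getSentence(): put sentence i into buf
 * and return its length. */
static size_t make_sentence(int i, char *buf, size_t size)
{
    int n = snprintf(buf, size, "This is sentence number %d.\n", i);
    return (n < 0) ? 0 : (size_t)n;
}

/* Create (or empty) path and write count sentences into it.
 * Returns the total number of bytes written, or -1 on error. */
long write_sentences(const char *path, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC,
                  S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if (fd == -1) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }

    long total = 0;
    char buf[128];
    for (int i = 0; i < count; i++) {
        size_t len = make_sentence(i, buf, sizeof buf);
        ssize_t written = write(fd, buf, len);
        if (written == -1) {
            fprintf(stderr, "write %s: %s\n", path, strerror(errno));
            total = -1;
            break;                     /* stop writing on the first error */
        }
        total += written;
    }

    if (close(fd) == -1) {
        fprintf(stderr, "close %s: %s\n", path, strerror(errno));
        return -1;
    }
    if (total >= 0)
        printf("wrote %ld bytes to %s\n", total, path);
    return total;
}
```

Calling write_sentences("./sentences.txt", 100) mirrors the slide; with these example sentences the file ends up 2790 bytes long.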
Seeking in files and Sparse files
● The behavior of lseek() depends on the origin (whence) argument: SEEK_CUR, SEEK_END or
SEEK_SET.
● The call returns the new file position on success.
● On error, the call returns -1 and errno is set.
● Seeking past the end of a file and then writing into it creates a hole, which reads back as zeros.
● Files with holes are called sparse files.
● Holes do not occupy any physical disk space (on filesystems that support sparse files).
● du - estimates file space usage.
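The three origins can be compared on one file. Each lseek() call returns the resulting offset measured from the start of the file, so SEEK_END with offset 0 yields the file size. A minimal sketch; seek_demo() is a hypothetical name.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: show SEEK_END, SEEK_SET and SEEK_CUR on path.
 * Returns the file size, or -1 on error. */
off_t seek_demo(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror(path);
        return -1;
    }
    off_t size  = lseek(fd, 0, SEEK_END);  /* 0 bytes from the end: file size */
    off_t start = lseek(fd, 0, SEEK_SET);  /* absolute offset 0 */
    off_t here  = lseek(fd, 5, SEEK_CUR);  /* 5 bytes forward from current */
    printf("size=%lld start=%lld after SEEK_CUR=%lld\n",
           (long long)size, (long long)start, (long long)here);

    close(fd);
    return (size == -1 || start == -1 || here == -1) ? -1 : size;
}
```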
makesparse.c
Make a file with a hole
Initialize the variables
Create a file, its name is the first argument
passed to the program
If failed to open - notify error and exit
Write the name of the file into the file
Jump 16777216 bytes forward into the file
Write the name of the file into the file again
Close the file
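The makesparse.c steps above can be sketched as a hypothetical function make_sparse() that takes the file name as a parameter (the slide takes it from the first program argument). The apparent size includes the 16 MiB hole.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define HOLE_SIZE 16777216   /* 16 MiB, the jump used on the slide */

/* Hypothetical helper: write the file name, jump HOLE_SIZE bytes forward and
 * write the name again, leaving a hole. Returns 0 on success, -1 on error. */
int make_sparse(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return -1;
    }
    size_t len = strlen(path);
    int ok = write(fd, path, len) == (ssize_t)len
          && lseek(fd, HOLE_SIZE, SEEK_CUR) != -1     /* move past end of file */
          && write(fd, path, len) == (ssize_t)len;    /* this write creates the hole */
    if (!ok)
        fprintf(stderr, "make_sparse %s: %s\n", path, strerror(errno));
    if (close(fd) == -1)
        ok = 0;
    return ok ? 0 : -1;
}
```

Comparing ls -l (apparent size) with du (allocated space) on the result shows the hole; on a filesystem that supports sparse files, du reports far less than 16 MiB.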
Locking files
Use the fcntl() call providing a pointer to a
flock structure. This call manipulates the file
descriptor fd, depending on the command
cmd:
● To lock a block of a file use F_SETLK.
● If the block is already locked (F_SETLK fails with EACCES or EAGAIN), use
F_GETLK to get information about the
locking process.
● To unlock a block of a file use
F_SETLK but set flock.l_type =
F_UNLCK.
lockfile.c
Lock and write in there
Initialize the variables
Open ‘./testlocks.txt’ for writing, creating it if needed
While locking 64 bytes at offset fails:
Get information about the locking
process and print info about it
Move the offset 64 bytes further
Seek to offset bytes from the start of the file
and write info about the current process
Wait for Enter key press
Unlock the 64 locked bytes, notify if failed
Close the file
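The lockfile.c steps above could be sketched like this. It is a sketch under stated assumptions: lock_and_write() and lock_cmd() are hypothetical names, the descriptor is opened by the caller, and the wait_for_enter flag is added only so the function can also run non-interactively.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 64   /* size of each locked region, as on the slide */

/* Fill a struct flock for BLOCK bytes at offset and pass it to fcntl(cmd). */
static int lock_cmd(int fd, int cmd, short type, off_t offset, struct flock *fl)
{
    memset(fl, 0, sizeof *fl);
    fl->l_type = type;
    fl->l_whence = SEEK_SET;   /* l_start counts from the start of the file */
    fl->l_start = offset;
    fl->l_len = BLOCK;
    return fcntl(fd, cmd, fl);
}

/* Lock the first free BLOCK-byte region of fd, write this process's PID
 * there, optionally wait for Enter, then unlock. Returns the offset used,
 * or -1 on error. */
off_t lock_and_write(int fd, int wait_for_enter)
{
    struct flock fl;
    off_t offset = 0;

    while (lock_cmd(fd, F_SETLK, F_WRLCK, offset, &fl) == -1) {
        if (errno != EACCES && errno != EAGAIN)
            return -1;                         /* real error, not a conflict */
        if (lock_cmd(fd, F_GETLK, F_WRLCK, offset, &fl) == 0 && fl.l_type != F_UNLCK)
            printf("bytes %lld-%lld locked by PID %d\n", (long long)offset,
                   (long long)(offset + BLOCK - 1), (int)fl.l_pid);
        offset += BLOCK;                       /* try the next block */
    }

    char msg[BLOCK];
    int len = snprintf(msg, sizeof msg, "locked by PID %d\n", (int)getpid());
    if (lseek(fd, offset, SEEK_SET) == -1 || write(fd, msg, (size_t)len) != len)
        perror("write");

    if (wait_for_enter) {
        printf("Press Enter to unlock...");
        fflush(stdout);
        getchar();
    }
    if (lock_cmd(fd, F_SETLK, F_UNLCK, offset, &fl) == -1)
        perror("unlock");
    return offset;
}
```

F_GETLK never reports the calling process's own locks, so the "locked by PID" branch only shows up when several instances run at the same time.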
Exercise
Program FileManipulator:
Write a program (‘fmanipulator.c’), which takes 3 arguments: words_count, min_length, max_length. The
program should generate a file called ‘words.txt’, which contains words_count words (separated with
spaces). To generate the words use the following function from the code files in ‘/day03/rndword/’ (note
that the function will not generate real words, but rather random sequences of characters):
char * rndword(int min_length, int max_length);
Then the program should print the contents of the generated file to the screen.
After the program creates ‘words.txt’ and before writing into it, it should lock the first 100 bytes of the
file. After the program prints out the generated words, it should wait for an Enter key press and then unlock
the file. If the file is already locked by another process, the program should report this and wait for an
Enter key press before trying again (to lock, generate and write).
Compile and run a couple of instances to test.