UNIX SYSTEM OVERVIEW
Chapter 1
Topics
1.1. Introduction
1.2. UNIX Architecture
1.3. Logging In
1.4. Files and Directories
1.5. Input and Output
1.6. Programs and Processes
1.7. Error Handling
1.8. User Identification
1.9. Signals
1.10. Time Values
1.11. System Calls and Library Functions
1.12. Summary
1.13. Exercises
Questions
What is an operating system?
Different types of operating system
Single-user, single task
Single-user, multi-tasking
Multi-user
Real-time operating system (RTOS)
Single-user, single task
As the name implies, this operating system is designed to
manage the computer so that one user can effectively do
one thing at a time.
The Palm OS for Palm handheld computers is a good
example of a modern single-user, single-task operating
system.
Single-user, multi-tasking
This is the type of operating system most people use on
their desktop and laptop computers today.
Microsoft's Windows and Apple's Mac OS platforms are
both examples of operating systems that will let a single
user have several programs in operation at the same
time.
For example, it's entirely possible for a Windows user to
be writing a note in a word processor while downloading a
file from the Internet while printing the text of an e-mail
message.
Multi-user
A multi-user operating system allows many different users to take
advantage of the computer's resources simultaneously.
Unix, VMS and mainframe operating systems, such as MVS, are
examples of multi-user operating systems.
It's important to differentiate between multi-user operating systems
and single-user operating systems that support networking.
Windows 2000 and Novell NetWare can each support hundreds or
thousands of networked users, but the operating systems themselves
aren't true multi-user operating systems.
The system administrator is the only "user" for Windows 2000 or
NetWare. The network support and all of the remote user logins the
network enables are, in the overall plan of the operating system, a
program being run by the administrative user.
Real-time operating systems
An RTOS performs the same tasks as a general-purpose OS, but it is
specially designed to run applications with very precise timing
and a high degree of reliability.
Real-time operating systems are used to control
machinery, scientific instruments and industrial systems.
An RTOS typically has very little user-interface capability, and
no end-user utilities, since the system will be a "sealed box"
when delivered for use.
Examples:
VxWorks has a long history in critical applications, for example in
cars and various NASA space platforms.
Other examples include eCos and RTLinux.
1.1 Introduction
What is the function of an OS?
All operating systems provide services for programs they run.
Typical services include executing a new program, opening a file,
reading a file, allocating a region of memory, getting the current
time of day, and so on.
The UNIX operating system provides all of these services.
1.2 UNIX Architecture
Figure 1.1. Architecture of the UNIX operating system
In a strict sense, an operating system can be defined as the
software that controls the hardware resources of the computer and
provides an environment under which programs can run.
The Kernel: Generally, we call this software the kernel, since it is
relatively small and resides at the core of the environment.
The system calls: The interface to the kernel is a layer of
software called the system calls (the shaded portion in Figure 1.1).
A system call is how a program requests a service from an
operating system's kernel.
This may include hardware-related services (for example,
accessing a hard disk drive), creation and execution of new
processes, and communication with integral kernel services such
as process scheduling. System calls provide an essential interface
between a process and the operating system.
Built-in Libraries: Libraries of common functions are built on top of
the system call interface, but applications are free to use both.
The shell: A special application that provides an interface for
running other applications.
For example, the shell provides a command-line interface (the shell
prompt or command prompt).
In a broader sense, an operating system consists of the kernel plus all the
other software that makes a computer useful.
This other software includes system utilities, applications, shells,
libraries of common functions, and so on.
What is the difference between Unix and Linux?
Command-line-wise, almost none.
Linux has a much larger market appeal and following than
any commercial UNIX.
The major difference is that Linux is an open-source, free-to-use
operating system.
Commercial UNIX is commonly used on workstations and servers
from vendors such as IBM (AIX), Sun (Solaris), and HP (HP-UX).
Versions of UNIX
BSD 4.4
SunOS
Solaris
SCO UNIX
AIX
HP-UX
ULTRIX
The freely available versions include Linux and FreeBSD.
1.3 Logging In
Login Name & password.
Password file: /etc/passwd.
If we look at our entry in the password file, we see that it's composed of
seven colon-separated fields:
the login name,
the encrypted password (on most modern systems only a placeholder such as x,
with the real value kept in a separate shadow password file),
the numeric user ID (205),
the numeric group ID (105),
a comment field,
the home directory (/home/sar),
and the shell program (/bin/ksh).
sar:x:205:105:Stephen Rago:/home/sar:/bin/ksh
In Chapter 6, we'll look at these files and some functions to access
them.
Shells
Once we log in, some system information messages are
typically displayed, and then we can type commands to
the shell program.
A shell is a command-line interpreter that reads user input
and executes commands.
The user input to a shell is normally from the terminal (an
interactive shell) or sometimes from a file (called a shell
script).
The common shells in use are summarized in Figure 1.2.
The system knows which shell to execute for us from the final field in our
entry in the password file.
Bourne shell (sh)
Bourne shell (/bin/sh): the default shell for many UNIX operating
systems.
The Bourne shell was written by S. R. Bourne, and it places more
emphasis on use as a scripting language than as an interactive
shell.
Example:
shell prompt : $
executable file : /bin/sh
Startup files read by a login shell:
/etc/profile
~/.profile
Bourne Again Shell (bash)
Bash, the Bourne-again shell, was developed by the GNU project.
It is based on the Bourne shell language and has features of the C and Korn
shells.
Linux uses the Bourne-again shell as its default shell. On many Linux
systems, /bin/sh is a link to /bin/bash (Debian and Ubuntu link it to dash instead).
Example:
shell prompt : $
executable file : /bin/bash
Startup files read by a login shell: /etc/profile, then the first of
these that exists:
~/.bash_profile
~/.bash_login
~/.profile
C Shell
This shell was written at the University of California, Berkeley. It
provides a C-like language with which to write shell scripts, hence its name.
Example:
shell prompt : %
executable file : /bin/csh
Files read on every csh invocation:
/etc/csh.cshrc
~/.cshrc
Files read by a login shell (~/.logout is read at logout):
/etc/.login
~/.login
~/.logout
/etc/csh.login
Korn shell (ksh)
This shell was written by David Korn of Bell Labs. It is provided on
most UNIX systems.
It is upward compatible with the Bourne shell and adds features first
popularized by the C shell, such as job control and command history,
along with command-line editing.
Many consider it the most efficient shell; consider using it as your standard
interactive shell.
Example:
shell prompt : $
executable file : /bin/ksh
Startup files read by a login shell:
/etc/profile
~/.profile
TC shell (tcsh)
This shell is freely available. It provides all the features of
the C shell together with emacs-style editing of the command line.
Historically, the default user shell in FreeBSD and Mac OS X was the TENEX C shell,
but these systems use the Bourne shell for their administrative shell scripts
because the C shell's programming language is notoriously difficult to
use.
Example:
shell prompt : %
executable file : /bin/tcsh
Files read on every tcsh invocation:
/etc/csh.cshrc
~/.tcshrc (or ~/.cshrc if ~/.tcshrc does not exist)
Files read by a login shell (~/.logout is read at logout):
/etc/.login
~/.login
~/.logout
/etc/csh.login
1.4. Files and Directories
File System
The UNIX file system is a hierarchical arrangement of directories and files.
Everything starts in the directory called root whose name is the
single character /.
A directory is a file that contains directory entries.
Logically, each directory entry contains a filename along with information
describing the attributes of the file.
Typical attributes of a file are:
the type of file (regular file, directory, and so on),
the size of the file,
the owner of the file,
the permissions for the file (whether other users may access it),
and when the file was last modified.
The stat and fstat functions return a structure of information
containing all the attributes of a file.
In Chapter 4, we'll examine all the attributes of a file in great detail.
The Unix file system looks like an inverted tree structure.
You start with the root directory, denoted
by /, at the top and work down through sub-directories
underneath it.
Filename
The names in a directory are called filenames.
The only two characters that cannot appear in a filename are the slash
character (/) and the null character.
The slash separates the filenames that form a pathname (described
next) and the null character terminates a pathname.
Two filenames are automatically created whenever a new directory is
created:
. (called dot) and .. (called dot-dot).
Dot refers to the current directory, and dot-dot refers to the parent
directory.
In the root directory, dot-dot is the same as dot.
The Research UNIX System and some older UNIX System V file
systems restricted a filename to 14 characters.
BSD versions extended this limit to 255 characters.
Today, almost all commercial UNIX file systems support at least 255-character filenames.
Pathname
A sequence of one or more filenames, separated by slashes and
optionally starting with a slash, forms a pathname.
A pathname that begins with a slash is called an absolute
pathname; otherwise, it's called a relative pathname.
Relative pathnames refer to files relative to the current directory.
The name for the root of the file system (/) is a special-case
absolute pathname that has no filename component.
Example
Listing the names of all the files in a directory
Working Directory
Every process has a working directory, sometimes called the
current working directory.
This is the directory from which all relative pathnames are
interpreted.
A process can change its working directory with the chdir function.
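For example,
the relative pathname doc/memo/joe refers to the file or directory joe in the
directory memo, in the directory doc, which must be a directory within the
working directory.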
The pathname /usr/lib/lint is an absolute pathname that refers to the file
or directory lint in the directory lib, in the directory usr, which is in the
root directory.
Home Directory
When we log in, the working directory is set to our home directory.
Our home directory is obtained from our entry in the password file.
cd ~
This shell command takes you back to your home directory.
1.5. Input and Output
File Descriptors
File descriptors are normally small non-negative integers that the kernel uses
to identify the files being accessed by a particular process.
When we open an existing file or create a new file, the kernel returns a file
descriptor that we use when we want to read or write the file.
Standard Input, Standard Output, and Standard Error
By convention, all shells open three descriptors whenever a new program is
run: standard input, standard output, and standard error.
If nothing special is done, as in the simple command ls, then all three are
connected to the terminal.
Unbuffered I/O
Unbuffered I/O is provided by the functions open, read, write, lseek, and close.
These functions all work with file descriptors.
In Chapter 3, we describe the unbuffered I/O functions in more detail.
Standard I/O
The standard I/O functions provide a buffered interface to the
unbuffered I/O functions.
One advantage of using the standard I/O functions is that they simplify
dealing with lines of input.
The fgets function, for example, reads an entire line.
The read function, on the other hand, reads a specified number of
bytes.
The standard I/O library provides functions that let us control the
style of buffering used by the library.
The most common standard I/O function is printf.
1.6. Programs and Processes
Program
A program is an executable file residing on disk in a directory.
A program is read into memory and is executed by the kernel as a result of one
of the six exec functions.
Processes and Process ID
An executing instance of a program is called a process.
Some operating systems use the term task to refer to a program that is being
executed.
The UNIX System guarantees that every process has a unique numeric
identifier called the process ID.
The process ID is always a non-negative integer.
The getpid function returns the process ID.
Note: getpid is declared in <unistd.h>; the book's examples get it through
their own header, "apue.h".
Process Control
There are three primary functions for process control: fork, exec, and waitpid.
Threads and Thread IDs
Threads are sometimes called lightweight processes.
A thread is a flow of execution through the process code, with its own
program counter, registers, and stack; the threads of a process share its
address space, open files, and other process-related attributes.
Multiple threads of control can exploit the parallelism possible on
multiprocessor systems.
Each thread belongs to exactly one process and no thread can exist outside
a process.
As with processes, threads are identified by IDs. Thread IDs, however, are
local to a process.
A thread ID from one process has no meaning in another process.
We use thread IDs to refer to specific threads as we manipulate the threads
within a process.
Threads
Question: What is the difference between multithreading, multiprocessing and
multitasking?
Multitasking
Figure 1. Single-core systems schedule tasks on 1 CPU to multitask
Multiprocessing
Figure 2. Dual-core systems enable multitasking operating systems to
execute two processes simultaneously
Multithreading
Multithreading extends the idea of multitasking into applications, so you can
subdivide specific operations within a single application into individual
threads.
On a multiprocessor or multicore system, the threads can run in parallel.
Figure 3. Dual-core system enables multithreading
1.7. Error Handling
When an error occurs in one of the UNIX System
functions, a negative value is often returned, and the
integer errno is usually set to a value that gives additional
information.
For example, the open function returns either a non-negative file
descriptor if all is OK or -1 if an error occurs.
An error from open has about 15 possible errno values,
such as file doesn't exist, permission problem, and so on.
The file <errno.h> defines the symbol errno and constants
for each value that errno can assume.
Error Recovery
The errors defined in <errno.h> can be divided into two
categories: fatal and nonfatal.
A fatal error has no recovery action.
The best we can do is print an error message on the
user's screen or write an error message into a log file, and
then exit.
Nonfatal errors, on the other hand, can sometimes be
dealt with more robustly.
Most nonfatal errors are temporary in nature, such as with
a resource shortage, and might not occur when there is
less activity on the system.
1.8. User Identification
User ID
The user ID from our entry in the password file is a numeric value
that identifies us to the system.
This user ID is assigned by the system administrator when our
login name is assigned, and we cannot change it.
The user ID is normally assigned to be unique for every user.
We'll see how the kernel uses the user ID to check whether we
have the appropriate permissions to perform certain operations.
We call the user whose user ID is 0 either root or the superuser.
Group ID
The group ID is also assigned by the system administrator when our login
name is assigned.
Groups are normally used to collect users together into
projects or departments. This allows the sharing of
resources, such as files, among members of the same
group.
There is also a group file that maps group names into
numeric group IDs. The group file is usually /etc/group.
With every file on disk, the file system stores both the user
ID and the group ID of a file's owner.
Storing both of these values requires only four bytes,
assuming that each is stored as a two-byte integer.
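The getuid and getgid functions return the user ID and group ID; they are
declared in <unistd.h>, which the book's header "apue.h" includes.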
1.9. Signals
Signals are a technique used to notify a process that some
condition has occurred.
For example, if a process divides by zero, the signal whose
name is SIGFPE (floating-point exception) is sent to the
process.
The process has three choices for dealing with the signal.
Ignore the signal. This option isn't recommended for signals that
denote a hardware exception, such as dividing by zero or referencing
memory outside the address space of the process, as the results are
undefined.
Let the default action occur. For a divide-by-zero condition, the default
is to terminate the process.
Provide a function that is called when the signal occurs (this is called
"catching" the signal). By providing a function of our own, we'll know
when the signal occurs and we can handle it as we wish.
Signals
Many conditions generate signals.
Two terminal keys, the interrupt key (often DELETE or Control-C) and
the quit key (often Control-backslash), are used to
interrupt the currently running process.
Another way to generate a signal is by calling the kill function. We
can call this function from a process to send a signal to another
process.
Naturally, there are limitations: we have to be the owner of the
other process (or the superuser) to be able to send it a signal.
1.10. Time Values
Historically, UNIX systems have maintained two different time values:
Calendar time.
This value counts the number of seconds since the Epoch: 00:00:00
January 1, 1970, Coordinated Universal Time (UTC). (Older manuals
refer to UTC as Greenwich Mean Time.)
These time values are used to record the time when a file was last
modified, for example.
The primitive system data type time_t holds these time values.
Process time.
This is also called CPU time and measures the central processor
resources used by a process.
Process time is measured in clock ticks, which have historically been
50, 60, or 100 ticks per second.
The primitive system data type clock_t holds these time values.
(We'll show how to obtain the number of clock ticks per second with
the sysconf function in Section 2.5.4.)
Time Values
When we measure the execution time of a process, the
UNIX System maintains three values for a process:
Clock time
User CPU time
System CPU time
Clock time
The clock time, sometimes called wall clock time, is the amount of
time the process takes to run.
Its value depends on the number of other processes being run on
the system.
Time Values
The user CPU time
is the CPU time attributed to user instructions.
The system CPU time
is the CPU time attributed to the kernel when it executes on behalf
of the process.
For example, whenever a process executes a system service, such
as read or write, the time spent within the kernel performing that
system service is charged to the process.
The sum of user CPU time and system CPU time is often called the
CPU time.
1.11. System Calls and Library Functions
All operating systems provide service points through
which programs request services from the kernel.
Version 7 of the Research UNIX System provided about
50 system calls,
4.4BSD provided about 110,
SVR4 had around 120.
Linux has anywhere between 240 and 260 system calls,
depending on the version.
FreeBSD has around 320.
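The system call interface is documented in Section 2 of the UNIX Programmer's
Manual, and its definition is in the C language, no matter how each
implementation actually invokes the kernel.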
System Calls and Library Functions
For example, the printf function may use the write system call
to output a string,
but the strcpy (copy a string) and atoi (convert ASCII to integer)
functions don't involve the kernel at all.
Both system calls and library functions appear as normal C
functions.
Both exist to provide services for application programs.
We should realize, however, that we can replace the library
functions, if desired, whereas the system calls usually cannot
be replaced.
Example:
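The malloc library function typically uses the sbrk system call to obtain memory from the kernel.
If we don't like how malloc behaves, we can replace it with our own implementation, which will probably still use the sbrk system call.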
Figure 1.11 shows the relationship between the application, the
malloc function, and the sbrk system call.
Figure 1.11. Separation of malloc function and sbrk system call
Figure 1.12. Difference between C library functions and system calls