COP 5611: OPERATING SYSTEMS -Spring 2002- Xin Yuan
Lecture #1: Introduction
Topics for today
OS vs distributed OS
Today's topic is from Chapter 1 in the Advanced Concepts in OS text.
Operating systems:
What kind of services are provided by an OS?
o Imagine the steps needed to run a program that prints a single character to the
screen without the help of an OS.
An operating system allows (1) users to access system resources painlessly, and (2)
system resources to be used effectively.
o resource management: time management, space management, process
synchronization, deadlock handling, accounting and status information.
o User friendliness: execution environment, error detection and handling,
protection, security, fault tolerance and failure recovery.
Advanced operating systems:
Traditional operating systems: running on a stand-alone computer with a single processor.
Advanced operating systems: distributed operating systems, multiprocessor operating
systems and database operating systems.
o running on a system with multiple (autonomous) computers/processors
o usually assuming that the individual computers run traditional operating systems.
o One goal of these advanced operating systems is to hide the details of multiple
computers and make users feel that they are using a single computer.
o we need different kinds of OS due to differences in system architecture (shared-memory
systems vs. distributed-memory systems) and in application requirements
(e.g., database and real-time)
o An example: a large-scale distributed file system, the Andrew File System.
Some examples of the issues to be considered in advanced operating systems (this is what we will
be covering in the course)
Example No. 1: checkpointing bank transactions.
o What is a consistent state in a distributed system?
Example No. 2: mutual exclusion
o An atomic test-and-set instruction works in a single-processor system (a sketch follows this list)
o What to do in a distributed system?
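For comparison, here is a minimal sketch of mutual exclusion built on an atomic
test-and-set instruction, written with C11 atomics purely for illustration. It works
only because all contending processes share the lock flag in a single memory.

    #include <stdatomic.h>

    /* One flag in one shared memory -- the crux of why this works. */
    static atomic_flag lock = ATOMIC_FLAG_INIT;

    void acquire(void) {
        /* atomically set the flag and test its previous value */
        while (atomic_flag_test_and_set(&lock))
            ;  /* spin: someone else holds the lock */
    }

    void release(void) {
        atomic_flag_clear(&lock);
    }

In a distributed system there is no shared flag to test and set, so mutual exclusion
must be built out of messages instead.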
Summary:
This course will NOT cover techniques used in traditional operating systems.
This course will cover techniques to build high-level system services that manage a
group of individual computers.
Lecture #2: Distributed Operating Systems: an introduction
Topics for today
Overview of major issues in distributed operating systems
Terminology
Communication models
Remote procedure calls
These topics are from Chapter 4 in the Advanced Concepts in OS text.
What is a distributed system?
It consists of multiple computers that do not share memory.
Each computer has its own memory and runs its own operating
system.
The computers can communicate with each other through a
communication network.
See Figure 4.1 for the architecture of a distributed system.
Why build a distributed system?
Microprocessors are getting more and more powerful.
A distributed system combines (and increases) the computing power
of individual computers.
Some advantages include:
o Resource sharing
(but not as easily as if on the same machine)
o Enhanced performance
(but 2 machines are not as good as a single machine that is 2
times as fast)
o Improved reliability & availability
(but the probability of a failure somewhere increases, as does the
difficulty of recovery)
o Modular expandability
Distributed OS's have not been economically successful!!!
System models:
the minicomputer model (several minicomputers with each computer supporting multiple
users and providing access to remote resources).
the workstation model (each user has a workstation, the system provides some common
services, such as a distributed file system).
the processor pool model (the model allocates processors to a user according to the user's
needs).
Where is the knowledge of distributed operating systems likely to be useful?
custom OS's for high performance computer systems
OS subsystems, like NFS, NIS
distributed ``middleware'' for large computations
distributed applications
Issues in Distributed Systems
the lack of global knowledge
naming
scalability
compatibility
process synchronization (requires global knowledge)
resource management (requires global knowledge)
security
fault tolerance, error recovery
Lack of Global Knowledge
Communication delays are at the core of the problem
Information may become false before it can be acted upon
these create some fundamental problems:
o no global clock -- scheduling based on a FIFO queue?
o no global state -- what is the state of a task? What is a correct program?
Naming
named objects: computers, users, files, printers, services
namespace must be large
unique (or at least unambiguous) names are needed
logical to physical mapping needed
mapping must be changeable, expandable, reliable, fast
Scalability
How large is the system designed for?
How does increasing number of hosts affect overhead?
broadcasting primitives, directories stored at every computer -- these design options will not
work for large systems.
Compatibility
Binary level: same architecture (object code)
Execution level: same source code can be compiled and executed (source code).
Protocol level: only requires all system components to support a common set of protocols.
Process synchronization
test-and-set instruction won't work.
Need all new synchronization mechanisms for distributed systems.
Distributed Resource Management
Data migration: data are brought to the location that needs them.
o distributed filesystem (file migration)
o distributed shared memory (page migration)
Computation migration: the computation migrates to another location.
o remote procedure call: computation is done at the remote machine.
o process migration: processes are transferred to other processors.
Security
Authentication: guaranteeing that an entity is what it claims to be.
Authorization: deciding what privileges an entity has and making only those privileges
available.
Structuring
the monolithic kernel: one piece
the collective kernel structure: a collection of processes
object oriented: the services provided by the OS are implemented as a set of objects.
client-server: servers provide the services and clients use the services.
Communication Networks
WAN and LAN
traditional operating systems implement the TCP/IP protocol stack: host-to-network layer, IP
layer, transport layer, application layer.
Most distributed operating systems are not concerned with the lower layer communication
primitives.
Communication Models
message passing
remote procedure call (RPC)
Message Passing Primitives
Send (message, destination), Receive (source, buffer)
buffered vs. unbuffered
blocking vs. nonblocking
reliable vs. unreliable
synchronous vs. asynchronous
Example: Unix socket I/O primitives
    #include <sys/socket.h>

    ssize_t sendto(int socket, const void *message, size_t length,
                   int flags, const struct sockaddr *dest_addr,
                   size_t dest_len);

    ssize_t recvfrom(int socket, void *buffer, size_t length,
                     int flags, struct sockaddr *address,
                     size_t *address_len);

    int poll(struct pollfd fds[], nfds_t nfds, int timeout);

    int select(int nfds, fd_set *readfds, fd_set *writefds,
               fd_set *errorfds, struct timeval *timeout);
You can find more information on these and other socket I/O operations in the Unix man pages.
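For instance, a minimal sender that fires a single datagram at a local receiver might
look like the sketch below; the loopback address and port 9000 are arbitrary choices
for the example.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int s = socket(AF_INET, SOCK_DGRAM, 0);   /* unreliable datagram socket */
        if (s < 0)
            return 1;

        struct sockaddr_in dest;
        memset(&dest, 0, sizeof dest);
        dest.sin_family = AF_INET;
        dest.sin_port = htons(9000);              /* arbitrary port */
        inet_pton(AF_INET, "127.0.0.1", &dest.sin_addr);

        const char msg[] = "hello";
        sendto(s, msg, sizeof msg, 0,
               (struct sockaddr *)&dest, sizeof dest);
        close(s);
        return 0;
    }

Note that the sender must already know the receiver's address and gets no pairing of
replies with requests -- exactly the kind of detail RPC (below) tries to hide.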
RPC
With message passing, the application programmer must worry about many details:
parsing messages
pairing responses with request messages
converting between data representations
knowing the address of the remote machine/server
handling communication and system failures
RPC is introduced to help hide and automate these details.
RPC is based on a ``virtual'' procedure call model
client calls server, specifying operation and arguments
server executes operation, returning results
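To make the contrast with message passing concrete, here is a sketch of what a
hand-written client stub for a remote "add" operation might do. The names remote_add,
OP_ADD, and send_request are hypothetical, and the toy send_request computes the
answer locally so the sketch is self-contained; a real stub would ship the request
buffer to the server.

    #include <arpa/inet.h>   /* htonl/ntohl: a common data representation */
    #include <stdint.h>

    enum { OP_ADD = 1 };     /* hypothetical operation code */

    /* Stand-in for the transport: a real stub would send the request
       over the network and wait for the server's reply. */
    static int send_request(const int32_t req[3], int32_t *rep) {
        *rep = htonl(ntohl(req[1]) + ntohl(req[2]));
        return 0;
    }

    /* Client stub: marshal arguments, "call" the server, unmarshal result. */
    int remote_add(int32_t a, int32_t b, int32_t *result) {
        int32_t req[3] = { htonl(OP_ADD), htonl(a), htonl(b) };
        int32_t rep;
        if (send_request(req, &rep) < 0)
            return -1;       /* communication failure surfaces here */
        *result = ntohl(rep);
        return 0;
    }

A stub generator (e.g., rpcgen) emits this marshaling and transport code automatically
from an interface description.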
RPC Issues
Stubs (See Unix rpcgen tool, for example.)
o are automatically generated, e.g. by compiler
o do the ``dirty work'' of communication
Binding method
o server address may be looked up by service-name
o or port number may be looked up
Parameter and result passing
Error handling semantics
RPC Diagram (figure not reproduced in these notes)
Lecture #3: Theoretical Foundations -- Clocks in a Distributed Environment
Topics for today
Some inherent limitations of a distributed system and their
implication.
Lamport logical clocks
Vector clocks
These topics are from Chapter 5-5.4 in Advanced Concepts in OS.
Distributed systems
A collection of computers that do not share a common clock and a
common memory
Processes in a distributed system exchange information over the
communication channel; the message delay is unpredictable.
Inherent limitations of a distributed system
Absence of a global clock
Distributed processes cannot rely on having an accurate
view of global state, due to transmission delays.
Effectively, we cannot talk meaningfully about global state.
The traditional notions of "time" and "state" do not work
in distributed systems. We need to develop concepts that
correspond to "time" and "state" in a
uniprocessor system.
Lamport's logical clocks
the "time" concept in distributed systems -- used to order events in a
distributed system.
assumptions:
o the execution of a process is characterized by a sequence of
events. An event can be the execution of one instruction or of
one procedure.
o sending a message is one event, receiving a message is one
event.
The events in a distributed system are not totally chaotic. Under some
conditions, it is possible to ascertain the order of events.
Lamport's logical clocks try to capture this.
Lamport's ``happened before'' relation
The ``happened before'' relation (→) is defined as follows:
A → B if A and B are within the same process (same sequential thread
of control) and A occurred before B.
A → B if A is the event of sending a message M in one process
and B is the event of receiving M by another process.
If A → B and B → C then A → C.
Event A causally affects event B iff A → B.
Distinct events A and B are concurrent (A || B) if we do not
have A → B or B → A.
Lamport Logical Clocks
are local to each process (processor?)
do not measure real time
only measure ``events''
are consistent with the happened-before relation
are useful for totally ordering transactions, by using logical clock
values as timestamps
Logical Clock Conditions
Ci is the local clock for process Pi
if a and b are two successive events in Pi, then
Ci(b) = Ci(a) + d1, where d1 > 0
if a is the sending of message m by Pi, then m is assigned
timestamp tm = Ci(a)
if b is the receipt of m by Pj, then
Cj(b) = max{Cj(b), tm + d2}, where d2 > 0
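A minimal sketch of these rules in C, taking d1 = d2 = 1 (an assumption; the rules
allow any positive d):

    /* One Lamport clock per process. */
    typedef struct { long c; } lamport_clock;

    /* Rule 1: any local event advances the clock. */
    long lamport_event(lamport_clock *lc) {
        return ++lc->c;
    }

    /* Sending m: the send is an event; its clock value is the timestamp tm. */
    long lamport_send(lamport_clock *lc) {
        return lamport_event(lc);
    }

    /* Receiving m with timestamp tm: advance, then take the max with tm + 1. */
    void lamport_receive(lamport_clock *lc, long tm) {
        long local = lamport_event(lc);
        lc->c = (local > tm + 1) ? local : tm + 1;
    }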
Logical Clock Conditions
The value of d could be 1, or it could be an approximation to the elapsed
real time. For example, we could take d1 to be the elapsed local time,
and d2 to be the estimated message transmission time. The latter solves the
problem of waiting forever for a virtual time instant to pass.
Total Ordering
We can extend the partial ordering of the happened-before relation to a total
ordering on events, by using the logical clocks and resolving any ties by an
arbitrary rule based on the processor/process ID.
If a is an event in Pi and b is in Pj, a ⇒ b iff
Ci(a) < Cj(b), or
Ci(a) = Cj(b) and Pi < Pj,
where < is an arbitrary total ordering of the processes
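In code, the tie-breaking rule is a two-level comparison (a sketch; process IDs stand
in for the arbitrary total order on processes):

    /* Does event a (clock ca at process pid_a) precede event b
       (clock cb at process pid_b) in the total order? */
    int total_order_before(long ca, int pid_a, long cb, int pid_b) {
        return ca < cb || (ca == cb && pid_a < pid_b);
    }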
How useful is this? How close to real time?
Example of Lamport Logical Clocks
C(a) < C(b) does not imply a → b
That is, the ordering we get from Lamport's clocks is not enough to
guarantee that if two events precede one another in the ordering relation
they are also causally related. The following Vector Clock scheme is
intended to improve on this.
Vector Clocks
Clock values are vectors
Vector length is n, the number of processes
Ci[i](a) = local time of Pi at event a
Ci[j](a) = time Cj[j](b) of last event b at Pj that is known to happen
before local event a
Vector Clock Algorithm
if a and b are successive events in Pi, then Ci[i](b) = Ci[i](a) + d1
if a is sending of m by Pi with vector timestamp tm
b is receipt of m by Pj then
Cj[k](b) = max{Cj[k](b), tm[k]}
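A sketch of these rules in C, with d1 = 1 and a fixed number of processes N; here
the receipt of m is itself treated as an event at Pj, so Cj[j] advances too:

    #include <string.h>

    #define N 4   /* number of processes; an arbitrary choice for the sketch */

    typedef struct { long v[N]; int id; } vclock;   /* clock of process id */

    /* Successive event in Pi: advance own component. */
    void vc_event(vclock *vc) {
        vc->v[vc->id]++;
    }

    /* Sending m: the current vector becomes the timestamp tm. */
    void vc_send(vclock *vc, long tm[N]) {
        vc_event(vc);
        memcpy(tm, vc->v, sizeof vc->v);
    }

    /* Receiving m with timestamp tm: componentwise max. */
    void vc_receive(vclock *vc, const long tm[N]) {
        vc_event(vc);
        for (int k = 0; k < N; k++)
            if (tm[k] > vc->v[k])
                vc->v[k] = tm[k];
    }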
Vector Clock Ordering Relation
t = t'  ⇔  ∀i: t[i] = t'[i]
t ≠ t'  ⇔  ∃i: t[i] ≠ t'[i]
t ≤ t'  ⇔  ∀i: t[i] ≤ t'[i]
t < t'  ⇔  t ≤ t' and t ≠ t'
t || t' ⇔  not (t < t' or t' < t)
The relation defined above is a partial ordering.
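These relations translate directly into code (N and the timestamp representation as
in the sketch above):

    /* t <= u : every component of t is <= the matching component of u. */
    int vc_le(const long t[N], const long u[N]) {
        for (int i = 0; i < N; i++)
            if (t[i] > u[i])
                return 0;
        return 1;
    }

    /* t < u : t <= u and t != u. */
    int vc_lt(const long t[N], const long u[N]) {
        return vc_le(t, u) && memcmp(t, u, N * sizeof(long)) != 0;
    }

    /* t || u : neither dominates the other. */
    int vc_concurrent(const long t[N], const long u[N]) {
        return !vc_lt(t, u) && !vc_lt(u, t);
    }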
Vector Clocks
a → b if ta < tb
b → a if tb < ta
otherwise a and b are concurrent
This is not a total ordering, but it is sufficient to guarantee a causal
relationship, i.e.,
a → b iff ta < tb
How scalable is this?
See Figure 5.5 in the book.
Non-causal Ordering of Messages
Message delivery is said to be causal if the order in which messages are
received is consistent with the order in which they are sent. That is,
if Send(M1) → Send(M2) then for every recipient of both messages, M1 is
received before M2.
Enforcing Causal Ordering of Messages
Basic idea: Buffer each message until the message that immediately
precedes it is delivered.
The text describes two protocols for implementing this idea:
Birman-Schiper-Stephenson: requires all messages to be broadcast
Schiper-Eggli-Sandoz: does not have this restriction
Note: These methods serialize the actions of the system. That makes the
behavior more predictable, but also may mean loss of performance, due to
idle time. That, plus scaling problems, means these algorithms are not likely
to be of much use for high-performance computing.
Birman-Schiper-Stephenson Causal Message Ordering
1. Before Pi broadcasts m, it increments VTPi[i] and timestamps m.
Thus VTPi[i] - 1 is the number of messages from Pi preceding m.
2. When Pj (j ≠ i) receives message m with timestamp VTm from Pi,
delivery is delayed locally until both of the following are satisfied:
1. VTPj[i] = VTm[i] - 1
2. VTPj[k] ≥ VTm[k] for all k ≠ i
Delayed messages are queued at each process, sorted by their
vector timestamps, with concurrent messages ordered by time
of receipt.
3. When m is delivered to Pj, VTPj is updated as usual for vector clocks.
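The delivery test itself is only a few lines (a sketch; vt_j is Pj's current vector
clock, vt_m the timestamp carried by m, and N the number of processes as in the
earlier sketches):

    /* May Pj deliver message m, broadcast by Pi with timestamp vt_m? */
    int bss_deliverable(const long vt_j[N], const long vt_m[N], int i) {
        if (vt_j[i] != vt_m[i] - 1)        /* condition 1: m is the next
                                              message expected from Pi */
            return 0;
        for (int k = 0; k < N; k++)        /* condition 2: Pj has seen every
                                              message m causally depends on */
            if (k != i && vt_j[k] < vt_m[k])
                return 0;
        return 1;
    }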
Schiper-Eggli-Sandoz Protocol
Generalizes the above, so that messages do not need to be broadcast, but are
just sent between pairs of processes, and the communication channels do not
need to be FIFO.
How would you implement and test the above algorithms?