Synchronization
Synchronization
Synchronization
CS755!                        6-1!
                   Synchronization Problem 
         • How processes cooperate and synchronize with one another in a
           distributed system !
           ➡  In single CPU systems, critical regions, mutual exclusion, and other
             synchronization problems are solved using methods such as semaphores.!
           ➡  These methods will not work in distributed systems because they
             implicitly rely on the existence of shared memory. !
           ➡  Examples: !
              ✦    Two processes interacting using a semaphore must both be able to access the
                   semaphore. In a centralized system, the semaphore is stored in the kernel and
                   accessed by the processes using system calls!
              ✦    If two events occur in a distributed system, it is difficult to determine which
                   event occurred first.!
         • How to decide on relative ordering of events!
           ➡  Does one event precede another event?!
           ➡  Difficult to determine if events occur on different machines.!
CS755!                                                                                               6-2!
                                          Issues 
  • Part 1 - Clocks!
         ➡  How to synchronize events based on actual time?!
            ✦    Clock synchronization!
         ➡  How to determine relative ordering?!
            ✦    Logical clocks!
  • Part 2 - Global State and Election!
         ➡  What is the global state of a distributed system?!
         ➡  How do we determine the coordinator of a distributed system?!
CS755!                                                                      6-3!
                      Clock synchronization 
  •  In a centralized system:!
         ➡  Time is unambiguous: A process gets the time by issuing a system call to the
           kernel. If process A gets the time and latter process B gets the time. The value
           B gets is higher than (or possibly equal to) the value A got!
         ➡  Example: UNIX make examines the times at which all the source and object
           files were last modified:!
            ✦    If time (input.c) > time(input.o) then recompile input.c !
            ✦    If time (input.c) < time(input.o) then no compilation is needed !
CS755!                                                                                        6-4!
                  Clock synchronization (2) 
         • In a distributed system:!
            ➡  Achieving agreement on time is not trivial!!
CS755!                                                                                           6-5!
               Logical vs Physical Clocks 
         • Clock synchronization need not be absolute! (due to Lamport, 1978): !
           ➡  If two processes do not interact, their clocks need not be synchronized.!
           ➡  What matters is not that all processes agree on exactly what time is it, but
             rather, that they agree on the order in which events occur.!
           ➡  We will discuss this later under logical clock synchronization .!
CS755!                                                                                       6-6!
         Logical Clock Synchronization 
         • Happens-before relation!
           ➡  If a and b are events in the same process, and a happens-before b, then
             ab is true!
           ➡  If a is the event of a message being sent by one process, and b is the event
             of the same message being received by another process, then ab is also
             true!
           ➡  If two events, a and b, happen in different processes that do not exchange
             messages , then ab is not true, but neither is ba. These events are said
             to be concurrent (ab). !
           ➡  happens-before is transitive: ab and bc  ac !
                            p1
                                     a            b           m1
                            p2                                                                                               Physical
                                                                        c                  d                                  time
                                                                                                          m2
                            p3
                                          e                                                                      f
                                 From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                                      © Addison-Wesley Publishers 2000                                            6-7!
                      Lamport s Algorithm 
         • Capturing happens-before relation!
         • Each process pi has a local monotonically increasing counter, called its
           logical clock Li!
         • Each event e that occurs at process pi is assigned a Lamport timestamp Li(e)!
         • Rules:!
           ➡  Li is incremented before event e is issued at pi: Li := Li + 1!
           ➡  When pi sends message m, it adds t = Li: (m, t) [this is event send(m)]!
           ➡  On receiving (m, t), pj computes Lj := max(Lj, t); Lj := Lj + 1; timestamp event
             receive(m) !                 1           2
                               p1
                                          a           b           m1
                                                                           3                   4
                                                                                                                                Physical
                               p2
                                                                                                                                 time
                                                                            c                 d
                                                                                                            m2
                                              1                                                                      5
                               p3
                                              e                                                                     f
                                    From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                                         © Addison-Wesley Publishers 2000                                            6-8!
                 0!           1!
                                                  Example 
                                                  2!    0!                                                                     1!         2!
                 0!           0!                   0!                                              0!                          0!         0!
                 6!    A!     8!                  10!                                              6!            A!            8!        10!
                12!          16!                  20!                                             12!                         16!        20!
                18!          24!     B!           30!                                             18!                         24!   B!   30!
                24!          32!                  40!                                             24!                         32!        40!
                30!          40!                  50!                                             30!                         40!        50!
                36!          48!     C!           60!                                             36!                         48!   C!   60!
                42!          56!                  70!                                             42!                         61!        70!
                48!    D!    64!                  80!                                             48!             D!          69!        80!
                54!          72!                  90!                                             70!                         77!        90!
                60!          80!                  100!                                            76!                         85!        100!
         a)  Three processes, each with its own clock.                         The clocks run at different rates.!
         b)  Lamport's algorithm corrects the clocks.!
         •    Lamport solution:!
              ➡  Between every two events, the clock must tick at least once!
              ➡  No two events occur at exactly the same time. If two events happen in processes
                  1 and 2, both with time 40, the former becomes 40.1 and the latter becomes 40.2!
                                   From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755!                                                         © Prentice-Hall, Inc. 2007
                                                                                                                                                6-9!
                  Lamport Timestamps 
     a)    The updates have to be ordered the same way across the replicas.!
     b)    This can be achieved by a totally ordered multicast algorithm.!
     c)    Totally ordered multicasting can be achieved by means of Lamport clocks.!
                            From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755!                                                  © Prentice-Hall, Inc. 2007
                                                                                                                       6-10!
                 Part 2 
         Global State & Election
              Algorithms 
CS755!                             6-11!
                                   Global State 
         • Sometimes it is useful to know the global state of a distributed system!
           ➡  Garbage collection!
                                                                                         p1!                                    p2!
                                                                          Object!                        message	
                                                                          reference!
                                                                                                                               garbage object	
           ➡  Deadlocks!
                                                                                       p1!                wait-for	
           p2!
                                                                                                          wait-for	
           ➡  Termination!
                                                                                   p1!                                                p2!
                                                                                                             activate	
                                                                                       passive	
                          passive	
                             From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                                  © Addison-Wesley Publishers 2000                                                           6-12!
                     Distributed Snapshot 
         • Represents a state in which the distributed system might have been.!
           ➡  Consistent global state: If A sends a message to B and and the system
              records B receiving the message, it should also record A sending it.!
             !!            0           1               2                                                      3
                          e1         e1             e1                                                     e1
                     p1
                                           m1                                                             m2
                                                                                                                           Physical
                     p2
                                                                   0            1                   2                        time
                                                                e2          e2                   e2
                                                      Inconsistent cut
                                                                                                 Consistent cut
                               From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                                    © Addison-Wesley Publishers 2000                                            6-13!
                             Election Problem 
  • Many algorithms require one process to act as a coordinator!
            ✦    Example: Coordinator in the centralized mutual exclusion algorithm !
  • In general, it does not matter which process takes on this special
         responsibility, but one has to do it. !
  • How to elect a coordinator?                !!
CS755!                                                                                  6-14!
                           General Approach 
  • Assumptions:!
         ➡  Each process has a unique number (e.g., its network address) to distinguish
           them and hence to select one!
         ➡  One process per machine: For simplicity!
         ➡  Every process knows the process number of every other process!
         ➡  Processes do not know which processes are currently up and which ones are
           currently down!
  • Approach:!
         ➡  Locate the process with the highest process number and designate it as
           coordinator.!
         ➡  Election algorithms differ in the way they do the location !
CS755!                                                                                    6-15!
                      The Bully Algorithm 
         • When a process, P, notices that the coordinator is no longer responding
           to requests, it initiates an election:!
           ➡  P sends an ELECTION message to all processes with higher numbers!
           ➡  If no one responds, P wins the election and becomes coordinator!
           ➡  If one of the higher-ups answers, it takes over. P s job is done!
         • When a process gets an ELECTION message from one of its lower-
           numbered colleagues:!
           ➡  Receiver sends an OK message back to the sender to indicate that he is
              alive and will take over!
           ➡  Receiver holds an election, unless it is already holding one!
           ➡  Eventually, all processes give up but one, and that one is the new
              coordinator.!
           ➡  New coordinator announces its victory by sending all processes a
              message telling them that starting immediately it is the new coordinator!
         • If a process that was previously down comes back:!
           ➡  It holds an election. If it happens to be the highest process currently
             running, it will win the election and take over the coordinator s job!
CS755!
         • The biggest guy in town always wins!!                                          6-16!
                                                        Example 
                 2!          1!                                        2!                  1!                                        2!   1!
         4!
                       Election!
                                                            4!
                                                                              OK!                                            4!
                                   5!                                                                  5!                                      5!
                                                                              OK!
              Elec
                                                                                                                                                 Election!
                  tion
         0!                        6!                      0!                                         6!                     0!                6!
                       !
                 7!          3!                                        7!                 3!                                         7!   3!
                                    Previous!
                                   coordinator!
                                   has crashed!
2! 1! 2! 1!
         4!                         5!                                   4!                                         5!
                                         OK!                                                                             Coordinator!
         0!                        6!                                   0!                                         6!
                  7!         3!                                                     7!                 3!
                                                                                                                 Coordinator!
                                          From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755!                                                                © Prentice-Hall, Inc. 2002
                                                                                                                                                        6-17!
                             A Ring Algorithm 
         • Use a ring (processes are physically or logically ordered, so that each
           process knows who its successor is). No token is used. !
         • Algorithm: !
           ➡  When any process notices that coordinator is not functioning: !
              ✦    Builds an ELECTION message (containing its own process number)!
              ✦    Sends the message to its successor (if successor is down, sender skips over it and
                   goes to next member along the ring, or the one after that, until a running process
                   is located)!
              ✦    At each step, sender adds its own process number to the list in the message!
           ➡  When the message gets back to the process that started it all: !
              ✦    Process recognizes the message containing its own process number!
              ✦    Changes message type to COORDINATOR!
              ✦    Circulates message once again to inform everyone else: Who the new
                   coordinator is (list member with highest number); Who the members of the new
                   ring are!
              ✦    When message has circulated once, it is removed!
CS755!                                                                                                  6-18!
                                        Example 
                               5!                                                             Election message!
                               6!               1!                               2!
                               0!                                                               2!
                          0!                                                                          3!
                                                                                                             2!
                                           5!                                                                3!
                                           6!
     Previous coordinator! 7!                                                                         4!
          has crashed!
                        No response!
                                                6!                               5!
                                                                  5!
                                                         Election
                                                         message!
                          From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755!                                                © Prentice-Hall, Inc. 2007
                                                                                                                     6-19!
                 Part 3 
         Sharing in Distributed
               Systems 
CS755!                            6-20!
              Mutual Exclusion Problem 
  • Systems involving multiple processes are often programmed using critical
         regions.!
  • When a process has to read or update certain shared data structure, it first
         enters a critical region to achieve mutual exclusion, i.e., ensure that no
         other process will use the shared data structure at the same time !
  • In single-processor systems, critical regions are protected using
         semaphores, and monitors or similar constructs. How to accomplish the
         same in distributed systems?!
         ➡  We assume, for now, that transactions are not supported by the system!
CS755!                                                                                6-21!
                Protocols & Requirements 
         •    Application-level protocol !
              enter() !     !//enter critical section - block if necessary!
              resourceAccesses() !// access shared resources in critical section!
              exit() !      !// leave critical section - other processes may enter!
         •    Requirements!
              1.  At most one process may execute in the critical section (safety)!
              2.  Requests to enter and exit the critical section eventually succeed
                  (liveness)!
              3.  If one request to enter the critical section happened-before another,
                  then entry to the critical section is granted in that order ( order)!
CS755!                                                                                     6-22!
                A Centralized Algorithm  
               (Central Server Algorithm) 
  • Shortcomings!
         ➡  Coordinator: A single point of failure; A performance bottleneck!
         ➡  If processes normally block after making a request, they cannot distinguish a
           dead coordinator from permission denied !
CS755!                                                                                      6-24!
                   A Distributed Algorithm 
         • Based on a total ordering of events in a system (happens-before
           relation)!
         • Algorithm: !
           ➡  When a process wants to enter a critical region: !
              ✦    Builds a message: {name of critical region; process number; current time}!
              ✦    Sends the message to all other processes (assuming reliable transfer)!
           ➡  When a process receives a request message from another process: !
              ✦    If the receiver is not in the critical region and does not want to enter it, it
                   sends back an OK message to the sender!
              ✦    If the receiver is already in the critical region, it does not reply. Instead it
                   queues the request!
              ✦    If the receiver wants to enter the critical region but has not yet done so, it
                   compares the timestamp with the one contained in the message it has sent
                   everyone: If the incoming message is lower, the receiver sends back an OK
                   message; otherwise the receiver queues the request and sends nothing!
CS755!                                                                                                6-25!
                                          Example 
         a)    Two processes want to enter the same critical region at the same
               moment.
         b)    Process 0 has the lowest timestamp, so it wins.
         c)    When process 0 is done, it sends an OK also, so 2 can now enter
               the critical region.
                           From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755!                                                 © Prentice-Hall, Inc. 2007
                                                                                                                      6-26!
               A Token Ring Algorithm 
                              Processes!
         0!   2!    4!   9!    7!         1!        6!         5!         8!        3!
                              Network!                                                                                         2!
                                                                                                                  1!                 3!
0! 4!
9! 5!
                                    From Tanenbaum and van Steen, Distributed Systems: Principles and Paradigms, 2nd edition
CS755!                                                          © Prentice-Hall, Inc. 2007
                                                                                                                                               6-27!
             A Token Ring Algorithm (2) 
  • Pros:!
         ➡  Guarantees mutual exclusion (only 1 process has the token at any instant)!
         ➡  No starvation (token circulates among processes in a well-defined order)!
  • Cons:!
         ➡  Regeneration of the token if it is ever lost: Detecting that the token is lost is
           difficult, since the amount of time between successive appearances of the
           token on the network is unbounded!
         ➡  Recovery if a process crashes: It is easier than in other cases though. If we
           require a process receiving the token to acknowledge receipt. A dead process
           is detected when its neighbor tries to give the token and fails. Dead process is
           then removed and the token handed to the next process down the line.!
CS755!                                                                                          6-28!
                        Comparison of the 3
                           Algorithms 
                               Messages per                            Delay before entry
                 Algorithm                                                                                           Problems
                               entry/exit                              (in message times)
                                                                                                                     Crash of any
                 Distributed   2(n–1)                                  2(n–1)
                                                                                                                     process
                                                                                                                     Lost token,
                 Token ring    1 to '                                 0 to n – 1
                                                                                                                     process crash
                                            System may be
                         System in a        temporarily in an    System in a
                         consistent         inconsistent state   consistent
                         state              during execution     state
CS755!                                                                            6-30!
                    Transaction Primitives 
  • Special primitives required for programming using transactions!
         ➡  Supplied by the operating system or by the language runtime system!
CS755!                                                                            6-31!
             Centralized Transaction
                   Execution 
                       User!                                          User!
                                                  …!
                     Application !                                  Application !
                              Read, Write, !
                                                         Results!
                              Abort, EOT!
                                               Scheduler!
                                                 (SC)!
                              Scheduled!
                                                         Results!
                              Operations!
                                                Recovery!
                                                Manager!
                                                 (RM)!
CS755!                                                                                        6-32!
                 Distributed Transaction
                        Execution 
                 User application!
                                                                   Local!
                       RM!                            RM!         Recovery!
                                                                  Protocol!
CS755!                                                                            6-33!
         Properties of Transactions 
          ATOMICITY!
             ➡  All or nothing!
             ➡  Multiple operations combined as an atomic transaction!
          CONSISTENCY!
             ➡  No violation of integrity constraints!
             ➡  Transactions are correct programs!
          DURABILITY!
             ➡  Committed updates persist!
             ➡  Database recovery!
CS755!                                                                   6-34!
             Transactions Provide… 
  • Atomic and reliable execution in the presence of failures!
  • Correct execution in the presence of multiple user accesses !
  • Correct management of replicas (if they support it) !
CS755!                                                              6-35!
                       Lost Update Problem 
          Initial values: A=100, B=200, C=300
         Transaction 	
T	
:	
                                                      Transaction 	
U:	
	
         balance = b.getBalance();	
                                                 balance = b.getBalance();	
         b.setBalance(balance*1.1);	
                                                b.setBalance(balance*1.1);	
         a.withdraw(balance/10);	
                                                   c.withdraw(balance/10);	
         balance = b.getBalance();	
                      $200	
                                                                                      balance = b.getBalance();	
             $200	
                                                                                      b.setBalance(balance*1.1);	
            $220	
         b.setBalance(balance*1.1);	
                     $220	
         a.withdraw(balance/10);	
                          $80	
                                                                                      c.withdraw(balance/10);	
               $280	
                                   From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                                        © Addison-Wesley Publishers 2000                                           6-36!
         Inconsistent Retrievals Problem 
   Initial values: A=200, B=200
         Transaction V: 	
                                              Transaction W: 	
         a.withdraw(100);	
                                                                         aBranch.branchTotal();	
         b.deposit(100);	
         a.withdraw(100);	
                  $100	
                                                                         total = a.getBalance();	
                        $100	
                                                                         total = total+b.getBalance();	
                  $300	
                                                                         total = total+c.getBalance();	
         b.deposit(100);	
                   $300	
                               From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                                    © Addison-Wesley Publishers 2000                                           6-37!
                              Conflict Rules 
 Operations of different	
 Conflict	
                                                                  Reason	
     transactions	
    read	
      read	
     No	
                    Because the effect of a pair of 	
 read	
 operations	
                                                      does not depend on the order in which they are	
                                                      executed	
    read	
      write	
    Yes	
                   Because the effect of a 	
 read	
 and a 	
write	
 operation	
                                                      depends on the order of their execution	
 	
    write	
     write	
    Yes	
                   Because the effect of a pair of 	
 write	
 operations	
                                                      depends on the order of their execution	
 	
                             From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                                  © Addison-Wesley Publishers 2000                                 6-38!
         Resolving Lost Update Example 
         Transaction T: 	
                                                      Transaction U: 	
         balance = b.getBalance();	
                                            balance = b.getBalance();	
         b.setBalance(balance*1.1);	
                                           b.setBalance(balance*1.1);	
         a.withdraw(balance/10);	
                                              c.withdraw(balance/10);	
         balance = b.getBalance();	
                $200	
         b.setBalance(balance*1.1);	
               $220	
                                                                                 balance = b.getBalance();	
              $220	
                                                                                 b.setBalance(balance*1.1);	
             $242	
         a.withdraw(balance/10);	
                     $80	
                                                                                 c.withdraw(balance/10);	
                $278	
                               From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                                    © Addison-Wesley Publishers 2000                                           6-39!
     Resolving Inconsistent Retrieval
                Problem 
     Transaction V: 	
                                                 Transaction W:	
     a.withdraw(100);	
                                                                        aBranch.branchTotal();	
     b.deposit(100);	
         a.withdraw(100);	
                 $100	
         b.deposit(100);	
                  $300	
                                                                        total = a.getBalance();	
                         $100	
                                                                        total = total+b.getBalance();	
                   $400	
                                                                        total = total+c.getBalance();	
                                                                        ...	
                               From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                                    © Addison-Wesley Publishers 2000                                           6-40!
                   Distributed Transaction
                        Serializability 
  • Somewhat more involved. Two execution histories have to be considered:!
         ➡  local histories !
         ➡  global history!
CS755!                                                                                      6-41!
         Locking-Based Algorithms 
  • Transactions indicate their intentions by requesting locks from the
    scheduler (called lock manager).!
  • Locks are either read lock (rl) [also called shared lock] or write lock (wl)
    [also called exclusive lock]!
  • Read locks and write locks conflict (because Read and Write operations are
    incompatible)!
   ! !        ! rl ! wl!
   ! !rl      ! yes !no!
   ! !wl ! no !no!
  • Locking works nicely to allow concurrent processing of transactions.!
CS755!                                                                             6-42!
             Two-Phase Locking (2PL) 
         1.  A transaction locks an object before using it.!
         2.  When an object is locked by another transaction, the requesting
             transaction must wait.!
         3.  When a transaction releases a lock, it may not request another
             lock.!
                                                       Lock point!
                                                                                  Obtain lock!
                                                                                  Release lock!
                   No. of locks!
                                            Phase 1!                 Phase 2!
                                   BEGIN!                                       END!
CS755!                                                                                            6-43!
                                   Strict 2PL 
         Hold locks until the end.
              No. of locks!
Obtain lock!
Release lock!
                                                               Transaction!
                              BEGIN!                END!       duration!
                                       period of!
                                       use!
CS755!                                                                        6-44!
                     Locking Example 
         Transaction T:	
 	
                                               Transaction U: 	
                                                                                                              	
         balance = b.getBalance();	
                                        balance = b.getBalance();	
         b.setBalance(bal*1.1);	
                                           b.setBalance(bal*1.1);	
         a.withdraw(bal/10);	
                                              c.withdraw(bal/10);	
         Operations	
                    Locks	
                           Operations	
                           Locks	
         openTransaction	
         bal = b.getBalance();	
 write lock B 	
         b.setBalance(bal*1.1);	
                                           openTransaction	
         a.withdraw(bal/10);	
           write lock A 	
                   bal = b.getBalance();	
 waits for T’s	
                                                                                                      lock on B 	
         closeTransaction	
              unlock A, B 	
                      	
                                                                                                                     write lock B 	
                                                                             b.setBalance(bal*1.1);	
                                                                             c.withdraw(bal/10);	
    write lock C 	
                                                                             closeTransaction	
                     unlock B, C 	
                         From Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, 3rd ed.
CS755!                                              © Addison-Wesley Publishers 2000                                                    6-45!