Distributed Systems: Synchronization

Lecturer: Roohollah Abdipour
Acknowledgment: Many of the slides are based on slides by Prof. Paul Krzyzanowski at Rutgers University, pxk@cs.rutgers.edu

Clock Synchronization

Logical vs. physical clocks
Logical clocks keep track of event ordering
• among related (causal) events
Physical clocks keep time of day
• consistent across systems

Physical clocks in computers
Real-time clock: CMOS clock (counter) circuit driven by a quartz oscillator
• battery backup to continue measuring time when power is off
OS generally programs a timer circuit to generate an interrupt periodically
• e.g., 60, 100, 250, or 1000 interrupts per second (Linux 2.6+ adjustable up to 1000 Hz)
• Programmable Interval Timer (PIT): Intel 8253, 8254
• Interrupt service procedure adds 1 to a counter in memory

Problem
Getting two systems to agree on time
• Two clocks hardly ever agree
• Quartz oscillators oscillate at slightly different frequencies
Clocks tick at different rates
• Creates an ever-widening gap in perceived time
• This is clock drift
Difference between two clocks at one point in time
• This is clock skew

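Example: two computer clocks compared against true time
Sept 18, 2006, 8:00:00: both clocks read 8:00:00
Oct 23, 2006, 8:00:00 (35 days later):
• Clock 1 reads 8:01:24: skew = +84 seconds; drift = +84 seconds / 35 days = +2.4 sec/day
• Clock 2 reads 8:01:48: skew = +108 seconds; drift = +108 seconds / 35 days = +3.1 sec/day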
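The skew and drift figures in this example can be checked with a few lines of Python. This is only an illustrative sketch; the function name `skew_and_drift` and its parameters are hypothetical and do not come from the slides.

```python
from datetime import datetime

def skew_and_drift(set_at, checked_at, clock_reading):
    """Return (skew in seconds, drift in seconds per day).

    set_at:        true time at which the clock was set correctly
    checked_at:    true time at which the clock is inspected
    clock_reading: what the computer's clock shows at checked_at
    """
    skew = (clock_reading - checked_at).total_seconds()
    days = (checked_at - set_at).total_seconds() / 86400
    return skew, skew / days

set_at = datetime(2006, 9, 18, 8, 0, 0)
checked_at = datetime(2006, 10, 23, 8, 0, 0)

# Readings of the two clocks from the example, 35 days after being set
clock1 = skew_and_drift(set_at, checked_at, datetime(2006, 10, 23, 8, 1, 24))
clock2 = skew_and_drift(set_at, checked_at, datetime(2006, 10, 23, 8, 1, 48))

for name, (skew, drift) in (("clock 1", clock1), ("clock 2", clock2)):
    print(f"{name}: skew = {skew:+.0f} s, drift = {drift:+.1f} s/day")
```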

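[Figure: perfect clock; computer's time C plotted against UTC time t]

[Figure: drift with a slow clock; the skew is the gap between computer's time C and UTC time t]

[Figure: drift with a fast clock; the skew is the gap between computer's time C and UTC time t]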

Dealing with drift
Assume we set the computer to true time
Not a good idea to set the clock back
• Illusion of time moving backwards can confuse message ordering and software development environments

Dealing with drift
Go for gradual clock correction
• If fast: make the clock run slower until it synchronizes
• If slow: make the clock run faster until it synchronizes

Dealing with drift
OS can do this:
• Change the rate at which it requests interrupts, e.g., if the system requests interrupts every 17 msec but the clock is too slow, request interrupts every 15 msec (for example)
• Or software correction: redefine the interval
Adjustment changes the slope of system time: linear compensating function
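One way to picture the software correction is a function that spreads a known offset linearly over a chosen interval instead of stepping the clock. The sketch below assumes the offset has already been measured; the names `compensated_time`, `raw_at_sync`, and `slew_seconds` are hypothetical, and real operating systems use more elaborate mechanisms.

```python
def compensated_time(raw, raw_at_sync, offset, slew_seconds):
    """Linear compensating function (illustrative only).

    raw:          current reading of the uncorrected clock, in seconds
    raw_at_sync:  raw reading at the moment the offset was measured
    offset:       correction to apply (negative if the clock is fast)
    slew_seconds: raw-clock interval over which to spread the correction

    The corrected clock keeps moving forward as long as
    abs(offset) < slew_seconds; only its slope changes.
    """
    elapsed = raw - raw_at_sync
    fraction = min(max(elapsed / slew_seconds, 0.0), 1.0)
    return raw + offset * fraction

# A clock that is 2 seconds fast, corrected over 100 seconds:
# the corrected clock runs at slope 0.98 until it has caught up.
samples = [compensated_time(raw, 1000, -2, 100) for raw in (1000, 1050, 1100, 1200)]
print(samples)
```

Spreading the correction this way avoids the backward jump that the slides warn about.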

[Figure: compensating for a fast clock; a linear compensating function is applied until the skew is removed and the clock is synchronized; computer's time C plotted against UTC time t]

Getting accurate time
• Attach a GPS receiver to each computer: ± 1 msec of UTC
• Attach a WWV radio receiver: obtain time broadcasts from Boulder or DC, ± 3 msec of UTC (depending on distance)
• Attach a GOES receiver: ± 0.1 msec of UTC
Not a practical solution for every machine
• Cost, size, convenience, environment

Getting accurate time
Synchronize from another machine
• One with a more accurate clock
Machine/service that provides time information: time server

RPC
Simplest synchronization technique
• Issue RPC to obtain time
• Set time
[Figure: client asks the server "what's the time?"; server replies 3:42:19]
Does not account for network or processing latency

Cristian’s algorithm
Compensate for delays
• Note times:
  • request sent: T0
  • reply received: T1
• Assume network delays are symmetric
[Figure: timeline; client sends request at T0, server replies with Tserver, client receives reply at T1]

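Cristian’s algorithm
[Figure: timeline; client sends request at T0, server replies with Tserver, client receives reply at T1]
Client sets time to:
$T_{new} = T_{server} + \frac{T_1 - T_0}{2}$
where $\frac{T_1 - T_0}{2}$ = estimated overhead in each direction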
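The client side of Cristian’s algorithm can be sketched as follows. This is a minimal illustration, not a production time client: `get_server_time` stands in for whatever RPC returns Tserver, and both it and the `now` parameter are hypothetical names introduced here so the example can run without a network.

```python
import time

def cristian_time(get_server_time, now=time.monotonic):
    """Estimate the current server time, assuming symmetric network delays.

    get_server_time: callable performing the request and returning Tserver
    now:             local clock used only to measure the round trip
    """
    t0 = now()                    # request sent
    t_server = get_server_time()  # reply carries the server's timestamp
    t1 = now()                    # reply received
    return t_server + (t1 - t0) / 2

# Deterministic demonstration in milliseconds: request sent at 100,
# reply received at 900, server timestamp 70300.
ticks = iter([100, 900])
estimate = cristian_time(lambda: 70300, now=lambda: next(ticks))
print(estimate)
```

Measuring the round trip with a monotonic local clock keeps the estimate from being disturbed if the wall clock is adjusted while the request is in flight.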

Error bounds
If the minimum message transit time (Tmin) is known:
• Place bounds on the accuracy of the result

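Error bounds
[Figure: timeline between T0 and T1; the earliest time the request can arrive at the server is T0 + Tmin, and the latest time the reply can leave the server is T1 - Tmin]
range = $T_1 - T_0 - 2T_{min}$
accuracy of result = $\pm \left( \frac{T_1 - T_0}{2} - T_{min} \right)$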

Cristian’s algorithm: example
• Send request at 5:08:15.100 (T0)
• Receive response at 5:08:15.900 (T1)
• Response contains 5:09:25.300 (Tserver)
• Elapsed time is T1 - T0: 5:08:15.900 - 5:08:15.100 = 800 msec
• Best guess: timestamp was generated 400 msec ago
• Set time to Tserver + (elapsed time / 2): 5:09:25.300 + 400 msec = 5:09:25.700

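Cristian’s algorithm: example
If best-case message time = 200 msec:
T0 = 5:08:15.100
T1 = 5:08:15.900
Ts = 5:09:25.300
Tmin = 200 msec
[Figure: timeline; 800 msec between T0 and T1, with a 200 msec minimum transit time in each direction]
Error = $\pm \left( \frac{900 - 100}{2} - 200 \right) = \pm 200$ msec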
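The arithmetic of this example, including the error bound, can be reproduced with Python's `datetime` module. The function name `cristian_estimate` is hypothetical, and the calendar date is arbitrary because the slides give only times of day.

```python
from datetime import datetime, timedelta

def cristian_estimate(t0, t1, t_server, t_min=timedelta(0)):
    """Return (new clock value, error bound) for one request/reply exchange."""
    round_trip = t1 - t0
    new_time = t_server + round_trip / 2   # Tserver + (T1 - T0) / 2
    error = round_trip / 2 - t_min         # +/- ((T1 - T0) / 2 - Tmin)
    return new_time, error

day = (2006, 1, 1)  # arbitrary date, an assumption for illustration
t0 = datetime(*day, 5, 8, 15, 100_000)
t1 = datetime(*day, 5, 8, 15, 900_000)
ts = datetime(*day, 5, 9, 25, 300_000)

new_time, error = cristian_estimate(t0, t1, ts, t_min=timedelta(milliseconds=200))
print(new_time.time(), "+/-", error)
```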

Berkeley Algorithm
• Gusella & Zatti, 1989
• Assumes no machine has an accurate time source
• Obtains average from participating computers
• Synchronizes all clocks to average

Berkeley Algorithm
• Machines run time dæmon
  • Process that implements protocol
• One machine is elected (or designated) as the server (master)
  • Others are slaves

Berkeley Algorithm
• Master polls each machine periodically
  • Ask each machine for time
  • Can use Cristian’s algorithm to compensate for network latency
• When results are in, compute average
  • Including master’s time
• Hope: average cancels out individual clocks’ tendencies to run fast or slow
• Send each slave the offset by which its clock needs adjustment
  • Sending an offset rather than a timestamp avoids problems with network delays

Berkeley Algorithm
Algorithm has provisions for ignoring readings from clocks whose skew is too great
• Compute a fault-tolerant average
If master fails
• Any slave can take over

Berkeley Algorithm: example
[Figure: a master whose clock reads 3:00 and three slaves whose clocks read 3:25, 2:50, and 9:10]
1. Request timestamps from all slaves: replies are 3:25, 9:10, 2:50
2. Compute fault-tolerant average, ignoring the outlier 9:10: (3:25 + 2:50 + 3:00) / 3 = 3:05
3. Send offset to each client: -0:20 to the clock at 3:25, +0:15 to the clock at 2:50, -6:05 to the clock at 9:10; the master adjusts its own clock by +0:05
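The three steps of the example can be sketched in Python with times expressed as minutes since midnight. The slides do not say how a reading is judged to have too great a skew, so the rule used here (discard readings more than `max_skew` minutes from the master's clock) is an assumption, as are the names `berkeley_offsets` and `max_skew`.

```python
def berkeley_offsets(master_time, slave_times, max_skew):
    """Return (fault-tolerant average, offsets for [master] + slaves).

    Times are minutes since midnight. Readings further than max_skew
    from the master's clock are left out of the average (assumed rule),
    but every machine still receives an offset.
    """
    readings = [master_time] + list(slave_times)
    trusted = [t for t in readings if abs(t - master_time) <= max_skew]
    average = sum(trusted) / len(trusted)
    return average, [average - t for t in readings]

def hhmm(hours, minutes):
    return hours * 60 + minutes

# Master at 3:00; slaves at 3:25, 2:50 and 9:10
average, offsets = berkeley_offsets(
    hhmm(3, 0), [hhmm(3, 25), hhmm(2, 50), hhmm(9, 10)], max_skew=60
)
print(average, offsets)
```

Here the average is 185 minutes (3:05), and the offsets in minutes correspond to +0:05, -0:20, +0:15, and -6:05.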

Logical Clocks

Logical clocks
Assign sequence numbers to messages
• All cooperating processes can agree on order of events
• vs. physical clocks: time of day
Assume no central time source
• Each system maintains its own local clock
• No total ordering of events
• No concept of happened-when

Happened-before
Lamport’s “happened-before” notation
a → b: event a happened before event b
• e.g.: a: message being sent, b: message receipt
Transitive: if a → b and b → c then a → c

Logical clocks & concurrency
Assign “clock” value to each event
• if a → b then clock(a) < clock(b)
• since time cannot run backwards
If a and b occur on different processes that do not exchange messages, then neither a → b nor b → a is true
• These events are concurrent

Event counting example
• Three systems: P1, P2, P3
• Events a, b, c, …
• Local event counter on each system
• Systems occasionally communicate

Event counting example
[Figure: event timelines with local counter values]
P1: a=1, b=2, c=3, d=4, e=5, f=6
P2: g=1, h=2, i=3
P3: j=1, k=2
Bad ordering:
• e → h, but clock(e) = 5 > clock(h) = 2
• f → k, but clock(f) = 6 > clock(k) = 2

Lamport’s algorithm
• Each message carries a timestamp of the sender’s clock
• When a message arrives:
  • if receiver’s clock < message timestamp, set system clock to (message timestamp + 1)
  • else do nothing
• Clock must be advanced between any two events in the same process

Lamport’s algorithm
Algorithm allows us to maintain time ordering among related events
• Partial ordering

Event counting example
[Figure: event timelines after applying Lamport’s algorithm]
P1: a=1, b=2, c=3, d=4, e=5, f=6
P2: g=1, h=6 (was 2), i=7 (was 3)
P3: j=1, k=7 (was 2)
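The corrected timestamps can be reproduced with a small Lamport clock. The sketch uses the common formulation in which a receive sets the clock to max(local clock, message timestamp) + 1; this combines the receive rule with the requirement that the clock advance between events. The class and method names are hypothetical.

```python
class LamportClock:
    """Per-process logical clock (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """A send is an event; the message carries the new clock value."""
        return self.tick()

    def receive(self, message_timestamp):
        """Jump past the sender's timestamp if the local clock is behind."""
        self.time = max(self.time, message_timestamp) + 1
        return self.time

p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()

a, b, c, d = (p1.tick() for _ in range(4))
e = p1.send()        # message from P1 to P2
f = p1.send()        # message from P1 to P3

g = p2.tick()
h = p2.receive(e)    # receipt of the message sent at e
i = p2.tick()

j = p3.tick()
k = p3.receive(f)    # receipt of the message sent at f

print(dict(e=e, f=f, g=g, h=h, i=i, j=j, k=k))
```

With these timestamps e → h and f → k both satisfy clock(sender) < clock(receiver).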

Summary
• Algorithm needs monotonically increasing software counter
• Incremented at least when events that need to be timestamped occur
• Each event has a Lamport timestamp attached to it
• For any two events, where a → b: L(a) < L(b)

Mutual Exclusion & Election Algorithms

Process Synchronization
Techniques to coordinate execution among processes
• One process may have to wait for another
• Shared resource (e.g. critical section) may require exclusive access

Centralized Systems
Mutual exclusion via:
• Test & set in hardware
• Semaphores
• Messages
• Condition variables

Distributed Mutual Exclusion
• Assume there is agreement on how a resource is identified
  • Pass identifier with requests
• Create an algorithm to allow a process to obtain exclusive access to a resource

Centralized algorithm
• Mimic single processor system
• One process elected as coordinator
Process P:
• Request resource
• Wait for response
• Receive grant
• Access resource
• Release resource
[Figure: process P sends request(R) to coordinator C, C replies with grant(R), and P later sends release(R)]
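The request/grant/release exchange can be modeled with an in-memory coordinator. The slides shown here do not say what the coordinator does when the resource is already held, so the first-come, first-served queue below is an assumption, as are the class and method names; a real system would exchange these calls as network messages.

```python
from collections import deque

class Coordinator:
    """Grants exclusive access to named resources (illustrative sketch)."""

    def __init__(self):
        self.holder = {}    # resource -> process currently holding it
        self.waiting = {}   # resource -> queue of blocked requesters (assumed policy)

    def request(self, process, resource):
        """Handle request(R); return True if grant(R) is sent immediately."""
        if resource not in self.holder:
            self.holder[resource] = process
            return True
        self.waiting.setdefault(resource, deque()).append(process)
        return False        # no reply yet: the requester keeps waiting

    def release(self, process, resource):
        """Handle release(R); return the process that is granted R next, if any."""
        if self.holder.get(resource) != process:
            raise ValueError(f"{process} does not hold {resource}")
        queue = self.waiting.get(resource)
        if queue:
            next_process = queue.popleft()
            self.holder[resource] = next_process
            return next_process
        del self.holder[resource]
        return None

coordinator = Coordinator()
first = coordinator.request("P1", "R")     # granted at once
second = coordinator.request("P2", "R")    # must wait
handed_to = coordinator.release("P1", "R") # grant passes to the waiter
print(first, second, handed_to)
```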
