710 likes | 935 Views
Distributed Synchronization. Clock Synchronization. When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time. Physical Clocks Clock Synchronization.
E N D
Clock Synchronization • When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
Physical Clocks Clock Synchronization • Maximum resolution desired for global time keeping determines the maximum difference which can be tolerated between “synchronized” clocks • The time keeping of a clock, its tick rate should satisfy: • The worst possible divergence δ between two clocks is thus: • So the maximum time Δt between clock synchronization operations that can ensure δ is:
Physical Clocks Clock Synchronization • Christian’s Algorithm • Periodically poll the machine with access to the reference time source • Estimate round-trip delay with a time stamp • Estimate interrupt processing time • figure 3-6, page 129 Tanenbaum • Take a series of measurements to estimate the time it takes for a timestamp to make it from the reference machine to the synchronization target • This allows the synchronization to converge within δ with a certain degree of confidence • Probabilistic algorithm and guarantee
Physical Clocks Clock Synchronization • Wide availability of hardware and software to keep clocks synchronized within a few milliseconds across the Internet is a recent development • Network Time Protocol (NTP) discussed in papers by David Mill(s) • GPS receiver in the local network synchronizes other machines • What if all have GPS receivers • Increasing deployment of distributed system algorithms depending on synchronized clocks • Supply and demand constantly in flux
Physical Clocks (1) • Computation of the mean solar day.
Physical Clocks (2) • TAI seconds are of constant length, unlike solar seconds. Leap seconds are introduced when necessary to keep in phase with the sun.
Clock Synchronization Algorithms • The relation between clock time and UTC when clocks tick at different rates.
Cristian's Algorithm • Getting the current time from a time server.
The Berkeley Algorithm • The time daemon asks all the other machines for their clock values • The machines answer • The time daemon tells everyone how to adjust their clock
Lamport Timestamps • Three processes, each with its own clock. The clocks run at different rates. • Lamport's algorithm corrects the clocks.
Example: Totally-Ordered Multicasting • Updating a replicated database and leaving it in an inconsistent state.
Global State (1) • A consistent cut • An inconsistent cut
Global State (2) • Organization of a process and channels for a distributed snapshot
Global State (3) • Process Q receives a marker for the first time and records its local state • Q records all incoming message • Q receives a marker for its incoming channel and finishes recording the state of the incoming channel
The Bully Algorithm (1) • The bully election algorithm • Process 4 holds an election • Process 5 and 6 respond, telling 4 to stop • Now 5 and 6 each hold an election
Mutual Exclusion • Distributed components still need to coordinate their actions, including but not limited to access to shared data • Mutual exclusion to some limited set of operations and data is thus required • Consider several approaches and compare and contrast their advantages and disadvantages • Centralized Algorithm • The single central process is essentially a monitor • Central server becomes a semaphore server • Three messages per use: request, grant, release • Centralized performance constraint and point of failure
Mutual ExclusionDistributed Algorithm Factors • Functional Requirements 1) Freedom from deadlock 2) Freedom from starvation 3) Fairness 4) Fault tolerance • Performance Evaluation • Number of messages • Latency • Semaphore system Throughput • Synchronization is always overhead and must be accounted for as a cost
Mutual ExclusionDistributed Algorithm Factors • Performance should be evaluated under a variety of loads • Cover a reasonable range of operating conditions • We care about several types of performance • Best case • Worst case • Average case • Different aspects of performance are important for different reason and in different contexts
Mutual ExclusionLamport’s Algorithm • Every site keeps a request queue sorted by logical time stamp • Uses Lamport’s logical clocks to impose a total global order on events associated with synchronization • Algorithm assumes ordered message delivery between every pair of communicating sites • Messages sent from site Sj in a particular order arrive at Sj in the same order • Note: Since messages arriving at a given site come from many sources the delivery order of all messages can easily differ from site to site
Lamport’s AlgorithmRequest Resource r • Thus, each site has a request queue containing resource use requests and replies • Note that the requests and replies for any given pair of sites must be in the same order in queues at both sites • Because of message order delivery assumption
Lamport’s AlgorithmEntering CS for Resource r • Site Si enters the CS protecting the resource when • This ensures that no message from any site with a smaller timestamp could ever arrive • This ensures that no other site will enter the CS • Recall that requests to all potential users of the resource and replies from then go into request queues of all processes including the sender of the message
Lamport’s AlgorithmReleasing the CS • The site holding the resource is releasing it, call that site • Note that the request for resource r had to be at the head of the request_queue at the site holding the resource or it would never have entered the CS • Note that the request may or may not have been at the head of the request_queue at the receiving site
Lamport ME Example Pj Pi Pj enters critical section queue(j10) 15 release(i5) queue(j10) 14 Pi in critical section reply(12) reply(12) 13 13 12 12 queue(j10, i5) 11 11 queue(i5) request (i5) request (j10) queue(j10)
Lamport’s AlgorithmComments • Performance: 3(N-1) messages per CS invocation since each requires (N-1) REQUEST, REPLY, and RELEASE messages • Observation: Some REPLY messages are not required • If sends a request to and then receives a REQUEST from with a timestamp smaller than its own REQUEST • need not send a reply to because it already has enough information to make a decision • This reduces the messages to between 2(N-1) and 3(N-1) • As a distributed algorithm there is no single point of failure but there is increased overhead
Ricart and Agrawala • Refine Lamport’s mutual exclusion by merging the REPLY and RELEASE messages • Assumption: total ordering of all events in the system implying the use of Lamport’s logical clocks with tie breaking • Request CS (P) operation: 1) Site requesting the CS creates a message and sends it to all processes using the CS including itself • Messages are assumed to be reliably delivered in order • Group communication support can play an obvious role
Ricart and AgrawalaReceive a CS Request • If the receiver is not currently in the CS and does not have pending request for it in its request_queue • Send REPLY • If the receiver is already in the CS • Queue the request, sending no reply • If the receiver desires the CS but has not entered • Compare the TS of its request to that just received • REPLY if received is newer • Queue the request if pending request is newer
Ricart and Agrawala • Enter a CS • A process enters the CS when it receives a REPLY from every member of the group that can use the CS • Leave a CS • When the process leaves the CS it sends a REPLY to the senders of all pending messages on its queue
Ricart and AgrawalaExample 1 I J K k in CS OK(i) i in CS OK(k) OK(j) OK(j) request(k12) request(i8)
Ricart and AgrawalaExample 2 I J K OK(j) k in CS j in CS OK(i) OK(i) i in CS OK(k) OK(k) OK(j) q(k9) q(j8, k9) q(j8) request(i7) request(j8) request(k9)
Ricart and AgrawalaObservations • The algorithm works because the global logical clock ensures a global total ordering on events • This ensures, in turn, that the decision about who enters the CS is unambiguous • Single point of failure is now N points of failure • A crashed group member cannot be distinguished from a busy CS • Distributed and “optimized” version is N times more vulnerable than the centralized version! • Explicit message denying entry helps reliability and converts this into busy wait
Ricart and AgrawalaObservations • Either group communication support is used, or each user of the CS must keep track of all other potential users correctly • Powerful motivation for standard group communication primitives • Argument against a centralized server said that a single process involved in each CS decision was bad • Now we have N processes involved in each decision • Improvements: get a majority - Makaewa’s algorithm • Bottom Line: a distributed algorithm is possible • Shows theoretical and practical challenges of designing distributed algorithms that are useful
Token Passing Mutex • General structure • One token per CS token denotes permission to enter • Only process with token allowed in CS • Token passed from process to process logical ring • Mutex • Pass token to process i + 1 mod N • Received token gives permission to enter CS • hold token while in CS • Must pass token after exiting CS • Fairness ensured: each process waits at most N-1 entries to get CS
Token Passing Mutex • Correctness is obvious • No starvation since passing is in strict order • Difficulties with token passing mutex • Idle case of no process entering CS pays overhead of constantly passing the token • Lost tokens: diagnosis and creating a new token • Duplicate tokens: ensure generation of only one token • Crashes: require a receipt to detect dead destinations • Receipts double the message overhead • Design challenge: holding time for unneeded token • Too short high overhead, too long high CS latency
Mutex Comparison • Centralized • Simplest and most efficient • Centralized coordinator crashes create the need to detect crash and choose a new coordinator • M/use: 3; Entry Latency: 2 • Distributed • 3(N-1) messages per CS use (Lamport) • 2(N-1) messages per CS use (Ricart & Agrawala) • If any process crashes with a non-empty queue, algorithm won’t work • M/use: 2(N-1); Entry Latency: 2(N-1)
Mutex Comparison • Token Ring • Ensures fairness • Overhead is subtle no longer linked to CS use • M/use: 1 ; Entry Latency: 0 N-1 • This algorithm pays overhead when idle • Need methods for re-generating a lost token • Design Principle: building fault handling into algorithms for distributed systems is hard • Crash recovery is subtle and introduces overhead in normal operation • Performance Metrics: M/use and Entry Latency
Election Algorithms • Centralized approaches often necessary • Best choice in mutex, for example • Need method of electing a new coordinator when it fails • General assumptions • Give processes unique system/global numbers (e.g. PID) • Elect process using a total ordering on the set • All processes know process number of members • All processes agree on new coordinator • All do not know if it is up or down election algorithm is responsible for determining this • Design challenge: network delay vs. crashed peer
Bully Algorithm • Suppose the coordinator doesn’t respond to P1 request • P1 holds an election by sending an election message to all processes with higher numbers • If P1 receives no responses, P1 is the new coordinator • If any higher numbered process responds, P1 ends its election • Process receives an election request • Reply to the sender tells it that it has lost the election • Holds an election of its own • Eventually all but highest surviving process give up • Process recovering from a crash takes over if highest
Bully Algorithm 1 2 5 4 6 0 3 7 • Example: Processes 0-7, 4 detects that 7 has crashed • 4 holds election and loses • 5 holds election and loses • 6 holds election and wins • Message overhead variable • Who starts an election matters • Solid lines say “Am I leader?” • Dotted lines say “you lose” • Hollow lines say “I won” • 6 becomes the coordinator • When 7 recovers it is a bully and sends “I win” to all
Ring Algorithm • Processes have a total order known by all • Each process knows its successor forming a ring • Ring: mod N • So thesuccessor of Pi is P(i+1) mod N • No token involved • Any process Pinoticing that the coordinator is not responding • Sends an election message to its successor P(i+1) mod N • If successor is down, send to next member timeout • Receiving process adds its number to the message and passes it along
Ring Algorithm • When election message gets back to election initiator • Change message to coordinator • Circulate to all members • Coordinator is highest process in the total order • All processes know the order and thus all will agree no matter how the election started • Strength • Only one coordinator chosen • Weakness • Scalability: latency increases with N because the algorithm is sequential
Ring Algorithm • What if more than one process detects a crashed coordinator? • More than one election will be produced: message storm • All messages will contain the same information: member process numbers and order of members • Same coordinator is chosen (highest number) • Refinement might include filtering duplicate messages • Some duplicates will happen • Consider two elections chasing each other • Eliminate one initiated by lower numbered process • Duplicated until lower reaches source of the higher
Global State (3) • Process 6 tells 5 to stop • Process 6 wins and tells everyone
A Ring Algorithm • Election algorithm using a ring.
Mutual Exclusion: A Centralized Algorithm • Process 1 asks the coordinator for permission to enter a critical region. Permission is granted • Process 2 then asks permission to enter the same critical region. The coordinator does not reply. • When process 1 exits the critical region, it tells the coordinator, when then replies to 2
A Distributed Algorithm • Two processes want to enter the same critical region at the same moment. • Process 0 has the lowest timestamp, so it wins. • When process 0 is done, it sends an OK also, so 2 can now enter the critical region.
A Toke Ring Algorithm • An unordered group of processes on a network. • A logical ring constructed in software.
Comparison • A comparison of three mutual exclusion algorithms.
Deadlocks • Definition: Each process in a set is waiting for a resource to be released by another process in set • The set is some subset of all processes • Deadlock only involves the processes in the set • Remember the necessary conditions for DL • Remember that methods for handling DL are based on preventing or detecting and fixing one or more necessary conditions
Deadlocks Necessary Conditions • Mutual exclusion • Process has exclusive use of resource allocated to it • Hold and Wait • Process can hold one resource while waiting for another • No Preemption • Resources are released only by explicit action by controlling process • Requests cannot be withdrawn (i.e. request results in eventual allocation or deadlock) • Circular Wait • Every process in the DL set is waiting for another process in the set, forming a cycle in the SR graph