
Failure Detectors


Presentation Transcript


  1. Failure Detectors

  2. Can we do anything in asynchronous systems?
     • Reliable broadcast
       • Process j sends a message m to all processes in the system
       • Requirement:
         • If m is delivered by any correct process, then it must be delivered by all correct processes
       • Intuition: a process may receive m but deliver it only at a later point
     • Assumption
       • A single message send is atomic
       • If a message is sent, it is received as long as the receiving process does not fail

  3. Is this algorithm correct?
     • Let 1..n be processes in the system
     • For x = 1 to n
       • Send m to x
     • Upon receiving m
       • Deliver m
     • What is wrong?
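The slide leaves the question open. One well-known problem is that the sender can crash partway through its send loop. The sketch below is a toy simulation under that assumption; `naive_broadcast`, `violates_requirement`, and the `crash_after` parameter are hypothetical names introduced here, not part of the slides.

```python
def naive_broadcast(n, crash_after):
    """Run 'for x = 1 to n: send m to x', with the sender crashing after
    `crash_after` successful sends. Each receiver delivers m on receipt.
    Returns the set of processes that delivered m."""
    delivered = set()
    for x in range(1, n + 1):
        if len(delivered) == crash_after:
            break  # the sender crashes in the middle of the loop
        delivered.add(x)  # x receives m and delivers it immediately
    return delivered


def violates_requirement(delivered, correct):
    """True if some correct process delivered m but not all of them did."""
    return bool(delivered & correct) and not correct <= delivered


# Process 1 is the sender and crashes after two sends (to itself and to 2).
delivered = naive_broadcast(4, crash_after=2)
print(delivered, violates_requirement(delivered, correct={2, 3, 4}))
```

Here process 2 is correct and delivers m, while processes 3 and 4 never receive it, so the reliable broadcast requirement is violated.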

  4. How to fix it?
     • Let 1..n be processes in the system
     • We will use this algorithm in our work with failure detectors
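The algorithm shown on this slide is not preserved in the transcript. A common fix, assumed here rather than taken from the slide, is for every process to relay m to all other processes the first time it receives m, and only then deliver it. The simulation below is a minimal sketch of that idea; `relay_broadcast` and `crash_after` are hypothetical names, and only the original sender is modeled as faulty.

```python
from collections import deque


def relay_broadcast(n, sender, crash_after):
    """Simulate relay-then-deliver broadcast among processes 1..n.

    The sender crashes after `crash_after` sends of its loop over 1..n.
    Every other process is assumed correct: on first receipt of m it
    relays m to all other processes, then delivers m.
    Returns the set of correct processes that delivered m."""
    faulty = {sender}
    # Recipients the sender managed to reach before crashing.
    inbox = deque(list(range(1, n + 1))[:crash_after])
    delivered = set()
    while inbox:
        p = inbox.popleft()
        if p in faulty or p in delivered:
            continue  # crashed process, or duplicate copy of m
        inbox.extend(q for q in range(1, n + 1) if q != p)  # relay first
        delivered.add(p)                                    # then deliver
    return delivered


print(relay_broadcast(4, sender=1, crash_after=2))
```

With the relay step, a single correct recipient is enough for m to reach every correct process, at the cost of O(n²) messages per broadcast.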

  5. Main Requirements
     • Accuracy
       • When a process is suspected to have failed, it actually has
     • Completeness
       • When a process fails, it is suspected
     • Assumption in this work: no repairs possible

  6. Different Completeness Requirements
     • Strong completeness
       • Eventually every process that crashes is permanently suspected by every correct process
     • Weak completeness
       • Eventually every process that crashes is permanently suspected by some correct process
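The two completeness properties can be compared on a concrete snapshot. The sketch below is only an approximation: it inspects a single final snapshot of suspicion sets, whereas the definitions speak of "eventually" and "permanently". The function names and the `suspects` data layout are assumptions made for illustration.

```python
def strong_completeness(crashed, suspects):
    """suspects[p] is the set of processes p suspects in the final snapshot.
    Holds if every correct process suspects every crashed process."""
    correct = set(suspects) - crashed
    return all(crashed <= suspects[p] for p in correct)


def weak_completeness(crashed, suspects):
    """Holds if every crashed process is suspected by at least one
    correct process."""
    correct = set(suspects) - crashed
    return all(any(c in suspects[p] for p in correct) for c in crashed)


# Process 3 has crashed; only process 1 has noticed.
snapshot = {1: {3}, 2: set(), 3: set()}
print(strong_completeness({3}, snapshot), weak_completeness({3}, snapshot))
```

In this snapshot weak completeness holds but strong completeness does not, because process 2 has not yet suspected process 3.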

  7. Different Accuracy Requirements
     • Strong accuracy
       • No process is suspected before it crashes
     • Weak accuracy
       • Some correct process is never suspected
     • Eventual strong accuracy
       • There is a time (unknown to the processes themselves) after which no process is suspected before it crashes
     • Eventual weak accuracy
       • There is a time (unknown to the processes themselves) after which some correct process is never suspected
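As a rough illustration of the two non-eventual accuracy properties, the sketch below checks them against a finite log of suspicion events. The event format `(time, observer, suspected)`, the `crash_time` map, and the function names are all assumptions introduced for this example. The eventual variants cannot be decided from a finite log, so they are not modeled.

```python
def strong_accuracy(events, crash_time):
    """events: iterable of (time, observer, suspected).
    crash_time: {pid: time at which pid crashed}; correct pids are absent.
    Holds if nobody is suspected before it has crashed."""
    return all(s in crash_time and crash_time[s] <= t for t, _, s in events)


def weak_accuracy(events, processes, crash_time):
    """Holds if at least one correct process is never suspected by anyone."""
    correct = set(processes) - set(crash_time)
    ever_suspected = {s for _, _, s in events}
    return bool(correct - ever_suspected)


# Process 3 crashes at time 4. Process 2 wrongly suspects process 1 at time 2.
log = [(5, 1, 3), (2, 2, 1)]
crashes = {3: 4}
print(strong_accuracy(log, crashes), weak_accuracy(log, [1, 2, 3], crashes))
```

The false suspicion of process 1 breaks strong accuracy, but weak accuracy still holds because process 2 is correct and never suspected.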

  8. Classification of Failure Detectors

  9. Reducibility of Detectors
     • Given a failure detector P, can we implement Q?
     • Given a failure detector Q, can we implement P?

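  10. Reducibility of Detectors
      [Figure: a transformation algorithm T(D→D’) takes failure detector D as input and produces the output of failure detector D’]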

  11. Reducibility of Detectors
      Repeat forever
        { p queries local failure detector Dp }
        suspectp = Dp
        send (p, suspectp) to all
      []
      When receive (q, suspectq)
        outputp = (outputp ∪ suspectq) – { q }
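The update rule on this slide can be exercised in a few lines. The sketch below assumes the lost operator between outputp and suspectq is set union, and models one exchange in which every live process sends its local suspicion set to every live process. `gossip_round` and its argument layout are hypothetical names chosen for the example.

```python
def gossip_round(local_suspects, output, alive):
    """One exchange of the transformation.

    local_suspects[q]: what q's local detector D_q currently suspects.
    output[p]: p's emulated detector output, updated in place.
    alive: processes that are still running.
    On receiving (q, suspect_q), p sets
        output_p = (output_p | suspect_q) - {q}."""
    for q in sorted(alive):          # q is the sender
        for p in sorted(alive):      # p is the receiver
            output[p] = (output[p] | local_suspects[q]) - {q}
    return output


# Process 3 has crashed; only process 1's local detector suspects it.
local = {1: {3}, 2: set()}
out = gossip_round(local, {1: set(), 2: set()}, alive={1, 2})
print(out)
```

After one exchange both live processes suspect process 3, which is how a detector with weak completeness can be turned into one with strong completeness. Removing q on receipt reflects that a process that just sent a message cannot have crashed earlier.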

  12. Reducibility of Detectors

  13. Solving Consensus with Weak Failure Detector S
      Phase 1
        for x = 1 to n – 1
          report the new votes you learnt in the previous round
          wait until you receive votes from everyone you do not suspect to have failed
        end for
      Phase 2
        report all the votes you have learnt
        wait until you receive votes from everyone you do not suspect to have failed
      Phase 3
        Consider only those votes that are known to everyone
        Choose the vote of the smallest ID process as the decision
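The three phases can be traced with a heavily simplified simulation. The sketch below assumes lock-step rounds, and assumes that every faulty process crashes before sending anything and is suspected by everyone, so no wait ever blocks. It also assumes at least one process is correct. `consensus_with_s` and its parameters are hypothetical names, and the code illustrates the phase structure rather than a fault-tolerant implementation.

```python
def consensus_with_s(votes, crashed):
    """votes: {pid: proposed value}; crashed: pids that never send anything.
    Returns {pid: decision} for the surviving processes."""
    n = len(votes)
    alive = sorted(set(votes) - crashed)
    known = {p: {p: votes[p]} for p in alive}   # votes each process has learnt
    fresh = {p: dict(known[p]) for p in alive}  # votes learnt in the last round

    # Phase 1: n - 1 rounds, each reporting only newly learnt votes.
    for _ in range(n - 1):
        incoming = {p: {} for p in alive}
        for q in alive:
            for p in alive:
                incoming[p].update(fresh[q])
        for p in alive:
            fresh[p] = {k: v for k, v in incoming[p].items() if k not in known[p]}
            known[p].update(fresh[p])

    # Phase 2: exchange all learnt votes; keep those known to everyone.
    common = set.intersection(*(set(known[p]) for p in alive))

    # Phase 3: decide on the vote of the smallest-ID process among them.
    leader = min(common)
    return {p: known[p][leader] for p in alive}


print(consensus_with_s({1: "a", 2: "b", 3: "c"}, crashed={1}))
```

With process 1 crashed from the start, its vote is never learnt, so the survivors agree on the vote of process 2.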

  14. Solving Consensus with Weak Failure Detector S
      • Assume that the number of failed processes is strictly less than n/2
      • Round-based computation
      • Coordinator in round x is (x mod n) + 1
      • The coordinator is just a process that follows a protocol slightly different from the others
      • There are no other assumptions about it

  15. Solving Consensus with Weak Failure Detector S
      • In each round
        • Phase 1
          • Send your estimates to coordinator
        • Phase 2: at coordinator
          • Wait until at least (n+1)/2 messages are received
          • Use them to decide on a tentative decision
          • Send tentative decision to all
        • Phase 3
          • Wait until tentative decision received from coordinator or coordinator is suspected
          • In the former case, send an ack, and revise your estimate to be the tentative decision
          • In the latter case, send a nack
        • Phase 4: at coordinator
          • If (n+1)/2 acks are received then make a final decision and send it using reliable broadcast

  16. Solving Consensus with Weak Failure Detector S
      • Upon receiving reliable broadcast message
        • Decide on the value proposed in it
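A single round of the coordinator-based protocol can be sketched as follows. Several parts are assumptions rather than content from the slides: all n processes are treated as alive, the quorum (n+1)/2 is rounded up, and the rule for choosing the tentative decision (`min` here) is only a placeholder, since the slides do not specify it. A real protocol needs a selection rule that preserves values adopted in earlier rounds, for example by attaching round timestamps to estimates. `coordinator`, `run_round`, and `suspecting` are hypothetical names.

```python
import math


def coordinator(x, n):
    """Coordinator of round x among processes 1..n, as on the slide."""
    return (x % n) + 1


def run_round(x, estimates, suspecting):
    """estimates: {pid: current estimate} for processes 1..n.
    suspecting: pids that suspect the coordinator during phase 3.
    Returns (coordinator, decision or None, updated estimates)."""
    n = len(estimates)
    quorum = math.ceil((n + 1) / 2)
    coord = coordinator(x, n)
    new_estimates = dict(estimates)

    # Phases 1-2: the coordinator collects a quorum of estimates.
    received = [estimates[p] for p in sorted(estimates)][:quorum]
    tentative = min(received)  # placeholder selection rule (assumption)

    # Phase 3: ack and adopt the tentative decision, or nack on suspicion.
    acks = 0
    for p in estimates:
        if p in suspecting:
            continue  # p sends a nack and keeps its estimate
        new_estimates[p] = tentative
        acks += 1

    # Phase 4: decide only if a quorum of acks arrived.
    decision = tentative if acks >= quorum else None
    return coord, decision, new_estimates


print(run_round(1, {1: "a", 2: "b", 3: "c"}, suspecting=set()))
print(run_round(1, {1: "a", 2: "b", 3: "c"}, suspecting={1, 3}))
```

In the first call every process acks and the round decides. In the second call only one ack arrives, so no decision is made and the protocol would move on to the next round with a new coordinator.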

  17. Other Results
      • ◇S (or ◇W) is the weakest failure detector that can be used for solving consensus
      • P is the weakest failure detector that can be used to solve leader election
      • The goal of the proposed survey in this area is to study this issue further.
