
The Consensus Problem in Fault Tolerant Computing

The Consensus Problem in Fault Tolerant Computing. Sajayasree K K ME(CSE) E0 245 Fault Tolerant Computing. The Problem. The consensus problem is to form an agreement among the fault-free members of the resource population on a quantum of information in order


Presentation Transcript


  1. The Consensus Problem in Fault Tolerant Computing Sajayasree K K ME(CSE) E0 245 Fault Tolerant Computing

  2. The Problem The consensus problem is to form an agreement among the fault-free members of the resource population on a quantum of information in order to maintain the performance and integrity of the system.

  3. Organisation • Background • Different approaches • Problem formulation • The PMC model • The Byzantine Agreement • Fault Classification • Testing • Conclusion

  4. Background What is the need for consensus? Connect computer resources to get a system with greater power and availability than any of its parts. The reverse can happen if faulty elements are allowed to corrupt the system.

  5. Two Approaches How to overcome the inadvertent or malicious spread of information by the faulty segment of the population? • Byzantine Generals (Lamport et al. 1982): contain the fault • System Diagnosis (Preparata et al. 1967): diagnose the fault

  6. General Problem Formulation General layered approach to fault management: • Reconfiguration • Fault diagnosis or masking • Reliable communication • Unreliable communication medium • Synchronization

  7. General Problem Formulation General NMR system (figure: processors P1, P2, P3). Problems: • Performance • Cost • Distributed and central voting
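The voting step of an NMR system can be sketched as below. This is a minimal illustration, assuming a central voter that takes the output of every module and accepts a value only when a strict majority reports it; the function name `majority_vote` and the sample values are hypothetical and do not come from the slides.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of modules, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority needs more than half of the modules to agree.
    return value if count > len(outputs) // 2 else None


# Three modules P1, P2, P3; P3 is assumed faulty and reports a wrong value.
masked = majority_vote([42, 42, 7])

# With no majority the voter cannot mask the fault.
undecided = majority_vote([1, 2, 3])
```

With three modules, a single faulty output is outvoted; when all three disagree, the voter has nothing to accept and reports `None`.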

  8. The PMC Model Proposed in 1967 by Preparata, Metze and Chien. Each processor tests another PE. Construct a graph and a syndrome. Conditions: • All failures are hard (permanent) failures. • A fault-free processor is always able to determine accurately the condition of the PE it is testing. • A faulty processor produces unreliable test results. • No more than t PEs may be faulty.
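The conditions above can be turned into a small consistency check. The sketch below is a brute-force illustration, not the diagnosis algorithm from the original paper: it enumerates every candidate fault set of size at most t and keeps those that agree with the syndrome. The ring of tests, the outcome values, and the names `consistent` and `diagnose` are assumptions made for the example.

```python
from itertools import combinations


def consistent(fault_set, syndrome):
    """Check a candidate fault set against a syndrome under the PMC conditions."""
    for (tester, tested), outcome in syndrome.items():
        if tester in fault_set:
            continue  # a faulty tester's result is unreliable, so it constrains nothing
        expected = 1 if tested in fault_set else 0
        if outcome != expected:
            return False  # a fault-free tester always reports accurately
    return True


def diagnose(units, syndrome, t):
    """Return every fault set with at most t units that explains the syndrome."""
    return [
        set(candidate)
        for size in range(t + 1)
        for candidate in combinations(units, size)
        if consistent(set(candidate), syndrome)
    ]


# Hypothetical ring of tests: A tests B, B tests C, C tests D, D tests E, E tests A.
# Outcome 1 means "the tested PE failed", 0 means "the tested PE passed".
units = ["A", "B", "C", "D", "E"]
syndrome = {
    ("A", "B"): 1,
    ("B", "C"): 0,  # B's own result is arbitrary if B is faulty
    ("C", "D"): 0,
    ("D", "E"): 0,
    ("E", "A"): 0,
}
candidates = diagnose(units, syndrome, t=1)
```

For this syndrome and t = 1, the only fault set that survives the check is the one containing B alone. Enumeration is exponential in t, so it is useful only for illustrating the model on small graphs.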

  9. The PMC Model (Figure: test graph over PEs A, B, C, D and E, with edges labeled by the test outcomes 1, x, 0, 0 and 0.)

  10. The Byzantine Agreement Started with the work of Wensley et al. in 1978 on Software Implemented Fault Tolerance (SIFT). The number of PEs (n) must be greater than 3t, where t is the number of faulty elements. Each processor has a secret value. Values are exchanged by messages. Interactive Consistency: • Consistency: each fault-free PE should form an identical vector of values whose ith element corresponds to the ith processor in the system. • Meaningfulness: a vector element corresponding to a fault-free processor should be the actual secret value of that processor.
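The bound n > 3t and the two interactive consistency conditions can be written as simple checks. This sketch only validates an outcome; it does not run an agreement protocol. The function names, the processor indexing from 0, and the sample values are assumptions made for illustration.

```python
def satisfies_bound(n, t):
    """True when n processors can tolerate t faulty ones (n > 3t)."""
    return n > 3 * t


def interactive_consistency_holds(vectors, secrets, faulty):
    """Check consistency and meaningfulness for the fault-free processors.

    vectors[i] is the vector formed by processor i, secrets[i] is its secret
    value, and faulty is the set of indices of faulty processors.
    """
    good = [i for i in range(len(secrets)) if i not in faulty]
    if not good:
        return True  # no fault-free processor, so nothing to check
    reference = vectors[good[0]]
    # Consistency: every fault-free processor forms the same vector.
    consistent = all(vectors[i] == reference for i in good)
    # Meaningfulness: entries for fault-free processors equal their secrets.
    meaningful = all(reference[i] == secrets[i] for i in good)
    return consistent and meaningful


# Four processors, processor 3 assumed faulty (n = 4, t = 1).
secrets = [10, 20, 30, 40]
agreed = [10, 20, 30, 0]  # any common value is acceptable in the faulty slot
vectors = {0: agreed, 1: list(agreed), 2: list(agreed), 3: [0, 0, 0, 0]}
ok = satisfies_bound(4, 1) and interactive_consistency_holds(vectors, secrets, {3})
```

Note that the entry for the faulty processor is unconstrained by meaningfulness; the fault-free processors only need to agree on it.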

  11. An Example

  12. Byzantine Generals Problem The Byzantine Generals Problem was introduced by Lamport, Shostak and Pease in 1982. A Byzantine commanding general, who has surrounded the enemy with his many armies, each led by a lieutenant general, wishes to organize a concerted plan of action, i.e., to attack or to retreat.
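One way to make the scenario concrete is the one-round oral-message algorithm, OM(1), from the Lamport, Shostak and Pease paper; the slide itself does not present it. The sketch below assumes one commander and three lieutenants with at most one traitor. The names `om1`, `flip` and `split`, and the default order "retreat" on a tie, are choices made for this example.

```python
from collections import Counter


def majority(values, default="retreat"):
    """Strict majority of the values heard, or the default when there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


def om1(order, lieutenants, traitors, lie):
    """Run OM(1): the commander sends, each lieutenant relays once, then votes.

    lie(link, value) is the message a traitor puts on link = (sender, receiver).
    """
    # Round 1: the commander sends an order to every lieutenant.
    received = {
        lt: lie(("commander", lt), order) if "commander" in traitors else order
        for lt in lieutenants
    }
    # Round 2: every lieutenant relays what it received to the others.
    decisions = {}
    for lt in lieutenants:
        if lt in traitors:
            continue  # only loyal lieutenants need to reach agreement
        heard = [received[lt]]
        for other in lieutenants:
            if other == lt:
                continue
            relayed = received[other]
            if other in traitors:
                relayed = lie((other, lt), relayed)
            heard.append(relayed)
        decisions[lt] = majority(heard)
    return decisions


def flip(link, value):
    """Hypothetical traitor behavior: always relay the opposite order."""
    return "retreat" if value == "attack" else "attack"


def split(link, value):
    """Hypothetical traitor commander: tell L1 to attack, everyone else to retreat."""
    return "attack" if link[1] == "L1" else "retreat"


lieutenants = ["L1", "L2", "L3"]
loyal_commander = om1("attack", lieutenants, {"L3"}, flip)
traitor_commander = om1("attack", lieutenants, {"commander"}, split)
```

With a loyal commander and traitor L3, both loyal lieutenants decide to attack. With a traitorous commander sending mixed orders, the three loyal lieutenants still reach the same decision, which is all the problem requires in that case.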

  13. Fault Classification Analysis of the characteristics of a faulty processor leads to the proposal of fault models. The proposed fault models define the behavior of a PE once it has become faulty. • System diagnosis: a description of test results given the status of the tester and the tested PE. • Byzantine agreement: a description of the limitations of a faulty processor. In general, the more constraints in the fault model, the easier it will be to form consensus.

  14. Fault Classification: A failure in system Diagnosis Interactions of a faulty PE

  15. Test Validity Models

  16. Fault Classification: A failure in system Diagnosis Fault Class

  17. Fault Classification: A failure in Byzantine Agreement In the worst case, faulty PEs are assumed to work with complete knowledge of the state of the system: the Adversary Model. Limitations of the adversary model: defining algorithms that work only for this model can be limiting and impractical. So another classification of faults is introduced, in which a stronger class is a subset of a weaker class.

  18. Fault Classification: A failure in Byzantine Agreement

  19. Fault Classification: A failure in Byzantine Agreement Fail Stop Byzantine Fault

  20. Testing

  21. Conclusion Despite their different characteristics, Byzantine agreement and system diagnosis have very similar goals, namely to produce a correct agreement despite the number of faults. Showing the similarities of both approaches allows future research to draw from both areas rather than continuing apart.

  22. References • Michael Barborak, Miroslaw Malek and Anton Dahbura, “The Consensus Problem in Fault-Tolerant Computing”, ACM Computing Surveys, Vol. 25, No. 2, June 1993. • Michael Fischer, Nancy Lynch and Michael Paterson, “Impossibility of Distributed Consensus with One Faulty Process”, Journal of the ACM, April 1985. • PODC Influential Paper Award 2001, http://www.podc.org/influential/2001.html
