
State Machine Replication



  1. State Machine Replication. Ido Zachevsky, Marat Radan. Supervisor: Ittay Eyal. Project Presentation, Winter Semester 2010.

  2. Goals • Learn and understand Paxos and Python. • Design a program for a fault-tolerant distributed system using the Paxos algorithm. • Test on a real internet-scale system, Planet-Lab.

  3. The Problem – Distributed Storage • Using distributed algorithms on a network has many advantages • It also introduces many problems • This project focuses on the Synchronization Problem

  4. Synchronization • The task: successfully run a state machine that involves all the computers of a network • All the computers need to be in sync regarding the Current State and the Next States • All the computers need to know the transitions
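To make the idea concrete, here is a minimal sketch (illustrative names only, not the project's code) of a deterministic state machine: if every replica applies the same commands in the same order, all replicas end in the same state.

    # Minimal replicated-state-machine sketch (hypothetical, for illustration).
    # Determinism is the key property: same commands, same order, same state.
    class StateMachine:
        def __init__(self, initial_state):
            self.state = initial_state
            self.log = []  # ordered sequence of applied commands

        def apply(self, command):
            """Apply one agreed-upon command; must be deterministic."""
            self.log.append(command)
            op, arg = command
            if op == "set":
                self.state = arg
            elif op == "add":
                self.state += arg
            else:
                raise ValueError("unknown command")

    # Two replicas applying the same ordered commands reach the same state.
    a, b = StateMachine(0), StateMachine(0)
    for cmd in [("set", 5), ("add", 3)]:
        a.apply(cmd)
        b.apply(cmd)
    assert a.state == b.state == 8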

  5. Problems? • Can any computer choose the next state? • What if a computer disconnects ungracefully? • What if a message is delayed due to congestion? • Other problems… • Solution: Use a dedicated algorithm

  6. A Solution – Paxos • Meeting the Safety requirements ensures that any chosen value is one agreed upon by all computers • Meeting the Liveness requirements ensures that a value will eventually be chosen

  7. Paxos – Background • “Paxos Made Simple”, Leslie Lamport, 1 Nov 2001 • “Paxos Made Live”, Chandra, Griesemer and Redstone, 2007

  8. Principles • The system consists of three agent classes: • Proposers • Acceptors • Learners • Some agents are distinguished (e.g., a distinguished Proposer acts as the Leader) • Agents communicate via messages

  9. Principles – continued • A single computer – a Leader – is in charge • Decision cycle in two phases: • A majority of Acceptors must promise to honor the most recent proposal • Once a majority has accepted it, all computers are informed of the Decision (a sketch follows)
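As a rough illustration of those two phases, here is a single-decree acceptor sketch (hypothetical structure; the actual project code additionally handles networking, persistence, and retries):

    # Single-decree Paxos acceptor sketch (illustrative only).
    # Phase 1b: promise to ignore proposals numbered below n, and report
    # any value already accepted. Phase 2b: accept if the promise holds.
    class Acceptor:
        def __init__(self):
            self.promised_n = -1       # highest proposal number promised
            self.accepted_n = -1       # highest proposal number accepted
            self.accepted_value = None

        def on_prepare(self, n):
            if n > self.promised_n:
                self.promised_n = n
                return ("promise", n, self.accepted_n, self.accepted_value)
            return ("nack", n)

        def on_accept(self, n, value):
            if n >= self.promised_n:
                self.promised_n = n
                self.accepted_n = n
                self.accepted_value = value
                return ("accepted", n, value)
            return ("nack", n)

The Leader drives the cycle: it collects promises from a majority (adopting the highest-numbered value reported, if any), then asks the majority to accept; once a majority accepts, the value is decided and the Learners are informed.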

  10. Safety requirements • Only a value that has been proposed may be chosen, • Only a single value is chosen, and • A process never learns that a value has been chosen unless it actually has been.

  11. Liveness requirements • Some proposed value is eventually chosen. • A process can eventually learn the value which has been chosen.

  12. Implementing a State Machine • Collection of servers, each implementing a state machine. • The i-th state machine command in the sequence is the value chosen by the i-th instance of the Paxos consensus algorithm. • A pre-decided set of commands is necessary.
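A sketch of that mapping (hypothetical interface, assuming each Paxos instance exposes a blocking decide() method):

    # Multi-decree sketch: the command in log slot i is whatever value
    # Paxos instance i decides, so every replica executes the same sequence.
    def run_replica(state_machine, paxos_instances):
        for i, instance in enumerate(paxos_instances):
            command = instance.decide()   # value chosen by instance i
            state_machine.apply(command)  # applied in slot order on every replica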

  13. Planet-Lab • Planet-Lab is a global research network that supports the development of new network services. • Understanding the system is required • Monitoring is necessary • In this project, access and testing generally go through the NSSL lab.

  14. Project Design • Chosen language for implementation: Python • Network framework: Twisted Matrix • Implementation stages: • Single Decision on NSSL • Multiple Decisions on NSSL • Single Decision on Planet-Lab • Multiple Decisions on Planet-Lab
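For flavor, a minimal Twisted server (illustrative only; the port number and class names are placeholders, and the project's actual protocol handlers are more involved):

    # Minimal Twisted example: the framework delivers network events
    # through callbacks on a Protocol object.
    from twisted.internet import protocol, reactor

    class PaxosDemo(protocol.Protocol):
        def connectionMade(self):
            # Called once the TCP connection is established.
            self.transport.write(b"hello\n")

        def dataReceived(self, data):
            # Called whenever bytes arrive. TCP does not preserve message
            # boundaries, so real code must frame its own messages.
            print("received:", data)

    class DemoFactory(protocol.Factory):
        def buildProtocol(self, addr):
            return PaxosDemo()

    reactor.listenTCP(8000, DemoFactory())  # port 8000 is a placeholder
    reactor.run()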

  15. Implementation • Use Cases • Acceptor disconnects? • Leader disconnects? • At which stage? • Acceptor message fails to deliver?

  16. Implementation • Leader Election • In fact, an inherent part of the algorithm • Output and monitoring • The actual output is generally not directly visible • It is observed only via monitoring
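One common shortcut, shown only as an assumption (the project's actual rule may differ): treat the live node with the lowest id as Leader. Safety does not depend on this choice, since Paxos tolerates two nodes briefly both believing they lead; a stable Leader only helps liveness.

    # Trivial leader-selection sketch (an illustrative assumption,
    # not necessarily the project's rule).
    def elect_leader(live_node_ids):
        return min(live_node_ids)

    assert elect_leader({3, 7, 1}) == 1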

  17. Flow • Register Nodes • Verify and install necessary files • Upload • Initiate Monitor • Run and wait for activity • Review results

  18. Implementation – File Structure

  19. Results • Everything works at the NSSL • In real-life conditions, not necessarily • Communication phenomena – messages arriving unordered, in large chunks, etc. • Works well for up to 20–30 nodes • Use cases tested in the lab

  20. Conclusions • Preliminary work was needed to understand Twisted Matrix and Planet-Lab • Dealing with network problems • An SSH tunnel was used instead of “real” monitoring • Requirements fulfilled

  21. Further work • Optimize the networking protocol • Improve the client-server interface • Inefficient startup – N(N-1) connections for N machines (see the example below) • Partition the Decision processes • Only a few nodes decide each resolution
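For example, with N = 30 nodes, full-mesh startup requires 30 × 29 = 870 connection setups, which is the cost that partitioning the Decision processes among a few nodes would avoid.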

  22. Thank you
