
Fault-Tolerance Enablers in Legion

Anh Nguyen-Tuong, February 18, 1997. Fault-tolerance is one of the Ten Challenges (“SHEEMSPRAF”). With “millions of hosts, billions of objects,” scale and networks make the probability of failure very high.


  1. Fault-Tolerance Enablers in Legion Anh Nguyen-Tuong February 18, 1997

  2. Fault-Tolerance • Fault-tolerance is one of the Ten Challenges (“SHEEMSPRAF”) • “Millions of hosts, billions of objects” • Very high probability of failure • scale • networks • Writing distributed/parallel fault-tolerant applications is hard

  3. The Vision

  4. Short-Term Reality • Literature is full of FT protocols • Most of these are never implemented • FT protocols are difficult to understand, write correctly, and reuse • FT protocols are thus the domain of experts

  5. Key Enabling Technologies “Use the FORCE” • Flexible and extensible protocol stack • Objects • Reflective architecture • Computation graphs • Exception propagation model

  6. Architecture

  7. Flexible & Extensible Protocol Stack • Event-based abstraction for building & extending the protocol stack • n.b.: actually more like a protocol graph • Fault-tolerance “wrapping” technology • Fundamental events for FT • messageIn, messageOut, messageError • methodIn, methodOut, methodError
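
The event-based "wrapping" idea on this slide can be sketched as a small dispatcher. This is a minimal illustration, not the Legion API: the `ProtocolStack` class, its `register` and `announce` methods, and the two handlers are hypothetical names; only the event name `messageOut` comes from the slide.

```python
from collections import defaultdict


class ProtocolStack:
    """Hypothetical event dispatcher (an assumption, not Legion's interface)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event, handler, front=False):
        # front=True lets a fault-tolerance layer wrap the existing handlers.
        handlers = self._handlers[event]
        if front:
            handlers.insert(0, handler)
        else:
            handlers.append(handler)

    def announce(self, event, payload):
        # Each handler may transform the payload; returning None consumes it.
        for handler in list(self._handlers[event]):
            payload = handler(payload)
            if payload is None:
                return None
        return payload


message_log = []


def send(msg):
    # Stand-in for the base communication layer.
    return {**msg, "sent": True}


def log_before_send(msg):
    # FT wrapper: record a copy of every outgoing message before it is sent.
    message_log.append(dict(msg))
    return msg


stack = ProtocolStack()
stack.register("messageOut", send)
stack.register("messageOut", log_before_send, front=True)
delivered = stack.announce("messageOut", {"to": "B", "body": "yo(3)"})
```

The logging layer is added without touching `send`, which is the point of wrapping: a protocol designer registers handlers on the fundamental events instead of editing the stack itself.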

  8. Objects • Legion architecture is object-oriented • n.b.: does not imply OO language! • Advantages of objects for fault-tolerance • unit of failure • communication via method invocation • semantic information available to FT designers • generic framework & services

  9. Generic Framework & Services • Framework for rollback-recovery FT protocols • Exploit semantics • Replication service • stateless & worm objects • transparent replication of objects

  10. Reflective Architecture • Introspective system: dynamic access to reflective information • access to their own implementations • protocol stacks & method invocations • access to their calling environments • similar to Unix shell variables • access to the future of the computation • access to semantic information • access to generic attributes via a Prolog-like interface

  11. Computation Graphs • Graphs have first class status • enables generic FT components that manipulate graphs • e.g. replicators & voters • enables development of new FT protocols by encapsulating information about the future of a computation • enables flexible exception propagation model

  12. Graphs • D.yo() { x = C.foo(A.bar(2), B.yo(3)); print x; } • [Figure: computation graph for D.yo() with nodes 2, 3, A.bar(), B.yo(), C.foo(), D.retVal()]
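
To make "graphs have first class status" concrete, the body of `D.yo()` can be written as plain data and then evaluated in dependency order. This is a sketch under assumptions: the `Ref` and `Invocation` types, the node names `a`, `b`, `x`, and the arithmetic bodies given to `A.bar`, `B.yo` and `C.foo` are all made up for illustration.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Ref:
    """Placeholder for the future result of another node."""
    node: str


@dataclass
class Invocation:
    method: str
    args: tuple


# x = C.foo(A.bar(2), B.yo(3)) expressed as data rather than as control flow.
graph = {
    "a": Invocation("A.bar", (2,)),
    "b": Invocation("B.yo", (3,)),
    "x": Invocation("C.foo", (Ref("a"), Ref("b"))),
}

# Hypothetical method bodies, chosen only so the example produces a number.
methods = {
    "A.bar": lambda n: n + 1,
    "B.yo": lambda n: n * 2,
    "C.foo": lambda p, q: p + q,
}


def dependencies(graph):
    # Map each node to the set of nodes whose results it consumes.
    return {
        name: {arg.node for arg in inv.args if isinstance(arg, Ref)}
        for name, inv in graph.items()
    }


def evaluate(graph, methods):
    results = {}
    for name in TopologicalSorter(dependencies(graph)).static_order():
        inv = graph[name]
        args = [results[a.node] if isinstance(a, Ref) else a for a in inv.args]
        results[name] = methods[inv.method](*args)
    return results


results = evaluate(graph, methods)
```

Because the graph is an ordinary value, a generic component can inspect or rewrite it before anything runs.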

  13. Generic Voting • Replication(in: Graph, out: Graphs) • [Figure: a 1st class graph and a voter node V]
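
A generic replicator and voter can be sketched as a pure graph-to-graph transformation. The representation below (a dict mapping node names to `(method, inputs)` tuples), the `#i` replica naming, and the majority rule are assumptions for illustration; the slide gives only the signature `Replication(in: Graph, out: Graphs)`.

```python
from collections import Counter


def replicate(graph, node, copies=3):
    """Return a new graph where `node` is computed by `copies` replicas
    whose results feed a voter that keeps the original node name."""
    method, inputs = graph[node]
    new_graph = {name: spec for name, spec in graph.items() if name != node}
    replicas = tuple(f"{node}#{i}" for i in range(copies))
    for replica in replicas:
        new_graph[replica] = (method, inputs)
    # Downstream nodes still refer to `node`, so they need no changes.
    new_graph[node] = ("vote", replicas)
    return new_graph


def vote(values):
    """Return the strict-majority value, or raise if there is none."""
    winner, count = Counter(values).most_common(1)[0]
    if count <= len(values) // 2:
        raise ValueError("no majority among replica results")
    return winner


original = {"x": ("C.foo", ("a", "b"))}
replicated = replicate(original, "x")
```

The transformation never looks at what `C.foo` does, which is what makes it generic: it needs the graph, not the application.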

  14. Legion Environments • Graph annotation • List of generic items <String : Data> • Method invocations carry the calling environment • Automatic propagation • Hidden dynamic parameters • Useful for library writers: fault-tolerance, debugger, security, exceptions... • [Figure: the environment “debugger” : dLOID, “console” : cLOID shown at each invocation]
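
The "hidden dynamic parameters" idea has a rough in-process analogue in Python's `contextvars`. The sketch below is an analogy only: Legion carries the environment along with remote method invocations, whereas this code runs in one process, and the helper names `environment` and `lookup` are hypothetical. The keys and values (`debugger`, `console`, `dLOID`, `cLOID`) come from the slide.

```python
import contextvars
from contextlib import contextmanager

# Holds the current calling environment as a read-only <String : Data> mapping.
_env = contextvars.ContextVar("calling_environment", default={})


@contextmanager
def environment(**items):
    # Extend the inherited environment for the duration of the block.
    token = _env.set({**_env.get(), **items})
    try:
        yield
    finally:
        _env.reset(token)


def lookup(key):
    return _env.get().get(key)


def c_foo():
    # The callee never receives "debugger" as an explicit parameter.
    return lookup("debugger")


def d_yo():
    return c_foo()


with environment(debugger="dLOID", console="cLOID"):
    inherited = d_yo()

outside = lookup("debugger")
```

A library writer can read `debugger` or `console` deep in a call chain without every intermediate signature having to pass them along.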

  15. Legion Exception Propagation Model • “Exception” is a misnomer • security violations, communication errors, IDL errors, resource acquisition errors... • Basic failure detector • communication error detection & notification is oftentimes sufficient! • Exception propagation (not handling) • enables programming language specific exception handling models

  16. Exception Propagation • Key feature of model: • Associate Legion exceptions with computation graphs! • Flexible enough to handle • Backward error propagation • masking • Forward error propagation • Generic • does not have to use “call chain”

  17. Exception Propagation Graphs • D.yo() { PL_Exception watch(&x); x = C.foo(A.bar(2),B.yo(3)); print x; if (watch.exceptionRaised()) /* x is not valid */ … } • Propagate to caller • Forward propagate: error token propagates forward through graph • Environment annotations: “excTracker” : computation graph, “excTracker” : D • [Figure: graph with nodes D.yo, A.bar, B.yo, C.foo, D.ret and two “Exception!” markers]
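
The two propagation directions can be sketched on a small graph evaluator. Everything here is an assumption made for illustration: the `ErrorToken` type, the `run` function, the `notify` callback standing in for whatever object `excTracker` names, and the method bodies. The failure is simulated in `B.yo`.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class ErrorToken:
    origin: str
    reason: str


def run(graph, methods, notify):
    """graph maps node name -> (method name, list of input node names)."""
    deps = {name: set(inputs) for name, (_, inputs) in graph.items()}
    results = {}
    for name in TopologicalSorter(deps).static_order():
        method, inputs = graph[name]
        values = [results[i] for i in inputs]
        failed = next((v for v in values if isinstance(v, ErrorToken)), None)
        if failed is not None:
            # Forward propagation: the token flows on instead of a value.
            results[name] = failed
            continue
        try:
            results[name] = methods[method](*values)
        except Exception as exc:
            token = ErrorToken(origin=name, reason=str(exc))
            results[name] = token
            notify(token)  # tell the tracker (e.g. the caller) about the failure
    return results


def unreachable():
    raise ConnectionError("B unreachable")


graph = {"a": ("A.bar", []), "b": ("B.yo", []), "x": ("C.foo", ["a", "b"])}
methods = {"A.bar": lambda: 3, "B.yo": unreachable, "C.foo": lambda p, q: p + q}

notified = []
results = run(graph, methods, notified.append)
```

`C.foo` is never invoked with bad input: it receives the token from `b` and passes it on, so the consumer of `x` can tell that `x` is not valid.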

  18. Exception Graph • Generic graphs possible! • “excTracker” : FD • Exception! B cannot communicate with C • [Figure: graph with nodes D.yo, A.bar, B.yo, C.foo, D.ret and a Failure Detector FD]

  19. Wrap-Up • Target audience for FT enablers are FT protocol designers • Encapsulate expert knowledge in reusable form • generic framework & services • generic components • Encourage reuse of FT protocols

  20. Status • Key building blocks in place: flexible & extensible protocol stack, objects, reflection, computation graphs, exception model • Need to populate with concrete implementations: Mentat exception handling (Legion 0.5), FT protocols, Method Based Logging (UCSD), 2 Phase Commit, Coordinated Checkpointing, replication for stateless objects
