Amoeba Group Communication
CS294-4 P2P Systems 2003
David Ratajczak
Group Communication
• A group is a collection of processes that can communicate 1-to-n
• Members join and leave the group
• Ordering
  • All members see the same (total) order of:
    • Broadcast messages
    • Join/leave changes
    • Crash/recovery
  • Weaker ordering possible
    • Unordered
    • Causal order
Group Communication (cont’d)
• Reliability
  • Handles message loss/duplication
  • Handles fail-stop failures
    • Unreliable failure detection based on heartbeats
    • Might excise non-failed nodes
    • Multi-phase protocol for calculating new view
• In Amoeba, optional support for recovery
  • But no built-in support for state transfer
  • No automatic rejoining after network partitions
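The point about heartbeat-based detection excising non-failed nodes can be illustrated with a minimal sketch. The class name, the timeout value, and the member names are all hypothetical, chosen for illustration; this is not Amoeba's detector, only the general timeout technique.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Suspects a member once no heartbeat has been seen for `timeout` seconds."""
    timeout: float
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, member: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from this member.
        self.last_seen[member] = now

    def suspects(self, now: float) -> set:
        # A silent member is only *suspected*: it may be slow, not crashed.
        return {m for m, t in self.last_seen.items() if now - t > self.timeout}


detector = HeartbeatDetector(timeout=3.0)  # assumed timeout, in seconds
detector.heartbeat("a", now=0.0)
detector.heartbeat("b", now=0.0)
detector.heartbeat("a", now=2.5)

# "b" is alive, but its next heartbeat is delayed in the network.
suspected = detector.suspects(now=4.0)

# The delayed heartbeat arrives after "b" was already suspected.
detector.heartbeat("b", now=4.5)
suspected_later = detector.suspects(now=5.0)
```

Here `suspected` contains "b" even though it never crashed. A system that starts a view change as soon as a member is suspected would excise it, which is the risk the slide notes.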
Amoeba GCS API
• All primitives are blocking
  • Use threads for parallelism
  • Claim: this simplifies programming
• Question: how does an application create a group?
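A rough sketch of the "blocking primitives plus threads" style follows. `LocalGroup` and its `join`, `send`, and `receive` methods are hypothetical in-process stand-ins, not Amoeba's actual primitives. The only point is that a dedicated thread can sit in a blocking receive while the rest of the program keeps running.

```python
import queue
import threading


class LocalGroup:
    """Hypothetical in-process stand-in for a group; not Amoeba's real API."""

    def __init__(self):
        self._inboxes = {}

    def join(self, member):
        self._inboxes[member] = queue.Queue()

    def send(self, msg):
        # Deliver a copy to every member's inbox.
        for inbox in self._inboxes.values():
            inbox.put(msg)

    def receive(self, member):
        # Blocks the calling thread until a message is available.
        return self._inboxes[member].get()


group = LocalGroup()
group.join("worker")
received = []


def worker_loop():
    # This thread blocks in receive() without stalling the main thread.
    while True:
        msg = group.receive("worker")
        if msg is None:  # sentinel used by this sketch to stop the loop
            break
        received.append(msg)


t = threading.Thread(target=worker_loop)
t.start()
group.send("hello")
group.send("world")
group.send(None)
t.join()
```

The worker's control flow reads as straight-line code with no callbacks or event loop, which is presumably what the "simplifies programming" claim refers to.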
Questions to think about
• What about the end-to-end argument?
• How does this apply to p2p systems?
• How does this compare to the Castro/Liskov BFT algorithm?
Amoeba Total Ordering
• Every group view includes a designated sequencer
  • If sequencer fails, must produce new view
• PB method (point-to-point, then broadcast)
  • Sender sends msg to sequencer
  • Sequencer affixes sequence number
  • Sequencer broadcasts to group
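The PB steps above can be sketched as a small simulation. The class names and the artificially shuffled arrival orders are assumptions for illustration. The real protocol runs over a network and also handles loss and view changes, which this sketch ignores.

```python
class Member:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self, name):
        self.name = name
        self.next_seq = 0
        self.pending = {}    # seq -> msg, held back until its turn
        self.delivered = []

    def on_broadcast(self, seq, msg):
        self.pending[seq] = msg
        # Deliver every message that is now next in sequence.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


class Sequencer:
    """Affixes consecutive sequence numbers to incoming requests."""

    def __init__(self):
        self.next_seq = 0

    def on_request(self, msg):
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg


members = [Member("a"), Member("b"), Member("c")]
sequencer = Sequencer()

# Step 1 and 2: senders hand messages to the sequencer, which stamps them.
stamped = [sequencer.on_request(m) for m in ["x from a", "y from b", "z from c"]]

# Step 3: the stamped messages reach each member, here in a different
# (artificially shuffled) arrival order per member.
arrival_orders = ([0, 1, 2], [2, 0, 1], [1, 2, 0])
for member, order in zip(members, arrival_orders):
    for i in order:
        member.on_broadcast(*stamped[i])

delivery_orders = [m.delivered for m in members]
```

All three entries of `delivery_orders` are identical, because each member holds back any message whose predecessors have not yet arrived.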
Amoeba Total Ordering (cont’d)
• BB method (broadcast, then broadcast)
  • Sender broadcasts to group
  • Sequencer receives msg and broadcasts “accept” with sequence number to group
• If messages are large, use BB
  • Less bandwidth and less stress on sequencer
• Otherwise, use PB
  • Receivers only interrupted once
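The PB/BB trade-off can be made concrete with back-of-the-envelope arithmetic. This is a sketch only: it ignores headers and retransmissions, and `ACCEPT_BYTES` is an assumed size for the short accept message, not a figure from the slides.

```python
ACCEPT_BYTES = 64  # assumed size of a short "accept" message (hypothetical)


def pb_cost(msg_bytes):
    """PB: the full message crosses the wire twice (to sequencer, then broadcast)."""
    return {
        "wire_bytes": 2 * msg_bytes,
        "sequencer_sends": msg_bytes,      # sequencer rebroadcasts the whole message
        "interrupts_per_receiver": 1,      # receivers see only the stamped broadcast
    }


def bb_cost(msg_bytes, accept_bytes=ACCEPT_BYTES):
    """BB: the full message is broadcast once, followed by a short accept."""
    return {
        "wire_bytes": msg_bytes + accept_bytes,
        "sequencer_sends": accept_bytes,   # sequencer sends only the accept
        "interrupts_per_receiver": 2,      # the message, then the accept
    }


large_pb, large_bb = pb_cost(8000), bb_cost(8000)
small_pb, small_bb = pb_cost(32), bb_cost(32)
```

Under these assumptions an 8000-byte message costs 16000 wire bytes with PB against 8064 with BB, while a 32-byte message costs 64 with PB against 96 with BB, and PB interrupts each receiver once instead of twice.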
Amoeba Reliability
• Use NACKs instead of ACKs
  • Fewer messages at sequencer
  • Assumes steady stream of traffic
• Sequencer stores msgs until all members have received them (or view changes)
  • Members piggyback latest seq num recv’d
• Settable resilience degree
  • Msg is tentative until sequencer hears r ACKs
  • Sequencer broadcasts accept msg
• No flow control
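A minimal sketch of the NACK and history-buffer ideas above. `HistoryBuffer` and `missing_seqs` are hypothetical names, and the sketch leaves out view changes and the resilience-degree handshake.

```python
class HistoryBuffer:
    """Sequencer-side store of messages not yet known to be received by all."""

    def __init__(self, members):
        self.msgs = {}                              # seq -> msg
        self.last_recv = {m: -1 for m in members}   # highest seq each member reported

    def store(self, seq, msg):
        self.msgs[seq] = msg

    def on_piggyback(self, member, last_seq):
        # Members report the latest sequence number received on their own traffic.
        self.last_recv[member] = max(self.last_recv[member], last_seq)
        stable = min(self.last_recv.values())
        # Anything every member has received can be discarded.
        for seq in [s for s in self.msgs if s <= stable]:
            del self.msgs[seq]

    def retransmit(self, seq):
        return self.msgs[seq]


def missing_seqs(next_expected, arrived_seq):
    """Sequence numbers a receiver would NACK on seeing `arrived_seq`."""
    return list(range(next_expected, arrived_seq))


history = HistoryBuffer(members=["a", "b"])
for seq, msg in enumerate(["m0", "m1", "m2"]):
    history.store(seq, msg)

# Member "b" received seq 0, lost seq 1, and now sees seq 2 arrive.
nacked = missing_seqs(next_expected=1, arrived_seq=2)
resent = [history.retransmit(s) for s in nacked]

# "a" has everything, "b" has only reported seq 0 so far.
history.on_piggyback("a", 2)
history.on_piggyback("b", 0)
kept_after_partial = sorted(history.msgs)

# Once "b" reports seq 2 as well, the buffer can be emptied.
history.on_piggyback("b", 2)
kept_after_full = sorted(history.msgs)
```

The receiver notices the gap only when seq 2 arrives, which is why the scheme assumes a steady stream of traffic: with no later message, a loss would go unnoticed.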
Latency breakdown
• 20 MHz MC68030
• 10 Mbit/s Ethernet
• What is the bottleneck?
• What if no multicast?
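As a starting point for these two questions, the wire-time component alone can be estimated from the 10 Mbit/s link speed. This sketch ignores frame headers, inter-frame gaps, and all CPU and protocol processing, so it says nothing by itself about where the bottleneck lies. The payload size and group size are assumed values.

```python
LINK_BITS_PER_SEC = 10_000_000  # 10 Mbit/s Ethernet, from the slide


def wire_time_us(payload_bytes, copies=1):
    """Microseconds needed to put `copies` transmissions of the payload on the wire."""
    return copies * payload_bytes * 8 * 1_000_000 / LINK_BITS_PER_SEC


payload = 1000       # assumed payload size in bytes
group_size = 10      # assumed number of members

with_multicast = wire_time_us(payload)                            # one broadcast frame
without_multicast = wire_time_us(payload, copies=group_size - 1)  # one unicast per other member
```

Under these assumptions a single multicast takes 800 µs of wire time, while nine unicasts take 7200 µs, so the cost without multicast grows linearly with group size.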
Questions to think about
• Can this scale to hundreds of nodes or more?
• Will this work in a WAN?
• What about with resilience degree >0?