
MRNet: From Scalable Performance to Scalable Reliability


Presentation Transcript


  1. MRNet: From Scalable Performance to Scalable Reliability Dorian C. Arnold University of Wisconsin-Madison Paradyn/Condor Week April 14-16, 2004 Madison, WI

  2. More HPC Facts • Statistics from the Top500 List: • 24% of systems have ≥ 512 processors • 10% have ≥ 1024 processors • 9 systems have ≥ 4096 processors • The largest system has 8192 processors • By 2009, the 500th entry will be faster than today's #1 Bottom Line: HPC systems with many thousands of nodes will soon be the standard.

  3. Applications Must Address Scalability! • Challenge 1: Scalable Performance • Provide distributed tools with a mechanism for scalable, efficient group communications and data analyses. • Scalable multicast • Scalable reductions • In-network data aggregations

  4. Applications Must Address Scalability! • Scalability necessitates reliability! • Challenge 2: Scalable Reliability • Provide mechanisms for reliability in our large-scale environment that do not degrade scalability. • Scalable multicast • Scalable reductions • In-network data aggregations

  5. Target Applications • Distributed tools and debuggers • Paradyn, Tau, PAPI's perfometer, … • Grid and distributed middleware • Condor, Globus • Cluster and system monitoring applications • Distributed shell for command-line tools Goal: Provide a generic scaling mechanism for monitoring, control, troubleshooting and general middleware components for Grid infrastructures.

  6. Challenge 1: Scalable Performance • Problem: Centralization leads to poor scalability • Communication overhead does not scale. • Data analyses are restricted to the front-end. [Diagram: the tool front end connects directly to every back-end BE0 … BEn-1, each reporting its data a0 … an-1 straight to the front end.]

  7. MRNet: Solution to Scalable Tool Performance • Multicast/Reduction Network • Scalable data multicast and reduction operations. • In-network data aggregations. [Diagram: the tool front end reaches back-ends BE0 … BEn-1 through a tree of internal MRNet processes that multicast requests downstream and aggregate the data a0 … an-1 upstream.]
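To make the in-network aggregation idea concrete, the sketch below shows what an internal tree process conceptually does for a simple sum reduction: it folds its children's packets into one upstream packet, so the front-end sees one result per wave rather than one per back-end. The types and names are illustrative assumptions, not the actual MRNet filter API.

```cpp
#include <vector>

// Hypothetical packet type: one value reported by each child subtree.
struct Packet {
    double value;
};

// Sketch of a sum-reduction filter run at an internal tree process:
// the packets arriving from the children are combined into a single
// packet that is forwarded toward the tool front-end.
Packet sum_filter(const std::vector<Packet>& from_children) {
    Packet up{0.0};
    for (const Packet& p : from_children)
        up.value += p.value;
    return up;
}
```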

  8. Paradyn/MRNet Integration • Scalable start-up • Broadcast metric data to daemons • Gather daemon data at front-end • Front-end/daemon clock skew detection • Performance data aggregation • Time-based synchronization

  9. Paradyn Data Aggregation (32 metrics)

  10. MRNet References • Technical papers: • Roth, Arnold, and Miller, “MRNet: A Software-Based Multicast/Reduction Network for Scalable Tools”, in SC2003 (Phoenix, AZ, November 2003). • Roth, Arnold, and Miller, “Benchmarking the MRNet Distributed Tool Infrastructure: Lessons Learned”, in the 2004 High-Performance Grid Computing Workshop, held in conjunction with IPDPS 2004 (Santa Fe, New Mexico, April 2004).

  11. Scalable Performance Achieved: What Next? • More and increasingly complex components in large-scale systems. • A system with 10,000 nodes is roughly 100 times more likely to fail than one with 100 nodes.
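The scaling behind that claim can be made explicit; a short reasoning sketch, assuming each node fails independently with the same small probability p over a given interval:

```latex
P_{\text{fail}}(N) = 1 - (1 - p)^{N} \approx N p \qquad (Np \ll 1)
\qquad\Rightarrow\qquad
\frac{P_{\text{fail}}(10\,000)}{P_{\text{fail}}(100)} \approx \frac{10\,000\,p}{100\,p} = 100 .
```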

  12. Challenge 2: Scalable Reliability • Goals: • Design scalable reliability mechanisms for communication infrastructures with reduction operations and in-network data aggregations. • Develop a quantitative understanding of the scalability trade-offs between different levels of resiliency and reliability.

  13. Challenge 2: Scalable Reliability • Reliability vs. Resiliency: • A reliable system executes correctly in the presence of (tolerated) failures. • A resilient system recovers to a mode in which it can once again execute correctly; during the failure, errors may be visible at the system interface level.

  14. Challenge 2: Scalable Reliability • Problem: • Scalability → decentralization, low overhead • Scalability wants simple systems. • Reliability → consensus, convergence, high overhead • Reliability wants complex systems. • How can we leverage our tree-based topology to achieve scalable reliability?

  15. Recovery Models and Semantics • Fault model: crash-stop failures • TCP-like reliability for tree-based multicast and reduction operations • The system should tolerate any and all internal node failures • The system slowly degrades to a flat topology • Models based on operational complexity • e.g., are in-network filters stateful?

  16. Recovery Models and Semantics: Challenges • Detecting loss, duplication, and ordering (sketched below) • Quick recovery from message loss • Correct recovery from failure • Recovery of state information from aggregation operations • Simultaneous failures • Validation of our scalability methodology
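As one concrete way to attack the first challenge, a receiver can detect loss, duplication, and reordering with per-sender sequence numbers. This is only a sketch under assumed names, not MRNet's actual recovery code.

```cpp
#include <cstdint>
#include <unordered_map>

// Classify an incoming packet by comparing its sequence number against
// the next number expected from that sender.
enum class Arrival { InOrder, Duplicate, GapDetected };

class SequenceTracker {
public:
    Arrival classify(uint32_t sender_id, uint32_t seq) {
        uint32_t& expected = next_expected_[sender_id];   // starts at 0
        if (seq == expected) { ++expected; return Arrival::InOrder; }
        if (seq < expected)  return Arrival::Duplicate;
        // seq > expected: packets expected .. seq-1 are missing (lost or
        // reordered), so recovery, e.g. a retransmission request, can start.
        return Arrival::GapDetected;
    }
private:
    std::unordered_map<uint32_t, uint32_t> next_expected_;
};
```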

  17. Challenge 2: Scalable Reliability Hypothesis: Aggregating control messages can effectively achieve scalable, reliable systems.

  18. Example: Scalable Failure Detection • Goal: A scalable failure-detection service with high rates of convergence. • Previous work suffers from: • non-scalable overhead • poor convergence properties • non-deterministic guarantees • costly assumptions, e.g., fully connected meshes

  19. Failure Detection Approaches • Gossip-style failure detection and propagation • Gupta et al., van Renesse et al.

  20. Failure Detection Approaches • Hierarchical heartbeat detection and propagation • Felber et al., Overcast, Grid monitoring

  21. Scalable Failure Detection • Tracking senders in an aggregated message: • Naïve approaches: • Append a 32/64-bit source ID for each sender • Pathological case: many senders • Bit-array where each bit represents a potential source • Pathological case: many potential sources, few actual senders • Our approach: • Variable-size bit-array: • The number of bits varies with the number of descendants beneath the intermediate node (i.e., its depth in the topology), as sketched below
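The sketch below illustrates the variable-size bit-array idea: each child reports a bit-array with one bit per leaf beneath it, and an intermediate node concatenates its children's arrays (in a fixed child order known to its parent) into one array covering all of its own descendants. The names and message layout are illustrative assumptions, not the protocol's actual wire format.

```cpp
#include <vector>

// One bit per descendant leaf: bit i == true means leaf i sent a
// heartbeat this round.
using BitArray = std::vector<bool>;

// An intermediate node merges its children's heartbeat reports into a
// single bit-array sized by the number of leaves beneath it, so the
// array grows only as fast as the node's descendant count.
BitArray merge_heartbeats(const std::vector<BitArray>& from_children) {
    BitArray merged;
    for (const BitArray& child : from_children)
        merged.insert(merged.end(), child.begin(), child.end());
    return merged;  // forwarded to the parent as one control message
}
```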

  22. Scalable Failure Detection • Hierarchical heartbeats/propagation (with message aggregation): [Diagram: each child reports a heartbeat bit-array; an intermediate node merges its children's bit-arrays into one larger array that it forwards up the tree.]

  23. Scalable Failure Detection • Study the scalability and convergence implications of our scalable failure-detection protocol. • In theory: • Pure hierarchical: msgs = n^h × h • Hierarchical with aggregation: msgs = (n^(h+1) − 1)/(n − 1) − 1 • Example, n = 8, h = 4 (4096 leaves): • Pure hierarchical: 16,384 msgs • With aggregation: 4,680 msgs
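These counts follow directly from the tree structure: without aggregation every one of the n^h leaf heartbeats is forwarded h hops to the root, while with aggregation each non-root node emits exactly one combined message per round. A small sketch checking the slide's example (variable names are my own):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const long n = 8, h = 4;                          // fan-out and tree height
    const long leaves = std::lround(std::pow(n, h));  // n^h = 4096 leaves

    // Pure hierarchical: each leaf heartbeat is relayed h hops to the root.
    const long pure = leaves * h;                     // 16,384 msgs

    // With aggregation: one message per non-root node, i.e. (total nodes) - 1,
    // where total nodes = (n^(h+1) - 1) / (n - 1) for a complete n-ary tree.
    const long nodes = (std::lround(std::pow(n, h + 1)) - 1) / (n - 1);
    const long aggregated = nodes - 1;                // 4,680 msgs

    std::printf("pure = %ld, aggregated = %ld\n", pure, aggregated);
    return 0;
}
```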

  24. Scalable Event Propagation • Implement a generic event propagation service • Encode events into 1-byte codes • Combine with the aggregation protocol for low-overhead control messages • Piggyback control messages on data messages
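A minimal sketch of what 1-byte event codes piggybacked on data traffic could look like; the event names and packet layout are assumptions for illustration, not the actual MRNet message format.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical 1-byte codes for the event propagation service.
enum class EventCode : uint8_t {
    NodeFailed      = 0x01,
    NodeRecovered   = 0x02,
    TopologyChanged = 0x03,
};

// A data packet headed up the tree carries any pending control events,
// so event propagation adds no messages of its own.
struct DataPacket {
    std::vector<double>  payload;             // ordinary performance data
    std::vector<uint8_t> piggybacked_events;  // 1-byte event codes
};

// Called at a node before it forwards a data packet upstream.
void piggyback(DataPacket& pkt, const std::vector<EventCode>& pending) {
    for (EventCode e : pending)
        pkt.piggybacked_events.push_back(static_cast<uint8_t>(e));
}
```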

  25. Summary • MRNet provides tools and grid services with scalable communications and data analyses. • We are studying techniques to provide high degrees of reliability at large scales. • MRNet website: http://www.paradyn.org/mrnet • Contact: darnold@cs.wisc.edu
