
The Datacenter Needs an Operating System






Presentation Transcript


  1. The Datacenter Needs an Operating System Matei Zaharia, Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Anthony Joseph, Randy Katz, Scott Shenker, Ion Stoica

  2. Background • Clusters of commodity servers have become a major computing platform in industry and academia • Driven by data volumes outpacing the processing capabilities of single machines • Democratized by cloud computing

  3. Background • Some have declared that “the datacenter is the new computer” • Claim: this new computer increasingly needs an operating system • Not necessarily a new host OS, but a common software layer that manages resources and provides shared services for the whole datacenter, like an OS does for one host

  4. Why Datacenters Need an OS • Growing number of applications • Parallel processing systems: MapReduce, Dryad, Pregel, Percolator, Dremel, MapReduce Online • Storage systems: GFS, BigTable, Dynamo, SCADS • Web apps and supporting services • Growing number of users • 200+ users of Facebook’s Hadoop data warehouse, running near-interactive ad hoc queries

  5. What Operating Systems Provide • Resource sharing across applications & users • Data sharing between programs • Programming abstractions (e.g. threads, IPC) • Debugging facilities (e.g. ptrace, gdb) Result: OSes enable a highly interoperable software ecosystem that we now take for granted

  6. An Analogy • Today, a scientist analyzing data on a single machine can pipe it through a variety of tools, write new tools that interface with these through standard APIs, and trace across the stack • In the future, the scientist should be able to fire up a cloud on EC2 and do the same thing: • Intermix a variety of apps & programming models • Write new parallel programs that talk to these • Get a unified interface for managing the cluster • Debug and trace across all these components
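The single-machine workflow in this analogy rests on tools composing through one standard interface, as Unix pipes do. A minimal sketch of that idea using Python generators (the tool names `grep_lines`, `to_upper`, and `count` are made up for illustration):

```python
# Toy "pipeline" of composable tools, analogous to Unix pipes.
# Each stage consumes an iterator of lines and yields lines, so
# independently written tools interoperate through one interface.

def grep_lines(pattern, lines):
    """Keep only lines containing the pattern (like `grep`)."""
    return (l for l in lines if pattern in l)

def to_upper(lines):
    """Uppercase every line (a stand-in for any transformation tool)."""
    return (l.upper() for l in lines)

def count(lines):
    """Count lines (like `wc -l`)."""
    return sum(1 for _ in lines)

data = ["error: disk full", "ok", "error: timeout", "ok"]
# Equivalent of: cat data | grep error | tr a-z A-Z | wc -l
n = count(to_upper(grep_lines("error", data)))
```

The slide's claim is that a datacenter OS should make cluster apps composable in the same way.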

  7. Today’s Datacenter OS • Hadoop MapReduce as common execution and resource sharing platform • Hadoop InputFormat API for data sharing • Abstractions for productivity programmers, but not for system builders • Very challenging to debug across all the layers
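Hadoop's actual InputFormat is a Java interface; the sketch below is a hypothetical Python analogue of the same contract, showing why it enables data sharing: storage formats expose splits and records behind one interface, so any compute engine can read any format.

```python
# Hypothetical Python analogue of an InputFormat-style contract:
# a format implements splits() and read(split); any compute engine
# can then iterate records without knowing the storage layout.

class LineFileFormat:
    """Toy input format over an in-memory 'file' of text lines."""
    def __init__(self, text, split_size=2):
        self.lines = text.splitlines()
        self.split_size = split_size

    def splits(self):
        """Divide the input into independent chunks for parallel tasks."""
        return [self.lines[i:i + self.split_size]
                for i in range(0, len(self.lines), self.split_size)]

    def read(self, split):
        """Yield (offset, line) records from one split."""
        yield from enumerate(split)

fmt = LineFileFormat("a\nb\nc")
records = [rec for s in fmt.splits() for rec in fmt.read(s)]
```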

  8. Tomorrow’s Datacenter OS • Resource sharing: • Lower-level interfaces for fine-grained sharing (Mesos is a first step in this direction) • Optimization for a variety of metrics (e.g. energy) • Integration with network scheduling mechanisms (e.g. Seawall [NSDI ‘11], NOX, Orchestra)
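Mesos's fine-grained sharing works through resource offers: the master offers free resources to each framework, which accepts what it needs and declines the rest. A toy single-process simulation of that offer loop (the names here are illustrative, not the real Mesos API):

```python
# Toy two-level scheduling via resource offers, in the spirit of Mesos.
# The master offers all currently free CPUs to each framework in turn;
# the framework accepts up to its demand and implicitly declines the rest.

def run_offers(frameworks, total_cpus):
    free = total_cpus
    placements = {}
    for name, demand in frameworks:
        offered = free                    # master offers all free CPUs
        accepted = min(demand, offered)   # framework takes what it needs
        free -= accepted
        placements[name] = accepted
    return placements, free

placements, free = run_offers([("hadoop", 6), ("mpi", 3)], total_cpus=8)
```

The key design point is that placement decisions stay with the frameworks, while the master only controls how much each one is offered.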

  9. Tomorrow’s Datacenter OS • Data sharing: • Standard interfaces for cluster file systems, key-value stores, etc. • In-memory data sharing (e.g. Spark, DFS cache), and a unified system to manage this memory • Streaming data abstractions (analogous to pipes) • Lineage instead of replication for reliability (RDDs)
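The lineage idea behind RDDs: instead of replicating an in-memory dataset, record how it was derived, so a lost partition can be rebuilt by recomputation. A minimal sketch of that recovery scheme (not Spark's API; all class and method names are illustrative):

```python
# Minimal lineage-based recovery: a dataset records its parent and the
# transformation used to derive it, so a lost partition is rebuilt by
# recomputation rather than restored from a replica.

class Dataset:
    def __init__(self, partitions=None, parent=None, fn=None):
        self.partitions = partitions   # list of lists, or None entries if lost
        self.parent = parent           # lineage: where this data came from
        self.fn = fn                   # lineage: how it was derived

    def map(self, fn):
        child = Dataset(parent=self, fn=fn)
        child.partitions = [[fn(x) for x in p] for p in self.partitions]
        return child

    def recompute(self, i):
        """Rebuild partition i from the parent via the recorded fn."""
        return [self.fn(x) for x in self.parent.partitions[i]]

base = Dataset(partitions=[[1, 2], [3, 4]])
doubled = base.map(lambda x: 2 * x)
doubled.partitions[1] = None                    # simulate losing a partition
doubled.partitions[1] = doubled.recompute(1)    # recover it from lineage
```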

  10. Tomorrow’s Datacenter OS • Programming abstractions: • Tools that can be used to build the next MapReduce / BigTable in a week (e.g. BOOM) • Efficient implementations of communication primitives (e.g. shuffle, broadcast) • New distributed programming models
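Of the communication primitives named above, shuffle repartitions records by key so that all values for a key reach the same reducer. A single-process sketch of the idea:

```python
# Toy shuffle: hash-partition (key, value) records from several map
# outputs so that all values for a key land in the same reduce bucket.

def shuffle(map_outputs, num_reducers):
    buckets = [dict() for _ in range(num_reducers)]
    for records in map_outputs:
        for key, value in records:
            b = buckets[hash(key) % num_reducers]
            b.setdefault(key, []).append(value)
    return buckets

maps = [[("a", 1), ("b", 2)], [("a", 3)]]
buckets = shuffle(maps, num_reducers=2)
merged = {k: v for b in buckets for k, v in b.items()}
```

A real implementation must also move the buckets across the network and spill to disk; the slide's point is that such primitives should be provided once, efficiently, rather than rebuilt inside every framework.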

  11. Tomorrow’s Datacenter OS • Debugging facilities: • Tracing and debugging tools that work across the cluster software stack (e.g. X-Trace, Dapper) • Replay debugging that takes advantage of limited languages / computational models • Unified monitoring infrastructure and APIs
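Tracing in the X-Trace / Dapper style attaches a trace ID to each request and records a span at every layer it passes through. A toy sketch of that propagation (the collector and helper names are made up for illustration):

```python
# Toy cross-layer request tracing: each layer records a span tagged
# with the request's trace ID, so one request can be followed across
# independently built components.

import uuid

SPANS = []   # stand-in for a central trace collector

def traced(layer, trace_id, fn, *args):
    """Record a span for this layer, then run the wrapped work."""
    SPANS.append((trace_id, layer))
    return fn(*args)

def storage_read(key):
    return {"key": key}

def handle_request(key):
    trace_id = str(uuid.uuid4())   # one ID follows the whole request
    return traced("frontend", trace_id,
                  lambda: traced("storage", trace_id, storage_read, key))

result = handle_request("users/42")
layers = [layer for _, layer in SPANS]
```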

  12. Putting it All Together • A successful datacenter OS might let users: • Build a Hadoop-like software stack in a week using the OS’s abstractions, while gaining other benefits (e.g. cross-stack replay debugging) • Share data efficiently between independently developed programming models and applications • Understand cluster behavior without having to log into individual nodes • Dynamically share the cluster with other users

  13. Conclusion • Datacenters need an OS-like software stack for the same reasons single computers did: manageability, efficiency & programmability • An OS is already emerging in an ad hoc way • Researchers can help by taking a long-term approach towards these problems

  14. How Researchers Can Help • Focus on paradigms, not performance • Industry is tackling performance but lacks the luxury to take a long-term view towards abstractions • Explore clean-slate approaches • Likelier to have impact here than in a “real” OS because datacenter software changes quickly! • Bring cluster computing to non-experts • Much harder, and more rewarding, than serving today’s big expert users
