The Datacenter Needs an Operating System

Matei Zaharia, Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Anthony Joseph, Randy Katz, Scott Shenker, Ion Stoica

Background • Clusters of commodity servers have become a major computing platform in industry and academia • Driven by data volumes outpacing the processing capabilities of single machines • Democratized by cloud computing

Background • Some have declared that “the datacenter is the new computer” • Claim: this new computer increasingly needs an operating system • Not necessarily a new host OS, but a common software layer that manages resources and provides shared services for the whole datacenter, like an OS does for one host

Why Datacenters Need an OS • Growing number of applications • Parallel processing systems: MapReduce, Dryad, Pregel, Percolator, Dremel, MR Online • Storage systems: GFS, BigTable, Dynamo, SCADS • Web apps and supporting services • Growing number of users • 200+ users of Facebook’s Hadoop data warehouse, running near-interactive ad hoc queries

What Operating Systems Provide • Resource sharing across applications & users • Data sharing between programs • Programming abstractions (e.g. threads, IPC) • Debugging facilities (e.g. ptrace, gdb) Result: OSes enable a highly interoperable software ecosystem that we now take for granted

An Analogy • Today, a scientist analyzing data on a single machine can pipe it through a variety of tools, write new tools that interface with these through standard APIs, and trace across the stack • In the future, the scientist should be able to fire up a cloud on EC2 and do the same thing: • Intermix a variety of apps & programming models • Write new parallel programs that talk to these • Get a unified interface for managing the cluster • Debug and trace across all these components

Today’s Datacenter OS • Hadoop MapReduce as common execution and resource sharing platform • Hadoop InputFormat API for data sharing • Abstractions for productivity programmers, but not for system builders • Very challenging to debug across all the layers
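The InputFormat point is essentially an interface contract: any storage system that can divide its data into splits and read records from a split can feed MapReduce jobs. The following is a minimal Python sketch of that contract, not Hadoop's code. Hadoop's real interface is the Java class InputFormat (getSplits and createRecordReader); the names Split, InputFormat, InMemoryLinesFormat and word_count below are hypothetical stand-ins.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(frozen=True)
class Split:
    """A contiguous range of records that one task will process (hypothetical)."""
    start: int
    end: int


class InputFormat(ABC):
    """Hypothetical Python analogue of Hadoop's Java InputFormat contract."""

    @abstractmethod
    def get_splits(self, num_tasks: int) -> List[Split]:
        """Divide the input into independent pieces, one per task."""

    @abstractmethod
    def read_records(self, split: Split) -> Iterator[Tuple[int, str]]:
        """Yield (key, value) records belonging to one split."""


class InMemoryLinesFormat(InputFormat):
    """Serves an in-memory list of text lines as if it were a stored file."""

    def __init__(self, lines: List[str]):
        self.lines = lines

    def get_splits(self, num_tasks: int) -> List[Split]:
        n = len(self.lines)
        size = max(1, -(-n // num_tasks))  # ceiling division, at least 1 line
        return [Split(i, min(i + size, n)) for i in range(0, n, size)]

    def read_records(self, split: Split) -> Iterator[Tuple[int, str]]:
        # Key is the line index; Hadoop's TextInputFormat uses the byte offset.
        for i in range(split.start, split.end):
            yield i, self.lines[i]


def word_count(fmt: InputFormat, num_tasks: int) -> Dict[str, int]:
    """A 'job' that depends only on the abstract contract, not on the storage."""
    counts: Dict[str, int] = {}
    for split in fmt.get_splits(num_tasks):  # each split would be a separate task
        for _, line in fmt.read_records(split):
            for word in line.split():
                counts[word] = counts.get(word, 0) + 1
    return counts
```

Because word_count relies only on get_splits and read_records, a key-value store or another file system could be plugged in by implementing those two methods, which is the kind of interoperability the slide attributes to InputFormat.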

Tomorrow’s Datacenter OS • Resource sharing: • Lower-level interfaces for fine-grained sharing (Mesos is a first step in this direction) • Optimization for a variety of metrics (e.g. energy) • Integration with network scheduling mechanisms (e.g. Seawall [NSDI ’11], NOX, Orchestra)
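Mesos shares a cluster through resource offers: the master offers each node's free resources to framework schedulers, and each framework decides which to accept and which of its own tasks to launch. The sketch below is a simplified, single-process model of that two-level scheme, assuming CPU and memory are the only resources. Offer, Framework, on_offer and allocate are hypothetical names, and the fixed framework order is only a placeholder for Mesos's pluggable allocation module.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Offer:
    """Free resources on one agent (node), offered to a framework."""
    agent: str
    cpus: float
    mem_gb: float


@dataclass
class Framework:
    """A framework scheduler that decides for itself which offers to use."""
    name: str
    task_cpus: float
    task_mem_gb: float
    pending_tasks: int
    launched: List[Tuple[str, str]] = field(default_factory=list)

    def on_offer(self, offer: Offer) -> int:
        """Launch as many pending tasks as fit (first-fit); return how many."""
        count = 0
        while (self.pending_tasks > 0
               and offer.cpus >= self.task_cpus
               and offer.mem_gb >= self.task_mem_gb):
            offer.cpus -= self.task_cpus
            offer.mem_gb -= self.task_mem_gb
            self.pending_tasks -= 1
            self.launched.append((self.name, offer.agent))
            count += 1
        return count


def allocate(agents: Dict[str, Tuple[float, float]],
             frameworks: List[Framework]) -> Dict[str, Tuple[float, float]]:
    """Offer each agent's free (cpus, mem_gb) to frameworks in a fixed order.

    Whatever one framework leaves unused is offered to the next. Returns the
    resources still free on each agent afterwards.
    """
    leftover: Dict[str, Tuple[float, float]] = {}
    for agent, (cpus, mem_gb) in agents.items():
        offer = Offer(agent, cpus, mem_gb)
        for fw in frameworks:
            fw.on_offer(offer)
        leftover[agent] = (offer.cpus, offer.mem_gb)
    return leftover
```

The design choice being illustrated is that the allocator never needs to understand a framework's placement logic: each framework applies its own policy (here, first-fit) inside on_offer, which is what makes the interface lower-level and more fine-grained than running everything as MapReduce jobs.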

Tomorrow’s Datacenter OS • Data sharing: • Standard interfaces for cluster file systems, key-value stores, etc. • In-memory data sharing (e.g. Spark, DFS cache), and a unified system to manage this memory • Streaming data abstractions (analogous to pipes) • Lineage instead of replication for reliability (RDDs)
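Lineage-based reliability can be illustrated with a toy dataset that remembers the transformation that produced it instead of replicating its contents. This is a minimal Python sketch in the spirit of RDDs, not Spark's implementation: it assumes only one-to-one (narrow) dependencies and simulates a node failure by dropping a cached partition. Dataset, from_list, map, partition and collect are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


class Dataset:
    """A partitioned dataset that records how to rebuild itself (its lineage)."""

    def __init__(self, compute: Callable[[int], List], num_partitions: int,
                 parent: Optional["Dataset"] = None):
        self._compute = compute           # partition index -> list of records
        self.num_partitions = num_partitions
        self.parent = parent              # lineage link, kept for inspection
        self.cache: Dict[int, List] = {}  # in-memory copies that may be lost
        self.computations = 0             # how many partitions were (re)built

    @staticmethod
    def from_list(data: List, num_partitions: int) -> "Dataset":
        # Assumed to come from stable storage, so it can always be re-read.
        return Dataset(lambda p: data[p::num_partitions], num_partitions)

    def map(self, f: Callable) -> "Dataset":
        # Record the transformation; nothing is computed or replicated yet.
        return Dataset(lambda p: [f(x) for x in self.partition(p)],
                       self.num_partitions, parent=self)

    def partition(self, p: int) -> List:
        if p not in self.cache:
            # Never built, or lost with a failed node: rebuild from lineage.
            self.cache[p] = self._compute(p)
            self.computations += 1
        return self.cache[p]

    def collect(self) -> List:
        return [x for p in range(self.num_partitions) for x in self.partition(p)]
```

If a partition of a mapped dataset is deleted from its cache, the next collect() recomputes only that partition, from the parent's cached copy when it is still available, so recovery costs roughly one task's work instead of the storage and network cost of replicating every intermediate result.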

Tomorrow’s Datacenter OS • Programming abstractions: • Tools that can be used to build the next MapReduce / BigTable in a week (e.g. BOOM) • Efficient implementations of communication primitives (e.g. shuffle, broadcast) • New distributed programming models
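Shuffle, the all-to-all exchange between map and reduce stages, is one of the communication primitives the slide proposes implementing once, efficiently, below every programming model. Below is a minimal in-memory Python sketch of a hash-partitioned shuffle; shuffle and partition_for are hypothetical names, and a real implementation would move buckets over the network and spill them to disk.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_reducers: int) -> int:
    # A stable hash, so every mapper sends a given key to the same reducer.
    # Python's built-in hash() of str is salted per process and would not
    # agree across machines.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs: List[List[Tuple[str, int]]],
            num_reducers: int) -> List[Dict[str, List[int]]]:
    """Route each mapper's (key, value) pairs to the reducer that owns the key."""
    # Map side: buckets[m][r] holds what mapper m would send to reducer r.
    buckets = [[[] for _ in range(num_reducers)] for _ in map_outputs]
    for m, pairs in enumerate(map_outputs):
        for key, value in pairs:
            buckets[m][partition_for(key, num_reducers)].append((key, value))

    # Reduce side: reducer r fetches bucket r from every mapper, groups by key.
    reducer_inputs: List[Dict[str, List[int]]] = []
    for r in range(num_reducers):
        grouped: Dict[str, List[int]] = defaultdict(list)
        for mapper_buckets in buckets:
            for key, value in mapper_buckets[r]:
                grouped[key].append(value)
        reducer_inputs.append(dict(grouped))
    return reducer_inputs
```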

Tomorrow’s Datacenter OS • Debugging facilities: • Tracing and debugging tools that work across the cluster software stack (e.g. X-Trace, Dapper) • Replay debugging that takes advantage of limited languages / computational models • Unified monitoring infrastructure and APIs
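Cross-stack tracing tools such as X-Trace and Dapper work by propagating a small trace context (a trace ID plus the caller's span ID) with every request, so that records emitted by different layers can be stitched into one causal tree. The sketch below is a single-process Python model of that idea. start_span, context_of, render, the in-memory COLLECTED list, and the query / kv_get / dfs_read layers are all hypothetical, standing in for RPC headers and a cluster-wide collector.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional

_ids = itertools.count(1)  # unique IDs; real tracers typically use random IDs


@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]
    name: str


COLLECTED: List[Span] = []  # stands in for a cluster-wide trace collector


def start_span(name: str, ctx: Optional[Dict[str, int]] = None) -> Span:
    """Start a span, continuing the caller's trace if a context was passed in."""
    if ctx is None:
        trace_id, parent_id = next(_ids), None  # a new request: new trace
    else:
        trace_id, parent_id = ctx["trace_id"], ctx["span_id"]
    span = Span(trace_id, next(_ids), parent_id, name)
    COLLECTED.append(span)
    return span


def context_of(span: Span) -> Dict[str, int]:
    # This small dict is what would travel in RPC headers between layers.
    return {"trace_id": span.trace_id, "span_id": span.span_id}


# Three hypothetical layers of a cluster software stack.
def dfs_read(ctx: Dict[str, int]) -> None:
    start_span("dfs.read", ctx)


def kv_get(ctx: Dict[str, int]) -> None:
    span = start_span("kv.get", ctx)
    dfs_read(context_of(span))


def query() -> Span:
    root = start_span("query")
    kv_get(context_of(root))
    kv_get(context_of(root))
    return root


def render(trace_id: int, parent_id: Optional[int] = None,
           depth: int = 0) -> List[str]:
    """Rebuild one request's cross-layer call tree from the collected spans."""
    lines: List[str] = []
    for s in COLLECTED:
        if s.trace_id == trace_id and s.parent_id == parent_id:
            lines.append("  " * depth + s.name)
            lines += render(trace_id, s.span_id, depth + 1)
    return lines
```

Calling render(query().trace_id) yields one line per span, indented by depth: the query span, then two kv.get spans, each with a dfs.read child, reconstructed without logging into any individual layer.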

Putting it All Together • A successful datacenter OS might let users: • Build a Hadoop-like software stack in a week using the OS’s abstractions, while gaining other benefits (e.g. cross-stack replay debugging) • Share data efficiently between independently developed programming models and applications • Understand cluster behavior without having to log into individual nodes • Dynamically share the cluster with other users

Conclusion • Datacenters need an OS-like software stack for the same reasons single computers did: manageability, efficiency & programmability • An OS is already emerging in an ad-hoc way • Researchers can help by taking a long-term approach towards these problems

How Researchers Can Help • Focus on paradigms, not performance • Industry is tackling performance but lacks the luxury of taking a long-term view of abstractions • Explore clean-slate approaches • More likely to have impact here than in a “real” OS, because datacenter software changes quickly! • Bring cluster computing to non-experts • Much harder, and more rewarding, than serving big users
