Slicing with SHARP Jeff Chase Duke University
Federated Resource Sharing • How do we safely share/exchange resources across domains? • {Administrative, policy, security, trust} domains or sites • Sharing arrangements form dynamic Virtual Organizations • Physical and logical resources (data sets, services, applications).
Location-Independent Services The last decade has yielded immense progress on building and deploying large-scale adaptive Internet services. • Dynamic replica placement and coordination • Reliable distributed systems and clusters • P2P, indirection, etc. [Figure: clients route requests to a dynamic server set under varying load.] Example services: caching network or CDN, replicated Web service, curated data, batch computation, wide-area storage, file sharing.
Managing Services • A service manager adapts the service to changes in load and resource status. • e.g., gain or lose a server, instantiate replicas, etc. • The service manager itself may be decentralized. • The service may have contractual targets for service quality ─ Service Level Agreements or SLAs. [Figure: the service manager steers the servers through sensor/actuator feedback control as clients generate load.]
An Infrastructure Utility • In a utility/grid, the service obtains resources from a shared pool. • Third-party hosting providers ─ a resource market. • Instantiate service wherever resources are available and demand exists. • Consensus: we need predictable performance and SLAs. Benefits of the shared resource pool: resource efficiency, surge protection, robustness, "pay as you grow", economy of scale, geographic dispersion.
The Slice Abstraction • Resources are owned by sites. • E.g., node, cell, cluster • Sites are pools of raw resources. • E.g., CPU, memory, I/O, net • A slice is a partition or bundle or subset of resources. • The system hosts application services as “guests”, each running within a slice. [Figure: sites A, B, and C host services S1 and S2, each in its own slice.]
Slicing for SLAs • Performance of an application depends on the resources devoted to it [Muse01]. • Slices act as containers with dedicated bundles of resources bound to the application. • Distributed virtual machine / computer • Service manager determines desired slice configuration to meet performance goals. • May instantiate multiple instances of a service, e.g., in slices sized for different user communities. • Services may support some SLA management internally, if necessary.
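A rough illustration of the slice-sizing idea (not SHARP's own mechanism): a service manager can run a simple feedback step that compares measured performance against its SLA target and grows or shrinks the slice it requests. The measure_latency and request_slice hooks below are hypothetical placeholders.

# Hypothetical sketch: a service manager resizing its slice to track an SLA target.
TARGET_LATENCY_MS = 200   # assumed SLA target
STEP = 2                  # resource units added or removed per adjustment

def adjust_slice(current_units, measure_latency, request_slice):
    """One control step: compare measured latency to the SLA target and renegotiate."""
    latency = measure_latency()
    if latency > TARGET_LATENCY_MS:
        current_units += STEP                    # under target: grow the slice
    elif latency < 0.5 * TARGET_LATENCY_MS and current_units > STEP:
        current_units -= STEP                    # comfortably under target: shrink it
    request_slice(current_units)                 # ask for the new slice configuration
    return current_units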
Example: Cluster Batch Pools in Cluster-on-Demand (COD) • Partition a cluster into isolated virtual clusters. • Virtual cluster owner has exclusive control over servers. • Assign nodes to virtual clusters according to load, contracts, and resource usage policies. • Example service: a simple wrapper for SGE batch scheduler middleware to assess load and obtain/release nodes. [Figure: service managers (e.g., SGE) send requests to the COD cluster and receive grants.] Dynamic Virtual Clusters in a Grid Site Manager [HPDC 2003], with Laura Grit, David Irwin, Justin Moore, Sara Sprenkle.
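A minimal sketch of the kind of SGE wrapper mentioned above, under assumed interfaces: poll the batch queue, ask COD for a node when jobs are backed up, and release a node when the pool sits idle. The queued_jobs, idle_nodes, request_node, and release_node hooks are hypothetical, not the actual COD or SGE APIs.

# Hypothetical batch-pool service manager loop for COD (placeholder hooks, not real APIs).
import time

def manage_batch_pool(queued_jobs, idle_nodes, request_node, release_node):
    """One polling step: grow the virtual cluster on backlog, shrink it when idle."""
    if queued_jobs() > 0 and idle_nodes() == 0:
        request_node()       # ask the COD site manager for another server
    elif queued_jobs() == 0 and idle_nodes() > 1:
        release_node()       # return a surplus server to the shared pool

def run(poll_interval_s, queued_jobs, idle_nodes, request_node, release_node):
    while True:
        manage_batch_pool(queued_jobs, idle_nodes, request_node, release_node)
        time.sleep(poll_interval_s)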
A Note on Virtualization • Ideology for the future Grid: adapt the grid environment to the service rather than the service to the grid. • Enable user control over application/OS environment. • Instantiate complete environment down to the metal. • Don’t hide the OS: it’s just another replaceable component. • Requires/leverages new underware for instantiation. • Virtual machines (Xen, Collective, JVM, etc.) • Net-booted physical machines (Oceano, UDC, COD) • Innovate below the OS and alongside it (infrastructure services for the control plane).
SHARP • Secure Highly Available Resource Peering • Interfaces and services for external control of federated utility sites (e.g., clusters). • A common substrate for extensible policies/structures for resource management and distributed trust management. • Flexible on-demand computing for a site, and flexible peering for federations of sites • From PlanetLab to the New Grid • Use it to build a resource “dictatorship” or a barter economy, or anything in between. • Different policies/structures may coexist in different partitions of a shared resource pool.
Goals • The question addressed by SHARP is: how do the service managers get their slices? • How does the system implement and enforce policies for allocating slices? • Predictable performance under changing conditions. • Establish priorities under local or global constraint. • Preserve resource availability across failures. • Enable and control resource usage across boundaries of trust or ownership (peering). • Balance global coordination with local control. • Extensible, pluggable, dynamic, decentralized.
Non-goals SHARP does NOT: • define a policy or style of resource exchange; • E.g., barter, purchase, or central control (e.g., PLC) • care how services are named or instantiated; • understand the SLA requirements or specific resource needs of any application service; • define an ontology to describe resources; • specify mechanisms for resource control/policing.
Resource Leases • The service manager (S1) sends a request to the site authority (Site A); the authority grants a signed lease:
<lease>
  <issuer> A’s public key </issuer>
  <signed_part>
    <holder> S1’s public key </holder>
    <rset> resource description </rset>
    <start_time> … </start_time>
    <end_time> … </end_time>
    <sn> unique ID at Site A </sn>
  </signed_part>
  <signature> A’s signature </signature>
</lease>
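To make the lease format concrete, here is a hedged sketch of a holder-side check: verify the authority's signature over the signed part and confirm the lease term. The Lease fields mirror the XML above; verify_signature is a placeholder for whatever public-key scheme the site uses, not a SHARP API.

from dataclasses import dataclass

@dataclass
class Lease:
    issuer_key: bytes     # site authority's public key
    holder_key: bytes     # service manager's public key
    rset: str             # resource description
    start_time: int
    end_time: int
    sn: str               # unique ID at the issuing site
    signed_part: bytes    # canonical bytes of the signed XML element
    signature: bytes      # the authority's signature over signed_part

def lease_is_valid(lease: Lease, now: int, verify_signature) -> bool:
    """Check the issuer's signature and the lease term; verify_signature is a placeholder hook."""
    if not verify_signature(lease.issuer_key, lease.signed_part, lease.signature):
        return False                              # not actually issued by the site authority
    return lease.start_time <= now <= lease.end_time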
Agents (Brokers) • Introduce agent as intermediary/middleman. • Factor policy out of the site authority. • Site delegates control over its resources to the agent. • Agent implements a provisioning/allocation policy for the resources under its control. [Figure: the service manager's request passes through the agent to Site A's authority; grants flow back along the same path.]
Leases vs. Tickets • The site authority retains ultimate control over its resources: only the authority can issue leases. • Leases are “hard” contracts for concrete resources. • Agents deal in tickets. • Tickets are “soft” contracts for abstract resources. • E.g., “You have a claim on 42 units of resource type 7 at 3PM for two hours…maybe.” (also signed XML) • Tickets may be oversubscribed as a policy to improve resource availability and/or resource utilization. • The subscription degree gives configurable assurance, spanning a continuum from a “hint” to a hard reservation.
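A hedged sketch of the oversubscription idea above: an agent holding C units may promise tickets totaling up to degree × C, trading availability against the chance of conflict at redeem time. The class and names here are illustrative, not SHARP's.

# Illustrative sketch: an agent issuing tickets with a configurable oversubscription degree.
class TicketIssuer:
    def __init__(self, capacity: int, degree: float = 1.0):
        self.capacity = capacity     # units the agent actually holds for this term
        self.degree = degree         # 1.0 = hard reservation, >1.0 = weaker "hint"
        self.promised = 0            # units promised so far

    def issue(self, units: int) -> bool:
        """Grant a ticket if it keeps total promises within degree * capacity."""
        if self.promised + units <= self.degree * self.capacity:
            self.promised += units
            return True              # ticket granted (may still conflict at redeem time)
        return False                 # refuse rather than exceed the configured degree

# e.g. TicketIssuer(capacity=40, degree=1.25) may promise up to 50 units against 40 held.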
Service Instantiation [Figure: steps 1–2 and 3–4 are ticket requests and grants among the site, agent, and service manager; step 5: redeem the ticket at the site; step 6: grant a lease; step 7: instantiate the service in a virtual machine.] Like an airline ticket, a SHARP ticket must be redeemed for a lease (boarding pass) before the holder can occupy a slice.
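Restating the numbered flow as a hedged pseudocode driver: obtain a ticket from an agent, redeem it at the site authority for a lease, then instantiate the service in the resulting slice. The agent and site objects and their methods are illustrative placeholders, not SHARP's actual XML-RPC signatures.

# Illustrative driver for the ticket -> lease -> instantiate flow (placeholder calls).
def acquire_slice(agent, site, rset, term, service_image):
    ticket = agent.request_ticket(rset, term)        # steps 1-4: request and grant ticket(s)
    lease = site.redeem_ticket(ticket)               # steps 5-6: redeem the ticket for a lease
    if lease is None:
        raise RuntimeError("ticket rejected at the site; obtain a replacement ticket")
    return site.instantiate(lease, service_image)    # step 7: boot the service in the slice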
Ticket Delegation • Agents may subdivide their tickets and delegate them to other entities in a cryptographically secure way. • Delegation transfers resources, e.g., as the result of a peering agreement or an economic transaction. • The site has transitive trust in the delegate. • Secure ticket delegation is the basis for a resource economy. • Delegation is accountable: an agent that promises the same resources to multiple receivers can be caught.
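A hedged sketch of delegation as chain extension: the delegating holder appends a new signed subticket naming the receiver, a subset of its resources, and a subinterval of its term. The Subticket fields and the sign hook are illustrative placeholders, not SHARP's wire format.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Subticket:
    issuer_key: bytes
    holder_key: bytes
    rtype: str
    count: int
    start: int
    end: int
    signature: bytes

def delegate(chain: List[Subticket], my_key: bytes, sign: Callable,
             receiver_key: bytes, count: int, start: int, end: int) -> List[Subticket]:
    """Append a subticket delegating part of the tail claim to the receiver."""
    tail = chain[-1]
    assert tail.holder_key == my_key                  # only the current holder may delegate
    assert count <= tail.count                        # delegate no more than we hold
    assert tail.start <= start and end <= tail.end    # and only within our own term
    body = (my_key, receiver_key, tail.rtype, count, start, end)
    child = Subticket(my_key, receiver_key, tail.rtype, count, start, end,
                      signature=sign(repr(body).encode()))
    return chain + [child]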
Peering Sites may delegate resource shares to multiple agents. E.g., “Let my friends at UNC use 20% of my site this week and 80% next weekend. UNC can allocate their share to their local users according to their local policies. Allocate the rest to my local users according to my policies.” Note: tickets issued at UNC self-certify their users.
A SHARP Ticket
<ticket>
  <subticket>
    <issuer> A’s public key </issuer>
    <signed_part>
      <principal> B’s public key </principal>
      <agent_address> XML RPC redeem_ticket() </agent_address>
      <rset> resource description </rset>
      <start_time> … </start_time>
      <end_time> … </end_time>
      <sn> unique ID at Agent A </sn>
    </signed_part>
    <signature> A’s signature </signature>
  </subticket>
  <subticket>
    <issuer> B’s public key </issuer>
    <signed_part> … </signed_part>
    <signature> B’s signature </signature>
  </subticket>
</ticket>
Tickets are Chains of Claims • Each claim records a claimID, holder, issuer, parent claim, resource set, and term. [Figure: claim a (claimID = a, holder = A, a.rset = {…}, a.term) is the parent of claim b (claimID = b, holder = B, issuer = A, parent = a, b.rset/term).]
A Claim Tree • The set of active claims for a site forms a claim tree. • The site authority maintains the claim tree over the redeemed claims. [Figure: a claim tree rooted at an anchor claim of 40 units, subdivided into claims of 25 and 8 and further into 9, 3, 3, and 10; a ticket T corresponds to a path from the anchor down to a final claim.]
Ticket Distribution Example • Site transfers 40 units to Agent A. • A transfers to B and C. • B and C further subdivide resources. • C oversubscribes its holdings in granting to H, creating potential conflict. [Figure: delegations laid out over resource space and time: A: 40; B: 8 and C: 25 from A; D: 3 and E: 3 from B; F: 9, G: 10, and H: 7 from C; the grant to H conflicts.]
Detecting Conflicts [Figure: the same claims plotted in resource space over time against the site's capacity of 100 units; C has granted 9 + 10 + 7 = 26 units against its 25-unit holding, exposing the conflict.]
Conflict and Accountability • Oversubscription may be a deliberate strategy, an accident, or a malicious act. • Site authorities serve oversubscribed tickets FCFS; conflicting tickets are rejected. • Balance resource utilization and conflict rate. • The authority can identify the accountable agent and issue a cryptographically secure proof of its guilt. • The proof is independently verifiable by a third party, e.g., a reputation service or a court of law. • The customer must obtain a new ticket for equivalent resource, using its proof of rejection.
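A hedged sketch of how an authority could serve redeems FCFS and detect oversubscription: track the units already committed under each claim and reject a redeem that would push any ancestor past its capacity. The class and the usage below illustrate the idea from these slides using the earlier distribution example; they are not the SHARP implementation.

# Illustrative FCFS conflict check at the site authority (not the actual SHARP code).
from collections import defaultdict

class ClaimTree:
    def __init__(self, site_capacity: int):
        self.capacity = {"site": site_capacity}   # units held by each claim, keyed by claim ID
        self.parent = {}                          # claim ID -> parent claim ID
        self.committed = defaultdict(int)         # units already redeemed under each claim

    def add_claim(self, claim_id: str, parent_id: str, units: int):
        self.capacity[claim_id] = units
        self.parent[claim_id] = parent_id

    def redeem(self, claim_id: str) -> bool:
        """Accept only if every ancestor still has room; otherwise reject (conflict)."""
        units = self.capacity[claim_id]
        ancestors = []
        node = self.parent.get(claim_id)
        while node is not None:                   # walk up to the site root
            if self.committed[node] + units > self.capacity[node]:
                return False                      # oversubscribed: the issuing agent is accountable
            ancestors.append(node)
            node = self.parent.get(node)
        for a in ancestors:                       # commit the units along the path
            self.committed[a] += units
        return True

# Following the earlier example: the site grants A 40 units, A grants C 25, and C grants 26.
t = ClaimTree(site_capacity=100)
t.add_claim("A", "site", 40); t.add_claim("C", "A", 25)
t.add_claim("F", "C", 9); t.add_claim("G", "C", 10); t.add_claim("H", "C", 7)
assert t.redeem("F") and t.redeem("G")
assert not t.redeem("H")    # 9 + 10 + 7 = 26 > 25: C oversubscribed, the last redeem is rejected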
Agent as Arbiter • Agents implement local policies for apportioning resources to competing customers. Examples: • Authenticated client identity determines priority. • Sell tickets to the highest bidder. • Meet long-term contractual obligations; sell the excess.
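As one concrete instance of the second example above, an agent could award each allocation window to the highest bidders; this is a hypothetical policy plug-in, not part of SHARP itself.

# Hypothetical "highest bidder" arbitration policy for an agent's allocation window.
def allocate_to_highest_bidders(available_units: int, bids):
    """bids: list of (customer, units_wanted, price_per_unit). Returns the grants made."""
    grants = []
    for customer, units, price in sorted(bids, key=lambda b: b[2], reverse=True):
        take = min(units, available_units)
        if take == 0:
            break
        grants.append((customer, take, price))
        available_units -= take
    return grants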
Agent as Aggregator Agents may aggregate resources from multiple sites. Example: PlanetLab Central • Index by location and resource attributes. • Local policies match requests to resources. • Services may obtain bundles of resources across a federation.
Division of Knowledge and Function
Service Manager ─ knows the application: instantiate app; monitor behavior; SLA/QoS mapping; acquire contracts; renew/release.
Agent/Broker ─ guesses global status: availability of resources (what kind, how much, where at site grain); how much to expose about resource types? about proximity?
Authority ─ knows local status: resource status, configuration, placement, topology, instrumentation, thermals, etc.
Issues and Ongoing Work • SHARP combines resource discovery and brokering in a unified framework. • Configurable overbooking degree allows a continuum. • Many possibilities exist for SHARP agent/broker representations, cooperation structures, allocation policies, and discovery/negotiation protocols. • Accountability is a fundamental property needed in many other federated contexts. • Generalize accountable claim/command exchange to other infrastructure services and applications. • Bidding strategies and feedback control.
Conclusion • Think of PlanetLab as an evolving prototype for planetary-scale on-demand utility computing. • Focusing on app services that are light resource consumers but inhabit many locations: network testbed. • It’s growing organically like the early Internet. • “Rough consensus and (almost) working code.” • PlanetLab is a compelling incubator and testbed for utility/grid computing technologies. • SHARP is a flexible framework for utility resource management. • But it’s only a framework….
Performance Summary • SHARP prototype complete and running across PlanetLab • Complete performance evaluation in paper • 1.2s end-to-end time to: • Request resource 3 peering hops away • Obtain and validate tickets, hop-by-hop • Redeem ticket for lease • Instantiate virtual machine at remote site • Oversubscription allows flexible control over resource utilization versus rate of ticket rejection
Related Work • Resource allocation/scheduling mechanisms: • Resource containers, cluster reserves, Overbooking • Cryptographic capabilities • Taos, CRISIS, PolicyMaker, SDSI/SPKI • Lottery ticket inflation • Issuing more tickets decreases value of existing tickets • Computational economies • Amoeba, Spawn, Muse, Millenium, eOS • Self-certifying trust delegation • PolicyMaker, PGP, SFS
[Figure: COD architecture ─ a Web interface and resource manager over a configuration database (MySQL, NIS, DHCP, confd, MyDNS, image upload), managing an energy-managed reserve pool plus Web-service and batch-pool vClusters on the physical cluster.] COD servers are backed by a configuration database: network boot, automatic configuration, resource negotiation, dynamic virtual clusters, database-driven network install.
SHARP • Framework for distributed resource management, resource control, and resource sharing across sites and trust domains
[Figure: claim delegation from A to B to C over resource space and time, and the resulting claim tree at the site at time t0, distinguishing the anchor, final claims, and future claims; a ticket T is a path from the anchor to a final claim.]
A ticket is a chain of claims {c0, …, cn} with anchor(c0) and subclaim(c(i+1), c(i)) for i = 0..n-1.
anchor(claim a): a.issuer = a.holder and a.parent = null.
subclaim(claim c, claim p): c.issuer = p.holder, c.parent = p.claimID, contains(p.rset, c.rset), and subinterval(c.term, p.term).
contains(rset p, rset c): c.type = p.type and c.count ≤ p.count.
subinterval(term c, term p): p.start ≤ c.start ≤ p.end and p.start ≤ c.end ≤ p.end.
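A hedged, executable transcription of the validity rules above; the claim fields mirror the earlier slides, and signature checking is omitted for brevity.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rset:
    type: str
    count: int

@dataclass
class Term:
    start: int
    end: int

@dataclass
class Claim:
    claim_id: str
    holder: str
    issuer: str
    parent: Optional[str]   # parent claim ID, or None for the anchor
    rset: Rset
    term: Term

def anchor(a: Claim) -> bool:
    return a.issuer == a.holder and a.parent is None

def contains(p: Rset, c: Rset) -> bool:
    return c.type == p.type and c.count <= p.count

def subinterval(c: Term, p: Term) -> bool:
    return p.start <= c.start <= p.end and p.start <= c.end <= p.end

def subclaim(c: Claim, p: Claim) -> bool:
    return (c.issuer == p.holder and c.parent == p.claim_id
            and contains(p.rset, c.rset) and subinterval(c.term, p.term))

def valid_ticket(chain: List[Claim]) -> bool:
    """A ticket {c0, ..., cn} is valid iff c0 is an anchor and each c(i+1) is a subclaim of c(i)."""
    return anchor(chain[0]) and all(subclaim(chain[i + 1], chain[i])
                                    for i in range(len(chain) - 1))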
Mixed-Use Clustering • Virtual clusters carved from the physical clusters, e.g., a BioGeometry batch pool, a SIMICS/Arch batch pool, Internet Web/P2P emulations, a student semester project, somebody’s buggy hacked OS. • Vision: issue “leases” on isolated partitions of the shared cluster for different uses, with push-button Web-based control over software environments, user access, file volumes, and DNS names.
Grids are federated utilities • Grids should preserve the control and isolation benefits of private environments. • There’s a threshold of comfort that we must reach before grids become truly practical. • Users need service contracts. • Protect users from the grid (security cuts both ways). • Many dimensions: • decouple Grid support from application environment • decentralized trust and accountability • data privacy • dependability, survivability, etc.
COD and Related Systems • Other cluster managers based on database-driven PXE installs: Oceano hosts Web services under dynamic load; NPACI Rocks (open source) configures Linux compute clusters; Netbed/Emulab configures static clusters for emulation experiments. [Figure: the systems positioned along the dimensions of dynamic clusters, OS-agnostic operation, flexible configuration, and open source.] • COD addresses hierarchical dynamic resource management in mixed-use clusters with pluggable middleware (“multigrid”).
Dynamic Virtual Clusters [Figure: a Web interface and the COD database drive the COD manager, which negotiates with service managers and configures virtual clusters #1 and #2 from a reserve pool of powered-off servers.] • Allocate resources in units of raw servers. • Database-driven network install. • Pluggable service managers: batch schedulers (SGE, PBS), Grid PoP, Web services, etc.
Enabling Technologies • DHCP host configuration • Linux driver modules, Red Hat Kudzu, partition handling • DNS, NIS, etc. for user/group/mount configuration • NFS and other network storage, automounter • PXE network boot • IP-enabled power units • Power: APM, ACPI, Wake-on-LAN • Ethernet VLANs
Recent Papers on Utility/Grid Computing • Managing Energy and Server Resources in Hosting Centers [ACM Symposium on Operating Systems Principles 2001] • Dynamic Virtual Clusters in a Grid Site Manager [IEEE High-Performance Distributed Computing 2003] • Model-Based Resource Provisioning for a Web Service Utility [USENIX Symposium on Internet Technologies and Systems 2003] • An Architecture for Secure Resource Peering [ACM Symposium on Operating Systems Principles 2003] • Balancing Risk and Reward in a Market-Based Task Manager [IEEE High-Performance Distributed Computing 2004] • Designing for Disasters [USENIX Symposium on File and Storage Technologies 2004] • Comparing PlanetLab and Globus Resource Management Solutions [IEEE High-Performance Distributed Computing 2004] • Interposed Proportional Sharing for a Storage Service Utility [ACM Symposium on Measurement and Modeling of Computer Systems 2004]