
Adaptable Virtual Machine Environment for Heterogeneous Clusters



1. Adaptable Virtual Machine Environment for Heterogeneous Clusters

Al Geist, Jim Kohl, Stephen Scott, Philip Papadopoulos (Oak Ridge National Laboratory); Jack Dongarra, Graham Fagg (University of Tennessee); Vaidy Sunderam, Paul Gray, Mauro Migliardi (Emory University)

September 2-4, Blackberry Farms, TN

2. Harness Plug-in Machine Research

Building on our experience and success with PVM, create a fundamentally new heterogeneous virtual machine based on three research concepts:
• Parallel plug-in environment: extend the concept of a plug-in to the parallel computing world.
• Distributed peer-to-peer control: no single point of failure, unlike typical client/server models.
• Merging/splitting of multiple distributed virtual machines: provide a means for short-term sharing of resources and collaboration between teams.

www.epm.ornl.gov/harness

3. Motivated by Needs from Simulation Science

• Develop applications by plugging together component models.
• Customize/tune the virtual environment for the application's needs and for performance on existing resources.
• Support long-running simulations despite maintenance, faults, and migration (a dynamically evolving VM).
• Adapt the virtual machine to faults and dynamic scheduling in large clusters (DASE).
• Provide a framework for collaborative simulations (in the spirit of CUMULVS).

4. Harness Architecture (extends the successful PVM design)

[Architecture diagram: a Virtual Machine spanning Hosts A, B, C, and D, each running a component-based HARNESS daemon. Each daemon is customized and extended by dynamically adding plug-ins for resource management, process control, communication, and user features. Discovery and registration go through a Resource Catalog / Directory Service, which can also connect to another VM. Operation within the VM uses Distributed Control.]

5. Harness Research: Parallel Plug-in Environment

How do you write plug-ins for a heterogeneous distributed virtual machine? Existing serial plug-in technologies suggest starting points:
• Netscape plug-in model
• JavaBeans
• CORBA IDL
• ActiveX/DCOM model

One research goal is to understand and implement a parallel plug-in environment within Harness:
• provides a method for many users to extend Harness (like Linux), including user-definable control messages
• taxonomy based on synchronization needs (three typical cases, sketched below):
  • load a plug-in into a single host of the VM without communication
  • load a plug-in into a single host and broadcast to the rest of the VM
  • load a plug-in into every host of the VM with synchronization
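To make the taxonomy concrete, here is a minimal C sketch of the three load modes. The enum values and h_load_plugin are hypothetical illustrations, not the actual Harness interface; a real daemon would dynamically load the plug-in and, for the broadcast and synchronized cases, circulate control messages around the VM.

    #include <stdio.h>

    /* Hypothetical load modes mirroring the taxonomy above; these
       names are assumptions, not the real Harness API. */
    enum h_load_mode {
        H_LOAD_LOCAL,      /* case 1: single host, no communication        */
        H_LOAD_BROADCAST,  /* case 2: single host, then announce to the VM */
        H_LOAD_SYNC        /* case 3: every host, with synchronization     */
    };

    /* Stub: a real daemon would dlopen() the plug-in here and, for the
       broadcast/sync cases, send a control message around the ring. */
    int h_load_plugin(const char *plugin, enum h_load_mode mode)
    {
        switch (mode) {
        case H_LOAD_LOCAL:
            printf("loading %s on this host only\n", plugin);
            break;
        case H_LOAD_BROADCAST:
            printf("loading %s here, broadcasting availability\n", plugin);
            break;
        case H_LOAD_SYNC:
            printf("loading %s on every host, then synchronizing\n", plugin);
            break;
        }
        return 0;
    }

    int main(void)
    {
        return h_load_plugin("mpi_send", H_LOAD_SYNC);
    }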

6. Daemon Plug-in Interface Based on Re-definable Message Handlers

[Diagram: the Harness daemon routes each incoming message, identified by (source, tag, context), either to the handlers for required VM control messages or to user-defined handlers. Example plug-ins shown: process spawn, PVM notify, MPI send, and new user features. The handler interface follows the gPort concept from the Common Component Architecture forum.]

Daemon services are triggered by control messages; data messages go to user-defined handlers. A user can define new control messages and can exchange the handlers bound to the required control messages.
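A minimal sketch of such a re-definable handler table in C, keyed on the (source, tag, context) triple. All names here (h_set_handler, h_dispatch, H_ANY) are assumptions for illustration; the actual gPort-style interface may differ.

    #define H_ANY       (-1)   /* wildcard for source, tag, or context */
    #define MAXHANDLERS 64

    typedef void (*h_handler)(const void *msg, int len);

    struct h_entry { int src, tag, ctx; h_handler fn; };

    static struct h_entry table[MAXHANDLERS];
    static int nentries;

    /* Register a handler for a message class, returning the previous
       one so a plug-in can exchange (and later restore or chain) the
       handler bound to a required control message. */
    h_handler h_set_handler(int src, int tag, int ctx, h_handler fn)
    {
        for (int i = 0; i < nentries; i++)
            if (table[i].src == src && table[i].tag == tag &&
                table[i].ctx == ctx) {
                h_handler old = table[i].fn;
                table[i].fn = fn;
                return old;
            }
        if (nentries < MAXHANDLERS)
            table[nentries++] = (struct h_entry){ src, tag, ctx, fn };
        return NULL;
    }

    /* Dispatch an incoming message to the first entry matching its
       (source, tag, context) triple. */
    void h_dispatch(int src, int tag, int ctx, const void *msg, int len)
    {
        for (int i = 0; i < nentries; i++)
            if ((table[i].src == H_ANY || table[i].src == src) &&
                (table[i].tag == H_ANY || table[i].tag == tag) &&
                (table[i].ctx == H_ANY || table[i].ctx == ctx)) {
                table[i].fn(msg, len);
                return;
            }
    }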

7. Harness Research: Multiple Virtual Machine Collaboration

1. Send messages between VMs: the distributed virtual machines share information but not resources.
2. Merge into a single asymmetric VM: resources are shared unequally, e.g. a user can only use the I/O resources he contributes, but all CPUs; each user sees a single, but different, VM.
3. Merge into a single symmetric VM: resources are shared equally among all users.

8. Harness Research: Distributed Control Features

• No synchronization step: updates occur asynchronously while maintaining consistency.
• All members can be injecting change requests at the same time.
• Members can be added or deleted quickly because the operation does not require a resynchronization of pending changes.
• Failure of a host does not negate any partially committed changes, i.e. no rollback is required.

9. Symmetric Peer-to-Peer Distributed Control

• No single point (or set of points) of failure for Harness; it survives as long as one member still lives.
• All members know the state of the virtual machine, and their knowledge is kept consistent with respect to the order of changes of state (an important parallel programming requirement!).
• No member is more important than any other (at any instant), i.e. there isn't a pass-around "control token".

One of the two schemes being investigated for Harness follows.

10. Phase One of Arbitration: Update the Pending List

[Diagram: a request travels around a ring of Harness kernels; each kernel holds its own copy of the VM state.]

1. A task on a host requests that a new host be added.
2. The local kernel sends (host/T#/data) to its neighbor in the ring.
3. Each kernel adds the request to its list of pending changes.

Harness kernels on each host have an arbitrary priority assigned to them (new kernels are always given the lowest priority). A code sketch covering both arbitration phases follows slide 11.

11. Phase Two of Arbitration: Update the Distributed State

1. The originating kernel receives its own initial request back.
2. It creates a unique transaction number and sends (host/T#/trans#) to its neighbor.
3. Each kernel receives this second request, moves the pending data to its state Db, and forwards the request.
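A minimal single-process simulation of the two phases above. The struct layout, names, and the array standing in for the ring are all assumptions for illustration; the real kernels exchange these messages over TCP.

    #include <stdio.h>
    #include <string.h>

    #define NHOSTS  4
    #define MAXPEND 16

    struct req { int origin, tnum, trans; char data[32]; };

    struct kernel {
        struct req pending[MAXPEND]; int npend;
        struct req state[MAXPEND];   int nstate;   /* committed state Db */
    };

    static struct kernel ring[NHOSTS];
    static int next_trans = 1;

    /* Phase one: (host/T#/data) travels once around the ring; every
       kernel, including the originator, appends it to its pending list. */
    static void phase_one(struct req *r)
    {
        for (int i = 0; i < NHOSTS; i++) {
            struct kernel *k = &ring[(r->origin + i) % NHOSTS];
            k->pending[k->npend++] = *r;
        }
    }

    /* Phase two: the originator sees its own request return, stamps a
       unique transaction number, and circulates (host/T#/trans#); each
       kernel moves the matching pending entry into its state Db. */
    static void phase_two(struct req *r)
    {
        r->trans = next_trans++;
        for (int i = 0; i < NHOSTS; i++) {
            struct kernel *k = &ring[(r->origin + i) % NHOSTS];
            for (int j = 0; j < k->npend; j++)
                if (k->pending[j].origin == r->origin &&
                    k->pending[j].tnum   == r->tnum) {
                    k->state[k->nstate++] = *r;
                    k->pending[j] = k->pending[--k->npend];  /* drop entry */
                    break;
                }
        }
    }

    int main(void)
    {
        struct req r = { .origin = 2, .tnum = 1, .trans = 0 };
        strcpy(r.data, "add host E");
        phase_one(&r);   /* request lands on every pending list        */
        phase_two(&r);   /* returns to origin, then commits everywhere */
        printf("kernel 0 committed '%s' as transaction %d\n",
               ring[0].state[0].data, ring[0].state[0].trans);
        return 0;
    }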

12. Details of the Pending List Structure

Each kernel keeps four lists:
• Pending: pending transactions, including transactions forwarded around the ring.
• Hold: incoming (commit or pending) transactions being held because this host has higher-priority transactions still pending.
• Mine: transactions this host has injected into the ring for which it has not yet received the phase-one reply.
• Inject: transactions local tasks have requested that can't be injected until the pending transactions in Mine are done.
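A struct-level sketch of the four lists; the field names and the linked-list representation are assumptions.

    struct trans;                /* one change request: origin host, T#,
                                    transaction number, payload          */

    struct kernel_lists {
        struct trans *pending;   /* seen in phase one, not yet committed
                                    to the state Db (includes forwarded
                                    transactions)                        */
        struct trans *hold;      /* incoming transactions held because
                                    this host has higher-priority ones
                                    still pending                        */
        struct trans *mine;      /* injected by this host, phase-one
                                    reply not yet received               */
        struct trans *inject;    /* requested by local tasks, queued
                                    until everything in mine is done     */
    };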

13. Multiple Asynchronous Updates

• Each kernel can be injecting change requests into the ring at the same time.
• Each kernel holds the start of the second (commit) phase until pending higher-priority requests have committed.
• The state Dbs are thus maintained in a consistent order across the entire VM.
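The ordering rule reduces to a priority test; this tiny helper is an assumed illustration of the decision each kernel makes before letting a transaction's commit traffic proceed.

    /* Nonzero: park the incoming transaction on the hold list until the
       local higher-priority pending request has committed. The helper
       name and priority encoding are assumptions. */
    int must_hold(int incoming_prio, int highest_local_pending_prio)
    {
        return highest_local_pending_prio > incoming_prio;
    }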

14. Adding a New Host without Clearing Existing Pending Lists

• A two-phase commit is done on "add host" so all state Dbs are updated.
• The new host is assigned the lowest priority, so the changes it begins injecting don't affect any pending changes.
• The requesting host sends the new host a copy of its state Db and pending list, then adjusts links to add the host to the ring.
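A sketch of that handshake. All names are assumptions; copy_state_db and copy_pending stand in for the real state transfer.

    struct kernel {
        int priority;
        struct kernel *next;    /* neighbor in the control ring */
        /* ... state Db, pending/hold/mine/inject lists ... */
    };

    /* Assumed stubs for snapshotting the requester's view. */
    void copy_state_db(struct kernel *from, struct kernel *to);
    void copy_pending(struct kernel *from, struct kernel *to);

    void add_host(struct kernel *req, struct kernel *newhost, int lowest_prio)
    {
        newhost->priority = lowest_prio - 1;  /* strictly below every member,
                                                 so its requests can't reorder
                                                 anything already pending   */
        copy_state_db(req, newhost);
        copy_pending(req, newhost);
        newhost->next = req->next;            /* splice into the ring */
        req->next = newhost;
    }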

15. Deleting a Host without Clearing Existing Pending Lists

• A two-phase commit is done on "delete host" so the state Dbs are updated.
• The requesting host adjusts links to bypass the deleted host.
• No changes are required in the "pending changes" lists.

16. Fast Recovery from Host Failure Using the Existing Control Structure

• Kernel "A" detects the failure of a host (a) by seeing the TCP link drop, or (b) by being unable to communicate over, or reestablish, the link.
• Kernel "A" checks its host list and tries to establish a link with the next host in the ring, continuing around the ring until successful.
• Kernel "A" then inserts delete-host request(s) into the control ring.
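A sketch of that recovery walk; try_connect and inject_delete_host are assumed stand-ins for the real TCP reconnect and the two-phase delete-host request.

    struct kernel { struct kernel *next; /* ... */ };

    int  try_connect(struct kernel *host);                       /* assumed */
    void inject_delete_host(struct kernel *self, struct kernel *dead);

    void recover(struct kernel *self)
    {
        struct kernel *candidate = self->next;
        while (!try_connect(candidate)) {        /* walk past failed hosts */
            inject_delete_host(self, candidate); /* two-phase delete each  */
            candidate = candidate->next;
        }
        self->next = candidate;                  /* ring is closed again   */
    }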

17. Parallel Recovery from Multi-Host Failure

• Kernels detect the failure of hosts (a) by seeing the TCP link drop, or (b) by being unable to communicate over, or reestablish, a link.
• In parallel, each surviving kernel tries to establish a link with the next live host in the ring.
• Each kernel inserts delete-host request(s) into the control ring.

18. Status of Harness Research

• Working prototype: demonstrates a pluggable daemon and no single point of failure.
• IceT package: demonstrated merging and splitting of multiple virtual machines and soft-installation of different communication APIs (MPI and CCTL).
• SNIPE environment: shows the use of a resource catalog to manage distributed resources.
• PVM 3.4: extends heterogeneity by transparently clustering Windows and Unix boxes.
• Common Component Architecture Forum.

www.epm.ornl.gov/harness
