
gluepy: A Framework for Flexible Programming in Complex Grid Environments

Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura (University of Tokyo)
{kenny, h_saito, kay, tau}@logos.ic.i.u-tokyo.ac.jp
Package available from Home Page: www.logos.ic.i.u-tokyo.ac.jp/~kenny/gluepy





Presentation Transcript


  1. gluepy: A Framework for Flexible Programming in Complex Grid Environments

Overview
• Grid-enabled distributed object-oriented programming model
  • Distributed object model with implicit mutual exclusion
  • Programming model that allows join/failure of nodes
  • Incorporates NAT/firewalled clusters by using an overlay
• gluepy: "glue Python"
  • Distributed object-oriented library extension for Python
  • Implements our proposed programming model for flexible Grid computing
• Real Grid applications on real Grid environments
  • Over 900 real nodes across 9 clusters
  • Heterogeneous network settings (including NAT, firewalls)

Automatic Overlay Construction on Grid
• Construction scheme: steps for each peer
  • Obtain endpoint information of other peers
  • Attempt TCP connections to a selected few peers
• NAT-cluster peers
  • Connectable to global IP peers
• Firewall-cluster peers
  • Automatic SSH port forwarding
• Adaptive routing on overlay [Perkins et al. 1997]

[Figure: overlay construction with firewall traversal. Labels: NAT, Global IP, Firewall, SSH, Attempt connection]

Failure Detection on Overlay
• A communication path is maintained for each RMI
• Intermediate peers remember the next peer: the path pointer
• The path pointer is garbage collected on return
• On failure of a connection, an error is returned along the path

[Figure: failure detection on the overlay. Labels: Path pointer, RMI handler, failure, return error]

Related Works
• Grid-enabled programming models
  • Satin [Wrzesinska et al. 2006], Jojo [Nakada et al. 2004], Jojo2 [Aoki et al. 2006]
• Distributed objects on the Grid
  • ProActive [Huet et al. 2004], Ibis RMI [van Nieuwpoort et al. 2005]
• Wide-area connection management
  • SmartSockets [Maassen et al. 2007], MC-MPI [Saito et al. 2007]

Programming Model
• Asynchronous RMIs (Remote Method Invocations) with futures
  • Any invocation may be made asynchronous
  • Returns a future, a placeholder in which results will be returned
• Serialization semantics (synchronization)
  • At most 1 running thread per object
  • RMIs are handled by a separate thread
  • At any given time, at most 1 thread can execute an object's method: the owner thread (eliminates race conditions)
  • If a thread blocks while in the method's scope, other threads are permitted to execute methods on the object (eliminates deadlocks for common usage)
• Signals to object
  • Signals may be sent to objects
  • Any thread blocking in the object's context will unblock and return None
• Runtime node joins
  • Need to obtain references to existing objects
  • A fully decentralized remote object lookup scheme
  • Query for a remote reference via random walking among peers
• Node failure (RMI failure) detection
  • RMI failures are returned as exceptions
    • Failure of the object host process
    • Failure of communication or intermediate processes

[Figure: thread ownership of an object. Labels: waiting threads, owner thread, block, give up ownership, new owner thread, unblock on signal, re-contest for ownership]

Evaluation Results

Experimental Environment
• Clusters (nodes): istbs (316), tsubame (64), okubo (28), hongo (98), chiba (186), suzuk (72), kyoto (70), kototoi (88), imade (60)

[Figure: experimental environment. Labels: Global IPs, Private IPs, Firewall, All packets dropped]

Overlay Connectivity Simulation
• Probability of a connected random graph
• 3 cluster combinations
  • hongo, chiba, okubo, suzuk, imade, kyoto, kototoi (4 global clusters (384 peers), 3 private clusters (218 peers))
  • okubo, suzuk, imade, kyoto, kototoi (2 global clusters (100 peers), 3 private clusters (218 peers))
  • okubo, imade, kyoto, kototoi (1 global cluster (28 peers), 3 private clusters (218 peers))

Master-Worker Application with Node Joins/Failures
• A simple master-worker program that distributes tasks to workers
• New tasks are given to new workers via asynchronous RMIs
• Tasks given to failed workers are redistributed
  • By catching and handling RMI failure exceptions

Example Master-Worker Excerpt

    # Python 2 syntax, as in the original excerpt.
    class Master:
        def __init__(self):
            self.nodes = []
            self.jobs = []

        def nodeJoin(self, node):
            self.nodes.append(node)
            # Signal the thread blocking in the master object.
            self.signal()

        def run(self):
            assigned = {}
            while True:
                # Atomic section: async. RMI, doJob() to idle workers.
                while len(self.nodes) > 0 and len(self.jobs) > 0:
                    node = self.nodes.pop()
                    job = self.jobs.pop()
                    f = node.doJob.future(job)
                    assigned[f] = (node, job)
                # Block and wait for some results.
                readys = wait(assigned.keys())
                # None is returned when unblocked by a signal.
                if readys == None:
                    continue
                # Atomic section: retrieve results.
                for f in readys:
                    node, job = assigned.pop(f)
                    try:
                        print "done:", f.get()
                        self.nodes.append(node)
                    except RemoteException, e:
                        # Exception raised on failure: requeue the job.
                        self.jobs.append(job)

Grid Application: Parallel Permutation Flowshop Solver
• A combinatorial optimization problem
  • Given a sequence of n jobs that use m machines, find a permutation of jobs that gives the shortest makespan
• Finds the optimal solution by parallel branch and bound
  • Master divides the search space into sub-tasks for workers
  • Workers periodically exchange the latest bounds with the master

[Figure: master and workers. Labels: Master, Worker, doJob(), exchange_bound()]

Future Work
• Application to a much wider range of Grid applications
• Development of the library package
  • A prototype package is available at Home Page!!
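
The master-worker pattern from the poster (hand jobs to idle workers as futures, block until some finish, requeue the jobs of failed workers) can be tried locally without gluepy. The following is a minimal sketch in Python 3 that uses the standard library's `concurrent.futures` in place of gluepy's RMIs. `WorkerFailure`, `make_worker`, and `run_master` are hypothetical names introduced for illustration, not part of gluepy, and a "worker" here is just a local function rather than a remote object.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


class WorkerFailure(Exception):
    """Hypothetical stand-in for the RemoteException in the poster excerpt."""


def make_worker(name, broken=False):
    """Build a local 'worker' callable; a broken one fails on every job."""
    def do_job(job):
        if broken:
            raise WorkerFailure("%s failed on job %r" % (name, job))
        return job * job
    return do_job


def run_master(workers, jobs):
    """Distribute jobs to workers, requeueing jobs whose worker failed."""
    idle = list(workers)
    pending = list(jobs)
    assigned = {}   # future -> (worker, job)
    results = {}    # job -> result
    with ThreadPoolExecutor(max_workers=max(1, len(idle))) as pool:
        while pending or assigned:
            # Hand pending jobs to idle workers as asynchronous calls.
            while idle and pending:
                worker = idle.pop()
                job = pending.pop()
                assigned[pool.submit(worker, job)] = (worker, job)
            if not assigned:
                raise RuntimeError("no live workers left")
            # Block until at least one outstanding job has finished.
            done, _ = wait(list(assigned), return_when=FIRST_COMPLETED)
            for f in done:
                worker, job = assigned.pop(f)
                try:
                    results[job] = f.result()
                    idle.append(worker)      # worker is reusable
                except WorkerFailure:
                    pending.append(job)      # drop the worker, requeue the job
    return results


if __name__ == "__main__":
    workers = [make_worker("a"), make_worker("b", broken=True)]
    print(run_master(workers, [1, 2, 3, 4]))
```

Unlike the poster's `Master.run`, this sketch has no `nodeJoin` or signal mechanism, so workers cannot join at runtime; it only illustrates the redistribution of jobs after a failure.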
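
To make the permutation flowshop problem concrete, the sketch below computes the makespan of a job order and finds an optimal order with a small sequential branch and bound in Python 3. It is not the poster's parallel solver: there is no master, no workers, and no bound exchange. The function names and the lower bound (time already committed on the last machine plus the remaining jobs' time on that machine) are assumptions chosen for brevity.

```python
def advance(finish, times):
    """Return per-machine finish times after appending one job.

    finish[k] is when machine k becomes free; times[k] is the job's
    processing time on machine k. Jobs visit machines in order 0..m-1.
    """
    new = list(finish)
    for k, t in enumerate(times):
        ready = new[k - 1] if k > 0 else 0   # job done on previous machine
        new[k] = max(new[k], ready) + t
    return new


def makespan(order, proc_times):
    """Completion time of the last job on the last machine."""
    finish = [0] * len(proc_times[0])
    for j in order:
        finish = advance(finish, proc_times[j])
    return finish[-1]


def branch_and_bound(proc_times):
    """Return (best makespan, best order) by depth-first branch and bound."""
    n, m = len(proc_times), len(proc_times[0])
    best = {"cost": float("inf"), "order": None}

    def search(prefix, finish, remaining):
        if not remaining:
            if finish[-1] < best["cost"]:
                best["cost"], best["order"] = finish[-1], tuple(prefix)
            return
        # Lower bound: every remaining job must still run on the last machine.
        bound = finish[-1] + sum(proc_times[j][-1] for j in remaining)
        if bound >= best["cost"]:
            return   # prune: this subtree cannot beat the incumbent
        for j in sorted(remaining):
            search(prefix + [j], advance(finish, proc_times[j]), remaining - {j})

    search([], [0] * m, frozenset(range(n)))
    return best["cost"], best["order"]


if __name__ == "__main__":
    # proc_times[j][k]: time of job j on machine k (3 jobs, 2 machines).
    proc_times = [[3, 2], [1, 4], [2, 1]]
    print(branch_and_bound(proc_times))
```

In a parallel version along the lines of the poster, each subtree below a chosen depth could become a task for a worker, with the incumbent `best["cost"]` being the bound that workers exchange with the master.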
