
Presentation Transcript


  1. A Strong Programming Model Bridging Distributed and Multi-Core Computing Denis.Caromel@inria.fr • Background: INRIA, Univ. Nice, OASIS Team • Programming: Parallel Programming Models: Asynchronous Active Objects, Futures, Typed Groups; High-Level Abstractions (OO SPMD, Components, Skeletons) • Optimizing • Deploying

  2. 1. Background and Team at INRIA

  3. OASIS Team & INRIA • INRIA (Computer Science and Control): 8 centers all over France; workforce: 3,800; strong in standardization committees (IETF, W3C, ETSI, …); strong industrial partnerships; fosters company foundation: 90 startups so far, including Ilog (Nasdaq, Euronext), …, ActiveEon • OASIS: a joint team between INRIA, Nice Univ., and CNRS • Participation in EU projects: CoreGrid, EchoGrid, Bionets, SOA4ALL, GridCOMP (Scientific Coordinator) • ProActive 4.0.1, Distributed and Parallel: from multi-cores to Enterprise + Science GRIDs

  4. Startup Company Born of INRIA • Co-developing and providing support for the open-source ProActive Parallel Suite • Worldwide customers (EU, Boston USA, etc.)

  5. OASIS Team Composition (35) • Researchers (5): D. Caromel (UNSA, Det. INRIA), E. Madelaine (INRIA), F. Baude (UNSA), F. Huet (UNSA), L. Henrio (CNRS) • PostDoc (1): Regis Gascon (INRIA) • PhDs (11): Antonio Cansado (INRIA, Conicyt), Brian Amedro (SCS-Agos), Cristian Ruz (INRIA, Conicyt), Elton Mathias (INRIA-Cordi), Imen Filali (SCS-Agos / FP7 SOA4All), Marcela Rivera (INRIA, Conicyt), Muhammad Khan (STIC-Asia), Paul Naoumenko (INRIA/Région PACA), Viet Dung Doan (FP6 Bionets), Virginie Contes (SOA4ALL), Guilherme Pezzi (AGOS, CIFRE SCP) • Engineers (10): Elaine Isnard (AGOS), Fabien Viale (ANR OMD2, Renault), Franca Perrina (AGOS), Germain Sigety (INRIA), Yu Feng (ETSI, FP6 EchoGrid), Bastien Sauvan (ADT Galaxy), Florin-Alexandru Bratu (INRIA CPER), Igor Smirnov (Microsoft), Fabrice Fontenoy (AGOS), open position (Thales) • Trainees (2): Etienne Vallette d'Osia (Master 2 ISI), Laurent Vanni (Master 2 ISI) • Assistants (2): Patricia Maleyran (INRIA), Sandra Devauchelle (I3S) • + Visitors + Interns

  6. ProActive Contributors

  7. ProActive Parallel Suite: Architecture

  9. ProActive Parallel Suite: Physical Infrastructure

  10. ProActive Parallel Suite

  11. 2. Programming Models for Parallel & Distributed

  12. ProActive Parallel Suite

  13. ProActive Parallel Suite

  14. Distributed and Parallel Active Objects

  15. ProActive: Active Objects
    A ag = newActive("A", […], VirtualNode);
    V v1 = ag.foo(param);
    V v2 = ag.bar(param);
    ...
    v1.bar(); // Wait-By-Necessity (WBN)
Wait-By-Necessity is a dataflow synchronization. (Diagram legend: Java Object, Active Object, Request Queue, Future Object, Proxy, Thread, Request; the active object ag runs in its own JVM, while v1 and v2 are futures in the caller's JVM.)
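To make the calling convention concrete, here is a minimal self-contained sketch, assuming the ProActive 4.x API (PAActiveObject.newActive); the classes A and V and their method bodies are illustrative stand-ins for the names on the slide.

    import java.io.Serializable;
    import org.objectweb.proactive.api.PAActiveObject;

    // Both classes must be reifiable (public, non-final, public no-arg
    // constructor) so ProActive can subclass them into proxies and
    // futures. Each class goes in its own file; shown together for brevity.
    public class V implements Serializable {
        private String s;
        public V() {}
        public V(String s) { this.s = s; }
        public void bar() { System.out.println(s); }
    }

    public class A implements Serializable {
        public A() {}
        public V foo(String param) { return new V(param + "/foo"); }
        public V bar(String param) { return new V(param + "/bar"); }
    }

    public class Client {
        public static void main(String[] args) throws Exception {
            // Without a node argument the active object is created in the
            // local JVM; pass a Node from a deployment descriptor to go remote.
            A ag = PAActiveObject.newActive(A.class, new Object[] {});
            V v1 = ag.foo("x"); // asynchronous: v1 is a transparent future
            V v2 = ag.bar("y"); // requests are queued, served one at a time
            v1.bar();           // Wait-By-Necessity: blocks only if v1 is
                                // not yet computed
        }
    }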

  16. First-Class Futures Update

  17. Wait-By-Necessity: First-Class Futures
    V = b.bar(); ... c.gee(V);
Futures are global single-assignment variables. (Diagram: a calls b, then passes the still-unresolved future V to c.)
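A hedged fragment of this slide's pattern, assuming b and c are already-created active objects whose classes declare V bar() and void gee(V):

    V v = b.bar(); // asynchronous call on b: v is an unresolved future in a
    c.gee(v);      // a passes the future itself to c, without waiting;
                   // when bar() completes, the value is also updated at c,
                   // and c blocks inside gee() only where it really uses v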

  18. Wait-By-Necessity: Eager Forward-Based Future Update
    V = b.bar(); ... c.gee(V);
An AO forwarding a future will have to forward its value.

  19. Wait-By-Necessity: Eager Message-Based Future Update
    V = b.bar(); ... c.gee(V);
An AO receiving a future sends a message, so that the value can be delivered to it directly.

  20. Standard System at Runtime: No Sharing • NoC: Network on Chip • Proofs of determinism

  21. (2) ASP: Asynchronous Sequential Processes • ASP ==> confluence and determinacy • Future updates can occur at any time • Execution is characterized by the order of request senders • Determinacy of programs communicating over trees, … • A strong guide for implementation, fault tolerance and checkpointing, model checking, …

  22. No Sharing even for Multi-Cores: Related Talks at PDP 2009 • SS6 Session, today at 16:00 • "Impact of the Memory Hierarchy on Shared Memory Architectures in Multicore Programming Models", Rosa M. Badia, Josep M. Perez, Eduard Ayguade, and Jesus Labarta • "Realities of Multi-Core CPU Chips and Memory Contention", David P. Barker

  23. TYPED ASYNCHRONOUS GROUPS

  24. Creating AOs and Groups
    A ag = newActiveGroup("A", […], VirtualNode);
    V v = ag.foo(param);
    ...
    v.bar(); // Wait-By-Necessity
Group, Type, and Asynchrony are crucial for composition. (Diagram legend: Typed Group, Java or Active Object; group members are spread over several JVMs.)
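A minimal sketch of group creation, assuming the ProActive 4.x group API (PAGroup as the entry point) and reusing the A and V classes from the active-object sketch above:

    import org.objectweb.proactive.api.PAGroup;
    import org.objectweb.proactive.core.group.Group;

    public class GroupDemo {
        public static void main(String[] args) throws Exception {
            // One constructor-argument list per member: a three-member
            // group, created in the local JVM here (pass a Node[] to
            // spread the members over machines).
            Object[][] ctorArgs = { {}, {}, {} };

            // The group is *typed*: it is used through the same
            // interface A as a single object, so client code is unchanged.
            A ag = (A) PAGroup.newGroup(A.class.getName(), ctorArgs);

            V v = ag.foo("x"); // broadcast: foo() goes to every member;
                               // v is itself a typed group of futures
            v.bar();           // applied to each result as it arrives

            Group g = PAGroup.getGroup(ag); // management view of the group
            System.out.println("members: " + g.size());
        }
    }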

  25. Broadcast and Scatter • Broadcast is the default behavior • Using a group as a parameter: scattering depends on rankings
    ag.bar(cg);                     // broadcast cg
    ProActive.setScatterGroup(cg);
    ag.bar(cg);                     // scatter cg
(Diagram: the members of ag, on several JVMs, receive either the whole group c1, c2, c3 or one element each.)
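In the 4.x API the scatter switch shown above lives in PAGroup; a hedged fragment, with ag and cg as on the slide:

    import org.objectweb.proactive.api.PAGroup;

    // Default: the whole group cg is broadcast to every member of ag.
    ag.bar(cg);

    // Scatter mode: element i of cg goes to member i of ag (by ranking).
    PAGroup.setScatterGroup(cg);
    ag.bar(cg);
    PAGroup.unsetScatterGroup(cg); // back to broadcast for later calls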

  26. Static Dispatch Group
    ag.bar(cg);
(Diagram: scattered elements c0…c9 are dealt to members statically, so the slowest JVM accumulates a backlog of requests while the fastest JVM ends up with an empty queue.)

  27. Dynamic Dispatch Group
    ag.bar(cg);
(Diagram: elements c0…c9 are dispatched on demand, so each member, slowest or fastest, takes a new element as soon as it finishes the previous one.)

  28. Handling Group Failures
    V vg = ag.foo(param);           // group call: vg gathers the results
    Group groupV = PAGroup.getGroup(vg);
    el = groupV.getExceptionList(); // members whose request failed
    ...
    vg.gee();                       // keep working with the valid results
(Diagram: one JVM of the group fails; its exception is collected in the exception list.)

  29. Abstractions for Parallelism: The Right Tool to Execute the Task

  30. Object-Oriented SPMD

  31. OO SPMD
    A ag = newSPMDGroup("A", […], VirtualNode);
    // In each member
    myGroup.barrier("2D");       // global barrier
    myGroup.barrier("vertical"); // any barrier
    myGroup.barrier("north", "south", "east", "west");
Still not based on raw messages, but on typed method calls ==> Components
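A short sketch of an SPMD member, assuming the ProActive 4.x PASPMD API; the Worker class, compute() and the barrier name are illustrative:

    import java.io.Serializable;
    import org.objectweb.proactive.api.PASPMD;

    public class Worker implements Serializable {
        public Worker() {} // reifiable: public no-arg constructor required

        public void compute() {
            int rank = PASPMD.getMyRank();          // my index in the group
            int size = PASPMD.getMySPMDGroupSize(); // number of members
            System.out.println("member " + rank + "/" + size);
            // ... work on the rank-th slice of the data ...
            PASPMD.totalBarrier("step"); // global barrier: every member
                                         // waits here, then all resume
        }
    }

    // Creation mirrors a typed group (one ctor-arg list per member):
    //   Worker w = (Worker) PASPMD.newSPMDGroup(Worker.class.getName(),
    //                                           ctorArgs, nodes);
    //   w.compute(); // broadcast: all members run compute() in parallel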

  32. Object-Oriented SPMD: Single Program, Multiple Data • Motivation: use enterprise technology (Java, Eclipse, etc.) for parallel computing • Able to express MPI's collective communications in Java: broadcast, reduce, scatter, allscatter, gather, allgather, together with barriers and topologies

  33. MPI Communication Primitives
For some (historical) reasons, MPI has many communication primitives:
  • MPI_Send (standard), MPI_Ssend (synchronous), MPI_Bsend (buffered), MPI_Rsend (ready)
  • MPI_Isend, MPI_Ibsend, … (immediate, async/future)
  • MPI_Recv (receive), MPI_Irecv (immediate), with (any) source, (any) tag, …
I'd rather put the burden on the implementation, not the programmers! How can an adaptive implementation be done in that context?
Not even mentioning the combinatorics between sends and receives, or the semantic problems that arise in distributed implementations.

  34. Application Semantics rather than Low-Level, Architecture-Based Optimization • MPI: MPI_Send, MPI_Recv, MPI_Ssend, MPI_Irecv, MPI_Bsend, MPI_Rsend, MPI_Isend, MPI_Ibsend • What we propose: high-level information from the application programmer (experimented on 3D electromagnetism and the NASA benchmarks) • Examples:
    ro.foo( ForgetOnSend(params) );
    ActiveObject.exchange( …params );
  • Optimizations for both distributed & multi-core
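A conceptual sketch of the two examples; these are the talk's proposed primitives rather than confirmed shipped signatures, with ro and params taken from the slide:

    // 1. Fire-and-forget send: the caller declares it will not modify
    //    params after the call, so the runtime may elide the defensive
    //    copy (a win on shared-memory multi-cores) and overlap the send
    //    with computation.
    ro.foo(ForgetOnSend(params));

    // 2. Symmetric exchange: both partners swap data in one high-level
    //    operation instead of hand-pairing matched MPI_Send/MPI_Recv
    //    calls; the runtime picks the transport (copy, zero-copy, network).
    ActiveObject.exchange( /* …params, as elided on the slide */ );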

  35. NAS Parallel Benchmarks • Designed by NASA to evaluate the benefits of high-performance systems • Strongly based on CFD • 5 benchmarks (kernels) to test different aspects of a system • 2 categories or focus variations: communication-intensive and computation-intensive

  36. Communication Intensive: CG Kernel (Conjugate Gradient) • Floating-point operations; eigenvalue computation • High number of unstructured communications: 12,000 calls/node, 570 MB sent/node, 1 min 32 s, 65% of wall time spent in communications (Figures: data density distribution; message density distribution.)

  37. Communication Intensive: CG Kernel (Conjugate Gradient) ==> Comparable performances

  38. Communication Intensive: MG Kernel (Multi-Grid) • Floating-point operations; solving a Poisson problem • Structured communications: 600 calls/node, 45 MB sent, 1 min 32 s, 80% of wall time spent in communications (Figures: data density distribution; message density distribution.)

  39. Communication Intensive: MG Kernel (Multi-Grid) • Problem with high-rate communications: 2D -> 3D matrix access

  40. Computation Intensive: EP Kernel (Embarrassingly Parallel) • Random-number generation • Almost no communications • This is Java!

  41. Related Talk at PDP 2009 • T4 Session, today at 14:00 • "NPB-MPJ: NAS Parallel Benchmarks Implementation for Message Passing in Java", Damián A. Mallón, Guillermo L. Taboada, Juan Touriño, Ramón Doallo, Univ. Coruña, Spain

  42. Parallel Components

  43. GridCOMP Partners

  44. Objects to Distributed Components • IoC: Inversion of Control (set in XML) • Example of a component instance • Truly distributed components (Diagram legend: Typed Group, Java or Active Object; the component spans several JVMs.)

  45. GCM Scopes and Objectives • Grid codes that compose and deploy: no programming, no scripting, … no pain • Innovations: abstract deployment, composite components, Multicast and GatherCast interfaces

  46. Optimizing MxN Operations • Two or more composites can be involved in the gather-multicast

  47. Related Talk at PDP 2009 • T1 Session, yesterday at 11:30 • "Towards Hierarchical Management of Autonomic Components: a Case Study", Marco Aldinucci, Marco Danelutto, and Peter Kilpatrick

  48. Skeleton

  49. Algorithmic Skeletons for Parallelism • High-level programming model [Cole89] • Hides the complexity of parallel/distributed programming • Exploits nestable parallelism patterns (Diagram: task patterns (farm, pipe, if, for, while, fork, divide & conquer) and data patterns (map, fork), composed into a BLAST skeleton program with nodes such as d&c(fb, fd, fc), seq(f1), seq(f2), seq(f3).)

  50. Algorithmic Skeletons for Parallelism (previous slide's diagram, plus the condition muscle function of the while/d&c node):
    public boolean condition(BlastParams param) {
        File file = param.dbFile;
        return file.length() > param.maxDBSize;
    }
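To make the role of that condition concrete, here is a plain-Java sketch of the control flow the while/d&c skeleton automates; the BlastParams fields come from the slide, while divide() and runBlast() are hypothetical stand-ins for the other muscle functions (a skeleton framework would evaluate the recursive branches in parallel rather than in this sequential loop):

    import java.io.File;

    class BlastParams {
        File dbFile;    // the database (fragment) to search
        long maxDBSize; // largest size BLAST should handle at once
    }

    class DivideAndConquerSketch {
        // The muscle function from the slide: keep dividing while too big.
        boolean condition(BlastParams param) {
            return param.dbFile.length() > param.maxDBSize;
        }

        void solve(BlastParams param) {
            if (condition(param)) {
                for (BlastParams sub : divide(param)) {
                    solve(sub); // a skeleton would farm these out in parallel
                }
            } else {
                runBlast(param); // base case: small enough to align directly
            }
        }

        // Placeholders for the user-supplied muscle functions.
        BlastParams[] divide(BlastParams p) { /* split p.dbFile */ return new BlastParams[0]; }
        void runBlast(BlastParams p) { /* invoke BLAST on p.dbFile */ }
    }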
