
SciDAC Software Infrastructure for Lattice Gauge Theory


Presentation Transcript


  1. SciDAC Software Infrastructure for Lattice Gauge Theory. Richard C. Brower, All Hands Meeting, JLab, June 1-2, 2005. Code distribution: see http://www.usqcd.org/usqcd-software (links to documentation at bottom).

  2. Major Participants in Software Project. UK: Peter Boyle, Balint Joo, (Mike Clark). * Software Coordinating Committee

  3. OUTLINE
  • QCD API Design
  • Completed Components
  • Performance
  • Future
  Note: Thursday 9:30, Applications of the SciDAC API (TBA = Carleton DeTar & Balint Joo)

  4. Lattice QCD – extremely uniform
  • (Perfect) Load balancing: uniform periodic lattices & identical sublattices per processor.
  • (Complete) Latency hiding: overlap computation / communications.
  • Data parallel: operations on small 3x3 complex matrices per link.
  • Critical kernel: the Dirac solver is ~90% of flops.
  Lattice Dirac operator: (the slide's equation is given in standard form below)
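
The equation on this slide did not survive the transcript; for reference, the standard Wilson form of the lattice Dirac operator (the kernel that the optimized asqtad and domain-wall solvers generalize) is

  D_{x,y} = (m + 4r)\,\delta_{x,y}
            - \tfrac{1}{2} \sum_{\mu=1}^{4} \Big[ (r - \gamma_\mu)\, U_\mu(x)\, \delta_{x+\hat\mu,\,y}
                                                + (r + \gamma_\mu)\, U_\mu^\dagger(x - \hat\mu)\, \delta_{x-\hat\mu,\,y} \Big]

where U_\mu(x) are the 3x3 link matrices and \gamma_\mu the Dirac matrices; this is the "small 3x3 complex matrices per link" structure referred to above.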

  5. SciDAC QCD API – layered design
  • Level 3: optimized Dirac operators and inverters, optimised for P4 and QCDOC.
  • Level 2: QDP (QCD Data Parallel) – lattice-wide operations, data shifts; exists in C/C++. QIO – binary data files / XML metadata (ILDG collab).
  • Level 1: QLA (QCD Linear Algebra); QMP (QCD Message Passing) – C/C++, implemented over MPI, native QCDOC, M-VIA GigE mesh.

  6. Fundamental Software Components are in Place:
  • QMP (Message Passing) version 2.0: GigE, native QCDOC & over MPI (a minimal usage sketch follows this slide).
  • QLA (Linear Algebra): C routines (24,000 generated by Perl, with test code); optimized (subset) for P4 clusters in SSE assembly; QCDOC optimized using BAGEL – an assembly generator for PowerPC, IBM Power, UltraSparc, Alpha, … by P. Boyle (UK).
  • QDP (Data Parallel) API: C version for the MILC application code; C++ version for the new code base Chroma.
  • I/O and ILDG: read/write of XML/binary files compliant with ILDG.
  • Level 3 (optimized kernels): Dirac inverters for QCDOC and P4 clusters.
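
Purely to give the flavour of Level 1 code, here is a minimal QMP start-up/shutdown sketch in C++. It follows the QMP version 2 C interface as distributed with the SciDAC software, but the exact signatures and enum names should be checked against the qmp.h shipped with your release.

// Minimal QMP sketch (assumes the QMP v2 C interface).
#include <cstdio>
#include <qmp.h>

int main(int argc, char** argv)
{
    // Bring up the message-passing layer (MPI, native QCDOC, or GigE underneath).
    QMP_thread_level_t provided;
    if (QMP_init_msg_passing(&argc, &argv, QMP_THREAD_SINGLE, &provided) != QMP_SUCCESS) {
        std::fprintf(stderr, "QMP initialization failed\n");
        return 1;
    }

    // Query the machine: how many nodes are there, and which one is this?
    int nodes = QMP_get_number_of_nodes();
    int me    = QMP_get_node_number();
    std::printf("node %d of %d is up\n", me, nodes);

    // ... declare a logical topology, allocate message memory, post sends/receives ...

    QMP_finalize_msg_passing();
    return 0;
}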

  7. Example of a QDP++ Expression
  Typical for the Dirac operator:
  • QDP/C++ code (a representative sketch is given below).
  • Portable Expression Template Engine (PETE): temporaries eliminated, expressions optimized.
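
The expression on the slide did not come through the transcript. The fragment below is a representative QDP++ expression of the kind typically shown – one directional hopping term of a Wilson-like operator, with the spin projection omitted for brevity – and is a sketch rather than the slide's exact code.

// Representative QDP++ data-parallel expression (sketch).
#include "qdp.h"
using namespace QDP;

// Accumulate the mu-direction hopping term of a Wilson-like Dirac operator:
// pull the neighbouring spinor forward/backward along mu and multiply by the link field.
void hop_term(LatticeFermion& chi,
              const multi1d<LatticeColorMatrix>& u,
              const LatticeFermion& psi,
              int mu)
{
    chi += u[mu] * shift(psi, FORWARD, mu)
         + shift(adj(u[mu]) * psi, BACKWARD, mu);
}

PETE turns the whole right-hand side into a single fused loop over sites, which is what "temporaries eliminated, expressions optimized" refers to.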

  8. Gauge Covariant Version of QDP/C++ code?
  • Lattice gauge algebra has bifundamentals: QDP new type = (Field U, displacement y-x).
  • No shifts! All data movement is implicit in the * operator (an illustrative sketch follows).
  • Gauge non-invariant expressions are impossible, except with a special identity field for pure shifts (Identity[vec]).
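
To make the idea concrete, here is a toy C++ sketch: a field type that carries its displacement, so that multiplication composes displacements and the shift becomes implicit. The names (CovariantField, Displacement) and the whole class are hypothetical illustrations, not part of QDP++ or of the actual proposal.

// Toy sketch of a "covariant field" type (hypothetical, not QDP++ API).
#include <array>

using Displacement = std::array<int, 4>;   // lattice offset y - x

Displacement add(const Displacement& a, const Displacement& b)
{
    return { a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3] };
}

template <class Field>
struct CovariantField {
    Field        data;   // the field values (a stand-in scalar in the demo below)
    Displacement disp;   // how far the object has been transported

    // Multiplying two covariant objects composes their displacements; in a real
    // implementation this is where the implicit shift / parallel transport would happen,
    // so gauge non-invariant combinations simply cannot be formed.
    template <class Other>
    auto operator*(const CovariantField<Other>& rhs) const
    {
        CovariantField<decltype(data * rhs.data)> out{ data * rhs.data, add(disp, rhs.disp) };
        return out;
    }
};

int main()
{
    CovariantField<double> u  { 2.0, {1, 0, 0, 0} };  // "link": carries a one-site displacement
    CovariantField<double> psi{ 3.0, {0, 0, 0, 0} };  // "matter" field at the source site
    auto u_psi = u * psi;                             // displacement composes to (1,0,0,0)
    return (u_psi.disp[0] == 1) ? 0 : 1;
}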

  9. Performance of Optimized Dirac Inverters
  • QCDOC: Level 3 inverters on 1024-processor racks.
  • Level 2 on Infiniband P4 clusters at FNAL.
  • Level 3 on GigE P4 at JLab and Myrinet P4 at FNAL.
  • Future: optimization on BlueGene/L et al.

  10. Level 3 on QCDOC

  11. Level 3 Asqtad QCDOC (double precision)

  12. Level 2 Asqtad on single P4 node

  13. Asqtad on Infiniband Cluster

  14. Level 3 Domain Wall CG Inverter† (32 nodes): performance across a range of lattice volumes (table not recoverable from the transcript). † Ls = 16; 4g is P4 2.8 GHz, 800 MHz FSB.

  15. Alpha Testers on the 1K QCDOC Racks
  • MILC C code: Level 3 Asqtad HMD, 40^3 x 96 lattice for 2+1 flavors (each trajectory: 12 hrs 18 min; Dru Renner).
  • Chroma C++ code: Level 3 domain wall RHMC, 24^3 x 32 (Ls = 8) for 2+1 flavors (James Osborn, Robert Edwards & UKQCD).
  • I/O: reading and writing lattices to RAID and the host AIX box.
  • File transfers: 1-hop file transfers between BNL, FNAL & JLab.

  16. BlueGene/L Assessment
  • June ’04: 1024-node BG/L, Wilson HMC sustains 1 Teraflop (peak 5.6) – Bhanot, Chen, Gara, Sexton & Vranas, hep-lat/0409042.
  • Feb ’05: QDP++/Chroma running on the UK BG/L machine.
  • June ’05: BU and MIT each acquire a 1024-node BG/L.
  • ’05–’06 software development: optimize linear algebra (BAGEL assembler from P. Boyle) at UK; optimize the inverter (Level 3) at MIT; optimize QMP (Level 1) at BU.

  17. Future Software Directions
  • Ongoing: optimization, testing and hardening of components; new Level 3 and application routines; executables for BG/L, X1E, XT3, Power 3, PNNL cluster, APEmille, …
  • Integration: uniform interfaces and a uniform execution environment for the 3 labs; middleware for data sharing with ILDG; data, code and job control as a 3-lab Meta Facility (leveraging PPDG).
  • Algorithm research: increased performance (leveraging TOPS & NSF IRT); new physics (scattering, resonances, SUSY, fermion signs, etc.?).
  • Generalize the API: QCD-like extensions of the SM, SUSY, new lattices, related apps? (see FermiQCD)
  • Graphics: debugger, GUI for job and data management, physics & PR?
  • Today at 5:30: MG talk by James Brannick.

  18. Old MG results† to set context for Brannick’s talk: 2x2 blocks for the Dirac operator (β = 1); 2-d lattice, U(x) on links, ψ(x) on sites. Gauss-Jacobi (diamond), CG (circle), MG with V cycle (square), W cycle (star). (A generic V-cycle is sketched after this slide.)
  † R. C. Brower, R. Edwards, C. Rebbi and E. Vicari, "Projective multigrid for Wilson fermions", Nucl. Phys. B366 (1991) 689 (aka Spectral AMG, Tim Chartier, 2002).
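
For readers who want the shape of the method being compared, a generic two-grid V-cycle looks like the following. This is a textbook sketch with caller-supplied operators (apply_A, smooth, restrict_to_coarse, prolongate, coarse_solve are placeholder names), not the projective multigrid code of the reference above.

// Generic two-grid V-cycle sketch (textbook form, placeholder operator names).
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

Vec v_cycle(const Vec& b, Vec x,
            const std::function<Vec(const Vec&)>& apply_A,             // fine-grid mat-vec
            const std::function<Vec(const Vec&, const Vec&)>& smooth,   // e.g. a few Gauss-Jacobi sweeps
            const std::function<Vec(const Vec&)>& restrict_to_coarse,   // fine -> coarse transfer
            const std::function<Vec(const Vec&)>& prolongate,           // coarse -> fine transfer
            const std::function<Vec(const Vec&)>& coarse_solve)         // (near-)exact coarse solve
{
    x = smooth(x, b);                                   // pre-smoothing damps high-frequency error

    Vec Ax = apply_A(x);                                // fine-grid residual r = b - A x
    Vec r(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) r[i] = b[i] - Ax[i];

    Vec e = prolongate(coarse_solve(restrict_to_coarse(r)));   // coarse-grid correction
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += e[i];

    return smooth(x, b);                                // post-smoothing
}

A W cycle differs only in that the coarse solve is itself replaced by two recursive cycles instead of one.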

  19. Limitation of earlier MG methods with gauge-invariant but non-adaptive projections, for the 2-d Schwinger model†: universality of the convergence vs m_q/β^{1/2}. Gauss-Jacobi (diamond), CG (circle), 3-level MG (square & star); β = 3 (cross), 10 (plus), 100 (square).
