NAMD and BG/L
Chee Wai Lee (cheelee@uiuc.edu)
Parallel Programming Laboratory, Computer Science Department
University of Illinois at Urbana-Champaign
http://charm.cs.uiuc.edu
Outline
• BG/L Platform overview
• Optimization Efforts: Context
• Optimization Efforts: Approaches
  • Topology Awareness
  • Load Balancing
  • Parallelism
  • Computation/Communication Overlap
• Results
Blue Gene/L Platform Review
• Hardware characteristics:
  • 32-bit PowerPC 440 processors at 700 MHz
  • 2 processors per node, with no cache coherence between them
  • 4 MB L3 cache
  • 512 MB memory per node
  • 6 outgoing FIFO links per node
  • 3D torus interconnect
Blue Gene/L Platform Review (2)
• Other characteristics:
  • Lightweight microkernel on the compute nodes, so minimal OS interference.
Outline
• BG/L Platform overview
• Optimization Efforts: Context
• Optimization Efforts: Approaches
  • Topology Awareness
  • Load Balancing
  • Parallelism
  • Computation/Communication Overlap
• Results
Objectives
• Scale the 92,000-atom ApoA1 benchmark as far as possible.
• Understand the scaling issues involved on the BG/L machine.
Outline
• BG/L Platform overview
• Optimization Efforts: Context
• Optimization Efforts: Approaches
  • Topology Awareness
  • Load Balancing
  • Parallelism
  • Computation/Communication Overlap
• Results
Topology Awareness
• Distribute patches according to the torus topology.
• Logically align NAMD's 3D patch grid with BG/L's 3D processor grid (see the sketch below).
• The patch grid is divided by an Orthogonal Recursive Bisection (ORB) scheme.
• The processor grid is divided in the same proportions and assigned to the corresponding patch subgrids.
• Topology-aware spanning trees for multicasts.
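To make the ORB mapping concrete, here is a minimal C++ sketch of the idea above. It is illustrative only, not NAMD's actual code: the Box struct, the function names, and the grid sizes are all invented for the example. Both grids are cut along the same axis in matching proportions, so each block of patches lands on the corresponding block of the processor torus.

    #include <cstdio>

    struct Box { int lo[3], hi[3]; };  // half-open [lo, hi) ranges in x, y, z

    int longestAxis(const Box& b) {
      int axis = 0, len = b.hi[0] - b.lo[0];
      for (int d = 1; d < 3; ++d)
        if (b.hi[d] - b.lo[d] > len) { len = b.hi[d] - b.lo[d]; axis = d; }
      return axis;
    }

    // Recursively bisect the processor box along its longest axis and cut
    // the patch box at the same fractional position; at the leaves, every
    // patch in the patch box is assigned to the single remaining processor.
    void orbMap(Box patches, Box procs) {
      int axis = longestAxis(procs);
      int plen = procs.hi[axis] - procs.lo[axis];
      if (plen <= 1) {  // leaf: processor box is a single processor
        printf("patches [%d,%d)x[%d,%d)x[%d,%d) -> proc (%d,%d,%d)\n",
               patches.lo[0], patches.hi[0], patches.lo[1], patches.hi[1],
               patches.lo[2], patches.hi[2],
               procs.lo[0], procs.lo[1], procs.lo[2]);
        return;
      }
      int procCut  = procs.lo[axis] + plen / 2;
      int patLen   = patches.hi[axis] - patches.lo[axis];
      int patchCut = patches.lo[axis] + patLen * (plen / 2) / plen;
      Box pL = patches, pR = patches, qL = procs, qR = procs;
      pL.hi[axis] = patchCut; pR.lo[axis] = patchCut;
      qL.hi[axis] = procCut;  qR.lo[axis] = procCut;
      orbMap(pL, qL);
      orbMap(pR, qR);
    }

    int main() {
      Box patches = {{0,0,0}, {12,12,12}};  // example NAMD patch grid
      Box procs   = {{0,0,0}, {8,8,8}};     // example BG/L torus partition
      orbMap(patches, procs);
    }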
Load Balancing
• Framework optimizations:
  • The memory footprint had to be reduced to accommodate the desired number of processors.
  • A spanning tree was implemented to handle the large number of messages arriving at PE 0 (see the sketch below).
• Spread non-migratable work better:
  • Bonded computations (e.g., dihedrals) are placed away from processors that own patch work where possible.
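The spanning tree can be pictured with a small sketch. This is not the actual Charm++ load balancer code; the branching factor and function names are assumptions for illustration. The point is that with a k-ary tree, PE 0 receives at most k messages per reduction step instead of one from every processor.

    #include <cstdio>

    const int ARITY = 4;  // branching factor (illustrative choice)

    // In a k-ary spanning tree rooted at PE 0, each PE forwards its load
    // statistics to its parent, so no PE receives more than ARITY messages.
    int parentOf(int pe)     { return pe == 0 ? -1 : (pe - 1) / ARITY; }
    int firstChildOf(int pe) { return pe * ARITY + 1; }

    int main() {
      const int numPes = 16;
      for (int pe = 0; pe < numPes; ++pe) {
        printf("PE %2d -> parent %2d, children:", pe, parentOf(pe));
        for (int c = firstChildOf(pe);
             c < firstChildOf(pe) + ARITY && c < numPes; ++c)
          printf(" %d", c);
        printf("\n");
      }
    }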
More Parallelism
• 2-away computation: patches interact with neighbors of neighbors (illustrated below).
  • A user-tunable configuration option.
• Break up compute objects.
  • Another user-tunable configuration option.
  • Balances the tradeoff between grain size and overhead.
• PME pencil decomposition efforts.
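As a rough illustration of why 2-away computation adds parallelism, the sketch below counts pairwise compute objects for 1-away versus 2-away interactions on a fixed patch grid. This is a simplification of what NAMD actually does (the 2-away options also refine the patch grid itself); the grid size and names are invented for the example.

    #include <cstdio>

    // Count unordered patch pairs within 'away' grid steps of each other
    // (self-pairs included); each pair corresponds to one compute object.
    long countPairs(int nx, int ny, int nz, int away) {
      long pairs = 0;
      for (int x = 0; x < nx; ++x)
        for (int y = 0; y < ny; ++y)
          for (int z = 0; z < nz; ++z)
            for (int dx = -away; dx <= away; ++dx)
              for (int dy = -away; dy <= away; ++dy)
                for (int dz = -away; dz <= away; ++dz) {
                  int X = x + dx, Y = y + dy, Z = z + dz;
                  if (X < 0 || X >= nx || Y < 0 || Y >= ny ||
                      Z < 0 || Z >= nz)
                    continue;
                  // count each unordered pair exactly once
                  if (dx > 0 || (dx == 0 && dy > 0) ||
                      (dx == 0 && dy == 0 && dz >= 0))
                    ++pairs;
                }
      return pairs;
    }

    int main() {
      printf("1-away: %ld compute objects\n", countPairs(8, 8, 8, 1));
      printf("2-away: %ld compute objects\n", countPairs(8, 8, 8, 2));
    }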
Overlap of Computation and Communication
• Hurt by the lack of cache coherence.
• One processor could serve as a communication co-processor if the L1 caches were flushed for large messages, but that costs too much.
• Instead, make use of the FIFO link buffers: every so often in NAMD's outer loop, we make AdvanceCommunication() calls (sketched below).
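The shape of those calls might look like the sketch below. Apart from the AdvanceCommunication() name mentioned above, everything here is hypothetical (the chunking of work, the poll interval, the stub bodies); it only illustrates interleaving explicit network progress with compute work when no co-processor is pumping the FIFOs.

    #include <cstdio>

    // Stubs standing in for the real routines (hypothetical bodies).
    static void computeChunk(int i)    { /* one slice of force computation */ }
    static void AdvanceCommunication() { /* poll and drain torus FIFO buffers */ }

    // One outer-loop step: do compute work in chunks and periodically pump
    // the network so outgoing messages make progress while we compute.
    void outerLoopStep(int numChunks) {
      const int pollInterval = 4;  // tuning knob: how often to pump (assumed)
      for (int i = 0; i < numChunks; ++i) {
        computeChunk(i);
        if (i % pollInterval == 0)
          AdvanceCommunication();
      }
      AdvanceCommunication();      // final drain before the step completes
    }

    int main() { outerLoopStep(32); }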
Outline
• BG/L Platform overview
• Optimization Efforts: Context
• Optimization Efforts: Approaches
• Results
Results (ApoA1 benchmark on the Watson BG/L; "co" = co-processor mode)

Nodes   Processors   Mode   Time per step
32      32           co     347 ms
128     128          co     97.2 ms
512     512          co     23.7 ms
1024    1024         co     13.8 ms
2048    2048         co     8.6 ms
4096    4096         co     6.2 ms

Scaling to 8192 processors was achieved at 5.2 ms per step.
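As a rough consistency check on these numbers: going from 32 to 4096 processors (a 128x increase) reduces the time per step from 347 ms to 6.2 ms, a speedup of about 56x, i.e., roughly 56/128 ≈ 44% parallel efficiency relative to the 32-processor run.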