EAVL Extreme-scale Analysis and Visualization Library. Jeremy Meredith SDAV Next-Gen Library Meeting September, 2012. History. Originally ORNL LDRD Jeremy Meredith, Sean Ahern, Dave Pugmire plus Rob Sisneros joined as a postdoc
EAVL: Extreme-scale Analysis and Visualization Library. Jeremy Meredith, SDAV Next-Gen Library Meeting, September 2012
History • Originally ORNL LDRD • Jeremy Meredith, Sean Ahern, Dave Pugmire • plus Rob Sisneros joined as a postdoc • Many hours sitting in conference rooms arguing over things like “what does it mean to have one of your dimensions be unstructured?” • then determine what to do that’s practical without falling off the data modeling deep end . . . . • Exascale focus
Approaching the Exascale Problems • Update traditional data model to handle modern simulation codes and a wider range of data. • Investigate how an updated data and execution model can achieve the necessary computational, I/O, and memory efficiency. • Explore methods for visualization algorithm developers to achieve these efficiency gains and better support exascale architectures.
A Traditional Data Set Model Data Set Rectilinear Structured Unstructured
Challenge: Non-Physical Data Analysis • Graph Data • topologically 0D vertices, 1D edges • non-spatial; storing X/Y/Z values is wasted space • Pure Parameter Studies • e.g. reaction rate of combustion • FOUR “spatial” dimensions • e.g. methane concentration vs oxygen concentration vs temperature vs pressure • more complex reaction → higher dimensionality [Figure: parameter-space axes labeled methane, oxygen, pressure, temperature]
Challenge: Molecular Data (e.g., LAMMPS, VASP) • To represent using vtkPolyData or vtkUnstructuredGrid: • VTK_VERTEX cells for the atoms • VTK_LINE cells for the bonds • Any field data must exist on both element types • Not only inefficient: • dummy bond strengths on the atoms? • dummy atomic numbers on the bonds? • But also incorrect: • e.g. average(BondStrength) uses dummy values from atoms? [Figure: molecule with atom labels H H C C H H; AtomicNum = 6 6 1 1 1 1; BondStr = 1 1 1 1 2]
Challenge: Side Sets (e.g. Exodus, flux surfaces) • The flow from A to B is defined on a set of faces • The flux variable is defined only on those faces • do you combine them into a single mesh? • waste space on dummy values, potentially introducing errors • or create a separate mesh and lose the mapping info? • horribly expensive and error-prone to recalculate mapping [Figure: flux surface lives inside the volumetric mesh, between regions A and B]
Challenge: Dimensionality, Refinement (e.g. GenASiS) • (a) seven (or eight) dimensional mesh • f(x,y,z,θ,ϕ,λ,F)=E, plus time • (b) refinement occurs on a per-cell basis • can’t assume per-block refinement • sometimes referred to as “unstructured AMR”
Challenge: Unique Mesh Topologies (e.g. MADNESS) • MADNESS does not have a traditional mesh • Just a quad-tree with polynomial coefficients • Up to 30 refinement levels / tree depth [Figure: spatial structure (cells 1 to 5) beside the internal tree representation (root with nodes 1 to 5)] www.vacet.org
Challenge: Very High Order Fields (e.g. MADNESS) • Legendre polynomial series at each tree node • Each tree node has K^dim coefficients • K can be up to approx. 20 • i.e. 400 coeffs per tree node in 2D, 8000 in 3D • example with K=3, dim=2 (nine coefficients): 0.834 0.592 0.003 0.592 0.003 0.010 0.003 0.010 0.007
A Traditional Data Set Model (again) Data Set Rectilinear Structured Unstructured
The EAVL Data Set Model CellSet Data Set Explicit Structured QuadTree Subset Coords Field
Example: An Unstructured Grid (with interleaved coordinates) eavlExplicitCellSet eavlDataSet eavlCoordinates eavlField
Example: An Unstructured Grid (with separated coordinates) eavlExplicitCellSet eavlDataSet eavlCoordinates eavlField #0 eavlField #1 eavlField #2
Example: A Curvilinear Grid eavlStructuredCellSet eavlDataSet eavlCoordinates eavlField #0 eavlField #1 eavlField #2
Example: A Rectilinear Grid eavlStructuredCellSet eavlDataSet eavlCoordinates eavlField #0 eavlField #1 eavlField #2
Example: High-Dimensional Grid eavlStructuredCellSet eavlDataSet eavlCoordinates eavlField #0 eavlField #1 eavlField #2 eavlField #3 eavlField #4
Example: Geospatial Data eavlStructuredCellSet eavlDataSet eavlCoordinates eavlCoordinates eavlField #0 eavlField #1 eavlField #2
Example: Molecular Data eavlExplicitCellSet #0 eavlExplicitCellSet #1 eavlDataSet eavlCoordinates eavlField #0 eavlField #1 eavlField #2
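The idea behind this layout, two cell sets sharing one data set with each field attached to only one of them, can be sketched as follows. This is a minimal illustration and not the EAVL API: `CellSet`, `Field`, `DataSet`, `MakeMolecule`, and `Average` are hypothetical names, and the values are the AtomicNum and BondStr numbers from the molecular-data challenge slide.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the real EAVL classes.
struct CellSet {
    std::string name;
    int numCells;
};

struct Field {
    std::string name;
    int cellSet;                // index of the cell set this field lives on
    std::vector<float> values;  // one value per cell of that cell set
};

struct DataSet {
    std::vector<CellSet> cellSets;
    std::vector<Field> fields;
};

// Molecule from the slide: 6 atoms (cell set 0) and 5 bonds (cell set 1).
DataSet MakeMolecule()
{
    DataSet ds;
    ds.cellSets.push_back({"atoms", 6});
    ds.cellSets.push_back({"bonds", 5});
    ds.fields.push_back({"AtomicNum", 0, {6.0f, 6.0f, 1.0f, 1.0f, 1.0f, 1.0f}});
    ds.fields.push_back({"BondStr", 1, {1.0f, 1.0f, 1.0f, 1.0f, 2.0f}});
    return ds;
}

// Average over exactly the cells the field is defined on: no dummy values.
float Average(const DataSet &ds, const Field &f)
{
    assert(f.values.size() ==
           static_cast<std::size_t>(ds.cellSets[f.cellSet].numCells));
    return std::accumulate(f.values.begin(), f.values.end(), 0.0f) /
           f.values.size();
}
```

With this layout the average bond strength is taken over the five bonds only (6/5), whereas padding the field with dummy zeros on the six atoms would give 6/11.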
Example: Face-centered Data eavlExplicitCellSet eavlAllFacesOfExplicit eavlDataSet eavlCoordinates eavlField #0 eavlField #1
Data flow networks in EAVL (or not) • A “Filter” is a stage in a data flow network • Creates a new data set from an old one • Many operations do not change a mesh structure (assuming data model is sufficiently descriptive) • Arithmetic expressions: only modifies fields • External facelist: points and structure remain • Feature edges: just a new cell set with old points • Smooth, displace, elevate: only modify coordinates • So: eavlMutator is an alternative to eavlFilter • Modifies a data set in-place
eavlMutator • In-place data set modification • Support for destructive in-place operation • free memory as you go • Execute multiple mutators simultaneously on the same data set (barring conflicts) • e.g. displace (coords) + threshold (cells) concurrently • How about data flow network support? • encapsulate an eavlMutator through an eavlFilterFromMutator facade • Of course, some operations are natively eavlFilters • can facade through eavlMutatorFromFilter (?)
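One way to picture the mutator/filter split and the facade between them is the sketch below. It is an assumption-laden illustration, not EAVL code: the `DataSet`, `Mutator`, `Filter`, `Displace`, and `FilterFromMutator` types here are hypothetical and far simpler than the real classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal data set: some coordinates and one field.
struct DataSet {
    std::vector<float> coords;
    std::vector<float> field;
};

// A mutator modifies the data set it is given, in place.
struct Mutator {
    virtual ~Mutator() {}
    virtual void Execute(DataSet &ds) = 0;
};

// A filter leaves its input untouched and returns a new data set.
struct Filter {
    virtual ~Filter() {}
    virtual DataSet Execute(const DataSet &in) = 0;
};

// Example mutator: only the coordinates change.
struct Displace : Mutator {
    float offset;
    explicit Displace(float o) : offset(o) {}
    void Execute(DataSet &ds) override
    {
        assert(!ds.coords.empty());
        for (float &c : ds.coords)
            c += offset;
    }
};

// Facade: copy the input, then run the mutator on the copy.
struct FilterFromMutator : Filter {
    Mutator &mutator;
    explicit FilterFromMutator(Mutator &m) : mutator(m) {}
    DataSet Execute(const DataSet &in) override
    {
        DataSet out = in;
        mutator.Execute(out);
        return out;
    }
};
```

The facade pays for a full copy, which is exactly the cost the in-place path avoids; a real implementation would presumably share unmodified arrays instead of copying them.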
Example: Thresholding an RGrid (a) • Explicit cells can be combined with structured coordinates. eavlStructuredCellSet eavlExplicitCellSet eavlCoordinates eavlCoordinates eavlField#0 eavlField#1 eavlField#2 eavlField#0 eavlField#1 eavlField#2
Example: Thresholding an RGrid (b) • A second Cell Set can be added which refers to the first one eavlStructuredCellSet eavlSubset eavlStructuredCellSet eavlCoordinates eavlCoordinates eavlField#0 eavlField#1 eavlField#2 eavlField#0 eavlField#1 eavlField#2
Example: Structured External Facelist • Add six new subset-cell sets to original mesh x6 eavlStructSubset eavlStructSubset eavlStructuredCellSet eavlStructCellSet eavlStructSubset eavlCoordinates eavlCoordinates eavlField#0 eavlField#1 eavlField#2 eavlField#0 eavlField#1 eavlField#2
Example: Elevating a Structured Grid • No problem-sized data modifications. • Interleaved and separated coordinates can be used simultaneously. eavlStructuredCellSet eavlStructuredCellSet eavlCoordinates eavlCoordinates eavlField#0 eavlField#1 eavlField#0 eavlField#1
Example: Elevating a Regular Grid • No problem-sized data modifications. • Some axes on logical dims, with others on the points. eavlStructuredCellSet eavlStructuredCellSet eavlCoordinates eavlCoordinates eavlField#0 eavlField#1 eavlField#2 eavlField#0 eavlField#1 eavlField#2
Concurrency at Multiple Levels • Distributed Parallelism • Message passing still works well • Avoid global communication • local domain interconnectivity information • Hybrid (e.g. spatiotemporal) parallelism • Task Parallelism • Fine-grain dependency tracking • e.g. displace (coords) + threshold (cells) concurrently • eavlMutator helps • single eavlDataSet container class helps • Thread Parallelism • Fine-grain data parallelism; CUDA, OpenMP
Data Parallelism for Developers • Functor + iterator paradigm • Iteration patterns for mesh topologies • CUDA + OpenMP execution back-ends
A Simple Data-Parallel Operation

    // Internal library API provides this
    void CellToCellDivide(Field &a, Field &b, Field &c)
    {
        for_each(i)
            c[i] = a[i] / b[i];
    }

    // Algorithm developer writes this
    void CalculateDensity(...)
    {
        //...
        CellToCellDivide(mass, volume, density);
    }
Functor + Iterator Approach

    // Algorithm developer writes this
    void CalculateDensity(...)
    {
        //...
        CellToCellBinaryOp(mass, volume, density, Divide());
    }

    // Internal library API provides this
    template <class T>
    void CellToCellBinaryOp(Field &a, Field &b, Field &c, T &f)
    {
        for_each(i)
            f(a[i], b[i], c[i]);
    }

    struct Divide
    {
        void operator()(float &a, float &b, float &c) { c = a / b; }
    };
Custom Functor

    // Algorithm developer writes this
    void CalculateDensity(...)
    {
        //...
        CellToCellBinaryOp(mass, volume, density, MyFunctor());
    }

    // Internal library API provides this
    template <class T>
    void CellToCellBinaryOp(Field &a, Field &b, Field &c, T &f)
    {
        for_each(i)
            f(a[i], b[i], c[i]);
    }

    // Algorithm developer writes this
    struct MyFunctor
    {
        void operator()(float &a, float &b, float &c) { c = a + 2*log(b); }
    };
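The slide code is pseudocode (`for_each(i)` and `Field` are not defined). A compilable approximation of the same functor + iterator split might look like this, assuming a field is simply a `std::vector<float>` and the loop is serial; the real library presumably dispatches to OpenMP or CUDA instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption: a field is just one float per cell.
using Field = std::vector<float>;

// Library side: owns the iteration, knows nothing about the math.
template <class Functor>
void CellToCellBinaryOp(const Field &a, const Field &b, Field &c, Functor f)
{
    assert(a.size() == b.size());
    c.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        f(a[i], b[i], c[i]);
}

// Developer side: only the per-element math.
struct Divide {
    void operator()(float a, float b, float &c) const { c = a / b; }
};

struct MyFunctor {
    void operator()(float a, float b, float &c) const
    {
        c = a + 2 * std::log(b);
    }
};
```

Taking the functor by value (rather than by non-const reference as on the slide) lets callers pass a temporary such as `Divide()` directly.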
Functor Efficiency on CPU and GPU • Data: noise.silo • Surface normal
Binding Values to Functors

    struct ScaleByConst
    {
        float scale;
        ScaleByConst(float s) : scale(s) { }
        void operator()(float &a, float &b) { b = a * scale; }
    };

    void CalculateDensity(...)
    {
        //...
        cell_volume = mesh_volume / mesh_numcells;
        CellToCellUnaryOp(mass, density, ScaleByConst(1.0/cell_volume));
    }
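A self-contained version of the bound-constant pattern is sketched below. The `Field` alias, the serial `CellToCellUnaryOp`, and the explicit `CalculateDensity` parameters are assumptions made so the example compiles; the slide elides the real signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: a field is just one float per cell.
using Field = std::vector<float>;

// Serial stand-in for the library's unary map over cells.
template <class Functor>
void CellToCellUnaryOp(const Field &in, Field &out, Functor f)
{
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        f(in[i], out[i]);
}

// The constant is captured once at construction, then reused per element.
struct ScaleByConst {
    float scale;
    explicit ScaleByConst(float s) : scale(s) {}
    void operator()(float a, float &b) const { b = a * scale; }
};

// Hypothetical signature: uniform cells, so density = mass / cell_volume.
Field CalculateDensity(const Field &mass, float mesh_volume,
                       std::size_t mesh_numcells)
{
    assert(mesh_numcells > 0 && mass.size() == mesh_numcells);
    float cell_volume = mesh_volume / mesh_numcells;
    Field density;
    CellToCellUnaryOp(mass, density, ScaleByConst(1.0f / cell_volume));
    return density;
}
```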
Map with 1 input, 1 output
Simplest data-parallel operation. Each result item can be calculated from its corresponding input item alone.

    index    0  1  2  3  4  5  6  7  8  9 10 11
    x        3  7  0  1  4  0  0  4  5  3  1  0
    result   6 14  0  2  8  0  0  8 10  6  2  0

    struct f { float operator()(float x) { return x*2; } };
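As a serial reference, the map above can be written with `std::transform`. The `Map` wrapper and the `Twice` functor name are illustrative choices, not library names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Same functor as on the slide, given a descriptive (hypothetical) name.
struct Twice {
    float operator()(float x) const { return x * 2; }
};

// result[i] depends only on x[i], so every element is independent.
template <class Functor>
void Map(const std::vector<float> &x, std::vector<float> &result, Functor f)
{
    assert(result.size() == x.size());
    std::transform(x.begin(), x.end(), result.begin(), f);
}
```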
Map with 2 inputs, 1 output
With two input arrays, the functor takes two inputs. You can also have multiple outputs.

    index    0  1  2  3  4  5  6  7  8  9 10 11
    x        3  7  0  1  4  0  0  4  5  3  1  0
    y        2  4  2  1  8  3  9  5  5  1  2  1
    result   5 11  2  2 12  3  9  9 10  4  3  1

    struct f { float operator()(float a, float b) { return a+b; } };
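The two-input form maps onto the binary overload of `std::transform`. Again a serial sketch with illustrative names (`Map2`, `Add`).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Add {
    float operator()(float a, float b) const { return a + b; }
};

// result[i] = f(x[i], y[i]); all three arrays have the same length.
template <class Functor>
void Map2(const std::vector<float> &x, const std::vector<float> &y,
          std::vector<float> &result, Functor f)
{
    assert(x.size() == y.size() && result.size() == x.size());
    std::transform(x.begin(), x.end(), y.begin(), result.begin(), f);
}
```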
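Scatter with 1 input (and thus 1 output)
Possibly inefficient, risks of race conditions and uninitialized results. (Can also scatter to larger array if desired.) Often used in a scatter_if-type construct. No functor.

    index    0  1  2  3  4  5  6  7  8  9 10 11
    x        3  7  0  1  4  0  0  4  5  3  1  0
    indices  2  4  1  5  5  0  4  2  1  2  1  4
    result   0  1  3  -  0  4

Slot 3 of the result is never written (no entry of indices equals 3), and slots 1, 2, 4 and 5 each receive several writes; the values shown are the ones written last in index order.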
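A serial scatter makes both hazards visible. In the sketch below (function name `Scatter` is illustrative) the loop runs in index order, so the last writer wins deterministically; a parallel version has no such guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// result[indices[i]] = x[i]. Slots that no index refers to keep whatever
// value the caller put there; duplicate indices overwrite each other.
void Scatter(const std::vector<float> &x, const std::vector<int> &indices,
             std::vector<float> &result)
{
    assert(x.size() == indices.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        assert(indices[i] >= 0 &&
               static_cast<std::size_t>(indices[i]) < result.size());
        result[indices[i]] = x[i];
    }
}
```

Pre-filling the result with a sentinel, as a caller might do, is one way to detect the slots that were never written.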
Gather with 1 input (and thus 1 output)
Unlike scatter, no risk of uninitialized data or race condition. Plus, parallelization is over a shorter indices array, and caching helps more, so can be more efficient. No functor.

    index    0  1  2  3  4  5  6  7  8  9 10 11
    x        3  7  0  1  4  0  0  4  5  3  1  0
    indices  1  9  6  9  3
    result   7  3  0  3  1
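The corresponding serial gather is sketched below (the name `Gather` is illustrative). Each output element is written exactly once, which is why the race and uninitialized-slot concerns of scatter do not arise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// result[i] = x[indices[i]]; the output has one entry per index.
std::vector<float> Gather(const std::vector<float> &x,
                          const std::vector<int> &indices)
{
    std::vector<float> result(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        assert(indices[i] >= 0 &&
               static_cast<std::size_t>(indices[i]) < x.size());
        result[i] = x[indices[i]];
    }
    return result;
}
```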
Reduction with 1 input (and thus 1 output)
Example: max-reduction. Sum is also common. Often a fat-tree-based implementation.

    index    0  1  2  3  4  5  6  7  8  9 10 11
    x        3  7  0  1  4  0  0  4  5  3  1  0
    result   7

    struct f { float operator()(float a, float b) { return a>b ? a : b; } };
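A serial stand-in for the reduction uses `std::accumulate` with the slide's functor. This is only a reference for the result; it folds left to right rather than using the tree-shaped combination a parallel implementation would. `Reduce` and `Max` are illustrative names.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

struct Max {
    float operator()(float a, float b) const { return a > b ? a : b; }
};

// Combine all elements with an associative binary functor.
template <class Functor>
float Reduce(const std::vector<float> &x, Functor f)
{
    assert(!x.empty());
    return std::accumulate(x.begin() + 1, x.end(), x.front(), f);
}
```

A tree-based parallel version gives the same answer only because max (like sum over integers) is associative; that property is what the functor must guarantee.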
Inclusive Prefix Sum (a.k.a. Scan) with 1 input/output
Value at result[i] is sum of values x[0]..x[i]. Surprisingly efficient parallel implementation. Basis for many more complex algorithms. No functor.

    index    0  1  2  3  4  5  6  7  8  9 10 11
    x        3  7  0  1  4  0  0  4  5  3  1  0
    result   3 10 10 11 15 15 15 19 24 27 28 28
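For checking results, the inclusive scan has a direct serial equivalent in `std::partial_sum`. The wrapper name `InclusiveScan` is illustrative, and this sketch says nothing about how the parallel version is implemented.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// result[i] = x[0] + x[1] + ... + x[i]
void InclusiveScan(const std::vector<float> &x, std::vector<float> &result)
{
    assert(result.size() == x.size());
    std::partial_sum(x.begin(), x.end(), result.begin());
}
```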
Exclusive Prefix Sum (a.k.a. Scan) with 1 input/output
Initialize with zero, value is sum of only up to x[i-1]. May be more commonly used than inclusive scan. No functor.

    index    0  1  2  3  4  5  6  7  8  9 10 11
    x        3  7  0  1  4  0  0  4  5  3  1  0
    result   0  3 10 10 11 15 15 15 19 24 27 28
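The exclusive variant is written out by hand below so that it does not depend on C++17's `std::exclusive_scan`; the function name `ExclusiveScan` is illustrative and the loop is serial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// result[0] = 0 and result[i] = x[0] + ... + x[i-1] for i > 0.
void ExclusiveScan(const std::vector<float> &x, std::vector<float> &result)
{
    assert(result.size() == x.size());
    float running = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        result[i] = running;  // sum of everything before element i
        running += x[i];
    }
}
```

One common use: if x[i] is the number of outputs item i will produce, result[i] is the offset at which item i should start writing, which lets a variable-size output be filled without conflicts.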
Example: Surface Normal • For each 2D cell (i.e. each polygon): • Get three adjacent points • Pair-wise vector subtract • Cross product • Data-parallel: • Repeat for all cells
Example: Surface Normal • OUTPUT: • 3-component surface normals array on the mesh CELLS • example: length = 4 • INPUT: • 3-dimensional coordinates array on the mesh NODES • example: length = 9
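The per-cell recipe (three points, two edge vectors, one cross product) can be sketched as a plain serial loop. The `Vec3` type, the quad-only connectivity, and the function names are assumptions for illustration; the normals are left unnormalized because the slides do not mention a normalization step.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x, y, z;
};

Vec3 Sub(const Vec3 &a, const Vec3 &b)
{
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}

Vec3 Cross(const Vec3 &a, const Vec3 &b)
{
    return Vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

// Input lives on nodes (coords); output lives on cells (one normal each).
// Assumption: every cell is a quad given by four node ids.
std::vector<Vec3> CellNormals(const std::vector<Vec3> &coords,
                              const std::vector<std::array<int, 4>> &cells)
{
    std::vector<Vec3> normals(cells.size());
    for (std::size_t c = 0; c < cells.size(); ++c) {
        // Three adjacent points of the polygon.
        const Vec3 &p0 = coords[cells[c][0]];
        const Vec3 &p1 = coords[cells[c][1]];
        const Vec3 &p2 = coords[cells[c][2]];
        // Pair-wise subtract, then cross product.
        normals[c] = Cross(Sub(p1, p0), Sub(p2, p0));
    }
    return normals;
}
```

For a 3x3 grid of nodes in the z = 0 plane this yields four cell normals from nine node coordinates, matching the lengths quoted on the slide. Each iteration reads only its own cell's nodes and writes only its own output slot, so the loop is trivially data-parallel.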
Under the Covers: Node-to-Cell on CPU

    void NodeToCellOp3::ExecuteCPU()
    {
        #pragma omp parallel for
        for (int i = 0; i < input->NumCells(); i++)
        {
            // get cell node indices (i is the cell index)
            int nNodes, nodeIds[8];
            float nodeValues[3][8];
            conn.GetCellNodes(i, nNodes, nodeIds);

            // get coordinates for nodes
            // (inner counter renamed to n so it does not shadow the cell index)
            for (int n = 0; n < nNodes; n++)
            {
                nodeValues[0][n] = array0[nodeIds[n]];
                nodeValues[1][n] = array1[nodeIds[n]];
                nodeValues[2][n] = array2[nodeIds[n]];
            }

            // call functor
            functor(nodeValues[0], nodeValues[1], nodeValues[2],
                    &out0[i], &out1[i], &out2[i]);
        }
    }