Ax=b: The Link between Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas and Micro-FE Analysis of Whole Vertebral Bodies in Orthopaedic Biomechanics • Mark F. Adams • SciDAC, 27 June 2005
Outline • Algebraic multigrid (AMG) introduction • Micro-FE bone modeling • Olympus parallel FE framework • Scalability study on IBM SPs • Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas
The Multigrid V-cycle • Smoothing on the finest grid • Restriction (R) to the first coarse grid (note: smaller grid) • Prolongation (P = R^T) back to the finest grid • Multigrid: smoothing and coarse grid correction (projection)
Multigrid V(n1,n2)-cycle • Given smoother S and coarse grid space (P) • Columns of the “prolongation” operator P: discrete representation of the coarse grid space • Function u = MG-V(A, f) • if A is small • u ← A^-1 f • else • u ← S^n1(f, u) -- n1 steps of smoother (pre) • r_H ← P^T (f - A u) • u_H ← MG-V(P^T A P, r_H) -- recursion (Galerkin) • u ← u + P u_H • u ← S^n2(f, u) -- n2 steps of smoother (post) • Iteration matrix with R = P^T: T = S (I - P (R A P)^-1 R A) S • multiplicative
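The MG-V recursion above can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions, not the Prometheus implementation: matrices are small dense lists, the smoother S is damped Jacobi, and the coarse space comes from a hypothetical 1D linear-interpolation helper (`linear_prolongation`) rather than from an algebraic coarsening. The names `mg_v`, `jacobi`, `poisson_1d` and the `min_size` cutoff are made up for the example.

```python
# Dense, pure-Python sketch of the MG-V(n1, n2) recursion; tiny problems only.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(A, b):
    """Coarsest-grid solve u = A^-1 f: Gaussian elimination, partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def jacobi(A, f, u, steps, omega=2.0 / 3.0):
    """Assumed smoother S: u <- u + omega * D^-1 (f - A u), repeated `steps` times."""
    for _ in range(steps):
        Au = matvec(A, u)
        u = [ui + omega * (fi - aui) / A[i][i]
             for i, (ui, fi, aui) in enumerate(zip(u, f, Au))]
    return u

def linear_prolongation(n_fine):
    """Hypothetical P: 1D linear interpolation from (n_fine - 1) // 2 coarse points."""
    n_coarse = (n_fine - 1) // 2
    P = [[0.0] * n_coarse for _ in range(n_fine)]
    for j in range(n_coarse):
        i = 2 * j + 1
        P[i][j] = 1.0
        P[i - 1][j] += 0.5
        P[i + 1][j] += 0.5
    return P

def mg_v(A, f, u=None, n1=2, n2=2, min_size=3):
    """One V(n1, n2) cycle for A u = f, starting from u (zero if omitted)."""
    n = len(f)
    if n <= min_size:                      # "if A is small": direct solve
        return solve(A, f)
    if u is None:
        u = [0.0] * n
    u = jacobi(A, f, u, n1)                # pre-smoothing
    P = linear_prolongation(n)
    R = transpose(P)                       # R = P^T
    r = [fi - aui for fi, aui in zip(f, matvec(A, u))]
    r_H = matvec(R, r)                     # restrict the residual
    A_H = matmul(R, matmul(A, P))          # Galerkin coarse operator P^T A P
    u_H = mg_v(A_H, r_H, None, n1, n2, min_size)
    u = [ui + ci for ui, ci in zip(u, matvec(P, u_H))]  # coarse grid correction
    return jacobi(A, f, u, n2)             # post-smoothing

def poisson_1d(n):
    """Test matrix for the example: tridiagonal (-1, 2, -1)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]
```

Used as a stationary iteration, the cycle is simply repeated, for example `u = mg_v(A, f, u)` in a loop with `A = poisson_1d(15)`. In the talk's setting the same cycle serves instead as a preconditioner inside a Krylov solver.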
Smoothed Aggregation • Coarse grid space & smoother → MG method • Piecewise constant function: “plain” aggregation (P0) • Start with kernel vectors B of the operator • e.g., 6 RBMs in elasticity • Nodal aggregation • “Smoothed” aggregation: lower the energy of the functions • One Jacobi iteration: P ← (I - D^-1 A) P0
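As a rough illustration of the two steps on this slide, the sketch below builds a tentative prolongator P0 from aggregates and one kernel vector B, then applies one Jacobi step to it. It is a simplification under assumptions: a scalar problem with a single kernel vector (elasticity would carry 6 rigid body modes per aggregate), hand-picked aggregates, dense lists, and a damping parameter `omega` that is not on the slide (the default 1.0 reproduces the formula as written). The function names are invented for the example.

```python
def poisson_1d(n):
    """Example operator: tridiagonal (-1, 2, -1)."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

def tentative_prolongator(aggregates, B):
    """P0: column j holds the kernel vector B restricted to aggregate j."""
    P0 = [[0.0] * len(aggregates) for _ in range(len(B))]
    for j, nodes in enumerate(aggregates):
        for i in nodes:
            P0[i][j] = B[i]
    return P0

def smooth_prolongator(A, P0, omega=1.0):
    """P = (I - omega * D^-1 A) P0; omega is an assumed damping knob."""
    n, m = len(P0), len(P0[0])
    P = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            a_p0 = sum(A[i][k] * P0[k][j] for k in range(n))  # (A P0)[i][j]
            P[i][j] = P0[i][j] - omega * a_p0 / A[i][i]
    return P

# Nine nodes grouped into three aggregates of three; B is the constant vector.
A = poisson_1d(9)
aggregates = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
P0 = tentative_prolongator(aggregates, [1.0] * 9)
P = smooth_prolongator(A, P0)
```

In this small case the row of P for node 3, which sits on the edge between the first two aggregates, changes from the piecewise-constant (0, 1, 0) to (0.5, 0.5, 0), while node 4 in the middle of its aggregate keeps (0, 1, 0). The smoothing therefore spreads each basis function into the neighbouring aggregate, which is what lowers its energy.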
Outline • Algebraic multigrid (AMG) introduction • Micro-FE bone modeling • Olympus parallel FE framework • Scalability study on IBM SPs • Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas
Trabecular Bone • Figure labels: cortical bone, trabecular bone • 5-mm cube
Methods: FE modeling • Micro-computed tomography: µCT @ 22 µm resolution • 3D image → FE mesh • 2.5 mm cube, 44 µm elements • Mechanical testing: E, yield, ultimate, etc.
The vertebral body shown is fairly healthy. It comes from an 80-year-old female and is a T-10, a thoracic vertebra, so it is close to the mid-spine. Research is usually done from T-10 downward to the lumbar vertebral bodies. There are 12 thoracic vertebral bodies and 5 lumbar; the numbers go up as you go down.
Motivation • Figure: 1 mm slice from vertebral body • Calibrate material models for continuum elements • e.g., explicit computation of a yield surface • Validation for low order model • Investigation of effects that are not accessible with lower order models • role of cortical shell in load carrying of vertebra • effects of drug treatment on continuum properties
Outline • Algebraic multigrid (AMG) introduction • Micro-FE bone modeling • Olympus parallel FE framework • Scalability study on IBM SPs • Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas
Computational Architecture • Athena: parallel FE • ParMetis: parallel mesh partitioner (University of Minnesota) • Prometheus: multigrid solver • FEAP: serial general-purpose FE application (University of California) • PETSc: parallel numerical libraries (Argonne National Laboratory) • Diagram labels: FE mesh input file; ParMetis; Athena (partition to SMPs); FE input file (in memory); File; FEAP; pFEAP; Material Card; Olympus; METIS; Prometheus; PETSc; Silo DB; Visit
Geometric & material nonlinear • 2.25% strain • 8 procs. • DataStar (SP4 at UCSD)
Outline • Algebraic multigrid (AMG) introduction • Micro-FE bone modeling • Olympus parallel FE framework • Scalability study on IBM SPs • Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas
Vertebral Body With Shell • Figure: 80 µm w/ shell • Large deformation elasticity • 6 load steps (3% strain) • Scaled speedup • ~131K dof/processor • 7 to 537 million dof • 4 to 292 nodes • IBM SP Power3 • 14 of 16 procs/node used • Double/Single Colony switch
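The figures on this slide can be cross-checked with a little arithmetic. The snippet below only restates the slide's numbers (node counts, 14 processors used per node, total dof); the function names are made up for the example.

```python
def processors(nodes, procs_per_node=14):
    """Processors in use when 14 of the 16 processors per node are used."""
    return nodes * procs_per_node

def dof_per_processor(total_dof, nodes, procs_per_node=14):
    return total_dof / processors(nodes, procs_per_node)

# Smallest and largest runs quoted on the slide.
for nodes, total_dof in [(4, 7e6), (292, 537e6)]:
    print(nodes, processors(nodes), round(dof_per_processor(total_dof, nodes)))
```

The largest run works out to 4088 processors and about 131K dof per processor, matching the slide. The smallest run gives about 125K, which is consistent with the "7 million" figure being rounded.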
Scalability • Figure: 80 µm w/o shell • Inexact Newton • CG linear solver • Variable tolerance • Smoothed aggregation AMG preconditioner • Nodal block diagonal smoothers: • 2nd order Chebyshev (additive) • Gauss-Seidel (multiplicative)
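To show where the preconditioner and the variable tolerance enter, here is a minimal preconditioned CG sketch in pure Python. It is not the solver used in the study: the preconditioner is passed in as a callback `apply_M`, and the example plugs in plain Jacobi as a stand-in where the smoothed aggregation AMG cycle would go. The `rtol` argument is the knob an inexact Newton loop would loosen or tighten from one Newton step to the next. All names here are assumptions for illustration.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def pcg(A, b, apply_M, rtol=1e-8, max_it=200):
    """Preconditioned CG for SPD A; apply_M(r) returns an approximation of A^-1 r."""
    x = [0.0] * len(b)
    r = list(b)                      # residual for the zero initial guess
    z = apply_M(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(max_it):
        if dot(r, r) ** 0.5 <= rtol * b_norm:
            return x, it             # relative residual tolerance reached
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_M(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_it

# Example: 1D Poisson matrix with a Jacobi stand-in preconditioner.
n = 10
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [1.0] * n
x, iterations = pcg(A, b, lambda r: [ri / A[i][i] for i, ri in enumerate(r)])
```

Passing the preconditioner as a function keeps the Krylov loop independent of how the multigrid hierarchy is built, so one V-cycle per call could replace the Jacobi lambda without touching `pcg`.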
Computational phases • Mesh setup (per mesh): • Coarse grid construction (aggregation) • Graph processing • Matrix setup (per matrix): • Coarse grid operator construction • Sparse matrix triple product RAP (expensive for S.A.) • Subdomain factorizations • Solve (per RHS): • Matrix vector products (residuals, grid transfer) • Smoothers (Matrix vector products)
131K dof/proc • Plot: Flops/sec/proc • 0.47 Teraflop/s on 4088 processors
Outline • Algebraic multigrid (AMG) introduction • Micro-FE bone modeling • Olympus parallel FE framework • Scalability study on IBM SPs • Gyrokinetic Particle Simulations of Turbulent Transport in Burning Plasmas
Finite Element (FEM) Elliptic Solver Developed for GTC • Figure: global field-aligned mesh • FEM adapted for logically non-rectangular grids • Need adjustments of elements at different toroidal angles • Linear sparse matrix solver • PETSc (ANL) • Enabled implementing split-weight (Manuilskiy & Lee, PoP 2000) and hybrid electron models (Lin & Chen, PoP 2001) • Ongoing studies of kinetic electron effects on ITG and TEM turbulence • Ongoing studies of electromagnetic turbulence
Performance • Multigrid preconditioned Krylov solver • Prometheus (Columbia) & HYPRE (LLNL) • Scaled speedup • ~38K dof per processor • 1 to 32 processors/plane • 8 planes, 20 time steps, 4 particles per cell
Thank You • Gordon Bell Prize winner 2004: “Ultrascalable implicit finite element analyses in solid mechanics with over a half a billion degrees of freedom,” M.F. Adams, H.H. Bayraktar, T.M. Keaveny, P. Papadopoulos, ACM/IEEE Proceedings of SC2004: High Performance Networking and Computing