Tu AHM 2005

Hercules: A System for End-to-End Parallel Finite Element Earthquake Simulations: 

Tiankai Tu, Computer Science Department, Carnegie Mellon University

Joint work with: Jacobo Bielak, Omar Ghattas, Julio Lopez, Kwan-Liu Ma (UC Davis), David O’Hallaron, Leonardo Ramirez-Guzman, Hongfeng Yu (UC Davis)

Slide2: 

Traditional file-based finite element simulations

FEM mesh: ~100 GB

Partitioned mesh: ~1000 files

Spatial-temporal output: ~10 TB

Slide3: 

Finite element earthquake modeling

Meshing: Octree-based, wavelength-adaptive unstructured hexahedral meshes. Typically a 10^3-fold reduction in the number of grid points vs. uniform grids. A computational database system that breaks the memory barrier. Platform: single-processor servers with enough disk space.

Partitioning: Graph-theory-based partitioner (Metis/ParMETIS). Platform: single-processor servers or multi-processor parallel computers.

Solving: Galerkin trilinear finite elements in space. Explicit central difference in time. Low memory requirement of stencil-based methods. Good cache performance (due to element-based matrix-vector operations). Platform: multi-thousand-processor parallel computers.

Visualizing: Ray-casting algorithm with multi-level resolutions. Overlapping I/O with volume rendering. Platform: multi-hundred-processor parallel computers.
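The wavelength-adaptive idea can be illustrated with a minimal sketch. It assumes a common sizing rule that the slides do not spell out: an element edge should be no longer than the shortest wavelength (shear velocity divided by maximum frequency) divided by a chosen number of points per wavelength. The function names, the default of 10 points per wavelength, and the 3200 m/s stiff-rock velocity are illustrative assumptions; the 100 km domain edge, 100 m/s minimum shear velocity, and 1 Hz frequency are the figures quoted later in this deck.

```python
def required_octree_level(domain_edge_m, vs_m_per_s, f_max_hz, points_per_wavelength=10):
    """Smallest octree level whose element edge resolves the shortest wavelength.

    Assumed rule (not taken from the slides): edge <= (vs / f_max) / points_per_wavelength.
    """
    target_edge = (vs_m_per_s / f_max_hz) / points_per_wavelength
    level = 0
    # Each refinement level halves the element edge.
    while domain_edge_m / 2 ** level > target_edge:
        level += 1
    return level


def element_edge(domain_edge_m, level):
    """Edge length of an octant at the given refinement level."""
    return domain_edge_m / 2 ** level


# Soft sediment (100 m/s) needs much deeper refinement than stiff rock
# (3200 m/s, a hypothetical value) at the same 1 Hz maximum frequency.
soft_level = required_octree_level(100_000.0, 100.0, 1.0)
stiff_level = required_octree_level(100_000.0, 3200.0, 1.0)
print(soft_level, element_edge(100_000.0, soft_level))
print(stiff_level, element_edge(100_000.0, stiff_level))
```

Because the element size follows the local wave speed, only the slow regions are refined deeply, which is where the saving over a uniform grid comes from.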

Slide4: 

Pitfalls of traditional simulation processes

Heterogeneous platforms: Simulation components run on different computer systems. Input and output require pre-processing and post-processing.

File-based data interface: Massive datasets (hundred-gigabyte to multi-terabyte). I/O and network bandwidth-bound.

Offline visualization: No steering capability at solving time. Excessive volume of spatial-temporal output.

Slide5: 

New online end-to-end supercomputing approach

Goals: Achieve high scalability and performance. Avoid intermediate I/O. Support online visualization.

Slide6: 

Hercules system features

End-to-End: Problem description as input (SCEC CVM material database). 3D volume animation as output (a sequence of JPEG images).

One platform: All simulation components run on the same processors.

Tightly integrated: One executable. No intermediate I/O.

Scalable: Scales up to thousands of processors. Good iso-granular and fixed-size parallel efficiency.

Slide7: 

Hercules system overview

[Diagram labels: PEs (processing elements), interconnect network, massive storage system]

A parallel, distributed octree as the backbone data structure.

Parallel abstract data types as the interface between simulation components.

Generate the unstructured finite element mesh upfront (before the solver starts running).

Produce 3D volume rendering images while the solver is running (at a specified visualization interval).
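The pattern of rendering while the solver runs can be sketched as a time-stepping loop that hands the current field to a render callback every few steps, with no intermediate files. This is only an illustration under stated assumptions, not the Hercules code: a 1D finite-difference wave equation stands in for the 3D finite element solver, and `central_difference_step`, `run_online`, `render`, and `courant_sq` are hypothetical names. The time integration is the explicit central difference scheme named earlier in the deck.

```python
def central_difference_step(u_prev, u_curr, courant_sq):
    """One explicit central-difference step of a 1D wave equation (fixed ends)."""
    n = len(u_curr)
    u_next = [0.0] * n  # end points stay at zero
    for i in range(1, n - 1):
        laplacian = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + courant_sq * laplacian
    return u_next


def run_online(u0, num_steps, vis_interval, render, courant_sq=0.25):
    """Advance the solver and render in-memory every vis_interval steps."""
    u_prev, u_curr = list(u0), list(u0)  # approximates zero initial velocity
    frames = []
    for step in range(1, num_steps + 1):
        u_prev, u_curr = u_curr, central_difference_step(u_prev, u_curr, courant_sq)
        if step % vis_interval == 0:
            # The visualizer reads the solver state directly; nothing is written to disk.
            frames.append(render(step, u_curr))
    return u_curr, frames


# Usage: a "renderer" that just records the peak amplitude of each frame.
final, frames = run_online(
    [0.0, 0.0, 1.0, 0.0, 0.0], num_steps=10, vis_interval=5,
    render=lambda step, u: (step, max(abs(v) for v in u)),
)
print(frames)
```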

Slide8: 

Meshing

Operations: NEWTREE, REFINETREE, COARSENTREE, BALANCETREE, PARTITIONTREE, EXTRACTMESH.

Octree and mesh handles are passed to the solver and visualizer.

Upfront adaptation guided by material property or geometry.

Online adaptation guided by the solver’s output (e.g. error estimates).

[Figure labels: anchored nodes, dangling nodes, hex elements]

In-situ mesh generation. Space-filling-curve-based partitioning.
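Space-filling-curve partitioning can be sketched as follows. The slides do not say which curve is used, so this example assumes a Morton (Z-order) curve: each octant's integer coordinates are bit-interleaved into a single key, the octants are sorted by key, and the sorted sequence is cut into contiguous, nearly equal chunks, one per processor. The names `morton_key` and `partition_by_curve` are hypothetical.

```python
def morton_key(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into one Z-order key."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (3 * b)
        key |= ((y >> b) & 1) << (3 * b + 1)
        key |= ((z >> b) & 1) << (3 * b + 2)
    return key


def partition_by_curve(cells, num_parts):
    """Sort cells along the curve and cut into contiguous, near-equal chunks."""
    ordered = sorted(cells, key=lambda c: morton_key(*c))
    base, extra = divmod(len(ordered), num_parts)
    parts, start = [], 0
    for p in range(num_parts):
        size = base + (1 if p < extra else 0)  # spread the remainder over the first parts
        parts.append(ordered[start:start + size])
        start += size
    return parts


# Usage: a 4 x 4 x 4 block of octants split across 4 processors.
cells = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]
parts = partition_by_curve(cells, 4)
print([len(p) for p in parts])
```

Cells that are close along the curve tend to be close in space, so each contiguous chunk is a reasonably compact subdomain without running a graph partitioner.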

Slide9: 

Solving Obtain abstract mesh handle (similar to UNIX FILE abstraction). Support both point source and kinematic source.

Slide10: 

Visualizing Access simulation results indirectly via the backbone octree structure. Obtain visualization parameters at runtime. Unbalanced workload on different processors.

Slide11: 

Portability of the Hercules system

PSC AlphaServer cluster (lemieux.psc.edu): 3,000 HP Alpha processors (750 4-way SMP nodes) + Tru64 UNIX. Quadrics interconnect network. HP proprietary compiler and MPI implementation.

CMU lab server (manteo.cmcl.cs.cmu.edu): Dual Intel Xeon processors + Linux. Shared memory. GNU compilers and MPICH-p4.

USC HPCC system (hpc.usc.edu): Large number of Intel Xeon processors + Linux. Myrinet interconnect. GNU compilers and MPICH-gm.

SCEC cluster (dynamic.usc.edu): 16 AMD Opteron processors + Linux. Ethernet interconnect. GNU compilers and MPICH-p4.

Slide12: 

Performance characteristics (Lemieux at PSC)

Iso-granular scalability: 82% efficiency on 784 processors; 60% efficiency on 2000 processors.

Fixed-size scalability: 1 Hz simulation (114M mesh nodes); 86% efficiency on 2048 processors (base: 128 PEs).

Processor utilization: ~635 MFlops/sec/PE; 33% of the peak performance (2 GFlops/sec/PE).

LA Basin (100 km x 100 km x 37.5 km), SCEC CVM Version 2.0; minimum shear wave velocity 100 m/s.
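For readers unfamiliar with the two scalability measures, here is a minimal sketch using their usual definitions, which the slides do not state explicitly. Fixed-size (strong-scaling) efficiency compares processor-time products for the same problem against a base run; iso-granular (weak-scaling) efficiency compares run times when the work per processor is held constant. The function names and all timings below are hypothetical and are not measurements from the talk.

```python
def fixed_size_efficiency(t_base, p_base, t, p):
    """Strong scaling: same problem size, efficiency relative to a base run."""
    return (t_base * p_base) / (t * p)


def iso_granular_efficiency(t_base, t):
    """Weak scaling: same work per processor, so ideal run time is constant."""
    return t_base / t


# Hypothetical timings (seconds), chosen only to illustrate the arithmetic.
print(fixed_size_efficiency(t_base=1600.0, p_base=128, t=125.0, p=2048))
print(iso_granular_efficiency(t_base=100.0, t=125.0))
```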

Slide13: 

Summary and future work

Breakthroughs: First large-scale online, end-to-end, parallel finite element simulation system. First parallel in-situ unstructured finite element mesher that scales to thousands of processors. First parallel visualization pipeline that runs on thousands of processors.

Practical implications: Turn 'heroic' runs into daily exercises. Provide a standalone computational module. Enable solving terascale/petascale inverse problems.

Future work: Run TeraShake. Develop browser-based online visualization steering capability. Extend the system to include online data compression and indexing for volume output. Integrate with the SCEC/CME workflow system. Couple with inverse solvers.