Analyzing large data sets quickly
Hasan Abbasi, Matthew Wolf, Jay Lofstead, Fang Zheng, Greg Eisenhauer, Karsten Schwan
Scott Klasky, Ron Oldfield, Norbert Podhorszki
HPC project thrusts
• Staging
• ADIOS
• Adaptive I/O
• PreDatA
• EnStage
Staging
• Use additional resources in the compute node
• ADIOS staging method, included in the version 1.2 release
• High-performance asynchronous output
• State-aware schedulers for limiting interference
Hasan Abbasi, Matthew Wolf, Greg Eisenhauer, Scott Klasky, Karsten Schwan, Fang Zheng. "DataStager: Scalable Data Staging Services for Petascale Applications." Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing (HPDC '09), Garching, Germany, June 11-13, 2009.
Hasan Abbasi, Jay Lofstead, Fang Zheng, Scott Klasky, Karsten Schwan, Matthew Wolf. "Extending I/O through High Performance Data Services." Cluster Computing 2009, New Orleans, LA, August 2009.
Julian Cummings, Alexander Sim, Arie Shoshani, Jay Lofstead, Karsten Schwan, Ciprian Docan, Manish Parashar, Scott Klasky, Norbert Podhorszki, Roselyne Barreto. "EFFIS: An End-to-end Framework for Fusion Integrated Simulation." PDP 2010: The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, Pisa, Italy, February 2010.
EnStage
• Extends the staging concept to allow computation within the application
• Uses C-o-D with binary code generation to flexibly move computation
• Global operations are performed in the staging area
• Feature extraction and function specialization enable pseudo-collective operations in independent SmartTaps
Customizations
• Copy-based sampling (ST1): The ADIOS buffer is created and then subsampled
• Inline sampling (ST2): The data is subsampled as the buffer is created
• Staging area sampling (C-Stager): The data is subsampled in the staging area
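The three sampling placements above differ mainly in where the full data set is held and how much of it crosses the network. The following is a minimal Python sketch of that trade-off, not the ADIOS or EnStage implementation: the function names, the `Outcome` record, and the fixed-stride sampling rule are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Sequence


class Outcome(NamedTuple):
    """Hypothetical record of what each placement costs (illustrative only)."""
    kept: List[float]              # values that survive subsampling
    buffered_on_compute_node: int  # values held in the compute-node buffer
    sent_to_staging: int           # values transferred to the staging area


def copy_based_sampling(values: Sequence[float], stride: int) -> Outcome:
    """ST1 analogue: build the full buffer, then subsample a copy of it."""
    full_buffer = list(values)          # the whole output is materialized
    kept = full_buffer[::stride]        # second pass copies the kept values
    return Outcome(kept, len(full_buffer), len(kept))


def inline_sampling(values: Sequence[float], stride: int) -> Outcome:
    """ST2 analogue: drop values while the buffer is being built."""
    kept = [v for i, v in enumerate(values) if i % stride == 0]
    return Outcome(kept, len(kept), len(kept))


def staging_area_sampling(values: Sequence[float], stride: int) -> Outcome:
    """C-Stager analogue: ship everything, subsample in the staging area."""
    full_buffer = list(values)          # buffered and sent unmodified
    received = list(full_buffer)        # stands in for the network transfer
    kept = received[::stride]           # work done on the staging node
    return Outcome(kept, len(full_buffer), len(received))


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    for strategy in (copy_based_sampling, inline_sampling, staging_area_sampling):
        print(strategy.__name__, strategy(data, 3))
```

All three functions keep the same values; only the buffer size on the compute node and the transfer volume change. That is the axis along which the customizations are compared.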
Customizations
• Only Data Output: Output data without any subsampling
• Tagged Data Output: Calculate statistical characteristics for output data
• Bounding Box (ST1): Output particles within a bounding box using copy-based marshalling
• Bounding Box (ST2): Same as ST1, but uses inline subsampling
• Bounding Box (C-Stager): Same as ST1, but subsamples in the staging area
• Statistical subsample: Use the Tagged Data Output on the compute node and reduce the tags in the staging area to specialize a data reduction function using global characteristics
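The bounding-box and statistical-subsample customizations can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the EnStage code: the `Tag` fields (count, sum, min, max), the helper names, and the choice of "keep values at or above the global mean" as the specialized reduction are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Particle = Tuple[float, float, float]


def bounding_box_filter(particles: Sequence[Particle],
                        lo: Particle, hi: Particle) -> List[Particle]:
    """Keep only particles whose coordinates all lie within [lo, hi]."""
    return [p for p in particles
            if all(l <= c <= h for c, l, h in zip(p, lo, hi))]


@dataclass
class Tag:
    """Hypothetical statistical tag attached to one node's output."""
    count: int
    total: float
    minimum: float
    maximum: float


def tag_output(values: Sequence[float]) -> Tag:
    """Compute-node step: summarize local data (assumes values is non-empty)."""
    return Tag(len(values), sum(values), min(values), max(values))


def reduce_tags(tags: Sequence[Tag]) -> Tag:
    """Staging-area step: combine per-node tags into global characteristics."""
    return Tag(sum(t.count for t in tags),
               sum(t.total for t in tags),
               min(t.minimum for t in tags),
               max(t.maximum for t in tags))


def specialize_reduction(global_tag: Tag) -> Callable[[Sequence[float]], List[float]]:
    """Build a reduction function fixed to the global mean (assumed rule)."""
    mean = global_tag.total / global_tag.count

    def keep_at_or_above_mean(values: Sequence[float]) -> List[float]:
        return [v for v in values if v >= mean]

    return keep_at_or_above_mean


if __name__ == "__main__":
    node_a, node_b = [1.0, 2.0, 3.0], [10.0, 20.0]
    global_tag = reduce_tags([tag_output(node_a), tag_output(node_b)])
    reduce_fn = specialize_reduction(global_tag)
    print(global_tag, reduce_fn(node_a), reduce_fn(node_b))
```

Each node only computes and ships a small tag. The global view is assembled in the staging area, so the nodes get the effect of a collective operation without synchronizing with each other.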