
Improved Performance in Data-Intensive Computational Turbulence with I/O Streaming

This presentation describes an I/O streaming method for evaluating batch queries in data-intensive computational turbulence. By accumulating partial sums, the method can access the underlying data in any order and in parts, eliminates redundant I/O, and achieves over an order of magnitude improvement in performance over direct query evaluation.


Presentation Transcript


  1. I/O Streaming Evaluation of Batch Queries for Data-Intensive Computational Turbulence Kalin Kanov, Eric Perlman, Randal Burns, Yanif Ahmad, and Alexander Szalay Johns Hopkins University

  2. I/O Streaming For Batch Queries
  • Based on partial sums (a brief sketch follows this slide)
  • Allows access to the underlying data in any order and in parts
  • Data streamed from disk in a single pass
  • Eliminates redundant I/O
  • Over an order of magnitude improvement in performance over direct evaluation of queries
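A minimal Python sketch of the partial-sum idea (an illustration, not the authors' code; the names `atoms`, `q.atom_ids()`, `q.partial_contribution()`, `q.qid`, and `atom.aid` are hypothetical stand-ins for the kernel-specific logic): each query's answer is a running sum, so data atoms can be visited in any order and each atom is applied to every pending query that needs it.

```python
# Illustration only: the query/atom interfaces below are made up; the point is
# that results accumulate as partial sums during a single pass over the data.

def stream_evaluate(atoms, queries):
    """Evaluate a batch of queries in a single pass over the data atoms."""
    # Which queries need which atom.
    needed = {}
    for q in queries:
        for aid in q.atom_ids():
            needed.setdefault(aid, []).append(q)

    # One sequential pass over storage; each atom updates every query that
    # touches it, and the intermediate results are the partial sums.
    results = {q.qid: 0.0 for q in queries}
    for atom in atoms:
        for q in needed.get(atom.aid, []):
            results[q.qid] += q.partial_contribution(atom)
    return results
```

Because each atom's contribution is added independently, the same loop works whether the atoms arrive from a sequential Z-order scan or in any other order.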

  3. Introduction
  • Data-intensive computing breakthroughs have allowed for new interaction with scientific numerical simulations
  • Formerly, analysis performed during the computation
    • No data stored for subsequent examination
  • Turbulence Database Cluster
    • Stores entire space-time evolution of the simulation
    • Two datasets totaling 70TB; part of the 1.1PB GrayWulf cluster
    • Provides public access to world-class simulation
    • Implements “immersive turbulence*” approach
  *E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.

  4. Turbulence Database Cluster

  5. Motivation
  • Without I/O streaming:
    • Heavy DB usage slows down the service by a factor of 10 to 20
    • Query evaluation techniques adapted from simulation code do not access data coherently
    • Substantial storage overhead (~42%) incurred to localize each computation
  • Turbulence queries:
    • 95% of queries perform Lagrange Polynomial interpolation
    • Can be evaluated in parts

  6. Processing a Batch Query
  (Figure: a 4×4 grid of data atoms numbered in Z-order — rows 10 11 14 15 / 8 9 12 13 / 2 3 6 7 / 0 1 4 5 — alongside their linear on-disk order 0 through 15.)

  7. Processing a Batch Query
  • Redundant I/O
  • Multiple disk seeks
  (Figure: three queries overlaid on the Z-ordered grid; evaluated directly, q1 reads atoms 0 1 2 3 4 6 8 9 12, q2 reads 9 11 12 14, and q3 reads 4 5 6 7, so shared atoms such as 4, 6, 9, and 12 are read more than once. A sketch of the Z-order numbering follows this slide.)
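The atom numbering in these figures follows a Z-order (Morton) curve, which is what lets a single ordered scan preserve spatial locality. A small Python sketch of the 2-D version (illustrative only; the database's z-index covers the 3-D grid, and the function name here is made up):

```python
# Z-order (Morton) numbering: interleave the bits of the x and y grid
# coordinates. On a 4x4 grid this produces the indices 0..15 shown in the
# figure above.

def z_index_2d(x, y, bits=2):
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b)       # x bit goes to an even position
        z |= ((y >> b) & 1) << (2 * b + 1)   # y bit goes to an odd position
    return z

# Printing the grid with the top row (y = 3) first reproduces the slide layout:
# [10, 11, 14, 15] / [8, 9, 12, 13] / [2, 3, 6, 7] / [0, 1, 4, 5]
for y in reversed(range(4)):
    print([z_index_2d(x, y) for x in range(4)])
```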

  8. Streaming Evaluation Method
  • Linear data requirements of the computation allow for (see the regrouped sum after this slide):
    • Incremental evaluation
    • Streaming over the data
    • Concurrent evaluation of batch queries
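One way to write down why linearity permits incremental, streaming evaluation (notation mine, not taken from the slides): a query's kernel is a weighted sum over grid values, so the sum can be regrouped by data atom, and each per-atom piece is a partial sum that can be added in whenever that atom happens to be read.

```latex
F(q) \;=\; \sum_{i \in N(q)} w_i(q)\, d_i
     \;=\; \sum_{a \in A(q)} \;
       \underbrace{\sum_{i \in N(q) \cap a} w_i(q)\, d_i}_{\text{partial sum contributed by atom } a}
```

Here $N(q)$ is the set of grid points the kernel touches, $w_i(q)$ are the kernel weights (e.g. Lagrange coefficients), $d_i$ the stored values, and $A(q)$ the data atoms containing those points. The outer sum can be accumulated in any atom order, which is what allows a whole batch of queries to share a single pass over the data.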

  9. Processing a Batch Query
  • Sequential I/O
  • Single pass
  (Figure: the same three queries evaluated with I/O streaming; the union of needed atoms — 0 1 2 3 4 5 6 7 8 9 11 12 14 — is scanned once in Z-order, and each atom's contribution is applied to every query that needs it, so q1, q2, and q3 share atoms 4, 6, 9, and 12 without re-reading them.)

  10. Lagrange Polynomial Interpolation
  (Figure: the interpolation formula, written as a sum of Lagrange coefficients multiplied by data values; the standard form is reproduced after this slide.)
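The slide's equation is not preserved in this transcript; for reference, the standard tensor-product form of Lagrange interpolation on a regular grid separates into query-dependent coefficients and stored data values:

```latex
u(x, y, z) \;=\; \sum_{i}\sum_{j}\sum_{k} \ell_i(x)\,\ell_j(y)\,\ell_k(z)\; u(x_i, y_j, z_k),
\qquad
\ell_i(x) \;=\; \prod_{m \ne i} \frac{x - x_m}{x_i - x_m}.
```

Because the coefficients $\ell$ depend only on the query point and the data values enter linearly, the triple sum splits cleanly into the per-atom partial sums used by the streaming method.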

  11. Processing a Batch Query
  • Input queries pre-processed into a key-value dictionary
    • Keys are z-index values of data atoms stored in DB
    • Entries are lists of queries
  • Temp table is created out of dictionary keys
  • Execute a join between temp table and data table
  • When a data atom is read in, all queries that need data from it are processed and their partial sums updated (a sketch of this flow follows this slide)
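A compact sketch of that flow (an illustration, not the authors' implementation: it uses Python's standard sqlite3 module in place of SQL Server, and the table `data(zindex, payload)` plus the helpers `atoms_for` and `weight` are made-up stand-ins for the cluster's schema and kernel logic):

```python
import sqlite3

def atoms_for(point):
    # Hypothetical: z-indices of the data atoms the kernel needs around `point`.
    return [int(point)]

def weight(point, zindex):
    # Hypothetical: the weight this atom contributes to the query at `point`
    # (e.g. a sum of Lagrange coefficients for the grid points in the atom).
    return 1.0

def evaluate_batch(conn, batch):
    """batch: list of (query_id, point) pairs; returns query_id -> result."""
    # 1. Pre-process the batch into a dictionary keyed by z-index.
    needed = {}                                  # z-index -> [(query_id, point)]
    for qid, point in batch:
        for z in atoms_for(point):
            needed.setdefault(z, []).append((qid, point))

    # 2. Materialize the keys as a temp table and join it with the data table,
    #    so the engine returns each needed atom exactly once, in index order.
    conn.execute("CREATE TEMP TABLE atom_keys (zindex INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO atom_keys VALUES (?)", [(z,) for z in needed])
    rows = conn.execute(
        "SELECT d.zindex, d.payload FROM data d "
        "JOIN atom_keys k ON d.zindex = k.zindex ORDER BY d.zindex")

    # 3. As each atom streams in, update the partial sums of every query that
    #    needs it (the same accumulation loop as in the earlier sketch).
    partial = {qid: 0.0 for qid, _ in batch}
    for z, payload in rows:
        for qid, point in needed[z]:
            partial[qid] += weight(point, z) * payload
    conn.execute("DROP TABLE atom_keys")
    return partial
```

In the talk's setting the per-atom update runs inside the database server as the join result streams; here the final Python loop plays that role.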

  12. Experimental Evaluation
  • Random workloads:
    • across the entire cube space
    • a 128³ subset of the entire space
  • Workload from the usage log of the Turbulence cluster
  • Compare with direct methods of evaluation:
    • Direct
    • Sorting
    • Join/Order By

  13. 3D Workload • Used for generating global statistics

  14. 128³ Workload
  • Used for:
    • Examining regions of interest (ROI)
    • Creating visualizations

  15. Experimental Evaluation
  • Random workloads:
    • across the entire cube space
    • a 128³ subset of the entire space
  • Workload from the usage log of the Turbulence cluster
  • Compare with direct methods of evaluation:
    • Direct
    • Sorting
    • Join/Order By

  16. Setup
  • Experimental version of the MHD database
    • ~300 timesteps of the velocity fields of the MHD simulation
  • Two 2.33 GHz dual quad-core Windows 2003 servers with SQL Server 2008 and 8GB of memory
  • Part of the 1.1PB GrayWulf cluster with aggregate low-level throughput of 70 GB/sec
  • Data tables striped across 7 disks per node

  17. 3D Workload
  • I/O Streaming:
    • Each atom is read only once
    • Effective cache usage
  • Join/Order By executes the entire batch as a join
  • Sorting leads to more sequential access
  • Over an order of magnitude improvement

  18. 128³ Workload
  • Less I/O
  • More data sharing

  19. I/O Streaming alleviates the I/O bottleneck
  • Computation emerges as the more costly operation

  20. 128³ Workload

  21. Future Work
  • Extend I/O streaming technique to other decomposable kernel computations:
    • Differentiation
    • Temporal interpolation
    • Filtering
  • Multi-job batch scheduling:
    • Integrate into a batch scheduling framework such as JAWS*
  *X. Wang, E. Perlman, R. Burns, T. Malik, T. Budavari, C. Meneveau, and A. Szalay. JAWS: Job-aware workload scheduling for the exploration of turbulence simulations. In Supercomputing, 2010.

  22. Summary
  • I/O Streaming method for data-intensive batch queries
  • Single pass by means of partial sums
  • Effective exploitation of data sharing
  • Improved cache locality
  • Over an order of magnitude improvement in performance

  23. Questions Images courtesy of Kai Buerger (buerger@tum.de)
