
Elastic and Efficient Execution of Data-Intensive Applications on Hybrid Cloud



  1. Elastic and Efficient Execution of Data-Intensive Applications on Hybrid Cloud. Tekin Bicer, Computer Science and Engineering, Ohio State University

  2. Introduction • Scientific simulations and instruments • X-ray Photon Correlation Spectroscopy • CCD Detector: 120MB/s now; 44GB/s by 2015 • Global Cloud Resolving Model • 1PB for 4km grid-cell • Performed on local clusters • Not sufficient • Problems • Data Analysis, Storage, I/O performance • Cloud Technologies • Elasticity • Pay-as-you-go Model

  3. Hybrid Cloud Motivation • Cloud technologies • Typically associated with computational resources • Massive data generation • Exhaust local storage • Hybrid Cloud • Local Resources: Base • Cloud Resources: Additional • Cloud • Compute and storage resources

  4. Usage of Hybrid Cloud (diagram: a data source feeds both the local nodes, backed by local storage, and the cloud compute nodes, backed by cloud storage)

  5. Challenges • Data-Intensive Processing • Transparent Data Access and Analysis • Programmability of Large-Scale Applications • Meeting User Constraints • Enabling Cloud Bursting • Minimizing Storage and I/O Cost • Domain Specific Compression • In-Situ and In-Transit Data Analysis. Corresponding components: MATE-HC: Map-Reduce with AlternaTE API over Hybrid Cloud; Dynamic Resource Allocation Framework for Hybrid Cloud; Compression Methodology and System for Large-Scale Applications

  6. Programmability of Large-Scale Applications on Hybrid Cloud • Geographically distributed resources • Ease of programmability • Reduction-based programming structures • MATE-HC • A middleware for transparent data access and processing • Selective job assignment • Multi-threaded data retrieval

  7. Middleware for Hybrid Cloud (diagram: each cluster performs job assignment and remote data analysis; the per-cluster results are combined through global reductions)

  8. MATE vs. Map-Reduce Processing Structure • Reduction Object represents the intermediate state of the execution • The reduction function is commutative and associative • Sorting and grouping overheads are eliminated by the reduction function/object

  9. Simple Example • Our large dataset, 3 5 8 4 1 | 3 5 2 6 7 | 9 4 2 4 8, is partitioned across our compute nodes • Each node runs a local reduction (+) into its reduction object: Robj = 21, 23, 27 • A global reduction (+) combines the reduction objects: Result = 71
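
The example above maps directly onto the reduction-object structure. Below is a minimal Python sketch of that structure; the function names and the scalar reduction object are illustrative choices, not the MATE-HC API.

# Minimal sketch of the reduction-object idea from the example above.
# local_reduction/global_reduction and the scalar "robj" are illustrative,
# not the actual MATE-HC API.

def local_reduction(chunk):
    """Each node folds its chunk into a reduction object (here, a running sum)."""
    robj = 0
    for value in chunk:
        robj += value          # commutative/associative update, so no sorting or grouping
    return robj

def global_reduction(robjs):
    """Reduction objects from all nodes are combined into the final result."""
    result = 0
    for robj in robjs:
        result += robj
    return result

# Dataset and partitioning from the example: three nodes, five values each.
partitions = [[3, 5, 8, 4, 1], [3, 5, 2, 6, 7], [9, 4, 2, 4, 8]]
local_results = [local_reduction(p) for p in partitions]   # [21, 23, 27]
print(global_reduction(local_results))                     # 71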

  10. Experiments • 2 geographically distributed clusters • Cloud: EC2 instances running in Virginia • 22 nodes x 8 cores • Local: campus cluster (Columbus, OH) • 150 nodes x 8 cores • 3 applications with 120GB of data • KMeans: k=1000; KNN: k=1000 • PageRank: 50x10 links with 9.2x10 edges • Goals: • Evaluating the system overhead with different job distributions • Evaluating the scalability of the system

  11. System Overhead: K-Means (results chart)

  12. Scalability: K-Means (results chart)

  13. Summary • MATE-HC is a data-intensive middleware developed for Hybrid Cloud • Our results show that: • Inter-cluster communication overhead is low • Job distribution is important • Overall slowdown is modest • The proposed system is scalable

  14. Outline • Data-Intensive Processing • Programmability of Large-Scale Applications • Transparent Data Access and Analysis • Meeting User Constraints • Enabling Cloud Bursting • Minimizing Storage and I/O Cost • Domain Specific Compression • In-Situ and In-Transit Data Analysis. Corresponding components: MATE-HC: Map-Reduce with AlternaTE API over Hybrid Cloud; Dynamic Resource Allocation Framework for Cloud Bursting; Compression Methodology and System for Large-Scale Applications

  15. Dynamic Resource Allocation for Cloud Bursting • The performance of cloud resources and the workload vary • Problems: • Extended execution times • Inability to meet user constraints • Cloud resources can dynamically scale • Cloud Bursting • In-house resources: base workload • Cloud resources: adapt to performance requirements • Dynamic Resource Allocation Framework • A model for capturing "Time" and "Cost" constraints with cloud bursting

  16. System Components • Local cluster and Cloud • MATE-HC processing structure • Pull-based job distribution • Head Node • Coarse-grained job assignment • Consideration of locality • Master node • Fine-grained job assignment • Job stealing • Remote data processing

  17. Resource Allocation Framework • Estimate the required time for local cluster processing • Estimate the required time for cloud cluster processing • All variables can be profiled during execution, except the estimated # of stolen jobs • Estimate the # of jobs remaining after the local jobs are consumed • Ratio of local computational throughput in the system
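
A rough Python sketch of how such a feedback-driven estimate could be computed is given below. The variable names and formulas are assumptions pieced together from the slide text (profiled throughputs, remaining jobs, stolen jobs); they are not the exact model.

# Rough sketch of a feedback-based time estimate for cloud bursting.
# All names and formulas are assumptions based on the slide text
# (profiled throughputs, remaining jobs, stolen jobs), not the exact model.

def estimate_times(local_jobs_left, cloud_jobs_left,
                   local_throughput, cloud_throughput):
    """Estimate completion times; throughputs are jobs/sec profiled at runtime."""
    total_throughput = local_throughput + cloud_throughput
    local_ratio = local_throughput / total_throughput

    # Jobs expected to remain on the cloud side once the local jobs are consumed;
    # part of that surplus may be stolen by local nodes (job stealing).
    local_finish = local_jobs_left / local_throughput
    cloud_done_meanwhile = cloud_throughput * local_finish
    remaining_after_local = max(cloud_jobs_left - cloud_done_meanwhile, 0)
    stolen_jobs = remaining_after_local * local_ratio

    t_local = local_finish + stolen_jobs / local_throughput
    t_cloud = (cloud_jobs_left - stolen_jobs) / cloud_throughput
    return t_local, t_cloud

print(estimate_times(local_jobs_left=400, cloud_jobs_left=1600,
                     local_throughput=8.0, cloud_throughput=6.0))

# If max(t_local, t_cloud) exceeds the user's time constraint, the head node
# would request additional cloud instances before the next job assignment.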

  18. Execution of Resource Allocation Framework • Head Node • Evaluates profiled information • Estimates the # of cloud instances before each job assignment • Informs the Master nodes • Master Node • Each cluster has one • Collects profile information during job request time • (De)allocates resources • Slave Nodes • Request and consume jobs

  19. Experimental Setup • Two applications • KMeans (520GB): Local=104GB; Cloud=416GB • PageRank (520GB): Local=104GB; Cloud=416GB • Local cluster: max. 16 nodes x 8 cores = 128 cores • Cloud resources: max. 16 nodes x 8 cores = 128 cores • Evaluation of the model • Local nodes are dropped during execution • Observed how the system adapts

  20. KMeans – Time Constraint • The system is not able to meet the time constraint because the max. # of cloud instances is reached • # Local instances: 16 (fixed) • # Cloud instances: max. 16 (varies) • Local: 104GB, Cloud: 416GB • All other configurations meet the time constraint with <1.5% error rate

  21. KMeans – Cloud Bursting • # Local instances: 16 (fixed) • # Cloud instances: max. 16 (varies) • Local: 104GB, Cloud: 416GB • 4 local nodes are dropped … • After 25% and 50% of the time constraint has elapsed, the error rate is <1.9% • After 75% of the time constraint has elapsed, the error rate is <3.6% • Reason for the higher error rate: shorter time to profile the new environment

  22. Summary • MATE-HC: MapReduce-type processing on federated resources • Developed a resource allocation model • Based on a feedback mechanism • Time and cost constraints • Two data-intensive applications (KMeans, PageRank) • Error rate for time < 3.6% • Error rate for cost < 1.2%

  23. Outline • Data-Intensive Processing • Programmability of Large-Scale Applications • Transparent Data Access and Analysis • Meeting User Constraints • Enabling Cloud Bursting • Minimizing Storage and I/O Cost • Domain Specific Compression • In-Situ and In-Transit Data Analysis. Corresponding components: MATE-HC: Map-Reduce with AlternaTE API over Hybrid Cloud; Dynamic Resource Allocation Framework for Cloud Bursting; Compression Methodology and System for Large-Scale Applications

  24. Data Management using Compression • Generic compression algorithms • Good for low-entropy byte sequences • Scientific datasets are hard to compress • Floating point numbers: exponent and mantissa • The mantissa can be highly entropic • Using compression is challenging • Suitable compression algorithms • Utilization of available resources • Integration of compression algorithms

  25. Compression Methodology • Common properties of scientific datasets • Multidimensional arrays • Consist of floating point numbers • Relationship between neighboring values • Domain specific solutions can help • Approach: • Prediction-based differential compression • Predict the values of neighboring cells • Store the difference

  26. Example: GCRM Temperature Variable Compression • E.g., a temperature record: the values of neighboring cells are highly related • X' table (after prediction) holds the predicted differences • X'' holds the compressed values: 5 bits for the prediction plus the difference • Lossless and lossy compression • Fast with good compression ratios
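
A toy Python sketch of prediction-based differential compression on a 1-D array of temperatures follows. The last-value predictor and the XOR of bit patterns are illustrative choices; the actual GCRM scheme predicts from neighboring grid cells and packs 5 prediction bits plus the difference.

import struct

# Toy sketch of prediction-based differential compression for a 1-D float array.
# The "previous value" predictor and XOR-of-bit-patterns encoding are
# illustrative; the real scheme predicts from neighboring grid cells.

def to_bits(x):
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def encode(values):
    prev = 0.0
    out = []
    for v in values:
        # Predict the current value from its neighbor, then store only the
        # difference of the bit patterns (mostly leading zeros when the
        # prediction is close, so it compresses well downstream).
        out.append(to_bits(v) ^ to_bits(prev))
        prev = v
    return out

def decode(diffs):
    prev = 0.0
    values = []
    for d in diffs:
        v = struct.unpack('<d', struct.pack('<Q', d ^ to_bits(prev)))[0]
        values.append(v)
        prev = v
    return values

temps = [288.15, 288.17, 288.16, 288.20]
assert decode(encode(temps)) == temps   # lossless round trip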

  27. Compression Framework • Improve end-to-end application performance • Minimize the application I/O time • Pipelining I/O and (de)compression operations • Hide computational overhead • Overlapping application computation with compression framework • Easy implementation of compression algorithms • Easy integration with applications • Similar API to POSIX I/O

  28. A Compression Framework for Data-Intensive Applications • Chunk Resource Allocation (CRA) Layer • Initialization of the system • Generates chunk requests and enqueues them for processing • Converts original offset and data-size requests to their compressed counterparts • Parallel I/O Layer (PIOL) • Creates parallel chunk requests to the storage medium • Each chunk request is handled by a group of threads • Provides abstraction over different data transfer protocols • Parallel Compression Engine (PCE) • Applies encode()/decode() functions to chunks • Manages an in-memory cache with informed prefetching • Creates I/O requests
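
The read path through these layers can be pictured with the small, self-contained Python sketch below. The class and method names are illustrative only; the real framework pipelines chunk retrieval and (de)compression on worker threads and supports multiple transfer protocols.

import zlib

# Minimal self-contained sketch of a POSIX-like read path over compressed
# chunks. Names are illustrative, not the framework's API.

class CompressedFile:
    def __init__(self, chunks, chunk_size):
        self.chunks = chunks            # list of zlib-compressed chunks
        self.chunk_size = chunk_size    # logical (uncompressed) chunk size

    def pread(self, offset, size):
        """Read `size` logical bytes at `offset`, decompressing only the
        chunks that overlap the request (the mapping the CRA layer computes)."""
        first = offset // self.chunk_size
        last = (offset + size - 1) // self.chunk_size
        data = b"".join(zlib.decompress(self.chunks[i]) for i in range(first, last + 1))
        start = offset - first * self.chunk_size
        return data[start:start + size]

raw = bytes(range(256)) * 1024
chunks = [zlib.compress(raw[i:i + 4096]) for i in range(0, len(raw), 4096)]
f = CompressedFile(chunks, 4096)
assert f.pread(5000, 3000) == raw[5000:8000]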

  29. Integration with a Data-Intensive Computing System • Remote data processing • Sensitive to I/O bandwidth • Processes data in… • local cluster • cloud • or both (Hybrid Cloud)

  30. Experimental Setup • Two datasets: • GCRM: 375GB (L:270 + R:105) • NPB: 237GB (L:166 + R:71) • 16x8 cores (Intel Xeon 2.53GHz) • Storage of datasets • Lustre FS (14 storage nodes) • Amazon S3 (Northern Virginia) • Compression algorithms • CC, FPC, LZO, bzip, gzip, lzma • Applications: AT, MMAT, KMeans

  31. Performance of MMAT • Breakdown of performance • Overhead (local): 15.41% • Read speedup: 1.96

  32. Lossy Compression (MMAT) • #e: number of dropped bits • Error bound: 5x10^-5
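
A small Python sketch of the idea behind the "number of dropped bits" knob: zero out the low-order mantissa bits of each double before encoding, trading a bounded relative error for better compressibility. The zeroing below illustrates the idea only; it is not the full encoding pipeline or the exact error-bound derivation.

import struct

# Sketch of lossy compression by dropping low-order mantissa bits of a double;
# "e" plays the role of the "# dropped bits" knob from the slide.

def drop_mantissa_bits(x, e):
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    bits &= ~((1 << e) - 1)             # clear the e least significant mantissa bits
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

t = 288.1234567
for e in (8, 16, 24, 32):
    approx = drop_mantissa_bits(t, e)
    print(e, approx, abs(approx - t) / abs(t))   # relative error grows with e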

  33. Summary • Management and analysis of scientific datasets are challenging • Generic compression algorithms are inefficient for scientific datasets • Proposed a compression framework and methodology • Domain-specific compression algorithms are fast and space efficient • 51.68% compression ratio • 53.27% improvement in execution time • Easy plug-and-play of compression algorithms • Integrated the proposed framework and methodology with a data analysis middleware

  34. Outline • Data-Intensive Processing • Programmability of Large-Scale Applications • Transparent Data Access and Analysis • Meeting User Constraints • Enabling Cloud Bursting • Minimizing Storage and I/O Cost • Domain Specific Compression • In-Situ and In-Transit Data Analysis. Corresponding components: MATE-HC: Map-Reduce with AlternaTE API over Hybrid Cloud; Dynamic Resource Allocation Framework for Cloud Bursting; Compression Methodology and System for Large-Scale Applications

  35. In-Situ and In-Transit Analysis • Compression can ease data management • But may not always be sufficient • In-situ data analysis • Co-locate data source and analysis code • Data analysis during data generation • In-transit data analysis • Remote resources are utilized • Forward generated data to “staging nodes”

  36. In-Situ and In-Transit Data Analysis • Significant reduction in generated dataset size • Noise elimination, data filtering, stream mining… • Timely insights • Parallel data analysis • MATE-Stream • Dynamic resource allocation and load balancing • Hybrid data analysis • Both in-situ and in-transit

  37. Parallel In-Situ Data Analysis (diagram: a data source feeds a dispatcher, which distributes data to parallel local-reduction processes, each holding its own reduction object Robj[...]) • Data Generation • Scientific instruments, simulations, etc. • (Un)bounded data • Local Reduction • Filtering, stream mining • Data reduction • Continuous local reduction • Local Combination • Intermediate results • Timely insights • Continuous global reduction
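
A minimal Python sketch of this continuous in-situ reduction loop is shown below. The round-robin dispatch, the filtering predicate, and the running-mean reduction object are illustrative choices, not MATE-Stream code.

import itertools
import random

# Minimal sketch of continuous in-situ reduction over an (un)bounded stream.
# The dispatcher/local-reduction/global-reduction structure mirrors the slide.

def data_source():
    while True:                          # unbounded stream from a simulation/instrument
        yield random.gauss(300.0, 5.0)

def local_reduction(robj, value):
    if value > 0:                        # filtering step (drop invalid readings)
        robj["count"] += 1
        robj["sum"] += value
    return robj

def global_reduction(robjs):
    total = sum(r["sum"] for r in robjs)
    count = sum(r["count"] for r in robjs)
    return total / max(count, 1)         # timely insight: global running mean

robjs = [{"count": 0, "sum": 0.0} for _ in range(4)]    # one per local-reduction process
for i, value in enumerate(itertools.islice(data_source(), 10_000)):
    local_reduction(robjs[i % 4], value)                # round-robin dispatch
print(global_reduction(robjs))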

  38. Elastic In-Situ Data Analysis (diagram: additional local-reduction processes are spawned next to the existing ones) • Insufficient resource utilization • Dynamically extend resources • New local reduction processes

  39. Elastic In-Situ and In-Transit Data Analysis (diagram: node N0 performs in-situ local reductions and forwards data to staging node N1, where reduction processes perform local and global combination) • A staging node is set up • Generated data is forwarded to it

  40. Future Directions • Scientific applications are difficult to modify • Integration with existing data sources • GridFTP, (P)NetCDF and HDF5 etc. • Data transfer is expensive (especially for in-transit) • Utilization of advanced network technologies • Software-Defined Networking (SDN) • Long running nature of large-scale app. • Failures are inevitable • Exploit features of processing structure

  41. Conclusions • Data-intensive applications and instruments can easily exhaust local resources • Hybrid cloud can provide additional resources • Challenges: transparent data access and processing; meeting user constraints; minimizing I/O and storage cost • MATE-HC: transparent and efficient data processing on Hybrid Cloud • Developed a "dynamic resource allocation framework" and integrated it with MATE-HC • Time- and cost-sensitive data processing • Proposed a "compression methodology and system" to minimize storage cost and the I/O bottleneck • Design of "in-situ and in-transit data analysis" (ongoing work)

  42. Thanks for your attention!

  43. MATE-EC2 Design • Data organization • Three levels: Buckets, Chunks and Units • Metadata information • Chunk Retrieval • Threaded Data Retrieval • Selective Job Assignment • Load Balancing and handling heterogeneity • Pooling mechanism

  44. MATE-EC2 vs. EMR • PageRank • Speedups vs. combine: 4.08 – 7.54 • KMeans • Speedups vs. combine: 3.54 – 4.58

  45. Different Chunk Sizes • KMeans • 1 retrieval thread • Performance increase, 128KB vs. >8MB chunks: 2.07 to 2.49

  46. K-Means (Data Retrieval) • Fig. 1: 16 retrieval threads; 8MB vs. other chunk sizes, speedup: 1.13-1.30 • Fig. 2: 128MB chunk size; 1 thread vs. other thread counts, speedup: 1.37-1.90 • Dataset: 8.2GB

  47. Job Assignment • KMeans: speedups of 1.01 (8MB) and 1.10-1.14 (other chunk sizes) • PCA (2 iterations): speedups of 1.19-1.68

  48. Heterogeneous Configuration Overheads • KMeans: 1% • PCA: 1.1%, 7.4%, 11.7%

  49. KMeans – Cost Constraint • The system meets the cost constraint with <1.1% error rate • The system tries to minimize execution time within the provided cost constraint • When the maximum # of cloud instances is allocated, the error rate is again <1.1%

  50. Prefetching and In-Memory Cache • Overlapping application-layer computation with I/O • Reusability of already accessed data is small • Prefetching and caching of prospective chunks • Default eviction policy is LRU • The user can analyze the access history and provide a prospective chunk list (informed prefetching) • The cache uses a row-based locking scheme for efficient consecutive chunk requests
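
A compact Python sketch of an LRU chunk cache with an informed-prefetch hook, mirroring the slide: LRU eviction by default, plus an optional user-supplied list of prospective chunks. The class name and the fetch callback are illustrative only.

from collections import OrderedDict

# Sketch of an LRU chunk cache with informed prefetching. Names are illustrative.

class ChunkCache:
    def __init__(self, fetch, capacity=64):
        self.fetch = fetch                   # function: chunk_id -> bytes
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, chunk_id):
        if chunk_id in self.cache:
            self.cache.move_to_end(chunk_id)      # mark most recently used
        else:
            self.cache[chunk_id] = self.fetch(chunk_id)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)    # evict least recently used
        return self.cache[chunk_id]

    def prefetch(self, prospective_ids):
        """Informed prefetching: warm the cache with chunks the user expects to need."""
        for cid in prospective_ids:
            self.get(cid)

cache = ChunkCache(fetch=lambda cid: b"chunk-%d" % cid, capacity=8)
cache.prefetch(range(4))                     # user-provided prospective chunk list
print(cache.get(2))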
