Lan Yi lan.yi@intel Senior Software Engineer Intel China Software Center 2013.07.16

Experience with HiBench: From Micro-Benchmarks toward End-to-End Pipelines. WBDB 2013 Workshop Presentation. Lan Yi, lan.yi@intel.com, Senior Software Engineer, Intel China Software Center, 2013.07.16. HiBench. Micro Benchmarks. Web Search. Different from GridMix, SWIM? Micro Benchmark?

Presentation Transcript


  1. Experience with HiBench: From Micro-Benchmarks toward End-to-End Pipelines. WBDB 2013 Workshop Presentation. Lan Yi, lan.yi@intel.com, Senior Software Engineer, Intel China Software Center, 2013.07.16

  2. HiBench. Different from GridMix, SWIM? Micro benchmark? Isolated components? End-2-end benchmark? We need an ETL-Recommendation pipeline. HiBench workloads: Micro Benchmarks • Sort • WordCount • TeraSort; Web Search • Nutch Indexing • Page Rank; Machine Learning • Bayesian Classification • K-Means Clustering; HDFS • Enhanced DFSIO. See our paper “The HiBench Suite: Characterization of the MapReduce-Based Data Analysis” in ICDE’10 workshops (WISS’10)

  3. ETL-Recommendation (hammer). [Pipeline diagram; labels as recovered from the slide.] Platform: HIVE-Hadoop Cluster (Data Warehouse). Inputs: TPC-DS sales updates; cookies updates (h1, h2, ..., h24). ETL: ETL-sales (sales tables); ETL-logs (WP log table: ip, agent, retcode, cookies). Pref: Pref-sales (sales preferences); Pref-logs (browsing preferences); Pref-comb (user-item preferences). CF: Mahout item based Collaborative Filtering (item-item similarity matrix). Test: offline test (test data). Statistics & Measurements.
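The slide names user-item preferences as the input to item based collaborative filtering and an item-item similarity matrix as its output. As a rough illustration of that step only, here is a minimal pure-Python sketch that computes cosine similarity between items from a preference table. It is not Mahout's implementation; the function name `item_similarity`, the dictionary layout, and the choice of cosine similarity are assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def item_similarity(prefs):
    """Cosine similarity between items that share at least one user.

    prefs: {user: {item: score}}  (hypothetical layout, not Mahout's format)
    returns: {(item_a, item_b): similarity} with item_a < item_b
    """
    norms = defaultdict(float)  # sum of squared scores per item
    dots = defaultdict(float)   # dot product per co-rated item pair
    for items in prefs.values():
        for a, score_a in items.items():
            norms[a] += score_a * score_a
            for b, score_b in items.items():
                if a < b:  # store each unordered pair once
                    dots[(a, b)] += score_a * score_b

    similarities = {}
    for (a, b), dot in dots.items():
        denom = sqrt(norms[a]) * sqrt(norms[b])
        if denom > 0:  # skip items whose scores are all zero
            similarities[(a, b)] = dot / denom
    return similarities


if __name__ == "__main__":
    # Toy preferences, invented for illustration.
    toy = {"u1": {"A": 1.0, "B": 1.0}, "u2": {"A": 1.0, "C": 1.0}}
    for pair, sim in sorted(item_similarity(toy).items()):
        print(pair, round(sim, 4))
```

Only pairs of items rated by a common user appear in the result, which keeps the matrix sparse; a production job on a Hadoop cluster would distribute this pairwise accumulation rather than hold it in memory.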

  4. ETL-Recommendation (hammer) • Task Dependences ETL-sales ETL-logs Pref-sales Pref-logs Offline test Pref-comb Item based Collaborative Filtering
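The task dependences on this slide can be sketched as a small dependency graph. The stage names come from the slide, but the arrows did not survive in the transcript, so the edges below are an assumption inferred from the stage names (ETL before Pref, the two Pref stages merged by Pref-comb, then collaborative filtering, then the offline test). The helper `execution_waves` is a hypothetical name; it groups stages that could run in parallel, using Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# ASSUMPTION: edges inferred from stage names, not confirmed by the slide.
# Each key maps to the set of stages it depends on.
DEPENDENCIES = {
    "ETL-sales": set(),
    "ETL-logs": set(),
    "Pref-sales": {"ETL-sales"},
    "Pref-logs": {"ETL-logs"},
    "Pref-comb": {"Pref-sales", "Pref-logs"},
    "Item based Collaborative Filtering": {"Pref-comb"},
    "Offline test": {"Item based Collaborative Filtering"},
}


def execution_waves(deps):
    """Return lists of stages; every stage in a list can start together."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # stages whose inputs are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(number, wave)
```

Under these assumed edges, the sales and logs branches are independent until Pref-comb, so a scheduler could run them concurrently.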

  5. Empirical Data (hammer). Cluster: 4 nodes, RHL6.2, CDH 4.1.2. CPU: Intel Xeon E5-2600 @ 2.2GHz, Sandy Bridge, 2 x 8 x HT = 32 cores. Memory: 192G. Disk: WD 7200, 0.3 x 12 x 4 = 14.4T. Network: 1000M net, 300M~400M/s. Software: HiBench etl-recomm branch, HiTune-0.9. Data: sales ~14G (TPC-DS scale 100), logs ~105G

  6. Empirical Data (hammer)

  7. Empirical Data (hammer)

  8. LinkBench • Benchmark for Social Graph Service • Originally Developed by Facebook on Top of MySQL • Simulate social graph workloads similar to Facebook’s online service • Key workload properties match Facebook’s real production workload • Different from Analytical Workloads • Our Work • Port LinkBench to HBase • On top of Phoenix (SQL support over HBase)

  9. Resources • HiBench • https://github.com/intel-hadoop/HiBench • HiBench ETL-Recomm Branch • https://github.com/intel-hadoop/HiBench/tree/etl-recomm • LinkBench • https://github.com/intel-hadoop/linkbench • HiTune • https://github.com/intel-hadoop/HiTune • Phoenix • https://github.com/intel-hadoop/phoenix
