
“Introducing Hadoop on Azure: hello Map-Reduce!”

Joe Hummel, PhD. Visiting Researcher: U. of California, Irvine. Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago. Materials: http://www.joehummel.net/downloads.html Email: joe@joehummel.net


Presentation Transcript


  1. “Introducing Hadoop on Azure: hello Map-Reduce!” Joe Hummel, PhD. Visiting Researcher: U. of California, Irvine. Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago. Materials: http://www.joehummel.net/downloads.html Email: joe@joehummel.net

  2. A little history…
     • Map-Reduce is from functional programming

       // function returns 1 if i is prime, 0 if not:
       let isPrime(i) = ...

       // sums 2 numbers:
       let sum(x, y) = return x + y

       // count the number of primes in 1..N:
       let countPrimes(N) =
         let L = [ 1 .. N ]           // [ 1, 2, 3, 4, 5, 6, ... ]
         let T = map isPrime L        // [ 0, 1, 1, 0, 1, 0, ... ]
         let count = reduce sum T     // 42
         return count

     [Footer: Hadoop on Azure]
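The pseudocode on this slide maps directly onto JavaScript's built-in `Array.prototype.map` and `Array.prototype.reduce`. The following is a minimal sketch, not the presenter's code: the slide elides the body of `isPrime`, so the trial-division version here is an assumption made only to get something runnable.

```javascript
// Returns 1 if i is prime, 0 if not. The slide leaves this body out;
// simple trial division is assumed here for illustration.
function isPrime(i) {
  if (i < 2) return 0;
  for (let d = 2; d * d <= i; d++) {
    if (i % d === 0) return 0;
  }
  return 1;
}

// Sums 2 numbers.
function sum(x, y) {
  return x + y;
}

// Counts the primes in 1..N: map isPrime over the list, then reduce
// the resulting 0/1 flags with sum.
function countPrimes(N) {
  const L = Array.from({ length: N }, (_, k) => k + 1); // [1, 2, ..., N]
  const T = L.map((i) => isPrime(i));                   // [0, 1, 1, 0, 1, 0, ...]
  return T.reduce(sum, 0);
}

console.log(countPrimes(10)); // prints 4 (the primes are 2, 3, 5, 7)
```

Because each `isPrime` call is independent of the others, the map step is the part that could be spread across machines; only the reduce step needs to see all the intermediate results.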

  3. A little more history… • Hadoop: • Created by [name not captured in transcript] to drive internet search • Parallelism • Data partitioning • Fault tolerance [Diagram: BIG data, page hits]

  4. Hadoop today • Freely-available framework for big data • http://hadoop.apache.org/ • Based on the concept of Map-Reduce: a map function, then a reduce over the intermediate results [Diagram: BIG data → Map, Map, Map, Map, … → Reduce → R]

  5. Workflow
     Data
     → Map, Map, Map: [ <key1,value>, <key4,value>, <key2,value>, … ]
     → Sort, Sort, Sort: [ <key1,value>, <key1,value>, … ]
     → Merge: [ <key1, [value,value,…]>, <key2, [value,value,…]>, … ]
     → Reduce: R = [ <key1, value>, <key2, value>, … ]
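The four stages of this workflow can be imitated in a single process to see how the data changes shape at each step. This is only a sketch under stated assumptions: `runJob`, its parameters, and the sample input are hypothetical names invented for illustration, and a real Hadoop cluster distributes these stages across machines rather than looping over arrays.

```javascript
// Runs a map-reduce job in one process, following the slide's stages.
// partitions: array of arrays of input records (hypothetical input shape)
// mapFn(record): returns an array of [key, value] pairs
// reduceFn(key, values): returns a single value for that key
function runJob(partitions, mapFn, reduceFn) {
  // Map: each partition independently produces [key, value] pairs.
  const mapped = partitions.map((records) => records.flatMap(mapFn));

  // Sort: each partition's pairs are ordered by key.
  const sorted = mapped.map((pairs) =>
    [...pairs].sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
  );

  // Merge: pairs from all partitions are grouped into key -> [values].
  const merged = new Map();
  for (const pairs of sorted) {
    for (const [key, value] of pairs) {
      if (!merged.has(key)) merged.set(key, []);
      merged.get(key).push(value);
    }
  }

  // Reduce: each key's list of values collapses to one value.
  const keys = [...merged.keys()].sort();
  return keys.map((key) => [key, reduceFn(key, merged.get(key))]);
}

// Made-up input: three partitions of single-letter records.
const result = runJob(
  [["a", "b", "a"], ["c", "a"], ["b"]],
  (record) => [[record, 1]],
  (key, values) => values.reduce((x, y) => x + y, 0)
);
console.log(JSON.stringify(result)); // [["a",3],["b",2],["c",1]]
```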

  6. Data set for demo • We’ll be working with Chicago crime data… • https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2 • http://www.cityofchicago.org/city/en/narr/foia/CityData.html • 1 GB, 5M rows

  7. Goal?
     • Compute top-10 crimes…

       IUCR   Count
       0486   366903
       0820   308074
       . . .
       0890   166916

     IUCR = Illinois Uniform Crime Reporting codes
     https://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e

  8. Demo
     • Hadoop on Azure…

       // Javascript version:
       var map = function (key, value, context)
       {
         var values = value.split(",");   // split one CSV row into fields
         context.write(values[4], 1);     // emit <field 4 (the IUCR code), 1>
       };
       var reduce = function (key, values, context)
       {
         var sum = 0;
         while (values.hasNext())
         {
           sum += parseInt(values.next());
         }
         context.write(key, sum);         // emit <IUCR code, total count>
       };

       Output:
       0486   366903
       0820   308074
       . . .
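The demo's `map` and `reduce` functions can be exercised without a cluster by handing them stand-in objects. The sketch below is an assumption-laden local harness, not part of Hadoop on Azure: `makeContext` and `makeIterator` are hypothetical helpers that only imitate the `context.write`, `hasNext`, and `next` calls used on the slide, and the three input rows are made up to follow the `ID,CaseNumber,Date,Block,IUCR,...` column order.

```javascript
// The slide's functions, unchanged in behavior.
var map = function (key, value, context) {
  var values = value.split(",");
  context.write(values[4], 1);
};
var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next());
  }
  context.write(key, sum);
};

// Hypothetical stand-ins for the objects the cluster would supply.
function makeContext() {
  const pairs = [];
  return { pairs, write: (k, v) => pairs.push([k, v]) };
}
function makeIterator(items) {
  let i = 0;
  return { hasNext: () => i < items.length, next: () => items[i++] };
}

// Made-up rows for illustration only (not real crime records).
const rows = [
  "1,C001,01/01/2001,001XX W EXAMPLE ST,0486",
  "2,C002,01/02/2001,002XX N EXAMPLE AVE,0820",
  "3,C003,01/03/2001,003XX S EXAMPLE RD,0486",
];

// Map every row, then group the emitted values by key.
const mapOut = makeContext();
rows.forEach((row, n) => map(n, row, mapOut));

const groups = new Map();
for (const [k, v] of mapOut.pairs) {
  if (!groups.has(k)) groups.set(k, []);
  groups.get(k).push(v);
}

// Reduce each key's values to a single count.
const reduceOut = makeContext();
for (const [k, vs] of groups) reduce(k, makeIterator(vs), reduceOut);

console.log(JSON.stringify(reduceOut.pairs)); // [["0486",2],["0820",1]]
```

One caveat worth keeping in mind: splitting on `","` assumes no field contains an embedded comma, which may not hold for every row of a real CSV export.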

  9. Hadoop++
     • Rich ecosystem around Hadoop
       • Pig
       • Hive
       • HBase
       • …

       // interactive PIG with explicit Map-Reduce functions:
       pig.from("CC-from-2001.txt").
           mapReduce("IUCR-Count.js", "IUCR, Count:long").
           orderBy("Count DESC").
           take(10).
           to("output-from-2001")

       // interactive PIG without explicit Map-Reduce:
       schema = "ID,CaseNumber,Date,Block,IUCR,..."
       pig.from("CC-from-2001.txt", schema).
           groupBy("IUCR").
           select("group, SUM($1.Count)").
           orderBy("Count DESC").
           take(10).
           to("output-from-2001")
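To make the second pipeline's steps concrete, here is a plain-JavaScript analogue of group, sum, order descending, and take. It is a sketch only and does not use the Pig API: `topGroups` is a hypothetical helper, and the records are made-up placeholders rather than rows from the crime data set.

```javascript
// Groups records by keyField, sums their Count values, sorts the totals
// in descending order, and keeps the first n groups.
function topGroups(records, keyField, n) {
  const totals = new Map();
  for (const r of records) {
    const key = r[keyField];
    totals.set(key, (totals.get(key) || 0) + r.Count);
  }
  return [...totals.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

// Made-up records for illustration only.
const records = [
  { IUCR: "A", Count: 1 },
  { IUCR: "B", Count: 1 },
  { IUCR: "A", Count: 1 },
  { IUCR: "C", Count: 1 },
  { IUCR: "B", Count: 1 },
  { IUCR: "A", Count: 1 },
];

console.log(JSON.stringify(topGroups(records, "IUCR", 2))); // [["A",3],["B",2]]
```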

  10. Hadoop on Azure • Microsoft is offering free access to Hadoop • Request invitation @ http://www.hadooponazure.com/ • Hadoop connector for Excel • Process data using Hadoop, analyze/visualize using Excel

  11. PowerPivot • Freely-available plugin for Excel 2010 • http://www.powerpivot.com/ • Turns Excel into an in-memory database • More precisely, turns spreadsheet into an OLAP cube • Note: • If you have 32-bit Excel, install 32-bit PowerPivot • If you have 64-bit Excel, install 64-bit PowerPivot • GBs of data will require 64-bit • [ How to tell what version of Excel you have? File menu, help… ] Big Data Processing, Cheap

  12. Demo • PowerPivot… • Install • PowerPivot menu • PowerPivot Window • Get Data... • PivotTable…

  13. Compare and contrast

  14. That’s it!

  15. Thank you for attending • Presenter: Joe Hummel • Email: joe@joehummel.net • Materials: http://www.joehummel.net/downloads.html • Keep an eye out for the final release of: • Hadoop on Azure • Hadoop on Windows • PowerView plugin for Excel 2013
