Kevin Bacon

yaron

Presentation Transcript


  1. Kevin Bacon

  2. The Question You Are Going to Answer (again) … Which pair of actors/actresses has acted together the most times?

  3. Kevin Bacon

  4. 1. Download the project http://aidanhogan.com/teaching/cc5212-1/mdp-lab6.zip

  5. 2. Install Highlighter for Pig
     • Put org.apache.pig.contrib.eclipse_1.0.0.jar into the plugins/ folder of Eclipse
     • Start Eclipse and import the project (can omit org.apache.pig.contrib.eclipse_1.0.0.jar)

  6. 3. Reference Material
     • Things to help you!
     • http://aidanhogan.com/teaching/cc5212-1/MDP-07-Pig-20140421.pptx (7th Lecture)
     • http://pig.apache.org/docs/r0.12.1/ (Official Pig documentation)

  7. 4. Get Started
     • In Eclipse, open actor-count.pig and change [username] in the STORE line to your username:
       STORE ordered_actor_pair_count INTO '/uhadoop/[username]/pig-debug/';
     • Open WinSCP and copy actor-count.pig to /data/2014/uhadoop/[username]/
     • Open PuTTY, navigate to /data/2014/uhadoop/[username]/ and run:
       pig actor-count.pig
     • In PuTTY, look at the output:
       hadoop fs -cat /uhadoop/[username]/pig-debug/part-m-00000 | more

  8. 5. Implement the Script
     • The process is the same as before:
       • keep only the rows whose type is “THEATRICAL MOVIE”
       • build a unique movie name = title+"##"+year+"##"+num
       • map from the raw data to pairs of actors starring in the same movie
       • count the pairs and sort them
     • Use the reference material
     • Methodology:
       1. Add one script line at a time … STORE the new relation
       2. Copy the new script to the server
       3. Delete the old output (careful):
          hadoop fs -rmr /uhadoop/[username]/pig-debug/
       4. Run the new script and check the output
       5. If it looks okay, GOTO 1

  9. Output for Small File
     20  Gy�rffy, Gy�rgy (I)##Gy�rgy, L�szl� (I)
     15  Guerrero, Eddie (I)##Guti�rrez, Oscar (III)
     13  Guill�nCuervo, Fernando##Guill�n, Fernando
     13  Gregurevic, Ivo##Grgic, Goran
     12  Gyenge, �rp�d##Gy�rgy, L�szl� (I)
     11  Guevara, Luis (I)##Guti�rrez, Alfredo (I)
     10  Gross, Walter (I)##Gro�kurth, Kurt
     9   Gurza, Humberto##Gurza, Miguel
     9   Guerrero, Eddie (I)##Guerrero, Sal
     9   Gr�nberg, �ke##Gustafson, Eric (I)

  10. 6. Run for all data
      • raw = LOAD 'hdfs://cm:9000/uhadoop/imdb/full/actpersons-to-movies.tsv' …
      • STORE ordered_actor_pair_count INTO '/uhadoop/[username]/imdbfull/';
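The process listed under "5. Implement the Script" (filter, build the movie key, pair the actors, count, sort) can be tried out locally before writing the Pig lines. The following is only a minimal Python sketch of that logic, not the Pig script itself and not the lab's reference solution. The row layout (actor, title, year, num, type), the function name count_actor_pairs and the sample rows are assumptions made for illustration; the real columns of the IMDb file may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_actor_pairs(rows):
    """Count how often each pair of actors appears in the same movie.

    rows: iterable of (actor, title, year, num, type) tuples.
    This column layout is an assumption for the sketch.
    """
    cast = defaultdict(set)
    for actor, title, year, num, kind in rows:
        # Keep only theatrical movies, as in the first step of the process.
        if kind != "THEATRICAL MOVIE":
            continue
        # Unique movie name = title + "##" + year + "##" + num.
        movie = f"{title}##{year}##{num}"
        cast[movie].add(actor)

    counts = Counter()
    for actors in cast.values():
        # Sorting the names gives each pair a single canonical order,
        # so (A, B) and (B, A) are counted as the same pair.
        for first, second in combinations(sorted(actors), 2):
            counts[f"{first}##{second}"] += 1

    # Highest count first; ties broken alphabetically by pair name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the IMDb data.
    sample = [
        ("A", "M1", "2000", "I", "THEATRICAL MOVIE"),
        ("B", "M1", "2000", "I", "THEATRICAL MOVIE"),
        ("A", "M2", "2001", "I", "THEATRICAL MOVIE"),
        ("B", "M2", "2001", "I", "THEATRICAL MOVIE"),
        ("C", "M2", "2001", "I", "THEATRICAL MOVIE"),
        ("A", "S1", "2002", "I", "TV SERIES"),
        ("C", "S1", "2002", "I", "TV SERIES"),
    ]
    for pair, count in count_actor_pairs(sample):
        print(count, pair)
```

Each stage of the sketch corresponds to one relation you would STORE and inspect while following the methodology, so comparing a handful of rows against it may help spot a wrong filter or a pair that is counted twice.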
