Predictive modeling competitions

Predictive modeling competitions: making data science a sport. Anthony Goldbloom, CEO, Kaggle. E-mail: anthony.goldbloom@kaggle.com. Twitter: @antgoldbloom.

Presentation Transcript


  1. Predictive modeling competitions: making data science a sport. Anthony Goldbloom, CEO, Kaggle. E-mail: anthony.goldbloom@kaggle.com. Twitter: @antgoldbloom. Photo by mikebaird, www.flickr.com/photos/mikebaird

  2. Global competitions. Predicting HIV viral load: state of the art 70%; after 1½ weeks 70.8%; at competition close 77%

  3. Diverse experts solving diverse problems Grant Application Forecasting Stock Price Prediction HIV Research Chess Ratings Travel Time Prediction Edmund & Adrian London & USA Dr. Derek Gatherer UK Felipe Maia Uppsala University Ivan Russian Federation Dr. Christopher Hefele, New York Philipp Emanuel Widmann Heidelberg, DE Chih-Li Sung & Roy Tseng Penghu & Taipei Robert Warsaw Gzegorz Swiszcz Gera Cole Harris Texas Jure Zbontar Ljubljana Giuseppe Ragusa Rome Chris DuBois Portland Claudio Perlich USA Edmund & Adrian London & USA Jason Trigg Pennsylvania John Blatz Baltimore Rajstennaj Barrabas USA Chris Raimondi Baltimore Jason Trigg Pennsylvania Uri Blass Tel-Aviv Lee Baker Las Cruces, NM Nan Zhou Pittsburgh Jeremy Howard Australia Thomas Mahony Canberra Glen Maher Canberra Emir Delic Australia

  4. Motivation • Why host a competition? • Why compete? • How it works • Heritage Health Prize • Questions

  5. “I keep saying the sexy job in the next ten years will be statisticians.” Hal Varian Google Chief Economist 2009

  6. Crowdsourcing: mismatch between those with data and those with the skills to analyse it

  7. Countless possible approaches to any data prediction problem. Which to choose?

  8. 18-year-old beating his professors

  9. Motivation • Why host a competition? • Why compete? • How it works • Heritage Health Prize • Questions

  10. Tourism Forecasting Competition [Chart: forecast error (MASE) for the existing model, on Aug 9, 2 weeks later, 1 month later, and at competition end]
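
Note on the metric named on this slide: MASE (mean absolute scaled error) divides a forecast's mean absolute error by the in-sample mean absolute error of a naive forecast. The sketch below is a minimal illustration under that standard definition, not the competition's scoring code; the function name `mase`, its parameters, and the sample numbers are assumptions made for illustration.

```python
def mase(train, actual, forecast, m=1):
    """Mean absolute scaled error (illustrative sketch).

    train    -- historical series used to fit the model
    actual   -- true values for the forecast horizon
    forecast -- predicted values for the forecast horizon
    m        -- seasonal period of the naive benchmark (1 = non-seasonal)
    """
    if len(train) <= m:
        raise ValueError("train must contain more than m observations")
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")

    # In-sample error of the naive forecast "repeat the value from m steps ago".
    naive_errors = [abs(train[i] - train[i - m]) for i in range(m, len(train))]
    naive_mae = sum(naive_errors) / len(naive_errors)
    if naive_mae == 0:
        raise ValueError("naive forecast error is zero; MASE is undefined")

    # Out-of-sample error of the forecast being evaluated.
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


# Hypothetical numbers: a value below 1 means the forecast beats the naive benchmark.
score = mase(train=[10, 12, 14, 16], actual=[18, 20], forecast=[17, 21])
```

Lower is better, which is why the chart tracks the error falling over the course of the competition.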

  11. Chess Ratings Competition [Chart: error rate (RMSE) for the existing model (Elo), on Aug 4, 1 month later, 2 months later, and today]
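
Note on the benchmark and metric named on this slide: the Elo system predicts a player's expected score from the rating difference, and RMSE (root mean squared error) measures how far predictions fall from actual results. The sketch below uses the standard Elo expected-score formula and a plain RMSE; it is an illustration only and does not reproduce the competition's exact evaluation procedure. The function names and the sample games are assumptions.

```python
import math


def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical games: (rating of A, rating of B, actual score for A),
# where 1.0 is a win, 0.5 a draw, and 0.0 a loss.
games = [(1500, 1500, 1.0), (1900, 1500, 1.0), (1500, 1900, 0.5)]
predictions = [elo_expected_score(a, b) for a, b, _ in games]
results = [score for _, _, score in games]
error = rmse(predictions, results)
```

A competing rating system would be scored the same way: swap in its predictions and compare the resulting error with the Elo baseline.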

  12. Our User Base

  13. Users apply different techniques • neural networks • logistic regression • support vector machines • decision trees • ensemble methods • AdaBoost • Bayesian networks • genetic algorithms • random forests • Monte Carlo methods • principal component analysis • Kalman filters • evolutionary fuzzy modeling

  14. Benchmarking

  15. ~25% successful grant applications. NASA tried, now it’s our turn

  16. Ideal for complex problems

  17. Outcomes of a competition to predict the success of grant applications (~25% successful grant applications): • Better identify likely successes to avoid wasting resources on hopeless applications • Identify and communicate the characteristics of a successful application to future applicants

  18. Motivation • Why host a competition? • Why compete? • How it works • Heritage Health Prize • Questions

  19. Why Participants Compete • More fun than Sudoku • Clean, real-world data • Professional reputation & experience • Interactions with experts in related fields • Prizes

  20. User base

  21. User base

  22. Motivation • Why host a competition? • Why compete? • How it works • Heritage Health Prize • Questions

  23. 1 Upload • 2 Submit • 3 Evaluate & Exchange

  24. Use the wizard to post a competition

  25. Participants make their entries

  26. Competitions are judged based on predictive accuracy

  27. Competition Mechanics Competitions are judged on objective criteria
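
Note on the judging described in the two slides above: entries are compared against withheld true outcomes and ranked by an objective score. The sketch below is a minimal illustration of that idea using classification accuracy as the criterion; it is not Kaggle's implementation, and the function names, team names, and data are hypothetical.

```python
def accuracy(predictions, solution):
    """Fraction of predictions that match the withheld true outcomes."""
    if len(predictions) != len(solution) or not solution:
        raise ValueError("predictions and solution must be non-empty and equal in length")
    correct = sum(1 for p, s in zip(predictions, solution) if p == s)
    return correct / len(solution)


def leaderboard(submissions, solution):
    """Score every entry against the solution and rank from best to worst."""
    scores = {team: accuracy(preds, solution) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical withheld outcomes and two hypothetical entries.
solution = [1, 0, 1, 1]
submissions = {
    "team_a": [1, 0, 0, 1],
    "team_b": [1, 0, 1, 1],
}
ranking = leaderboard(submissions, solution)
```

Because the score depends only on the entries and the withheld outcomes, the ranking does not depend on who submitted or which technique they used.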

  28. Motivation • Why host a competition? • Why compete? • How it works • Heritage Health Prize • Questions

  29. An upcoming competition, powered by Kaggle • De-identified dataset containing medical records of 100,000 Americans • $3 million prize http://www.heritagehealthprize.com

  30. Diabetes & Unfilled Prescriptions & Hypertension & High Cholesterol: probability of going to hospital in the next year

  31. Netflix Prize (2006–2009): $1 million prize, 50,000 registrations. Heritage Health Prize (2011): $3 million prize, projected 100,000 registrations

  32. Motivation • Why host a competition? • Why compete? • How it works • Heritage Health Prize • Questions

  33. Chess Ratings – Elo vs. the Rest of the World IJCNN Social Network Challenge Tourism Forecasting (Part 2) Predict Grant Applications

  34. Jeff Moser Jeremy Howard Nicholas Gruen Anthony Goldbloom

  35. What could the world’s best analysts find in your data? E-mail: anthony.goldbloom@kaggle.com. Phone: +61438400053. Photo by gidzy, www.flickr.com/photos/gidzy
