
Competitions in machine learning: the fun, the art, and the science Isabelle Guyon


Presentation Transcript


  1. Competitions in machine learning: the fun, the art, and the science. Isabelle Guyon, Clopinet, Berkeley, California. http://clopinet.com/challenges isabelle@clopinet.com

  2. My Itinerary

  3. My Kids

  4. My Company

  5. Recent Projects Drug toxicity via flow cytometry Melanoma App

  6. How to keep up with the state of the art (s.o.a.)?

  7. Organize Challenges!

  8. This Year: Unsupervised and Transfer Learning Challenges http://clopinet.com/ul ul@clopinet.com http://clopinet.com/gesture gesture@clopinet.com

  9. Applications Gesture and action recognition Image or video indexing/retrieval Recognition of sign languages Handwriting recognition Text processing Ecology

  10. Just starting … First UTL challenge. Free registration, cash prizes, 2 workshops (IJCNN 2011, ICML 2011), proceedings in JMLR W&CP, much fun! http://clopinet.com/ul

  11. Gesture Recognition Challenge Large database of American Sign Language developed at Boston University (http://www.bu.edu/asllrp/), including 3,800 signs with 3,000 unique class labels produced by native signers and 15 short narratives for a total of over 11,000 sign tokens altogether.

  12. Live Competition STEP 1: Develop a system that can learn new signs from a few examples. First screening of the competitors based on their performance on a validation dataset. STEP 2: At the site of the live competition, train the system with given “flash cards” to recognize new signs. Second screening of the competitors based on the learning curve of their system. STEP 3: Perform short sequences of signs (charades) in front of an audience. Win if your system gets the best recognition rate.
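
As a rough illustration of the STEP 2 evaluation (not the official challenge kit), a learning curve for a few-shot sign recognizer could be computed as in the sketch below; the feature vectors, the 1-nearest-neighbor baseline, and the function name are my own assumptions.

```python
# Hypothetical sketch: learning curve for few-shot sign recognition.
# Assumes feature vectors have already been extracted from the videos;
# the nearest-neighbor baseline is illustrative, not the challenge's method.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def learning_curve(X_train, y_train, X_test, y_test, shots=(1, 2, 4, 8)):
    """Accuracy as a function of the number of 'flash card' examples per sign."""
    rng = np.random.default_rng(0)
    curve = []
    for k in shots:
        # Sample k training examples per class (assumes >= k examples per sign).
        idx = np.concatenate([
            rng.choice(np.where(y_train == c)[0], size=k, replace=False)
            for c in np.unique(y_train)
        ])
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_train[idx], y_train[idx])
        curve.append((k, clf.score(X_test, y_test)))
    return curve
```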

  13. When and Where? DECEMBER 2010, NIPS 2010: launch of the UTL challenge. JUNE 2011: workshop at CVPR 2011 (accepted); launch of the Gesture Recognition challenge. JULY 2011: workshop at ICML 2011 (planned). AUGUST 2011: workshop at IJCNN 2011 (accepted); results of the UTL challenge. NOVEMBER 2011: live Gesture Recognition competition at ICCV 2011 (planned).

  14. Lessons learned!

  15. NIPS 2003 Feature Selection Challenge. Thousands to millions of low-level features: select the most relevant ones to build better, faster, and easier-to-understand learning machines. [Figure: an m × n data matrix reduced to its n' selected feature columns.]
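
To make the idea concrete, a minimal univariate filter that keeps the n' highest-scoring of the n features could look like the sketch below (using scikit-learn's SelectKBest as an illustration, not the challenge's reference code).

```python
# Illustrative univariate filter: rank the n features, keep the n' most relevant.
# X is an m-by-n data matrix, y the target vector; n_prime is the subset size.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def select_top_features(X, y, n_prime):
    selector = SelectKBest(score_func=f_classif, k=n_prime).fit(X, y)
    X_reduced = selector.transform(X)                 # m-by-n' reduced matrix
    kept = np.flatnonzero(selector.get_support())     # indices of kept features
    return X_reduced, kept
```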

  16. Applications of Feature Selection. [Figure: example application domains positioned by their typical numbers of examples and variables/features, both axes spanning roughly 10 to 10^6: customer knowledge, quality control, market analysis, OCR/HWR, machine vision, text categorization, system diagnosis, bioinformatics.]

  17. Filters, Wrappers, and Embedded Methods. Filter: all features → feature subset → predictor (selection done independently of the predictor). Wrapper: all features → multiple feature subsets evaluated by training the predictor → chosen feature subset. Embedded method: all features → predictor whose training itself selects the feature subset.
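
One possible instantiation of the three families, sketched with generic scikit-learn components; this is my own illustration, not the methods used in the challenges, and it assumes a binary classification target.

```python
# One possible instantiation of each family (illustrative only).
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression

def filter_select(X, y, k):
    """Filter: score features independently of any predictor, keep the top k."""
    return SelectKBest(mutual_info_classif, k=k).fit(X, y).get_support()

def wrapper_select(X, y, k):
    """Wrapper: repeatedly train a predictor to search over feature subsets."""
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
    return rfe.support_

def embedded_select(X, y):
    """Embedded: selection happens inside training (an L1 penalty zeroes weights)."""
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    return np.abs(clf.coef_).ravel() > 1e-8   # assumes a binary target
```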

  18. Bilevel Optimization. Split the M samples (N variables/features) into three sets of sizes m1, m2, m3: training, validation, and test. 1) For each candidate feature subset, train the predictor on the training data. 2) Select the feature subset that performs best on the validation data; repeat and average if you want to reduce variance (cross-validation). 3) Test on the test data.
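
A minimal sketch of the bilevel procedure above, assuming the candidate feature subsets are supplied as lists of column indices; the split proportions, predictor, and names are illustrative.

```python
# Sketch of bilevel optimization: outer loop over candidate feature subsets,
# inner loop trains the predictor; selection uses validation data only,
# and the test set is touched exactly once at the end.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

def bilevel_select(X, y, candidate_subsets, random_state=0):
    # Split the M samples into training (m1), validation (m2), and test (m3) sets.
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.4, random_state=random_state)
    X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=random_state)

    best_subset, best_val, best_clf = None, -1.0, None
    for subset in candidate_subsets:                       # e.g. [[0, 3], [1, 2, 5], ...]
        clf = LogisticRegression(max_iter=1000).fit(X_tr[:, subset], y_tr)   # 1) train
        val = clf.score(X_va[:, subset], y_va)             # 2) select on validation data
        if val > best_val:
            best_subset, best_val, best_clf = subset, val, clf
    test_acc = best_clf.score(X_te[:, best_subset], y_te)  # 3) test once on test data
    return best_subset, test_acc
```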

  19. Complexity of Feature Selection. With high probability: Generalization_error ≤ Validation_error + ε(C/m2). m2: number of validation examples; N: total number of features; n: feature subset size. Try to keep C of the order of m2.
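
Under the assumption that ε comes from a Hoeffding bound combined with a union bound over the candidate subsets (the slide does not spell this out), the bound can be written as follows.

```latex
% Hedged reconstruction of the slide's bound; the exact form of eps is an
% assumption (Hoeffding + union bound over the C(N, n) candidate subsets).
\[
  \text{Generalization\_error} \;\le\; \text{Validation\_error}
  \;+\; \epsilon\!\left(\tfrac{C}{m_2}\right),
  \qquad
  \epsilon\!\left(\tfrac{C}{m_2}\right) \approx \sqrt{\frac{C + \ln(1/\delta)}{2\,m_2}},
  \qquad
  C \approx \ln\binom{N}{n} \le n \ln N .
\]
```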

  20. WCCI 2008: Causation and Prediction Challenge. [Figure: toy causal network around Lung Cancer, with nodes Anxiety, Peer Pressure, Born an Even Day, Yellow Fingers, Smoking, Genetics, Allergy, Attention Disorder, Coughing, Fatigue, Car Accident.]

  21. Insensitivity to Irrelevant Features. Simple univariate predictive model, binary target and features; all relevant features correlate perfectly with the target, all irrelevant features randomly drawn. With 98% confidence, |feat_weight| < w and Σi wi xi < v. ng: number of “good” (relevant) features; nb: number of “bad” (irrelevant) features; m: number of training examples.
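
A small self-contained simulation in the spirit of this slide (my own illustration; the constants w and v and the 98% confidence level from the slide are not reproduced).

```python
# Illustrative simulation: the weights a naive univariate rule assigns to nb
# random (irrelevant) binary features are small compared with the weight 1 that
# the same rule would give a perfectly correlated (relevant) feature, and so is
# their summed per-example contribution sum_i w_i x_i.
import numpy as np

def irrelevant_feature_effect(m=2000, nb=50, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=m)             # binary target
    X_bad = rng.choice([-1.0, 1.0], size=(m, nb))   # nb irrelevant features
    w_bad = X_bad.T @ y / m                         # univariate weights (a relevant feature would get 1)
    max_weight = np.abs(w_bad).max()
    mean_contrib = np.abs(X_bad @ w_bad).mean()     # typical |sum_i w_i x_i|
    return max_weight, mean_contrib

print(irrelevant_feature_effect())  # both values stay well below 1 when m >> nb
```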

  22. Active Learning Challenge AISTATS & WCCI 2010

  23. Credits. Web platform: server made available by Prof. Joachim Buhmann, ETH Zurich, Switzerland. Computer admin.: Thomas Fuchs, ETH Zurich. Webmaster: Olivier Guyon, MisterP.net, France. Protocol review and advising: • David W. Aha, Naval Research Laboratory, USA. • Abe Schneider, Knexus Research, USA. • Graham Taylor, NYU, New York, USA. • Andrew Ng, Stanford Univ., Palo Alto, California, USA. • Vassilis Athitsos, University of Texas at Arlington, Texas, USA. • Ivan Laptev, INRIA, France. • Jitendra Malik, UC Berkeley, California, USA. • Christian Vogler, ILSP Athens, Greece. • Sudeep Sarkar, University of South Florida, USA. • Philippe Dreuw, RWTH Aachen University, Germany. • Richard Bowden, Univ. Surrey, UK. • Greg Mori, Simon Fraser University, Canada. Data collection and preparation: • Vassilis Athitsos, University of Texas at Arlington, Texas, USA. • Isabelle Guyon, Clopinet, California, USA. • Graham Taylor, NYU, New York, USA. • Ivan Laptev, INRIA, France. • Jitendra Malik, UC Berkeley, California, USA. Baseline methods and beta testing: the following researchers, experienced in the domain, will be providing baseline results: • Vassilis Athitsos, University of Texas at Arlington, Texas, USA. • Graham Taylor, NYU, New York, USA. • Andrew Ng, Stanford Univ., Palo Alto, California, USA. • Yann LeCun, NYU, New York, USA. • Ivan Laptev, INRIA, France. The following researchers, who were top-ranking participants in past challenges but are not experienced in the domain, will also give it a try: • Alexander Borisov (Intel, USA) • Hugo-Jair Escalante (INAOE, México) • Amir Saffari (Graz Univ., Austria) • Alexander Statnikov (NYU, USA)

  24. Resources. 1) Feature Extraction: Foundations and Applications. I. Guyon, S. Gunn, et al. Springer, 2006. http://clopinet.com/fextract-book 2) Challenges in Machine Learning. Collection published by Microtome; papers on the challenges reprinted from JMLR and JMLR W&CP.

  25. Join the Hall of Fame!
