Introduction to Machine Learning



  1. Introduction to Machine Learning Laurent Orseau AgroParisTech laurent.orseau@agroparistech.fr EFREI 2010-2011 Based on slides by Antoine Cornuejols

  2. Overview • Introduction to Induction (Laurent Orseau) • Neural Networks • Support Vector Machines • Decision Trees • Introduction to Data-Mining (Christine Martin) • Association Rules • Clustering • Genetic Algorithms

  3. Overview: Introduction • Introduction to Induction • Examples of applications • Learning types • Supervised Learning • Reinforcement Learning • Unsupervised Learning • Machine Learning Theory • What questions to ask?

  4. Introduction

  5. Introduction What is Machine Learning ? • Memory • Knowledge acquisition • Neurosciences • Short-term (working) • Keep 7±2 objects at a time • Long-term • Procedural • Action sequences • Declarative • Semantic (concepts) • Episodic (facts) • Learning Types • By heart • From rules • By imitation / demonstration • By trial & error • Knowledge reuse • In similar situations

  29. Introduction What is Machine Learning? • "The field of study that gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959) • Samuel's checkers player; the game of checkers was later solved (Schaeffer et al., 2007) • See also TD-Gammon (Tesauro, 1992)

  7. Introduction What is Machine Learning? Given: • an experience E, • a class of tasks T, • a performance measure P, a computer program is said to learn if its performance on tasks in T, as measured by P, improves with experience E. Tom Mitchell, 1997. Ex: T = playing checkers, P = percentage of games won, E = games played against itself.

  8. Introduction Terms related to Machine Learning • Robotics • Autonomous cars (Google), Nao • Prediction / forecasting • Stock exchange, pollution peaks, … • Recognition • Faces, speech, handwriting, movements, … • Optimization • Subway speed, traveling salesman, … • Regulation • Heating, traffic, fridge temperature, … • Autonomy • Robots, hand prostheses • Automatic problem solving • Adaptation • User preferences, robot in a changing environment • Induction • Generalization • Automatic discovery • …

  9. Some applications

  10. Applications Learning to cook • Learning by imitation / demonstration • Procedural Learning (motor precision) • Object recognition

  11. Applications DARPA Grand challenge (2005)

  12. Applications > DARPA Grand Challenge 200km of desert Natural and artificial dangers No driver No remote control

  13. Applications > DARPA Grand Challenge 5 Finalists

  14. Applications > DARPA Grand Challenge Recognition of the road

  15. Applications Learning to label images: Face recognition. "Face Recognition: Component-based versus Global Approaches" (B. Heisele, P. Ho, J. Wu and T. Poggio), Computer Vision and Image Understanding, Vol. 91, No. 1/2, 6-21, 2003.

  16. Applications > Image recognition Feature combinations

  17. Applications Hand prosthesis • Recognition of pronator and supinator signals • Imperfect sensors • Noise • Uncertainty

  18. Applications Autonomous robot rover on Mars

  19. Supervised Learning Learning by heart? Unexploitable: we must generalize. • How should the input shapes be encoded?

  20. Introduction to Machine Learning Theory

  21. Introduction to Machine Learning theory • Supervised Learning • Reinforcement Learning • Unsupervised Learning (CM) • Genetic Algorithms (CM)

  22. Supervised Learning • Set of examples x_i, each labeled u_i • Find a hypothesis h such that h(x_i) = u_i • h(x_i): the predicted label • Which hypothesis h* is best?

  23. Supervised Learning Supervised Learning: 1st example • Houses: price / m² • Searching for h (see the sketch below) • Nearest neighbors? • Linear or polynomial regression? • More information • Localization ((x, y) coordinates, or a symbolic variable?), age of the building, neighborhood, swimming-pool, local taxes, temporal evolution, …?
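
To make the two candidate hypothesis families on slide 23 concrete, here is a minimal Python sketch; the single feature (distance to the city centre) and all price figures are invented for illustration:

```python
# Toy comparison: nearest neighbours vs. linear regression for price per m^2.
# All data below are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# One variable: distance to the city centre (km); target: price per m^2 (EUR).
X = np.array([[1.0], [2.0], [3.5], [5.0], [8.0], [12.0]])
u = np.array([9000, 8200, 7000, 6100, 4800, 3500])

linear = LinearRegression().fit(X, u)                # h(x) = a*x + b
knn = KNeighborsRegressor(n_neighbors=2).fit(X, u)   # average the 2 nearest houses

x_new = np.array([[4.0]])
print("linear:", linear.predict(x_new))   # follows the global trend
print("2-NN:  ", knn.predict(x_new))      # follows the local neighbours
```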

  24. Supervised Learning Problem: predicting the price per m² for a given house. • Modeling • Data gathering • Learning • Validation • Use in real cases [Figure: the ideal workflow vs. actual practice]

  25. Supervised Learning 1) Modeling • Input space • What is the meaningful information? • Variables • Output space • What is to be predicted? • Hypothesis space • Input –(computation)→ Output • What (kind of) computation?

  26. Supervised Learning > 1) Modeling 1-a) Input space: Variables • What is the meaningful information? • Should we get as much as possible? • Information quality? • Noise • Quantity • Cost of information gathering? • Economic • Time • Risk (invasive?) • Ethics • Law (the CNIL, in France) • Definition domain of each variable? • Symbolic, bounded numeric, unbounded numeric, etc.

  27. Supervised Learning > 1) Modeling > a) Variables Price per m²: Variables • Localization • Continuous: (x, y) longitude/latitude? • Symbolic: city name? • Age of the building • Year of construction? • Relative to the present or to the construction date? • Nature of the soil • Swimming-pool?

  28. Supervised Learning > 1) Modeling 1-b) Output space • What do we want as output? • Symbolic classes? (classification) • Boolean Yes/No (concept learning) • Multi-valued A/B/C/D/… • Numeric? (regression) • [0 ; 1]? • [-∞ ; +∞]? • How many outputs? • Multi-valued → multi-class? • 1 output for each class • Learn a model for each output? • More "free" • Learn 1 model for all outputs? • Each output can use the others' information

  29. Supervised Learning > 1) Modeling 1-c) Hypothesis space • Critical! • Depends on the learning algorithm • Linear regression: space = {x ↦ ax + b} • Parameters: a and b • Polynomial regression • # parameters = polynomial degree + 1 • Neural Networks, SVMs, Genetic Algorithms, … • …

  30. Choice of hypothesis space [Figure: total error = approximation error + estimation error]

  31. Supervised Learning > 1) Modeling > c) Hypothesis space Choice of hypothesis space • Space too "poor" → inadequate solutions • Ex: modeling sin(x) with y = ax + b • Space too "rich" → risk of overfitting • Defined by a set of parameters • High # of parameters → learning is more difficult • But prefer a richer hypothesis space! • Use of generic methods • Add regularization
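
The sin(x) example above can be reproduced in a few lines; the sample size, noise level, and degrees below are arbitrary choices made for illustration:

```python
# Fitting noisy samples of sin(x) with polynomial hypothesis spaces
# of increasing richness: too "poor" underfits, too "rich" overfits.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, 15)
y = np.sin(x) + rng.normal(0, 0.1, x.size)      # noisy training labels
x_test = np.linspace(0, 2 * np.pi, 200)
y_test = np.sin(x_test)

for degree in (1, 5, 14):
    coeffs = np.polyfit(x, y, degree)           # degree + 1 parameters
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train {train_err:.3f}  test {test_err:.3f}")
# Expected pattern: degree 1 has high train AND test error (underfitting);
# degree 14 has near-zero train error but much larger test error (overfitting).
```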

  32. Supervised Learning 2) Data gathering • Gathering • Electronic sensors • Simulation • Polls • Automated gathering on the Internet • … • Get the highest possible quantity of data • Collection cost • Data as "pure" as possible • Avoid all noise • Noise in the variables • Noise in the labels! • 1 example = 1 value for each variable • Missing value = useless example? (see the sketch below)
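
On the last question above, a common alternative, not stated on the slide, is to impute the missing value rather than discard the example; a minimal sketch:

```python
# Two common treatments of a missing value (np.nan): drop the example, or impute.
import numpy as np

data = np.array([[1.0, 2.0],
                 [4.0, np.nan],   # example with a missing variable
                 [7.0, 8.0]])

complete = data[~np.isnan(data).any(axis=1)]    # option 1: drop the example
col_means = np.nanmean(data, axis=0)            # option 2: impute column means
imputed = np.where(np.isnan(data), col_means, data)
print(complete)
print(imputed)
```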

  33. Supervised Learning > 2) Data gathering Gathered data [Figure: table of examples; columns = inputs/variables, plus a measured output/class/label column] • But the true label y is unreachable!

  34. Supervised Learning > 2) Data gathering Data preprocessing • Clean up the data • Ex: reduce background noise • Transform the data • Final format adapted to the task • Ex: Fourier transform of a radio signal: time/amplitude → frequency/amplitude
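
A minimal sketch of the Fourier-transform preprocessing mentioned above, on a synthetic signal (the two component frequencies are arbitrary):

```python
# Fourier transform of a time/amplitude signal into frequency/amplitude.
# The signal is synthetic: two sines at 50 Hz and 120 Hz.
import numpy as np

fs = 1000                                    # sampling rate (Hz)
t = np.arange(0, 1, 1 / fs)                  # one second of signal
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spectrum = np.abs(np.fft.rfft(signal))       # amplitude per frequency bin
freqs = np.fft.rfftfreq(signal.size, 1 / fs)
peaks = freqs[spectrum.argsort()[-2:]]
print("dominant frequencies:", sorted(peaks))   # ~50 Hz and ~120 Hz
```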

  35. Supervised Learning 3) Learning • Choice of program parameters • Choice of inductive test • Running the learning program • Performance test • If bad, return to a)…

  36. Supervised Learning > 3) Learning a) Choice of program parameters • Max allocated computation time • Max accepted error • Learning parameters • Specific to model • Knowledge introduction • Initialize parameters to "ok" values? • …

  37. Supervised Learning > 3) Learning b) Choice of inductive test • Goal: find the hypothesis h ∈ H minimizing the real risk (risk expectancy, generalization error): R(h) = ∫_{X×Y} ℓ(h(x), y) dP(x, y) • P(x, y): joint probability law over X×Y • ℓ: loss function comparing the predicted label h(x) with the true label y (or desired output u)

  38. Supervised Learning > 3) Learning > b) Inductive test Real risk R(h) = ∫_{X×Y} ℓ(h(x), y) dP(x, y) • Goal: minimize the real risk • The real risk is not known, in particular P(X, Y) • The loss ℓ depends on the task: discrimination or regression (see below)
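
The transcript does not spell out the loss functions for the two cases; the standard textbook choices are the 0/1 loss for discrimination and the squared loss for regression:

```latex
\ell_{0/1}\big(h(x), y\big) =
  \begin{cases} 0 & \text{if } h(x) = y \\ 1 & \text{otherwise} \end{cases}
\qquad
\ell_{2}\big(h(x), y\big) = \big(h(x) - y\big)^{2}
```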

  39. Supervised Learning > 3) Learning > b) Inductive test Empirical Risk Minimization • ERM principle • Find h ∈ H minimizing the empirical risk • Least error on the training set
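
Written out with the notation of slide 22 (examples x_i labeled u_i; the training-set size m is added here), the ERM principle selects:

```latex
\hat{h} = \operatorname*{argmin}_{h \in H} R_{\mathrm{emp}}(h),
\qquad
R_{\mathrm{emp}}(h) = \frac{1}{m} \sum_{i=1}^{m} \ell\big(h(x_i), u_i\big)
```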

  40. "error" Learning curve Supervised Learning > 3) Learning > b) Inductive test > Empirical risk Learning curve • Data quantity is important! Training set size

  41. Supervised Learning > 3) Learning > b) Inductive test > Empirical risk Test / Validation • Measures overfitting / generalization • Can the acquired knowledge be reused in new circumstances? • Do NOT validate on the training set! • Validate on an additional test set • Cross-validation • Useful when data are scarce • leave-p-out (see the sketch below)
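
A minimal sketch of this validation discipline with scikit-learn; the data and the linear model are placeholders:

```python
# Never validate on the training set: hold out a test set, or cross-validate.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                 # placeholder data
u = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 40)

# Option 1: a separate held-out test set.
X_tr, X_te, u_tr, u_te = train_test_split(X, u, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_tr, u_tr)
print("held-out score:", model.score(X_te, u_te))

# Option 2: cross-validation, useful when data are scarce.
print("5-fold CV scores:", cross_val_score(LinearRegression(), X, u, cv=5))
```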

  42. Supervised Learning > 3) Learning > b) Inductive test > Empirical risk Overfitting [Figure: real risk vs. empirical risk as a function of data quantity; the gap between the two curves is overfitting]

  43. Supervised Learning > 3) Learning > b) Inductive test > Empirical risk Regularization • Limit overfitting before measuring it on the test set • Add a penalization term to the inductive test • Ex: • Penalize large numbers (e.g., large parameter values) • Penalize resource use • …
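
In symbols, regularization adds a penalty term λΩ(h) to the empirical risk; the weight-norm penalty shown here is one standard instance, not something the slide specifies:

```latex
\hat{h} = \operatorname*{argmin}_{h \in H}
\big[\, R_{\mathrm{emp}}(h) + \lambda \, \Omega(h) \,\big],
\qquad \text{e.g. } \Omega(h_w) = \lVert w \rVert^{2}
\text{ for } h_w(x) = w \cdot x
```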

  44. Supervised Learning > 3) Learning > b) Inductive test Maximum a posteriori • Bayesian approach • We suppose there exists a prior probability distribution over the space H: p_H(h) • Maximum A Posteriori (MAP) principle: • Search for the most probable h after observing the data S • Ex: observation of sheep color • h = "A sheep is white"
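
With Bayes' rule (and since p(S) does not depend on h), the MAP principle reads:

```latex
h_{\mathrm{MAP}}
= \operatorname*{argmax}_{h \in H} \, p_H(h \mid S)
= \operatorname*{argmax}_{h \in H} \, p(S \mid h)\, p_H(h)
```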

  45. Supervised Learning > 3) Learning > b) Inductive test Minimum Description Length principle • Occam's razor: "Prefer the simplest hypotheses" • Simplicity: size of h → maximum compression • Maximum a posteriori with p_H(h) = 2^(-d(h)) • d(h): length of h in bits • Compression → generalization
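
Taking −log₂ of the MAP criterion with the prior p_H(h) = 2^(−d(h)) turns the product into a sum of code lengths, which gives the MDL form:

```latex
h_{\mathrm{MDL}}
= \operatorname*{argmin}_{h \in H} \big[\, d(h) + d(S \mid h) \,\big],
\qquad d(S \mid h) = -\log_2 p(S \mid h)
```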

  46. Supervised Learning > 3) Learning c) Running the learning program • Search for h • Use the examples of the training set • One by one • All together • Minimize the inductive test

  47. Supervised Learning > 3) Learning > c) Running the program Finding the parameters of the model • Explore hypothesis space H • Best hypothesis given inductive test? • Fundamentally depends on H • Structured exploration • Local exploration • No exploration

  48. Supervised Learning > 3) Learning > c) Running the program > Exploring H Structured exploration • Structured by generality relation (partial order) • Version space • ILP (Inductive Logic Programming) • EBL (Explanation Based Learning) • Grammatical inference • Program enumeration

  49. Supervised Learning > 3) Learning > c) Running the program > Exploring H Representation of the version space Structured by: • Upper bound: G-set • Lower bound: S-set • G-set = Set of all most general hypotheses consistent with known examples • S-set = Set of all most specific hypotheses consistent with known examples

  50. Supervised Learning > 3-c) > Exploring H > Version space Learning… … by iterated updates of the version space Idea: update S-set and G-set after each new example Candidate elimination algorithm • Example: rectangles (cf. blackboard…)
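
A minimal sketch of the S-set side of the rectangle example (Find-S style): the most specific hypothesis consistent with the positive examples is their bounding box. This is an illustrative reconstruction, not the blackboard example itself, and the G-set update of full candidate elimination is omitted:

```python
# S-set update for axis-aligned rectangles: the most specific hypothesis
# consistent with the positives is their bounding box (Find-S style).
# The G-set update, the other half of candidate elimination, is omitted.

def update_s(s, point):
    """Minimally generalize rectangle s = (xmin, ymin, xmax, ymax) to cover point."""
    x, y = point
    if s is None:                        # first positive example
        return (x, y, x, y)
    xmin, ymin, xmax, ymax = s
    return (min(xmin, x), min(ymin, y), max(xmax, x), max(ymax, y))

def covers(s, point):
    xmin, ymin, xmax, ymax = s
    return xmin <= point[0] <= xmax and ymin <= point[1] <= ymax

s = None
for pos in [(2, 3), (4, 1), (3, 5)]:     # positive examples
    s = update_s(s, pos)
print("S-set hypothesis:", s)            # (2, 1, 4, 5)
print("consistent with negative (0, 0)?", not covers(s, (0, 0)))  # True
```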
