
Introduction to Machine Learning

Amit Sethi, EEE, IIT G @ Cepstrum, Oct 16, 2011. Introduction to Machine Learning. A high-level view of Machine Learning. Objectives: Understand what machine learning is. Motivate why it has become so important. Identify types of learning and salient frameworks, algorithms, and their utility.


Presentation Transcript


  1. Amit Sethi, EEE, IIT G @ Cepstrum, Oct 16, 2011

    Introduction to Machine Learning

  2. A high-level view of Machine Learning. Objectives: Understand what machine learning is. Motivate why it has become so important. Identify types of learning and salient frameworks, algorithms, and their utility. Take a sneak peek at the next set of problems.
  3. Organization What is learning? Why learn? Types of learning and salient frameworks Frontiers
  4. Learning is improving task performance based on experience Example: Learning to ride a bicycle T: Task of learning to ride a bicycle P: Performance of balancing while moving E: Experience of riding in many situations Is it wise to memorize all situations and appropriate responses by observing an expert?
  5. More examples Improve on task, T, with respect to performance metric, P, based on experience, E. T: Playing checkers P: Percentage of games won against an arbitrary opponent E: Playing practice games against itself T: Recognizing hand-written words P: Percentage of words correctly classified E: Database of human-labeled images of handwritten words T: Driving on four-lane highways using vision sensors P: Average distance traveled before a human-judged error E: A sequence of images and steering commands recorded while observing a human driver. T: Categorize email messages as spam or legitimate. P: Percentage of email messages correctly classified. E: Database of emails, some with human-given labels Source: Introduction to Machine Learning by Raymond J. Mooney
  6. Mathematically speaking… Determine f such that y_n = f(x_n) for the known pairs, and an error measure g(y, x) is minimized for unseen (x, y) pairs. The form of f is fixed, but some parameters can be tuned: so y = f_θ(x), where x is observed and y needs to be inferred, e.g. y = 1 if mx > c, 0 otherwise, so θ = (m, c). Machine Learning is concerned with designing algorithms that learn “better” values of θ given “more” x (and y) for a given problem.
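The threshold rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`predict`, `error_rate`, `fit_threshold`), the toy data, and the choice to hold m fixed at 1 while searching over c are hypothetical and not part of the talk.

```python
def predict(x, m, c):
    """Threshold rule from the slide: y = 1 if m*x > c, otherwise 0."""
    return 1 if m * x > c else 0


def error_rate(data, m, c):
    """Fraction of (x, y) pairs that the rule classifies incorrectly."""
    return sum(predict(x, m, c) != y for x, y in data) / len(data)


def fit_threshold(data, m=1.0):
    """Assumed learning procedure: keep m fixed and pick the c, among
    midpoints between neighbouring x values, with the lowest training error."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda c: error_rate(data, m, c))


# Toy labeled data (hypothetical): small x belongs to class 0, large x to class 1.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]
c_hat = fit_threshold(train)
train_error = error_rate(train, 1.0, c_hat)
print(c_hat, train_error)
```

Here θ = (m, c) is "learned" only in the weakest sense, by trying candidate values of c; practical algorithms tune all parameters and check the error on unseen data.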
  7. Some pertinent questions to ask What is the scope of the task? How will performance be measured? How should learning be approached? Scalability: How can we learn fast? How many resources are needed to learn? Generalization: How will it perform in unseen situations? Online learning: Can it learn and improve while performing the task?
  8. Related Disciplines Artificial Intelligence Data Mining Probability and Statistics Information theory Numerical optimization Adaptive Control Theory Neurobiology Psychology (cognitive, perceptual, dev.) Linguistics
  9. Organization What is learning? Why learn? Types of learning and salient frameworks Frontiers
  10. Solve problems efficiently and better Develop systems that are too difficult/expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task (knowledge engineering bottleneck). Develop systems that can automatically adapt and customize themselves to individual users. Personalized news or mail filter Personalized tutoring Discover new knowledge from large databases (data mining). Market basket analysis (e.g. diapers and beer) Medical text mining (e.g. migraines to calcium channel blockers to magnesium) Source: Introduction to Machine Learning by Raymond J. Mooney
  11. Understand how we learn and our limitations Computational studies of learning may help us understand learning in humans and other biological organisms. Hebbian neural learning: “Neurons that fire together, wire together.” Power law of practice [figure: log(performance time) vs. log(number of training trials)] Source: Introduction to Machine Learning by Raymond J. Mooney
  12. The Time is Ripe Many basic effective and efficient algorithms available Large amounts of data available Large amounts of computational resources available Source: Introduction to Machine Learning by Raymond J. Mooney
  13. Some recent success stories
  14. Organization What is learning? Why learn? Types of learning and salient frameworks Frontiers
  15. Framing the problem Remember, y=fθ(x)? y can be continuous or categorical y may be known for some x or none at all f can be simple (e.g. linear) or complex f can incorporate some knowledge of how x was generated or be blind to the generation etc…
  16. Based on availability of desired output Supervised learning: For y = f_θ(x), a set of pairs (x_i, y_i) is known (the y_i are usually classes). Now predict y_j for a new x_j. Examples: Two classes of protein with given amino acid sequences Labeled male and female face images
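As a minimal sketch of the supervised setting described on this slide, the following uses a 1-nearest-neighbour rule, which is one simple choice among many and is not named in the talk. The function name `nearest_neighbor_predict` and the toy points are assumptions made for illustration.

```python
def nearest_neighbor_predict(train, x_new):
    """Return the label y_i of the training point x_i closest to x_new."""
    def squared_distance(pair):
        x_i, _ = pair
        return sum((a - b) ** 2 for a, b in zip(x_i, x_new))

    _, nearest_label = min(train, key=squared_distance)
    return nearest_label


# Known (x_i, y_i) pairs: hypothetical 2-D feature vectors with class labels.
train = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
]

# Predict y_j for new x_j.
label_1 = nearest_neighbor_predict(train, (1.2, 1.1))
label_2 = nearest_neighbor_predict(train, (5.5, 5.0))
print(label_1, label_2)
```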
  17. Neural Networks - MLP In a nutshell: Input is non-linearly transformed by hidden layers, where each hidden unit is usually a “fuzzy” (soft-threshold) linear classifier of its inputs Output is a linear combination of the hidden layer Use when: Want to model a non-linear function Labeled data is available Don’t want to write new software Variations: Competitive learning for classification Many more…
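The forward pass of such a network can be sketched as follows. This is an illustration only: the weights are hand-picked (not learned) so that the network approximates XOR, a standard example of a non-linear function, and the names `sigmoid` and `mlp_forward` are assumptions. A training procedure such as backpropagation would normally find the weights from labeled data.

```python
import math


def sigmoid(z):
    """Smooth ("fuzzy") threshold used by each hidden unit."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer of sigmoid units followed by a linear output."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias


# Hand-picked weights (assumption): unit 1 acts like OR, unit 2 like AND,
# and the output computes OR minus AND, which approximates XOR.
hidden_weights = [(20.0, 20.0), (20.0, 20.0)]
hidden_biases = [-10.0, -30.0]
output_weights = (1.0, -1.0)
output_bias = 0.0

outputs = {
    x: mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]
}
print({x: round(y, 3) for x, y in outputs.items()})
```

No single linear threshold on the two inputs can produce this output pattern, which is why the hidden layer is needed.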
  18. Support Vector Machines In a nutshell: Learns optimal boundary between two classes (red line) Use when: Labeled class data is available Want to minimize chance of error in the test case Variations: Non-linear mapping of the input vectors using “Kernels”
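One common reading of "optimal boundary" for SVMs is the separating boundary with the largest margin, that is, the largest distance to the closest training point. The sketch below does not train an SVM; it only evaluates that criterion for two hand-chosen candidate lines on hypothetical data. The name `geometric_margin` and the data are assumptions.

```python
import math


def geometric_margin(w, b, data):
    """Smallest signed distance from any labeled 2-D point to the line
    w.x + b = 0. Labels are +1 or -1; a positive result means every
    point lies on the correct side."""
    norm = math.hypot(*w)
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in data
    )


# Hypothetical, linearly separable 2-D data.
data = [
    ((3.0, 0.0), +1),
    ((4.0, 1.0), +1),
    ((1.0, 0.0), -1),
    ((0.0, 1.0), -1),
]

# Two candidate boundaries: the vertical lines x1 = 2 and x1 = 2.5.
margin_a = geometric_margin((1.0, 0.0), -2.0, data)
margin_b = geometric_margin((1.0, 0.0), -2.5, data)
print(margin_a, margin_b)
```

Both lines separate the classes, but the first leaves more room on either side, so a maximum-margin criterion prefers it.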
  19. Based on availability of desired output Unsupervised learning: For y = f_θ(x), only a set of x_i is known. Predict y, such that y is simpler than x but retains its essence. Examples: Clustering (when y is a class label) Dimensionality reduction (when y is continuous)
  20. Clustering In a nutshell: Grouping similar objects based on a definition of similarity That is, intra- vs. inter-cluster similarity, e.g. distance from center of the cluster Use when: Class labels are not available, but you have a desired number of clusters in mind Variations: Different similarity measures Automatic detection of number of clusters Online clustering
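The slide does not name a specific algorithm. As one possible illustration of "distance from center of the cluster" with a desired number of clusters in mind, here is a minimal k-means sketch. The function names, the fixed initial centers, and the toy points are assumptions.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to their nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        centers = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else old  # keep an empty cluster's center unchanged
            for cluster, old in zip(clusters, centers)
        ]
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Two visually separate groups of unlabeled points (hypothetical data).
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```

The result depends on the initial centers and on the number of clusters chosen, which is why the slide lists automatic detection of the number of clusters as a variation.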
  21. Principal Component Analysis In a nutshell: Finds a lower-dimensional linear representation of high-dimensional data in which not all dimensions are independent, e.g. (x_1, x_2, x_3), where x_3 = a x_1 + b x_2 + c Use when: You want to perform linear dimensionality reduction Variations: ICA Online PCA
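A small sketch of the idea, simplified to two dimensions: if x_2 = 2 x_1 + 1 exactly, the data really has one degree of freedom, and the first principal component points along that line. Power iteration is used here as one simple way to find the leading eigenvector of the covariance matrix; the function names and the data are assumptions.

```python
import math


def covariance_matrix(points):
    """Sample covariance matrix of a list of equal-length tuples."""
    n = len(points)
    mean = [sum(col) / n for col in zip(*points)]
    centered = [[v - m for v, m in zip(p, mean)] for p in points]
    dim = len(mean)
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def first_principal_component(cov, iterations=100):
    """Leading eigenvector of cov by power iteration. Assumes the start
    vector (1, 0, ...) is not orthogonal to that eigenvector."""
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical data lying exactly on the line x2 = 2*x1 + 1.
points = [(float(x1), 2.0 * x1 + 1.0) for x1 in range(5)]
cov = covariance_matrix(points)
component = first_principal_component(cov)

# Variance captured along the component versus total variance.
variance_along = sum(
    component[i] * cov[i][j] * component[j] for i in range(2) for j in range(2)
)
total_variance = cov[0][0] + cov[1][1]
print(component, variance_along, total_variance)
```

Because the two coordinates are perfectly dependent, one component captures all of the variance, so the data can be stored as a single number per point without loss.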
  22. Manifold Learning In a nutshell: Learning a lower-dimensional manifold (e.g. surface) close to which the data lies Use when: You want to perform non-linear dimensionality reduction Variations: SOM
  23. Based on use of knowledge about the process Generative models: For y = f_θ(x), we have some idea of how x was generated given y and θ Examples: HMMs: Given phonemes and {age, gender}, we know how the speech can be generated Bayesian Networks: Given {gender, age, race} we have some idea of what a face will look like for different emotions
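The slide's examples (HMMs, Bayesian networks) are too large for a short listing, so the sketch below substitutes a much simpler generative model: each class y is assumed to generate a one-dimensional x from its own Gaussian, and a new x is classified with Bayes' rule. The function names, labels, and data are all hypothetical.

```python
import math


def fit_gaussian(values):
    """Maximum-likelihood mean and variance of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def fit(data):
    """data maps each label y to the x values observed for it.
    Returns label -> (prior, mean, variance), i.e. a model of how x
    is generated for each y."""
    total = sum(len(values) for values in data.values())
    return {
        label: (len(values) / total, *fit_gaussian(values))
        for label, values in data.items()
    }


def log_likelihood(x, mean, variance):
    """log p(x | y) under the Gaussian assumed for class y."""
    return -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)


def classify(x, model):
    """Pick the label maximizing log p(y) + log p(x | y) (Bayes' rule)."""
    def log_posterior(label):
        prior, mean, variance = model[label]
        return math.log(prior) + log_likelihood(x, mean, variance)

    return max(model, key=log_posterior)


# Hypothetical measurements for two classes.
data = {"short": [1.0, 1.2, 0.8], "tall": [3.0, 3.3, 2.7]}
model = fit(data)
label_low = classify(1.1, model)
label_high = classify(2.9, model)
print(label_low, label_high)
```

A discriminative method would skip the per-class model of x and learn the decision boundary directly.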
  24. Based on use of knowledge about the process Discriminative Models: Do not care about how the data was generated Finding the right features is of prime importance Followed by finding the right classifier Examples: SVM MLP Source: “Automatic Recognition of Facial Actions in Spontaneous Expressions” by Bartlett et al in Journal of Multimedia, Sep 2006
  25. Organization What is learning? Why learn? Types of learning and salient frameworks Frontiers
  26. History of Machine Learning (1/2) 1980s: Advanced decision tree and rule learning Explanation-based Learning (EBL) Learning and planning and problem solving Utility problem Analogy Cognitive architectures Resurgence of neural networks (connectionism, backpropagation) Valiant’s PAC Learning Theory Focus on experimental methodology 1990s Data mining Adaptive software agents and web applications Text learning Reinforcement learning (RL) Inductive Logic Programming (ILP) Ensembles: Bagging, Boosting, and Stacking Bayes Net learning Source: Introduction to Machine Learning by Raymond J. Mooney
  27. History of Machine Learning (2/2) 2000s Support vector machines Kernel methods Graphical models Statistical relational learning Transfer learning Sequence labeling Collective classification and structured outputs Computer Systems Applications Compilers Debugging Graphics Security (intrusion, virus, and worm detection) Email management Personalized assistants that learn Learning in robotics and vision Source: Introduction to Machine Learning by Raymond J. Mooney
  28. Application Frontiers (1/2)
  29. Application Frontiers (2/2)
  30. Theoretical Frontiers Learning the structure of classifiers Automatic feature discovery and active learning Discovering the limits of learning Information theoretic bounds? Learning that never ends Explaining human learning Computer languages with ML primitives Adapted from: “The Discipline of Machine Learning” by Tom Mitchell, 2006
  31. Questions? Thank you!
  32. Appendix: Some definitions Inference: Using a system to get the output variable for a given input variable Learning: Changing parameters according to an algorithm to improve performance Training: Using a machine learning algorithm to learn function parameters based on an input and (optionally) output dataset known as the “training set” Validation and Testing: Using inference (without training) to test the performance of the learned system on data Offline learning: When all training happens prior to testing, and no learning takes place during testing Online learning: When learning and testing happen for the same data
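To make the last definition concrete, here is a minimal sketch of online learning as a predict-then-update loop: each incoming value is first used to test the current model and then to improve it. The "model" is just a running mean, chosen for brevity; the function name `online_mean` and the stream values are assumptions.

```python
def online_mean(stream):
    """Predict each value as the mean of the values seen so far (inference),
    record the prediction, then update the mean with the new value (learning)."""
    mean, count = 0.0, 0
    predictions = []
    for value in stream:
        predictions.append(mean)        # testing: predict before seeing the value
        count += 1
        mean += (value - mean) / count  # learning: incremental parameter update
    return predictions, mean


predictions, final_mean = online_mean([2.0, 4.0, 6.0])
print(predictions, final_mean)
```

In offline learning, by contrast, the mean would be computed once from a fixed training set and then left unchanged while testing.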