
Introduction to Machine Learning course 67577 fall 2007



Presentation Transcript


  1. Introduction to Machine Learning, course 67577, fall 2007 • Lecturer: Amnon Shashua • Teaching Assistant: Yevgeny Seldin • School of Computer Science and Engineering • Hebrew University

  2. What is Machine Learning? • An inference engine (computer program) that, given sufficient data (examples), computes a function matching as closely as possible the process generating the data. • Making accurate predictions based on observed data. • Algorithms that optimize a performance criterion based on observed data. • Learning to do better in the future based on what was experienced in the past. • Programming by examples: instead of writing a program to solve a task directly, machine learning seeks methods by which the computer will come up with its own program based on training examples (see the sketch below).
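As a toy illustration of the "programming by examples" idea (a minimal sketch, not from the slides; the data and the threshold rule are invented for illustration), the program below is not written by hand but chosen to fit labeled examples:

```python
# Toy "programming by examples": learn a 1-D threshold rule from labeled data
# instead of hand-coding it. (Illustrative sketch only.)
import numpy as np

def fit_threshold(x, y):
    """Pick the threshold t minimizing training errors for the rule: predict 1 if x >= t."""
    best_t, best_err = None, len(x) + 1
    for t in np.sort(x):
        err = np.sum((x >= t).astype(int) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Labeled examples (x, y): the training set replaces an explicit program.
x = np.array([0.1, 0.4, 0.35, 0.8, 0.9, 0.7])
y = np.array([0,   0,   0,    1,   1,   1])
t = fit_threshold(x, y)
print("learned rule: predict 1 if x >=", t)
```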

  3. Why Machine Learning? • Data-driven algorithms are able to examine large amounts of data, whereas a human expert is likely to be guided by subjective impressions or by examining a relatively small number of examples. • Humans often have trouble expressing what they know, but have no difficulty labeling data. • Machine learning is effective in domains where declarative (rule-based) knowledge is difficult to obtain, yet generating training data is easy.

  4. Typical Examples • Visual recognition (say, detecting faces in an image): the amount of variability in appearance introduces challenges that are beyond the capacity of direct programming. • Spam filtering: data-driven programming can adapt to changing tactics by spammers. • Extracting topics from documents: categorize news articles as being about politics, sports, science, etc. • Natural language understanding: from spoken words to text; categorizing the meaning of spoken sentences. • Optical character recognition (OCR). • Medical diagnosis: from symptoms to diagnosis. • Credit card transaction fraud detection. • Wealth prediction.

  5. Fundamental Issues • Over-fitting: doing well on a training set does not guarantee accuracy on new examples (see the sketch after this slide). • What is the resource we wish to optimize? For a given accuracy, use the smallest possible training set. • Examples are drawn from some (fixed) distribution D over X x Y (instance space x output space). Does the learner actually need to recover D during the learning process? • How does the learning process depend on the complexity of the family of learning functions (the concept class C)? How does one define the complexity of C? • When the goal is to learn the joint distribution D, the problem is computationally unwieldy because the joint distribution table is exponentially large. What assumptions can be made to simplify the task?
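A minimal sketch of the over-fitting point (assuming NumPy; the data-generating function and degrees are invented for illustration): a high-degree polynomial fits the training points almost perfectly yet does poorly on fresh samples from the same distribution.

```python
# Over-fitting sketch: low-degree vs. high-degree polynomial fit on noisy data.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)                 # process generating the data
x_train = rng.uniform(0, 1, 10)
y_train = f(x_train) + 0.2 * rng.standard_normal(10)
x_test = rng.uniform(0, 1, 200)
y_test = f(x_test) + 0.2 * rng.standard_normal(200)

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)   # fit on the training set only
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: train MSE = {mse(x_train, y_train):.3f}, "
          f"test MSE = {mse(x_test, y_test):.3f}")
```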

  6. Supervised vs. Unsupervised • Supervised learning models: learn a mapping h: X -> Y, where X is the instance (data) space and Y is the output space. Multiclass classification (Y = {1, ..., K}); K = 2 is normally of most interest. Regression (Y = R): predict the price of a used car given brand, year, mileage; the kinematics of a robot arm; navigate by determining a steering angle from image input. • Unsupervised learning models: find regularities in the input data, assuming there is some structure in the input space. Density estimation. Clustering (non-parametric density estimation): divide customers into groups which have similar attributes. Latent class models: extract topics from documents. Compression: represent the input space with fewer parameters; projection to lower-dimensional spaces. (A small sketch contrasting the two settings follows below.)
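A minimal NumPy sketch (not from the course; the data and algorithms are chosen only for illustration) contrasting the two settings on the same one-dimensional data: the supervised learner uses the labels, while the unsupervised learner looks for structure without them.

```python
# Supervised vs. unsupervised learning on the same 1-D data.
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 0.5, 50), rng.normal(2, 0.5, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])   # labels exist only in the supervised case

# Supervised: use the labels to learn a classifier (here, a midpoint threshold).
threshold = (x[y == 0].mean() + x[y == 1].mean()) / 2
print("supervised rule: predict 1 if x >", threshold)

# Unsupervised: ignore the labels and look for structure (simple 2-means clustering).
centers = np.array([x.min(), x.max()])
for _ in range(10):
    assign = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([x[assign == k].mean() for k in (0, 1)])
print("unsupervised cluster centers:", centers)
```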

  7. Notations • X is the instance space: the space from which observations are drawn (e.g., X = R^n). An input instance x in X is a single observation. • Y is the output space: the set of possible outcomes that can be associated with a measurement (e.g., Y = {0, 1} for binary classification, Y = R for regression). • An example is an instance-label pair (x, y). If |Y| = 2 one typically uses {0, 1} or {-1, 1}. We say that an example (x, y) is positive if y = 1 and otherwise we call it a negative example. • A training set Z consists of m instance-label pairs: Z = {(x_1, y_1), ..., (x_m, y_m)}. In some cases we refer to the training set without labels: {x_1, ..., x_m}.

  8. Notations • A concept (hypothesis) class C is a set (not necessarily finite) of functions of the form h: X -> Y. Each h in C is called a concept, hypothesis, or classifier. • Examples of concept classes: • Decision trees: when X = {0, 1}^n, any Boolean function can be described by a binary tree; thus C consists of decision trees over the n input variables (with Y = {0, 1}). • Conjunction learning: a conjunction is a special case of a Boolean formula. A literal is a variable or its negation, and a term is a conjunction of literals, i.e., l_1 ∧ l_2 ∧ ... ∧ l_k. A target function is a term which consists of a subset of the literals. In this case X = {0, 1}^n and Y = {0, 1}. • Separating hyperplanes: a concept h(x) is specified by a vector w and a scalar b such that h(x) = 1 if w·x + b >= 0 and h(x) = 0 otherwise (a sketch follows below).
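As an illustration of the separating-hyperplane concept class (a minimal sketch, not part of the slides), the classic perceptron update finds a vector w and scalar b consistent with linearly separable training data:

```python
# Perceptron: find (w, b) such that sign(w.x + b) matches the labels,
# assuming the training data are linearly separable. Labels are in {-1, +1}.
import numpy as np

def perceptron(X, y, epochs=100):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:     # misclassified example -> update
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:                  # all training examples classified correctly
            break
    return w, b

X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print("h(x) = sign(w.x + b) with w =", w, "and b =", b)
```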

  9. The Formal Learning Model: Probably Approximately Correct (PAC) • Distribution invariant: the learner does not need to estimate the joint distribution D over X x Y. The assumptions are that examples arrive i.i.d. and that D exists and is fixed. • The training sample complexity (size of the training set Z) depends only on the desired accuracy and confidence parameters; it does not depend on D. • Not all concept classes C are PAC-learnable, but some interesting classes are.

  10. PAC Model Definitions • The training instances x_1, ..., x_m are sampled randomly and independently (i.i.d.) according to some (unknown) distribution D; i.e., the sample is distributed according to the product distribution D^m. • Realizable case: a target concept c_t is known to lie inside C. In this case the training set is Z = {(x_i, c_t(x_i))}, i = 1, ..., m, and D is a distribution over X. • Unrealizable case: no target concept inside C is assumed to generate the labels; the training set is Z = {(x_i, y_i)}, i = 1, ..., m, and D is a distribution over X x Y. • Given a concept function h, err(h) is the probability that an instance x sampled according to D will be labeled incorrectly by h(x): err(h) = Pr[h(x) ≠ y], with y = c_t(x) in the realizable case.

  11. PAC Model Definitions • Note: in the realizable case err(c_t) = 0, because the labels in Z are generated by the target concept itself. • ε, given to the learner, specifies the desired accuracy, i.e., we require err(h) <= ε. • δ, given to the learner, specifies the desired confidence, i.e., we require the accuracy guarantee to hold with probability at least 1 - δ over the choice of the training set. • The learner is allowed to deviate occasionally from the desired accuracy, but only rarely so.

  12. PAC Model Definitions • We will say that an algorithm L learns C if, for every ε, δ > 0 and for every distribution D over X x Y, L generates a concept function h in C such that the probability that err(h) <= ε is at least 1 - δ.

  13. Formal Definition of PAC Learning • A learning algorithm L is a function from the set of all training sets to C with the following property: given any ε, δ in (0, 1), there is an integer m_0(ε, δ) such that if m >= m_0(ε, δ) then, for any probability distribution D on X x Y, if Z is a training set of size m drawn randomly according to the product distribution D^m, then with probability of at least 1 - δ the hypothesis h = L(Z) is such that err(h) <= ε. • We say that C is learnable (or PAC-learnable) if there is a learning algorithm for C.
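The same definition can be stated compactly in symbols (a restatement of the slide above, using the err notation from slide 10):

```latex
% PAC learnability of C by algorithm L, restated compactly
\forall \varepsilon, \delta \in (0,1)\;\; \exists\, m_0(\varepsilon,\delta)
\;\; \text{s.t.} \;\; \forall m \ge m_0(\varepsilon,\delta),\;
\forall D \text{ on } X \times Y:\qquad
\Pr_{Z \sim D^m}\!\big[\, \mathrm{err}\big(L(Z)\big) \le \varepsilon \,\big] \;\ge\; 1 - \delta .
```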

  14. Formal Definition of PAC Learning • Notes: m_0(ε, δ) does not depend on D, i.e., the PAC model is distribution invariant. • The class C determines the sample complexity m_0(ε, δ): for "simple" classes it would be small compared to more "complex" classes (a concrete bound for finite classes follows below).
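As a concrete instance of how C determines the sample complexity (a standard result, not stated on these slides): in the realizable case, any algorithm that outputs a hypothesis from a finite class C consistent with the training set is a PAC learner with

```latex
% Sample complexity of a consistent learner over a finite class C (realizable case)
m_0(\varepsilon, \delta) \;\le\;
\left\lceil \frac{1}{\varepsilon}\left( \ln |C| + \ln \frac{1}{\delta} \right) \right\rceil ,
```

so "simpler" (smaller) classes indeed require fewer examples.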

  15. Course Syllabus • 3 x PAC • 2 x Separating Hyperplanes: Support Vector Machines, Kernels, Linear Discriminant Analysis • 3 x Unsupervised Learning: Dimensionality Reduction (PCA), Density Estimation, Non-parametric Clustering (spectral methods) • 5 x Statistical Inference: Maximum Likelihood, Conditional Independence, Latent Class Models, Expectation-Maximization Algorithm, Graphical Models
