
Machine Learning

Presentation Transcript


  1. Machine Learning

  2. A Little Introduction Only • Introduction to Machine Learning • Decision Trees • Overfitting • Artificial Neural Nets

  3. Why Machine Learning (1) • Growing flood of online data • Budding industry • Computational power is available • Progress in algorithms and theory

  4. Why Machine Learning (2) • Data mining: using historical data to improve decisions • medical records ⇒ medical knowledge • log data to model users • Software applications we can’t program by hand • autonomous driving • speech recognition • Self-customizing programs • Newsreader that learns user interests

  5. Some success stories • Data Mining, Learning on the Web • Analysis of astronomical data • Human Speech Recognition • Handwriting recognition • Fraudulent Use of Credit Cards • Drive Autonomous Vehicles • Predict Stock Rates • Intelligent Elevator Control • World champion Backgammon • Robot Soccer • DNA Classification

  6. Problems Too Difficult to Program by Hand ALVINN drives 70 mph on highways

  7. Credit Risk Analysis If Other-Delinquent-Accounts > 2, and Number-Delinquent-Billing-Cycles > 1 Then Profitable-Customer? = No [Deny Credit Card application] If Other-Delinquent-Accounts = 0, and (Income > $30k) OR (Years-of-Credit > 3) Then Profitable-Customer? = Yes [Accept Credit Card application] Machine Learning, T. Mitchell, McGraw Hill, 1997

  8. Typical Data Mining Task Given: • 9714 patient records, each describing a pregnancy and birth • Each patient record contains 215 features Learn to predict: • Classes of future patients at high risk for Emergency Cesarean Section Machine Learning, T. Mitchell, McGraw Hill, 1997

  9. Data Mining Result IF No previous vaginal delivery, and Abnormal 2nd Trimester Ultrasound, and Malpresentation at admission THEN Probability of Emergency C-Section is 0.6 Over training data: 26/41 = .63, Over test data: 12/20 = .60 Machine Learning, T. Mitchell, McGraw Hill, 1997

  10. How does an Agent learn? [Slide diagram: knowledge-based inductive learning — prior knowledge B and observations (experience E) give rise to hypotheses H, which in turn yield predictions]

  11. Machine Learning Techniques • Decision tree learning • Artificial neural networks • Naive Bayes • Bayesian Net structures • Instance-based learning • Reinforcement learning • Genetic algorithms • Support vector machines • Explanation Based Learning • Inductive logic programming

  12. What is the Learning Problem? • Improve over Task T • with respect to performance measure P • based on experience E Learning = Improving with experience at some task

  13. The Game of Checkers

  14. Learning to Play Checkers • T: Play checkers • P: Percent of games won in world tournament • E: Games played against self • What exactly should be learned? • How shall it be represented? • What specific algorithm to learn it?

  15. A Representation for the Learned Function V’(b) Target function: V: Board → ℝ Target function representation: • x1: number of black pieces on board b • x2: number of red pieces on board b • x3: number of black kings on board b • x4: number of red kings on board b • x5: number of red pieces threatened by black (i.e., which can be taken on black’s next turn) • x6: number of black pieces threatened by red V’(b) = w0 + w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6
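
A minimal Python sketch of this linear evaluation function, assuming the board has already been summarized into the six numeric features x1..x6 listed above (the feature extraction itself is not shown on the slide, and the weight values below are purely illustrative):

    # Evaluation function V'(b) = w0 + w1*x1 + ... + w6*x6,
    # applied to a board b already reduced to its six features.
    def v_hat(features, weights):
        """features: [x1, ..., x6]; weights: [w0, w1, ..., w6]."""
        value = weights[0]                       # w0: constant term
        for w, x in zip(weights[1:], features):  # w1*x1 + ... + w6*x6
            value += w * x
        return value

    # Hypothetical board: 12 black pieces, 11 red pieces, 1 black king,
    # 0 red kings, 2 red pieces threatened, 1 black piece threatened.
    features = [12, 11, 1, 0, 2, 1]
    weights = [0.5, 1.0, -1.0, 3.0, -3.0, 0.5, -0.5]  # illustrative values only
    print(v_hat(features, weights))                   # -> 5.0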

  16. Function Approximation Algorithm* • V(b): the true target function • V’(b): the learned function • Vtrain(b): the training value • (b, Vtrain(b)) training example One rule for estimating training values: • Vtrain(b) ← V’(Successor(b)) for intermediate b

  17. Contd: Choose Weight Tuning Rule* LMS weight update rule: Do repeatedly: • Select a training example b at random • Compute error(b) with current weights: error(b) = Vtrain(b) – V’(b) • For each board feature xi, update weight wi: wi ← wi + c * xi * error(b) c is a small constant to moderate the rate of learning
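
The update rule translates almost directly into code. The following is a sketch under the assumption that training examples are pairs (features, Vtrain) whose training values were estimated as on the previous slide, e.g. via V’(Successor(b)); the names lms_update and v_hat and the treatment of w0 (via an implicit constant feature x0 = 1) are assumptions, not part of the slides:

    import random

    def v_hat(features, weights):
        return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

    def lms_update(examples, weights, c=0.01, steps=1000):
        """examples: list of ([x1..x6], v_train) pairs; c: small learning-rate constant."""
        for _ in range(steps):
            features, v_train = random.choice(examples)   # select a training example at random
            error = v_train - v_hat(features, weights)    # error(b) = Vtrain(b) - V'(b)
            weights[0] += c * error                       # constant term, assuming x0 = 1
            for i, x in enumerate(features, start=1):
                weights[i] += c * x * error               # wi <- wi + c * xi * error(b)
        return weights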

  18. ...A.L. Samuel

  19. Design Choices for Checker Learning

  20. Overview Introduction to Machine Learning • Inductive Learning • Decision Trees • Ensemble Learning • Overfitting • Artificial Neural Nets

  21. Supervised Inductive Learning (1) Why is learning difficult? • inductive learning generalizes from specific examples; the result cannot be proven true, it can only be proven false • it is not easy to tell whether hypothesis h is a good approximation of a target function f • trade-off between the complexity of the hypothesis and fitting the training data

  22. Supervised Inductive Learning (2) To generalize beyond the specific examples, one needs constraints or biases on what h is best. For that purpose, one has to specify • the overall class of candidate hypotheses ⇒ restricted hypothesis space bias • a metric for comparing candidate hypotheses to determine whether one is better than another ⇒ preference bias

  23. Supervised Inductive Learning (3) Having fixed the bias, learning can be considered as search in the hypothesis space, guided by the preference bias.

  24. Decision Tree Learning Quinlan 86, Feigenbaum 61 Goal predicate: PlayTennis Hypothesis space: Preference bias: temperature = hot & windy = true & humidity = normal & outlook = sunny ⇒ PlayTennis = ?

  25. Illustrating Example (Russell & Norvig) The problem: wait for a table in a restaurant?

  26. Illustrating Example: Training Data

  27. A Decision Tree for WillWait (SR)

  28. Path in the Decision Tree (worked on the blackboard)

  29. General Approach • let A1, A2, ..., and An be discrete attributes, i.e. each attribute has finitely many values • let B be another discrete attribute, the goal attribute Learning goal: learn a function f: A1 x A2 x ... x An → B Examples: elements from A1 x A2 x ... x An x B

  30. General Approach Restricted hypothesis space bias: the collection of all decision trees over the attributes A1, A2, ..., An, and B forms the set of possible candidate hypotheses Preference bias: prefer small trees consistent with the training examples

  31. Decision Trees: definition (for the record) • A decision tree over the attributes A1, A2, ..., An, and B is a tree in which • each non-leaf node is labelled with one of the attributes A1, A2, ..., and An • each leaf node is labelled with one of the possible values for the goal attribute B • a non-leaf node with the label Ai has as many outgoing arcs as there are possible values for the attribute Ai; each arc is labelled with one of the possible values for Ai

  32. Decision Trees: applying a tree (for the record) Let x be an element from A1 x A2 x ... x An and let T be a decision tree. The element x is processed by the tree T starting at the root and following the appropriate arc until a leaf is reached. Moreover, x receives the value that is assigned to the leaf reached.
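
A small Python sketch of this procedure, representing a decision tree as nested dictionaries (a non-leaf node stores its attribute and one branch per attribute value; a leaf is simply a value of the goal attribute B); the WillWait-style attribute names below are made up for illustration:

    # Process an element x by the tree T: start at the root and follow
    # the arc matching x's value for the node's attribute until a leaf.
    def classify(tree, example):
        while isinstance(tree, dict):
            attribute = tree["attribute"]
            tree = tree["branches"][example[attribute]]
        return tree   # the leaf's label is the value assigned to x

    tree = {"attribute": "patrons",
            "branches": {"None": "No",
                         "Some": "Yes",
                         "Full": {"attribute": "hungry",
                                  "branches": {"yes": "Yes", "no": "No"}}}}
    print(classify(tree, {"patrons": "Full", "hungry": "yes"}))   # -> "Yes"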

  33. Expressiveness of Decision Trees Any boolean function can be written as a decision tree, e.g. the function given by this truth table:
  A1 A2 | B
   0  0 | 0
   0  1 | 1
   1  0 | 0
   1  1 | 1
  [Slide diagram: the corresponding tree tests A1 at the root, then A2 on each branch, with leaf labels 0/1 as in the table]

  34. Decision Trees • fully expressive within the class of propositional languages • in some cases, decision trees are not appropriate • sometimes exponentially large decision trees (e.g. the parity function, which returns 1 iff an even number of inputs are 1) • replicated subtree problem, e.g. when coding the following two rules in a tree: “if A1 and A2 then B” and “if A3 and A4 then B”

  35. Decision Trees Finding a smallest decision tree that is consistent with a set of presented examples is an NP-hard problem. Instead of constructing a smallest decision tree, the focus is on the construction of a pretty small one ⇒ greedy algorithm. Here “smallest” means minimal in the overall number of nodes.

  36. Inducing Decision Trees Algorithm (for the record)
  function DECISION-TREE-LEARNING(examples, attribs, default) returns a decision tree
    inputs: examples, set of examples
            attribs, set of attributes
            default, default value for the goal predicate
    if examples is empty then return default
    else if all examples have the same classification then return the classification
    else if attribs is empty then return MAJORITY-VALUE(examples)
    else
      best ← CHOOSE-ATTRIBUTE(attribs, examples)
      tree ← a new decision tree with root test best
      m ← MAJORITY-VALUE(examples)
      for each value vi of best do
        examplesi ← {elements of examples with best = vi}
        subtree ← DECISION-TREE-LEARNING(examplesi, attribs – best, m)
        add a branch to tree with label vi and subtree subtree
      return tree
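
The following is one possible Python rendering of this algorithm, not the authors' code: examples are dictionaries that include a "goal" key for the goal attribute, attribs is a set of attribute names, and choose_attribute is passed in (e.g. an information-gain heuristic based on the entropy measure of slide 38):

    from collections import Counter

    def majority_value(examples):
        """Most common value of the goal attribute among the examples."""
        return Counter(e["goal"] for e in examples).most_common(1)[0][0]

    def decision_tree_learning(examples, attribs, default, choose_attribute):
        if not examples:
            return default
        classifications = {e["goal"] for e in examples}
        if len(classifications) == 1:                 # all examples agree
            return classifications.pop()
        if not attribs:
            return majority_value(examples)
        best = choose_attribute(attribs, examples)    # CHOOSE-ATTRIBUTE
        tree = {"attribute": best, "branches": {}}    # new tree with root test best
        m = majority_value(examples)
        for v in {e[best] for e in examples}:         # each value vi of best seen in the data
            examples_i = [e for e in examples if e[best] == v]
            subtree = decision_tree_learning(examples_i, attribs - {best}, m, choose_attribute)
            tree["branches"][v] = subtree             # add branch with label vi and subtree
        return tree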

  37. Training Examples T. Mitchell, 1997

  38. Entropy (n = 2 classes) • S is a sample of training examples • p+ is the proportion of positive examples in S • p– is the proportion of negative examples in S • Entropy measures the impurity of S: Entropy(S) ≡ –p+ log2 p+ – p– log2 p–
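
A short sketch of the entropy measure and of the information-gain criterion commonly built on top of it (the slide defines only the entropy; using information gain as the CHOOSE-ATTRIBUTE metric is the standard companion and is added here as an assumption). Examples are again dictionaries with a "goal" key:

    import math
    from collections import Counter

    def entropy(examples, goal="goal"):
        """Entropy(S) = -p+ log2 p+ - p- log2 p-  (terms with proportion 0 are dropped)."""
        n = len(examples)
        proportions = (c / n for c in Counter(e[goal] for e in examples).values())
        return -sum(p * math.log2(p) for p in proportions if p > 0)

    def information_gain(examples, attribute, goal="goal"):
        """Expected reduction in entropy obtained by splitting on the attribute."""
        n = len(examples)
        remainder = 0.0
        for v in {e[attribute] for e in examples}:
            subset = [e for e in examples if e[attribute] == v]
            remainder += len(subset) / n * entropy(subset, goal)
        return entropy(examples, goal) - remainder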

  39. Example WillWait (do it yourself) the problem of whether to wait for a table in a restaurant

  40. WillWait (do it yourself) Which attribute to choose?

  41. Learned Tree WillWait

  42. Assessing Decision Trees Assessing the performance of a learning algorithm: a learning algorithm has done a good job if its final hypothesis predicts the value of the goal attribute of unseen examples correctly. General strategy (cross-validation): 1. collect a large set of examples 2. divide it into two disjoint sets: the training set and the test set 3. apply the learning algorithm to the training set, generating a hypothesis h 4. measure the quality of h applied to the test set 5. repeat steps 1 to 4 for different sizes of training sets and different randomly selected training sets of each size
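
A compact sketch of this strategy, under the assumption that learn(train) returns a hypothesis h that can be called on an example, and that examples are dictionaries with a "goal" key (the function name and the 70/30 split are illustrative):

    import random

    def assess(learn, examples, train_fraction=0.7, trials=10):
        """Repeatedly split into disjoint training/test sets, learn h on the
        training set, and measure how often h predicts the goal attribute
        of the unseen test examples correctly."""
        accuracies = []
        for _ in range(trials):
            shuffled = examples[:]
            random.shuffle(shuffled)
            cut = int(train_fraction * len(shuffled))
            train, test = shuffled[:cut], shuffled[cut:]
            h = learn(train)                                 # e.g. a decision-tree learner
            correct = sum(h(e) == e["goal"] for e in test)
            accuracies.append(correct / len(test))
        return sum(accuracies) / len(accuracies)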
