Machine Learning: Making Computer Science Scientific

  1. Machine Learning: Making Computer Science Scientific Thomas G. Dietterich Department of Computer Science Oregon State University Corvallis, Oregon 97331 http://www.cs.orst.edu/~tgd

  2. Acknowledgements • VLSI Wafer Testing • Tony Fountain • Robot Navigation • Didac Busquets • Carles Sierra • Ramon Lopez de Mantaras • NSF grants IIS-0083292 and ITR-085836

  3. Outline • Three scenarios where standard software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science

  4. Scenario 1: Reading Checks • Find and read the “courtesy amount” on checks [image: sample check]

  5. Possible Methods: • Method 1: Interview humans to find out what steps they follow in reading checks • Method 2: Collect examples of checks and the correct amounts. Train a machine learning system to recognize the amounts

  6. Scenario 2: VLSI Wafer Testing • Wafer test: Functional test of each die (chip) while on the wafer

  7. Which Chips (and how many) should be tested? • Tradeoff: • Test all chips on wafer? • Avoid cost of packaging bad chips • Incur cost of testing all chips • Test none of the chips on the wafer? • May package some bad chips • No cost of testing on wafer
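
A back-of-the-envelope version of this tradeoff, with every number (chip count, failure rate, per-chip costs) invented for illustration:

    # Expected per-wafer cost of the two extreme policies (toy numbers).
    p_bad, n_chips = 0.10, 200        # assumed fraction of bad chips, chips per wafer
    c_test, c_package = 0.50, 2.00    # assumed per-chip test and packaging costs

    # Test every chip: pay for all the tests, package only the good chips.
    cost_test_all = n_chips * c_test + n_chips * (1 - p_bad) * c_package
    # Test no chips: package everything, bad chips included.
    cost_test_none = n_chips * c_package

    print(f"test all:  {cost_test_all:.2f}")   # 460.00
    print(f"test none: {cost_test_none:.2f}")  # 400.00

With these numbers it is cheaper to skip the wafer test; raise p_bad or c_package and the balance flips. This is exactly the decision a learned model can make from data rather than guesswork.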

  8. Possible Methods • Method 1: Guess the right tradeoff point • Method 2: Learn a probabilistic model that captures the probability that each chip will be bad • Plug this model into a Bayesian decision making procedure to optimize expected profit

  9. Scenario 3: Allocating a Mobile Robot’s Camera • Binocular camera • No GPS

  10. Camera tradeoff • Mobile robot uses camera both for obstacle avoidance and landmark-based navigation • Tradeoff: • If camera is used only for navigation, robot collides with objects • If camera is used only for obstacle avoidance, robot gets lost

  11. Possible Methods • Method 1: Manually write a program to allocate the camera • Method 2: Experimentally learn a policy for switching between obstacle avoidance and landmark tracking

  12. Challenges for SE Methodology • Standard SE methods fail when… • System requirements are hard to collect • The system must resolve difficult tradeoffs

  13. (1) System requirements are hard to collect • There are no human experts • Cellular telephone fraud • Human experts are inarticulate • Handwriting recognition • The requirements are changing rapidly • Computer intrusion detection • Each user has different requirements • E-mail filtering

  14. (2) The system must resolve difficult tradeoffs • VLSI Wafer testing • Tradeoff point depends on probability of bad chips, relative costs of testing versus packaging • Camera Allocation for Mobile Robot • Tradeoff depends on probability of obstacles, number and quality of landmarks

  15. Machine Learning: Replacing guesswork with data • In all of these cases, the standard SE methodology requires engineers to make guesses • Guessing how to do character recognition • Guessing the tradeoff point for wafer test • Guessing the tradeoff for camera allocation • Machine Learning provides a way of making these decisions based on data

  16. Outline • Three scenarios where software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science

  17. Basic Machine Learning Methods • Supervised Learning • Density Estimation • Reinforcement Learning

  18. Supervised Learning [Figure: training examples of handwritten digits (1, 0, 6, 3, 8) feed a Learning Algorithm, which produces a Classifier; the Classifier then labels a new example as an 8]

  19. AT&T/NCR Check Reading System • The recognition transformer is a neural network trained on 500,000 character examples • The entire system is trained with whole checks as input and dollar amounts as output • LeCun, Bottou, Bengio & Haffner (1998), “Gradient-Based Learning Applied to Document Recognition”

  20. Check Reader Performance • 82% of machine-printed checks correctly recognized • 1% of checks incorrectly recognized • 17% “rejected” – check is presented to a person for manual reading • Fielded by NCR in June 1996; reads millions of checks per month

  21. Supervised Learning Summary • Desired classifier is a function y = f(x) • Training examples are desired input-output pairs (xᵢ, yᵢ)
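
A minimal sketch of this recipe using scikit-learn’s small digits dataset; it stands in for the general setup and is not the AT&T/NCR system:

    # Learn a classifier y = f(x) from labeled (x_i, y_i) pairs, then apply it.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)   # 8x8 digit images with correct labels
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    f = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    f.fit(X_train, y_train)               # training examples in, classifier out
    print("accuracy on new examples:", f.score(X_test, y_test))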

  22. Density Estimation [Figure: training examples flow into a Learning Algorithm, which produces a Density Estimator; applied to a partially-tested wafer, it outputs, e.g., P(chipᵢ is bad) = 0.42]

  23. On-Wafer Testing System [Figure: naïve Bayes model with a wafer-level component node W and chip nodes C1, C2, C3, …, C209] • Trained density estimator on 600 wafers from a mature product (HP; Corvallis, OR) • Probability model is a “naïve Bayes” mixture model with four components (trained with EM)
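
A sketch of the kind of model the slide describes: a four-component mixture whose components treat chips as independent Bernoullis, fit with EM. The data below is synthetic, and the deployed model surely differs in detail:

    import numpy as np

    def em_bernoulli_mixture(X, K=4, iters=50, seed=0):
        """Fit a K-component mixture of independent Bernoullis to binary
        wafer data X (rows = wafers, columns = chips, 1 = bad chip)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        pi = np.full(K, 1.0 / K)                      # mixing weights
        theta = rng.uniform(0.25, 0.75, size=(K, d))  # P(chip j bad | component k)
        for _ in range(iters):
            # E-step: responsibilities, computed in log space for stability.
            log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
            log_post = np.log(pi) + log_lik
            log_post -= log_post.max(axis=1, keepdims=True)
            R = np.exp(log_post)
            R /= R.sum(axis=1, keepdims=True)
            # M-step: re-estimate parameters, lightly smoothed to avoid log(0).
            pi = R.mean(axis=0)
            theta = (R.T @ X + 1.0) / (R.sum(axis=0)[:, None] + 2.0)
        return pi, theta

    # 600 synthetic wafers of 209 chips, mimicking the training-set size above.
    X = (np.random.default_rng(1).random((600, 209)) < 0.1).astype(float)
    pi, theta = em_bernoulli_mixture(X)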

  24. One-Step Value of Information • Choose the larger of • Expected profit if we predict remaining chips, package, and re-test • Expected profit if we test chip Ci, then predict remaining chips, package, and re-test [for all Ci not yet tested]
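
A sketch of this one-step value-of-information rule. Here p_bad holds the current bad-chip probabilities for the untested chips, and posterior(i, outcome) is a hypothetical helper that returns updated probabilities for the remaining chips after chip Ci tests good or bad; in the real system those numbers would come from the learned density model:

    def profit_if_stop(p_bad, v_good, c_pkg):
        # Package each remaining chip only if its expected value is positive.
        return sum(max(0.0, (1 - p) * v_good - c_pkg) for p in p_bad)

    def one_step_voi(p_bad, v_good, c_pkg, c_test, posterior):
        best, best_chip = profit_if_stop(p_bad, v_good, c_pkg), None
        for i, p in enumerate(p_bad):
            # Expected profit if we test chip i now, then stop and package.
            ev = (-c_test
                  + (1 - p) * (v_good - c_pkg
                               + profit_if_stop(posterior(i, "good"), v_good, c_pkg))
                  + p * profit_if_stop(posterior(i, "bad"), v_good, c_pkg))
            if ev > best:
                best, best_chip = ev, i
        return best_chip, best    # best_chip is None: stop testing, package now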

  25. On-Wafer Chip Test Results [Chart: test results showing a 3.8% increase in profit]

  26. Density Estimation Summary • Desired output is a joint probability distribution P(C1, C2, …, C203) • Training examples are points X = (C1, C2, …, C203) sampled from this distribution

  27. Reinforcement Learning [Figure: the agent observes state s and reward r from the Environment and responds with action a] • Agent’s goal: choose actions to maximize total reward • The action-selection rule is called a “policy”: a = π(s)

  28. Reinforcement Learning for Robot Navigation • Learning from rewards and punishments in the environment • Give reward for reaching goal • Give punishment for getting lost • Give punishment for collisions
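
The same scheme in code, with the magnitudes as pure assumptions (the slide specifies only the signs):

    # Rewards and punishments for the navigation task (values are assumptions).
    def reward(event):
        return {"reached_goal": +100.0,   # reward for reaching the goal
                "got_lost":     -100.0,   # punishment for getting lost
                "collision":     -10.0,   # punishment for collisions
               }.get(event, 0.0)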

  29. Experimental Results [Chart: percentage of trials in which the robot reaches the goal] • Busquets, Lopez de Mantaras, Sierra, Dietterich (2002)

  30. Reinforcement Learning Summary • Desired output is an action-selection policy π • Training examples are ⟨s, a, r, s′⟩ tuples collected by the agent interacting with the environment
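
One standard way to turn such tuples into a policy is tabular Q-learning; this generic sketch illustrates the idea and is not necessarily the algorithm used in the robot experiments:

    import random
    from collections import defaultdict

    Q = defaultdict(float)              # Q[(s, a)]: estimated long-run reward
    alpha, gamma, eps = 0.1, 0.95, 0.1  # learning rate, discount, exploration

    def update(s, a, r, s_next, actions):
        # Move Q(s, a) toward the observed reward plus discounted future value.
        target = r + gamma * max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    def policy(s, actions):
        # a = pi(s): greedy in Q, with occasional exploration while learning.
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])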

  31. Outline • Three scenarios where software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science

  32. Fundamental Issues in Machine Learning • Incorporating Prior Knowledge • Incorporating Learned Structures into Larger Systems • Making Reinforcement Learning Practical • Triple Tradeoff: accuracy, sample size, hypothesis complexity

  33. Incorporating Prior Knowledge • How can we incorporate our prior knowledge into the learning algorithm? • Difficult for decision trees, neural networks, support-vector machines, etc. • Mismatch between form of our knowledge and the way the algorithms work • Easier for Bayesian networks • Express knowledge as constraints on the network

  34. Incorporating Learned Structures into Larger Systems • Success story: Digit recognizer incorporated into check reader • Challenges: • Larger system may make several coordinated decisions, but learning system treated each decision as independent • Larger system may have complex cost function: Errors in thousands place versus the cents place: $7,236.07
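
For example, the check reader’s cost function could weight a misread by its dollar impact instead of counting all character errors equally; the following is a sketch, not the fielded system’s actual cost model:

    def misread_cost(true_amount, read_amount):
        # An error in the thousands place costs 10,000x more than one in the
        # cents place, because cost scales with the size of the misread.
        return abs(true_amount - read_amount)

    print(f"{misread_cost(7236.07, 8236.07):.2f}")  # thousands digit wrong: 1000.00
    print(f"{misread_cost(7236.07, 7236.17):.2f}")  # cents digit wrong: 0.10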

  35. Making Reinforcement Learning Practical • Current reinforcement learning methods do not scale well to large problems • Need robust reinforcement learning methodologies

  36. The Triple Tradeoff • Fundamental relationship between • amount of training data • size and complexity of hypothesis space • accuracy of the learned hypothesis • Explains many phenomena observed in machine learning systems

  37. Learning Algorithms • Set of data points • Class H of hypotheses • Optimization problem: find the hypothesis h in H that best fits the data [Figure: the learning algorithm maps the training data to a hypothesis h within the hypothesis space]

  38. Triple Tradeoff: Amount of Data – Hypothesis Complexity – Accuracy [Plot: accuracy vs. hypothesis space complexity, one curve for each sample size N = 10, 100, 1000]

  39. Triple Tradeoff (2) [Plot: accuracy vs. number of training examples N, one curve for each hypothesis space complexity H1, H2, H3]

  40. Intuition • With only a small amount of data, we can only discriminate between a small number of different hypotheses • As we get more data, we have more evidence, so we can consider more alternative hypotheses • Complex hypotheses give better fit to the data
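
The tradeoff is easy to reproduce in a small synthetic experiment, with polynomial degree standing in for hypothesis complexity (the target function and noise level are arbitrary choices):

    import numpy as np
    from numpy.polynomial import polynomial as P

    rng = np.random.default_rng(0)

    def sample(n):                   # noisy data from a fixed target function
        x = rng.uniform(-1, 1, n)
        return x, np.sin(3 * x) + rng.normal(scale=0.2, size=n)

    x_test, y_test = sample(2000)
    for n in (10, 100, 1000):
        x, y = sample(n)
        for degree in (1, 3, 9):     # increasing hypothesis space complexity
            coefs = P.polyfit(x, y, degree)
            mse = np.mean((P.polyval(x_test, coefs) - y_test) ** 2)
            print(f"N={n:5d}  degree={degree}  test MSE={mse:.3f}")

In a run like this, degree 9 does badly at N = 10 (overfitting) but best at N = 1000, while degree 1 underfits at every sample size.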

  41. Fixed versus Variable-Sized Hypothesis Spaces • Fixed size • Ordinary linear regression • Bayes net with fixed structure • Neural networks • Variable size • Decision trees • Bayes nets with variable structure • Support vector machines

  42. Corollary 1: Fixed H will underfit [Plot: accuracy vs. number of training examples N; the fixed space H1 levels off below the larger space H2, with the gap marked “underfit”]

  43. Corollary 2: Variable-sized H will overfit [Plot: accuracy vs. hypothesis space complexity at N = 100; accuracy falls off at high complexity, the region marked “overfit”]

  44. Ideal Learning Algorithm: Adapt complexity to data [Plot: accuracy vs. hypothesis space complexity for N = 10, 100, 1000; the ideal algorithm picks the complexity that maximizes accuracy for each N]

  45. Adapting Hypothesis Complexity to Data Complexity • Find hypothesis h to minimize error(h) + λ·complexity(h) • Many methods for adjusting λ • Cross-validation • MDL
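
A sketch of this scheme using ridge regression, where λ weights a squared-coefficient complexity penalty and cross-validation picks its value (the data is synthetic; scikit-learn’s RidgeCV is one off-the-shelf way to run the search):

    import numpy as np
    from sklearn.linear_model import RidgeCV

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

    # Minimize error(h) + lambda * complexity(h); CV over candidate lambdas.
    model = RidgeCV(alphas=np.logspace(-3, 3, 13))
    model.fit(X, y)
    print("chosen lambda:", model.alpha_)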

  46. Outline • Three scenarios where software engineering methods fail • Machine learning methods applied to these scenarios • Fundamental questions in machine learning • Statistical thinking in computer science

  47. The Data Explosion • NASA data: 284 terabytes (as of August 1999) • Earth Observing System: 194 GB/day • Landsat 7: 150 GB/day • Hubble Space Telescope: 0.6 GB/day http://spsosun.gsfc.nasa.gov/eosinfo/EOSDIS_Site/index.html

  48. The Data Explosion (2) • Google indexes 2,073,418,204 web pages • US Year 2000 Census: 62 terabytes of scanned images • Walmart data warehouse: 7 (500?) terabytes • Missouri Botanical Garden TROPICOS plant image database: 700 GB

  49. Old Computer Science Conception of Data [Diagram: data is stored and retrieved]

  50. New Computer Science Conception of Data [Diagram: data is stored; models are built from it; problems are solved using the models, turning problems into solutions]
