
A Brief Introduction to Adaboost

Presentation Transcript


  1. A Brief Introduction to Adaboost Hongbo Deng, 6 Feb 2007. Some of the slides are borrowed from Derek Hoiem & Jan Šochman.

  2. Outline • Background • Adaboost Algorithm • Theory/Interpretations

  3. What’s So Good About Adaboost • Can be used with many different classifiers • Improves classification accuracy • Commonly used in many areas • Simple to implement • Not prone to overfitting

  4. A Brief History • Resampling for estimating a statistic: Bootstrapping • Resampling for classifier design: Bagging, Boosting (Schapire 1989), Adaboost (Schapire 1995)

  5. Bootstrap Estimation • Repeatedly draw n samples from D, with replacement • For each set of samples, estimate the statistic • The bootstrap estimate is the mean of the individual estimates • Used to estimate a statistic (parameter) and its variance
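As a concrete illustration of the procedure on this slide, here is a minimal NumPy sketch (not part of the original slides; the function name, defaults, and example data are illustrative): it repeatedly resamples with replacement, applies the statistic to each resample, and reports the mean and variance of the individual estimates.

```python
import numpy as np

def bootstrap_estimate(data, statistic, n_boot=1000, rng=None):
    """Bootstrap estimate of a statistic and of its variance.

    Repeatedly draws n samples with replacement from `data`, applies
    `statistic` to each resample, and returns the mean and variance of
    the individual estimates.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    n = len(data)
    estimates = np.array([
        statistic(data[rng.integers(0, n, size=n)])  # one bootstrap resample of size n
        for _ in range(n_boot)
    ])
    return estimates.mean(), estimates.var(ddof=1)

# Example: bootstrap estimate of the median and of its variance.
sample = np.array([2.1, 3.5, 1.7, 4.0, 2.9, 3.3])
median_hat, median_var = bootstrap_estimate(sample, np.median, n_boot=2000)
```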

  6. Bagging - Aggregate Bootstrapping • For i = 1 .. M • Draw n' < n samples from D with replacement • Learn classifier Ci • Final classifier is a vote of C1 .. CM • Increases classifier stability / reduces variance
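A minimal sketch of the bagging loop just described, assuming scikit-learn decision trees as the component classifiers and integer class labels; the function names and the 0.8 sampling fraction are illustrative, not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stands in for any base classifier

def bagging_fit(X, y, M=10, sample_frac=0.8, rng=None):
    """Train M classifiers C1..CM, each on a bootstrap sample of size n' < n."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    n_prime = int(sample_frac * n)
    classifiers = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n_prime)  # draw n' samples with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """Final classifier is a majority vote of C1..CM (labels assumed to be 0..K-1)."""
    votes = np.stack([c.predict(X) for c in classifiers]).astype(int)  # shape (M, n)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```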

  7. Boosting (Schapire 1989) • Consider creating three component classifiers for a two-category problem through boosting • Randomly select n1 < n samples from D without replacement to obtain D1 • Train weak learner C1 • Select n2 < n samples from D, with half of the samples misclassified by C1, to obtain D2 • Train weak learner C2 • Select all remaining samples from D that C1 and C2 disagree on • Train weak learner C3 • Final classifier is a vote of the three weak learners
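The three-classifier construction above can be sketched as follows. This is illustrative only: scikit-learn stumps stand in for the weak learners, labels are assumed to be in {-1, +1}, and edge cases (for example an empty D2 or D3) are not handled.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_three(X, y, n1, rng=None):
    """Schapire-style boosting with three component classifiers; y in {-1, +1}."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)

    # D1: n1 < n samples drawn without replacement; train C1.
    idx1 = rng.choice(n, size=n1, replace=False)
    c1 = DecisionTreeClassifier(max_depth=1).fit(X[idx1], y[idx1])

    # D2: half misclassified by C1, half classified correctly; train C2.
    p1 = c1.predict(X)
    wrong, right = np.where(p1 != y)[0], np.where(p1 == y)[0]
    k = min(len(wrong), len(right))
    idx2 = np.concatenate([rng.choice(wrong, size=k, replace=False),
                           rng.choice(right, size=k, replace=False)])
    c2 = DecisionTreeClassifier(max_depth=1).fit(X[idx2], y[idx2])

    # D3: all samples on which C1 and C2 disagree; train C3.
    idx3 = np.where(c1.predict(X) != c2.predict(X))[0]
    c3 = DecisionTreeClassifier(max_depth=1).fit(X[idx3], y[idx3])

    # Final classifier: majority vote of the three weak learners.
    return lambda Xq: np.sign(c1.predict(Xq) + c2.predict(Xq) + c3.predict(Xq))
```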

  8. Adaboost - Adaptive Boosting • Instead of resampling, uses training-set re-weighting • Each training sample has a weight that determines its probability of being selected for a training set • AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination of simple "weak" classifiers • Final classification is based on a weighted vote of the weak classifiers

  9. Adaboost Terminology • ht(x) … “weak” or basis classifier (Classifier = Learner = Hypothesis) • … “strong” or final classifier • Weak Classifier: < 50% error over any distribution • Strong Classifier: thresholded linear combination of weak classifier outputs
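The formula for the final classifier is an image that did not survive in the transcript; in the standard notation it is the thresholded (sign of the) weighted sum of the weak classifiers:

\[
H(x) \;=\; \operatorname{sign}\!\Big(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Big),
\qquad h_t(x) \in \{-1, +1\}.
\]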

  10. Discrete Adaboost Algorithm Each training sample has a weight, which determines the probability of being selected for training the component classifier
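The algorithm itself appears on the following slides as images. As a complement, here is a minimal runnable sketch of discrete AdaBoost (not the slides' own code; decision stumps from scikit-learn stand in for the generic component classifier, and labels are assumed to be in {-1, +1}).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """Discrete AdaBoost. X: (n, d) features, y: (n,) labels in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)              # initial sample weights D_1(i) = 1/n
    learners, alphas = [], []
    for t in range(T):
        # Train a weak classifier on the weighted training set.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        eps = np.sum(w[pred != y])       # weighted training error of h_t
        if eps >= 0.5:                   # weak-learning assumption violated; stop
            break
        eps = max(eps, 1e-10)            # guard against a perfect weak classifier
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # Re-weight: increase weight on misclassified samples, decrease on correct ones.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()                     # normalize so the weights form a distribution
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    agg = sum(a * h.predict(X) for h, a in zip(learners, alphas))
    return np.sign(agg)
```

Passing the weights via sample_weight is the re-weighting formulation; an equivalent variant resamples the training set according to the weights, which is the "probability of being selected" wording used on the slide.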

  11. Find the Weak Classifier

  12. Find the Weak Classifier

  13. The algorithm core
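The equation on this slide is likewise an image; in standard notation, the core of each round is the weighted training error of the weak classifier and the weight it receives in the final vote:

\[
\varepsilon_t \;=\; \sum_{i=1}^{n} D_t(i)\,\mathbf{1}\big[h_t(x_i) \ne y_i\big],
\qquad
\alpha_t \;=\; \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t},
\]

so that \(\alpha_t > 0\) exactly when the weak classifier does better than chance (\(\varepsilon_t < 1/2\)), and grows as \(\varepsilon_t\) shrinks.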

  14. Reweighting: the two cases y · h(x) = +1 (example classified correctly) and y · h(x) = −1 (example misclassified)
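The update that distinguishes these two cases (shown as an image in the original) is, in the usual notation,

\[
D_{t+1}(i) \;=\; \frac{D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t},
\]

where \(Z_t\) normalizes \(D_{t+1}\) to a distribution: correctly classified examples (\(y_i h_t(x_i) = +1\)) have their weight multiplied by \(e^{-\alpha_t} < 1\), misclassified examples (\(y_i h_t(x_i) = -1\)) by \(e^{\alpha_t} > 1\).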

  15. Reweighting In this way, AdaBoost focuses on the informative or "difficult" examples.

  16. Reweighting In this way, AdaBoost focuses on the informative or "difficult" examples.

  17. Algorithm recapitulation t = 1

  18. Algorithm recapitulation

  19. Algorithm recapitulation

  20. Algorithm recapitulation

  21. Algorithm recapitulation

  22. Algorithm recapitulation

  23. Algorithm recapitulation

  24. Algorithm recapitulation

  25. Pros and cons of AdaBoost Advantages • Very simple to implement • Does feature selection, resulting in a relatively simple classifier • Fairly good generalization Disadvantages • Suboptimal solution • Sensitive to noisy data and outliers

  26. References • Duda, Hart, et al. – Pattern Classification • Freund – "An Adaptive Version of the Boost by Majority Algorithm" • Freund – "Experiments with a New Boosting Algorithm" • Freund, Schapire – "A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting" • Friedman, Hastie, et al. – "Additive Logistic Regression: A Statistical View of Boosting" • Jin, Liu, et al. (CMU) – "A New Boosting Algorithm Using Input-Dependent Regularizer" • Li, Zhang, et al. – "FloatBoost Learning for Classification" • Opitz, Maclin – "Popular Ensemble Methods: An Empirical Study" • Rätsch, Warmuth – "Efficient Margin Maximization with Boosting" • Schapire, Freund, et al. – "Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods" • Schapire, Singer – "Improved Boosting Algorithms Using Confidence-rated Predictions" • Schapire – "The Boosting Approach to Machine Learning: An Overview" • Zhang, Li, et al. – "Multi-view Face Detection with FloatBoost"

  27. Appendix • Bound on training error • Adaboost Variants

  28. Bound on Training Error (Schapire)
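The bound itself is not in the transcript; the usual statement (Freund & Schapire; Schapire & Singer) is that the training error of the strong classifier is bounded by the product of the per-round normalizers:

\[
\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\big[H(x_i)\ne y_i\big]
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\varepsilon_t\,(1-\varepsilon_t)}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big),
\qquad \gamma_t \;=\; \tfrac{1}{2}-\varepsilon_t,
\]

so if every weak classifier beats chance by some margin \(\gamma\), the training error drops exponentially in the number of rounds \(T\).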

  29. Discrete Adaboost (DiscreteAB) (Friedman's wording)

  30. Discrete Adaboost (DiscreteAB) (Freund and Schapire's wording)

  31. Adaboost with Confidence Weighted Predictions (RealAB)
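The equations on this slide are images; the defining difference from the discrete version is that each weak learner outputs a real-valued confidence rather than a ±1 label. In the usual (Friedman et al.) formulation, the round-t output, the weight update, and the final classifier are

\[
h_t(x) \;=\; \tfrac{1}{2}\ln\frac{P_w(y=+1\mid x)}{P_w(y=-1\mid x)},
\qquad
w_i \;\leftarrow\; \frac{w_i\,\exp\!\big(-y_i\,h_t(x_i)\big)}{Z_t},
\qquad
H(x) \;=\; \operatorname{sign}\!\Big(\sum_{t=1}^{T} h_t(x)\Big).
\]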

  32. Adaboost Variants Proposed By Friedman • LogitBoost • Solves the additive logistic regression problem (maximizes the binomial log-likelihood) • Requires care to avoid numerical problems • GentleBoost • Update is fm(x) = Pw(y=1 | x) − Pw(y=−1 | x) instead of the half log-ratio used by Real AdaBoost • Bounded in [−1, 1]

  33. Adaboost Variants Proposed By Friedman • LogitBoost
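The LogitBoost equations are also missing from the transcript; in Friedman, Hastie & Tibshirani's formulation each round is a Newton step on the binomial log-likelihood. With \(y^{*}_i \in \{0, 1\}\) and \(p(x) = 1/(1+e^{-2F(x)})\), compute

\[
z_i \;=\; \frac{y_i^{*}-p(x_i)}{p(x_i)\,\big(1-p(x_i)\big)},
\qquad
w_i \;=\; p(x_i)\,\big(1-p(x_i)\big),
\]

fit \(f_m(x)\) to the \(z_i\) by weighted least squares with weights \(w_i\), and update \(F(x) \leftarrow F(x) + \tfrac{1}{2} f_m(x)\). The division by \(p(1-p)\) is the source of the numerical care mentioned on slide 32.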

  34. Adaboost Variants Proposed By Friedman • GentleBoost
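A minimal GentleBoost sketch under the same assumptions as the earlier AdaBoost sketch (scikit-learn regression stumps as the base learner, labels in {-1, +1}; not the slides' own code). Fitting each stage to y by weighted least squares makes its leaf values estimate Pw(y=1|x) − Pw(y=−1|x), the bounded update described on slide 32.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # regression stump as the base learner

def gentleboost_fit(X, y, M=50):
    """GentleBoost sketch; y in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)
    stages = []
    for _ in range(M):
        # Weighted least-squares fit of y on x: leaf values estimate Pw(y=1|x) - Pw(y=-1|x).
        f = DecisionTreeRegressor(max_depth=1).fit(X, y, sample_weight=w)
        fx = f.predict(X)
        w *= np.exp(-y * fx)   # bounded update keeps the weights well behaved
        w /= w.sum()
        stages.append(f)
    return stages

def gentleboost_predict(stages, X):
    """Strong classifier: sign of the sum of the stage outputs."""
    return np.sign(sum(f.predict(X) for f in stages))
```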

  35. Thanks!!! Any comments or questions?
