Bagging LING 572 Fei Xia 1/25/07
Classifiers we have learned so far • Naïve Bayes • kNN • Rocchio • Decision tree • Decision list Similarities and Differences?
How to improve performance? • Bagging: bootstrap aggregating • Boosting • System combination • …
Outline • An introduction to the bootstrap • Bagging: basic concepts (Breiman, 1996) • Case study: bagging a treebank parser (Henderson and Brill, ANLP 2000)
Motivation • What’s the average house price? • Get a sample {x1, x2, …, xn}, and calculate the average u. • Question: how reliable is u? What’s the standard error of u? What’s the confidence interval?
Solutions • One possibility: get several samples. • Problem: it is often impossible (or too expensive) to get multiple samples. • One solution: the bootstrap
The general bootstrap algorithm Let the original sample be L={x1,x2,…,xn} • Repeat B times: • Generate a bootstrap sample Lk of size n from L by sampling with replacement. • Compute the statistic of interest θ*k on Lk. • We end up with B bootstrap values θ*1, …, θ*B. • Use these values to calculate the quantities of interest (e.g., standard deviation, confidence intervals).
An example • Original sample X=(3.12, 0, 1.57, 19.67, 0.22, 2.2), mean = 4.46 • Bootstrap sample X1=(1.57, 0.22, 19.67, 0, 0.22, 3.12), mean = 4.13 • Bootstrap sample X2=(0, 2.2, 2.2, 2.2, 19.67, 1.57), mean = 4.64 • Bootstrap sample X3=(0.22, 3.12, 1.57, 3.12, 2.2, 0.22), mean = 1.74
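To make the procedure above concrete, here is a minimal Python sketch using only the standard library; the data are the numbers from the example, and B=1000 is an arbitrary choice for illustration.

```python
import random
import statistics

# Original sample from the example above.
x = [3.12, 0, 1.57, 19.67, 0.22, 2.2]

B = 1000  # number of bootstrap samples (illustrative choice)
boot_means = []
for _ in range(B):
    # Draw n items from x with replacement (a bootstrap sample).
    resample = random.choices(x, k=len(x))
    boot_means.append(statistics.mean(resample))

# Use the B bootstrap values to estimate quantities of interest,
# e.g. the standard error of the sample mean.
se = statistics.stdev(boot_means)
print(f"sample mean = {statistics.mean(x):.2f}, bootstrap SE = {se:.2f}")
```

Sorting boot_means and reading off, say, the 2.5th and 97.5th percentiles gives a simple percentile confidence interval from the same B values.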
A quick view of bootstrapping • Introduced by Bradley Efron in 1979 • Named after the phrase “to pull oneself up by one’s bootstraps”, which is widely believed to come from “The Adventures of Baron Munchausen”. • Popularized in the 1980s due to the introduction of computers in statistical practice. • It has a strong mathematical background. • It is well known as a method for estimating standard errors and bias, and for constructing confidence intervals for parameters.
Bootstrap distribution • The bootstrap does not replace or add to the original data. • We use bootstrap distribution as a way to estimate the variation in a statistic based on the original data.
Sampling distribution vs. bootstrap distribution • The population: certain unknown quantities of interest (e.g., mean) • Multiple samples → sampling distribution • Bootstrapping: • One original sample → B bootstrap samples • B bootstrap samples → bootstrap distribution
Bootstrap distributions usually approximate the shape, spread, and bias of the actual sampling distribution. • Bootstrap distributions are centered at the value of the statistic from the original sample plus any bias. • The sampling distribution is centered at the value of the parameter in the population, plus any bias.
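A sketch of this comparison, assuming a synthetic “population” (a normal distribution chosen purely for illustration) so the true sampling distribution can be simulated and compared against the bootstrap distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=10, scale=5, size=100_000)  # stand-in for the population
n, B = 50, 1000

# Sampling distribution: means of many independent samples from the population.
sampling_means = [rng.choice(population, size=n, replace=False).mean() for _ in range(B)]

# Bootstrap distribution: means of resamples (with replacement) of ONE original sample.
original = rng.choice(population, size=n, replace=False)
bootstrap_means = [rng.choice(original, size=n, replace=True).mean() for _ in range(B)]

# Spreads should be similar; centers differ as described above: the bootstrap
# distribution centers near the original sample's mean, the sampling
# distribution near the population mean.
print("spread:", np.std(sampling_means), np.std(bootstrap_means))
print("center:", np.mean(sampling_means), np.mean(bootstrap_means), original.mean())
```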
Cases where the bootstrap does not apply • Small data sets: the original sample is not a good approximation of the population. • Noisy data: outliers add variability to our estimates. • Dependence structures (e.g., time series, spatial problems): the bootstrap is based on the assumption of independence. • …
How many bootstrap samples are needed? Choice of B depends on • Computer availability • Type of the problem: standard errors, confidence intervals, … • Complexity of the problem
Resampling methods • Bootstrap • Permutation tests • Jackknife: leave out one observation at a time • …
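One practical heuristic, sketched below under assumed data (an exponential sample invented for illustration): increase B until the estimate of the quantity of interest stops changing noticeably. Standard errors usually stabilize with a modest B; tail-sensitive quantities such as confidence intervals need more.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=40)  # assumed original sample

def bootstrap_se(data, B):
    """Bootstrap estimate of the standard error of the mean, using B resamples."""
    means = [rng.choice(data, size=len(data), replace=True).mean() for _ in range(B)]
    return np.std(means, ddof=1)

# Watch the estimate settle as B grows.
for B in (10, 25, 50, 100, 500, 2000):
    print(B, round(bootstrap_se(x, B), 3))
```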
Bagging • Introduced by Breiman (1996) • “Bagging” stands for “bootstrap aggregating”. • It is an ensemble method: a method of combining multiple predictors.
Predictors • Let L be a training set {(xi, yi) | xi in X, yi in Y}, drawn from the set Λ of possible training sets. • A predictor Φ: X → Y is a function that, for any given x, produces y=Φ(x). • A learning algorithm (a.k.a. learner) Ψ maps Λ to the set of predictors: given any L in Λ, it produces a predictor Φ=Ψ(L). • Types of predictors: • Classifiers: DT, DL, kNN, … • Estimators: regression trees • Others: parsers
Bagging algorithm Let the original training data be L • Repeat B times: • Get a bootstrap sample Lk from L. • Train a predictor using Lk. • Combine the B predictors by • Voting (for classification problems) • Averaging (for estimation problems) • …
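A from-scratch sketch of this loop for classification, where `base_learner` is a stand-in (an assumption of this sketch) for any learner that takes a list of (x, y) pairs and returns a predictor callable on x:

```python
import random
from collections import Counter

def bag(train, base_learner, B=50):
    """Train B predictors, each on a bootstrap sample of `train` (a list of (x, y) pairs)."""
    predictors = []
    for _ in range(B):
        sample = random.choices(train, k=len(train))  # bootstrap sample Lk
        predictors.append(base_learner(sample))       # predictor trained on Lk
    return predictors

def bagged_predict(predictors, x):
    """Combine the B predictors by unweighted voting."""
    votes = Counter(p(x) for p in predictors)
    return votes.most_common(1)[0][0]
```

For estimation problems (e.g., regression trees), the vote is replaced by an average of the B predictions.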
Bagging [Diagram: each bootstrap sample is fed to the learner (ML) to produce predictors f1, f2, …, fB, which are combined into a single predictor f: bootstrap + system combination]
Bagging decision trees 1. Split the data set into a training set T1 and a test set T2. 2. Bag using 50 bootstrap samples. 3. Repeat Steps 1-2 100 times, and calculate the average test-set misclassification rate.
How many bootstrap samples are needed? • Bagging decision trees for the waveform task: • Unbagged rate is 29.0%. • We are getting most of the improvement using only 10 bootstrap samples.
Bagging regression trees Bagging with 25 bootstrap samples. Repeat 100 times.
Bagging k-nearest neighbor classifiers 100 bootstrap samples. 100 iterations. Bagging does not help.
Experiment results • Bagging works well for “unstable” learning algorithms. • Bagging can slightly degrade the performance of “stable” learning algorithms.
Learning algorithms • Unstable learning algorithms: small changes in the training set result in large changes in predictions. • Neural network • Decision tree • Regression tree • Subset selection in linear regression • Stable learning algorithms: • kNN
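A hedged sketch of this kind of comparison using scikit-learn's BaggingClassifier on a synthetic dataset (not Breiman's original data or implementation); per the results above, the expectation is that bagging helps the tree and changes kNN little or not at all:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for name, base in [("decision tree (unstable)", DecisionTreeClassifier(random_state=0)),
                   ("kNN (stable)", KNeighborsClassifier())]:
    plain = cross_val_score(base, X, y, cv=5).mean()
    bagged = cross_val_score(BaggingClassifier(base, n_estimators=50, random_state=0),
                             X, y, cv=5).mean()
    print(f"{name}: plain = {plain:.3f}, bagged = {bagged:.3f}")
```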
Experiment settings • Henderson and Brill ANLP-2000 paper • Parser: Collins’s Model 2 (1997) • Training data: sections 01-21 • Test data: Section 23 • Bagging: • Different ways of combining parsing results
Techniques for combining parsers(Henderson and Brill, EMNLP-1999) • Parse hybridization: combining the substructures of the input parses • Constituent voting • Naïve Bayes • Parser switching: selecting one of the input parses • Similarity switching • Naïve Bayes
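To make constituent voting concrete, a hedged sketch under the assumption that each parse is represented as a set of labeled spans (label, start, end): a constituent enters the hybrid parse when it appears in more than half of the input parses. The real method still has to assemble the selected constituents into a tree, which this sketch leaves out.

```python
from collections import Counter

def constituent_voting(parses):
    """parses: list of parses, each a set of (label, start, end) constituents.
    Keep every constituent that occurs in more than half of the input parses."""
    counts = Counter(c for parse in parses for c in parse)
    threshold = len(parses) / 2
    return {c for c, n in counts.items() if n > threshold}

# Hypothetical example: three parses of the same six-word sentence.
p1 = {("NP", 0, 2), ("VP", 2, 6), ("S", 0, 6)}
p2 = {("NP", 0, 2), ("VP", 2, 6), ("S", 0, 6)}
p3 = {("NP", 0, 1), ("VP", 1, 6), ("S", 0, 6)}
print(constituent_voting([p1, p2, p3]))  # keeps ("NP",0,2), ("VP",2,6), ("S",0,6)
```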
Experiment results • Baseline (no bagging): 88.63 • Initial (one bag): 88.38 • Final (15 bags): 89.17
Summary • Bootstrap is a resampling method. • Bagging is directly related to the bootstrap. • It uses bootstrap samples to train multiple predictors. • The outputs of the predictors are combined by voting or other methods. • Experiment results: • It is effective for unstable learning methods. • It does not help stable learning methods.
Uncovered issues • How to determine whether a learning method is stable or unstable? • Why does bagging work for unstable algorithms?