Bagging LING 572 Fei Xia 1/25/07
Classifiers we have learned so far • Naïve Bayes • kNN • Rocchio • Decision tree • Decision list Similarities and Differences?
How to improve performance? • Bagging: bootstrap aggregating • Boosting • System combination • …
Outline • An introduction to the bootstrap • Bagging: basic concepts (Breiman, 1996) • Case study: bagging a treebank parser (Henderson and Brill, ANLP 2000)
Motivation • What is the average house price? • Get a sample {x1, x2, …, xn}, and calculate the average u. • Question: how reliable is u? What is the standard error of u? What is the confidence interval?
Solutions • One possibility: get several samples. • Problem: it is impossible (or too expensive) to get multiple samples. • One solution: bootstrap
The general bootstrap algorithm Let the original sample be L={x1,x2,…,xn} • Repeat B times: • Generate a sample Lk of size n from L by sampling with replacement. • Compute the statistic of interest on Lk. We end up with B bootstrap values, one per sample. • Use these values to calculate the quantities of interest (e.g., standard deviation, confidence intervals)
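The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the choice of the mean as the statistic, B=1000, and the simple percentile interval are all assumptions made for the example, and the data are the six values from the example slide.

```python
import random
import statistics

def bootstrap_values(sample, statistic, B, rng):
    """Return B bootstrap values of `statistic`, one per resample of `sample`."""
    n = len(sample)
    values = []
    for _ in range(B):
        # Draw n items from the original sample with replacement.
        resample = rng.choices(sample, k=n)
        values.append(statistic(resample))
    return values

def percentile_interval(values, alpha=0.05):
    """Simple percentile interval read off the sorted bootstrap values."""
    ordered = sorted(values)
    lo = ordered[int((alpha / 2) * len(ordered))]
    hi = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lo, hi

rng = random.Random(0)  # fixed seed only so the run is repeatable
L = [3.12, 0, 1.57, 19.67, 0.22, 2.2]
values = bootstrap_values(L, statistics.mean, B=1000, rng=rng)
std_error = statistics.stdev(values)   # spread of the bootstrap values
ci = percentile_interval(values)       # rough 95% interval for the mean
```

The original sample is never modified; every quantity of interest is read off the list of bootstrap values.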
An example • Original sample: X=(3.12, 0, 1.57, 19.67, 0.22, 2.2), Mean=4.46 • Bootstrap sample X1=(1.57, 0.22, 19.67, 0, 0.22, 3.12), Mean=4.13 • Bootstrap sample X2=(0, 2.2, 2.2, 2.2, 19.67, 1.57), Mean=4.64 • Bootstrap sample X3=(0.22, 3.12, 1.57, 3.12, 2.2, 0.22), Mean=1.74
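The numbers in this example can be re-derived directly. The short check below recomputes the four means and then, as an illustration only, takes the spread of the three bootstrap means as a very rough standard-error estimate; three resamples are far too few in practice, and the variable names are my own.

```python
import statistics

X  = [3.12, 0, 1.57, 19.67, 0.22, 2.2]     # original sample
X1 = [1.57, 0.22, 19.67, 0, 0.22, 3.12]    # bootstrap samples from the slide
X2 = [0, 2.2, 2.2, 2.2, 19.67, 1.57]
X3 = [0.22, 3.12, 1.57, 3.12, 2.2, 0.22]

original_mean = statistics.mean(X)
boot_means = [statistics.mean(s) for s in (X1, X2, X3)]

# Spread of the bootstrap means: a (very rough) standard-error estimate.
se_estimate = statistics.stdev(boot_means)
```

Every value in X1, X2 and X3 also occurs in X, as sampling with replacement requires; values such as 2.2 in X2 simply get drawn more than once.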
A quick view of bootstrapping • Introduced by Bradley Efron in 1979 • Named from the phrase “to pull oneself up by one’s bootstraps”, which is widely believed to come from “The Adventures of Baron Munchausen”. • Popularized in the 1980s due to the introduction of computers in statistical practice. • It rests on a solid mathematical foundation. • It is well known as a method for estimating standard errors and bias, and for constructing confidence intervals for parameters.
Bootstrap distribution • The bootstrap does not replace or add to the original data. • We use bootstrap distribution as a way to estimate the variation in a statistic based on the original data.
Sampling distribution vs. bootstrap distribution • The population: certain unknown quantities of interest (e.g., mean) • Multiple samples → sampling distribution • Bootstrapping: • One original sample → B bootstrap samples • B bootstrap samples → bootstrap distribution
Bootstrap distributions usually approximate the shape, spread, and bias of the actual sampling distribution. • Bootstrap distributions are centered at the value of the statistic from the original sample plus any bias. • The sampling distribution is centered at the value of the parameter in the population, plus any bias.
Cases where bootstrap does not apply • Small data sets: the original sample is not a good approximation of the population • Noisy data: outliers add variability in our estimates. • Dependence structures (e.g., time series, spatial problems): Bootstrap is based on the assumption of independence. • …
How many bootstrap samples are needed? Choice of B depends on • Computer availability • Type of the problem: standard errors, confidence intervals, … • Complexity of the problem
Resampling methods • Bootstrap • Permutation tests • Jackknife: recompute the statistic leaving out one observation at a time • …
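For comparison with the bootstrap, the jackknife can be sketched as below. The function names are my own and the sample is reused from the earlier example; the standard-error formula is the usual jackknife one, sqrt((n-1)/n · Σ(θi − θ̄)²), where θi is the statistic computed without observation i.

```python
import math
import statistics

def jackknife_values(sample, statistic):
    """Statistic recomputed n times, each time leaving out one observation."""
    return [statistic(sample[:i] + sample[i + 1:]) for i in range(len(sample))]

def jackknife_std_error(sample, statistic):
    """Standard jackknife estimate of the standard error of `statistic`."""
    n = len(sample)
    values = jackknife_values(sample, statistic)
    center = sum(values) / n
    return math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in values))

X = [3.12, 0, 1.57, 19.67, 0.22, 2.2]
se_mean = jackknife_std_error(X, statistics.mean)
```

Unlike the bootstrap, the jackknife involves no randomness: there are exactly n leave-one-out samples. When the statistic is the mean, the result coincides with the textbook standard error s/√n.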
Bagging • Introduced by Breiman (1996) • “Bagging” stands for “bootstrap aggregating”. • It is an ensemble method: a method of combining multiple predictors.
Predictors • Let L be a training set {(xi, yi) | xi in X, yi in Y}, drawn from the set Λ of possible training sets. • A predictor Φ: X → Y is a function that, for any given x, produces y=Φ(x). • A learning algorithm (a.k.a. learner) Ψ maps Λ to the set of possible predictors: given any L in Λ, it produces a predictor Φ=Ψ(L). • Types of predictors: • Classifiers: DT, DL, kNN, … • Estimators: Regression trees • Others: parsers
Bagging algorithm Let the original training data be L • Repeat B times: • Get a bootstrap sample Lk from L. • Train a predictor using Lk. • Combine B predictors by • Voting (for classification problem) • Averaging (for estimation problem) • …
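A minimal Python sketch of this algorithm follows. The base learner here is a hypothetical one-split decision stump on a single numeric feature, chosen only to keep the example short; the function names and the toy training set are likewise assumptions, and any learner that maps a training set to a predictor could be passed in instead.

```python
import random
from collections import Counter

def bagging(L, learner, B, rng):
    """Train B predictors, each on a bootstrap sample of the training set L."""
    predictors = []
    for _ in range(B):
        Lk = rng.choices(L, k=len(L))   # same size as L, drawn with replacement
        predictors.append(learner(Lk))
    return predictors

def vote(predictors, x):
    """Classification: return the label predicted most often."""
    return Counter(p(x) for p in predictors).most_common(1)[0][0]

def average(predictors, x):
    """Estimation: return the mean of the predicted values."""
    return sum(p(x) for p in predictors) / len(predictors)

def stump_learner(train):
    """Hypothetical base learner: a one-split decision tree on one number."""
    best = None
    for threshold, _ in train:
        left = [y for x, y in train if x <= threshold]
        right = [y for x, y in train if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label

rng = random.Random(0)  # fixed seed only so the run is repeatable
L = [(0.0, "a"), (0.2, "a"), (0.4, "a"), (1.6, "b"), (1.8, "b"), (2.0, "b")]
predictors = bagging(L, stump_learner, B=25, rng=rng)
low_label = vote(predictors, 0.0)
high_label = vote(predictors, 2.0)
```

Each predictor sees a slightly different training set, so the individual stumps place their split at different thresholds; voting smooths over those differences.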
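Bagging = bootstrap + system combination [Diagram: each bootstrap sample is passed to the learner ML, producing predictors f1, f2, …, fB, which are combined into a single predictor f]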
Bagging decision trees 1. Split the data set into a training set T1 and a test set T2. 2. Bag using 50 bootstrap samples drawn from T1. 3. Repeat Steps 1-2 100 times, and calculate the average test set misclassification rate.
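The evaluation protocol in these three steps might be coded roughly as follows. This is a sketch under stated assumptions: the slide does not give the split ratio, so `test_fraction` is a made-up parameter, and the majority-label learner and the toy data are hypothetical stand-ins for a real decision-tree learner and data set.

```python
import random
from collections import Counter

def bagged_error_rate(data, learner, n_bags, n_repeats, test_fraction, rng):
    """Average test-set misclassification rate of a bagged classifier."""
    rates = []
    for _ in range(n_repeats):
        # Step 1: random split into training set T1 and test set T2.
        shuffled = rng.sample(data, k=len(data))
        n_test = max(1, int(test_fraction * len(data)))
        T2, T1 = shuffled[:n_test], shuffled[n_test:]
        # Step 2: train one predictor per bootstrap sample of T1.
        predictors = [learner(rng.choices(T1, k=len(T1))) for _ in range(n_bags)]
        # Classify each test item by majority vote and count the mistakes.
        errors = 0
        for x, y in T2:
            votes = Counter(p(x) for p in predictors)
            errors += votes.most_common(1)[0][0] != y
        rates.append(errors / len(T2))
    # Step 3: average over the repeated random splits.
    return sum(rates) / len(rates)

# Hypothetical stand-in learner: always predicts the majority training label.
def majority_learner(train):
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

rng = random.Random(0)  # fixed seed only so the run is repeatable
data = [(i, "pos" if i % 4 else "neg") for i in range(40)]
rate = bagged_error_rate(data, majority_learner, n_bags=50, n_repeats=100,
                         test_fraction=0.1, rng=rng)
```

Running the same loop with `n_bags=1` and the full training set in place of a bootstrap sample would give the unbagged rate to compare against.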
How many bootstrap samples are needed? • Bagging decision trees for the waveform task: • Unbagged rate is 29.0%. • We get most of the improvement using only 10 bootstrap samples.
Bagging regression trees Bagging with 25 bootstrap samples. Repeat 100 times.
Bagging k-nearest neighbor classifiers 100 bootstrap samples. 100 iterations. Bagging does not help.
Experiment results • Bagging works well for “unstable” learning algorithms. • Bagging can slightly degrade the performance of “stable” learning algorithms.
Learning algorithms • Unstable learning algorithms: small changes in the training set result in large changes in predictions. • Neural network • Decision tree • Regression tree • Subset selection in linear regression • Stable learning algorithms: • kNN
Experiment settings • Henderson and Brill ANLP-2000 paper • Parser: Collins’s Model 2 (1997) • Training data: sections 01-21 • Test data: Section 23 • Bagging: • Different ways of combining parsing results
Techniques for combining parsers (Henderson and Brill, EMNLP-1999) • Parse hybridization: combining the substructures of the input parses • Constituent voting • Naïve Bayes • Parser switching: selecting one of the input parses • Similarity switching • Naïve Bayes
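As an illustration of parse hybridization, constituent voting can be sketched as below. The slide does not spell out the rule, so the strict-majority threshold, the (label, start, end) span representation, and the three toy parses are assumptions made for this example.

```python
from collections import Counter

def constituent_voting(parses):
    """Keep every constituent proposed by more than half of the input parses."""
    counts = Counter()
    for parse in parses:
        counts.update(set(parse))   # each parse votes at most once per constituent
    return {c for c, n in counts.items() if n > len(parses) / 2}

# Hypothetical parses of a 3-word sentence, as (label, start, end) spans.
parse_a = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}
parse_b = {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}
parse_c = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)}

hybrid = constituent_voting([parse_a, parse_b, parse_c])
```

Here the hybrid parse keeps the S, NP(0,1) and VP(1,3) spans, each of which has at least two of the three votes. Parser switching, by contrast, would return one of the three input parses unchanged.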
Experiment results • Baseline (no bagging): 88.63 • Initial (one bag): 88.38 • Final (15 bags): 89.17
Summary • Bootstrap is a resampling method. • Bagging is directly related to bootstrap. • It uses bootstrap samples to train multiple predictors. • The outputs of the predictors are combined by voting or other methods. • Experiment results: • It is effective for unstable learning methods. • It does not help stable learning methods.
Uncovered issues • How to determine whether a learning method is stable or unstable? • Why does bagging work for unstable algorithms?