Fitting Models to Data: Linear and Quadratic Discriminant Analysis; Decision Trees
AID: Automatic Interaction Detector • Association Co-Occurrence • CHAID: Chi-squared Automatic Interaction Detection
CART: Classification and Regression Trees • The CART family is oriented to statistics, using the concept of impurity • Impurity measures how well the two classes are separated – ideally we would like to separate all 0s and 1s http://freakonometrics.hypotheses.org/1279
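The impurity idea above can be sketched concretely with the Gini index, one of the impurity measures used by the CART family (a minimal stdlib-only sketch; the function name `gini` is ours, not from a library):

```python
# Gini impurity: 0 when a node holds only one class (perfect separation),
# maximal (0.5 for two classes) when the classes are evenly mixed.
from collections import Counter

def gini(labels):
    """1 minus the sum of squared class proportions in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini([0, 0, 0, 0]))  # 0.0 — all 0s, perfectly separated
print(gini([0, 1, 0, 1]))  # 0.5 — worst case for two classes
```

CART grows a tree by choosing, at each node, the split that most reduces this impurity in the child nodes.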
Bagging • Builds multiple decision trees by repeatedly resampling the training data with replacement • Fits a model to each bootstrap sample • Votes across the trees for a consensus prediction
Boosting • Learns slowly • Given the current model, we fit a decision tree to the residuals (the misclassifications) of that model • We then add this new decision tree into the fitted function in order to update the residuals • Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter d in the algorithm • By fitting small trees to the residuals, we slowly improve the fit in areas where the current model does not perform well
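The residual-fitting loop above can be sketched on a 1-D regression problem. This is a minimal illustration, not a production implementation: each small "tree" is a single-split stump (the slide's parameter d would control the tree depth; here it is fixed at one split), and a shrinkage factor makes the ensemble learn slowly.

```python
# Boosting sketch: repeatedly fit a small stump to the current residuals,
# add it (shrunken) to the ensemble, and update the residuals.
def fit_residual_stump(X, r):
    """Find the split point and leaf means that best fit the residuals r."""
    best = None
    for t in X:
        left = [ri for xi, ri in zip(X, r) if xi < t]
        right = [ri for xi, ri in zip(X, r) if xi >= t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - (lm if xi < t else rm)) ** 2 for xi, ri in zip(X, r))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def boost(X, y, n_trees=50, shrinkage=0.1):
    trees = []
    residuals = list(y)
    for _ in range(n_trees):
        stump = fit_residual_stump(X, residuals)  # fit to current residuals
        trees.append(stump)
        # update the residuals by subtracting the shrunken new predictions
        residuals = [ri - shrinkage * stump(xi) for xi, ri in zip(X, residuals)]
    return lambda x: sum(shrinkage * tree(x) for tree in trees)

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = boost(X, y)
print(round(model(2.0), 2), round(model(5.0), 2))
```

With shrinkage 0.1, each round removes only 10% of the remaining residual, so the fit approaches the data gradually over the 50 rounds, which is exactly the "learns slowly" behaviour described above.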
Many Algorithms

Decision Trees
• rpart (CART)
• tree (CART)
• ctree (conditional inference trees)
• CHAID (chi-squared automatic interaction detection)
• evtree (evolutionary algorithm)
• mvpart (multivariate CART)
• knnTree (nearest-neighbor-based trees)
• RWeka (J4.8, M5', LMT)
• LogicReg (logic regression)
• BayesTree
• TWIX (with extra splits)
• party (conditional inference trees, model-based trees)

Random Forests
• randomForest (CART-based random forests)
• randomSurvivalForest (for censored responses)
• party (conditional random forests)
• gbm (tree-based gradient boosting)
• mboost (model-based and tree-based gradient boosting)