Learn how to create classification trees and regression trees to predict outcomes based on explanatory variables. Use various metrics to assess model quality and explore the concept of interaction in tree models.
Stat 324 – Day 35 Classification and Regression Trees (15.5)
Recap – Classification Trees
• With a categorical response, we can perform successive binary splits of the data set according to various explanatory variables
• Goal is to create groups where the probability of success is close to zero or one, to minimize prediction errors
• Use R², AICc, RMSE to judge quality of model
• Continue splitting until these level off?
• Following the branches enables you to predict the outcome of new observations
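The idea of one binary split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it scores candidate cut points with Gini impurity, which may not be the criterion the course software uses, and the function names (`gini`, `best_split`) and the tiny data set are made up for the example.

```python
def gini(labels):
    """Gini impurity 2p(1-p) for a list of 0/1 outcomes (0 = pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(xs, ys):
    """Find the cut point on one quantitative variable that gives the purest
    pair of groups. Returns (threshold, weighted_impurity); threshold is None
    if no split improves on the unsplit data."""
    pairs = list(zip(xs, ys))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(ys)
    values = sorted(set(xs))
    # Candidate cut points are midpoints between adjacent distinct x values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best_impurity:
            best_threshold, best_impurity = t, weighted
    return best_threshold, best_impurity


# Hypothetical data: number of purchases and whether the card was upgraded.
purchases = [10, 20, 30, 40, 50, 60]
upgraded = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(purchases, upgraded)
print(threshold, impurity)
```

A full tree would repeat this search within each resulting group, over every explanatory variable, until the quality measures stop improving.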
Practice problem • Yes, there is evidence of an interaction: people with extra credit cards were split at 49.73 purchases, while people without extra credit cards were split at 35.93 purchases. This suggests that people who make more purchases are more likely to upgrade their credit card if they already have extra cards. Meanwhile, people who do not have extra cards are more likely to upgrade their card if they have made fewer purchases than the people with extra cards.
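One way to see the interaction is to write the two branches as code: the cut point on purchases depends on which side of the extra-cards split an observation falls. The sketch below uses only the two thresholds quoted above; the function name `leaf_for` and the convention that values below the cut go to one side are assumptions, and the leaf probabilities are not reproduced because they are not given here.

```python
def leaf_for(has_extra_cards, purchases):
    """Follow the branches of the practice-problem tree and name the leaf.

    The purchases cut point differs by branch (49.73 vs. 35.93), which is
    what signals an interaction between the two variables."""
    if has_extra_cards:
        group, cut = "extra cards", 49.73
    else:
        group, cut = "no extra cards", 35.93
    side = "below" if purchases < cut else "at or above"
    return f"{group}, purchases {side} {cut}"


# The same purchase count lands on different sides of the cut in each branch.
print(leaf_for(True, 40))
print(leaf_for(False, 40))
```

With no interaction, both branches would use the same cut point on purchases.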
Practice problem • [Figure panels: No interaction; With interaction]
CART √
Trees vs. Models
Models
• Prediction equation
• Rate of change
• Sensitive to outliers
• Stepwise regression to assess variable importance
• Interactions, quadratic
• Quantitative or Categorical response
• Validate theory
Trees
• Still making predictions
• More flexible
• Robust with outliers
• Splitting history to assess variable importance
• Interactions, quadratic
• Quantitative or Categorical response
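For a quantitative response, the same splitting idea applies, but each group is summarized by its mean instead of a probability. The following is a rough sketch, assuming splits are chosen to minimize the sum of squared errors around the group means; the names `sse` and `best_regression_split` and the data are invented for illustration and do not come from the course software.

```python
def sse(values):
    """Sum of squared deviations from the group mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Return (threshold, total_sse) for the single cut on x that minimizes
    the combined SSE of the two groups; threshold is None if nothing helps."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_total = None, sse(ys)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical x values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold, best_total = t, total
    return best_threshold, best_total


# Hypothetical data with a step change in the response.
x = [1, 2, 3, 4, 5, 6]
y = [5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
t, err = best_regression_split(x, y)
rmse = (err / len(y)) ** 0.5  # RMSE of predicting each group's mean
print(t, rmse)
```

Unlike a regression equation, the result is a set of group means with no slope to interpret, which is one reason the list above contrasts "rate of change" with "still making predictions".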
Project 2 Comments • Scoring; Underlining • Watch terminology: positive association, model vs. data, beta’s are significant, standardizing • Software package, intercept and slope interpretations • Seemed to work best when • Narrowed down to a few good variables, then looked at saturated model, then narrowed in on final model • Much improved integrating output into discussion • Project 3 • Aim for more efficient discussion • Video in lieu of class presentation
Announcements • No official class meeting Wed/Thur • Nick available here for project/video questions • Email me project questions! • Can also do Zoom chats in evening • Course evaluations • Two! • Please!
Where to go from here • Design of Experiments: Stat 323 • Survey data: Stat 421 • Using R: Stat 331 (after CPE 101) • Using SAS: Stat 330 • Time series, correlated data: Stat 418 • Statistical learning: Stat 4xx, Data 301 • Categorical data analysis: Stat 418 • Clustering, PCA: Stat 419 • Probability: Stat 305 (Bayesian Stat 4xx)