
Chapter 9: Regression Wisdom


jessie

Presentation Transcript


  1. Chapter 9: Regression Wisdom AP Statistics

  2. Issues and Problems with Regression • Subsets and curves • Dangers of extrapolation • Possible effects of outliers, high leverage, and influential points • Problems with regression of summary data • Mistakes of inferring causation

  3. What else can residuals tell us? • Histograms (and other graphs) of residuals can reveal “subsets” of the data that enhance our understanding of the original data. • This may lead us to analyze the “subsets” separately.
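A minimal Python sketch of this idea, using made-up data: two hypothetical subsets (group A and group B) that follow parallel lines with different intercepts. One line fitted to the combined data leaves residuals in two separate clumps, which is what a residual histogram would reveal; fitting each subset separately makes the clumps disappear. The helper names `fit_line` and `residuals` are illustrative, not from any library.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(xs, ys, b0, b1):
    """Observed y minus predicted y for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data: two subsets with the same slope but different intercepts.
x_a = [1, 2, 3, 4, 5]
y_a = [2 * x for x in x_a]        # group A: y = 2x
x_b = [1, 2, 3, 4, 5]
y_b = [2 * x + 10 for x in x_b]   # group B: y = 2x + 10

# One line for all the data: residuals split into two clumps (-5 and +5).
b0, b1 = fit_line(x_a + x_b, y_a + y_b)
combined_res = residuals(x_a + x_b, y_a + y_b, b0, b1)
print("combined fit (b0, b1):", b0, b1)
print("residuals:", combined_res)

# Analyzing the subsets separately: each line fits its own group exactly.
fit_a = fit_line(x_a, y_a)
fit_b = fit_line(x_b, y_b)
print("group A fit:", fit_a, " group B fit:", fit_b)
```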

  4. What else can residuals tell us? [Figures: histogram of residuals; scatterplot of residuals]

  5. Hard-to-See Curves: Sometimes the scatterplot looks “straight enough,” but a nonlinear relationship only comes to light after you look at the residual plot.
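A sketch of a hidden curve, assuming a made-up relationship y = x² for x = 1 to 10. The correlation is strong, so the scatterplot looks roughly straight, but the residuals from the least-squares line run positive, then negative, then positive again: the bend a residual plot exposes. `fit_line` and `correlation` are hypothetical helpers written for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


xs = list(range(1, 11))
ys = [x ** 2 for x in xs]   # hypothetical curved relationship

r = correlation(xs, ys)
b0, b1 = fit_line(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
signs = "".join("+" if e > 0 else "-" for e in res)

print(f"r = {r:.3f}")              # strong: the scatterplot looks fairly straight
print("residual signs:", signs)    # ++------++ : the curve the scatterplot hid
```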

  6. Extrapolation • The farther an x value is from the mean of x, the less we trust its predicted value. • Once we venture into new x territory, our predicted value is an extrapolation. • Extrapolations are not reliable because they assume that the relationship between x and y does not change, even for these extreme values of x. • Don’t extrapolate into the future!
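To see why extrapolation is risky, the sketch below assumes (purely for illustration) a true relationship y = x² that we only observe for x = 1 to 5. The straight line fits reasonably inside that range, but its error grows quickly as x moves away from the data. `true_y` and `fit_line` are hypothetical names.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def true_y(x):
    """Hypothetical 'true' relationship that bends beyond the observed data."""
    return x ** 2


observed_x = [1, 2, 3, 4, 5]   # the only x range we have data for
observed_y = [true_y(x) for x in observed_x]
b0, b1 = fit_line(observed_x, observed_y)

errors = {}
for x in [3, 5, 10, 20]:       # 3 and 5 are inside the data; 10 and 20 are not
    predicted = b0 + b1 * x
    errors[x] = true_y(x) - predicted
    print(f"x = {x:>2}: predicted {predicted:6.1f}, actual {true_y(x):4d}, error {errors[x]:6.1f}")
```

Inside the observed range the errors stay small; at x = 20 the line misses by hundreds.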

  7. Extrapolation

  8. Outliers, Leverage, and Influence Unusual point vocabulary: • High-leverage points: points whose x value is far from the mean of the x values. • Influential points: points that change the model (change the slope of the line). • High-leverage points can also be influential, but they do not need to be.

  9. Outliers, Leverage, and Influence Three types of unusual points: 1. High-leverage points with small residuals: these points confirm the pattern but have extreme x values. The slope and intercept are mostly unaffected, but the R-squared value increases; don’t be misled into thinking the model is now stronger.
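A sketch of a high-leverage point with a small residual, on made-up data: a point placed far out in x but exactly on the original least-squares line. The slope and intercept do not change at all, yet R-squared jumps, because the point stretches the spread of y without adding any residual error. `fit_line` and `r_squared` are illustrative helpers.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(xs, ys, b0, b1):
    """1 - SSE/SST for the line y-hat = b0 + b1 * x."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical base data with some scatter.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
r2_before = r_squared(xs, ys, b0, b1)

# Add a high-leverage point far out in x that lies right on the fitted line.
x_new = 20
y_new = b0 + b1 * x_new
xs2, ys2 = xs + [x_new], ys + [y_new]
c0, c1 = fit_line(xs2, ys2)
r2_after = r_squared(xs2, ys2, c0, c1)

print(f"before: slope {b1:.3f}, intercept {b0:.3f}, R^2 {r2_before:.3f}")
print(f"after:  slope {c1:.3f}, intercept {c0:.3f}, R^2 {r2_after:.3f}")
```

The model describes the original five points no better than before; only the summary number improved.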

  10. Outliers, Leverage, and Influence 2. Outliers: not high leverage, not influential, and a large residual. The x value is near the mean of the x values. An outlier does little to change the slope, but it is not consistent with the pattern. It will shift the intercept. Don’t throw it away.
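A sketch using made-up data, with an outlier added exactly at the mean of x (x = 3) but far above the pattern. Here the slope stays the same, the intercept shifts up by the point's residual from the original line divided by the new number of points, and R-squared drops. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(xs, ys, b0, b1):
    """1 - SSE/SST for the line y-hat = b0 + b1 * x."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical base data (mean of x is 3).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)

# Outlier: x at the mean of x, y far above the pattern.
x_out, y_out = 3, 12
xs2, ys2 = xs + [x_out], ys + [y_out]
c0, c1 = fit_line(xs2, ys2)

old_residual = y_out - (b0 + b1 * x_out)   # residual under the original line
print(f"slope:     {b1:.3f} -> {c1:.3f}")
print(f"intercept: {b0:.3f} -> {c0:.3f}")
print(f"R^2:       {r_squared(xs, ys, b0, b1):.3f} -> {r_squared(xs2, ys2, c0, c1):.3f}")
print(f"intercept shift {c0 - b0:.3f} vs residual / n = {old_residual / len(xs2):.3f}")
```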

  11. Outliers, Leverage, and Influence 3. Influential points: also high leverage, and probably a small residual. These are the most troublesome. They aren’t consistent with the pattern of the other points, and if the point is removed, the slope of the line changes dramatically; it changes the model. Don’t throw it out without thinking.
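An influential point, again on made-up data: one point far out in x that does not follow the pattern. This sketch shows the slope flipping sign when the point is included, while the point's own residual under the new fit stays small, because the line is pulled toward it. That is why a small residual does not mean a point is harmless. `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical base data with a positive trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)

# Influential point: extreme x, and y well off the existing pattern.
x_inf, y_inf = 20, 0
xs2, ys2 = xs + [x_inf], ys + [y_inf]
c0, c1 = fit_line(xs2, ys2)

res_old_line = y_inf - (b0 + b1 * x_inf)   # how far off the original line it is
res_new_line = y_inf - (c0 + c1 * x_inf)   # its residual once it is in the fit
print(f"slope without the point: {b1:.3f}")
print(f"slope with the point:    {c1:.3f}")
print(f"point's residual: {res_old_line:.2f} from the old line, {res_new_line:.2f} from the new one")
```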

  12. Lurking Variables and Causation • With observational data, as opposed to designed experiments, there is no way to be sure that a lurking variable is not the cause of any apparent association. • The lurking variable is some third variable (neither the explanatory nor the response variable) that is driving both variables you have observed.

  13. Lurking Variables and Causation [Diagram: z is the lurking variable]

  14. Lurking Variables and Causation Many studies have shown a strong positive association between hours spent in religious activities (going to church, attending religious classes, praying, etc.) and life expectancy. This is NOT CAUSATION. There is confounding: on average, people who attend religious activities also take better care of themselves than people who do not attend. They are less likely to smoke, more likely to exercise, and less likely to be overweight. The effects of these good habits (lurking variables) are confounded with the direct effects of attending religious activities.
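A small simulation of confounding with entirely made-up numbers: a hypothetical lurking variable z (think of a "healthy habits" score from 1 to 5) drives both x and y, and x has no direct effect on y. The overall correlation between x and y is strong, but once z is held fixed, the slope of y on x within each group is zero. This only illustrates the idea; it is not a model of the studies described above. `correlation` and `slope` are hypothetical helpers.

```python
def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical lurking variable z: x and y both depend on z,
# but y does not depend on x at all.
records = []
for z in range(1, 6):
    for dx, dy in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        records.append((z, 2 * z + dx, 3 * z + dy))

all_x = [x for _, x, _ in records]
all_y = [y for _, _, y in records]
overall_r = correlation(all_x, all_y)
print(f"overall r(x, y) = {overall_r:.3f}")

# Hold the lurking variable fixed: within each z group, x says nothing about y.
within_slopes = []
for z in range(1, 6):
    group_x = [x for zz, x, _ in records if zz == z]
    group_y = [y for zz, _, y in records if zz == z]
    within_slopes.append(slope(group_x, group_y))
print("slopes of y on x within each z group:", within_slopes)
```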

  15. Working With Summary Values • Be cautious when working with data values that are summaries, such as means and medians. • Summary values vary less than the individual data values, so a regression on them tends to inflate the apparent strength of the relationship (the correlation).
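A sketch of how summarizing inflates correlation, using made-up data: 5 hypothetical groups of 5 individuals, where each person's y scatters around the group's x value. Correlating the group means throws away the within-group scatter, so the correlation of the means is much stronger than the correlation of all the data points.

```python
def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


# Hypothetical individual-level data: 5 groups of 5 people.
# x is the group's value; each person's y is that value plus -2..+2.
groups = {}
for g in range(1, 6):
    groups[g] = [(g, g + e) for e in (-2, -1, 0, 1, 2)]

all_x = [x for pts in groups.values() for x, _ in pts]
all_y = [y for pts in groups.values() for _, y in pts]

# Summary data: replace each group with its mean x and mean y.
mean_x = [sum(x for x, _ in pts) / len(pts) for pts in groups.values()]
mean_y = [sum(y for _, y in pts) / len(pts) for pts in groups.values()]

r_all = correlation(all_x, all_y)
r_means = correlation(mean_x, mean_y)
print(f"r using all data points: {r_all:.3f}")
print(f"r using group means:     {r_means:.3f}")
```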

  16. Summary Data

  17. All Data Points
