Stat 324 – Day 15 Review
Last Time – Assessing Independence • Detection • Residuals vs. order (for data in order) • Autocorrelation graph (different lags) • Durbin-Watson statistic (like corr(r_i, r_{i-1})) • Values near 2 when there is no lag-one correlation in the residuals • Can approximate a two-sided p-value for H0: ρ = 0 • To fix • Time series methods (Stat 416) • Transformed variables (e.g., Cochrane-Orcutt) • Including new variables…
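As a rough illustration of the Durbin-Watson statistic, here is a minimal sketch in plain Python. The residual values are made up, and the function name `durbin_watson` is our own label rather than a call from any particular package.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(r ** 2 for r in residuals)
    return num / den

# Hypothetical residuals, listed in time order.
trending = [2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0]     # neighbours look alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbours flip sign

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(dw_trending, dw_alternating)
```

Values well below 2 point toward positive lag-one correlation and values well above 2 toward negative lag-one correlation, since the statistic is approximately 2(1 - r1), where r1 is the lag-one autocorrelation of the residuals.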
Announcements • HW 3 graded • (problem 1 24 vs. 29 points) • Access to annotated handouts • Future discussion board posts • Project • Data file • Appendix of output in report? • HW 4 posted this weekend
Big Picture • Have a response variable, with variability • Believe it is associated with another variable • But not a deterministic relationship • DATA = MODEL + ERROR • Is the variability explained by the model large compared to the random chance variability? • Special model = Means follow a line (constant rate of change)
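One way to see DATA = MODEL + ERROR numerically is to split the total sum of squares into a piece explained by the line and a leftover piece. The sketch below uses made-up data and a hand-rolled helper `fit_line` (a hypothetical name), so treat it as an illustration rather than course code.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
ybar = sum(y) / len(y)
fits = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # DATA: total variability
ssr = sum((fi - ybar) ** 2 for fi in fits)            # MODEL: explained by the line
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fits))  # ERROR: left over
r_squared = ssr / sst
print(b0, b1, sst, ssr, sse, r_squared)
```

For a least squares line with an intercept, `sst` equals `ssr + sse` up to rounding, and `r_squared` is the share of the variability explained by the model.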
Big Picture • Describing the relationship between x and y • Scatterplot: Direction, form, strength, outliers • Finding a model/Making predictions • Smoothers (nonparametric models) • Least Squares regression • Median-Median line • Transformations • Quadratic • Weighted least squares
Least Squares Regression Model • Validating the model • LINE conditions and how to check (graph, p-value) • Interpreting the model • Slope interpretation • Log transformations • R², s – model performance • Making inferences about slope, intercept, predictions • p-values, confidence intervals for slope, intercept • prediction intervals for future (individual, mean) values • Back transformations • Consequences, Properties • Resistant procedures, No intercept models, (xbar, ybar)
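The different intervals listed above differ mainly in their standard errors. The sketch below, on made-up data, computes a confidence interval for the slope, a confidence interval for the mean response at x*, and a prediction interval for an individual response at x*. The critical value `T_STAR` is an assumption taken from a t table (95% confidence, df = 3), and all names are illustrative.

```python
import math

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# s estimates the standard deviation of the errors, with n - 2 df.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

T_STAR = 3.182  # assumed: t table value for 95% confidence, df = n - 2 = 3

se_slope = s / math.sqrt(sxx)
slope_ci = (b1 - T_STAR * se_slope, b1 + T_STAR * se_slope)

x_star = 4.5
fit = b0 + b1 * x_star
se_mean = s * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)       # mean response
se_indiv = s * math.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)  # one new response
mean_ci = (fit - T_STAR * se_mean, fit + T_STAR * se_mean)
pred_int = (fit - T_STAR * se_indiv, fit + T_STAR * se_indiv)
print(slope_ci, mean_ci, pred_int)
```

The prediction interval is always wider than the confidence interval for the mean at the same x*, because it carries the extra "1 +" term for the variability of a single new observation.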
The Translations • Describe the relationship • Scatterplot, corr coefficient: form, direction, strength, unusual observations, context • Fit a model, Prediction Eq • Run a regression (consider transformations) • Increase in y with each increase in x • Slope • Response when x = 0 • Intercept • Is the model valid? Is the model appropriate? • Residual analysis • Is the model accurate? • R², s • Is the relationship statistically significant? • p-value (watch for one-sided), hypotheses, df • Is the model useful? • R², significance • Estimate y from x* • Point estimate vs. interval estimate • Individual vs. mean • Generalizability, cause and effect • Look at study design
Unusual Observations • Outlier in x, outlier in y, outlier in regression • Standardized residuals • Studentized residual – p-value • Leverage/hat values • Influence: Cook’s Distance
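To make leverage, standardized residuals, and Cook's distance concrete, here is a small sketch for simple linear regression on made-up data in which the last point is an outlier in x. The formulas are the usual single-predictor ones with p = 2 coefficients. The deleted (studentized) residual and its p-value are not computed here.

```python
import math

# Made-up data; the last observation has an unusually large x value.
x = [1, 2, 3, 4, 10]
y = [1.1, 1.9, 3.2, 3.9, 6.0]
n, p = len(x), 2  # p = number of coefficients (intercept and slope)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in resid) / (n - p))

# Leverage (hat value): how far each x is from xbar, on a 1/n to 1 scale.
leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Standardized residual: residual divided by its own standard error.
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]

# Cook's distance combines residual size and leverage into one influence measure.
cooks_d = [(r ** 2 / p) * h / (1 - h) for r, h in zip(std_resid, leverage)]

for row in zip(x, leverage, std_resid, cooks_d):
    print(row)
```

The leverages always sum to p, so a point with leverage near 1, like the last one here, is doing most of the work of fixing the line at its own x value.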
Weighted Regression • The “model” is nice because all x values inform the estimate for y at x*. • But is that always a good thing? • Could downweight some observations because we don’t think they are as reliable • Smaller sample size • More variability in responses • Not as recent
Weighted Regression • When we included all counties but weighted the regression by the number of people living in each county, the statistical significance of the opposite effects in Wisconsin and Texas both evaporated. • And I weighted the regression by FiveThirtyEight’s aggregate polling weight in each state, so that Ohio and Florida (for example) are much more influential than West Virginia or Hawaii.
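A minimal sketch of weighted least squares for one predictor, using made-up data and weights. The helper name `weighted_fit` is hypothetical; the idea is just that each observation's contribution to the sums is multiplied by its weight.

```python
def weighted_fit(x, y, w):
    """Intercept and slope minimizing the weighted sum of squared residuals."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Made-up data: the first three points follow y = x, the last one does not.
x = [1, 2, 3, 4]
y = [1, 2, 3, 10]

b0_equal, b1_equal = weighted_fit(x, y, [1, 1, 1, 1])      # same as ordinary least squares
b0_down, b1_down = weighted_fit(x, y, [1, 1, 1, 0.1])      # last point judged less reliable
b0_scaled, b1_scaled = weighted_fit(x, y, [10, 10, 10, 1])  # same relative weights
print(b1_equal, b1_down, b1_scaled)
```

Only the relative sizes of the weights matter: multiplying every weight by the same constant leaves the fitted line unchanged, while downweighting the last point pulls the slope back toward the pattern in the other three.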
Distance vs. time • Decrease y power (0.5, log, -1) or increase x power (2)
LINE • Linearity between E(Y) and x • Want no pattern in residuals vs fits • Lack of fit F test (if replication) • How to fix: • If monotonic, power transformation • Box-Cox to minimize SSE using power • If turns, quadratic • Other • Independence of residuals • Residuals vs. order, ACF • Durbin-Watson
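The Box-Cox idea of choosing a power to minimize SSE can be sketched as a small grid search. The data are made up to be exactly exponential, and the geometric-mean scaling in `boxcox_normalized` (a hypothetical helper) is one common way to make SSE comparable across powers; software may do the search differently.

```python
import math

def sse_of_line(x, y):
    """SSE from the least squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def boxcox_normalized(y, lam):
    """Box-Cox transform scaled by the geometric mean (y must be positive)."""
    gm = math.exp(sum(math.log(v) for v in y) / len(y))
    if lam == 0:
        return [gm * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * gm ** (lam - 1)) for v in y]

# Made-up data that are exactly exponential in x, so log(y) is linear in x.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(0.5 * xi) for xi in x]

lambdas = [-1, -0.5, 0, 0.5, 1]
sse_by_lambda = {lam: sse_of_line(x, boxcox_normalized(y, lam)) for lam in lambdas}
best_lambda = min(sse_by_lambda, key=sse_by_lambda.get)
print(sse_by_lambda, best_lambda)
```

Here the search lands on the power 0, the log transformation, which matches how the data were built. With real data the SSE curve is usually flat near its minimum, so a nearby convenient power is often chosen instead.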
LINE • Normality of residuals • Histogram, normal probability plot (linear) • Anderson-Darling and/or Shapiro-Wilk p-value • How to fix: Transformation of Y • Equal variability in residuals • Want no fanning in residuals vs. fits • How to fix: Transformation of Y • If variability increases with Y, lower power on Y • If variability increases with X, weighted regression
Interpreting log transformations • Log(x): multiplying x by the base is associated with a change of slope in the (predicted) mean of y • Log(y): increasing x by 1 is associated with multiplying the (predicted) median of y by base^slope • Log(x) and Log(y): multiplying x by C is associated with multiplying the (predicted) median of y by C^slope • See extra details handout online
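The three interpretations can be checked numerically. The sketch below assumes base-10 logs and a hypothetical fitted intercept and slope; the specific x values are arbitrary.

```python
import math

intercept = 1.2  # hypothetical fitted intercept
slope = 0.3      # hypothetical fitted slope

# Case 1: y regressed on log10(x). Multiplying x by the base adds `slope`.
def mean_y_logx(x):
    return intercept + slope * math.log10(x)

change_logx = mean_y_logx(50) - mean_y_logx(5)

# Case 2: log10(y) regressed on x. Adding 1 to x multiplies by 10**slope.
def median_y_logy(x):
    return 10 ** (intercept + slope * x)

ratio_logy = median_y_logy(4) / median_y_logy(3)

# Case 3: log10(y) regressed on log10(x). Multiplying x by C multiplies by C**slope.
def median_y_loglog(x):
    return 10 ** (intercept + slope * math.log10(x))

C = 2
ratio_loglog = median_y_loglog(C * 7) / median_y_loglog(7)

print(change_logx, ratio_logy, ratio_loglog)
```

In the two cases with log(y), back-transforming the fitted value gives a predicted median rather than a mean, which is why the slides word those interpretations in terms of the median of y.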
Study Advice • Review homework problems, solutions • Be ready to explain your reasoning • Be ready to apply your knowledge • Work problems • Ask review questions • Study as if a closed book exam • Know all the different confidence intervals • Know all the different standard deviations
Test Taking Advice • Point allocation • Mix of easy and challenge questions • Partial Credit • Get something written down • Parts of a problem usually do not have to be answered in order • Or give suitable symbol and move on • Quickly read through all questions first? Any expected topics not there?