Regression Diagnostics

clivingston

Presentation Transcript


  1. Regression Diagnostics SRM 625 Applied Multiple Regression, Hutchinson

  2. Prior to interpreting your regression results, you should use diagnostic techniques to examine your data for potential problems that could affect your findings

  3. Types of possible problems • Assumption violations • Outliers and influential cases • Multicollinearity

  4. Regression Assumptions • Error-free measurement • Correct model specification • Assumptions about residuals

  5. Assumption that variables are measured without error • Presence of measurement error in Y leads to an increase in the standard error of estimate • If the standard error of estimate is inflated, what happens to the F test for R²? • (Hint: think about the relationship between the standard error and the mean square error)
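One way to work through the hint is a small arithmetic sketch. It relies on two standard identities: the F test for R² is the regression mean square divided by the residual mean square, and the standard error of estimate is the square root of the residual mean square. The sums of squares, k = 3, and N = 100 below are hypothetical values chosen only for illustration.

```python
def f_statistic(ss_reg, ss_res, k, n):
    """F = MS_regression / MS_residual for k predictors and n cases."""
    ms_reg = ss_reg / k
    ms_res = ss_res / (n - k - 1)  # standard error of estimate = sqrt(ms_res)
    return ms_reg / ms_res

# Hypothetical sums of squares (not taken from the slides).
f_clean = f_statistic(ss_reg=400.0, ss_res=600.0, k=3, n=100)
# Same regression sum of squares, but extra error variance in Y
# has been added to the residual sum of squares.
f_noisy = f_statistic(ss_reg=400.0, ss_res=900.0, k=3, n=100)

print(round(f_clean, 2), round(f_noisy, 2))
```

Because the residual mean square sits in the denominator, anything that inflates it, including measurement error in Y, pulls F down.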

  6. In a bivariate regression, random measurement error in X biases the regression coefficient toward zero (i.e., the coefficient is underestimated in absolute value) • What are the implications of this for interpreting results regarding X?
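The attenuation described in this slide can be seen in a small simulation. This is a minimal sketch, assuming a true slope of 2.0, standard normal scores, and random error in X with the same variance as the true scores (so the reliability of observed X is about .50); all of these values and the names `true_x` and `observed_x` are illustrative choices, not values from the course.

```python
import random

def ols_slope(x, y):
    """Least-squares slope for a bivariate regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

random.seed(625)  # arbitrary seed so the run is repeatable
n = 20000
true_x = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * t + random.gauss(0, 1) for t in true_x]     # assumed true slope of 2.0
observed_x = [t + random.gauss(0, 1) for t in true_x]  # X measured with random error

slope_true = ols_slope(true_x, y)          # regression on error-free X
slope_observed = ols_slope(observed_x, y)  # regression on error-laden X
print(round(slope_true, 2), round(slope_observed, 2))
```

Under these assumptions the slope based on observed X should land near the true slope multiplied by the reliability of X, i.e., roughly half of 2.0.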

  7. What are the possible consequences of measurement error when one or more IVs have poor reliability in a multiple regression model?

  8. Evidence to assess violation of the assumption of error-free measurement • Reliability estimates for your independent and dependent variables • What would constitute "acceptable" reliability?

  9. How might you attempt to minimize violation of the assumption during the design and planning phase of your study?

  10. Assumption that the regression model has been correctly specified • Linearity • Inclusion of all relevant independent variables • Exclusion of irrelevant independent variables

  11. Assumption of Linearity • Violation of this assumption can lead to downward bias of regression coefficients • If the data are curvilinearly related, there are methods for dealing with them • These require the use of multiple regression and transformation of variables • Note: we will discuss methods for addressing nonlinear relationships later in the course

  12. Detecting nonlinearity • In bivariate regression, can examine scatterplots of X and Y • This is not sufficient in multiple regression • However, can examine partial regression plots between each IV and the DV, controlling for other IVs • In multiple regression, residuals plots are primarily used

  13. Residuals plots • Typically involve scatterplots with either standardized, studentized, or unstandardized residuals plotted against predicted Y (i.e., residuals versus Ŷ)

  14. A residuals scatterplot should reflect a broad horizontal band of points (i.e., should look like the scatterplot for r = 0). • If the plot forms some type of pattern, it could indicate an assumption violation • Specifically, for nonlinearity the plot would reflect a curve
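The curved pattern can also be checked numerically without drawing the plot. The sketch below is an illustration under assumed data: Y is generated as an exact quadratic function of a single X (hypothetical coefficients, no random error), a straight line is fitted, and the residuals are averaged within the lowest, middle, and highest thirds of predicted Y.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical curvilinear data: X from -3 to 3, Y quadratic in X.
x = [i / 10 for i in range(-30, 31)]
y = [1 + 0.5 * v + v ** 2 for v in x]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * v for v in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]

# Order residuals by predicted Y, then average them within thirds.
ordered = [r for _, r in sorted(zip(predicted, residuals))]
third = len(ordered) // 3
low_mean = mean(ordered[:third])
middle_mean = mean(ordered[third:-third])
high_mean = mean(ordered[-third:])
print(round(low_mean, 2), round(middle_mean, 2), round(high_mean, 2))
```

A horizontal band would give three means near zero; positive means at both ends with a negative mean in the middle is the numerical counterpart of the curve described in the slide.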

  15. Sample residuals plot • Does this appear to reflect a correlation of 0?

  16. Sample partial regression plot

  17. Assumption that all important independent variables have been included • If omitted variables are correlated with variables in equation, violation of this assumption can lead to biased parameter estimates (e.g., incorrect values of regression coefficients) • Fairly serious violation

  18. Violation can also lead to non-random residuals (i.e., residuals that include systematic variance associated with the omitted variables) • If omitted variables are not correlated with variables in the model, parameter estimates are not biased, but standard errors associated with the independent variables are biased upward (i.e., inflated)

  19. For example, suppose job satisfaction is predicted from salary alone, so that the error includes autonomy, task enjoyment, working conditions, etc. Therefore, if autonomy, task enjoyment, etc. are correlated with job satisfaction, the residuals (which reflect autonomy, task enjoyment, etc.) would be correlated with predicted job satisfaction
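The effect of an omitted variable on the coefficient of an included one can be sketched with simulated data. Everything here is assumed for illustration: job satisfaction depends on salary and autonomy with slopes of 1.0 each, and autonomy is left out of the fitted model. In the first case salary is correlated with autonomy; in the second it is not.

```python
import random

def ols_slope(x, y):
    """Least-squares slope for a bivariate regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

random.seed(19)  # arbitrary seed so the run is repeatable
n = 20000
autonomy = [random.gauss(0, 1) for _ in range(n)]

# Case 1: salary is correlated with the omitted variable (autonomy).
salary_corr = [0.6 * a + random.gauss(0, 0.8) for a in autonomy]
satisf_corr = [s + a + random.gauss(0, 1) for s, a in zip(salary_corr, autonomy)]

# Case 2: salary is unrelated to the omitted variable.
salary_indep = [random.gauss(0, 1) for _ in range(n)]
satisf_indep = [s + a + random.gauss(0, 1) for s, a in zip(salary_indep, autonomy)]

slope_corr = ols_slope(salary_corr, satisf_corr)     # autonomy omitted, correlated
slope_indep = ols_slope(salary_indep, satisf_indep)  # autonomy omitted, uncorrelated
print(round(slope_corr, 2), round(slope_indep, 2))
```

With these assumed values the salary slope should be overstated in the correlated case and close to 1.0 in the uncorrelated case, which matches the distinction drawn in slides 17 and 18.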

  20. How do we determine if this assumption is violated? • Can examine residuals plots • Again, plot residuals against predicted values of Y • Again, hope to see a broad horizontal band of points • If the plot reflects some type of discernible pattern, e.g., a linear pattern, it could suggest omitted variables

  21. What can you do if it appears you have violated this assumption?

  22. How might we attempt to prevent violation of this assumption?

  23. Assumption that no irrelevant independent variables have been included • Will lead to inflated standard errors for the regression coefficients (not just those corresponding to the irrelevant variables) • What effect could this have on conclusions you draw about the contributions of your independent variables?

  24. How can you determine if you have violated this assumption?

  25. What might you do to avoid this potential assumption violation?

  26. Assumptions about errors • Residuals have mean of zero • Residuals are random • Residuals are normally distributed • Residuals have equal variance (i.e., homoscedasticity)

  27. Residuals (or errors) are random • Residuals should be uncorrelated with predicted Y • Residuals should be uncorrelated with independent variables • Residuals should be uncorrelated with one another • This is comparable to the independence of observations assumption • What this means is that the reason for prediction error for one person should be unrelated to the reason for prediction error for another person

  28. If violated, tests of significance cannot be trusted • F and t tests are not robust to violations of this assumption • This assumption is most likely to be violated: • in longitudinal studies, or • when important variables have been left out of the equation, or • if observations are clustered, e.g., when subjects are sampled from intact groups or in cluster sampling

  29. Residuals are normally distributed • Residuals are assumed to be normally distributed around the regression line for all values of X • This is analogous to the normality assumption in a t-test or ANOVA

  30. Illustration of data which violate assumption of normality

  31. Normal probability plot of residuals
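The coordinates behind a normal probability plot can be computed directly. This is a minimal sketch, not the procedure SPSS uses: the residual values are hypothetical, and the plotting position (i − 0.5)/n is one common convention among several. Each ordered, standardized residual is paired with the normal quantile expected at its rank.

```python
from statistics import NormalDist, mean, pstdev

def normal_plot_points(residuals):
    """Pair each ordered, standardized residual with its expected normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    center, spread = mean(ordered), pstdev(ordered)
    points = []
    for rank, value in enumerate(ordered, start=1):
        p = (rank - 0.5) / n                 # plotting position (assumed convention)
        expected = NormalDist().inv_cdf(p)   # theoretical standard normal quantile
        points.append((expected, (value - center) / spread))
    return points

# Hypothetical residuals for illustration only.
example_residuals = [-1.2, -0.4, -0.1, 0.0, 0.3, 0.5, 0.9]
points = normal_plot_points(example_residuals)
for expected, observed in points:
    print(round(expected, 2), round(observed, 2))
```

If the residuals are approximately normal, the observed values track the expected quantiles along a straight diagonal line; systematic bowing or an S shape suggests nonnormality.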

  32. Residuals have equal variance • Residuals should be evenly spread around the regression line • Known as the assumption of homoscedasticity • Same as assumption of homogeneity of variance in ANOVA but with equal variances on Y for each value of X

  33. Illustration of homoscedastic data

  34. Illustration of heteroscedasticity

  35. Further evidence of heteroscedasticity and nonnormality

  36. Why is violation of the homoscedasticity assumption a problem?

  37. What can you do if your data are heteroscedastic? • Can use weighted least squares instead of ordinary least squares as your estimation procedure • WLS weights each case so that cases with larger error variances receive less weight (in OLS each case is weighted 1)
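The weighting idea can be sketched for a single predictor. The function below is an illustration only: it assumes the weights are supplied by the analyst, and the example assumes the error variance grows with X², so each weight is set to 1/X². The data values are hypothetical. Setting every weight to 1 reproduces ordinary least squares.

```python
def weighted_line(x, y, w):
    """Return (intercept, slope) minimizing the weighted sum of squared residuals."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data whose scatter around the line widens as X increases.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.8, 7.4, 8.1, 12.9, 10.6, 18.2, 13.5]

ols_fit = weighted_line(x, y, [1.0] * len(x))          # every case weighted 1
wls_fit = weighted_line(x, y, [1 / v ** 2 for v in x])  # assumed variance ∝ X²
print(ols_fit, wls_fit)
```

In practice the error variances are rarely known and the weights have to be estimated, so the choice of 1/X² here should be read as a stand-in for whatever variance model the data support.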

  38. Outliers and Influential Cases • Outliers • Influential observations • Leverage • Extreme on both X and Y

  39. What is an outlier? • A case with an extreme value of Y • Presence of outliers can be detected by examination of residuals

  40. Types of residuals used in outlier detection • Standardized residuals • Studentized residuals • Studentized deleted residuals

  41. Standardized Residuals • Unstandardized residuals that have been converted to z-scores • Not recommended by some because their calculation makes the assumption that all residuals have the same variance (as measured by the overall Sy.x)

  42. Studentized Residuals • Similar to standardized residuals but use different standard deviations for each residual • Generally more sensitive than standardized residuals • Follow an approximate t distribution

  43. Studentized Deleted Residuals • Studentized deleted residuals are the same as studentized residuals except they remove the case with the extreme value from their calculation • Addresses a potential problem of studentized residuals which include the outlier in their calculation (thus increasing risk of inflated standard error)
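The three residual types can be computed side by side for a bivariate regression. This is a sketch using the usual textbook formulas, not SPSS output: standardized residuals divide by the overall standard error of estimate, studentized residuals also divide by √(1 − h) using each case's leverage h, and studentized deleted residuals replace the standard error with one computed without the case. The data are hypothetical, with the seventh case given an unusually high Y.

```python
from math import sqrt

def residual_diagnostics(x, y):
    """Return standardized, studentized, and studentized deleted residuals."""
    n, p = len(x), 2  # p = number of estimated coefficients (intercept + slope)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    e = [b - (intercept + slope * a) for a, b in zip(x, y)]
    h = [1 / n + (a - mean_x) ** 2 / sxx for a in x]  # leverage of each case
    sse = sum(r ** 2 for r in e)
    s = sqrt(sse / (n - p))  # overall standard error of estimate
    standardized = [r / s for r in e]
    studentized = [r / (s * sqrt(1 - hi)) for r, hi in zip(e, h)]
    deleted = []
    for r, hi in zip(e, h):
        # Standard error of estimate with this case removed from the fit.
        s_without = sqrt((sse - r ** 2 / (1 - hi)) / (n - p - 1))
        deleted.append(r / (s_without * sqrt(1 - hi)))
    return standardized, studentized, deleted

# Hypothetical data; the seventh case has an unusually high Y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 20.0, 16.1]
standardized, studentized, deleted = residual_diagnostics(x, y)
for row in zip(standardized, studentized, deleted):
    print([round(v, 2) for v in row])
```

For the outlying case the studentized deleted residual is much larger than the other two, because the outlier no longer inflates the standard error it is compared against.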

  44. Comparing the three types of residuals

  45. Leverage • Reflects cases with extreme values on one or more of the independent variables • May or may not exert influence on the equation

  46. How does one identify cases with high leverage? • SPSS produces values of leverage (h) which can range between 0 and 1 • One "rule of thumb" suggests h > 2(k + 1)/N as a high leverage value • Another rule of thumb is that h ≤ .2 indicates trivial leverage whereas values > suggests substantial leverage requiring further examination • Other researchers recommend looking at relative differences

  47. Leverage Example (based on 3 IVs, N = 171)
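The 2(k + 1)/N rule of thumb is simple arithmetic, worked here for the 3 IVs and N = 171 of the leverage example. The second half of the sketch computes leverage values for a hypothetical single-predictor data set using the bivariate formula h = 1/N + (X − mean)²/SSx; these are uncentered hat values, which may differ from the leverage values a given package reports.

```python
def leverage_cutoff(k, n):
    """Rule-of-thumb cutoff 2(k + 1)/N for k predictors and N cases."""
    return 2 * (k + 1) / n

cutoff_example = leverage_cutoff(k=3, n=171)  # 8 / 171
print(round(cutoff_example, 4))

def simple_leverages(x):
    """Hat values for a regression with an intercept and one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    return [1 / n + (v - mean_x) ** 2 / sxx for v in x]

# Hypothetical predictor scores; the last case is extreme on X.
x = [1, 2, 3, 4, 5, 20]
h = simple_leverages(x)
flagged = [i for i, hi in enumerate(h) if hi > leverage_cutoff(k=1, n=len(x))]
print([round(hi, 3) for hi in h], flagged)
```

Hat values computed this way average (k + 1)/N, so the cutoff amounts to flagging cases with more than twice the average leverage.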

  48. Mahalanobis distance (D²) • A method for detecting multivariate outliers, i.e., cases with unexpected combinations of independent variables • Represents the distance of a case from the centroid of the remaining cases, where the centroid represents the intersection of the means of all the variables • One rule of thumb suggests high values exceed the critical χ² value with degrees of freedom equal to the number of IVs in the model
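For two independent variables the distance can be written out without matrix software. The sketch below makes several assumptions that should be checked against your own software: the centroid and covariance matrix are computed from all cases (not only the remaining cases), the data are hypothetical, and α = .05 is used only so that this very small sample can exceed the cutoff. With 2 degrees of freedom the χ² critical value has the closed form −2 ln(α).

```python
from math import log

def mahalanobis_d2(x1, x2):
    """Squared Mahalanobis distance of each case from the centroid of two IVs."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
    s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
    det = s11 * s22 - s12 ** 2  # determinant of the 2 x 2 covariance matrix
    d2 = []
    for a, b in zip(x1, x2):
        u, v = a - m1, b - m2
        d2.append((s22 * u * u - 2 * s12 * u * v + s11 * v * v) / det)
    return d2

def chi2_critical_df2(alpha):
    """Critical chi-square value for df = 2 only (closed form)."""
    return -2 * log(alpha)

# Hypothetical scores; the last case pairs a low X1 with a high X2.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 2]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1, 9.0, 8.5]

d2 = mahalanobis_d2(x1, x2)
critical = chi2_critical_df2(alpha=0.05)  # alpha is an assumption for this demo
flagged = [i for i, d in enumerate(d2) if d > critical]
print([round(d, 2) for d in d2], round(critical, 2), flagged)
```

Neither score of the flagged case is extreme by itself; it is the combination that is unusual, which is what distinguishes a multivariate outlier from a univariate one.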

  49. Mahalanobis D² example • Note: model based on 6 IVs

  50. It should be noted that just because a case is an outlier and/or exhibits high leverage does not necessarily mean it is influential
