
Statistics Workshop: Multiple Regression, Spring 2009, Bert Kritzer


Presentation Transcript


  1. Statistics Workshop: Multiple Regression, Spring 2009, Bert Kritzer

  2. Regression • Simple regression vs. multiple regression • Linear vs. nonlinear relationships • Least squares estimation vs. maximum likelihood estimation • b’s vs. β’s • Standardized vs. unstandardized estimates • Regression models and causation

  3. Bivariate Regression

  4. Statistical Model: Bivariate Regression • Y’s are statistically independent • Linearity: “conditional expectations” fall on a straight line • Conditional distributions all have the same variance (“homoscedasticity”)
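In symbols (a reconstruction of the standard bivariate model these assumptions describe, not copied from the slide):

    Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad
    E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i, \qquad
    \mathrm{Var}(Y_i \mid X_i) = \sigma^2

with the Y_i statistically independent across observations.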

  5. “Normal” Regression Model

  6. Tort Reform by Citizen Liberalism. Fitted line: Y = 12.89 – 0.10X. For every ten-point increase in citizen liberalism, the predicted number of tort reforms adopted falls by one.
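A quick arithmetic check of that interpretation using the slide’s own fitted line (the two liberalism scores are illustrative values): at X = 40, Y = 12.89 – 0.10(40) = 8.89; at X = 50, Y = 12.89 – 0.10(50) = 7.89. A ten-point increase in citizen liberalism lowers the predicted count by 0.10 × 10 = 1 reform.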

  7. Tort Reform by Citizen Liberalism: total sum of squared deviations (SSD_Y), residual sum of squared deviations (SSD_e), and standard error of estimate (s_e) [figure/table not reproduced]

  8. Three Dimensions

  9. Regression Models • 1 predictor (bivariate) • 2 predictors • 3 predictors (see the equations sketched below)
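A sketch of the equations these bullets presumably stand for, in standard notation:

    1 predictor:  Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i
    2 predictors: Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i
    3 predictors: Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i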

  10. Statistical Model: Multiple Regression. Random variables Y_1, Y_2, …, Y_n are statistically independent, with conditional mean E(Y_i) = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} and conditional variance \sigma^2; therefore, with normal errors, Y_i \sim N(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki},\ \sigma^2).

  11. Matrix Presentation
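The matrix form of the model and the least squares estimator, presumably the content of this slide (standard results, not copied from it):

    y = X\beta + \varepsilon, \qquad
    \hat{\beta} = (X'X)^{-1}X'y, \qquad
    \mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}

where y is the n × 1 vector of observations on the dependent variable, X is the n × (k + 1) matrix of predictors with a leading column of ones, and ε is the error vector.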

  12. Tort Reform by Citizen Liberalism & Elite Liberalism

  13. Multiple Regression: Tort Reform by Citizen and Elite Liberalism

  14. Correlation Between Predictors: Elite Liberalism by Citizen Liberalism

  15. Attitude toward the Police by Stops and Age

  16. Attitude toward the Police by Stops and Age, Using Excel

  17. Multiple Regression Coefficients as Random Variables I (ONE SAMPLE OF 100)

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =    2.73
       Model |  473.549401     2    236.7747           Prob > F      =  0.0701
    Residual |   8405.0406    97  86.6499031           R-squared     =  0.0533
-------------+------------------------------           Adj R-squared =  0.0338
       Total |     8878.59    99  89.6827273           Root MSE      =  9.3086

------------------------------------------------------------------------------
      police |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       stops |   .4482355   .9257543     0.48   0.629    -1.38913     2.285601
         age |    .131513    .056274     2.34   0.021     .0198247    .2432013
       _cons |   33.54175   2.883879    11.63   0.000     27.81805    39.26545
------------------------------------------------------------------------------

  18. Multiple Regression Coefficients as Random Variables II 1,000 samples of 100 observations
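A minimal simulation sketch of the same idea in Python/numpy (not the workshop's own code; the "true" coefficients are loosely based on the one-sample output above, and the predictor distributions are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 1000
true_beta = np.array([33.5, 0.45, 0.13])      # assumed intercept, stops, and age effects

slopes = np.empty((reps, 2))
for r in range(reps):
    stops = rng.poisson(1.0, n)               # hypothetical number of police stops
    age = rng.uniform(18, 80, n)              # hypothetical ages
    X = np.column_stack([np.ones(n), stops, age])
    y = X @ true_beta + rng.normal(0, 9.3, n) # error SD roughly the Root MSE above
    b = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS fit for this sample
    slopes[r] = b[1:]                         # keep the stops and age coefficients

# means and spreads of the 1,000 estimated coefficients: their sampling distributions
print(slopes.mean(axis=0), slopes.std(axis=0))

Each sample yields a different pair of estimates; the collection of 1,000 values is the sampling distribution of the coefficients that this slide is illustrating.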

  19. Interpreting Results • R² is the proportion of variation explained (8.8%). • adjusted R² • F provides a test of whether the results could occur by chance if all β’s were 0. • Coefficient for stops means, holding age constant, one additional stop decreases support by about 2 points (1.922). • Coefficient for age means, holding the number of stops constant, support goes up about a tenth of a point (0.11) for each additional year; equivalently, an increase of ten years in age increases support by about one point. • t’s for each coefficient test whether that individual coefficient differs from 0 (both are statistically significant). • Each coefficient has both a point and an interval estimate

  20. R²
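For reference, the usual definitions behind this slide, stated with the quantities reported in the slide 17 output:

    R^2 = \frac{SS_{model}}{SS_{total}} = 1 - \frac{SS_{residual}}{SS_{total}}, \qquad
    \text{adjusted } R^2 = 1 - \frac{SS_{residual}/(n - k - 1)}{SS_{total}/(n - 1)}

e.g., 473.55 / 8878.59 ≈ 0.053 and 1 − (8405.04/97)/(8878.59/99) ≈ 0.034, matching the R-squared and Adj R-squared reported there.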

  21. Model Specification • “Misspecification” • Impact of omitting variables • correlated or uncorrelated with other predictors • Impact of including irrelevant variables • Specifying the “form” of the relationship • Linear vs. nonlinear

  22. Omission of Significant Variables • Biases (in the statistical sense) the estimates of the other variables • Can make other variables look more important • Can make other variables look less important or even not significant • Cope example (p. 93)
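The standard omitted-variable-bias result behind these bullets, for the two-predictor case (a sketch, not taken from the slides): if the true model is Y = β0 + β1X1 + β2X2 + ε but X2 is omitted, then

    E(\hat{b}_1) = \beta_1 + \beta_2\,\delta_{21}

where δ21 is the slope from regressing the omitted X2 on the included X1. The bias disappears only when β2 = 0 or the predictors are uncorrelated (δ21 = 0), and its sign determines whether the included variable looks more or less important than it really is.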

  23. Testing Subsets of Variables: full model vs. restricted model (see the F test sketched below)
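The usual incremental F test for a subset of q coefficients (a sketch of the standard formula, not copied from the slide):

    F = \frac{(SS_{res}^{restricted} - SS_{res}^{full})/q}{SS_{res}^{full}/(n - k - 1)}
      = \frac{(R^2_{full} - R^2_{restricted})/q}{(1 - R^2_{full})/(n - k - 1)}

with q and n − k − 1 degrees of freedom, where k is the number of predictors in the full model.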

  24. Residuals in Multiple Regression • Normality: Small vs. large samples • Residual plots • Against predicted value and/or individual predictors • Heteroscedasticity • Nonlinearity • Outliers • Complex outliers and influential observations • “Regression diagnostics” • Large samples • Time series data and “autocorrelation”

  25. Prototypical Residual Plots

  26. More Residual Plots

  27. With Outliers

  28. Without Outliers

  29. Homoscedasticity and Heteroscedasticity

  30. Robust Standard Errors • Heteroscedasticity • Systematic nonindependence • clusters • Alternative solutions • “Purging” heteroscedasticity by transforming the data • Modeling the heteroscedasticity with ML
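A minimal numerical sketch of the heteroscedasticity-robust ("sandwich") standard errors mentioned here, written in Python/numpy rather than the workshop's Stata; the function name and the made-up demo data are illustrative assumptions:

import numpy as np

def ols_with_robust_se(X, y):
    """OLS estimates with conventional and HC0 heteroscedasticity-robust SEs.
    X must already include a column of ones for the intercept."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                      # OLS coefficients
    e = y - X @ b                              # residuals
    s2 = (e @ e) / (n - k)                     # conventional error variance
    se_classic = np.sqrt(np.diag(s2 * XtX_inv))
    meat = X.T @ (X * (e ** 2)[:, None])       # X' diag(e_i^2) X
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return b, se_classic, se_robust

# illustrative use with made-up heteroscedastic data
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([np.ones(200), x1])
y = 1.0 + 2.0 * x1 + rng.normal(size=200) * (1 + np.abs(x1))  # error spread grows with |x1|
print(ols_with_robust_se(X, y))

The robust standard errors leave the coefficient estimates unchanged; only the estimated sampling variability is adjusted for heteroscedasticity.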

  31. ML for Heteroscedastic, Normal Regression
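One common way the heteroscedastic normal model is set up for maximum likelihood (a sketch; the particular variance function used on the slide is not reproduced): with Var(Y_i) = σ_i² = exp(z_i'γ), the log likelihood to be maximized over β and γ is

    \ln L(\beta, \gamma) = -\frac{1}{2} \sum_{i=1}^{n}
        \left[ \ln(2\pi\sigma_i^2) + \frac{(y_i - x_i'\beta)^2}{\sigma_i^2} \right]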

  32. Standardized Regression • “beta” vs. “b” • Standardize all variables to have mean of 0 and standard deviation of 1 • Interpret results in terms of how many standard deviations the dependent variable changes for a 1 standard deviation change in the predictor • Danger of comparing across groups when means and standard deviations vary across groups
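The standardized ("beta") and unstandardized (b) coefficients are related by the usual rescaling:

    \beta_j^{std} = b_j \cdot \frac{s_{X_j}}{s_Y}

so a standardized coefficient can be recovered from the unstandardized one and the sample standard deviations of the predictor and the dependent variable.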

  33. Practical Problems in Multiple Regression • Choosing from among a large number of predictors • Stepwise regression • Sample size constraints on the number of predictors • Intercorrelations among predictors and the idea of “holding constant” • Multicollinearity • Outliers • Errors in Variables • Nonlinearity

  34. Errors in Variables • Errors in Y (dependent variable) • depress fit (R²) but do not bias the coefficient estimates • the depressed fit can reduce the power of significance tests because standard errors are biased upward • Errors in X’s (predictor variables) • depress (attenuate) the regression coefficients toward zero • depress the power of significance tests
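The standard attenuation result for the bivariate case with classical measurement error in the predictor (not taken from the slide): if X is observed as X* = X + u, with u uncorrelated with X and with the regression error, then

    \mathrm{plim}\ \hat{b} = \beta \cdot \frac{\sigma_X^2}{\sigma_X^2 + \sigma_u^2}

so the estimated slope is shrunk toward zero by the reliability ratio of the measured predictor.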

  35. Nonlinear Relationships

  36. Transforming X

  37. Transforming Y

  38. Logarithms: Definition. x is called the logarithm of N to the base b, written x = log_b N, if and only if bˣ = N, where N and b are both positive numbers and b ≠ 1.
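An illustrative pair of examples of the definition (the slide's own worked examples are not recoverable from the transcript): log₂ 32 = 5 because 2⁵ = 32, and log₁₀ 100 = 2 because 10² = 100.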

  39. Standard Logarithms: Base 10 and Base e. “Common” logarithm (base 10): log₁₀ 1000 = 3. “Natural” logarithm (base e): logₑ 1000 = ln 1000 = 6.908.

  40. Rules of Logarithms
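The standard rules presumably listed on this slide, valid for any legitimate base:

    \log(MN) = \log M + \log N, \qquad
    \log(M/N) = \log M - \log N, \qquad
    \log(M^p) = p \log M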

  41. Example

  42. Log Likelihood for the Binomial (“is proportional to”)
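A sketch of the standard binomial (log) likelihood presumably shown here: for y successes in n independent trials with success probability p,

    L(p) \propto p^{y}(1 - p)^{n - y}, \qquad
    \ln L(p) = y \ln p + (n - y)\ln(1 - p) + \text{constant}

where the dropped constant is the log of the binomial coefficient, which does not involve p (hence “is proportional to”).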

  43. Causal Inference • Spurious relationships • Time ordering • Elimination of alternatives • Mutual causation • identification

  44. Regression in Wage Discrimination Cases: Bazemore v. Friday, 478 U.S. 385 (1986)

  45. University of Wisconsin 1997 Gender Equity Pay Study

  46. University of Wisconsin 1997 Gender Equity Pay Study, College of Letters & Science

Regression Model: MODEL1
Dependent Variable: LNSAL   ln(Salary)

                        Analysis of Variance
                         Sum of        Mean
  Source       DF       Squares      Square      F Value    Prob>F
  Model        54      45.42831     0.84127       43.187    0.0001
  Error       781      15.21362     0.01948
  C Total     835      60.64193

  Root MSE    0.13957     R-square    0.7491
  Dep Mean   11.03245     Adj R-sq    0.7318
  C.V.        1.26508

                        Parameter Estimates
                Parameter      Standard    T for H0:
  Variable DF    Estimate         Error    Parameter=0   Prob > |T|   Label
  INTERCEP  1   11.163387    0.07575212        147.367       0.0001   Intercept
  GENDER    1   -0.021302    0.01263912         -1.685       0.0923   Male
  WHITE     1   -0.010214    0.01651535         -0.618       0.5364   White/Unknown
  PROF      1    0.175458    0.01853981          9.464       0.0001   Full Professor
  ASST      1   -0.193622    0.02286049         -8.470       0.0001   Assistant Prof
  ANYDOC    1    0.017376    0.03510405          0.495       0.6208   Any Terminal Degree
  COH2      1   -0.085045    0.02458236         -3.460       0.0006   Hired 1980-88
  COH3      1   -0.153097    0.03408703         -4.491       0.0001   Hired 1989-93
  COH4      1   -0.168758    0.04543305         -3.714       0.0002   Hired 1994-98
  DIFYRS    1    0.003513    0.00156769          2.241       0.0253   YRS SINCE DEG BEFORE UW
  INASTYRS  1   -0.018596    0.00380222         -4.891       0.0001   YRS AS INSTR/ASST PROF
  ASSOYRS   1   -0.020570    0.00244673         -8.407       0.0001   YRS AS UW ASSOC
  FULLYRS   1    0.003528    0.00146692          2.405       0.0164   YRS AS UW FULL PROF
  LNRATIO   1    0.481871    0.21528902          2.238       0.0255   ln(mkt ratio)

  PLUS 41 DEPARTMENT “FIXED EFFECTS”

  47. Equity Study: Department Fixed Effects

                Parameter      Standard    T for H0:
  Variable DF    Estimate         Error    Parameter=0   Prob > |T|
  AFRLANG   1   -0.037307    0.07287210         -0.512       0.6088
  ANTHRO    1   -0.042490    0.05677832         -0.748       0.4545
  AFRAMER   1    0.067777    0.06028682          1.124       0.2613
  ARTHIST   1   -0.009346    0.06446204         -0.145       0.8848
  ASTRON    1    0.025805    0.05767292          0.447       0.6547
  BOTANY    1   -0.023055    0.06263077         -0.368       0.7129
  COMMUN    1   -0.043242    0.06234593         -0.694       0.4882
  CHEM      1    0.007705    0.04325153          0.178       0.8587
  CLASSICS  1   -0.013697    0.07344295         -0.186       0.8521
  COMMDIS   1    0.035164    0.05853836          0.601       0.5482
  COMPLIT   1   -0.027078    0.07883924         -0.343       0.7313
  COMPUT    1    0.198201    0.04934743          4.016       0.0001
  EASIALG   1   -0.053194    0.06957342         -0.765       0.4448
  ECON      1    0.169280    0.05319197          3.182       0.0015
  ENGLISH   1   -0.053755    0.05584121         -0.963       0.3360
  FRENITAL  1   -0.073378    0.05724591         -1.282       0.2003
  GEOG      1   -0.014052    0.05781558         -0.243       0.8080
  GEOLOGY   1    0.007804    0.05502894          0.142       0.8873
  GERMAN    1   -0.079744    0.06744970         -1.182       0.2375
  HEBREW    1    0.016752    0.09408135          0.178       0.8587
  HISTORY   1   -0.031301    0.05059288         -0.619       0.5363
  HISTSC    1    0.047905    0.07102221          0.675       0.5002
  JOURNAL   1   -0.045840    0.05939580         -0.772       0.4405
  LIBRYSC   1   -0.079658    0.06446705         -1.236       0.2170
  LINGUIS   1   -0.105136    0.07404040         -1.420       0.1560
  MATH      1   -0.034484    0.04433476         -0.778       0.4369
  METEOR    1   -0.020649    0.05059822         -0.408       0.6833
  MUSIC     1   -0.084759    0.06710503         -1.263       0.2069
  PHILOS    1   -0.060066    0.05534808         -1.085       0.2782
  PHYSICS   1    0.035945    0.04208888          0.854       0.3934
  POLISC    1    0.001526    0.04407509          0.035       0.9724
  PSYCH     1    0.043498    0.04718937          0.922       0.3569
  SCAND     1   -0.068544    0.09877777         -0.694       0.4879
  SLAVIC    1    0.081673    0.06944784          1.176       0.2399
  SOCWORK   1    0.038894    0.05518913          0.705       0.4812
  SOCIOL    1    0.034492    0.04455797          0.774       0.4391
  SASIAN    1   -0.146444    0.07595848         -1.928       0.0542
  SPANPORT  1   -0.102875    0.06176804         -1.666       0.0962
  THEATRE   1   -0.076231    0.06933522         -1.099       0.2719
  URBPLAN   1   -0.013524    0.05830072         -0.232       0.8166
  ZOOL      1   -0.055001    0.05418789         -1.015       0.3104

  48. Extending Regression • Qualitative predictors • Nonlinear relationships • Time series data • Panel models • “Limited” dependent variables • Nominal (including dichotomous) dependent variables • Count variables • “Selection” models • Tobit • Switching • Mutual causation models
