Statistics Workshop: Multiple Regression, Spring 2009, Bert Kritzer
Regression
• Simple regression vs. multiple regression
• Linear vs. nonlinear relationships
• Least squares estimation vs. maximum likelihood estimation
• b’s vs. β’s
• Standardized vs. unstandardized estimates
• Regression models and causation
Statistical Model: Bivariate Regression
• Linearity: “conditional expectations” fall on a straight line: E(Yi | Xi) = α + βXi
• Conditional distributions all have the same variance (“homoscedasticity”): Var(Yi | Xi) = σ²
• The Yi are statistically independent
(A simulated illustration of this model follows.)
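A minimal sketch of this model in Python (assuming numpy and statsmodels are available): simulate data that satisfy the three assumptions and estimate the line by least squares. The parameter values 12.0 and −0.1 are made up for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=50)                   # predictor
y = 12.0 - 0.1 * x + rng.normal(0, 2, size=50)     # linear conditional mean, constant variance, independent errors

fit = sm.OLS(y, sm.add_constant(x)).fit()          # least squares estimates of intercept and slope
print(fit.params)                                  # close to 12.0 and -0.1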
Tort Reform by Citizen Liberalism
Y = 12.89 – 0.10X
For every ten-point increase in citizen liberalism, one fewer tort reform was adopted.
Tort Reform by Citizen Liberalism: fit statistics (SSDY, SSDe, se)
Regression Models
1 predictor (bivariate): Y = α + β1X1 + ε
2 predictors: Y = α + β1X1 + β2X2 + ε
3 predictors: Y = α + β1X1 + β2X2 + β3X3 + ε
Statistical Model: Multiple Regression
Random variables Y1, Y2, …, Yn are statistically independent, with
conditional mean: E(Yi | X1i, …, Xki) = α + β1X1i + β2X2i + … + βkXki
and conditional variance: σ²
Therefore: Yi = α + β1X1i + β2X2i + … + βkXki + εi, where the εi are independent with mean 0 and variance σ²
(A matrix-form sketch of the least squares estimates follows.)
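A minimal sketch of least squares estimation for this model in matrix form, b = (X'X)^(-1) X'y, computed directly with numpy; all data and coefficient values are simulated assumptions.

import numpy as np

rng = np.random.default_rng(9)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # intercept column plus k predictors
beta = np.array([1.0, 2.0, -0.5, 0.3])                       # assumed population coefficients
y = X @ beta + rng.normal(0, 1, size=n)                      # independent errors, constant variance

b = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimates, (X'X)^-1 X'y
e = y - X @ b                           # residuals
s2 = e @ e / (n - k - 1)                # estimate of the conditional variance sigma squared
print(b, s2)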
Multiple Regression: Tort Reform by Citizen and Elite Liberalism
Correlation Between Predictors: Elite Liberalism by Citizen Liberalism
Multiple Regression Coefficients as Random Variables I
ONE SAMPLE OF 100

      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  2,    97) =    2.73
       Model |  473.549401     2    236.7747            Prob > F      =  0.0701
    Residual |   8405.0406    97  86.6499031            R-squared     =  0.0533
-------------+------------------------------           Adj R-squared =  0.0338
       Total |     8878.59    99  89.6827273            Root MSE      =  9.3086

------------------------------------------------------------------------------
      police |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       stops |   .4482355   .9257543     0.48   0.629     -1.38913    2.285601
         age |    .131513    .056274     2.34   0.021     .0198247    .2432013
       _cons |   33.54175   2.883879    11.63   0.000     27.81805    39.26545
------------------------------------------------------------------------------
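A minimal sketch of how a regression like the one above could be fit in Python with statsmodels. The variable names follow the Stata output (police, stops, age), but the data are simulated stand-ins, not the actual sample of 100.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2009)
n = 100
stops = rng.poisson(2, size=n)                      # hypothetical number of police stops
age = rng.integers(18, 80, size=n)                  # hypothetical respondent age
police = 33.5 + 0.45 * stops + 0.13 * age + rng.normal(0, 9.3, size=n)   # support for police

X = sm.add_constant(np.column_stack([stops, age]))  # design matrix with intercept
results = sm.OLS(police, X).fit()
print(results.summary())                            # coefficients, std. errors, t's, R-squared, F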
Multiple Regression Coefficients as Random Variables II 1,000 samples of 100 observations
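A minimal sketch of the idea behind this slide: draw 1,000 samples of 100 observations from one assumed population model, refit the regression in each sample, and examine the spread of the estimated coefficients. The population values used here are illustrative assumptions.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
true_b = np.array([33.5, 0.45, 0.13])    # assumed population intercept and slopes
estimates = []
for _ in range(1000):                     # 1,000 samples of 100 observations
    stops = rng.poisson(2, size=100)
    age = rng.uniform(18, 80, size=100)
    X = sm.add_constant(np.column_stack([stops, age]))
    y = X @ true_b + rng.normal(0, 9.3, size=100)
    estimates.append(sm.OLS(y, X).fit().params)

estimates = np.array(estimates)
print(estimates.mean(axis=0))             # close to the population values (unbiasedness)
print(estimates.std(axis=0))              # sampling variability, i.e., the standard errors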
Interpreting Results
• R² is the proportion of variation explained (8.8%).
• Adjusted R²
• F provides a test of whether the results could occur by chance if all β’s were 0.
• Coefficient for stops means that, holding age constant, one additional stop decreases support by about 2 points (1.922).
• Coefficient for age means that, holding the number of stops constant, support goes up about a tenth of a point (0.11) for each additional year; equivalently, an increase of ten years in age increases support by about one point.
• t’s for each coefficient test whether that individual coefficient differs from 0 (both are statistically significant).
• Each coefficient has both a point and an interval estimate (see the sketch after this list).
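A minimal sketch of how the point estimate, t statistic, and 95% interval estimate fit together, using the age coefficient, standard error, and residual degrees of freedom from the Stata output above (assuming scipy is available).

import scipy.stats as st

b, se, df = 0.131513, 0.056274, 97        # age coefficient, standard error, residual df from the output
t = b / se                                 # t statistic for H0: beta = 0
p = 2 * st.t.sf(abs(t), df)                # two-sided p-value
crit = st.t.ppf(0.975, df)                 # critical value for a 95% confidence interval
print(round(t, 2), round(p, 3))            # about 2.34 and 0.021, as in the output
print(b - crit * se, b + crit * se)        # roughly 0.0198 to 0.2432, as in the output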
Model Specification
• “Misspecification”
• Impact of omitting variables
  • Correlated or uncorrelated with other predictors
• Impact of including irrelevant variables
• Specifying the “form” of the relationship
  • Linear vs. nonlinear
Omission of Significant Variables
• Biases (in the statistical sense) the estimates of other variables
  • Can make other variables look more important
  • Can make other variables look less important, or even not significant
• Cope example (p. 93)
(See the simulation sketch after this list.)
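A minimal sketch of omitted-variable bias with simulated data and assumed coefficients: when the omitted predictor is correlated with an included one, the included predictor’s coefficient absorbs part of the omitted effect.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                     # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)     # true model uses both predictors

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
short = sm.OLS(y, sm.add_constant(x1)).fit()           # x2 omitted

print(full.params[1])    # near the true 2.0
print(short.params[1])   # near 2.0 + 3.0*0.6 = 3.8: biased upward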
Residuals in Multiple Regression
• Normality: small vs. large samples
• Residual plots (see the sketch after this list)
  • Against predicted value and/or individual predictors
• Heteroscedasticity
• Nonlinearity
• Outliers
• Complex outliers and influential observations
• “Regression diagnostics”
• Large samples
• Time series data and “autocorrelation”
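A minimal sketch of a residual plot against predicted values, using simulated heteroscedastic data (assuming matplotlib is available); the fan shape is the pattern to look for.

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=200)
y = 2 + 0.5 * x + rng.normal(0, 1 + 0.3 * x, size=200)   # error variance grows with x

fit = sm.OLS(y, sm.add_constant(x)).fit()
plt.scatter(fit.fittedvalues, fit.resid, s=10)   # fan shape signals heteroscedasticity
plt.axhline(0, color="gray")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.show()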
Robust Standard Errors
• Heteroscedasticity
• Systematic nonindependence
  • Clusters
• Alternative solutions
  • “Purging” heteroscedasticity by transforming the data
  • Modeling the heteroscedasticity with ML
(See the sketch after this list.)
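A minimal sketch of heteroscedasticity-robust and cluster-robust standard errors in statsmodels, on simulated data with a made-up cluster identifier.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 400
cluster = np.repeat(np.arange(40), 10)             # 40 clusters of 10 observations
x = rng.normal(size=n)
u = rng.normal(size=40)[cluster]                   # shared cluster-level shock
y = 1 + 2 * x + u + rng.normal(0, np.abs(x) + 0.5, size=n)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                                  # classical standard errors
hc = sm.OLS(y, X).fit(cov_type="HC1")                                     # heteroscedasticity-robust
cl = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": cluster})   # cluster-robust
print(ols.bse, hc.bse, cl.bse)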
Standardized Regression
• “beta” vs. “b”
• Standardize all variables to have a mean of 0 and a standard deviation of 1
• Interpret results in terms of how many standard deviations the dependent variable changes for a 1 standard deviation change in a predictor
• Danger of comparing across groups when means and standard deviations vary across groups
(See the sketch after this list.)
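A minimal sketch of standardized (“beta”) coefficients on simulated data: z-score every variable before fitting, or equivalently rescale each unstandardized b by the ratio of standard deviations.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
x1 = rng.normal(0, 10, size=n)             # predictors on very different scales
x2 = rng.normal(0, 0.5, size=n)
y = 3 + 0.2 * x1 + 8.0 * x2 + rng.normal(size=n)

def zscore(a):
    return (a - a.mean()) / a.std()

raw = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
std = sm.OLS(zscore(y), sm.add_constant(np.column_stack([zscore(x1), zscore(x2)]))).fit()

print(raw.params[1:])                          # unstandardized b's, near 0.2 and 8.0
print(std.params[1:])                          # standardized betas, directly comparable
print(raw.params[1] * x1.std() / y.std())      # same as the first beta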
Practical Problems in Multiple Regression
• Choosing from among a large number of predictors
  • Stepwise regression
• Sample size constraints on the number of predictors
• Intercorrelations among predictors and the idea of “holding constant”
  • Multicollinearity
• Outliers
• Errors in variables
• Nonlinearity
Errors in Variables
• Errors in Y (dependent variable)
  • Depress fit (R²) but do not affect the coefficients
  • The depressed fit can reduce the power of significance tests because the standard errors are biased upward
• Errors in X’s (predictor variables)
  • Depress (attenuate) the regression coefficients
  • Depress the power of significance tests
(See the simulation sketch after this list.)
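A minimal sketch contrasting the two cases with simulated data and an assumed true slope of 2: error added to Y leaves the slope roughly unbiased (with a larger standard error), while error added to X attenuates it toward zero.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 10000
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)                 # true slope is 2

y_noisy = y + rng.normal(0, 2, size=n)             # measurement error in the dependent variable
x_noisy = x + rng.normal(0, 1, size=n)             # measurement error in the predictor

print(sm.OLS(y_noisy, sm.add_constant(x)).fit().params[1])   # still near 2 (but larger std. err.)
print(sm.OLS(y, sm.add_constant(x_noisy)).fit().params[1])   # near 2 * 1/(1+1) = 1: attenuated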
Logarithms: Definition
X is called the logarithm of N to the base b if b^X = N, where N and b are both positive numbers and b ≠ 1.
X = LogbN if and only if b^X = N
Examples: Log10 1000 = 3 because 10^3 = 1000; Log10 100 = 2 because 10^2 = 100; Log2 8 = 3 because 2^3 = 8
Standard Logarithms: Base 10 and Base e
“Common” logarithm, base 10: Log10 1000 = 3
“Natural” logarithm, base e: Loge 1000 = 6.908, i.e., Ln 1000 = 6.908
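A quick check of these values in Python’s standard library.

import math

print(math.log10(1000))        # 3.0
print(math.log(1000))          # 6.9077..., the natural (base-e) logarithm
print(math.log(1000, 10))      # approximately 3, the base-10 logarithm up to floating-point rounding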
Log Likelihood for the Binomial
L(π) ∝ π^y (1 − π)^(n − y), where “∝” means “is proportional to”
log L(π) = y log(π) + (n − y) log(1 − π), up to an additive constant
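A minimal sketch that evaluates this log likelihood over a grid and confirms it is maximized at the sample proportion y/n; the counts (37 successes in 100 trials) are made up for illustration.

import numpy as np

y, n = 37, 100                                     # hypothetical: 37 successes in 100 trials
p = np.linspace(0.001, 0.999, 999)
loglik = y * np.log(p) + (n - y) * np.log(1 - p)   # log likelihood up to an additive constant

print(p[np.argmax(loglik)])                        # about 0.37 = y/n, the maximum likelihood estimate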
Causal Inference
• Spurious relationships
• Time ordering
• Elimination of alternatives
• Mutual causation
  • Identification
Regression in Wage Discrimination Cases: Bazemore v. Friday, 478 U.S. 385 (1986)
University of Wisconsin 1997 Gender Equity Pay Study, College of Letters & Science

Regression Model: MODEL1
Dependent Variable: LNSAL ln(Salary)

Analysis of Variance
                       Sum of        Mean
  Source      DF      Squares      Square    F Value   Prob>F
  Model       54     45.42831     0.84127     43.187   0.0001
  Error      781     15.21362     0.01948
  C Total    835     60.64193

  Root MSE    0.13957    R-square   0.7491
  Dep Mean   11.03245    Adj R-sq   0.7318
  C.V.        1.26508

Parameter Estimates
                     Parameter     Standard   T for H0:
  Variable     DF     Estimate        Error   Parameter=0   Prob > |T|   Label
  INTERCEP      1    11.163387   0.07575212       147.367       0.0001   Intercept
  GENDER        1    -0.021302   0.01263912        -1.685       0.0923   Male
  WHITE         1    -0.010214   0.01651535        -0.618       0.5364   White/Unknown
  PROF          1     0.175458   0.01853981         9.464       0.0001   Full Professor
  ASST          1    -0.193622   0.02286049        -8.470       0.0001   Assistant Prof
  ANYDOC        1     0.017376   0.03510405         0.495       0.6208   Any Terminal Degree
  COH2          1    -0.085045   0.02458236        -3.460       0.0006   Hired 1980-88
  COH3          1    -0.153097   0.03408703        -4.491       0.0001   Hired 1989-93
  COH4          1    -0.168758   0.04543305        -3.714       0.0002   Hired 1994-98
  DIFYRS        1     0.003513   0.00156769         2.241       0.0253   YRS SINCE DEG BEFORE UW
  INASTYRS      1    -0.018596   0.00380222        -4.891       0.0001   YRS AS INSTR/ASST PROF
  ASSOYRS       1    -0.020570   0.00244673        -8.407       0.0001   YRS AS UW ASSOC
  FULLYRS       1     0.003528   0.00146692         2.405       0.0164   YRS AS UW FULL PROF
  LNRATIO       1     0.481871   0.21528902         2.238       0.0255   ln(mkt ratio)

PLUS 41 DEPARTMENT “FIXED EFFECTS”
Equity Study: Fixed Effects

DEPARTMENT FIXED EFFECTS
                     Parameter     Standard   T for H0:
  Variable     DF     Estimate        Error   Parameter=0   Prob > |T|
  AFRLANG       1    -0.037307   0.07287210        -0.512       0.6088
  ANTHRO        1    -0.042490   0.05677832        -0.748       0.4545
  AFRAMER       1     0.067777   0.06028682         1.124       0.2613
  ARTHIST       1    -0.009346   0.06446204        -0.145       0.8848
  ASTRON        1     0.025805   0.05767292         0.447       0.6547
  BOTANY        1    -0.023055   0.06263077        -0.368       0.7129
  COMMUN        1    -0.043242   0.06234593        -0.694       0.4882
  CHEM          1     0.007705   0.04325153         0.178       0.8587
  CLASSICS      1    -0.013697   0.07344295        -0.186       0.8521
  COMMDIS       1     0.035164   0.05853836         0.601       0.5482
  COMPLIT       1    -0.027078   0.07883924        -0.343       0.7313
  COMPUT        1     0.198201   0.04934743         4.016       0.0001
  EASIALG       1    -0.053194   0.06957342        -0.765       0.4448
  ECON          1     0.169280   0.05319197         3.182       0.0015
  ENGLISH       1    -0.053755   0.05584121        -0.963       0.3360
  FRENITAL      1    -0.073378   0.05724591        -1.282       0.2003
  GEOG          1    -0.014052   0.05781558        -0.243       0.8080
  GEOLOGY       1     0.007804   0.05502894         0.142       0.8873
  GERMAN        1    -0.079744   0.06744970        -1.182       0.2375
  HEBREW        1     0.016752   0.09408135         0.178       0.8587
  HISTORY       1    -0.031301   0.05059288        -0.619       0.5363
  HISTSC        1     0.047905   0.07102221         0.675       0.5002
  JOURNAL       1    -0.045840   0.05939580        -0.772       0.4405
  LIBRYSC       1    -0.079658   0.06446705        -1.236       0.2170
  LINGUIS       1    -0.105136   0.07404040        -1.420       0.1560
  MATH          1    -0.034484   0.04433476        -0.778       0.4369
  METEOR        1    -0.020649   0.05059822        -0.408       0.6833
  MUSIC         1    -0.084759   0.06710503        -1.263       0.2069
  PHILOS        1    -0.060066   0.05534808        -1.085       0.2782
  PHYSICS       1     0.035945   0.04208888         0.854       0.3934
  POLISC        1     0.001526   0.04407509         0.035       0.9724
  PSYCH         1     0.043498   0.04718937         0.922       0.3569
  SCAND         1    -0.068544   0.09877777        -0.694       0.4879
  SLAVIC        1     0.081673   0.06944784         1.176       0.2399
  SOCWORK       1     0.038894   0.05518913         0.705       0.4812
  SOCIOL        1     0.034492   0.04455797         0.774       0.4391
  SASIAN        1    -0.146444   0.07595848        -1.928       0.0542
  SPANPORT      1    -0.102875   0.06176804        -1.666       0.0962
  THEATRE       1    -0.076231   0.06933522        -1.099       0.2719
  URBPLAN       1    -0.013524   0.05830072        -0.232       0.8166
  ZOOL          1    -0.055001   0.05418789        -1.015       0.3104
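A minimal sketch of how a model of this general form (log salary on covariates plus department dummies as fixed effects) can be specified with the statsmodels formula API. The data frame, column names, and values here are hypothetical stand-ins, not the equity-study data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 800
df = pd.DataFrame({
    "salary": rng.lognormal(mean=11.0, sigma=0.25, size=n),
    "gender": rng.integers(0, 2, size=n),                         # hypothetical 0/1 indicator
    "fullyrs": rng.integers(0, 25, size=n),                       # hypothetical years in rank
    "dept": rng.choice(["ECON", "ENGLISH", "MATH", "PHYSICS"], size=n),
})

# ln(salary) on covariates plus department dummies ("fixed effects")
fit = smf.ols("np.log(salary) ~ gender + fullyrs + C(dept)", data=df).fit()
print(fit.params)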
Extending Regression
• Qualitative predictors
• Nonlinear relationships
• Time series data
• Panel models
• “Limited” dependent variables
  • Nominal (including dichotomous) dependent variables
  • Count variables
• “Selection” models
  • Tobit
  • Switching
• Mutual causation models