
Sociology 601 Class 21: November 10, 2009


yelena

Presentation Transcript


  1. Sociology 601 Class 21: November 10, 2009
     • Review
       • formulas for b and se(b)
       • Stata regression commands & output
     • Violations of Model Assumptions, and their effects (9.6)
     • Causality (10)

  2. Formulas for b, a, r, and se(b)
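
Only the slide title survives in this transcript; the formulas themselves were lost. As a sketch, the block below gives the standard least-squares expressions for a bivariate regression with prediction equation ŷ = a + bx. That these match the slide is an assumption. Here s is the root mean squared error, and s_x and s_y are the sample standard deviations of x and y.

```latex
\begin{align*}
b &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \\
a &= \bar{y} - b\,\bar{x} \\
r &= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
   = b\,\frac{s_x}{s_y} \\
se(b) &= \frac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}},
\qquad
s = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}}
\end{align*}
```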

  3. Stata Example of Inference about a Slope

         . summarize murder poverty

             Variable |       Obs        Mean    Std. Dev.       Min        Max
         -------------+--------------------------------------------------------
               murder |        51    8.727451    10.71758        1.6       78.5
              poverty |        51    14.25882    4.584242          8       26.4

         . regress murder poverty

               Source |       SS       df       MS              Number of obs =      51
         -------------+------------------------------           F(  1,    49) =   23.08
                Model |  1839.06931     1  1839.06931           Prob > F      =  0.0000
             Residual |  3904.25223    49  79.6786169           R-squared     =  0.3202
         -------------+------------------------------           Adj R-squared =  0.3063
                Total |  5743.32154    50  114.866431           Root MSE      =  8.9263

         ------------------------------------------------------------------------------
               murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
         -------------+----------------------------------------------------------------
              poverty |    1.32296   .2753711     4.80   0.000     .7695805    1.876339
                _cons |   -10.1364   4.120616    -2.46   0.017    -18.41708   -1.855707
         ------------------------------------------------------------------------------
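
As a way of reading the regression output, the summary statistics in the upper-right panel can be recomputed from the sums of squares in the ANOVA table. The following is a minimal Python sketch (not Stata). The numbers are copied from the output above, and the variable names are illustrative only.

```python
import math

# Values copied from the "regress murder poverty" ANOVA table
model_ss, model_df = 1839.06931, 1
resid_ss, resid_df = 3904.25223, 49
total_ss = model_ss + resid_ss
n = 51

# R-squared is the share of total variation explained by the model
r_squared = model_ss / total_ss
# Adjusted R-squared penalizes for the degrees of freedom used
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / resid_df

# Residual mean square, and its square root (reported as Root MSE)
resid_ms = resid_ss / resid_df
root_mse = math.sqrt(resid_ms)

# F is the ratio of the model mean square to the residual mean square
f_stat = (model_ss / model_df) / resid_ms

# With a single predictor, the slope's t statistic squared equals F
coef, std_err = 1.32296, 0.2753711
t_stat = coef / std_err

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"Root MSE      = {root_mse:.4f}")
print(f"F             = {f_stat:.2f}")
print(f"t, t squared  = {t_stat:.2f}, {t_stat ** 2:.2f}")
```

The printed values should agree with the Stata output up to rounding.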

  4. Stata Example of Inference about a Slope

         . correlate murder poverty
         (obs=51)

                      |   murder  poverty
         -------------+------------------
               murder |   1.0000
              poverty |   0.5659   1.0000

         . correlate murder poverty, covariance
         (obs=51)

                      |   murder  poverty
         -------------+------------------
               murder |  114.866
              poverty |  27.8024  21.0153

     sqrt(114.866) = 10.72 = sd(y); sqrt(21.0153) = 4.58 = sd(x)

  5. Alternative Formula for b: b = cov(x, y) / var(x) = 27.8024 / 21.0153 = 1.323
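
The slope, intercept, correlation, and standard error of the slope can all be recovered from the summary statistics printed by `summarize`, `correlate`, and `regress`. The following is a minimal Python sketch (not Stata) of that arithmetic. The inputs are copied from the Stata output in this transcript, and the variable names are illustrative only.

```python
import math

# Summary statistics copied from the Stata output (y = murder, x = poverty)
n = 51
mean_y, mean_x = 8.727451, 14.25882
var_y, var_x = 114.866, 21.0153
cov_xy = 27.8024
root_mse = 8.9263

# Slope: covariance of x and y divided by the variance of x
b = cov_xy / var_x

# Intercept: the fitted line passes through (mean_x, mean_y)
a = mean_y - b * mean_x

# Correlation: covariance divided by the product of the standard deviations
r = cov_xy / math.sqrt(var_x * var_y)

# Standard error of the slope: Root MSE / sqrt(sum of squared x deviations),
# where the sum of squared deviations is (n - 1) * var(x)
sum_sq_x = (n - 1) * var_x
se_b = root_mse / math.sqrt(sum_sq_x)

# t statistic for H0: slope = 0
t = b / se_b

print(f"b = {b:.4f}, a = {a:.4f}, r = {r:.4f}, se(b) = {se_b:.4f}, t = {t:.2f}")
```

The results should match the `regress` and `correlate` output up to rounding.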

  6. Stata Example of Inference about a Slope scatter murder poverty || lfit murder poverty

  7. Stata Example of Inference about a Slope

         . regress murder poverty if state!="DC"

               Source |       SS       df       MS              Number of obs =      50
         -------------+------------------------------           F(  1,    48) =   31.36
                Model |  307.342297     1  307.342297           Prob > F      =  0.0000
             Residual |  470.406476    48  9.80013492           R-squared     =  0.3952
         -------------+------------------------------           Adj R-squared =  0.3826
                Total |  777.748773    49  15.8724239           Root MSE      =  3.1305

         ------------------------------------------------------------------------------
               murder |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
         -------------+----------------------------------------------------------------
              poverty |   .5842405    .104327     5.60   0.000     .3744771    .7940039
                _cons |  -.8567153   1.527798    -0.56   0.578     -3.92856    2.215129
         ------------------------------------------------------------------------------

  8. Assumptions Needed to make Population Inferences for slopes.
     • The sample is selected randomly.
     • X and Y are interval scale variables.
     • The mean of Y is related to X by the linear equation E(Y) = α + βX.
     • The conditional standard deviation of Y is identical at each X value (no heteroscedasticity).
     • The conditional distribution of Y at each value of X is normal.
     • There is no error in the measurement of X.

  9. Common Ways to Violate These Assumptions
     • The sample is selected randomly.
       • Cluster sampling (e.g., census tracts / neighborhoods) makes observations within a cluster more similar to each other than to observations outside the cluster.
       • Autocorrelation (spatial and temporal)
       • Two or more siblings in the same family
       • Sample = population (e.g., states in the U.S.)
     • X and Y are interval scale variables.
       • Ordinal scale attitude measures
       • Nominal scale categories (e.g., race/ethnicity, religion)

  10. Common Ways to Violate These Assumptions (2)
      • The mean of Y is related to X by the linear equation E(Y) = α + βX.
        • U-shape: e.g., Kuznets inverted-U curve (inequality <- GDP/capita)
        • Thresholds:
        • Logarithmic (e.g., earnings <- education)
      • The conditional standard deviation of Y is identical at each X value (no heteroscedasticity).
        • earnings <- education
        • hours worked <- years
        • adult child occupational status <- parental occupational status

  11. Common Ways to Violate These Assumptions (3)
      • The conditional distribution of Y at each value of X is normal.
        • earnings (skewed) <- education
        • Y is binary
        • Y is a %
      • There is no error in the measurement of X.
        • almost everything
        • what is the effect of measurement error in x on b?
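
One way to explore the closing question on this slide is a small simulation. The sketch below assumes classical measurement error, meaning random noise added to x that is unrelated to both the true x and y. Under that assumption the estimated slope is pulled toward zero by roughly the factor var(x) / (var(x) + var(error)). All parameter values (true slope of 2, error variance of 1, the seed) are invented for illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

random.seed(601)
n = 20000
true_slope = 2.0

# True x has variance 1; y depends on the true x
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + true_slope * xi + random.gauss(0, 1) for xi in x_true]

# Observed x = true x plus independent noise with variance 1
x_observed = [xi + random.gauss(0, 1) for xi in x_true]

b_true = ols_slope(x_true, y)
b_observed = ols_slope(x_observed, y)

# Expected attenuation factor: var(x) / (var(x) + var(error)) = 1 / (1 + 1)
print(f"slope using true x:     {b_true:.3f}")
print(f"slope using observed x: {b_observed:.3f}")
print(f"ratio:                  {b_observed / b_true:.3f}")
```

With equal variances for x and the error, the slope on the mismeasured x should come out near half the true slope.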

  12. Things to watch out for: extrapolation.
      • Extrapolation beyond observed values of X is dangerous.
        • The pattern may be nonlinear.
        • Even if the pattern is linear, the standard errors of predictions grow (confidence intervals widen) as X moves away from its observed mean.
      • Be especially careful interpreting the Y-intercept: it may lie outside the observed data.
        • e.g., year zero
        • e.g., zero education in the U.S.
        • e.g., zero parity

  13. Things to watch out for: outliers
      • Influential observations and outliers may unduly influence the fit of the model.
      • The slope and standard error of the slope may be affected by influential observations.
      • This is an inherent weakness of least squares regression.
      • You may wish to evaluate two models: one with and one without the influential observations.

  14. Things to watch out for: truncated samples
      • Truncated samples cause the opposite problem from influential observations and outliers.
      • Truncation on the X axis reduces the correlation coefficient for the remaining data.
      • Truncation on the Y axis is a worse problem, because it violates the assumption of normally distributed errors.
      • Examples: top-coded income data; health as measured by number of days spent in a hospital in a year.
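
The claim that truncation on X lowers the correlation can be checked with a small simulation. The sketch below assumes a linear model with constant error variance; the slope of 1, the cutoff at x > 0, and the seed are invented for illustration. Under these assumptions the slope stays close to its full-sample value while r drops, because the restricted range of x explains less of the variation in y.

```python
import math
import random

def slope_and_r(x, y):
    """Return the least-squares slope and the Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

random.seed(21)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

# Truncate on X: keep only cases with x above 0
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 0]
x_trunc = [pair[0] for pair in kept]
y_trunc = [pair[1] for pair in kept]

b_full, r_full = slope_and_r(x, y)
b_trunc, r_trunc = slope_and_r(x_trunc, y_trunc)

print(f"full sample:      b = {b_full:.3f}, r = {r_full:.3f}")
print(f"truncated on x>0: b = {b_trunc:.3f}, r = {r_trunc:.3f}")
```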

  15. Causality
      • We never prove that x causes y
        • Research and theory make it increasingly likely
      • Criteria:
        • association
        • time order
        • no alternative explanations
          • is the relationship spurious?

  16. Alternative Explanations • Example: Neighborhood poverty -> Low Test Scores

  17. Alternative Explanations
      • Example: Neighborhood poverty -> Low Test Scores
      • Possible solutions:
        • multivariate models
          • e.g., control for parents’ education, income
          • controls for other measurable differences
        • fixed effects models
          • e.g., changes in poverty -> changes in test scores
          • controls for constant, unmeasured differences
        • instrumental variables
          • find an instrument that affects x1 but affects y only through x1
        • experiments
          • e.g., Moving to Opportunity
          • randomize increases in $

  18. Alternative Explanations • Example: Fertility -> Lower Mothers’ LFP • Possible solutions:

  19. Alternative Explanations
      • Example: Fertility -> Lower Mothers’ LFP
      • Possible solutions:
        • multivariate models
          • e.g., control for gender attitudes
          • controls for other measurable differences
        • fixed effects models
          • e.g., changes in # children -> dropping out
          • controls for constant, unmeasured differences
        • instrumental variables
          • find an instrument that affects x1 but affects y only through x1
          • e.g., mothers of two same-sex children
        • experiments
          • not feasible (or ethical)

  20. Types of 3-variable Causal Models
      • Spurious
        • x2 causes both x1 and y
        • e.g., religion causes fertility and women’s LFP
      • Intervening
        • x1 causes x2 which causes y
        • e.g., fertility raises time spent on children which lowers time in the labor force
      • What is the statistical difference between these?
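
One way to approach the question on this slide is to simulate both models and compare the regression results. In the sketch below, the data-generating equations, coefficients, and seed are invented for illustration. In both cases the coefficient on x1 shrinks toward zero once x2 is controlled, so the regression output alone does not separate a spurious relationship from an intervening one. The distinction rests on theory and time order.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)

def bivariate_slope(x1, y):
    """Slope of y on x1 alone."""
    return cov(x1, y) / cov(x1, x1)

def partial_slope(x1, x2, y):
    """Coefficient on x1 in the regression of y on both x1 and x2."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    return (s1y * s22 - s12 * s2y) / (s11 * s22 - s12 ** 2)

random.seed(10)
n = 20000

# Spurious model: x2 causes both x1 and y; x1 has no effect on y
x2_s = [random.gauss(0, 1) for _ in range(n)]
x1_s = [v + random.gauss(0, 1) for v in x2_s]
y_s = [v + random.gauss(0, 1) for v in x2_s]

# Intervening model: x1 causes x2, which causes y
x1_i = [random.gauss(0, 1) for _ in range(n)]
x2_i = [v + random.gauss(0, 1) for v in x1_i]
y_i = [v + random.gauss(0, 1) for v in x2_i]

spurious_bivariate = bivariate_slope(x1_s, y_s)
spurious_partial = partial_slope(x1_s, x2_s, y_s)
intervening_bivariate = bivariate_slope(x1_i, y_i)
intervening_partial = partial_slope(x1_i, x2_i, y_i)

print(f"spurious:    b(x1) = {spurious_bivariate:.3f}, "
      f"controlling x2 = {spurious_partial:.3f}")
print(f"intervening: b(x1) = {intervening_bivariate:.3f}, "
      f"controlling x2 = {intervening_partial:.3f}")
```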

  21. Another type of 3-variable relationship: Statistical Interaction Effects
      • Example: Fertility -> Lower Mothers’ LFP
      • The relationship between x1 and y depends on the value of another variable, x2
      • e.g., marital status -> earnings depends on gender
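
One common way to write a statistical interaction, offered as a sketch and not necessarily the notation used in class, is to add a product term to the regression equation. The coefficient symbols are assumptions for illustration. The second line shows that the effect of x1 on y is no longer a single number: it changes with x2.

```latex
\begin{align*}
E(y) &= \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 \\
\frac{\partial E(y)}{\partial x_1} &= \beta_1 + \beta_3 x_2
\end{align*}
```

If β3 = 0 there is no interaction, and the slope for x1 is the same at every value of x2.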
