
Regression Analysis




Presentation Transcript


  1. Regression Analysis: Modeling Theory and Practice; Hypothesis Testing Issues

  2. y=0+1x1+2x2+3x3+...+ kxk+with n observations on y and k x’s • Model is linear in parameters •  is normally distributed with mean zero and constant variance, ~N(0,2) • Disturbances are not correlated, E(ij)=0, ij • No exact linear relation exists between any of the x’s

  3. Model Estimation • True Model: y = β0 + β1x1 + β2x2 + β3x3 + ... + βkxk + ε • Fitted Model: y = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk + e • We want the b’s to be unbiased, E(bi) = βi • We want the b’s to be as close to the β’s as possible • Choose the b’s such that the sum of the squared deviations between the observations and the estimated line (plane) is at a minimum
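
As a sketch of this estimation step, the b’s that minimize the sum of squared deviations can be obtained with a least-squares solver; the data here are simulated assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # constant column for b0
beta = np.array([2.0, 1.5, -0.8, 0.5])
y = X @ beta + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes sum((y - Xb)^2)
e = y - X @ b                              # residuals
print(b)   # unbiased under the assumptions above: E(b_i) = beta_i
```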

  4. Decomposition of Variation • Actual: y = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk + e • Simulated: ŷ = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk • So, Residual, e = Actual − Simulated = y − ŷ • Compute, plot and analyze patterns • Variation in y values about their mean, E(y) • Sum of squares total, SST • What we are trying to explain • Variation in ŷ values about their mean (also E(y)) • Sum of squares regression, SSR • What we’ve explained with this regression model

  5. Decomposition of Variation • Variation in residuals about their mean (= 0) • Sum of squares residual or error, SSE • What we haven’t explained • SST = SSR + SSE • Coefficient of multiple determination, R² • R² = SSR/SST = 1 − SSE/SST • Coefficient of multiple correlation, R • The square root of R²
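
A short sketch verifying this decomposition numerically, under the same simulated setup as above (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.5, -0.8, 0.5]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b                            # fitted (simulated) values
SST = np.sum((y - y.mean()) ** 2)        # total variation about the mean of y
SSR = np.sum((y_hat - y.mean()) ** 2)    # variation explained by the regression
SSE = np.sum((y - y_hat) ** 2)           # residual (unexplained) variation
assert np.isclose(SST, SSR + SSE)        # SST = SSR + SSE
R2 = SSR / SST                           # = 1 - SSE/SST
R = np.sqrt(R2)                          # coefficient of multiple correlation
```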

  6. Model Variation • R² increases as we add independent variables, no matter what their statistical contribution • Adjusted R² = R² − (k/(n−k−1))(1−R²), where k is the number of independent variables excluding the constant term • Variance and the Standard Error of the Estimate • Variance: s² = SSE/(n−k−1) • Standard Error: s = the square root of s²
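
Applying the slide’s formulas in code, again with the simulated data from the sketches above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.5, -0.8, 0.5]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
SSE = np.sum((y - X @ b) ** 2)
SST = np.sum((y - y.mean()) ** 2)
R2 = 1 - SSE / SST

adj_R2 = R2 - (k / (n - k - 1)) * (1 - R2)  # penalizes added regressors
s2 = SSE / (n - k - 1)                      # variance of the estimate
s = np.sqrt(s2)                             # standard error of the estimate
```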

  7. Coefficient Variation • The variances and standard errors of the coefficients depend on the variance and standard error of the estimate (s² and s) as well as the variation in, and covariation among, the independent variables, the x’s • Each coefficient, bi, is an estimate of βi • Estimation gives us the coefficient, the estimated variance of the coefficient, and the estimated covariance between it and each of the other coefficients • We use the coefficient, bi, and its standard error, sbi, to test hypotheses • Standardized coefficient: b*i = bi·sxi/sy
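
One standard way to obtain these quantities, assuming the classical setup sketched above, is from s²(X′X)⁻¹: the diagonal gives each coefficient’s variance and the off-diagonals the covariances. A sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.5, -0.8, 0.5]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
s2 = np.sum((y - X @ b) ** 2) / (n - k - 1)

cov_b = s2 * np.linalg.inv(X.T @ X)    # estimated covariance matrix of the b's
s_b = np.sqrt(np.diag(cov_b))          # standard error of each coefficient, s_bi
# standardized coefficients: b*_i = b_i * s_xi / s_y (constant term excluded)
b_star = b[1:] * X[:, 1:].std(axis=0, ddof=1) / y.std(ddof=1)
```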

  8. Hypothesis Testing • Making inferences about the universe or population from our sample and estimation • Null hypothesis • H0: βi = B, where B is some number, including zero • Alternative hypotheses for possible testing • HA: βi < B Left-tail test • HA: βi > B Right-tail test • HA: βi ≠ B Two-tail test

  9. Hypothesis Testing • It can be shown that the ratio • t = (βi − B)/σbi, estimated as t = (bi − B)/sbi, • is distributed according to the Student’s t distribution with n−k−1 degrees of freedom (df) • Degrees of freedom • SST: n−1 df, n observations less one for the mean of y • SSR: k df, k independent variables • SSE: n−k−1 df, [(n−1) = k + (n−k−1)] as SST = SSR + SSE
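
A sketch of the t ratio computation under the simulated setup used above; B = 0 is an illustrative choice (testing whether each xi contributes at all):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.5, -0.8, 0.5]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
s2 = np.sum((y - X @ b) ** 2) / (n - k - 1)
s_b = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

B = 0.0
t = (b - B) / s_b                         # t = (b_i - B) / s_bi
df = n - k - 1                            # degrees of freedom for SSE
p_two_tail = 2 * stats.t.sf(np.abs(t), df)
```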

  10. Student’s t Distribution (Appendix B: Pages 82-83)

  11. Hypothesis Testing Procedure • Develop null and alternative hypotheses • Choose a confidence level, CI% • Determine degrees of freedom • Find the critical t value from the t table • One-tail tests: use the 100%−CI% column • Two-tail tests: use the (100%−CI%)/2 column • Compare the absolute value of the computed t ratio to the critical t value from the t table

  12. Hypothesis Testing Procedure • If the computed t value is less than the critical table value, then we cannot reject the null hypothesis at our chosen level of confidence • This does not mean that the null is true • If the computed t value exceeds the critical table value, then we reject the null hypothesis in favor of the alternative hypothesis at our chosen level of confidence • This does not mean that the alternative is true
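
The table lookup and comparison can be sketched with scipy’s t distribution in place of the printed table; the computed t here is a hypothetical value:

```python
from scipy import stats

CI = 0.95                 # chosen confidence level
df = 96                   # n - k - 1, e.g. n = 100, k = 3
alpha = 1 - CI
t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tail: split alpha across both tails

t_computed = 2.31         # hypothetical computed ratio (b_i - B) / s_bi
if abs(t_computed) > t_crit:
    print("reject H0 at the", CI, "confidence level")
else:
    print("cannot reject H0")             # does not mean H0 is true
```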

  13. Multiple Regression [Venn diagram: overlapping circles Y, X1, X2, with regions labeled a through g] How can you find area abc? (which = R²Y·X1X2) • Remember… • r²YX1·X2 = a/(a+f) • r²YX2·X1 = c/(c+f) • r²Y(X1·X2) = a • r²Y(X2·X1) = c • r²X1Y = a+b • r²X1X2 = b+d • r²X2Y = b+c

  14. Multiple Regression [Same Venn diagram] Finding area abc: find area ab = r²YX1, then add c = r²Y(X2·X1), so R²Y·X1X2 = r²YX1 + r²Y(X2·X1); or find area bc = r²YX2, then add a = r²Y(X1·X2), so R²Y·X1X2 = r²YX2 + r²Y(X1·X2). You get the same R², but order of entry can importantly influence your conclusions.

  15. Multiple Regression Coefficient Equation • R²Y·X1X2X3…Xn = r²YX1 + r²Y(X2·X1) + r²Y(X3·X1X2) + … + r²Y(Xn·X1X2X3…Xn−1)
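
A sketch illustrating this coefficient equation and the order-of-entry point from slide 14: the total R² is the same either way, but each variable’s increment (its squared semi-partial correlation) depends on what entered before it. Data are simulated, with x1 and x2 deliberately correlated:

```python
import numpy as np

def r2(X, y):
    """R^2 of an OLS fit of y on X (constant column included in X)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return 1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # x1 and x2 correlated
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)
ones = np.ones(n)

R2_full = r2(np.column_stack([ones, x1, x2]), y)
R2_x1 = r2(np.column_stack([ones, x1]), y)
R2_x2 = r2(np.column_stack([ones, x2]), y)
# Same total R^2 either way; the increments (the semi-partials) differ by order:
print(R2_x1, R2_full - R2_x1)   # x1 first, then x2's unique share
print(R2_x2, R2_full - R2_x2)   # x2 first, then x1's unique share
```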

  16. Multiple Regression Methods • Direct (Simple) Regression • Forward Regression • Backward Regression • Stepwise Regression • Hierarchical Regression

  17. Direct (Simple) Regression • All available predictor variables are put into the equation at once, and each is assessed as if it had been entered last • They are assessed on the basis of the proportion of variance in the criterion variable (Y) they uniquely account for (squared semi-partial correlations) • Called simple regression in Bordens and Abbott • With SPSS, specify them with METHOD=ENTER

  18. Forward Regression • Sequentially add variables, one at a time, based on the strength of their squared semi-partial correlations (or the simple bivariate correlation in the case of the first variable entered into the equation)
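
A minimal sketch of the forward procedure (simulated data; a real implementation would stop when the improvement in R² is no longer significant rather than entering everything):

```python
import numpy as np

def r2(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - np.sum((y - X @ b) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 200
Xs = rng.normal(size=(n, 4))                  # four candidate predictors
y = 1 + Xs @ np.array([0.9, 0.0, 0.4, 0.1]) + rng.normal(size=n)

selected, remaining = [], list(range(Xs.shape[1]))
while remaining:
    # R^2 of the model after entering each remaining candidate
    scores = {j: r2(np.column_stack([np.ones(n), Xs[:, selected + [j]]]), y)
              for j in remaining}
    best = max(scores, key=scores.get)        # largest improvement in R^2
    selected.append(best)
    remaining.remove(best)
    print("entered x%d, R^2 = %.3f" % (best + 1, scores[best]))
```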

  19. Backward Regression • Start with all of the variables in the equation, then delete them one at a time on the basis of the smallest change in R²
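
And a matching sketch of the backward procedure, dropping at each step the variable whose removal costs the least R² (again, a real procedure stops when the next removal would be significant):

```python
import numpy as np

def r2(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - np.sum((y - X @ b) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 200
Xs = rng.normal(size=(n, 4))
y = 1 + Xs @ np.array([0.9, 0.0, 0.4, 0.1]) + rng.normal(size=n)

kept = list(range(Xs.shape[1]))
while len(kept) > 1:
    # R^2 of the model after removing each kept variable in turn
    losses = {j: r2(np.column_stack([np.ones(n),
                                     Xs[:, [i for i in kept if i != j]]]), y)
              for j in kept}
    drop = max(losses, key=losses.get)   # removal that leaves R^2 highest
    kept.remove(drop)
    print("removed x%d, R^2 now %.3f" % (drop + 1, losses[drop]))
```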

  20. Stepwise Regression • A combination of forward and backward • At each step one variable can be entered (on the basis of the greatest improvement in R²), but one may also be removed if the change (reduction) in R² is not significant • The Bordens and Abbott text appears to use this term to mean forward regression

  21. Hierarchical Regression • This is the only method in which the researcher retains control over the analysis, specifying the order of entry • On the basis of theory or practicality • This is especially important when multicollinearity is a problem
