Multiple Regression Lecture 13
Today's plan
• Moving from the bi-variate to the multivariate
• Looking at how the multivariate equation relates to the bi-variate equation
• Derivation
• The difference between true and estimated models
Introduction
• In multivariate regressions, the number of X variables is restricted by (n - k) > 0, where n is the number of observations and k is the number of parameters in the model
• In the bi-variate case we had k = 2; if (n - k) ≤ 0 we wouldn't be able to calculate test statistics
• We will use an example where earnings is our dependent variable, years of schooling (YRS_SCH) is X1, and age is X2
Derivation
• The rules for the derivation of the parameters are the same as in the bi-variate world
• Our objective function will be g(a, b1, b2)
• We still want to minimize the sum of squared errors, Σe²
• Our model will be Y = a + b1X1 + b2X2 + e
• We can rewrite this in terms of deviations from mean values (coded variables): y = b1x1 + b2x2 + e
Derivation (2)
• We can rearrange our model in terms of e: e = Y - a - b1X1 - b2X2
• Differentiating Σe² with respect to each parameter and setting each derivative to zero gives the first order conditions (the original slide's equations were lost; these are the standard normal equations):
  ∂Σe²/∂a = -2Σ(Y - a - b1X1 - b2X2) = 0
  ∂Σe²/∂b1 = -2ΣX1(Y - a - b1X1 - b2X2) = 0
  ∂Σe²/∂b2 = -2ΣX2(Y - a - b1X1 - b2X2) = 0
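These conditions can be checked numerically. Below is a minimal Python sketch, not from the lecture spreadsheet: the earnings, schooling, and age numbers are made up for illustration, and numpy's least-squares routine is used as a stand-in for solving the normal equations by hand.

```python
import numpy as np

# Hypothetical data: earnings (Y), years of schooling (X1), age (X2).
# These numbers are illustrative, not the values from the lecture spreadsheet.
Y  = np.array([9.5, 10.1, 11.3, 12.0, 12.8, 13.5])
X1 = np.array([8.0, 10.0, 12.0, 12.0, 14.0, 16.0])
X2 = np.array([25.0, 30.0, 35.0, 40.0, 45.0, 50.0])

# Column of ones for the intercept a, then X1 and X2.
X = np.column_stack([np.ones_like(Y), X1, X2])

# Minimizing the sum of squared errors leads to the normal equations
# X'Xb = X'Y; lstsq solves them directly.
a_hat, b1_hat, b2_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(a_hat, b1_hat, b2_hat)
```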
Derivation (3)
• To get our estimate of a we use the FOC that the sum of the errors equals zero; substituting in for e and solving gives a = Ȳ - b1X̄1 - b2X̄2
• As we include more variables, we need more terms to calculate the intercept
• Calculating b1 and b2 is more complicated
Derivation (4)
• We have the first order conditions for b1 and b2; solving them in deviation form gives
  b1 = (Σx1y·Σx2² - Σx2y·Σx1x2) / (Σx1²·Σx2² - (Σx1x2)²)
  b2 = (Σx2y·Σx1² - Σx1y·Σx1x2) / (Σx1²·Σx2² - (Σx1x2)²)
• The multivariate case is much more complicated than the bi-variate case, but the pattern remains the same
• The denominator still captures the variation in the X's
• The numerator still captures the covariation of X1, X2, and Y
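As a companion to the formulas above, here is a short Python sketch that computes b1 and b2 directly from the deviation-form sums; the function name and the data arrays are invented for illustration.

```python
import numpy as np

def slopes_from_deviations(Y, X1, X2):
    """b1 and b2 via the two-regressor deviation-form formulas."""
    y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()
    S11, S22 = x1 @ x1, x2 @ x2          # Σx1², Σx2²
    S12 = x1 @ x2                        # Σx1x2
    S1y, S2y = x1 @ y, x2 @ y            # Σx1y, Σx2y
    denom = S11 * S22 - S12**2           # variation in the X's
    b1 = (S1y * S22 - S2y * S12) / denom
    b2 = (S2y * S11 - S1y * S12) / denom
    return b1, b2

# Hypothetical data, as before.
Y  = np.array([9.5, 10.1, 11.3, 12.0, 12.8, 13.5])
X1 = np.array([8.0, 10.0, 12.0, 12.0, 14.0, 16.0])
X2 = np.array([25.0, 30.0, 35.0, 40.0, 45.0, 50.0])
print(slopes_from_deviations(Y, X1, X2))
```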
Matrix of products & cross-products
• This will help us calculate b1 and b2, as well as other test statistics we'll need
• The matrix of products and cross-products is symmetric
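A minimal sketch of building that matrix in Python, again on hypothetical data (the real numbers live in the spreadsheet): the slopes then come from solving the 2×2 top-left block, which mirrors the formulas on the Derivation (4) slide.

```python
import numpy as np

# Hypothetical data; the real numbers live in L12.xls.
Y  = np.array([9.5, 10.1, 11.3, 12.0, 12.8, 13.5])
X1 = np.array([8.0, 10.0, 12.0, 12.0, 14.0, 16.0])
X2 = np.array([25.0, 30.0, 35.0, 40.0, 45.0, 50.0])

# Deviations from means, stacked as columns (x1, x2, y).
d = np.column_stack([v - v.mean() for v in (X1, X2, Y)])

# M is the symmetric matrix of products and cross-products:
# [[Σx1²,  Σx1x2, Σx1y],
#  [Σx1x2, Σx2²,  Σx2y],
#  [Σx1y,  Σx2y,  Σy² ]]
M = d.T @ d

# b1 and b2 solve the 2x2 system taken from the top-left block.
b1, b2 = np.linalg.solve(M[:2, :2], M[:2, 2])
print(M)
print(b1, b2)
```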
Example
• On L12.xls there is an example of the matrix of products and cross-products that we're interested in
• The spreadsheet also shows that LINEST can accommodate a multivariate regression
• From the spreadsheet we know the sums of squares and cross-products needed for the formulas above
Example (2)
• Using those sums we can calculate the slope estimates, b1 = 0.057 and b2 = 0.023
• We can also calculate the intercept, a = 4.135
Example (3)
• So now we can ask: what was the effect of including age?
• Had we not included age, our bi-variate regression equation would be Y = 4.53 + 0.097X1, where X1 is years of schooling
• Including age, the multivariate regression equation is Y = 4.135 + 0.057X1 + 0.023X2
• By including age, we reduce the coefficient on education (X1) by nearly half!
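This near-halving is easy to reproduce in simulation. The sketch below uses synthetic data, since the actual earnings data is only in the spreadsheet: schooling and age are built to be positively correlated, so dropping age inflates the schooling coefficient, just as in the slide's comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Synthetic stand-ins for the earnings example: schooling (X1) and
# age (X2) are positively correlated, and both raise earnings (Y).
X2 = rng.uniform(20, 60, n)                  # age
X1 = 6 + 0.15 * X2 + rng.normal(0, 2, n)     # schooling, correlated with age
Y  = 4.0 + 0.06 * X1 + 0.02 * X2 + rng.normal(0, 0.5, n)

ones = np.ones(n)
b_biv  = np.linalg.lstsq(np.column_stack([ones, X1]),     Y, rcond=None)[0]
b_mult = np.linalg.lstsq(np.column_stack([ones, X1, X2]), Y, rcond=None)[0]

print("bi-variate   b1:", b_biv[1])   # absorbs part of the age effect
print("multivariate b1:", b_mult[1])  # close to the true 0.06
```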
True & estimated models • A true model can come from: 1) Economic theory • an example of this is the Cobb-Douglas production function Y=ALK • the form is provided by economic theory • we want to test if + = 1 2) Ad-hoc variable inclusion • The justification for the variables comes from economic theory, but we include variables on the basis of significance in statistical tests • An example: the Phillips Curve
Omitted Variable Bias
• Let's go back to the returns to education example in L12.xls and examine omitted variable bias
• Let's assume that the true model is Y = a + b1X1 + b2X2 + e
• But what if we instead estimate the following model: Y = a + b1X1 + u, where X1 is still years of education
True & estimated models (3) • Reasons why we might not estimate the true model • we might not be able to collect the necessary data • we might simply forget to include other variables such as age in the regression • Let’s rewrite our equations in terms of deviations from the mean: True model: y = b1x1 + b2x2 + e Estimated model: y = b1x1 + u
Omitted variable bias
• Our estimate of the slope coefficient for the bi-variate model will be b̂1 = Σx1y / Σx1²
• If we know the true model, we can substitute y = b1x1 + b2x2 + e into this expression and take the expectation
Omitted variable bias (2)
• We can multiply out the terms and simplify the expression:
  E(b̂1) = b1 + b2·(Σx1x2 / Σx1²) + E(Σx1e / Σx1²)
• Recall that one of our CLR assumptions is E(x1e) = 0, so
  E(b̂1) = b1 + b2·(Σx1x2 / Σx1²)
• The term b2·(Σx1x2 / Σx1²) represents the omitted variable bias
Omitted variable bias example
• Returning to the L13.xls example, we have Σx1x2 and Σx1² from the matrix of products and cross-products
• If we think that b2 > 0 and Σx1x2 > 0, then the bias term b2·(Σx1x2 / Σx1²) is positive
• This leads to a biased estimate of b1: the bi-variate coefficient on schooling (0.097) exceeds the multivariate one (0.057), consistent with a positive bias from omitting age
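A small Monte Carlo makes the bias formula concrete. The sketch below (synthetic data, invented parameter values) repeatedly estimates the bi-variate slope when the true model has two regressors, and compares the average bias with b2·Σx1x2/Σx1².

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 2000
b1_true, b2_true = 0.06, 0.02

biv_slopes = []
for _ in range(reps):
    X2 = rng.uniform(20, 60, n)                  # "age"
    X1 = 6 + 0.15 * X2 + rng.normal(0, 2, n)     # "schooling", correlated
    Y  = 4.0 + b1_true * X1 + b2_true * X2 + rng.normal(0, 0.5, n)
    x1, y = X1 - X1.mean(), Y - Y.mean()
    biv_slopes.append((x1 @ y) / (x1 @ x1))      # bi-variate slope estimate

# Average bias across replications versus the formula b2·Σx1x2/Σx1²
# (computed from the last replication; with n = 200 the ratio is stable).
x2 = X2 - X2.mean()
print(np.mean(biv_slopes) - b1_true, b2_true * (x1 @ x2) / (x1 @ x1))
```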
Recap / what's to come
• We learned that deriving the multivariate regression equation is similar to deriving the bi-variate case
• We worked with a matrix of products and cross-products
• We looked at the difference between true and estimated regression models
• We learned to calculate the omitted variable bias
• In the next few lectures we'll be doing more with multivariate models and applications
Unnecessary Variables
• What happens if variables included in the estimated model are not relevant under the 'true' model?
  Estimated model: y = b1x1 + b2x2 + e
  True model: y = b1x1 + u
• If a variable is unnecessary, its true coefficient is zero, so its estimated coefficient should be statistically insignificant
• How to detect this: t-ratio hypothesis tests, or joint hypothesis tests using the F-distribution
• Dropping unnecessary variables helps to make models parsimonious
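A quick sketch of the detection idea, with invented data: when an irrelevant X2 is included, its t-ratio should typically be insignificant, while the relevant coefficients are unaffected on average.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

X1 = rng.normal(0, 1, n)
X2 = rng.normal(0, 1, n)              # unnecessary: true coefficient is zero
Y  = 1.0 + 0.5 * X1 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), X1, X2])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ b
s2 = (e @ e) / (n - X.shape[1])                     # s² with n - k df
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # coefficient std errors
print("t-ratios:", b / se)  # the X2 t-ratio should usually be < 2 in magnitude
```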