Research Method Lecture 1 (Ch1, Ch2) Simple linear regression
The goal of econometric analysis • To estimate the causal effect of one variable on another, that is, the effect of one variable on another holding all other relevant factors constant. • In other words, the causal effect is the ceteris paribus effect, where ceteris paribus means “other relevant factors being constant”.
For example, consider the following model: (Crop yield)= β0+ β1(fertilizer)+u. You are interested in the causal effect of the amount of fertilizer on crop yield. u contains all relevant factors that are unobserved by the researcher, such as the quality of land.
One way to obtain the causal effect is to control for all other relevant variables, like (Crop yield)= β0+ β1(fertilizer)+ β2(land quality)+. . . . +u In reality, we do not have all the relevant variables in the data set.
However, under certain conditions, even if we do not have all the relevant variables in the data, we can estimate the causal effect. • In this lecture, you will learn such conditions for the case of simple linear regression.
Types of data sets • Cross sectional • Time series • Pooled cross sectional • Panel data
A simple linear regression Assumption SLR.1: Linear in parameters In the population model, the dependent variable, y, is related to the independent variable, x, and the error term, u, as y=β0+β1x+u
Assumption SLR.2: Random sampling We have a random sample of size n, {xi,yi} for i=1,..,n, following the population model.
Understanding SLR.2 is important. Suppose you have the data {xi,yi} for i=1,…,n. Then SLR.2 means the following:
• SLR.2a: y1, y2,…, yn are independently and identically distributed.
• SLR.2b: x1, x2,…, xn are independently and identically distributed.
• SLR.2c: xi and yj are independent for i≠j.
• SLR.2d: u1, u2,…, un are independently and identically distributed.
Assumption SLR.3 The sample outcomes of x, namely x1, x2,…, xn, are not all the same value.
Assumption SLR.4: Zero conditional mean Given any value of x, the expected value of u is zero, that is E(u|x)=0
Combining SLR.2 and SLR.4, we have the following. Given the data {xi,yi} for i=1,2,…,n, we have SLR.4a: E(ui|xi)=0 for i=1,2,…,n and SLR.4b: E(ui|x1,x2,…,xn)=0 for i=1,2,…,n. We usually write this as E(ui|X)=0 as shorthand notation.
Note the following • E(u|x)=0 implies cov(u,x)=0 • But cov(u,x)=0 does not necessarily imply E(u|x)=0 • E(u|x)=0 does not imply that u and x are independent. • But if u and x are independent, E(u|x)=0 is always satisfied. SLR.4 is the assumption that allows you to interpret the result as “causal effect”.
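The second bullet can be illustrated numerically. The sketch below is not from the lecture: it assumes x is standard normal and sets u = x² − 1, a hypothetical error term chosen so that cov(u,x)=0 while E(u|x)=x²−1 clearly depends on x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: x is standard normal and u is a deterministic function of x.
x = rng.standard_normal(200_000)
u = x**2 - 1.0  # E(u) = 0 and cov(u, x) = E(x^3) = 0, but E(u|x) = x^2 - 1

# Sample covariance between u and x (close to zero).
sample_cov = np.cov(u, x)[0, 1]

# Average of u within two regions of x (far from zero, with opposite signs).
mean_u_small_x = u[np.abs(x) < 0.5].mean()
mean_u_large_x = u[np.abs(x) > 1.5].mean()

print("cov(u, x)          :", round(sample_cov, 4))
print("mean of u, |x|<0.5 :", round(mean_u_small_x, 4))
print("mean of u, |x|>1.5 :", round(mean_u_large_x, 4))
```

The covariance is near zero, yet the average of u changes sign across regions of x, so SLR.4 fails even though u and x are uncorrelated.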
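Estimation of β0 and β1 • From the assumptions, we can motivate the estimation procedure. SLR.4 implies the following: E(u)=0 and E(ux)=0. Since u=y−β0−β1x, this motivates the following empirical counterparts:
$\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)=0$
$\frac{1}{n}\sum_{i=1}^{n}x_i(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)=0$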
The hat above the coefficients indicates that they are estimates of the true parameters β0 and β1. Let us call the above two equations “the first order conditions (FOCs)” for the simple linear regression. Solving the FOCs for the beta coefficients gives the following estimates. (See next page)
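The estimators for simple OLS
$\hat{\beta}_1=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$
$\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}$
Proof: See the front board. These are called the ordinary least squares (OLS) estimators.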
After estimating the coefficients, you can compute the residual, which is the estimated value of the error term, u: $\hat{u}_i=y_i-\hat{\beta}_0-\hat{\beta}_1 x_i$
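Some useful results • From the FOCs, the following equations follow, where $\hat{u}_i=y_i-\hat{\beta}_0-\hat{\beta}_1 x_i$ is the residual for observation i:
$\sum_{i=1}^{n}\hat{u}_i=0$
$\sum_{i=1}^{n}x_i\hat{u}_i=0$
We will use the above equations many times in the proofs of various theorems.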
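A minimal sketch of the simple OLS estimators and the two residual properties implied by the FOCs. The helper name `ols_simple` and the five data points are made up for illustration; they are not from the lecture.

```python
import numpy as np

def ols_simple(x, y):
    """Return (b0, b1), the simple-regression OLS intercept and slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    sst_x = np.sum((x - x_bar) ** 2)
    if sst_x == 0:
        # SLR.3 fails: all x values are identical, so the slope is undefined.
        raise ValueError("x has no sample variation")
    b1 = np.sum((x - x_bar) * (y - y_bar)) / sst_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = ols_simple(x, y)
resid = y - b0 - b1 * x  # residuals, the estimated error terms

print("b0 =", b0, " b1 =", b1)
print("sum of residuals     :", resid.sum())        # zero up to rounding
print("sum of x * residuals :", (x * resid).sum())  # zero up to rounding
```

Both sums are zero up to floating-point rounding, which is exactly what the two FOCs require.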
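SST, SSE and SSR
Total sum of squares: $SST=\sum_{i=1}^{n}(y_i-\bar{y})^2$
Explained sum of squares: $SSE=\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2$
Residual sum of squares: $SSR=\sum_{i=1}^{n}\hat{u}_i^2$
Here $\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1 x_i$ is the fitted value and $\hat{u}_i=y_i-\hat{y}_i$ is the residual.
• The following relationship holds: SST=SSE+SSR
• Proof: See front board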
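R squared R squared is a measure of fit: $R^2=\frac{SSE}{SST}=1-\frac{SSR}{SST}$, where SSE is the explained sum of squares, SSR the residual sum of squares, and SST the total sum of squares. R squared is always between 0 and 1.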
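As a sketch, the decomposition SST=SSE+SSR and the R squared can be checked on a small made-up data set (the numbers are illustrative only, not from the lecture):

```python
import numpy as np

# Made-up illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Simple OLS estimates.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

fitted = b0 + b1 * x
resid = y - fitted

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
sse = np.sum((fitted - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(resid ** 2)                # residual sum of squares
r2 = sse / sst

print("SST =", sst, " SSE + SSR =", sse + ssr)
print("R squared =", r2)
```

Note that the naming follows the lecture's convention (SSE is explained, SSR is residual); some textbooks and software swap these two abbreviations.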
Units of measurement and functional form • Level-level form Example: the determinants of CEO salary Salary = β0+β1(Sales)+u where Salary is in $1000 and Sales is in $1000. Then β1 shows the change in CEO salary, in $1000, when sales increase by $1000.
Log-log form Suppose you regress log(salary) on log(sales) in the CEO compensation example: Log(Salary) = β0+β1log(Sales)+u Then β1 is the elasticity of salary with respect to sales. That is, if sales increase by 1%, salary increases by approximately β1%.
Log-level form Example: the return on education Log(wage) = β0+β1(educ)+u where wage is the hourly wage in dollars and educ is the years of education. Then, if education increases by 1 year, wage increases by approximately 100×β1%.
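The 100×β1% reading of the log-level form is an approximation that works well for small β1. Holding u fixed, the exact percentage change in wage for a one-unit increase in educ is 100×(exp(β1)−1). The sketch below compares the two; the coefficient value 0.08 is hypothetical, not an estimate from the lecture.

```python
import math

def log_level_pct_change(beta1, delta_x=1.0):
    """Return (approximate, exact) % change in y when x rises by delta_x
    in the model log(y) = b0 + beta1 * x + u, holding u fixed."""
    approx = 100.0 * beta1 * delta_x
    exact = 100.0 * (math.exp(beta1 * delta_x) - 1.0)
    return approx, exact

# Hypothetical return-to-education coefficient.
approx, exact = log_level_pct_change(0.08)
print(f"approximate: {approx:.2f}%  exact: {exact:.4f}%")
```

The gap between the two figures grows with the size of β1, so the exact formula matters more for large coefficients.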
Unbiasedness of OLS Theorem 2.1 Under SLR.1 through SLR.4, we have $E(\hat{\beta}_0)=\beta_0$ and $E(\hat{\beta}_1)=\beta_1$. Proof: See the front board.
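Unbiasedness is a statement about the average of the estimates over repeated samples, which a small Monte Carlo sketch can illustrate. Everything in the setup is assumed for illustration: the true parameters (β0=1, β1=2), the sample size, the number of replications, and the normal distributions. The error u is drawn independently of x, so SLR.4 holds by construction.

```python
import numpy as np

rng = np.random.default_rng(42)

beta0, beta1 = 1.0, 2.0  # hypothetical true parameters
n, reps = 50, 2000       # sample size and number of simulated samples

# Each row is one random sample of size n.
x = rng.standard_normal((reps, n))
u = rng.standard_normal((reps, n))  # independent of x, so E(u|x) = 0
y = beta0 + beta1 * x + u

# OLS slope and intercept computed separately for every sample (row).
x_dev = x - x.mean(axis=1, keepdims=True)
y_dev = y - y.mean(axis=1, keepdims=True)
b1 = (x_dev * y_dev).sum(axis=1) / (x_dev ** 2).sum(axis=1)
b0 = y.mean(axis=1) - b1 * x.mean(axis=1)

mean_b0, mean_b1 = b0.mean(), b1.mean()
print("average b0 over samples:", round(mean_b0, 3))
print("average b1 over samples:", round(mean_b1, 3))
```

Individual estimates vary from sample to sample, but their averages sit close to the true values, as the theorem predicts.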
Variance of OLS estimators • First, we introduce one more assumption. Assumption SLR.5: Homoskedasticity Var(u|x)=σ² This means that the variance of u does not depend on the value of x.
Combining SLR.5 with SLR.2, we also have SLR.5a: Var(ui|X)=σ² for i=1,…,n, where X denotes the independent variable for all the observations, that is, x1, x2,…, xn.
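Theorem 2.2 Under SLR.1 through SLR.5, we have
$Var(\hat{\beta}_1|X)=\frac{\sigma^2}{SST_x}$
$Var(\hat{\beta}_0|X)=\frac{\sigma^2\, n^{-1}\sum_{i=1}^{n}x_i^2}{SST_x}$
where $SST_x=\sum_{i=1}^{n}(x_i-\bar{x})^2$. Proof: See front board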
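The standard deviations of the estimated parameters are then given by
$sd(\hat{\beta}_1)=\frac{\sigma}{\sqrt{SST_x}}$
$sd(\hat{\beta}_0)=\sigma\sqrt{\frac{n^{-1}\sum_{i=1}^{n}x_i^2}{SST_x}}$
where $SST_x=\sum_{i=1}^{n}(x_i-\bar{x})^2$.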
Estimating the error variance • In Theorem 2.2, σ² is unknown and has to be estimated. • The estimate of σ² is given by $\hat{\sigma}^2=\frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2=\frac{SSR}{n-2}$
Theorem 2.3: Unbiased estimator of σ². Under SLR.1 through SLR.5, we have $E(\hat{\sigma}^2)=\sigma^2$. Proof: See the front board
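Estimates of the variance and the standard errors of OLS slope parameter • We replace the σ² in Theorem 2.2 by its estimate $\hat{\sigma}^2$ to get the estimate of the variance of the OLS slope parameter. This is given by $\widehat{Var}(\hat{\beta}_1)=\frac{\hat{\sigma}^2}{SST_x}$, where $SST_x=\sum_{i=1}^{n}(x_i-\bar{x})^2$. Note that the hat on Var indicates that this is an estimate. • Then the standard error of the OLS estimate is the square root of the above: $se(\hat{\beta}_1)=\frac{\hat{\sigma}}{\sqrt{SST_x}}$. This is the estimated standard deviation of the slope parameter.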
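Putting the pieces together, a minimal sketch of the error-variance estimate and the standard errors on a made-up five-point data set (illustrative numbers only). The degrees of freedom are n−2 because two parameters are estimated.

```python
import numpy as np

# Made-up illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = x.size

# Simple OLS estimates and residuals.
sst_x = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x

# Unbiased estimate of the error variance under SLR.1 to SLR.5.
ssr = np.sum(resid ** 2)
sigma2_hat = ssr / (n - 2)

# Standard errors: square roots of the estimated variances.
se_b1 = np.sqrt(sigma2_hat / sst_x)
se_b0 = np.sqrt(sigma2_hat * np.mean(x ** 2) / sst_x)

print("sigma^2 hat =", sigma2_hat)
print("se(b1) =", se_b1, " se(b0) =", se_b0)
```

These standard errors rely on homoskedasticity (SLR.5); if Var(u|x) varies with x, the formulas above no longer give valid estimates of the sampling variability.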