Common Errors: How to (and Not to) Control for Unobserved Heterogeneity

Lecture slides by Todd Gormley

What are these slides? • The following slides are a combination of lecture slides used by Todd Gormley in his Ph.D. course on “Empirical Methods in Corporate Finance” at The Wharton School • For more details about the issues discussed in these slides, please see the article below • Gormley, T. and D. Matsa, 2014, “Common Errors: How to (and Not to) Control for Unobserved Heterogeneity,” Review of Financial Studies 27(2): 617-61.

Motivation [Part 1] • Controlling for unobserved heterogeneity is a fundamental challenge in empirical finance • Unobservable factors affect corporate policies and prices • These factors may be correlated with variables of interest • Important sources of unobserved heterogeneity are often common across groups of observations • Demand shocks across firms in an industry, differences in local economic environments, etc.

Motivation [Part 2] • E.g., consider a firm-level regression of leverage on profit, where leverage is debt/assets for firm i, operating in industry j in year t, and profit is the firm's net income/assets • What might be some unobservable omitted variables in this estimation?

Motivation [Part 3] • Oh, there are so, so many… • Managerial talent and/or risk aversion • Cost of capital • Industry supply and/or demand shock • Regional demand shocks • And so on… • Easy to think of ways these might affect leverage and be correlated with profits • Sadly, this is just as easy to do with other dependent or independent variables…

Panel data to the rescue… • Thankfully, panel data can help us with a particular type of unobserved variable… • What type of unobserved variable does panel data help us with, and why? • Answer = It helps with any unobserved variable that doesn’t vary within groups of observations

Outline for lecture • Panel data and fixed effects (FE) • How not to control for unobserved heterogeneity • General implications • Benefits and limitations of FE model • Estimating high-dimensional FE models

Panel data • Panel data = whenever you have multiple observations per unit of observation i (e.g. you observe each firm over multiple years) • Let’s assume N units i • And, J observations per unit i [i.e. balanced panel] • E.g., you observe 5,000 firms in Compustat over a twenty-year period [i.e. N=5,000, J=20]

The underlying model [Part 1]
• When unobserved heterogeneity is thought to be present, the researcher implicitly assumes the following:
$$y_{i,j} = \beta X_{i,j} + f_i + \varepsilon_{i,j}$$
• i indexes groups of observations (e.g. industry); j indexes observations within each group (e.g. firm)
• $y_{i,j}$ = dependent variable
• $X_{i,j}$ = independent variable of interest
• $f_i$ = unobserved group heterogeneity
• $\varepsilon_{i,j}$ = error term

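The underlying model [Part 2]
• The following standard assumptions are made:
• N groups, J observations per group, where J is small and N is large
• X and ε are i.i.d. across groups, but not necessarily i.i.d. within groups
• $\operatorname{var}(f) = \sigma_f^2$, $\mu_f = 0$; $\operatorname{var}(X) = \sigma_X^2$, $\mu_X = 0$; $\operatorname{var}(\varepsilon) = \sigma_\varepsilon^2$, $\mu_\varepsilon = 0$ (the zero means simplify some expressions, but don’t change any results)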

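The underlying model [Part 3]
• Finally, the following assumptions are made:
$$\operatorname{cov}(f_i, \varepsilon_{i,j}) = 0, \quad \operatorname{cov}(X_{i,j}, \varepsilon_{i,j}) = \operatorname{cov}(X_{i,j}, \varepsilon_{i,-j}) = 0, \quad \operatorname{cov}(X_{i,j}, f_i) = \sigma_{Xf} \neq 0$$
• The last condition, $\sigma_{Xf} \neq 0$, is the source of the identification concern
• What do these imply?
• Answer = Model is correct in that if we can control for f, we’ll properly identify the effect of X; but if we don’t control for f, there will be omitted variable bias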

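OLS estimate of β is inconsistent
• By failing to control for the group effect, $f_i$, OLS suffers from omitted variable bias
• True model is: $y_{i,j} = \beta X_{i,j} + f_i + \varepsilon_{i,j}$
• But OLS estimates: $y_{i,j} = \beta^{OLS} X_{i,j} + u^{OLS}_{i,j}$
• The omitted $f_i$ ends up in the disturbance, so $\operatorname{plim} \hat{\beta}^{OLS} = \beta + \operatorname{cov}(X_{i,j}, f_i)/\operatorname{var}(X_{i,j})$, which differs from β whenever X and f are correlated
• Alternative estimation strategies are required…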
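The omitted variable bias of pooled OLS can be checked with a small simulation. The sketch below is only an illustration: the data-generating process (standard normal $f_i$, regressor noise, and errors, with the regressor loading 0.5 on $f_i$) and all variable names are assumptions made here, not taken from the slides. Under these assumed values, cov(X, f) = 0.5 and var(X) = 1.25, so OLS should drift toward β + 0.4 rather than β.

```python
import numpy as np

# Assumed (hypothetical) data-generating process:
#   f_i ~ N(0, 1), X_ij = 0.5 * f_i + w_ij, w_ij ~ N(0, 1), eps_ij ~ N(0, 1)
rng = np.random.default_rng(0)
n_groups, group_size, beta = 5000, 5, 1.0

group = np.repeat(np.arange(n_groups), group_size)  # group index i for each row
f = rng.normal(size=n_groups)[group]                # unobserved group effect f_i
x = 0.5 * f + rng.normal(size=group.size)           # regressor correlated with f_i
y = beta * x + f + rng.normal(size=group.size)      # true model

# Pooled OLS slope of y on x (with an intercept), ignoring f_i.
xc, yc = x - x.mean(), y - y.mean()
beta_ols = float(xc @ yc / (xc @ xc))

# Omitted variable bias formula evaluated with sample moments.
predicted = beta + float(np.cov(x, f)[0, 1]) / float(np.var(x, ddof=1))

print(f"true beta = {beta:.2f}, OLS = {beta_ols:.3f}, beta + cov/var = {predicted:.3f}")
```

In real data $f_i$ is unobserved, so the `predicted` line cannot be computed; it is available here only because the simulation generates $f_i$ itself.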

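Can solve this by transforming data
• First, notice that if you take the mean of the dependent variable over the observations for each unit of observation, i, you get…
$$\bar{y}_i = \beta \bar{X}_i + f_i + \bar{\varepsilon}_i$$
where $\bar{y}_i = \frac{1}{J}\sum_{j} y_{i,j}$, $\bar{X}_i = \frac{1}{J}\sum_{j} X_{i,j}$, and $\bar{\varepsilon}_i = \frac{1}{J}\sum_{j} \varepsilon_{i,j}$
• Again, I assumed there are J obs. per unit i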

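Transforming data [Part 2]
• Now, if we subtract $\bar{y}_i$ from $y_{i,j}$, we have
$$y_{i,j} - \bar{y}_i = \beta (X_{i,j} - \bar{X}_i) + (\varepsilon_{i,j} - \bar{\varepsilon}_i)$$
• And look! The unobserved variable, $f_i$, is gone (as is any constant) because it is group-invariant
• With our earlier assumptions, easy to see that $(X_{i,j} - \bar{X}_i)$ is uncorrelated with the new disturbance, $(\varepsilon_{i,j} - \bar{\varepsilon}_i)$, which means…?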

Fixed effects (or within) estimator • Answer: OLS estimation of transformed model will yield a consistent estimate of β • The prior transformation is called the “within transformation” because it demeans all variables within their group • This is also called the FE estimator
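The within transformation can be sketched in a few lines. This is a minimal illustration under an assumed data-generating process (standard normal draws and a regressor that loads 0.5 on the group effect; the helper `group_mean` and all variable names are hypothetical), not the code behind any result in the slides.

```python
import numpy as np

# Assumed (hypothetical) data-generating process with a group effect f_i
# that is correlated with the regressor.
rng = np.random.default_rng(1)
n_groups, group_size, beta = 5000, 5, 1.0

group = np.repeat(np.arange(n_groups), group_size)
f = rng.normal(size=n_groups)[group]
x = 0.5 * f + rng.normal(size=group.size)
y = beta * x + f + rng.normal(size=group.size)


def group_mean(v, group, n_groups):
    """Mean of v within each group, repeated so there is one value per row."""
    sums = np.bincount(group, weights=v, minlength=n_groups)
    counts = np.bincount(group, minlength=n_groups)
    return (sums / counts)[group]


# Within transformation: demean y and x inside each group, which removes f_i.
x_within = x - group_mean(x, group, n_groups)
y_within = y - group_mean(y, group, n_groups)

# OLS on the transformed data is the FE (within) estimator.
beta_fe = float(x_within @ y_within / (x_within @ x_within))
print(f"true beta = {beta:.2f}, FE estimate = {beta_fe:.3f}")
```

Standard errors are left out of this sketch; a proper FE routine also adjusts the degrees of freedom for the N estimated group means.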

Least Squares Dummy Variable (LSDV) • Another way to do the FE estimation is by adding indicator (dummy) variables • I.e. create a dummy variable for each group i, and add it to the regression • This is the least squares dummy variable (LSDV) model • Now, our estimation equation exactly matches the true underlying model

LSDV versus FE [Part 1] • Why do both approaches work? Well… • Frisch-Waugh-Lovell Theorem shows us there are two ways to estimate $\beta_1$ in a regression such as $y = \beta_0 + \beta_1 x + \beta_2 z + u$… • Estimate directly; i.e. regress y onto both x and z • OR we can just partial z out from both y and x before regressing y on x (i.e. regress residuals from regression of y on z onto residuals from regression of x on z)

LSDV versus FE [Part 2] • Can show that LSDV and within-transformation of FE are identical because demeaned variables of within regression are the residuals from a regression onto group dummies!
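The equivalence can be verified numerically on a panel small enough to build the dummies explicitly. The sketch below uses an assumed simulated panel (50 groups of 4 observations, all names hypothetical); it checks that residuals from a regression onto group dummies equal the group-demeaned variable, and that the LSDV and within slopes coincide up to floating-point error.

```python
import numpy as np

# Small assumed (hypothetical) panel so the dummy matrix stays tiny.
rng = np.random.default_rng(2)
n_groups, group_size = 50, 4
group = np.repeat(np.arange(n_groups), group_size)
f = rng.normal(size=n_groups)[group]
x = 0.5 * f + rng.normal(size=group.size)
y = 1.0 * x + f + rng.normal(size=group.size)

# LSDV: regress y on x plus one dummy per group (no separate constant).
dummies = np.eye(n_groups)[group]
coef, *_ = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)
beta_lsdv = float(coef[0])


def residualize(v):
    """Residuals from regressing v onto the full set of group dummies."""
    fitted = dummies @ np.linalg.lstsq(dummies, v, rcond=None)[0]
    return v - fitted


# Frisch-Waugh-Lovell: partial the dummies out of both y and x, then regress.
x_res, y_res = residualize(x), residualize(y)
beta_within = float(x_res @ y_res / (x_res @ x_res))

# The residuals are exactly the within-group demeaned values.
x_group_mean = (np.bincount(group, weights=x) / np.bincount(group))[group]
same_residuals = np.allclose(x_res, x - x_group_mean)

print(f"LSDV = {beta_lsdv:.6f}, within = {beta_within:.6f}, residuals match: {same_residuals}")
```

With thousands of groups the explicit dummy matrix becomes expensive, which is one practical reason to demean instead.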

Other approaches… • Gormley and Matsa (RFS 2014) note that the existing literature uses various other strategies to control for unobserved group-level heterogeneity… • Their questions: How do the approaches differ? And, when are they consistent? • Their answer: Some popular strategies can distort inferences and should not be used; FE estimator should be used instead

They focus on two popular strategies • “Adjusted-Y” (AdjY) – dependent variable is demeaned within groups [e.g. ‘industry-adjust’] • “Average effects” (AvgE) – uses group mean of dependent variable as control [e.g. ‘state-year’ control]

AdjY & AvgE are widely used • In Journal of Finance, Journal of Financial Economics, and Review of Financial Studies • Used since at least the late 1980s • Still used, 60+ papers published in 2008-2010 • Variety of subfields: asset pricing, banking, capital structure, governance, M&A, etc. • Have also been used in papers published in the American Economic Review, Journal of Political Economy, and Quarterly Journal of Economics

But, AdjY and AvgE are inconsistent • As Gormley and Matsa (RFS 2014) show… • Both can be more biased than OLS • Both can get the opposite sign of the true coefficient • In practice, bias is likely, and trying to predict its sign or magnitude will typically be impractical

More implications of GM (RFS 2014)
• Other, related strategies should also not be used
• “Characteristically-adjusted” stock returns in AP
• “Adjusted” stock returns when trying to estimate firms’ internal value of cash
• Simple comparisons of benchmark-adjusted outcomes before & after events (like M&A)
• “Diversification discount”
• Using group average of an independent variable as instrumental variable
• Now, let’s see why…

Adjusted-Y (AdjY)
• Tries to remove unobserved group heterogeneity by demeaning the dependent variable within groups
• AdjY estimates: $y_{i,j} - \bar{y}_i = \beta^{AdjY} X_{i,j} + u^{AdjY}_{i,j}$, where $\bar{y}_i = \frac{1}{J}\sum_{k \in \text{group } i} y_{i,k}$
• Note: Researchers often exclude the observation at hand when calculating the group mean or use a group median, but both modifications will yield similarly inconsistent estimates

Example AdjY estimation
• One example, a firm value regression: $Q_{i,j,t} - \bar{Q}_{i,t} = \beta X_{i,j,t} + \varepsilon_{i,j,t}$
• $Q_{i,j,t}$ = Tobin’s Q for firm j, industry i, year t
• $\bar{Q}_{i,t}$ = mean of Tobin’s Q for industry i in year t
• $X_{i,j,t}$ = vector of variables thought to affect value
• Researchers might also include firm & year FE
• Anyone know why AdjY is going to be inconsistent?

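Here is why…
• Rewriting the group mean, we have: $\bar{y}_i = \beta \bar{X}_i + f_i + \bar{\varepsilon}_i$
• Therefore, AdjY transforms the true data to: $y_{i,j} - \bar{y}_i = \beta X_{i,j} - \beta \bar{X}_i + \varepsilon_{i,j} - \bar{\varepsilon}_i$
• What is the AdjY estimation forgetting?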

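AdjY has omitted variable bias
• True model: $y_{i,j} - \bar{y}_i = \beta X_{i,j} - \beta \bar{X}_i + \varepsilon_{i,j} - \bar{\varepsilon}_i$
• But, AdjY estimates: $y_{i,j} - \bar{y}_i = \beta^{AdjY} X_{i,j} + u^{AdjY}_{i,j}$
• By failing to control for $\bar{X}_i$, AdjY suffers from omitted variable bias when $\sigma_{X\bar{X}} \neq 0$, where $\sigma_{X\bar{X}} = \operatorname{cov}(X_{i,j}, \bar{X}_i)$
• $\operatorname{plim} \hat{\beta}^{AdjY} = \beta - \beta\,\sigma_{X\bar{X}}/\sigma_X^2$, so $\hat{\beta}^{AdjY}$ can be inconsistent when $\beta \neq 0$
• In practice, a positive covariance between X and $\bar{X}$ will be very common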
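A short simulation makes the AdjY bias concrete. It is a sketch under an assumed data-generating process (standard normal draws, a regressor loading 0.5 on the group effect, J = 5; all names hypothetical), using the group mean that includes the observation at hand. With these assumed values cov(X, X̄) = 0.25 + 1/5 = 0.45 and var(X) = 1.25, so the omitted-variable algebra puts the AdjY estimate near β(1 − 0.45/1.25) = 0.64 when β = 1.

```python
import numpy as np

# Assumed (hypothetical) data-generating process.
rng = np.random.default_rng(3)
n_groups, group_size, beta = 5000, 5, 1.0
group = np.repeat(np.arange(n_groups), group_size)
f = rng.normal(size=n_groups)[group]
x = 0.5 * f + rng.normal(size=group.size)
y = beta * x + f + rng.normal(size=group.size)

counts = np.bincount(group)
y_bar = (np.bincount(group, weights=y) / counts)[group]  # group mean of y
x_bar = (np.bincount(group, weights=x) / counts)[group]  # group mean of x

# AdjY: demean only the dependent variable, then regress on the raw x.
y_adj = y - y_bar
xc = x - x.mean()
beta_adjy = float(xc @ (y_adj - y_adj.mean()) / (xc @ xc))

# Omitted-variable prediction: beta - beta * cov(X, X_bar) / var(X).
predicted = beta - beta * float(np.cov(x, x_bar)[0, 1]) / float(np.var(x, ddof=1))

print(f"true beta = {beta:.2f}, AdjY = {beta_adjy:.3f}, predicted = {predicted:.3f}")
```

Demeaning x as well would turn this into the within estimator and remove the bias.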

Further analysis of AdjY estimate • Bias doesn’t disappear as group size J increases • Can be inconsistent even when OLS is not; this happens when $\sigma_{Xf} = 0$ and $\sigma_{X\bar{X}} \neq 0$ (i.e., X is correlated with its own group mean) • Bias is more complicated with two variables…

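AdjY estimates with 2 variables
• Suppose there are instead two RHS variables
• True model: $y_{i,j} = \beta X_{i,j} + \gamma Z_{i,j} + f_i + \varepsilon_{i,j}$
• Use same assumptions as before, but add: $\operatorname{cov}(Z_{i,j}, \varepsilon_{i,j}) = 0$, $\operatorname{var}(Z) = \sigma_Z^2$, $\mu_Z = 0$, $\operatorname{cov}(X_{i,j}, Z_{i,j}) = \sigma_{XZ}$, $\operatorname{cov}(Z_{i,j}, f_i) = \sigma_{Zf}$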

AdjY estimates with 2 variables [Part 2] • With a bit of algebra, it is shown that: Estimates of both β and γ can be inconsistent Determining sign and magnitude of bias will typically be difficult
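The algebra can be sketched with the standard omitted-variable formula. The notation here is an assumption: the second regressor is written $Z_{i,j}$ with coefficient $\gamma$, both regressors are taken to have mean zero and to be uncorrelated with the errors, and $\sigma_{A\bar{B}}$ denotes $\operatorname{cov}(A_{i,j}, \bar{B}_i)$. This is one way to arrive at the result and is not necessarily the form in which the paper presents it.

```latex
\begin{aligned}
% AdjY subtracts the group mean of y only, so the group means of X and Z are left in the error
y_{i,j} - \bar{y}_i
  &= \beta X_{i,j} + \gamma Z_{i,j}
     - \beta \bar{X}_i - \gamma \bar{Z}_i
     + \varepsilon_{i,j} - \bar{\varepsilon}_i \\[4pt]
% Regressing on (X, Z) alone omits -\beta\bar{X}_i - \gamma\bar{Z}_i
\operatorname{plim}
\begin{pmatrix} \hat{\beta}^{AdjY} \\ \hat{\gamma}^{AdjY} \end{pmatrix}
  &= \begin{pmatrix} \beta \\ \gamma \end{pmatrix}
   - \begin{pmatrix} \sigma_X^2 & \sigma_{XZ} \\ \sigma_{XZ} & \sigma_Z^2 \end{pmatrix}^{-1}
     \begin{pmatrix}
       \beta\,\sigma_{X\bar{X}} + \gamma\,\sigma_{X\bar{Z}} \\
       \beta\,\sigma_{Z\bar{X}} + \gamma\,\sigma_{Z\bar{Z}}
     \end{pmatrix}
\end{aligned}
```

Each estimate picks up terms involving both $\beta$ and $\gamma$ and four separate covariances, which is why signing the bias is hard without knowing all of them.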

Average effects (AvgE)
• AvgE uses group mean of dependent variable as control for unobserved heterogeneity
• AvgE estimates: $y_{i,j} = \beta^{AvgE} X_{i,j} + \gamma^{AvgE} \bar{y}_i + u^{AvgE}_{i,j}$

Example AvgE estimation
• Following profit regression is an AvgE example: $ROA_{i,s,t} = \beta X_{i,s,t} + \gamma \overline{ROA}_{s,t} + \varepsilon_{i,s,t}$
• $\overline{ROA}_{s,t}$ = mean of ROA for state s in year t
• $X_{i,s,t}$ = vector of variables thought to affect profits
• Researchers might also include firm & year FE
• Anyone know why AvgE is going to be inconsistent?

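Average effects (AvgE)
• AvgE uses group mean of dependent variable as control for unobserved heterogeneity
• AvgE estimates: $y_{i,j} = \beta^{AvgE} X_{i,j} + \gamma^{AvgE} \bar{y}_i + u^{AvgE}_{i,j}$
• Recall, true model: $y_{i,j} = \beta X_{i,j} + f_i + \varepsilon_{i,j}$
• Problem is that $\bar{y}_i$ measures $f_i$ with error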

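AvgE has measurement error bias
• Recall that group mean is given by $\bar{y}_i = \beta \bar{X}_i + f_i + \bar{\varepsilon}_i$
• Therefore, $\bar{y}_i$ measures $f_i$ with error: $\bar{y}_i - f_i = \beta \bar{X}_i + \bar{\varepsilon}_i$
• As is well known, even classical measurement error causes all estimated coefficients to be inconsistent
• Bias here is complicated because the error can be correlated with both the mismeasured variable, $f_i$, and with $X_{i,j}$ when $\operatorname{cov}(X_{i,j}, \bar{X}_i) \neq 0$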
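The measurement-error problem can be seen in a simulation that puts AvgE next to the FE (within) estimator on the same data. The data-generating process below is an assumption made for illustration (standard normal draws, regressor loading 0.5 on the group effect, J = 5, β = 1; all names hypothetical). Solving the two normal equations with the population moments implied by these assumed values gives an AvgE coefficient on X of roughly 0.88, while FE recovers β.

```python
import numpy as np

# Assumed (hypothetical) data-generating process.
rng = np.random.default_rng(4)
n_groups, group_size, beta = 5000, 5, 1.0
group = np.repeat(np.arange(n_groups), group_size)
f = rng.normal(size=n_groups)[group]
x = 0.5 * f + rng.normal(size=group.size)
y = beta * x + f + rng.normal(size=group.size)

counts = np.bincount(group)
y_bar = (np.bincount(group, weights=y) / counts)[group]  # noisy proxy for f_i
x_bar = (np.bincount(group, weights=x) / counts)[group]

# AvgE: regress y on a constant, x, and the group mean of y.
design = np.column_stack([np.ones_like(x), x, y_bar])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_avge = float(coef[1])

# FE (within) estimator on the same data for comparison.
x_w, y_w = x - x_bar, y - y_bar
beta_fe = float(x_w @ y_w / (x_w @ x_w))

print(f"true beta = {beta:.2f}, AvgE = {beta_avge:.3f}, FE = {beta_fe:.3f}")
```

The size and even the direction of the AvgE bias depend on the assumed parameters, so the value here should not be read as typical.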

AvgE estimate of β with one variable • With a bit of algebra, the probability limit of the AvgE estimate can be derived; see Gormley and Matsa (2014) for the expression • Determining magnitude and direction of bias is difficult • Covariance between X and $\bar{X}$ is again problematic, but not needed for AvgE estimate to be inconsistent • Even non-i.i.d. nature of errors can affect bias!

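How common will the bias be?
• First, we look at when $\sigma_{X\bar{X}} \neq 0$ by separating $X_{i,j}$ into its group and idiosyncratic components: $X_{i,j} = x_i + w_{i,j}$
• Idiosyncratic component $w_{i,j}$ distributed with mean 0 and variance $\sigma_w^2$
• Assume group means $x_i$ are i.i.d. with mean zero and variance $\sigma_x^2$
• And, assume $\operatorname{cov}(x_i, w_{i,j}) = 0$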

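AdjY and AvgE bias very common
• Both AdjY and AvgE are biased when $\sigma_{X\bar{X}} \neq 0$
• But with the prior setup, we can show that $\sigma_{X\bar{X}} = \sigma_x^2 + \operatorname{cov}(w_{i,j}, w_{i,-j})$ *
• Bias whenever different means across groups! ($\sigma_x^2 \neq 0$)
• Or, bias whenever observations within groups are not independent! ($\operatorname{cov}(w_{i,j}, w_{i,-j}) \neq 0$)
* Solved excluding observation at hand (most common approach)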

Analytical comparisons • Next, we use analytical solutions to compare relative performance of OLS, AdjY, and AvgE • To do this, we re-express solutions… • We use correlations (e.g. solve bias in terms of correlation between X and f, $\rho_{Xf}$, instead of $\sigma_{Xf}$) • We also assume i.i.d. errors [just makes bias of AvgE less complicated]

$\rho_{Xf}$ has large effect on performance (from Figure 1A)
• AdjY more biased than OLS, except for large values of $\rho_{Xf}$
• AvgE worst for low correlations, best for high
[Figure 1A: estimates of β from OLS, AdjY, and AvgE plotted against $\rho_{Xf}$; true β = 1, other parameters held constant]

Relative variation across groups key (from Figure 1B)
[Figure 1B: estimates of β from OLS, AvgE, and AdjY]

More observations need not help! (from Figure 1F)
[Figure 1F: estimates of β from OLS, AvgE, and AdjY plotted against group size J]

Summary of OLS, AdjY, and AvgE • In general, all three estimators are inconsistent in presence of unobserved group heterogeneity • AdjY and AvgE may not be an improvement over OLS; depends on various parameter values • AdjY and AvgE can yield estimates with opposite sign of the true coefficient

Comparing FE, AdjY, and AvgE
• To estimate effect of X on Y controlling for Z
• One could regress Y onto both X and Z… (this is what adding group FE does)
• Or, regress residuals from regression of Y on Z onto residuals from regression of X on Z (this is the within-group transformation!)
• AdjY and AvgE aren’t the same as finding the effect of X on Y controlling for Z because...
• AdjY only partials Z out from Y
• AvgE uses fitted values of Y on Z as control

The differences matter! Example #1
• Consider the following capital structure regression: $(D/A)_{i,t} = \beta X_{i,t} + f_i + \varepsilon_{i,t}$
• $(D/A)_{i,t}$ = book leverage for firm i, year t
• $X_{i,t}$ = vector of variables thought to affect leverage
• $f_i$ = firm fixed effect
• We now run this regression for each approach to deal with firm fixed effects, using 1950-2010 data, winsorizing at 1% tails…

Estimates vary considerably (from Table 2)

The differences matter! Example #2
• Consider the following firm value regression: $Q_{i,j,t} = \beta X_{i,j,t} + f_{j,t} + \varepsilon_{i,j,t}$
• $Q_{i,j,t}$ = Tobin’s Q for firm i, industry j, year t
• $X_{i,j,t}$ = vector of variables thought to affect value
• $f_{j,t}$ = industry-year fixed effect
• We now run this regression for each approach to deal with industry-year fixed effects…

Estimates vary considerably (from Table 4)

The differences matter! Example #3 • It also matters in the literature on antitakeover laws • Past papers used AvgE to control for unobserved, time-varying differences across states & industries • Gormley and Matsa (2014) show that properly using an FE estimator with industry-year, state-year, and firm fixed effects changes estimates considerably • E.g., using this framework, they show that managers have an underlying preference to “Play it Safe” • For details, see http://ssrn.com/abstract=2465632
