
GY460 Techniques of Spatial Analysis

Lecture 4: Techniques for dealing with spatial sorting and selection (fixed effects, diff-in-diff, matching and discontinuities etc.). Steve Gibbons.



  1. GY460 Techniques of Spatial Analysis Lecture 4: Techniques for dealing with spatial sorting and selection: (fixed effects, diff-in-diff, matching and discontinuities etc.) Steve Gibbons

  2. Introduction • Sometimes we just want to eliminate problems induced by spatial ‘sorting’ and heterogeneity • i.e. differences between places which may lead to ‘confounding’ factors and biased estimates of relationships of interest • Selection (sorting) on observable and unobservable characteristics • Examples: • Eliminating spatial factors from models of firm behaviour • Eliminating geographical influences from models of school quality • Various methods are available for dealing with this; we have looked at some of these already…

  3. Regression models with spatial effects

  4. Data with discrete zones • N observations in the data • Grouped into M zones (regions, districts, neighbourhoods) • E.g. • Cross-section data with more than one cross-sectional observation in each neighbourhood • Or panel data with more than one time period for each neighbourhood

  5. Spatial variation in the mean • Empirical model, with discrete ‘neighbourhoods’ m: $y_{im} = x_{im}\beta + u_m + \varepsilon_{im}$ • $y_{im}$, for observation i in place m, depends on: • $x_{im}$: characteristics of observation i in place m • $\varepsilon_{im}$: unobserved factors for observation i in place m • $u_m$: unobserved factors common to all observations in place m • Cross-sectional case: i = cross-sectional units, m = places • Panel data case: i = time units, m = places

  6. Random effects • Empirical model, with discrete ‘neighbourhoods’ m • If $u_m$ is uncorrelated with $x_{im}$, then OLS is consistent; just like the spatial error model • The composite error terms $u_m + \varepsilon_{im}$ are correlated within spatial groups m • But uncorrelated between spatial groups • Use GLS or ML (assuming normality) for efficient estimates and unbiased standard errors (multi-level modelling)
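
As a sketch of why the errors are correlated within groups but not between them, assume (these symbols and assumptions are not on the slide) that $u_m$ and $\varepsilon_{im}$ are independent of each other and across groups and observations, with variances $\sigma_u^2$ and $\sigma_\varepsilon^2$. The composite error then satisfies:

```latex
\begin{aligned}
\operatorname{Var}(u_m + \varepsilon_{im}) &= \sigma_u^2 + \sigma_\varepsilon^2, \\
\operatorname{Cov}(u_m + \varepsilon_{im},\; u_m + \varepsilon_{jm}) &= \sigma_u^2 \quad (i \neq j), \\
\operatorname{Cov}(u_m + \varepsilon_{im},\; u_{m'} + \varepsilon_{jm'}) &= 0 \quad (m \neq m').
\end{aligned}
```

Under these assumptions the within-group correlation is $\sigma_u^2 / (\sigma_u^2 + \sigma_\varepsilon^2)$, which is the structure GLS exploits.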

  7. Fixed area effects – dummy variables • Empirical model, with discrete ‘neighbourhoods’ m • If $u_m$ is correlated with $x_{im}$, then OLS is inconsistent. • Options: • Estimate the area ‘fixed effects’ using OLS • Least Squares Dummy Variable (LSDV) model: neighbourhood dummy variables

  8. Fixed area effects – within groups • Or ‘within-groups’ transformation: difference the variables from the neighbourhood mean: $y_{im} - \bar{y}_m = (x_{im} - \bar{x}_m)\beta + (\varepsilon_{im} - \bar{\varepsilon}_m)$ • Where $\bar{y}_m$ is the mean of y in group m (and similarly for x and $\varepsilon$) • Eliminates $u_m$ • Estimate by OLS • Only uses deviation of variables from neighbourhood means – so only within-neighbourhood variation counts • LSDV and Within Groups (or ‘Fixed Effect’) models are equivalent
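
The equivalence of LSDV and within-groups can be checked numerically. The following is a minimal sketch on simulated data; the sample sizes, the coefficient value and the helper `group_means` are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: M neighbourhoods, n observations in each.
M, n, beta = 50, 20, 1.5
group = np.repeat(np.arange(M), n)           # neighbourhood index m of each observation
u = rng.normal(size=M)                       # area effect u_m
x = 0.8 * u[group] + rng.normal(size=M * n)  # x is correlated with u_m
y = beta * x + u[group] + rng.normal(size=M * n)


def group_means(v, g, n_groups):
    """Mean of v within each group, broadcast back to observation level."""
    sums = np.bincount(g, weights=v, minlength=n_groups)
    counts = np.bincount(g, minlength=n_groups)
    return (sums / counts)[g]


# Pooled OLS with an intercept: inconsistent here because x is correlated with u_m.
X_pooled = np.column_stack([np.ones_like(x), x])
b_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# LSDV: regress y on x plus a full set of neighbourhood dummies (no intercept).
dummies = (group[:, None] == np.arange(M)[None, :]).astype(float)
X_lsdv = np.column_stack([x, dummies])
b_lsdv = np.linalg.lstsq(X_lsdv, y, rcond=None)[0][0]

# Within groups: demean y and x by neighbourhood, then OLS without an intercept.
x_w = x - group_means(x, group, M)
y_w = y - group_means(y, group, M)
b_within = (x_w @ y_w) / (x_w @ x_w)

print(f"pooled OLS: {b_pooled:.3f}")
print(f"LSDV:       {b_lsdv:.3f}")
print(f"within:     {b_within:.3f}")
```

In this simulation the LSDV and within-groups slopes agree up to floating-point error, while pooled OLS is pulled away from the true value by the correlation between x and the area effect.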

  9. Fixed area effects – panel data • Even better: information with repeated observations on panel units (individuals, firms, regions etc.) over time • Panel data • Now all relationships of interest can be estimated from variation within panel units over time • Use within-groups or first-differences over time, e.g. $y_{it} - y_{it-1} = (x_{it} - x_{it-1})\beta + (v_t - v_{t-1}) + (\varepsilon_{it} - \varepsilon_{it-1})$ • Q: what does $(v_t - v_{t-1})$ represent? How could you control for it? Then, what variation in the data allows us to estimate $\beta$? Hence what do we assume, if $\beta$ is to be estimated consistently?

  10. Dynamic panel data models • It would be useful to estimate a model that includes the lagged dependent variable $y_{it-1}$ – e.g. to estimate the dependence of y on past values (or control for mean reversion) • Q: Can the within-groups version of this model be estimated consistently by OLS? • See Nickell (1981) Econometrica • What about the first-differenced model? • Q: Is there a useful IV here?

  11. Dynamic panel data models • In principle you could use lagged levels such as $y_{it-2}$ as instruments for $(y_{it-1} - y_{it-2})$ • This is the basis of the Arellano-Bond estimator (1991, Review of Economic Studies) • They develop a GMM estimator which weights the instruments taking into account the first-differenced error structure, e.g. implemented in “xtabond” in Stata • Problems: serial correlation in error terms? If the coefficient on lagged y is close to zero the instruments will be very weak (since lagged values don’t predict current values if it is zero) • Can also use lagged differences as instruments for the levels • System GMM (Blundell and Bond 1998): xtabond2
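
The questions on the previous slide can be explored by simulation. This is a minimal sketch under assumed settings (the parameter name `rho`, the sample sizes and the error distributions are all hypothetical). It uses a simple just-identified IV in the spirit of Anderson and Hsiao, with $y_{it-2}$ as the instrument, and is not the Arellano-Bond GMM estimator itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical simulation settings (not from the lecture).
N, T, burn, rho = 5000, 8, 50, 0.5

u = rng.normal(size=N)          # fixed effect u_i
y = u / (1 - rho)               # start each unit at its long-run mean
panel = np.empty((N, T))
for t in range(burn + T):
    y = rho * y + u + rng.normal(size=N)   # y_it = rho * y_it-1 + u_i + e_it
    if t >= burn:
        panel[:, t - burn] = y

# Within-groups OLS of y_t on y_{t-1}: demean both within each unit.
y_cur, y_lag = panel[:, 1:], panel[:, :-1]
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
rho_within = (yl * yc).sum() / (yl * yl).sum()

# First differences remove u_i, but the lagged difference is still
# correlated with the differenced error, so OLS remains inconsistent.
dy = panel[:, 2:] - panel[:, 1:-1]        # y_t - y_{t-1}
dy_lag = panel[:, 1:-1] - panel[:, :-2]   # y_{t-1} - y_{t-2}
rho_fd_ols = (dy_lag * dy).sum() / (dy_lag * dy_lag).sum()

# Simple IV: the level y_{t-2} instruments the lagged difference.
z = panel[:, :-2]
rho_iv = (z * dy).sum() / (z * dy_lag).sum()

print(f"true rho:              {rho:.3f}")
print(f"within-groups OLS:     {rho_within:.3f}")
print(f"first-difference OLS:  {rho_fd_ols:.3f}")
print(f"first-difference IV:   {rho_iv:.3f}")
```

With a short panel, both OLS estimates in this simulation fall well below the true coefficient, whereas the IV estimate is close to it.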

  12. Spatial panel data models • These look attractive e.g. to eliminate sorting i.e. $u_i$ • But this still suffers from the simultaneity problems of the spatial y model – requires maximum likelihood or instruments for the spatial lag of y • Also difficult to defend that there is spatial correlation, but no time dynamics • So you have to estimate a model with both • Have to deal with time-dynamic y and spatial y!

  13. Spatial panel data models • Probably more useful to consider the reduced form e.g.

  14. Difference in difference

  15. Difference-in-difference • Suppose we have places, firms or individuals i observed over time. • Treatment group D=1 is exposed to some treatment (x=1 rather than x=0) at time t=1, whereas a control group D=0 is not • There is selection into the treatment group ($E[f \mid D] \neq 0$) and common time effects g

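  16. Difference-in-difference • The effect of the treatment can be estimated by a “Difference in difference” estimator: the change over time in mean y for the treatment group minus the change over time in mean y for the control group • Note that this is the same as you’d get from OLS on a regression of y on the treatment-group dummy D, a post-treatment time dummy and their interaction (the coefficient on the interaction is the DiD estimate)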
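
A minimal simulated sketch of this equivalence follows. The data-generating values, sample size and variable names are illustrative assumptions; the outcome is generated as treatment effect plus a unit effect f (correlated with D) plus a common time effect g plus noise.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-period panel: n units, half of them in the treatment group.
n, effect = 2000, 1.0
D = np.repeat([0, 1], n // 2)        # treatment-group indicator
f = 2.0 * D + rng.normal(size=n)     # unit effect f_i; its mean differs by group
g = np.array([0.0, 0.5])             # common time effects g_t

periods = []
for t in (0, 1):
    x = D * t                        # treatment switches on for D=1 at t=1
    y_t = effect * x + f + g[t] + rng.normal(size=n)
    periods.append((np.full(n, t), D, y_t))
post = np.concatenate([p[0] for p in periods])
treat = np.concatenate([p[1] for p in periods])
y = np.concatenate([p[2] for p in periods])


def cell_mean(d, t):
    """Mean outcome for group d in period t."""
    return y[(treat == d) & (post == t)].mean()


# Difference in differences of the four cell means.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Naive post-period comparison: contaminated by selection on f.
naive = cell_mean(1, 1) - cell_mean(0, 1)

# OLS on group dummy, time dummy and their interaction.
X = np.column_stack([np.ones(2 * n), treat, post, treat * post])
did_ols = np.linalg.lstsq(X, y, rcond=None)[0][3]

print(f"naive post-period difference: {naive:.3f}")
print(f"DiD from cell means:          {did_means:.3f}")
print(f"DiD from OLS interaction:     {did_ols:.3f}")
```

The two DiD numbers coincide because the regression is saturated in the group and period dummies; the naive comparison picks up the group difference in f as well as the treatment effect.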

  17. Difference-in-difference • The DiD estimator is commonly used for evaluation of policy interventions • DiD doesn’t work if the treatment and control groups have different time trends • Or if the composition of the treatment or control groups changes between the pre- and post-treatment periods e.g.

  18. Matching

  19. Matching estimators • ‘Matching’ tries to do something similar, when treatment and control groups are not both observed pre- and post-policy • Suppose we observe two groups • Suppose the goal is to estimate the “Average effect of the Treatment on the Treated” (ATT) • As we know, a simple difference in means won’t work: • i.e. because the treated and non-treated would have different Y in the absence of treatment

  20. Matching estimators • But suppose we have some observable characteristics Z for which $E[Y_0 \mid Z, D=1] = E[Y_0 \mid Z, D=0]$ • i.e. mean untreated Y for individuals with characteristics Z is the same, whether or not they are in the treatment group • Called the “Conditional Independence Assumption” (CIA) • Allows for selection into treated and non-treated groups by Z (selection on observables), but not by unobservables. • So if you can find individuals in group 0 who have the same Z as those in group 1 you can estimate $E[Y_0 \mid D=1]$ from the individuals in group 0 • If Z is discrete this is straightforward.

  21. Matching estimators • So we can estimate • The naïve estimate of the effect of the treatment is 190-125 = 65

  22. Matching estimators • For the treated, Y0 is unobserved but can be estimated by re-weighting (under the CIA) • So the ATT is 190-180 = 10
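
The re-weighting step with a discrete Z can be sketched as follows. The cell sizes and outcomes below are made up for illustration and are not the table from the slides, so the numbers differ from the 190, 125 and 180 quoted above.

```python
import numpy as np

# Hypothetical cells (NOT the table from the slides).
# Each tuple: treatment indicator D, discrete characteristic Z, outcome Y, cell size.
cells = [
    (1, 0, 100.0, 25),
    (1, 1, 200.0, 75),
    (0, 0, 90.0, 75),
    (0, 1, 190.0, 25),
]
D = np.concatenate([np.full(n, d) for d, z, y, n in cells])
Z = np.concatenate([np.full(n, z) for d, z, y, n in cells])
Y = np.concatenate([np.full(n, y) for d, z, y, n in cells])

# Naive estimate: raw difference in means between treated and untreated.
naive = Y[D == 1].mean() - Y[D == 0].mean()

# Counterfactual mean Y0 for the treated: untreated cell means,
# weighted by the distribution of Z among the treated.
y0_hat = 0.0
for z in np.unique(Z):
    share_treated = np.mean(Z[D == 1] == z)
    y0_hat += share_treated * Y[(D == 0) & (Z == z)].mean()

att = Y[D == 1].mean() - y0_hat

print(f"naive = {naive:.1f}, ATT = {att:.1f}")  # naive = 60.0, ATT = 10.0
```

Most of the raw gap in this made-up example comes from the treated group containing more high-Z individuals, which the re-weighting removes.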

  23. Matching estimators • But what if (as is usual) Z is not discrete? Propensity score matching does this reweighting using an estimate of the probability that an individual with characteristics z is in the treatment group • (Rosenbaum and Rubin (1983) Biometrika) • Requires a first stage estimate of Pr(D=1 | Z) e.g. from a probit or logit regression on Z • Then the treatment effect for an individual i in the treated group can be estimated as • Where the weights depend on the difference between the propensity score for individual i and the untreated controls j, and:
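
One way the two stages could look in code is sketched below, on simulated data where selection depends only on an observable Z. The Gaussian kernel, the bandwidth `h`, the hand-rolled Newton-Raphson logit and all parameter values are assumptions made for illustration; the slide's own weighting formula is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: selection into treatment depends on an observable Z only.
n, effect = 4000, 2.0
Z = rng.normal(size=n)
D = (rng.uniform(size=n) < 1 / (1 + np.exp(-Z))).astype(float)
Y = effect * D + 3.0 * Z + rng.normal(size=n)

# First stage: logit of D on (1, Z), fitted by Newton-Raphson.
X = np.column_stack([np.ones(n), Z])
coef = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ coef))
    grad = X.T @ (D - p)                       # score of the log-likelihood
    hess = (X * (p * (1 - p))[:, None]).T @ X  # negative Hessian
    coef += np.linalg.solve(hess, grad)
pscore = 1 / (1 + np.exp(-X @ coef))

# Second stage: kernel matching on the propensity score. Each treated unit i
# is compared with a weighted average of untreated units j, with weights that
# decline in the distance between their propensity scores.
h = 0.02                                       # assumed bandwidth
p1, y1 = pscore[D == 1], Y[D == 1]
p0, y0 = pscore[D == 0], Y[D == 0]
w = np.exp(-0.5 * ((p1[:, None] - p0[None, :]) / h) ** 2)
w /= w.sum(axis=1, keepdims=True)              # weights sum to one for each i
att = np.mean(y1 - w @ y0)

naive = y1.mean() - y0.mean()
print(f"true effect: {effect:.2f}, naive: {naive:.2f}, matched ATT: {att:.2f}")
```

In this simulation the naive difference is far above the true effect, and the matched estimate is much closer, because selection is on Z alone by construction.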

  24. Matching estimators • In practice matching estimators behave like ‘kitchen sink’ regressions: you are just controlling for as many observable characteristics as possible (Z) • However, you are controlling for these Z in a very non-linear way: like having lots of control variables and their interactions in an OLS regression • Matching estimators allow for heterogeneous treatment effects • You can re-weight in other ways, e.g. to estimate the effect of the treatment on the population, or on the untreated • No solution to selection on unobservables – which is surely the main issue! • Requires “common support”: if there is no overlap between Z in the treated and untreated groups, you can’t match.

  25. Discontinuity designs

  26. Discontinuity designs • Regression discontinuity method tries to identify causal effects from abrupt changes • Requires a discontinuity induced by institutional rules, policy etc. • e.g. majority voting • Class size rules – e.g. Maimonides’ rule • Geographical administrative boundaries • Assumption is that assignment to treatment is determined by some covariate X when it reaches a value c • The outcome is otherwise only related to X by a smooth function e.g. E[y|X] = m(X)

  27. Discontinuity designs

  28. Discontinuity designs • So • Idea is to estimate the average effect of the treatment at the discontinuity point • We could control for m(x) parametrically (polynomial series etc.) • Or restrict the sample to observations for which x is close to c i.e.
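
Both options can be sketched on simulated data. Everything below is an illustrative assumption: the smooth function m(x), the cutoff `c`, the jump `tau`, the bandwidth `h`, and the use of a local linear fit with separate slopes on each side of the cutoff.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: treatment switches on when x reaches the cutoff c.
n, c, tau, h = 8000, 0.0, 1.0, 0.4
x = rng.uniform(-2, 2, size=n)
D = (x >= c).astype(float)
y = x + 0.5 * x**2 + tau * D + rng.normal(scale=0.5, size=n)  # m(x) + tau*D + noise

# Naive comparison of means: picks up m(x) as well as the treatment.
naive = y[D == 1].mean() - y[D == 0].mean()

# Option 1: control for m(x) parametrically with a cubic polynomial in x.
X_poly = np.column_stack([np.ones(n), D, x - c, (x - c) ** 2, (x - c) ** 3])
tau_poly = np.linalg.lstsq(X_poly, y, rcond=None)[0][1]

# Option 2: keep only observations within h of the cutoff and fit a
# local linear regression with a separate slope on each side.
near = np.abs(x - c) <= h
xc = x[near] - c
X_loc = np.column_stack([np.ones(near.sum()), D[near], xc, D[near] * xc])
tau_local = np.linalg.lstsq(X_loc, y[near], rcond=None)[0][1]

print(f"true jump: {tau:.2f}")
print(f"naive difference in means: {naive:.2f}")
print(f"global cubic control:      {tau_poly:.2f}")
print(f"local linear, |x - c| <= {h}: {tau_local:.2f} (n = {near.sum()})")
```

Shrinking `h` reduces the influence of m(x) on the local estimate but leaves fewer observations, which is the sample-size trade-off that arises in practical applications.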

  29. Boundary discontinuities • [Figure: an admissions boundary separating district A and district B. Labels: school quality in district A; school quality in district B; price and homeowner characteristics on each side; unobserved local amenity; +ve quality-price relationship across the boundary]

  30. Discontinuity designs • In principle, X is identical for treatment and controls exactly at the discontinuity • But practical applications require non-zero differences between X and the discontinuity point • E.g. can rarely find a large enough sample of housing transactions exactly on the boundary • Trade-off between adequate sample size and elimination of biases due to m(x) • We looked at practical spatial examples – e.g. Black (1999), Duranton et al (2006) • See also Gibbons, S., Machin, S. and Silva, O. (2009), Valuing School Quality Using Boundary Discontinuity Regressions, SERC DP0018 http://www.spatialeconomics.ac.uk/textonly/SERC/publications/download/sercdp0018.pdf

  31. Applications to spatial policy evaluation • Research designs can incorporate elements of all these methods e.g. match treatment and control groups using propensity score matching, then implement diff-in-diff • Machin, S., McNally, S., Meghir, C. (2007), Resources and Standards in Urban Schools, IZA DP2653 http://ftp.iza.org/dp2653.pdf • Busso, M. and P. Kline (2006) Do Local Economic Development Programs Work? Evidence from the Federal Empowerment Zone Program, http://www.econ.berkeley.edu/~pkline/papers/Busso-Kline%20EZ%20(web).pdf • Romero, R. and M. Noble (2008) Evaluating England’s New Deal for Communities Programme Using the Difference in Difference method, Journal of Economic Geography 8(6): 1-20

  32. The partial linear model

  33. Continuous space • A general model with spatial heterogeneity: • $S_i$ is an index of the location of observation i • Model continuous unobserved variation over space • m(.) is supposed to represent large-scale predictable variation over space – e.g. land values • $\varepsilon$: random shocks – e.g. sales price of specific houses • We discussed these issues in the lecture on smoothing • Could do it parametrically e.g. polynomial series or Cheshire and Sheppard (1995) – see earlier lectures

  34. Partial linear model • Suppose $y_i = x_i\beta + m(s_{1i}, s_{2i}) + \varepsilon_i$ • If we know $\beta$, the function m(.) is just the expected (mean) value of $y - x\beta$ given the location $(s_1, s_2)$ • Refer to the lecture on smoothing: this can be inferred from values of y in neighbouring locations once we know $\beta$ • Spatial weighting again • Kernel weighting, nearest neighbours etc.

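  35. Semi-parametric spatial models • Must get estimates of $\beta$ first? How? • e.g. see Robinson (1988), Econometrica, ‘Root-N-Consistent Semiparametric Regression’ • Estimate averages of y and all x at each point in the data, non-parametrically • Estimate the betas by OLS on the deviations from these local averages: $y_i - \hat{E}[y \mid s_i] = (x_i - \hat{E}[x \mid s_i])\beta + \varepsilon_i$ • Note: analogy to the within-groups model • Can then estimate m(.) by smoothing $y - x\hat{\beta}$ over space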
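
The three steps can be sketched on simulated data as follows. The Gaussian kernel, the bandwidth `h`, the form of m(.) and all parameter values are assumptions chosen for illustration, and the smoother here is a plain Nadaraya-Watson average rather than whatever a particular application would use.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data on the unit square.
n, beta, h = 1500, 2.0, 0.07
s = rng.uniform(size=(n, 2))                             # locations (s1, s2)
m = 3.0 * np.sin(2 * np.pi * s[:, 0]) + 2.0 * np.cos(2 * np.pi * s[:, 1])
x = np.sin(2 * np.pi * s[:, 0]) + rng.normal(size=n)     # x also varies smoothly over space
y = beta * x + m + rng.normal(size=n)

# Gaussian kernel weights between all pairs of locations, rows normalised to one.
d2 = ((s[:, None, :] - s[None, :, :]) ** 2).sum(axis=2)
W = np.exp(-0.5 * d2 / h**2)
W /= W.sum(axis=1, keepdims=True)

# Step 1: local (kernel-weighted) averages of y and x at each data point.
y_res = y - W @ y
x_res = x - W @ x

# Step 2: OLS of the y deviations on the x deviations gives beta.
beta_hat = (x_res @ y_res) / (x_res @ x_res)

# Step 3: smooth y - x*beta_hat over space to recover m(.).
m_hat = W @ (y - beta_hat * x)

# For comparison: OLS of y on x that ignores m(.) altogether.
X_naive = np.column_stack([np.ones(n), x])
beta_naive = np.linalg.lstsq(X_naive, y, rcond=None)[0][1]

print(f"true beta: {beta:.2f}")
print(f"OLS ignoring m(.):   {beta_naive:.2f}")
print(f"partial linear beta: {beta_hat:.2f}")
print(f"corr(m_hat, m):      {np.corrcoef(m_hat, m)[0, 1]:.3f}")
```

The demeaning by local averages plays the same role here as demeaning by neighbourhood means in the within-groups model, with the kernel weights standing in for group membership.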

  36. Applications to housing analysis • Clapp, J. M., H.-J. Kim, and A. E. Gelfand (2002): "Predicting Spatial Patterns of House Prices Using LPR and Bayesian Smoothing," Real Estate Economics, 30(4), 505-532 • Use of non-parametric methods to construct house price indices • Gibbons, S., and S. Machin (2003): "Valuing English Primary Schools," Journal of Urban Economics, 53(2). • Use of the semi-parametric model for eliminating larger-scale neighbourhood effects on school performance

  37. Conclusions • The underlying issue we have considered is selection or sorting, e.g. people, firms etc. of different types sort into different locations and this can lead to biased estimates of causal relationships • Selection can be on unobservables, or observables • We considered various techniques for dealing with these problems • Other solutions (random assignment, IV) we have considered or will consider elsewhere.
