An introduction to Impact Evaluation
Markus Goldstein, Poverty Reduction Group, The World Bank
My question is: are we making an impact?
2 parts
• Monitoring, evaluation and impact evaluation
• The impact evaluation problem
• Introduce fertilizer example
What is M&E?
There is a difference between M and E!
Monitoring: the gathering of evidence to show what progress has been made in the implementation of programs. Focuses on inputs and outputs, but will often include outcomes as well.
Evaluation: measuring changes in outcomes and evaluating the impact of specific interventions on those outcomes.
Monitoring
[Chart: an indicator tracked over years 1–5, rising from 20% toward a 50% target; actual results plotted against targets]
Regular collection and reporting of information to track whether actual results are being achieved as planned.
• Periodically collect data on the indicators and compare actual results with targets
• Identify bottlenecks and red flags (time lags, fund flows)
• Point to what should be investigated further
(A minimal sketch of this target-tracking check follows.)
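To make "compare actual results with targets" concrete, here is a minimal illustrative sketch in Python; the indicator values and targets are hypothetical, not from the presentation.

```python
# Minimal sketch of monitoring logic: compare actual indicator values
# against annual targets and flag shortfalls for follow-up.
# All numbers here are hypothetical illustrations.

targets = {1: 0.20, 2: 0.30, 3: 0.40, 4: 0.45, 5: 0.50}   # target by year
actuals = {1: 0.20, 2: 0.28, 3: 0.33}                      # data collected so far

for year, target in sorted(targets.items()):
    if year not in actuals:
        continue  # no data collected yet for this year
    gap = actuals[year] - target
    status = "on track" if gap >= 0 else "RED FLAG: investigate"
    print(f"Year {year}: actual {actuals[year]:.0%} vs target {target:.0%} -> {status}")
```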
Evaluation
[Chart: an indicator over years 1–5 comparing the "with project" trajectory against the "without project" trajectory; the gap between the two is the effect of interest]
Analytical efforts to answer specific questions about the performance of program activities. Oriented to answering WHY and HOW.
• Analyses why intended results were or were not achieved
• Explores unintended results
• Provides lessons learned and recommendations for improvement
Complementary roles for M&E
Monitoring:
• Routine collection of information
• Tracking implementation progress
• Measuring efficiency
• "Is the project doing things right?"
Evaluation:
• Ex-post assessment of effectiveness and impact
• Confirming (or not) project expectations
• Measuring impacts
• "Is the project doing the right things?"
Selecting indicators: the "CREAM" of good performance (Schiavo-Campo 2000)
• Clear
• Relevant
• Economic
• Adequate
• Monitorable
Compare with SMART indicators:
• Specific
• Measurable
• Attributable
• Realistic and relevant
• Time-bound
And some other thoughts on monitoring
• Information must be available in time for it to be put to use
• Think about how the information will be used when deciding what to collect
• Monitoring is not about the quantity of indicators; it is about their quality
Thinking about types of evaluation
• "e" lies in between M and IE (impact evaluation)
• Analyzing existing information (baseline data, monitoring data)
• Drawing intermediate lessons
• Serves as a feedback loop into project design
• Useful for analyzing and understanding processes, not for establishing causality
Examples of non-impact evaluation approaches
Non-comparative designs (no counterfactual required):
• Case study
• Rate of return analysis: present discounted value (e.g. by subprojects in a CDD portfolio)
• Process analysis (e.g. understanding how inputs translate into outputs)
• Lot quality assurance
Examples of how "e" helps
Timely information to:
• Revise targeting: a watershed project found that 30% of the livelihood component (meant exclusively for marginal and landless farmers) was benefiting small and large farmers. This information was used to make mid-course corrections.
• Monitor progress: in a CDD project, several goats purchased with project funding died. This led to the introduction of livestock insurance as a prerequisite.
• Monitor the implementing agency: an NGO only built pit greenhouses (supply driven or demand driven?)
Monitoring and IE
[Diagram: the results chain, read from the bottom up]
• INPUTS
• OUTPUTS: the government/program production function
• OUTCOMES: users meet service delivery
• IMPACTS: program impacts confounded by local, national, and global effects, hence the difficulty of showing causality
Impact evaluation
• It goes by many names (e.g. Rossi et al. call it impact assessment), so focus on the concept rather than the label
• Impact is the difference between outcomes with the program and without it
• The goal of impact evaluation is to measure this difference in a way that attributes the difference to the program, and only the program (see the notation sketch below)
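In standard potential-outcomes notation (added here for illustration; the slides state the same definition in words):

```latex
% Potential-outcomes notation: a sketch consistent with the slide's
% definition of impact, not taken verbatim from the presentation.
% Y_i(1): outcome for unit i with the program
% Y_i(0): outcome for the same unit without the program
\[
\text{impact}_i = Y_i(1) - Y_i(0),
\qquad
\text{ATE} = \mathbb{E}\!\left[\,Y_i(1) - Y_i(0)\,\right].
\]
% Only one of Y_i(1), Y_i(0) is ever observed for a given unit --
% the missing-counterfactual problem discussed on the next slides.
```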
Why it matters
• We want to know if the program had an impact, and the average size of that impact
• Understand if policies work
• Justification for the program (big $$)
• Scale up or not: did it work?
• Compare different policy options within a program
• Meta-analyses: learning from others
• (With cost data) understand the net benefits of the program
• Understand the distribution of gains and losses
What we need
The difference in outcomes with the program versus without the program, for the same unit of analysis (e.g. an individual).
• Problem: individuals only have one existence
• Hence we have a missing counterfactual, a problem of missing data
Thinking about the counterfactual
• Why not compare individuals before and after (the reflexive comparison)?
• Because the rest of the world moves on, and you cannot tell what was caused by the program and what by the rest of the world (see the notation below)
• We need a control/comparison group that allows us to attribute any change in the "treatment" group to the program (causality)
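In notation (added for illustration, following the potential-outcomes sketch above): a before/after comparison bundles the program effect with whatever the rest of the world did over the same period.

```latex
% Why the reflexive comparison fails: notation added for illustration.
% Y^{before} and Y^{after} are outcomes for program participants.
\[
\mathbb{E}\!\left[Y^{\text{after}} - Y^{\text{before}} \mid \text{participant}\right]
= \underbrace{\text{program effect}}_{\text{what we want}}
+ \underbrace{\text{secular change (e.g. a drought)}}_{\text{what the rest of the world did}}
\]
```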
Comparison group issues
Two central problems:
• Programs are targeted: program areas will differ in observable and unobservable ways precisely because the program intended this
• Individual participation is (usually) voluntary: participants will differ from non-participants in observable and unobservable ways
Hence, a comparison of participants with an arbitrary group of non-participants can lead to heavily biased results (see the decomposition below).
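A textbook way to see this bias (the decomposition is standard, not shown in the original slides): the naive difference in observed mean outcomes splits into the effect on participants plus a selection term.

```latex
% Textbook selection-bias decomposition (requires amsmath).
% D_i = 1 for participants; notation added for illustration.
\begin{align*}
\underbrace{\mathbb{E}[Y_i(1) \mid D_i = 1] - \mathbb{E}[Y_i(0) \mid D_i = 0]}_{\text{naive comparison}}
&= \underbrace{\mathbb{E}[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{effect on participants (ATT)}} \\
&\quad + \underbrace{\mathbb{E}[Y_i(0) \mid D_i = 1] - \mathbb{E}[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```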
Example: providing fertilizer to farmers
• The intervention: provide fertilizer to farmers in a poor region of a country (call it region A)
• The program targets poor areas
• Farmers have to enroll at the local extension office to receive the fertilizer
• The program starts in 2002 and ends in 2004; we have data on yields for farmers in the poor region and in another region (region B) for both years
• We observe that the farmers we provide fertilizer to have a decrease in yields from 2002 to 2004
Did the program not work?
• Further study reveals there was a national drought, and everyone's yields went down (failure of the reflexive comparison)
• So we compare the farmers in the program region to those in region B, and find that our "treatment" farmers show a larger decline. Did the program have a negative impact?
• Not necessarily (program placement):
• Farmers in region B have better quality soil (unobservable)
• Farmers in region B have more irrigation, which is key in this drought year (observable)
OK, so let's compare the farmers within region A
• We compare "treatment" farmers with their neighbors; we think the soil is roughly the same
• Suppose we observe that treatment farmers' yields decline by less than comparison farmers'. Did the program work?
• Not necessarily: farmers who went to register with the program may have more ability, and thus could manage the drought better than their neighbors, even if the fertilizer was irrelevant (individual unobservables)
• Suppose instead we observe no difference between the two groups. Did the program not work?
• Not necessarily: what little rain there was caused the fertilizer to run off onto the neighbors' fields (spillover/contamination)
(A small simulation of these traps follows.)
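The simulation below illustrates the fertilizer example with made-up numbers: it builds in a known true effect, a drought, better soil in region B, and self-selection on ability, then shows how each naive comparison misreads that known effect. Everything here is a hypothetical sketch, not data from the presentation.

```python
# Illustrative simulation of the fertilizer example (hypothetical numbers).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
TRUE_EFFECT = 0.3          # true gain in yield (tons/ha) from fertilizer
DROUGHT = -1.0             # national shock between 2002 and 2004

ability = rng.normal(0, 0.5, n)            # unobserved farmer ability
region_b = rng.random(n) < 0.5             # half the farmers live in region B
soil = np.where(region_b, 0.8, 0.0)        # region B has better soil
# In region A, higher-ability farmers are more likely to enroll
enrolled = (~region_b) & (ability + rng.normal(0, 0.5, n) > 0)

yield_2002 = 4.0 + soil + ability + rng.normal(0, 0.2, n)
yield_2004 = (yield_2002 + DROUGHT
              + 0.5 * ability                      # ability helps in a drought
              + TRUE_EFFECT * enrolled)

print("true effect:", TRUE_EFFECT)
# 1. Reflexive (before/after) comparison: polluted by the drought
print("before/after:", (yield_2004 - yield_2002)[enrolled].mean())
# 2. Enrolled farmers in A vs region B: polluted by soil (program placement)
print("A-enrolled vs B:", yield_2004[enrolled].mean() - yield_2004[region_b].mean())
# 3. Enrolled vs non-enrolled neighbors in A: polluted by ability (self-selection)
in_a = ~region_b
print("within A:", yield_2004[enrolled].mean() - yield_2004[in_a & ~enrolled].mean())
```

Run as written, the before/after comparison comes out negative and the within-A comparison far too large, even though the built-in true effect is +0.3.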
The comparison group
In the end, with these naive comparisons, we cannot tell whether the program had an impact. We need a comparison group that is as similar as possible, in observable and unobservable dimensions, to those receiving the program, and that will not receive spillover benefits.
What difference do unobservables make? Microfinance in Thailand
• 2 NGOs in north-east Thailand
• Village banks with loans of 1,500–7,500 baht (up to roughly US$300)
• Borrowers (women) form peer groups, which guarantee individual borrowing
• What would we expect the impacts to be?
Comparison group issues in this case:
• Program placement: villages selected for the program are different in observable and unobservable ways
• Individual self-selection: households which choose to participate in the program are different in observable and unobservable ways (e.g. entrepreneurship)
• Design solution: 2 groups of villages; in comparison villages, allow membership but no loans at first (a sketch of the resulting comparison follows)
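A sketch of the logic of this pipeline design, with hypothetical numbers (this is not Coleman's data or his estimates): members self-select in both groups of villages, so comparing members in bank villages with would-be members in not-yet-served villages nets out selection on who joins.

```python
# Pipeline-design sketch (hypothetical data, individual-level for simplicity).
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
TRUE_EFFECT = 50.0                      # hypothetical income gain from loans

entrepreneurship = rng.normal(0, 1, n)  # unobserved trait driving membership
member = entrepreneurship + rng.normal(0, 1, n) > 0
bank_village = rng.random(n) < 0.5      # village already has a bank?
treated = member & bank_village         # only members in bank villages get loans

income = 500 + 100 * entrepreneurship + TRUE_EFFECT * treated + rng.normal(0, 20, n)

# Naive: borrowers vs everyone else (biased upward by self-selection)
naive = income[treated].mean() - income[~treated].mean()
# Pipeline: members in bank villages vs members in not-yet-served villages
pipeline = income[member & bank_village].mean() - income[member & ~bank_village].mean()
print(f"true effect {TRUE_EFFECT}, naive {naive:.1f}, pipeline {pipeline:.1f}")
```

Because membership is driven by the same unobserved trait in both village groups, the pipeline comparison recovers the built-in effect while the naive one overstates it.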
Results from Coleman (JDE 1999)
Summing up: step back and put it into context…
So … where do you begin?
• Clear objectives for the project (what is the problem?)
• A clear idea of how you will achieve the objectives (causal chain or storyline)
• Outcome focused: answer the question, what visible changes in behavior can be expected among end users as a result of the project, thus validating the causal chain?
Ideally, this is done at the preparation stage.
Design the "M, e and IE" plan
• What? Type of information and data to be consolidated
• How? Procedures and approaches, including methods for data collection and analysis
• Why? How the collected data will support monitoring and project management
• When? Frequency of data collection and reporting
• Who? Focal points, resource persons, and responsibilities
Choose your tools and what they will cover
• Monitoring, a must: for key indicators
• Evaluation: to understand processes and analyze correlations
• Impact evaluation: where you want to establish causal effects