
Effect Sizes in Education Research: What They Are, What They Mean, and Why They’re Important


Presentation Transcript


  1. Effect Sizes in Education Research: What They Are, What They Mean, and Why They’re Important Howard Bloom (MDRC; Howard.Bloom2@mdrc.org) Carolyn Hill (Georgetown; cjh34@georgetown.edu) Alison Rebeck Black (MDRC; alison.black@mdrc.org) Mark Lipsey (Vanderbilt; mark.lipsey@vanderbilt.edu) Institute of Education Sciences 2006 Research Conference Washington DC

  2. Today’s Session • Goal: introduce key concepts and issues • Approach: focus on nexus between analytics and interpretation • Agenda • Core concepts • Empirical benchmarks • Important applications

  3. Part 1: The Nature (and Pitfalls) of the Effect Size Howard Bloom MDRC

  4. Starting Point • Statistical significance vs. substantive importance • Effect size measures for continuous outcomes (our focus) • Effect size measures for discrete outcomes

  5. The standardized mean difference
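
The transcript drops the formula from this slide; a minimal statement of the usual definition (assuming a pooled within-group standard deviation in the denominator, which the slide itself does not specify) is:

```latex
\text{Effect size} \;=\; \frac{\bar{Y}_T - \bar{Y}_C}{\sigma_{\text{pooled}}},
\qquad
\sigma_{\text{pooled}} \;=\; \sqrt{\frac{(n_T - 1)\,s_T^2 + (n_C - 1)\,s_C^2}{n_T + n_C - 2}}
```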

  6. Relativity of statistical effect sizes

  7. Variance components framework: Decomposing the total national variance
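
A one-line sketch of the decomposition this framework refers to (a two-level school/student split is assumed here; the slide does not list the levels):

```latex
\sigma^2_{\text{total national}} \;=\; \underbrace{\tau^2}_{\text{between schools}} \;+\; \underbrace{\sigma^2_{\varepsilon}}_{\text{within schools (between students)}}
```

Which of these components (or their sum) serves as the denominator of an effect size is what the next two slides turn on.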

  8. Ratio of student-level to school-level standard deviations

  9. Unadjusted vs. regression-adjusted standard deviations

  10. Career Academies and Future Earnings for Young Men Impact on earnings: • Dollars-per-month increase: $212 • Percentage increase: 18% • Effect size: 0.30σ

  11. Aspirin and Heart Attacks Rate of heart attacks: • With placebo: 1.71% • With aspirin: 0.94% • Difference: 0.77% • Effect size: 0.06σ Source: "Measures of Effect Size," in Harris Cooper and Larry V. Hedges, The Handbook of Research Synthesis (New York: Russell Sage Foundation).
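
One way to reconcile the 0.77 percentage-point difference with the reported 0.06σ (an assumption for illustration: the standard deviation of the binary heart-attack indicator, formed from the placebo rate, is used as the denominator):

```latex
\sigma \;\approx\; \sqrt{p(1-p)} \;=\; \sqrt{0.0171 \times 0.9829} \;\approx\; 0.13,
\qquad
ES \;\approx\; \frac{0.0171 - 0.0094}{0.13} \;\approx\; 0.06
```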

  12. Five-year impacts of the Tennessee class-size experiment • Treatment: 13-17 versus 22-26 students per class • Effect sizes: 0.11σ to 0.22σ for reading and math Findings summarized from Nye, Barbara, Larry V. Hedges, and Spyros Konstantopoulos (1999) "The Long-Term Effects of Small Classes: A Five-Year Follow-up of the Tennessee Class Size Experiment," Educational Evaluation and Policy Analysis, Vol. 21, No. 2: 127-142.

  13. Part 2: What’s a Big Effect Size, and How to Tell? Carolyn Hill, Georgetown University Alison Rebeck Black, MDRC

  14. How Big is the Effect? • Need to interpret an effect size when: designing an intervention study, interpreting an intervention study, or synthesizing intervention studies • To assess the practical significance of an effect size: compare it to an external criterion/standard related to the outcome construct and to the context

  15. Prevailing Practice for Interpreting Effect Size: "Rules of Thumb" • Cohen (speculative): small = 0.20σ, medium = 0.50σ, large = 0.80σ [Cohen, Jacob (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd edition (Hillsdale, NJ: Lawrence Erlbaum)] • Lipsey (empirical): small = 0.15σ, medium = 0.45σ, large = 0.90σ [Lipsey, Mark W. (1990) Design Sensitivity: Statistical Power for Experimental Research (Newbury Park, CA: Sage Publications)]

  16. Preferred Approaches for Assessing Effect Size (K-12) • Compare ES from the study with: • ES distributions from similar studies • Student attainment of performance criterion without intervention • Normative expectations for change • Subgroup performance gaps • School performance gaps

  17. ES Distribution from Similar Studies Percentile distribution of 145 achievement effect sizes from meta-analysis of comprehensive school reform studies (Borman et al. 2003):
  Percentile         5th     25th    50th    75th    95th
  Effect size (σ)   -0.06    0.07    0.16    0.25    0.39

  18. Attainment of Performance Criterion Based on Effect Size

  19. Attainment of Performance Criterion (continued)

  20. Normative Expectations for Change: Estimating Annual Reading and Math Gains in Effect Size from National Norming Samples for Standardized Tests • Seven tests were used for reading and six tests were used for math • The mean and standard deviation of scale scores for each grade were obtained from test manuals • The standardized mean difference across succeeding grades was computed • These results were averaged across tests and weighted according to Hedges (1982)
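
A rough Python sketch of the computation these bullets describe. The data structure, test names, and numbers are placeholders, and the pooled-SD choice plus the simple weighted average stand in for the Hedges (1982) precision weighting used in the actual work in progress.

```python
import math

# Hypothetical structure: for each test, scale-score (mean, SD) by grade,
# taken from the test's national norming sample documentation.
# These numbers are placeholders, not real norming data.
reading_norms = {
    "TestA": {3: (610.0, 38.0), 4: (624.0, 37.0)},
    "TestB": {3: (195.0, 21.0), 4: (203.0, 20.0)},
}

def grade_gain_effect_size(norms, grade_from, grade_to):
    """Standardized mean difference between two succeeding grades for one test,
    using the pooled standard deviation of the two grades (an assumption)."""
    m1, s1 = norms[grade_from]
    m2, s2 = norms[grade_to]
    pooled_sd = math.sqrt((s1**2 + s2**2) / 2)
    return (m2 - m1) / pooled_sd

def average_across_tests(all_norms, grade_from, grade_to, weights=None):
    """Average the per-test gains; the weights stand in for the
    precision-style weighting cited on the slide."""
    gains = {t: grade_gain_effect_size(n, grade_from, grade_to)
             for t, n in all_norms.items()}
    if weights is None:
        weights = {t: 1.0 for t in gains}   # unweighted fallback
    total_w = sum(weights[t] for t in gains)
    return sum(weights[t] * g for t, g in gains.items()) / total_w

print(average_across_tests(reading_norms, 3, 4))   # ~0.38 sigma with these placeholders
```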

  21. Annual Reading and Math Growth
  ---------------------------------------------------------------
  Grade          Reading Growth       Math Growth
  Transition     Effect Size (σ)      Effect Size (σ)
  ---------------------------------------------------------------
  K - 1              1.59                 1.13
  1 - 2               0.94                 1.02
  2 - 3               0.57                 0.83
  3 - 4               0.37                 0.50
  4 - 5               0.40                 0.59
  5 - 6               0.35                 0.41
  6 - 7               0.21                 0.30
  7 - 8               0.25                 0.32
  8 - 9               0.26                 0.19
  9 - 10              0.20                 0.22
  10 - 11             0.21                 0.15
  11 - 12             0.03                 0.00
  ---------------------------------------------------------------
  Based on work in progress using documentation on the national norming samples for the CAT5, SAT9, Terra Nova CTBS, Gates MacGinitie, MAT8, Terra Nova CAT, and SAT10.

  22. Demographic Performance Gaps from Selected Tests • Interventions may aim to close demographic performance gaps • Effectiveness of interventions can be judged relative to the size of gaps they are designed to close • Effect size gaps vary across grades, years, tests, and districts

  23. Performance Gaps between “Average” and “Weak” Schools • Main idea: • What is the performance gap (effect size) for the same types of students in different schools? • Approach: • Estimate a regression model that controls for student characteristics: race/ethnicity, prior achievement, gender, overage for grade, and free lunch status. • Infer performance gap (effect size) between schools at different percentiles of the performance distribution
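
A sketch of one way to implement this approach, assuming a student-level data file and using school fixed effects to obtain regression-adjusted school means; the file name, variable names, and the 10th-vs-50th percentile comparison are illustrative, not taken from the slide.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical student-level records; column names are illustrative:
# test_score, prior_score, female, overage, free_lunch, race_ethnicity, school_id
df = pd.read_csv("student_records.csv")  # placeholder file name

# Regression-adjust for student characteristics, with school fixed effects.
model = smf.ols(
    "test_score ~ prior_score + female + overage + free_lunch"
    " + C(race_ethnicity) + C(school_id)",
    data=df,
).fit()

# Adjusted school effects = coefficients on the school indicators
# (differences between schools are unaffected by the omitted reference school).
school_effects = np.array(
    [v for k, v in model.params.items() if k.startswith("C(school_id)")]
)

# Gap between an "average" (50th percentile) and a "weak" (10th percentile)
# school, expressed in student-level standard deviation units.
student_sd = df["test_score"].std()
gap = (np.percentile(school_effects, 50)
       - np.percentile(school_effects, 10)) / student_sd
print(f"Average-vs-weak school gap: {gap:.2f} student SDs")
```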

  24. Interpreting the Magnitude of Effect Sizes • “One size” does not fit all • Instead, interpret magnitudes of effects in context • Of the interventions being studied • Of the outcomes being measured • Of the samples/subsamples being examined • Consider different frames of reference in context, instead of a universal standard: • ES distributions, external performance criteria, normative change, subgroup/school gaps, etc.

  25. Part 3: Using Effect Sizes in Power Analysis and Research Synthesis Mark W. Lipsey Vanderbilt University

  26. Statistical Power • The probability that a true intervention effect will be found statistically significant.

  27. Estimating Statistical Power Prospectively: Finding the MDE Specify: • alpha level (conventionally .05) • sample size (at all levels, if a multilevel design) • correlation between any covariates to be used and the dependent variable • intracluster correlation coefficients (ICCs), if a multilevel design • target power level (conventionally .80) Estimate: the minimum detectable effect size
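
A small Python sketch of the prospective calculation for a two-level design, in the spirit of Bloom (1995); the formula, the degrees-of-freedom approximation, and the example parameter values are assumptions for illustration, not a reproduction of any particular package.

```python
from math import sqrt
from scipy.stats import t

def mde_two_level(J, n, icc, r2_school=0.0, r2_student=0.0,
                  alpha=0.05, power=0.80, p_treated=0.5):
    """Minimum detectable effect size (in SD units) for a two-level design
    with whole classrooms or schools randomized.

    J = number of clusters, n = students per cluster, icc = intracluster
    correlation, r2_* = variance explained by covariates at each level.
    """
    df = J - 2                                    # rough degrees-of-freedom approximation
    multiplier = t.ppf(1 - alpha / 2, df) + t.ppf(power, df)
    var_term = (icc * (1 - r2_school) / J
                + (1 - icc) * (1 - r2_student) / (J * n))
    return multiplier * sqrt(var_term / (p_treated * (1 - p_treated)))

# Example: 40 classrooms of 20 students, ICC = .15, no covariates -> roughly 0.40 sigma
print(round(mde_two_level(J=40, n=20, icc=0.15), 2))
```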

  28. Assessing the MDE • Compare with a target effect size: the smallest ES judged to have practical significance in the intervention context • Design is underpowered if MDE > target value (back to the drawing board) • Design is adequately powered if MDE ≤ target value

  29. Where Do You Get the Target Value for Practical Significance? • NOT some broad rule of thumb, e.g., Cohen's "small," "medium," and "large" • Use a frame of reference appropriate to the outcome, population, and intervention • meaningful success criterion • research findings for similar interventions • change expected without intervention • gaps between relevant comparison groups • et cetera

  30. Selecting the Target MDE • Identify one or more reference frames that may be applicable to the intervention circumstances • Use that frame to guide selection of an MDE; involve other stakeholders • Use different reference frames to consider: • which is most applicable to the context • how sensitive the choice is to the frames • what the most conservative selection might be

  31. Power for Different Target MDEs (2-level design: students in classrooms) [Figure: power versus number of classrooms of n = 20, with ICC = .15; separate curves for ES = .20, .50, and .80; reference line at power = .80]

  32. Power for Different Target MDEs (same design, with a classroom-level covariate, R² = .50) [Figure: power versus number of classrooms of n = 20, with ICC = .15; separate curves for ES = .20, .50, and .80; reference line at power = .80]

  33. Interpreting Effect Sizes Found in Individual Studies & Meta-Analysis • The practical significance of empirically observed effect sizes should be interpreted using approaches like those described here • This is especially important when disseminating research results to practitioners and policymakers • For standardized achievement measures, the practical significance of ES values will vary by student population and grade.

  34. Example: Computer-Assisted Instruction for Beginning Reading (Grades 1-4) Consider an MDE = .25 • Mean ES = .25 found in the Blok et al. (2002) meta-analysis • A 27-65% increase over "normal" year-to-year growth, depending on age • About 30% of the Grade 4 majority-minority achievement gap
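
For concreteness, the 27-65% range can be related to the annual reading gains on slide 21 (an assumption about which figures underlie the slide's range):

```latex
\frac{0.25}{0.94} \;\approx\; 27\% \ \text{of the grade 1--2 gain},
\qquad
\frac{0.25}{0.37\text{--}0.40} \;\approx\; 63\%\text{--}68\% \ \text{of the grade 3--5 gains}
```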

  35. References
  Bloom, Howard S. 2005. "Randomizing Groups to Evaluate Place-Based Programs." In Howard S. Bloom, editor, Learning More from Social Experiments: Evolving Analytic Approaches. New York: Russell Sage Foundation, pp. 115-172.
  Bloom, Howard S. 1995. "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." Evaluation Review 19(5): 547-556.
  Borman, Geoffrey D., Gina M. Hewes, Laura T. Overman, and Shelly Brown. 2003. "Comprehensive School Reform and Achievement: A Meta-Analysis." Review of Educational Research 73(2): 125-230.
  Hedges, Larry V. 1982. "Estimation of Effect Size from a Series of Independent Experiments." Psychological Bulletin 92(2): 490-499.
  Kane, Thomas J. 2004. "The Impact of After-School Programs: Interpreting the Results of Four Recent Evaluations." William T. Grant Foundation Working Paper, January 16. http://www.wtgrantfoundation.org/usr_doc/After-school_paper.pdf
  Konstantopoulos, Spyros, and Larry V. Hedges. 2005. "How Large an Effect Can We Expect from School Reforms?" Working Paper #05-04, Institute for Policy Research, Northwestern University. http://www.northwestern.edu/ipr/publications/papers/2005/WP-05-04.pdf
  Lipsey, Mark W. 1990. Design Sensitivity: Statistical Power for Experimental Research. Thousand Oaks, CA: Sage Publications.
  Schochet, Peter Z. 2005. "Statistical Power for Random Assignment Evaluations of Education Programs." Project report submitted by Mathematica Policy Research, Inc. to the Institute of Education Sciences, U.S. Department of Education. http://www.mathematica-mpr.com/publications/PDFs/statisticalpower.pdf

  36. Contact Information Howard Bloom (Howard.Bloom2@mdrc.org) Carolyn Hill (cjh34@georgetown.edu) Alison Rebeck Black (alison.black@mdrc.org) Mark Lipsey (mark.lipsey@vanderbilt.edu)
