
Test co-calibration and equating

Test co-calibration and equating. Paul K. Crane, MD MPH, General Internal Medicine, University of Washington. Outline: definitions and motivation; educational testing literature; concurrent administration designs; separate administration designs; PARSCALE coding considerations.



Presentation Transcript


  1. Test co-calibration and equating Paul K. Crane, MD MPH General Internal Medicine University of Washington

  2. Outline • Definitions and motivation • Educational testing literature • Concurrent administration designs • Separate administration designs • PARSCALE coding considerations • Illustration with CSI ‘D’ and CASI • Coming attractions; comments

  3. Definition • Distinction between “equating” and “co-calibration” • We almost always mean “co-calibration” • General idea is to get all tests of a kind on the same metric • Error terms will likely differ, but tests are trying to measure the same thing

  4. 5 things needed for “equating” • Scales measure the same concept • Scales have the same level of precision • The procedure from scale A to B is the inverse of the procedure from scale B to A • The distribution of scores should be identical for individuals at a given trait level • The equating function should be population invariant • (Linn, 1993; Mislevy, 1992; Dorans, 2000)

  5. Motivation for co-calibration • Many tests measure “the same thing” • MMSE, 3MS, CASI, CSI ‘D’, Hasegawa, Blessed…. • PRIME-MD, CESD, HAM-D, BDI, SCID…. • Literature only interpretable if one is familiar with the nuances of the test(s) used • Studies that employ multiple measures (such as the CHS) face difficulty in incorporating all their data into their analyses • In sum: facilitates interpretation and analysis

  6. Educational literature • Distinct problems: • Multiple levels of same topic, e.g. 4th grade math, 5th grade math, etc. (“vertical” equating) • Multiple forms of same test, e.g. dozens of forms of SAT, GRE to prevent cheating (“horizontal” equating) • Making sure item difficulty is constant year to year (item drift analyses)

  7. Strategies are the same • Either need common items administered in different populations, or common people taking different tests • Analyze one big dataset that contains all items and all people • Verify that the common elements (people or items) are behaving as expected
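The “one big dataset” idea above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: two hypothetical studies administer different tests that share a pair of anchor items (`c1`, `c2`); stacking them produces a single response matrix in which items a study never gave are simply missing.

```python
# Common-item linking layout (hypothetical item names and response data):
# study A gives items a1, a2 plus anchors c1, c2; study B gives b1, b2
# plus the same anchors. Stacking yields one matrix over all six items,
# with NaN where a study did not administer an item.
import pandas as pd

study_a = pd.DataFrame({"a1": [1, 0, 1], "a2": [0, 0, 1],
                        "c1": [1, 1, 0], "c2": [0, 1, 1]})
study_b = pd.DataFrame({"b1": [1, 1], "b2": [0, 1],
                        "c1": [1, 0], "c2": [1, 1]})

# Concatenate; columns unique to one study become NaN for the other
combined = pd.concat([study_a, study_b], ignore_index=True, sort=False)

print(combined.shape)               # (5, 6): 5 examinees, 6 items
print(combined["b1"].isna().sum())  # 3: study A examinees never saw b1
```

The shared anchor columns (`c1`, `c2`) are observed for everyone, which is what lets a calibration run place both tests on one metric.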

  8. Concurrent administration • Common population design:

  9. Separate administration • Anchor test design – e.g., McHorney

  10. Item bank development

  11. Comments • Fairly simple; God is in the details! • Afternoon workgroup will address the details • Illustration to follow

  12. PARSCALE code • For concurrent administration, it’s as if there is a single longer test • For separate administration, basically a lot of missing data • Once data are in correct format, PARSCALE does the rest
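The “lot of missing data” point above can be made concrete with a small formatting sketch. Everything here is an assumption for illustration: the subject IDs, item names, column order, and the choice of `9` as the not-presented code are hypothetical, and the actual code and fixed-column layout must match whatever the PARSCALE command file declares.

```python
# Write one combined fixed-format response string per examinee, filling
# items an examinee's study never administered with a not-presented code
# (here "9" -- a hypothetical choice; it must agree with the codes the
# PARSCALE command file specifies).
records = [
    ("A001", {"a1": 1, "a2": 0, "c1": 1, "c2": 0}),  # study A examinee
    ("B001", {"b1": 1, "b2": 1, "c1": 0, "c2": 1}),  # study B examinee
]
items = ["a1", "a2", "b1", "b2", "c1", "c2"]  # one combined item order
NOT_PRESENTED = "9"

lines = []
for subject_id, responses in records:
    row = "".join(str(responses.get(item, NOT_PRESENTED)) for item in items)
    lines.append(f"{subject_id} {row}")

print("\n".join(lines))
# A001 109910
# B001 991101
```

Once every examinee has a response string over the full combined item set, the concurrent and separate designs look identical to the software: one longer test, with missingness doing the work of the design.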

  13. Illustration: CSI ‘D’ and CASI

  14. Information curves

  15. SEM

  16. Relative information

  17. Coming attractions • Optimizing screening tests from a pool of items (on Friday) • Item banking and computer adaptive testing (PROMIS initiative) • Incorporation of DIF assessment (tomorrow) • Comments and questions
