Presentation Transcript


  1. A Model for Scaling, Linking, and Reporting Through-Course Summative Assessments • Rebecca Zwick & Robert J. Mislevy, Educational Testing Service • NCSA – Orlando, June 20, 2011

  2. Overview of Presentation • Desired properties of a model for analyzing through-course summative assessments (TCAs) • Description of the proposed model • A possible simplification • Recommendations

  3. Desired Properties The model must be able to: • Yield proficiency estimates for individuals & groups, accommodating different patterns of instruction • Provide end-of-year summaries & growth measures • Provide results that are comparable across classrooms, schools, districts, & states • Incorporate items that vary in instructional sensitivity (implies multidimensionality)

  4. Model Characteristics • Accommodates inferences desired from TCAs; can be a framework for studying plausible submodels • Has 2 components: • Multidimensional item response theory (MIRT) component specifies dependency of item responses on proficiency • Population component models association between proficiency and background variables. • Builds on NAEP, TIMSS, & PISA experience

  5. Notation: θ_i = vector of student proficiencies (note that θ is multidimensional) • x_i = vector of item responses for student i (includes dichotomous & polytomous) • y_i = vector of background variables for student i (more later on this)

  6. Bayesian model (see Mislevy, 1985): the posterior distribution of θ_i for student i is p(θ_i | x_i, y_i) ∝ p(x_i | θ_i) · p(θ_i | y_i), where ∝ means “is proportional to,” p(x_i | θ_i) is the MIRT model, and p(θ_i | y_i) is the conditional distribution of θ_i given the background variables.
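A minimal numerical sketch of this update, reduced to one proficiency dimension with a 2PL response model for readability (the slides' model is multidimensional, and every parameter value below is invented for illustration):

```python
import numpy as np

theta = np.linspace(-4, 4, 401)                # grid over proficiency

# MIRT component, simplified here to a unidimensional 2PL
a = np.array([1.0, 1.4, 0.8])                  # discriminations (assumed)
b = np.array([-0.5, 0.0, 1.0])                 # difficulties (assumed)
x = np.array([1, 1, 0])                        # student i's item responses

p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))      # grid x items
likelihood = np.prod(p**x * (1.0 - p)**(1 - x), axis=1)  # p(x_i | theta)

# Population component: normal prior whose mean depends on y_i,
# e.g., via a regression of theta on the background variables
prior_mean = 0.3                               # assumed value for this student
prior = np.exp(-0.5 * (theta - prior_mean) ** 2)

posterior = likelihood * prior                 # proportional to p(theta | x, y)
posterior /= posterior.sum()                   # normalize over the grid

print("posterior mean:", round(float((theta * posterior).sum()), 3))
```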

  7. Model Assumptions Our model relies on the traditional IRT assumption of conditional independence: i.e., item responses are independent, conditional on θ, which implies that the same item response model holds across groups and over time. MIRT makes this possible; our goal is to find an interpretable model that satisfies the assumption. Model testing is needed.
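In symbols, conditional independence factors the response model over the J items (a standard statement of the assumption, using the notation above):

```latex
P(x_i \mid \theta_i) = \prod_{j=1}^{J} P(x_{ij} \mid \theta_i)
```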

  8. Why include any background variables in the model? • Improves precision of estimation and avoids biases, especially when item data are sparse. • y_i includes both demographic variables and instructional variables, such as nature and order of curriculum. (y_i will vary over time.) • For fairness reasons, demographic variables are not included when estimating individual scores.

  9. Estimating individual and group proficiency • For individual estimates, use mean or mode of posterior distribution • Group characteristics are NOT estimated using aggregates of individual estimates • Distribution of optimal individual estimates is not the optimal estimate of the population distribution.
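A quick simulation of the last point, using a normal-normal case with invented numbers: posterior means are shrunken toward the prior, so their spread understates the spread of the true proficiency distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
theta_true = rng.normal(0.0, 1.0, n)           # true proficiencies, SD = 1
se = 0.8                                       # measurement error SD (assumed)
obs = theta_true + rng.normal(0.0, se, n)      # noisy observed scores

# Normal-normal posterior mean shrinks each score toward the prior mean
shrink = 1.0 / (1.0 + se**2)
post_means = shrink * obs                      # optimal individual estimates

print("SD of true proficiencies:", round(float(theta_true.std()), 3))  # ~1.00
print("SD of posterior means:   ", round(float(post_means.std()), 3))  # ~0.78
```

Aggregating the posterior means would therefore understate population variance, which is why operational programs such as NAEP use plausible-values machinery instead.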

  10. Market-basket (MB) reporting (Mislevy, 2003) • Report results in terms of a scale based on a “market-basket” of selected tasks (must be calibrated). • Using observed data, we can generate predicted responses to these MB tasks.
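The prediction step can be written as a posterior expectation. The formula on the original slide did not survive extraction, so the following is a reconstruction in the notation above, for a dichotomous MB task j that student i did not take:

```latex
E(\tilde{x}_{ij} \mid x_i, y_i)
  = \int P(\tilde{x}_{ij} = 1 \mid \theta)\, p(\theta \mid x_i, y_i)\, d\theta
```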

  11. MB Reporting - Example • Suppose 100 items are to be administered during a year; these could be the market-basket • Assume 4 TCAs, each with 25 of these 100 items • For each TCA, each student has actual responses for 25 items and predicted responses for 75 items • Score is expectation of sum of responses over all 100 items
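A sketch of this scoring arithmetic for one student, with 2PL probabilities at a posterior-mean proficiency standing in for the model-based predictions (the full model integrates over the posterior; every parameter value here is invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# 100-item market basket; this student took TCA #1 (items 0-24)
a = rng.uniform(0.7, 1.5, 100)                 # discriminations (assumed)
b = rng.normal(0.0, 1.0, 100)                  # difficulties (assumed)
not_taken = np.arange(25, 100)                 # the 75 unadministered items

observed = rng.integers(0, 2, size=25)         # the 25 actual responses

theta_hat = 0.4                                # assumed posterior-mean proficiency
p_rest = 1.0 / (1.0 + np.exp(-a[not_taken] * (theta_hat - b[not_taken])))

# Expected sum over all 100 items: actual responses + predicted probabilities
mb_score = observed.sum() + p_rest.sum()
print("market-basket score (0-100):", round(float(mb_score), 1))
```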

  12. Market-basket reporting (cont.) • “Behind-the-scenes” machinery is complex, but resulting scores “look like” ordinary test scores: multivariate θ is mapped onto a unidimensional scale. • Can use for year-end summary of TCAs and for growth measurement (e.g., last TCA minus first TCA). • Can predict end-of-year performance by adjusting the values of y_i. Given his current θ_i, how will Johnny perform with a whole year of instruction?
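One way to read the "adjusting y_i" step (an interpretation, not the authors' procedure): re-run the posterior with the instructional components of y_i set to their end-of-year values, which shifts the population component, then recompute the expected score.

```python
import numpy as np

theta = np.linspace(-4, 4, 401)
# Stand-in for the likelihood p(x | theta) from the items seen so far
likelihood = np.exp(-0.5 * ((theta - 0.2) / 0.6) ** 2)

def posterior_mean(prior_mean):
    # Population component p(theta | y); its mean is driven by y
    prior = np.exp(-0.5 * (theta - prior_mean) ** 2)
    post = likelihood * prior
    post /= post.sum()
    return float((theta * post).sum())

print("current y (partial year):  ", round(posterior_mean(0.3), 3))  # assumed
print("end-of-year y (full year): ", round(posterior_mean(0.8), 3))  # assumed
```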

  13. Can a simpler model work? • Simplifications become more feasible if demands on model are scaled down. • Example: Use a more traditional assessment for comparisons across schools, districts and states • Machine-scoreable items; no complex tasks • Administer to a random sample only • Might eliminate need for a population model • Could then use less constrained test forms, including complex tasks, to inform instruction

  14. Recommendations 1. Use pilot and field-test periods to test the model and explore simplifications. 2. Recognize that a tradeoff exists between inferential demands and procedural simplicity. • Reducing demands makes simpler approaches more feasible
