Estimating Growth when Content Specifications Change:

Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach. Mark D. Reckase and Tianli Li, Michigan State University

The Problem • State curriculum frameworks often change from one grade to the next, reflecting the addition of new instructional content. • For example, at grade 7, algebra may be introduced as an instructional goal. • At grade 6, algebra is not an important component of the curriculum. • Tests at the two grades reflect the instructional content, so the 6th grade test does not include algebra and the 7th grade test does. • How can the score scales of these tests be linked?

Research Questions • What do changes on the linked score scale mean, when the scale is produced using the usual unidimensional IRT models? • Can multidimensional IRT be used to form vertical scales? If so, how do the results compare to the unidimensional results?

The Approach • State testing data were analyzed using multidimensional IRT to develop a realistic model for the test data at two grade levels. • The results of the real data analyses were idealized to create the specifications for simulating the tests at two grade levels. • Data with known structure were simulated to determine how unidimensional and multidimensional procedures function.

The Simulated Data Design • Grade 6 – two major constructs • Arithmetic • Problem Solving • Grade 7 – three major constructs • Arithmetic • Problem Solving • Algebra

Simulated Test Structure Note: The numbers in parentheses are the common items between the two forms of the tests.

Mean Vectors at each Grade Level Note: Values in parentheses are the observed means from the simulated data

Covariance Matrices Covariance Matrix for Grade 6 Covariance Matrix for Grade 7 Note: Values in parentheses are estimated from the simulated data.
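The simulation step can be sketched roughly as follows. This is a minimal illustration, assuming a compensatory multidimensional 2PL model and multivariate normal abilities (θ). The mean vector, covariance matrix, item count, and item parameters below are placeholders, not the values from the slides; only the sample size of 2000 students per grade comes from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_m2pl(theta, a, d, rng):
    """Simulate 0/1 responses under a compensatory multidimensional 2PL model.

    theta: (persons x dims) abilities, a: (items x dims) discriminations,
    d: (items,) intercepts.
    """
    logits = theta @ a.T + d                 # persons x items
    p = 1.0 / (1.0 + np.exp(-logits))        # probability of a correct response
    return (rng.random(p.shape) < p).astype(int)

# Placeholder grade 7 specification (NOT the values from the slides).
# Dimension order: arithmetic, problem solving, algebra.
mean_g7 = np.array([0.4, 0.4, 1.0])
cov_g7 = np.array([[1.0, 0.5, 0.3],
                   [0.5, 1.0, 0.3],
                   [0.3, 0.3, 1.0]])
theta_g7 = rng.multivariate_normal(mean_g7, cov_g7, size=2000)

# Hypothetical item parameters for a short 10-item form.
n_items = 10
a = np.abs(rng.normal(1.0, 0.2, size=(n_items, 3)))
d = rng.normal(0.0, 1.0, size=n_items)
responses = simulate_m2pl(theta_g7, a, d, rng)

# Observed moments of the simulated abilities, for comparison with the
# generating values (the role of the parenthesized values in the notes).
print(theta_g7.mean(axis=0))
print(np.cov(theta_g7, rowvar=False))
```

Grade 6 would be handled the same way with two dimensions, or with a-parameters of zero on the algebra dimension.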

Orientation of Items

Effect Size Built into Data
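The slides do not state which effect size definition was used. A common choice, assumed here, is the standardized mean difference with a pooled standard deviation, computed separately for each dimension. The function name and the toy values are illustrative only.

```python
import numpy as np

def effect_size(lower, upper):
    """Standardized mean difference (upper minus lower), per column.

    Uses the pooled standard deviation of the two groups; this definition
    is an assumption, not something the slides specify.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n1, n2 = len(lower), len(upper)
    pooled_var = ((n1 - 1) * lower.var(axis=0, ddof=1)
                  + (n2 - 1) * upper.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    return (upper.mean(axis=0) - lower.mean(axis=0)) / np.sqrt(pooled_var)

# Toy values for two dimensions (not data from the study).
grade6 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
grade7 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
d = effect_size(grade6, grade7)
print(d)
```

The same function applies to unidimensional scores by passing one-dimensional arrays of ability estimates for the two grades.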

Unidimensional Basis for Comparison • Imagine that the full set of 70 items from both test levels is administered to the students at both grade levels. • The matrix of 2000 + 2000 students from the two grades by 70 items can be analyzed with the unidimensional models to serve as a basis for comparison for the vertical scaling result. • Analyze the matrix using the 2PL and Rasch models.
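For reference, the two unidimensional models can be written in their usual logistic form. The notation is an assumption, since the slides give no formulas: u_ij is the scored response of person j to item i, θ_j is ability, a_i is discrimination, and b_i is difficulty. Some programs also multiply the exponent by a scaling constant D = 1.7, which is omitted here.

```latex
% 2PL: probability that person j answers item i correctly
P(u_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\left[-a_i(\theta_j - b_i)\right]}

% Rasch: the same form with the discrimination fixed at 1 for every item
P(u_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp\left[-(\theta_j - b_i)\right]}
```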

2PL Solution

Rasch Model Solution

Vertical Scaling Analysis • Common-item concurrent calibration • BILOG-MG • Off-grade items coded as not reached • Both 2PL and Rasch models used for analysis • Determine effect size of the difference in means between the two grade levels
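The data layout for a common-item concurrent calibration can be sketched as below. The column ordering (common items last on the lower form and first on the upper form) and the function name are hypothetical, and `np.nan` only stands in for whatever not-reached code the calibration program expects.

```python
import numpy as np

def stack_for_concurrent(resp_lower, resp_upper, n_common):
    """Stack two grade-level response matrices into one combined matrix.

    Assumes the common items are the LAST n_common columns of the lower form
    and the FIRST n_common columns of the upper form (a hypothetical layout).
    Off-grade items are filled with np.nan to mark them as not administered.
    """
    n_lo, k_lo = resp_lower.shape
    n_up, k_up = resp_upper.shape
    n_total = k_lo + k_up - n_common           # unique items across both forms
    combined = np.full((n_lo + n_up, n_total), np.nan)
    combined[:n_lo, :k_lo] = resp_lower         # lower grade: its own items
    combined[n_lo:, k_lo - n_common:] = resp_upper  # upper grade: common + new
    return combined

# Toy example: 3 lower-grade students x 4 items, 2 upper-grade students
# x 5 items, with 2 items in common.
lower = np.ones((3, 4))
upper = np.zeros((2, 5))
combined = stack_for_concurrent(lower, upper, n_common=2)
print(combined)
```

Because both grades respond to the common columns, a single calibration of the combined matrix places all items on one scale.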

Vertically Scaled Effect Sizes

Vertically Scaled Effect Sizes • Linked effect size is smaller than full data effect size. • Rasch effect size is less than 2PL effect size. • Full data set effect size is less than modeled effect size.

Alternative Linking Method • Common-item, separate calibration • Common item parameter relationship was poor
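The slides do not say which separate-calibration linking method was tried. As one illustration, the mean/sigma method estimates a linear transformation from the common-item difficulties, and the correlation between the two sets of common-item estimates is a quick check of how well they agree. The difficulty values below are made up for the example.

```python
import numpy as np

def mean_sigma_link(b_from, b_to):
    """Return slope A and intercept B so that A * b_from + B is on the b_to scale."""
    b_from = np.asarray(b_from, dtype=float)
    b_to = np.asarray(b_to, dtype=float)
    A = b_to.std(ddof=1) / b_from.std(ddof=1)
    B = b_to.mean() - A * b_from.mean()
    return A, B

# Hypothetical common-item difficulties from two separate calibrations.
b_grade7 = np.array([-1.0, 0.0, 1.0, 2.0])
b_grade6 = 2.0 * b_grade7 + 0.5      # an exactly linear relationship, for clarity

A, B = mean_sigma_link(b_grade7, b_grade6)
r = np.corrcoef(b_grade7, b_grade6)[0, 1]
print(A, B, r)
```

With the constants in hand, abilities and difficulties transform as A·θ + B and A·b + B, and discriminations as a/A. A low correlation between the two sets of common-item estimates, as reported on the slide, signals that a single linear transformation does not describe the relationship well.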

MIRT Analysis • Full data analysis with TESTFACT • Three-dimensional analysis • Determine effect size for each dimension • Correlate each estimated θ with the generating θs to determine the meaning of the results.
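The correlation step can be sketched as a cross-correlation matrix between generating and estimated abilities (θ). The simulated arrays below are illustrative stand-ins for program output; they mimic an unrotated solution in which the dimensions are permuted and one axis is reflected.

```python
import numpy as np

def cross_correlations(theta_true, theta_est):
    """Correlations between generating (rows) and estimated (columns) dimensions."""
    k = theta_true.shape[1]
    full = np.corrcoef(np.hstack([theta_true, theta_est]), rowvar=False)
    return full[:k, k:]

# Illustrative data: estimated dimensions are a permuted, partly
# sign-reversed, noisy copy of the generating dimensions.
rng = np.random.default_rng(1)
theta_true = rng.standard_normal((500, 3))
theta_est = theta_true[:, [2, 0, 1]] * np.array([-1.0, 1.0, 1.0])
theta_est = theta_est + 0.1 * rng.standard_normal((500, 3))

r = cross_correlations(theta_true, theta_est)
print(np.round(r, 2))
```

A large negative entry indicates an estimated dimension whose scale is reversed relative to the generating construct.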

MIRT Effect Sizes

Correlation between True and Estimated θ Values

Interpretation of MIRT Solution • Results are difficult to interpret because of the default procedures in TESTFACT. • Solution needs to be rotated to have axes align with content dimensions. • Current solution shows that θ1 is related to algebra and shows the big algebra effect. • θ2 is a combination of arithmetic and problem solving with the emphasis on problem solving. • Most likely it has the sign of the a-parameters reversed.
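The presentation leaves open how the solution should be rotated. One option that works in a simulation, where the generating abilities are known, is an orthogonal Procrustes rotation of the estimated abilities toward the generating ones. An orthogonal matrix can include reflections, so it also handles a reversed axis. The data below are synthetic and the function name is illustrative.

```python
import numpy as np

def procrustes_rotation(est, target):
    """Orthogonal matrix R minimizing the Frobenius norm of est @ R - target."""
    u, _, vt = np.linalg.svd(est.T @ target)
    return u @ vt

# Synthetic check: rotate a known target by a random orthogonal matrix,
# then recover the alignment.
rng = np.random.default_rng(2)
target = rng.standard_normal((200, 3))              # stands in for generating abilities
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # random orthogonal matrix
est = target @ q.T                                  # stands in for an unrotated solution

R = procrustes_rotation(est, target)
aligned = est @ R
print(np.abs(aligned - target).max())
```

With operational data the generating abilities are not available, so a target would have to come from elsewhere, for example a pattern of a-parameters implied by the content classification of the items.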

Concurrent MIRT Analysis • Use concurrent calibration of data from the two grade levels. • Three-dimensional solution • No rotation • Determine effect sizes and correlations with true θ values.

Concurrent MIRT Calibration

Concurrent MIRT Calibration • Scale on Dimension 3 is reversed and it has a large effect size (algebra). • Dimension 1 is most related to arithmetic and problem solving with a moderate effect size. • Dimension 2 is moderately related to algebra and has a large effect size. • The overall result gives a reasonable estimate of effects, but the dimensions need to be rotated to match the constructs.

Conclusions • Unidimensional linking of the tests at the two levels underestimates the effect size. • Rasch model gives a smaller effect size than the two-parameter logistic (2PL) model. • MIRT solution shows promise. • Need to determine how to rotate solution to match constructs. • TESTFACT has problems converging on estimates because of mismatch between assumptions and reality.
