Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach. Mark D. Reckase and Tianli Li, Michigan State University
The Problem • State curriculum frameworks often change from one grade to the next, reflecting the addition of new instructional content. • For example, at grade 7 algebra may be introduced as an instructional goal. • At grade 6, algebra is not an important component of the curriculum. • Tests at the two grades reflect the instructional content, so the 6th grade test does not include algebra and the 7th grade test does. • How can the score scales of these tests be linked?
Research Questions • What do changes on the linked score scale mean, when the scale is produced using the usual unidimensional IRT models? • Can multidimensional IRT be used to form vertical scales? If so, how do the results compare to the unidimensional results?
The Approach • State testing data were analyzed using multidimensional IRT to develop a realistic model for the test data at two grade levels. • The results of the real data analyses were idealized to create the specifications for simulating the tests at two grade levels. • Data with known structure were simulated to determine how unidimensional and multidimensional procedures function.
The Simulated Data Design • Grade 6 – two major constructs • Arithmetic • Problem Solving • Grade 7 – three major constructs • Arithmetic • Problem Solving • Algebra
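A minimal sketch of how data with this kind of structure could be simulated, assuming a compensatory multidimensional two-parameter logistic model. The mean vectors, covariance matrix, item counts, and item parameters below are hypothetical placeholders, not the values from the slides' tables; only the 2000 students per grade and the two-versus-three construct design come from the presentation.

```python
import numpy as np

def simulate_mirt_responses(theta, a, d, rng):
    """Simulate 0/1 responses from a compensatory multidimensional 2PL model.

    theta: (n_students, n_dims) abilities
    a:     (n_items, n_dims) discrimination parameters
    d:     (n_items,) intercepts
    """
    logits = theta @ a.T + d
    prob = 1.0 / (1.0 + np.exp(-logits))
    return (rng.random(prob.shape) < prob).astype(int)

rng = np.random.default_rng(0)

# Hypothetical population values (order: arithmetic, problem solving, algebra).
mean_g6 = np.array([0.0, 0.0, -1.0])
mean_g7 = np.array([0.3, 0.3, 0.5])
cov = np.array([[1.0, 0.5, 0.3],
                [0.5, 1.0, 0.3],
                [0.3, 0.3, 1.0]])

theta_g6 = rng.multivariate_normal(mean_g6, cov, size=2000)
theta_g7 = rng.multivariate_normal(mean_g7, cov, size=2000)

# Hypothetical item parameters: grade 6 items do not load on algebra,
# grade 7 items load on all three constructs.
n_items = 10
a_g6 = np.column_stack([rng.uniform(0.5, 1.5, n_items),
                        rng.uniform(0.5, 1.5, n_items),
                        np.zeros(n_items)])
a_g7 = rng.uniform(0.5, 1.5, size=(n_items, 3))
d_g6 = rng.normal(0.0, 1.0, n_items)
d_g7 = rng.normal(0.0, 1.0, n_items)

responses_g6 = simulate_mirt_responses(theta_g6, a_g6, d_g6, rng)
responses_g7 = simulate_mirt_responses(theta_g7, a_g7, d_g7, rng)
```

Setting the algebra column of the grade 6 discrimination matrix to zero is one way to encode "algebra is not measured at grade 6"; the presentation may have specified this differently.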
Simulated Test Structure Note: The numbers in parentheses are the common items between the two forms of the tests.
Mean Vectors at each Grade Level Note: Values in parentheses are the observed means from the simulated data
Covariance Matrices Covariance Matrix for Grade 6 Covariance Matrix for Grade 7 Note: Values in parentheses are estimated from the simulated data.
Unidimensional Basis for Comparison • Imagine that the full set of 70 items from both test levels is administered to the students at both grade levels. • The matrix of 2000 + 2000 students from the two grades by 70 items can be analyzed with the unidimensional models to serve as a basis for comparison for the vertical scaling result. • The matrix is analyzed using the 2PL and Rasch models.
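For reference, the two unidimensional models named above are usually written as follows. The symbols are the conventional ones and are assumed here, since the slides do not define them: u_ij is the response of student j to item i, θ_j is the student's ability, a_i is the item discrimination, and b_i is the item difficulty.

```latex
% Two-parameter logistic (2PL) model
P(u_{ij} = 1 \mid \theta_j) =
  \frac{\exp\left[a_i(\theta_j - b_i)\right]}{1 + \exp\left[a_i(\theta_j - b_i)\right]}

% Rasch model: the 2PL with a common discrimination, a_i = 1 for all items
P(u_{ij} = 1 \mid \theta_j) =
  \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}
```

Because the Rasch model constrains every item to the same discrimination, it has less freedom than the 2PL to weight items that separate the two grades.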
Vertical Scaling Analysis • Common-item concurrent calibration • BILOG-MG • Off-grade items coded as not reached • Both the 2PL and Rasch models used for analysis • Determine the effect size of the difference in means between the two grade levels
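The data layout for a common-item concurrent calibration can be sketched as below. This illustrates only the arrangement of the response matrix, not BILOG-MG input syntax. The function name, the column ordering (common items last in the lower form and first in the upper form), and the use of NaN as a stand-in for the "not reached" code are all assumptions made for the example.

```python
import numpy as np

def stack_for_concurrent_calibration(resp_lower, resp_upper, n_common):
    """Combine two grade-level response matrices into one matrix.

    Assumes the last n_common columns of resp_lower and the first
    n_common columns of resp_upper are the same common items.
    Items a group never saw are filled with np.nan, standing in
    for a "not reached" code.
    """
    n_low, k_low = resp_lower.shape
    n_up, k_up = resp_upper.shape
    n_total = k_low + k_up - n_common
    combined = np.full((n_low + n_up, n_total), np.nan)
    combined[:n_low, :k_low] = resp_lower
    combined[n_low:, k_low - n_common:] = resp_upper
    return combined

# Tiny hypothetical example: 3 lower-grade students x 4 items,
# 2 upper-grade students x 5 items, 2 items in common.
rng = np.random.default_rng(1)
lower = rng.integers(0, 2, size=(3, 4))
upper = rng.integers(0, 2, size=(2, 5))
combined = stack_for_concurrent_calibration(lower, upper, n_common=2)
```

In the combined matrix only the common-item columns are filled for both groups, which is what lets a single calibration place both grades on one scale.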
Vertically Scaled Effect Sizes • Linked effect size is smaller than full data effect size. • Rasch effect size is less than 2PL effect size. • Full data set effect size is less than modeled effect size.
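The slides do not state which effect size formula was used. A common choice, assumed here, is the standardized mean difference with a pooled standard deviation; the function name and the toy inputs are hypothetical.

```python
import numpy as np

def standardized_mean_difference(lower, upper):
    """Upper-grade mean minus lower-grade mean, in pooled SD units."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n1, n2 = lower.size, upper.size
    pooled_var = ((n1 - 1) * lower.var(ddof=1)
                  + (n2 - 1) * upper.var(ddof=1)) / (n1 + n2 - 2)
    return (upper.mean() - lower.mean()) / np.sqrt(pooled_var)

# Toy ability estimates: both groups have variance 1, means differ by 1.
effect = standardized_mean_difference([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
print(effect)  # 1.0
```

Applied to the ability estimates from each calibration, a statistic of this kind allows the linked, full-data, and modeled results to be compared on a common metric.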
Alternative Linking Method • Common-item, separate calibration • The relationship between the common-item parameter estimates was poor
MIRT Analysis • Full data analysis with TESTFACT • Three-dimensional analysis • Determine effect size for each dimension • Correlate each estimated θ with the generating θs to determine meaning of the results.
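The correlation step can be sketched as follows, assuming the estimated and generating abilities are available as arrays with one column per dimension. The function name and the demonstration data are hypothetical; the demonstration deliberately builds an estimated dimension that is a sign-reversed, noisy copy of a generating dimension to show how a reversed scale appears as a strong negative correlation.

```python
import numpy as np

def cross_correlations(theta_est, theta_true):
    """Return R with R[i, j] = corr(estimated dim i, generating dim j)."""
    k_est = theta_est.shape[1]
    full = np.corrcoef(theta_est, theta_true, rowvar=False)
    return full[:k_est, k_est:]

rng = np.random.default_rng(2)
theta_true = rng.normal(size=(500, 3))

# Hypothetical "estimates": dimension 0 is generating dimension 2 with its
# sign reversed plus noise; dimensions 1 and 2 are noisy copies of 0 and 1.
noise = rng.normal(scale=0.3, size=(500, 3))
theta_est = np.column_stack([-theta_true[:, 2],
                             theta_true[:, 0],
                             theta_true[:, 1]]) + noise

R = cross_correlations(theta_est, theta_true)
print(np.round(R, 2))
```

Reading across a row of R shows which generating construct, or mixture of constructs, an estimated dimension reflects.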
Interpretation of MIRT Solution • Results are difficult to interpret because of the default procedures in TESTFACT. • Solution needs to be rotated to have axes align with content dimensions. • Current solution shows that θ1 is related to algebra and shows the big algebra effect. • θ2 is a combination of arithmetic and problem solving, with the emphasis on problem solving. • Most likely it has the sign of the a-parameters reversed.
Concurrent MIRT Analysis • Use concurrent calibration of data from the two grade levels. • Three-dimensional solution • No rotation • Determine effect sizes and correlations with true θ values.
Concurrent MIRT Calibration • Scale on Dimension 3 is reversed and it has a large effect size (algebra). • Dimension 1 is most related to arithmetic and problem solving with a moderate effect size. • Dimension 2 is moderately related to algebra and has a large effect size. • The overall result gives a reasonable estimate of effects, but the dimensions need to be rotated to match the constructs.
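The presentation leaves the choice of rotation open. One candidate, offered here only as an illustration and not as the authors' method, is an orthogonal Procrustes rotation toward a target. In a simulation the generating θ values are known and can serve as that target. The function name and demonstration data are hypothetical. Because orthogonal matrices include reflections, a reversed dimension is corrected by the same step.

```python
import numpy as np

def orthogonal_procrustes_rotation(source, target):
    """Orthogonal matrix R minimizing ||source @ R - target|| (Frobenius)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(3)
theta_true = rng.normal(size=(1000, 3))

# Hypothetical unrotated estimates: the generating abilities expressed in
# an arbitrary orthogonal basis, plus a little estimation noise.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
theta_est = theta_true @ q + rng.normal(scale=0.1, size=(1000, 3))

rotation = orthogonal_procrustes_rotation(theta_est, theta_true)
theta_rotated = theta_est @ rotation

# After rotation, each dimension should correlate highly with its construct.
recovered = [np.corrcoef(theta_rotated[:, k], theta_true[:, k])[0, 1]
             for k in range(3)]
```

With operational data there are no generating values, so a target would have to come from elsewhere, for example from the content classification of the items; that step is not addressed by this sketch.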
Conclusions • Unidimensional linking of the tests at the two levels underestimates the effect size. • The Rasch model gives a smaller effect size than the two-parameter logistic model. • The MIRT solution shows promise. • Need to determine how to rotate the solution to match constructs. • TESTFACT has problems converging on estimates because of a mismatch between assumptions and reality.