Presentation Transcript


  1. Using the Many-Faceted Rasch Model to Evaluate Standard Setting Judgments: An Illustration With the Advanced Placement Environmental Science Exam
  Pamela K. Kaliski, Stefanie A. Wind, George Engelhard Jr., Deanna L. Morgan, Barbara S. Plake, and Rosemary Reshetar
  Author pages: http://des.emory.edu/home/people/faculty/Engelhard.html (psychology and applied statistics); http://ncaase.com/about/bio?id=4

  2. Contents
  1. Many-Faceted Rasch Model
  2. Multiple Yes-No (MYN) method
  3. Instrument
  4. Results and Conclusion

  3. Introduction
  • Standard setting: ‘‘. . . standard setting refers to the process of establishing one or more cut scores on examinations. The cut scores divide the distribution of examinees’ test performances in two or more categories’’ (Cizek & Bunch, 2007)
  • Criteria for evaluating panelist judgments:
  • Procedural validity: implementation issues and documentation
  • Internal validity: interpanelist and intrapanelist consistency
  • External validity: comparisons with other methods

  4. Many-Faceted Rasch Model
  Notation: n indexes panelists, i indexes items, j indexes rounds, and k indexes the standard setting (modified Angoff) rating categories.
  • θ_n is the judged severity for panelist n
  • δ_i is the average judged item difficulty for item i
  • ω_j is the judged average performance level for round j
  • τ_jk is the cut score, or threshold coefficient, from round j for standard setting ratings of k (rating k relative to k − 1)
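Assembled from the parameter definitions above, the adjacent-categories log-odds form of the model, presumably the "Equation 1" cited on the final slide, would be (the symbols θ, δ, ω, τ are inferred from the surviving text and the later reference to "theta and omega"):

```latex
\ln\!\left[\frac{P_{nijk}}{P_{nij(k-1)}}\right]
  = \theta_n - \delta_i - \omega_j - \tau_{jk}
```

Here P_nijk is the probability that panelist n assigns rating category k to item i in round j, and P_nij(k−1) the probability of category k − 1.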

  5. Rating quality indices
  • (a) panelist severity/leniency measures: separation statistics / chi-square statistic
  • (b) model–data fit: Outfit MSE
  • (c) a visual display for comparing panelist judgments on the latent variable
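As a concrete illustration of index (b), here is a minimal Python sketch of how an Outfit mean-square statistic is computed from standardized residuals. The array names and the dichotomous example values are invented for illustration; the study itself would obtain these statistics from fitted MFR software such as Facets, not from this code.

```python
import numpy as np

def outfit_mse(observed, expected, variance):
    """Outfit mean-square: the average squared standardized residual
    across one panelist's ratings. Expected values and variances
    come from the fitted MFR model."""
    z = (observed - expected) / np.sqrt(variance)  # standardized residuals
    return np.mean(z ** 2)

# Invented example using dichotomous (0/1) judgments for simplicity:
# model-expected probabilities with Bernoulli variances.
obs = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
exp = np.array([0.8, 0.3, 0.6, 0.9, 0.2])
var = exp * (1.0 - exp)
# Values near 1.0 indicate good model-data fit; values outside a
# range such as [0.6, 1.5] (cited on slide 13) are flagged as misfit.
print(f"Outfit MSE: {outfit_mse(obs, exp, var):.2f}")
```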

  6. Multiple Yes-No (MYN) method
  • The MYN method requires panelists to consider the borderline examinee at each cut score and to identify the level at which the borderline examinee would be able to answer each item correctly.
  • Panelists considered each item and decided whether or not the borderline examinee in each category would be able to identify the correct answer.

  7. PLDs (Performance Level Descriptors)

  8. For each item, the panelist works through a sequence of yes/no questions:
  • Would a borderline-1/2 student be able to answer this item correctly? If yes, the panelist circles the 1/2 cut score on the rating form and moves on to the next item. If no, the panelist considers the next question about the same item.
  • Would a borderline-2/3 student be able to answer this item correctly? If yes, the 2/3 cut score is circled for that item and the panelist moves on to the next item. If no, the panelist considers the next question about the same item.
  • Would a borderline-3/4 student be able to answer this item correctly? If yes, the 3/4 cut score is circled for that item and the panelist moves on to the next item. If no, the panelist considers the next question about the same item.
  • Would a borderline-4/5 student be able to answer this item correctly? If yes, the 4/5 cut score is circled for that item and the panelist moves on to the next item. If no, the panelist considers the final question about the same item.
  • Would a student above the borderline-5 level be able to answer this item correctly? If yes (which is likely, given that all other possible borderline students have been considered), the Above 5 category is circled for that item.
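A minimal Python sketch of this sequential decision flow. The `can_answer` callback standing in for the panelist's yes/no judgment is a hypothetical illustration, not part of the MYN procedure itself.

```python
# Cut-score categories in the order a panelist considers them.
CUT_SCORES = ["1/2", "2/3", "3/4", "4/5", "Above 5"]

def myn_rating(can_answer):
    """Return the category a panelist circles for one item.

    can_answer(cut) -> bool represents the panelist's judgment of
    whether the borderline student at `cut` answers the item correctly.
    """
    for cut in CUT_SCORES[:-1]:
        if can_answer(cut):
            return cut      # circle this cut score, move to next item
    return CUT_SCORES[-1]   # no borderline student succeeds: "Above 5"

# Example: an item that only students at or above the 3/4 borderline
# can answer is rated at the 3/4 cut score.
print(myn_rating(lambda cut: cut in ("3/4", "4/5")))  # -> "3/4"
```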

  9. Instruments
  • The Advanced Placement (AP) program is composed of 34 courses and corresponding examinations in 22 subject areas; this study uses the AP Environmental Science (APES) examination.
  • Data used in this study come from the 2011 administration of the APES exam and the standard setting conducted for this examination.
  • The exam contains 100 multiple-choice (MC) items and four constructed-response (CR) items.
  • The data analyzed are the ratings that resulted from two rounds of item-level judgments provided by the 15 APES panelists.

  10. Research Purpose
  • The MFR model is used to evaluate the quality of judgments on MC items provided by panelists who participated in a modified Angoff standard setting for the 2011 APES exam that used the MYN method for MC items.
  • Panelist characteristics (gender and level of teaching) are incorporated into the MFR model to determine whether these are explanatory variables that account for differences in panelist ratings.
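One hedged way to picture the second purpose: estimate panelist severities with the MFR model, then check whether panelist characteristics explain them. The sketch below regresses invented severity estimates on dummy-coded gender and teaching level; the study itself incorporates these characteristics directly as facets in the MFR model, which this two-step sketch only approximates.

```python
import numpy as np

# Hypothetical severity estimates (logits) for six panelists, with
# dummy-coded characteristics: gender (1 = female) and level taught
# (1 = high school, 0 = college). All values invented for illustration.
severity = np.array([0.42, -0.10, 0.55, 0.08, -0.31, 0.20])
gender   = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
level    = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 0.0])

# Design matrix with an intercept; ordinary least squares fit.
X = np.column_stack([np.ones_like(severity), gender, level])
coef, *_ = np.linalg.lstsq(X, severity, rcond=None)
print(dict(zip(["intercept", "gender", "level"], coef.round(3))))
```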

  11. Results and Conclusion

  12. Results and Conclusion (see p. 397 of the published article)

  13. (p. 398) Outfit MSE: values in the range [0.6, 1.5] were treated as acceptable model–data fit

  14. (Results; see p. 400 of the published article)

  15. (Results; see p. 401)

  16. (Results; see p. 402)

  17. (Results; see p. 404)

  18. (Results; see p. 405)

  19. (Results; see p. 405)

  20. Future study
  • additional explanatory variables
  • additional statistical models
  • combining the MFR model with other modified Angoff procedures, or with Bookmark procedures
  • the overall contribution of each facet
  • extending the analysis to CR questions

  21. • Thus, the interaction between theta (θ) and omega (ω) should be considered in Equation 1.
  • Consider a different rating scale structure and a random-effects approach.
  • With computer-based standard setting, panelists could enter their judgments on a computer, and the time spent on each rating could be recorded.
  • Transform the cut score of each category to the expected score scale.
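A hedged sketch of how a panelist-by-round interaction term could be added to Equation 1; the (θω)_nj notation is an illustrative assumption, not the authors' specification:

```latex
\ln\!\left[\frac{P_{nijk}}{P_{nij(k-1)}}\right]
  = \theta_n - \delta_i - \omega_j - (\theta\omega)_{nj} - \tau_{jk}
```

Here (θω)_nj would absorb panelist n's change in severity from round to round; a nonzero estimate would signal that severity is not stable across rounds.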
