
On-demand learning-embedded benchmark assessment using classroom-accessible technology


Presentation Transcript


  1. On-demand learning-embedded benchmark assessment using classroom-accessible technology Discussant Remarks: Mark Wilson, UC Berkeley

  2. Outline • What does “Validity” look like for these papers? • What is it that these papers are distinguishing themselves from? • Where might one go from here?

  3. Need for strong concern about validity • Effect of NCLB requirements: • Schools are instituting frequent “benchmark” tests • Intended to guide teachers as to students’ strengths and weaknesses • Often just little copies of the “State test” • Teachers are complaining that these tests put a vice-like grip on the curriculum

  4. The Triangle of Learning: standard interpretation

  5. The “vicious” triangle

  6. Validity • 1999 AERA/APA/NCME Standards for Educational and Psychological Testing • Five types of validity evidence: • Evidence based on test content • Evidence based on response processes • Evidence based on internal structure • Evidence based on external structure • Evidence based on consequences

  7. Paper 1: Falmagne et al. - ALEKS • Reliability => Validity • “the collection of all the problems potentially used in any assessment represents a fully comprehensive coverage of a particular curriculum, … [hence] … [a]rguing that such an assessment, if it is reliable, is also automatically endowed with a corresponding amount of validity is plausible.”

  8. Paper 1: Falmagne et al. - ALEKS • Test content • Theory of the Learning Space • “inner fringe” and “outer fringe” • “the summary is meaningful for an instructor” • Database of Problems • “a consensus among educators that the database of problems is a comprehensive compendium for testing the mastery of a scholarly subject. This phase is relatively straightforward.” • Evidence: Who were the experts? What did they do? How much did they agree?
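
An aside on the “inner fringe” / “outer fringe” terminology above: in learning space theory these are, respectively, the items whose removal from a knowledge state K still leaves a feasible state, and the items whose addition yields a feasible state. The sketch below is a minimal illustration of that definition, not the ALEKS implementation; the items and states are made up.

```python
# Minimal sketch of inner/outer fringes in a knowledge structure
# (learning space theory). Items and states are hypothetical, not from ALEKS.

# A knowledge structure: the family of feasible knowledge states (sets of items).
structure = {
    frozenset(),
    frozenset({"a"}),
    frozenset({"b"}),
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c", "d"}),
}
all_items = set().union(*structure)

def inner_fringe(state, structure):
    """Items the student has most recently mastered: removing any one of
    them still leaves a feasible knowledge state."""
    return {q for q in state if frozenset(state - {q}) in structure}

def outer_fringe(state, structure):
    """Items the student is ready to learn next: adding any one of them
    yields a feasible knowledge state."""
    return {q for q in all_items - state if frozenset(state | {q}) in structure}

K = frozenset({"a", "b"})
print("inner fringe of K:", inner_fringe(K, structure))   # {'a', 'b'}
print("outer fringe of K:", outer_fringe(K, structure))   # {'c'}
```

This is the summary the slide quotes as “meaningful for an instructor”: the outer fringe is what the student appears ready to work on next.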

  9. Paper 1: Falmagne et al. - ALEKS • Evidence based on response processes • E.g., for a selected K, do students in K say things that are consistent/inconsistent with that state? • Evidence based on internal structure • E.g., for a selected K, do students in K have high/low success rates at “instances” in K? • Evidence based on external structure • E.g., comparison with teacher judgments of student ability • Evidence based on consequences • E.g., use of “fringes”: does this help or hinder teacher interpretations?

  10. Paper 2: Shute et al. - ACED • Two “validity studies” • Study 1: Evidence based on external structure • Prediction of residuals from an external post-test after controlling for the pre-test • Informative design of conditions: elaborated feedback better • Study 2: Evidence based on response processes • “Usability” study for students with disabilities
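
The Study 1 analysis mentioned above (predicting what remains of the post-test after the pre-test is controlled for) can be sketched as follows; the variable names and simulated data are placeholders, not the ACED data or the authors' exact analysis.

```python
# Sketch of "predict post-test residuals after controlling for pre-test"
# as external-structure evidence. Simulated placeholder data, not ACED data.
import numpy as np

rng = np.random.default_rng(0)
pre = rng.normal(size=200)                              # external pre-test
aced = 0.6 * pre + rng.normal(scale=0.8, size=200)      # embedded-assessment score
post = 0.5 * pre + 0.4 * aced + rng.normal(scale=0.5, size=200)

# Regress post-test on pre-test and keep the residuals.
slope, intercept = np.polyfit(pre, post, 1)
residuals = post - (intercept + slope * pre)

# Does the embedded assessment predict post-test performance
# beyond what the pre-test already explains?
r = np.corrcoef(aced, residuals)[0, 1]
print(f"corr(assessment score, post-test residual) = {r:.2f}")
```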

  11. Paper 2: Shute et al. - ACED • Evidence based on test content • Reference to an earlier paper • Evidence based on internal structure • Could easily be investigated, as there is interesting internal structure (Fig. 1) • Evidence based on consequences • Probably no real consequences yet

  12. Paper 3: Heffernan et al. - ASSISTment System • Evidence based on test content • Items coded by: 2 experts, 7 hrs. • “skill of Venn Diagram” • Evidence based on internal structure • Which skill model fits best: 1, 5, 39, or 106 skills? • Which number is different? • 4.10, 4.11, 4.12, 4.10, 4.10 • 1, 5, 39, 106 (twice)
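
The point of the “which number is different?” quiz above is that the fit statistics for the competing skill models are nearly indistinguishable. If the statistic is something like a mean absolute deviation (MAD) between predicted and observed MCAS scores, as assumed here, the comparison looks like the sketch below (all scores invented; only the WPI labels echo the paper's model names).

```python
# Sketch of comparing skill models of different grain sizes by the mean
# absolute deviation (MAD) of predicted vs. observed MCAS scores.
# All scores below are invented; only the WPI model labels echo the paper.
import numpy as np

observed = np.array([22, 35, 18, 41, 30, 27])            # observed MCAS scores

predicted = {                                             # one prediction per skill model
    "WPI-1":   np.array([26, 31, 23, 36, 29, 31]),
    "WPI-5":   np.array([25, 33, 21, 38, 28, 30]),
    "WPI-39":  np.array([23, 34, 19, 40, 31, 28]),
    "WPI-106": np.array([24, 36, 20, 39, 29, 29]),
}

for model, pred in predicted.items():
    mad = np.mean(np.abs(observed - pred))
    print(f"{model:8s} MAD = {mad:.2f}")
```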

  13. Paper 3: Heffernan et al. - ASSISTment System • Evidence based on external structure • Prediction of MCAS • 23/38 = 61% don’t fit well for the “best” model (WPI-39 (B))

  14. Paper 3: Heffernan et al. - ASSISTment System • Evidence based on response processes • ? • Evidence based on consequences • There probably are real consequences

  15. Paper 4: Junker - ASSISTment System • Two “validity studies” • Study 1: Evidence based on external structure • Prediction of MCAS scores • Study 2: Evidence based on internal structure • 4 internal structure patterns • 2 questions • Q1: Regarding how scaffolds get easier: what happens when you get a scaffold wrong? • Q2: What about the gap?

  16. Paper 4: Junker - ASSISTment System • Remaining types of validity evidence: see Paper 3

  17. Looking Beyond • What does this group of papers have to offer? • What should it be looking out for?

  18. Paper 1: Falmagne et al. - ALEKS • Inner and Outer Fringe • What do teachers think of them, and what do they do with them? • “Standardized tests” and “psychometrics” as straw men • Alternative: compare one’s work to the latest developments in item response modeling (e.g., EIRM)

  19. Paper 2: Shute et al. - ACED • “Weight of Evidence” • Good alternative to Fisher information • Transparent, easily interpretable • Models for people with disabilities • Most likely going to have a different internal structure • Need to develop a broader view of internal structure criteria
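
“Weight of Evidence” here is presumably the I.J. Good log-likelihood-ratio measure used for scoring evidence in Bayesian networks; a minimal sketch of that quantity follows. The probabilities are illustrative, not taken from ACED.

```python
# Sketch of weight of evidence in the I.J. Good sense:
#   WOE(H : e) = log[ P(e | H) / P(e | not H) ]
# Illustrative probabilities only, not values from ACED.
import math

def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """Log-likelihood ratio of evidence e for hypothesis H against not-H."""
    return math.log(p_e_given_h / p_e_given_not_h)

# Example: a correct response given with probability 0.8 by a proficient
# student and with probability 0.3 by a non-proficient student.
woe = weight_of_evidence(0.8, 0.3)
print(f"WOE = {woe:.2f} nats = {100 * woe / math.log(10):.0f} centibans")
```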

  20. Paper 3: Heffernan et al. - ASSISTment System • MCAS as a starting point for diagnostic testing? • Using released items?!? • What is “unidimensionality”?

  21. Paper 3: Heffernan et al. - ASSISTment System • In a latent class model, the latent class looks like this: [figure] • In an item response model (e.g., Rasch model), unidimensionality looks like this: [figure] • See: Karelitz, T.M., Wilson, M.R., & Draney, K.L. (2005). Diagnostic Assessment using Continuous vs. Discrete Ability Models. Paper presented at the NCME Annual Meeting, San Francisco, CA.
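
The two figures missing from this slide contrast a discrete and a continuous picture of proficiency. A rough rendering of the same contrast in code (illustrative numbers only): a latent class model assigns each student to one of a few classes with class-specific success probabilities, while a unidimensional Rasch model places students on a continuous ability scale.

```python
# Sketch of the contrast drawn on this slide: discrete latent classes vs. a
# continuous unidimensional (Rasch) ability scale. Illustrative numbers only.
import math

# Latent class model: class membership alone determines the success probability.
latent_class_probs = {"non-master": 0.20, "master": 0.85}

# Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b)),
# so success probability varies continuously with ability theta.
def rasch_prob(theta, b=0.0):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print("latent classes:", latent_class_probs)
for theta in (-2, -1, 0, 1, 2):
    print(f"theta = {theta:+d}: P(correct) = {rasch_prob(theta):.2f}")
```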

  22. Paper 4: Junker - ASSISTment System • What is the effect of making MCAR/MAR assumptions when neither holds? • Relevant to all CAT • Or of assuming you know the response under NMAR • Is there a discrimination paradox in DINA models? • Why do scaffold questions get easier?
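
For readers unfamiliar with the acronym, DINA is the “deterministic inputs, noisy AND gate” cognitive diagnosis model; the discrimination question above concerns its slip and guess parameters. Below is a minimal sketch of the DINA item response function, with a hypothetical Q-matrix row and parameter values (not taken from the paper).

```python
# Sketch of the DINA ("deterministic inputs, noisy AND gate") item response
# function. Q-matrix row, attribute patterns, slip and guess are hypothetical.
import numpy as np

def dina_prob(alpha, q_row, slip, guess):
    """P(correct): a student with every attribute the item requires answers
    correctly unless they slip; anyone else must guess."""
    eta = int(np.all(alpha >= q_row))        # 1 iff all required attributes mastered
    return (1 - slip) if eta else guess

q_row = np.array([1, 1, 0])                  # item requires attributes 1 and 2
for alpha in ([1, 1, 0], [1, 0, 1], [1, 1, 1]):
    p = dina_prob(np.array(alpha), q_row, slip=0.10, guess=0.20)
    print(f"attribute pattern {alpha}: P(correct) = {p:.2f}")
```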

  23. Future Directions • What is a “Knowledge State” (KS)? • How do we test whether it’s a unitary thing? • What if it isn’t? • Mixture models: structured KSs • Do teachers (and other practitioners) find the KSs useful? • How to adjust if they don’t? • Finer/coarser grained • Structured
