Standard Setting for Professional Certification


Presentation Transcript


  1. Standard Setting for Professional Certification Brian D. Bontempo Mountain Measurement, Inc. brian@mountainmeasurement.com (503) 284-1288 ext 129

  2. Overview • Definition of Standard Setting • Management Issues relating to Standard Setting • Standard Setting Process • Methods of Standard Setting • Using multiple methods of Standard Setting

  3. Definition of Standard Setting • Standard setting is a process whereby decision makers render judgments about the performance level required of minimally competent examinees

  4. Types of Standards • Relative Standards (Normative Standards) • Top 70% of scores pass • 20 points above average • Criterion-Referenced Standards (Absolute Standards) • 70% of the items correct • 600 out of 800 scaled score • .05 logits • 20 items correct
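
A quick Python sketch may make the distinction concrete. All numbers here are invented for illustration: a relative (normative) cut moves with the candidate pool, while a criterion-referenced cut stays fixed.

```python
# Illustrative contrast between the two families of standards.
import statistics

scores = [52, 61, 48, 70, 66, 59, 73, 55, 64, 68]  # invented raw scores, 80-item test

# Relative (normative) standard, e.g. "20 points above average":
relative_cut = statistics.mean(scores) + 20  # shifts whenever the cohort changes

# Criterion-referenced (absolute) standard, e.g. "70% of the items correct":
absolute_cut = 0.70 * 80  # fixed, regardless of who happens to test

print(f"Normative cut: {relative_cut:.1f}, absolute cut: {absolute_cut:.0f}")
```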

  5. Why do we conduct Standard Setting? • To objectively involve stakeholders in the test decision making process • To connect the expectations of employers to the test decision making process • To connect the reality of training to the test decision making process • To ensure psychometric soundness & legal defensibility

  6. When to (re)set a passing standard • For a new exam, after Beta Test data have been analyzed, typically after “Live” Test Forms have been constructed • For exam revisions, when the expectations of a job role have changed • Practice has changed • Content domain has changed • It is not appropriate to change the passing standard whenever a test or training has been revised. • It is not appropriate to change the passing standard because of supply and demand issues (too many/few certified professionals)

  7. Who should lead a standard setting panel? • An experienced Psychometrician • Insider perspective, familiar with your certification and exam development • Outsider perspective, not familiar with your certification and exam development

  8. How rigid should you be in your direction to the Psychometrician? • I recommend a conversation between the Psychometrician and the Test Sponsor to figure out what works best. Typically a test sponsor will specify a framework (e.g., Angoff) and let the Psychometrician dictate the specifics.

  9. Outcomes of Standard Setting • A conceptual (qualitative) definition of minimal competency • A proposed numeric (quantitative) passing standard • A set of alternate passing standards based on errors in the process • Expected passing rate(s) from each standard • A report documenting the process and the psychometric quality of the process

  10. Standard Setting Process

  11. Standard Setting Process • Gather test data • Assemble a group of judges • Define minimal competency • Train judges on the method • Render judgments on the performance of borderline examinees • Calculate the passing standard by aggregating the judgments • Evaluate the outcome by calculating the expected passing rate

  12. Selecting your judges • Representative Sample • Hiring Managers • Trainers • Entry-Level Practitioners • How many judges is enough? • For a low stakes exam • at least 8 judges • For a medium stakes exam • at least 12 judges • For a high stakes exam • at least 16 judges

  13. Developing a Definition of Minimal Competency • Identify 3 common tasks within each domain of the test blueprint (an easy, a hard, and a “Borderline” task) • Characterize the performance of minimally competent examinees on each of the major tasks • Write text that summarizes these discussions

  14. Training Judges • Instruct them on their task • Practice rating items • Two sets of practice items • Practice discussing items • Explain the stats that you will be providing them • Set the tone and boundaries for good ‘group psychology’

  15. Standard Setting Methods

  16. Types of Standard Setting Methods • Examinee-Centered Methods • Judges use external criteria, such as on-the-job performance, to evaluate the competency of real examinees • Test-Centered Methods • Judges evaluate the performance of imaginary examinees on real test items • Adjustments • To account for inaccuracy in the standard setting process, psychometricians use real test data to provide a range of probable values for the passing standard

  17. Examinee-Centered Methods • Borderline group • Using external criteria (such as performance on the job), judges identify a group of examinees they consider borderline. The average score of this group is the passing standard • Contrasting groups • Using external criteria, judges classify examinees as passers or failers. The passing standard is set at the point that best discriminates between the scores of the two groups
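
The "point that best discriminates" can be operationalized several ways; minimizing total misclassifications is one common choice. That choice, and all scores below, are invented for this minimal Python sketch.

```python
# Sketch of the two examinee-centered calculations; all data invented.
import statistics

# Borderline-group method: the cut is the average score of examinees
# whom judges classified as borderline.
borderline_scores = [55, 58, 52, 60, 57, 54]
borderline_cut = statistics.mean(borderline_scores)

# Contrasting-groups method: pick the score that best separates
# judge-classified passers from failers (here, fewest misclassifications).
passer_scores = [62, 68, 71, 65, 59, 74]
failer_scores = [45, 51, 56, 48, 53, 44]

def misclassified(cut):
    # passers scoring below the cut + failers scoring at or above it
    return (sum(s < cut for s in passer_scores)
            + sum(s >= cut for s in failer_scores))

candidates = range(min(failer_scores), max(passer_scores) + 1)
contrasting_cut = min(candidates, key=misclassified)

print(f"Borderline-group cut: {borderline_cut:.1f}")
print(f"Contrasting-groups cut: {contrasting_cut}")
```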

  18. Test-Centered • Modified-Angoff • Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: American Council on Education. • Bookmark • Mitzel, H. C., Lewis, D. M., Patz, R. J., & Green, D. R. (2001). The Bookmark procedure: Psychological perspectives. In G. J. Cizek (Ed.), Setting performance standards. Mahwah, NJ: Lawrence Erlbaum Associates.

  19. Basic Angoff Process • Judges evaluate each item • What percentage of MC examinees would get the item correct? • Feedback/Discussion • Judges make adjustments to their ratings • The average of a judge’s item ratings is that judge’s passing standard • The average of all judges’ standards is the panel’s passing standard
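
A minimal sketch of this aggregation, with invented ratings for three judges and five items:

```python
# Angoff aggregation: a judge's standard is the average of their item
# ratings; the panel standard is the average of the judge standards.
import statistics

# Proportion of minimally competent (MC) examinees each judge expects
# to answer each item correctly (invented).
ratings = {
    "judge_1": [0.80, 0.55, 0.70, 0.90, 0.60],
    "judge_2": [0.75, 0.60, 0.65, 0.85, 0.55],
    "judge_3": [0.85, 0.50, 0.75, 0.95, 0.65],
}

judge_cuts = {j: statistics.mean(r) for j, r in ratings.items()}
panel_cut = statistics.mean(judge_cuts.values())

n_items = len(ratings["judge_1"])
print(f"Panel cut: {panel_cut:.1%} ({panel_cut * n_items:.2f} of {n_items} items)")
```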

  20. Common Angoff Issues • How should the rating question be worded: “What percentage of ___ ___ get the item correct?” • “MCs” vs. all candidates: “MCs” is correct • “would” vs. “should”: “would” is correct

  21. Common Angoff Issues • What type of ratings should judges make? • 1/0 (Yes/No) • Percentage of Borderline examinees • Round to 1 decimal (.9) • Round to 2 decimals (.92) • NEVER use percentage of all examinees

  22. Common Angoff Issues • Types of Feedback to provide • Group Discussion • Relate to conceptual definition of minimal competency • Typical or atypical content • Relevancy • Relate to item nuances • Item Stem • Item Distractors • “I expect a lot of the MC because this is core content and the item is straightforward.” • “I would like to cut the MC some slack because this is not covered well in training and the scenario is a little abstract.”

  23. Common Angoff Issues • Types of Feedback to provide • Empirical Data • Answer Key – Yes! • Percentage of Borderline examinees answering the item correctly – If possible yes • P-Value (Percentage of examinees answering the item correctly) – Only if the percentage of Borderline examinees is not available

  24. Common Angoff Issues • When to provide feedback? • Initial Rating • Discuss items • Secondary Rating • Provide Empirical Data • Tertiary Rating

  25. Bookmark • The test is divided into subtests • By domain OR • Equal variance of difficulty across subtests • Items are sorted from easiest to hardest • By judges OR • By actual value • Judges bookmark the subtest at the point where the MC examinee would stop getting items correct and start getting them incorrect • The lowest possible standard • The expected standard • The highest possible standard • Judges discuss ratings & make adjustments • The passing standard is the average # of items answered correctly

  26. Common Bookmark Issues • How many Ordered Item Booklets (OIBs)? • One for each content domain OR • Several equivalent OIBs that each meet the test plan

  27. Common Bookmark Issues • How should I select Items for the OIB? • Minimize the distance in difficulty between any two adjacent items. • Ensure that there are enough items at all difficulty levels for each OIB • Ensure that the variance in item difficulty is the same for each OIB

  28. Common Bookmark Issues • How should I sort the item booklets? • Easiest to Hardest • Hardest to Easiest

  29. Common Bookmark Issues • How do I know when the MC would stop getting items correct and start getting them incorrect? (What is the appropriate RP value?) • .5 • .67 (most common) • .75

  30. Common Bookmark Issues • How do I convert the bookmark to a passing standard? • Previous Item (PI) – Take the difficulty of the easier of the two items on either side of the bookmark • Between Item (BI) – Take the average of the difficulties of those two items
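
Putting slides 29 and 30 together, here is a minimal sketch of the conversion, assuming Rasch-scaled item difficulties. The difficulties, the bookmark position, and the RP choice are all invented.

```python
# Convert a bookmark placement to a cut score on the logit scale.
import math

RP = 0.67  # response probability criterion (the most common choice)

# Item difficulties in the Ordered Item Booklet, easiest to hardest (invented).
oib_difficulties = [-1.8, -1.2, -0.7, -0.3, 0.1, 0.5, 0.9, 1.4, 2.0]
bookmark = 5  # judge places the bookmark after the 5th item (1-indexed)

def theta_at(difficulty):
    # Under the Rasch model, P(correct) = RP when
    # theta = difficulty + ln(RP / (1 - RP)).
    return difficulty + math.log(RP / (1 - RP))

easier = oib_difficulties[bookmark - 1]  # last item the MC should answer correctly
harder = oib_difficulties[bookmark]      # first item they need not

cut_pi = theta_at(easier)                 # Previous Item convention
cut_bi = theta_at((easier + harder) / 2)  # Between Item convention

print(f"PI cut: {cut_pi:.2f} logits, BI cut: {cut_bi:.2f} logits")
```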

  31. Compare Angoff and Bookmark • Angoff requires less preparation • Select a real test form as opposed to building the OIBs • Judges understand Bookmark better • Rating the difficulty of an item, as Angoff requires, is a difficult task • Bookmark requires more test items • I’d recommend an item pool of at least 40 solid test items per content domain

  32. Other Test Centered Methods • Ebel • Nedelsky • Jaeger • Rasch Item Mapping

  33. Ebel • Judges sort each item into piles • How difficult is this item for the MC examinee? • Easy, moderate, or hard • How relevant is this content for practice? • Critical, Moderately important, Not relevant • Judges then estimate the percentage of items in each cell that MC examinees would get correct • The passing standard is then determined by multiplying the number of items in each cell by its percentage and summing all values
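
A minimal sketch of the Ebel arithmetic, with an invented difficulty-by-relevance grid:

```python
# Ebel: cut score = sum over cells of (item count x judged % correct).
# (difficulty, relevance) -> (items in cell, judged % correct for MC examinees)
ebel_grid = {
    ("easy",     "critical"): (10, 0.90),
    ("easy",     "moderate"): (6,  0.85),
    ("moderate", "critical"): (12, 0.70),
    ("moderate", "moderate"): (8,  0.60),
    ("hard",     "critical"): (5,  0.50),
    ("hard",     "moderate"): (4,  0.40),
}

passing_standard = sum(n * p for n, p in ebel_grid.values())
total_items = sum(n for n, _ in ebel_grid.values())

print(f"Ebel passing standard: {passing_standard:.1f} of {total_items} items")
```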

  34. Nedelsky • Judges determine which response options are unrealistic for each item • The probability of a guessed correct response is calculated • The sum of the probabilities is the passing standard
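
A minimal sketch of the Nedelsky arithmetic; the option counts and eliminations are invented:

```python
# Nedelsky: after judges strike the options an MC examinee would rule out,
# the chance of guessing an item correctly is 1 / (options remaining);
# the cut score is the sum of these probabilities across the test.

# Per item: (total options, options judged implausible for the MC examinee)
items = [(4, 2), (4, 1), (4, 0), (5, 3), (4, 2)]

guess_probs = [1 / (total - struck) for total, struck in items]
passing_standard = sum(guess_probs)

print(f"Nedelsky passing standard: {passing_standard:.2f} of {len(items)} items")
```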

  35. Jaeger • Judges evaluate each item • Yes/No - “Should every entry-level practitioner answer this item correctly?” • Judges discuss ratings & make adjustments • Judges are shown the passing rate implied by their standard & make adjustments • The passing standard is calculated by summing the number of “Yes” responses
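
A minimal sketch of the Jaeger tally; the ratings are invented, and averaging the per-judge counts across the panel is an assumption (the slide specifies only the per-judge sum):

```python
# Jaeger: each judge's standard is their count of "Yes" responses.
yes_no = {
    "judge_1": [1, 0, 1, 1, 0, 1, 1, 0],
    "judge_2": [1, 1, 1, 0, 0, 1, 1, 1],
    "judge_3": [0, 0, 1, 1, 0, 1, 0, 1],
}

judge_cuts = {j: sum(v) for j, v in yes_no.items()}
panel_cut = sum(judge_cuts.values()) / len(judge_cuts)  # assumed aggregation

print(f"Jaeger passing standard: {panel_cut:.1f} of 8 items")
```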

  36. Test-Centered Options • What the ratings are based on • “Should” vs. “would” the MC get this right • How ratings are made • Yes/No, Percentage • Relevance adjustments • Guessing adjustments • What kind of feedback is provided • Passing rate • Other judges’ ratings • Actual item difficulty

  37. Using Multiple Methods of Standard Setting

  38. Why use Multiple Methods? • There is error in every standard setting • Allows policymakers to “decide” on the standard rather than science simply documenting the outcomes of a panel • Allows for the recovery of standard setting sessions that go awry • Involves more stakeholders

  39. Adjustments • Simple Stats – Calculate the confidence interval around the estimate • Beuk – Judges provide an expected passing score and an expected passing rate; the standard is adjusted based on the variability in these two estimates • De Gruijter – Similar to Beuk, but judges also provide an estimate of the uncertainty of their judgments • Hofstee – Judges indicate the highest and lowest acceptable passing score and passing rate. These values are plotted along with the cumulative frequency distribution, and the point of intersection is the passing standard
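
As a minimal sketch of the “Simple Stats” adjustment, here is a confidence interval around the panel mean computed from judge-to-judge variability; the judge cut scores are invented, and the normal-theory 1.96 multiplier is an assumption.

```python
# Confidence interval around the mean of the judges' cut scores.
import math
import statistics

judge_cuts = [68.0, 71.5, 65.0, 70.0, 73.5, 66.5, 69.0, 72.0]  # invented

mean_cut = statistics.mean(judge_cuts)
se = statistics.stdev(judge_cuts) / math.sqrt(len(judge_cuts))

lo, hi = mean_cut - 1.96 * se, mean_cut + 1.96 * se  # ~95% interval
print(f"Proposed cut: {mean_cut:.1f} (95% CI: {lo:.1f} to {hi:.1f})")
```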

  40. Survey of Hiring Managers • Ask hiring managers about the workforce • What percentage of certified persons do you believe to be minimally competent? • Are your certified persons more competent than your uncertified persons? • Expands the reach of your exam

  41. Triangulating results • Psychometrician should present the outcome of each method and the passing rate associated with each outcome • A range of possible values • Policymakers can use this information and “their professional experience” to set the actual passing standard

  42. Wrap-Up

  43. 3 Vital Recommendations • Have more judges at standard setting • Spend more time training your judges • With each standard setting, take the time to define minimal competency conceptually, and document this definition

  44. Concluding Remarks • Many people like to think of test makers as big, bad people, which is obviously not true. Standard setting is one example of how inclusive the scientific process of test development can be. I encourage folks to make this process light and fun.

  45. Thank you for paying attention! Questions & Comments: brian@mountainmeasurement.com
