
CHAPTER 5 Test Scores as Composites




  1. CHAPTER 5: Test Scores as Composites. This chapter is about the quality of items in a test.

  2. Test Scores as Composites • What is a composite test score? • A composite test score is a total test score created by summing two or more subtest scores, e.g., the WAIS-IV Full Scale IQ consists of 1) the Verbal Comprehension Index, 2) the Perceptual Reasoning Index, 3) the Working Memory Index, and 4) the Processing Speed Index. Qualifying examinations and the EPPP are also composite test scores.

  3. Item Scoring Schemes [skeems]/Systems • We have two different scoring systems: • 1. Dichotomous scores: restricted to 0 and 1, such as scores on true-false and multiple-choice questions. • 2. Non-dichotomous scores: not restricted to 0 and 1; they can have a range of possible points (1, 2, 3, 4, 5, ...), such as scores on essays.

  4. Dichotomous Scheme Examples • 1. The space between nerve cell endings is called the a. Dendrite b. Axon c. Synapse d. Neutron (In this item, responses a, b, and d are scored 0; response c is scored 1.) • 2. Teachers in public school systems should have the right to strike. a. Agree b. Disagree (In this item, a response of Agree is scored 1; Disagree is scored 0.) Alternatively, a True/False format can be used.
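
A minimal sketch of dichotomous scoring in Python; the answer key and the examinee's responses here are invented for illustration:

```python
# Dichotomous scoring: every response is scored 1 (keyed/correct) or 0.
# The answer key and responses are hypothetical, for illustration only.
answer_key = {1: "c", 2: "agree"}      # item 1: synapse; item 2 keyed "agree"
responses  = {1: "c", 2: "disagree"}   # one examinee's answers

scores = {item: int(responses[item] == key) for item, key in answer_key.items()}
print(scores)                # {1: 1, 2: 0}
print(sum(scores.values()))  # total (composite) score: 1
```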

  5. Practical Implication for Test Construction • Variance and covariance measure the quality of items in a test; reliability and validity measure the quality of the entire test. • σ² = SS/N (used with one set of data). Variance is the degree to which scores vary about the mean.

  6. Practical Implication for Test Construction • Correlation is based on a statistic called covariance (COVxy or Sxy). COVxy = SP/(N-1) (used with two sets of data). Covariance is a number that reflects the degree to which two variables vary together. • r = SP/√(SSx·SSy)

  7. Variance • σ² = SS/N (population); s² = SS/(n-1) = SS/df (sample) • Example data X: 2, 4, 5 • SS = Σx² - (Σx)²/N, or equivalently SS = Σ(x-μ)², the sum of squared deviations from the mean.
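
Both SS formulas can be checked against the slide's data:

```python
# Both SS formulas give the same answer for the slide's data X = 2, 4, 5.
X = [2, 4, 5]
N = len(X)
mean = sum(X) / N

ss_computational = sum(x**2 for x in X) - sum(X)**2 / N  # SS = Σx² - (Σx)²/N
ss_definitional  = sum((x - mean)**2 for x in X)         # SS = Σ(x - μ)²
print(ss_computational, ss_definitional)                 # 4.67 and 4.67

print("population variance:", ss_computational / N)       # σ² = SS/N ≈ 1.56
print("sample variance:", ss_computational / (N - 1))     # s² = SS/(n-1) ≈ 2.33
```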

  8. Covariance • Covariance is a number that reflects the degree to which 2 variables vary together. • Original data (X, Y pairs): (1, 3), (2, 6), (4, 4), (5, 7)

  9. Covariance • COVxy = SP/(N-1) • There are 2 ways to calculate SP: SP = Σxy - (Σx·Σy)/N, or SP = Σ(x-μx)(y-μy) • SP requires 2 sets of data; SS requires only one set of data.
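
Likewise, both SP formulas can be verified with the slide's data:

```python
# Both SP formulas give the same answer for the slide's data.
X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]
N = len(X)
mx, my = sum(X) / N, sum(Y) / N

sp_computational = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / N  # SP = Σxy - (Σx·Σy)/N
sp_definitional  = sum((x - mx) * (y - my) for x, y in zip(X, Y))          # SP = Σ(x-μx)(y-μy)
print(sp_computational, sp_definitional)     # 6.0 and 6.0

print("COVxy:", sp_computational / (N - 1))  # SP/(N-1) = 2.0
```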

  10. Descriptive Statistics for Dichotomous Data

  11. Descriptive Statistics for Dichotomous Data: Item Variance & Covariance

  12. Descriptive Statistics for Dichotomous Data • P = item difficulty: P = (# of examinees who answered the item correctly) / (total # of examinees), i.e., P = f/N (see handout). The higher the P value, the easier the item.
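
A minimal sketch of computing P = f/N per item; the 0/1 response matrix is invented for illustration:

```python
# Item difficulty P = f/N: the proportion of examinees answering the item correctly.
# Rows are examinees, columns are items (1 = correct, 0 = incorrect); data are made up.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
N = len(responses)
p_values = [sum(item) / N for item in zip(*responses)]
print(p_values)  # [0.75, 0.75, 0.25]: the higher the P value, the easier the item
```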

  13. Relationship between Item Difficulty (P) and Variance (σ²) • [Figure: item variance σ² plotted against item difficulty P, rising from 0 at P = 0 (difficult item), peaking at P = 0.5, and falling back to 0 at P = 1 (easy item).]
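
For an item scored 0/1, the item variance is σ² = pq = p(1 - p), which is the curve the figure shows; a quick check of its shape:

```python
# Dichotomous item variance σ² = p(1 - p): zero at p = 0 and p = 1, maximal at p = 0.5.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, p * (1 - p))   # 0.0, 0.1875, 0.25, 0.1875, 0.0
```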

  14. Non-dichotomous Scores Examples • 1. Write a grammatically correct German sentence using the first person singular form of the verb verstehen. (A maximum of 3 points may be awarded, and partial credit may be given.) • 2. An intellectually disabled person is a nonproductive member of society. 5. Strongly agree 4. Agree 3. No opinion 2. Disagree 1. Strongly disagree (Scores can range from 1 to 5 points, with high scores indicating a positive attitude toward intellectually disabled citizens.)

  15. Descriptive Statistics for Non-dichotomous Variables

  16. Descriptive Statistics for Non-dichotomous Variables

  17. Variance of a Composite “σ²C” • σ² = SS/N; σ²a = SSa/Na; σ²b = SSb/Nb • Strictly, the variance of a composite of two subtests is σ²C = σ²a + σ²b + 2COVab; when the subtests are uncorrelated this reduces to σ²C = σ²a + σ²b. • Ex. from the WAIS-III: FSIQ = VIQ + PIQ. • With more than 2 subtests, calculate the variance of each subtest and add them up, along with twice the covariance of each pair of subtests.

  18. Variance of a Composite “σ²C” • What is the composite test score? • Ex. the WAIS-IV Full Scale IQ, which consists of a) the Verbal Comprehension Index, b) the Perceptual Reasoning Index, c) the Working Memory Index, and d) the Processing Speed Index. • With more than 2 (uncorrelated) subtests: σ²C = σ²a + σ²b + σ²c + σ²d
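
A small sketch with invented subtest scores, confirming that the composite variance equals the sum of the subtest variances plus twice their covariance (and reduces to the simple sum only when the covariance is zero):

```python
# Variance of a composite: Var(a + b) = Var(a) + Var(b) + 2·Cov(a, b).
# The subtest scores below are made up for illustration.
a = [10, 12, 14, 16]
b = [20, 19, 23, 22]

def var(xs):                  # population variance, SS/N
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cov(xs, ys):              # population covariance, SP/N
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

composite = [x + y for x, y in zip(a, b)]
print(var(composite))                   # 12.5, computed directly
print(var(a) + var(b) + 2 * cov(a, b))  # 12.5, via the composite formula
```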

  19. *Suggestions to Increase the Total Score Variance of a Test • 1. Increase the number of items in a test. • 2. Keep item difficulties (p) in the medium range. • 3. Items with similar content have higher correlations and higher covariance. • 4. Item score and total score variances alone are not indices (in-də-ˌcēz) of test quality (reliability and validity).

  20. *1. Increase the Number of Items in a Test (how to calculate the test variance) • The variance of a test of 25 items is higher than the variance of a test of 20 items: σ² = N(σ²x) + N(N-1)(COVx) • Ex. if COVx (item covariance) = 0.10 and σ²x (item variance) = 0.20, then for N = 20 items the test variance is σ² = 42, and for N = 25 items it is σ² = 65.
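
A quick check of the slide's arithmetic:

```python
# Test variance from item statistics: σ² = N·σ²x + N(N-1)·COVx, with the slide's numbers.
var_x = 0.20   # item variance
cov_x = 0.10   # item covariance

for N in (20, 25):
    print(N, "items:", N * var_x + N * (N - 1) * cov_x)   # 20 items: 42.0, 25 items: 65.0
```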

  21. *2. Item Difficulties • Item difficulties should be almost equal for all of the items, and difficulty levels should be in the medium range.

  22. *3. Items with Similar Content have Higher Correlations & Higher Covariance

  23. *4. Item Score & Total Score Variances Alone are not Indices (in-də-ˌcēz) of Test Quality • Variance and covariance are important and necessary; however, they are not sufficient to determine test quality. To assess test quality at a higher level we use reliability and validity.

  24. UNIT II: RELIABILITY • CHAP 6: RELIABILITY AND THE CLASSICAL TRUE SCORE MODEL • CHAP 7: PROCEDURES FOR ESTIMATING RELIABILITY • CHAP 8: INTRODUCTION TO GENERALIZABILITY THEORY • CHAP 9: RELIABILITY COEFFICIENTS FOR CRITERION-REFERENCED TESTS

  25. CHAPTER 6: Reliability and the Classical True Score Model • Reliability (ρ) is a measure of consistency/dependability: a test measures the same thing more than once and produces the same outcome. • Reliability refers to the consistency of examinees' performance over repeated administrations of the same test or parallel forms of the test (Linda Crocker text).

  26. THE MODERN MODELS

  27. *TYPES OF RELIABILITY

  28. Test-Retest • Class IQ scores • Students: X (1st time, Mon) / Y (2nd time, Fri) • John 125 / 120 • Jo 110 / 112 • Mary 130 / 128 • Kathy 122 / 120 • David 115 / 120

  29. Parallel/Alternate Forms • Scores on 2 forms of a stats test • Students: Form A / Form B • John 95 / 92 • Jo 84 / 82 • Mary 90 / 88 • Kathy 76 / 80 • David 81 / 78

  30. Test-Retest with Alternate Forms • On Monday, you administer Form A to the 1st half of the group and Form B to the 2nd half; on Friday, you administer Form B to the 1st half and Form A to the 2nd half. • Monday, Form A to 1st group: David 85, Mary 94, Jo 78, John 81, Kathy 67 • Monday, Form B to 2nd group: Mark 82, Jane 95, George 80, Mona 80, Maria 70 • Continued on next slide.

  31. Test-Retest with Alternate Forms • On Friday, you administer Form B to the 1st half of the group and Form A to the 2nd half. • Friday, Form B to 1st group: David 85, Mary 94, Jo 78, John 81, Kathy 67 • Friday, Form A to 2nd group: Mark 82, Jane 95, George 80, Mona 80, Maria 70

  32. HOW RELIABILITY IS MEASURED • Reliability is measured by using a correlation coefficient: r(test1·test2) or r(x·y) • Reliability coefficients indicate how scores on one test change relative to scores on a second test. • Reliability coefficients range from 0.00 to 1.00: 1.00 = perfect reliability, 0.00 = no reliability.
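
As a worked sketch, here is the test-retest coefficient for the class IQ scores from slide 28, using the r = SP/√(SSx·SSy) formula from slide 6:

```python
# Test-retest reliability as a correlation, r = SP/√(SSx·SSy),
# using the class IQ scores from slide 28.
from math import sqrt

monday = [125, 110, 130, 122, 115]   # X: 1st administration
friday = [120, 112, 128, 120, 120]   # Y: 2nd administration

n = len(monday)
mx, my = sum(monday) / n, sum(friday) / n
sp  = sum((x - mx) * (y - my) for x, y in zip(monday, friday))
ssx = sum((x - mx) ** 2 for x in monday)
ssy = sum((y - my) ** 2 for y in friday)

r = sp / sqrt(ssx * ssy)
print(round(r, 3))   # about 0.89: scores are highly consistent across administrations
```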

  33. THE CLASSICAL MODEL

  34. A CONCEPTUAL DEFINITION OF RELIABILITY: THE CLASSICAL MODEL • Observed Score = True Score ± Error Score, i.e., X = T ± E • Error has two sources: method error and trait error.

  35. Classical Test Theory • The observed score: X = T + E. X is the score you actually record or observe on a test. • The true score: T = X - E; the difference between the observed score and the error score is the true score. T reflects the examinee's true knowledge. • The error score: E = X - T; the difference between the observed score and the true score is the error score. E comprises the factors that cause the true score and the observed score to differ.

  36. A CONCEPTUAL DEFINITION OF RELIABILITY • (X) Observed score: X = T ± E • The score that is actually observed. • It consists of two components: the true score and the error score.

  37. A CONCEPTUAL DEFINITION OF RELIABILITY • True score: T = X - E • A perfect reflection of the true value for an individual. • A theoretical score.

  38. A CONCEPTUAL DEFINITION OF RELIABILITY • Method error is due to characteristics of the test or testing situation. • Trait error is due to individual characteristics. • Conceptually: Reliability = True Score / Observed Score = True Score / (True Score + Error Score) • Reliability of the observed score becomes higher as error is reduced!

  39. A CONCEPTUAL DEFINITION OF RELIABILITY • Error score: E = X - T • The difference between the observed and true scores (±). • X = T ± E. Ex.: 95 = 90 + 5 or 85 = 90 - 5; the difference between T and X is 5 points, so E = ±5.

  40. The Classical True Score Model • X = T ± E • X = the observed test score • T = the individual's true score (true knowledge) • E = the random error component

  41. Classical Test Theory • What makes up the error score? E = X - T. The error score consists of 1. method error and 2. trait error. • 1. Method error is the difference between true and observed scores resulting from the test or testing situation. • 2. Trait error is the difference between true and observed scores resulting from characteristics of the examinees. See next slide.

  42. What Makes up the Error Score?

  43. Expected Value of True Score • Definition of the true score: the true score is defined as the expected value of the examinee's test scores (the mean of the observed scores) over many repeated testings with the same test.

  44. Error Score • Definition of the error score: the error scores for an examinee over many repeated testings should average to zero. • E(Ej) = E(Xj) - Tj = Tj - Tj = 0 • E(Ej) = expected value of the error; Tj = the examinee's true score. Ex. next.

  45. Error Score • X - E = T: the difference between the observed score and the error score is the true score (all scores are from the same examinee, T = 90): • 98 - 8 = 90; 88 + 2 = 90; 80 + 10 = 90; 100 - 10 = 90; 95 - 5 = 90; 81 + 9 = 90; 88 + 2 = 90; 90 - 0 = 90 • The errors (E = X - T) are +8, -2, -10, +10, +5, -9, -2, 0, and they sum to zero.
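
A quick check of these numbers:

```python
# Repeated testings of one examinee with T = 90: the error scores average to zero.
observed = [98, 88, 80, 100, 95, 81, 88, 90]
T = 90
errors = [x - T for x in observed]   # E = X - T
print(errors)        # [8, -2, -10, 10, 5, -9, -2, 0]
print(sum(errors))   # 0
```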

  46. *INCREASING THE RELIABILITY OF A TEST (i.e., Decreasing Error): 7 Steps • 1. Increase the sample size (n) • 2. Eliminate unclear questions • 3. Standardize testing conditions • 4. Moderate the degree of difficulty of the tests (P) • 5. Minimize the effects of external events • 6. Standardize instructions (directions) • 7. Maintain consistent scoring procedures (use a rubric)

  47. *Increasing Reliability of your Items in a Test

  48. *Increasing Reliability Cont..

  49. How Reliability (ρ) is Measured for an Item/Score • ρ = True Score / (True Score + Error Score), i.e., ρ = T/(T + |E|), with 0 ≤ ρ ≤ 1 • Note: in this formula you always add the magnitude of the error (the difference between T and X) to the true score in the denominator, whether the error is positive or negative: ρ = T/(T + |E|).

  50. Which Item has the Highest Reliability? • Maximum points for this question is 10; ρ = T/(T + |E|) • E = +2, T = 8: 8/10 = 0.80 • E = -3, T = 6: 6/9 = 0.667 • E = +7, T = 1: 1/8 = 0.125 • E = -1, T = 9: 9/10 = 0.90 • E = +4, T = 6: 6/10 = 0.60 • E = -4, T = 6: 6/10 = 0.60 • E = +1, T = 7: 7/8 = 0.875 • E = 0, T = 10: 10/10 = 1.0 • E = -5, T = 4: 4/9 = 0.444 • E = +6, T = 3: 3/9 = 0.333 • More error means less reliable.
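
A quick check that reproduces the table:

```python
# Reliability of each score as ρ = T/(T + |E|), reproducing slide 50's table.
items = [(+2, 8), (-3, 6), (+7, 1), (-1, 9), (+4, 6),
         (-4, 6), (+1, 7), (0, 10), (-5, 4), (+6, 3)]

for error, true in items:
    p = true / (true + abs(error))
    print(f"E={error:+d}  T={true:2d}  p={p:.3f}")
# The larger the error, the lower p: more error means less reliable.
```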
