Reliability and Validity: Understanding Measurement Consistency and Truthfulness

Chapter 6 Norm-Referenced Reliability and Validity

Topics for Discussion
  Reliability: Consistency, Repeatability
  Validity: Truthfulness
  Objectivity: Inter-rater reliability

Observed, Error, and True Scores: Observed Score = True Score + Error Score

Reliability: Reliability is the proportion of observed score variance that is true score variance.

Table 6.1 Systolic Blood Pressure Recordings for 10 Subjects

Subject          Observed BP  =  True BP  +  Error BP
1                    103           105          -2
2                    117           115          +2
3                    116           120          -4
4                    123           125          -2
5                    127           125          +2
6                    125           125           0
7                    135           125         +10
8                    126           130          -4
9                    133           135          -2
10                   145           145           0
Sum (Σ)             1250          1250           0
Mean (M)           125.0         125.0           0
Std. Dev. (s)       11.6          10.8         4.1
Variance (s²)      133.6  =      116.7  +     16.9
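The definition of reliability can be checked directly against Table 6.1. The following is a minimal sketch, assuming the sample (n - 1) variance that the table's figures imply; the variable names are illustrative only.

```python
from statistics import variance

# Scores from Table 6.1 (systolic blood pressure, 10 subjects).
observed = [103, 117, 116, 123, 127, 125, 135, 126, 133, 145]
true = [105, 115, 120, 125, 125, 125, 125, 130, 135, 145]
error = [o - t for o, t in zip(observed, true)]

# Sample variances (n - 1 in the denominator), matching the table.
var_observed = variance(observed)
var_true = variance(true)
var_error = variance(error)

# Reliability: proportion of observed score variance that is true score variance.
reliability = var_true / var_observed

print(f"observed variance = {var_observed:.1f}")  # 133.6
print(f"true variance     = {var_true:.1f}")      # 116.7
print(f"error variance    = {var_error:.1f}")     # 16.9
print(f"reliability       = {reliability:.2f}")   # 0.87
```

In this table the observed variance equals the true variance plus the error variance because the true and error scores happen to be uncorrelated.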

Interclass Reliability (Pearson Product Moment)
  Test-retest: Trial 1, Trial 2
  Equivalence: Form A, Form B
  Split halves: Odd, Even

Table 6.2 Sit-up Performance for 10 Subjects

Subject          Trial 1   Trial 2
1                   45        49
2                   38        36
3                   54        50
4                   38        38
5                   47        49
6                   39        38
7                   39        43
8                   42        43
9                   29        30
10                  42        42
Sum (Σ)            413       418
Mean (M)          41.3      41.8
Std. Dev. (s)      6.6       6.5
Variance (s²)     43.6      41.7

rxx' = .927
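The test-retest coefficient for Table 6.2 can be reproduced with a hand-written Pearson product-moment correlation. This is a sketch for illustration; `pearson_r` is a helper name introduced here, not something from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Sit-up scores from Table 6.2.
trial_1 = [45, 38, 54, 38, 47, 39, 39, 42, 29, 42]
trial_2 = [49, 36, 50, 38, 49, 38, 43, 43, 30, 42]

r_test_retest = pearson_r(trial_1, trial_2)
print(round(r_test_retest, 3))  # 0.927
```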

Spearman-Brown Prophecy Formula: rkk = (k × r11) / (1 + (k - 1) × r11), where k = the number of items I WANT to estimate the reliability for, divided by the number of items I HAVE reliability for
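A minimal sketch of the prophecy formula in its usual form, rkk = (k × r11) / (1 + (k - 1) × r11). The function name `spearman_brown` and the 10-item and 20-item counts are assumptions made for the example; the printed values can be compared with Table 6.4.

```python
def spearman_brown(r11, k):
    """Estimated reliability when test length changes by a factor of k.

    r11: reliability of the test I HAVE.
    k:   items I WANT divided by items I HAVE.
    """
    return (k * r11) / (1 + (k - 1) * r11)

# Hypothetical case: reliability is known for 10 items, wanted for 20.
items_have = 10
items_want = 20
k = items_want / items_have  # 2.0

print(round(spearman_brown(0.50, k), 2))     # 0.67 (doubling the test)
print(round(spearman_brown(0.80, 0.25), 2))  # 0.5  (cutting the test to a quarter)
```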

Table 6.3 Odd-Even Scores for 10 Subjects

Subject          Odd    Even
1                 12     13
2                  9     11
3                 10      8
4                  9      6
5                 11      8
6                  7     10
7                  9      9
8                 12     10
9                  5      4
10                 8      7
Sum (Σ)           92     86
Mean (M)         9.2    8.6
Std. Dev. (s)    2.2    2.6
Variance (s²)    4.8    6.7

rxx' = .639

Table 6.4 Values of rkk From Spearman-Brown Prophecy Formula

         K (change in test length)
r11      .25    .50    1.5    2.0    3.0    4.0    5.0
.10      .03    .05    .14    .18    .25    .31    .36
.22      .07    .12    .30    .36    .46    .53    .59
.40      .14    .25    .50    .57    .67    .73    .77
.50      .20    .33    .60    .67    .75    .80    .83
.60      .27    .43    .69    .75    .82    .86    .88
.68      .35    .52    .76    .81    .86    .89    .91
.80      .50    .67    .86    .89    .92    .94    .95
.92      .74    .85    .95    .96    .97    .98    .98
.96      .86    .92    .97    .98    .99    .99    .99

Table 6.5 Effect of a Constant Change in Measures

Subject          Trial 1   Trial 2
1                   15        25
2                   17        27
3                   10        20
4                   20        30
5                   23        33
6                   26        36
7                   27        37
8                   30        40
9                   32        42
10                  33        43
Sum (Σ)            233       333
Mean (M)          23.3      33.3
Std. Dev. (s)      7.7       7.7
Variance (s²)     59.1      59.1

rxx' = 1.00

Intraclass Reliability
  ANOVA model
  Cronbach's alpha coefficient (coefficient alpha)

Intraclass (ANOVA) Reliabilities
Common terms you will encounter:
  Alpha reliability
  Kuder-Richardson Formula 20 (KR20)
  Kuder-Richardson Formula 21 (KR21)
  ANOVA reliabilities

Table 6.6 Calculating the Alpha Coefficient

Subject   Trial 1   Trial 2   Trial 3   Total
1            3         5         3        11
2            2         2         2         6
3            6         5         3        14
4            5         3         5        13
5            3         4         4        11
ΣX          19        19        17        55
ΣX²         83        79        63       643
s²        2.70      1.70      1.30      9.50

Calculating the Alpha Coefficient
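A minimal sketch of the alpha calculation for the Table 6.6 data, assuming the standard form of coefficient alpha: alpha = (k / (k - 1)) × (1 - Σs²(trials) / s²(total)), where k is the number of trials. The resulting value is computed here and is not quoted from the slides.

```python
from statistics import variance

# Scores from Table 6.6: one list per trial, five subjects each.
trials = [
    [3, 2, 6, 5, 3],  # Trial 1
    [5, 2, 5, 3, 4],  # Trial 2
    [3, 2, 3, 5, 4],  # Trial 3
]

# Each subject's total across the three trials.
totals = [sum(scores) for scores in zip(*trials)]

k = len(trials)
trial_variances = [variance(t) for t in trials]  # 2.70, 1.70, 1.30
total_variance = variance(totals)                # 9.50

# Coefficient alpha (standard form, assumed).
alpha = (k / (k - 1)) * (1 - sum(trial_variances) / total_variance)

print(totals)           # [11, 6, 14, 13, 11]
print(round(alpha, 2))  # 0.6
```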

Index of Reliability: The theoretical correlation between observed scores and true scores

Standard Error of Measurement: Reflects the degree to which a person's observed score fluctuates as a result of errors of measurement
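A minimal sketch of the standard error of measurement, assuming the usual formula SEM = s × sqrt(1 - rxx'), where s is the standard deviation of the observed scores. Applied to Table 6.1, it recovers the standard deviation of the error scores.

```python
from math import sqrt
from statistics import stdev, variance

# Observed and true scores from Table 6.1.
observed = [103, 117, 116, 123, 127, 125, 135, 126, 133, 145]
true = [105, 115, 120, 125, 125, 125, 125, 130, 135, 145]
error = [o - t for o, t in zip(observed, true)]

reliability = variance(true) / variance(observed)

# Standard error of measurement from the observed SD and the reliability.
sem = stdev(observed) * sqrt(1 - reliability)

print(f"SEM      = {sem:.1f}")           # 4.1
print(f"error SD = {stdev(error):.1f}")  # 4.1
```

In practice true scores are unknown, so the SEM is estimated from the observed standard deviation and a reliability coefficient such as a test-retest r.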

Factors Affecting Test Reliability
  1) Fatigue
  2) Practice
  3) Subject variability
  4) Time between testing
  5) Circumstances surrounding the testing periods
  6) Appropriate difficulty for testing subjects
  7) Precision of measurement
  8) Environmental conditions

Figure: Decline in Reliability for the Harvard Alumni Activity Survey as the Time Between Testing Periods Increases (x-axis: Months Between Test-Retest)

Validity Types
  Content-related validity
  Criterion-related validity (statistical or correlational)
    Concurrent
    Predictive
  Construct-related validity

Standard Error of Estimate: Also called the standard error or the standard error of prediction
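A minimal sketch of the standard error of estimate, assuming the usual formula SEE = s(y) × sqrt(1 - r²), where s(y) is the standard deviation of the criterion and r is the validity coefficient. Both input values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt

def standard_error_of_estimate(sd_criterion, validity_r):
    """Typical size of the error when predicting the criterion from the test."""
    return sd_criterion * sqrt(1 - validity_r ** 2)

# Hypothetical values: criterion SD of 10 units, validity coefficient of .80.
sd_criterion = 10.0
validity_r = 0.80

see = standard_error_of_estimate(sd_criterion, validity_r)
print(round(see, 1))  # 6.0
```

A higher validity coefficient shrinks the SEE; with r = 1.00 the prediction error would be zero, and with r = 0 it would equal the criterion's standard deviation.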

Standard Errors: SE of Measurement, SE of Estimate

Methods of Obtaining a Criterion Measure
  Actual participation (e.g., golf, archery)
  Perform the criterion
    Known valid criterion (e.g., treadmill performance)
  Expert judges
    Panel of judges
  Tournament participation
    Round robin
  Known valid test

Table 6.7 Correlation Matrix for Development of a Golf Skills Test (From Green et al., 1987)
What are these? Concurrent validity coefficients

Table 6.8 Concurrent Validity Coefficients for Golf Test

Battery           Items                                                    Validity coefficient
2-item battery    Middle distance shot, Pitch shot                         .72
3-item battery    Middle distance shot, Pitch shot, Long putt              .76
4-item battery    Middle distance shot, Pitch shot, Long putt, Chip shot   .77

Figure 6.1 Diagram of Validity and Reliability Terms

Interpreting the “r” You Obtain

Various Correlations

Interpret These Correlations (with the criterion): What are these? Concurrent validity coefficients

Interpret These Correlations: What are these? Reliability coefficients

Interpret These Correlations: What is this? Objectivity coefficient

Scatterplot: Two trials of Leg Press, showing the line of identity and the prediction line

Correlation: Two trials of Leg Press

Concurrent Validity: This square represents variance in performance in a skill (e.g., golf).

Concurrent Validity: The different colors and patterns represent different parts of a skills test battery used to measure the criterion (e.g., golf).

Concurrent Validity: The orange color represents ERROR, or unexplained variance in the criterion (e.g., golf).

Concurrent Validity: Consider the concurrent validity of the 4 possible skills test batteries shown (A, B, C, D).

Concurrent Validity: Which test battery would you be LEAST likely to use? Why? D: it has the MOST error and requires 4 tests to be administered.

Concurrent Validity: Which test battery would you be MOST likely to use? Why? C: it has the LEAST error, but it requires 3 tests to be administered.

Concurrent Validity: Which test battery would you use if you are limited in time? A or B: these require only 1 or 2 tests to be administered, but you lose some validity.

PASW Examples
