Establishing the Reliability and Validity of Outcomes Assessment Measures Anthony R. Napoli, PhD Lanette A. Raymond, MA Office of Institutional Research & Assessment Suffolk County Community College http://sccaix1.sunysuffolk.edu/Web/Central/IT/InstResearch/
Validity defined • The validity of a measure indicates to what extent items measure some aspect of what they are purported to measure
Types of Validity • Face Validity • Content Validity • Construct Validity • Criterion-Related Validity
Face Validity • It looks like a test of *#%* • Not validity in a technical sense
Content Validity • Incorporates quantitative estimates • Domain Sampling • The simple summing or averaging of dissimilar items is inappropriate
Construct Validity • Indicated by correspondence of scores to other known valid measures of the underlying theoretical trait • Discriminant Validity • Convergent Validity
Criterion-Related Validity • Represents performance in relation to particular tasks of discrete cognitive or behavioral objectives • Predictive Validity • Concurrent Validity
Reliability defined • The reliability of a measure indicates the degree to which an instrument consistently measures a particular skill, knowledge base, or construct • Reliability is a precondition for validity
Types of Reliability • Inter-rater (scorer) reliability • Inter-item reliability • Test-retest reliability • Split-half & alternate forms reliability
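As one illustration of the types listed above, split-half reliability can be sketched in a few lines. This is a minimal sketch, not the authors' procedure: it assumes an odd/even item split, applies the Spearman-Brown correction 2r / (1 + r) to estimate full-length reliability, and uses a hypothetical 0/1 response matrix (`responses`) invented for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(rows):
    """Odd/even split-half reliability with the Spearman-Brown correction."""
    half_a = [sum(row[0::2]) for row in rows]  # items 1, 3, 5, ...
    half_b = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    r = pearson(half_a, half_b)
    return 2 * r / (1 + r)

# Hypothetical data: 5 students x 4 items, 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
split_half = split_half_reliability(responses)
print(round(split_half, 2))  # 0.88
```

The correction matters because each half is only half as long as the full test, and shorter tests are less reliable.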
Validity & Reliability in Plain English • Assessment results must represent the institution, program, or course • Evaluation of the validity and reliability of the assessment instrument and/or rubric will provide the documentation that it does
Content Validity for Subjective Measures • The learning outcomes represent the program/course (domain sampling) • The instrument addresses the learning outcomes • There is a match between the instrument and the rubric • Rubric scores can be applied to the learning outcomes, and indicate the degree of student achievement within the program/course
Inter-Scorer Reliability • Rubric scores can be obtained and applied to the learning outcomes, and indicate the degree of student achievement within the program/course consistently
Content Validity for Objective Measures • The learning outcomes represent the program/course • The items on the instrument address specific learning outcomes • Instrument scores can be applied to the learning outcomes, and indicate the degree of student achievement within the program/course
Inter-Item Reliability • Items that measure the same learning outcomes should consistently exhibit similar scores
Content Validity (CH19) • A 12-item test measured students’ mastery of the objectives • Objective I: Write and decipher chemical nomenclature • Objective II: Solve both quantitative and qualitative problems • Objective III: Balance equations and solve mathematical problems associated with balanced equations • Objective IV: Demonstrate an understanding of intra-molecular forces
Content Validity (SO11) • A 30-item test measured students’ mastery of the objectives • Objective I: Identify the basic methods of data collection • Objective II: Demonstrate an understanding of basic sociological concepts and social processes that shape human behavior • Objective III: Apply sociological theories to current social issues
Inter-Rater Reliability: Fine Arts Portfolio • Rubric criteria: Drawing, Design, Technique, Creativity, Artistic Process, Aesthetic Criteria, Growth, Portfolio Presentation • Scale: 5 = Excellent, 4 = Very Good, 3 = Satisfactory, 2 = Unsatisfactory, 1 = Unacceptable
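The slide does not say which agreement statistic was used for the portfolio ratings. As a hedged sketch, two common choices for a 1 to 5 rubric are the exact-agreement rate and Cohen's kappa, which corrects agreement for chance. The two rating lists below are hypothetical and are not the study's data.

```python
from collections import Counter

def exact_agreement(rater_1, rater_2):
    """Share of portfolios on which both raters gave the same score."""
    matches = sum(a == b for a, b in zip(rater_1, rater_2))
    return matches / len(rater_1)

def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_1)
    p_observed = exact_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = set(rater_1) | set(rater_2)
    # Chance agreement from each rater's marginal score distribution.
    p_expected = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical rubric scores (5 = Excellent ... 1 = Unacceptable).
rater_a = [5, 4, 4, 3, 2, 5, 3, 4]
rater_b = [5, 4, 3, 3, 2, 4, 3, 4]

agreement = exact_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(agreement)        # 0.75
print(round(kappa, 2))  # 0.65
```

Kappa treats the scale as unordered categories. For an ordered rubric like this one, a weighted kappa or a correlation between raters may be a better fit.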
Inter-Item Reliability (PC11) • A 20-item test measured students’ mastery of the objectives • Objectives: Demonstrate a satisfactory knowledge of: 1. the history, terminology, methods, & ethics in psychology; 2. concepts associated with the 5 major schools of psychology; 3. the basic aspects of human behavior including learning and memory, personality, physiology, emotion, etc.; 4. an ability to obtain and critically analyze research in the field of modern psychology
Inter-Item Reliability (PC11) • Embedded-questions methodology • Inter-item or internal consistency reliability: KR-20, rtt = .71 • Mean score = 12.478 • Std Dev = 3.482 • Std Error = 0.513 • Mean grade = 62.4%
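The slide reports KR-20 but not how it is computed. The standard Kuder-Richardson Formula 20 for k dichotomous items is rtt = (k / (k - 1)) × (1 − Σ p·q / variance of total scores), where p is the proportion answering an item correctly and q = 1 − p. The sketch below applies it to a small hypothetical response matrix, not the PC11 data, and assumes the population variance so that it is consistent with the p·q item variances.

```python
from statistics import pvariance

def kr20(rows):
    """Kuder-Richardson Formula 20 for a matrix of 0/1 item scores.

    rows: one list per student, one 0/1 entry per item.
    """
    n_students = len(rows)
    k = len(rows[0])
    total_scores = [sum(row) for row in rows]
    total_variance = pvariance(total_scores)  # population variance
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in rows) / n_students  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical data: 5 students x 4 items, 1 = correct, 0 = incorrect.
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
reliability = kr20(item_scores)
print(round(reliability, 2))  # 0.8
```

The function divides by the total-score variance, so it fails if every student earns the same total. The reported mean grade follows directly from the mean score on the 20-item test: 12.478 / 20 ≈ 62.4%.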
Inter-Item Reliability (PC11): Motivational Comparison • 2 Groups: Graded Embedded Questions; Non-Graded Form & Motivational Speech • Mundane Realism
Inter-Item Reliability (PC11): Motivational Comparison • Graded condition produces higher scores (t(78) = 5.62, p < .001). • Large effect size (d = 1.27).
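The slide does not state how d was obtained. One common approximation for two independent groups of roughly equal size is d ≈ 2t / √df, and it reproduces the reported value from the reported t statistic. This is a sketch of that approximation, not necessarily the authors' calculation.

```python
import math

def cohens_d_from_t(t_value, df):
    """Approximate Cohen's d from an independent-samples t statistic.

    Assumes two groups of roughly equal size.
    """
    return 2 * t_value / math.sqrt(df)

d = cohens_d_from_t(5.62, 78)  # values reported on the slide
print(round(d, 2))  # 1.27
```

By Cohen's conventional benchmarks, d of 0.8 or more counts as a large effect, which matches the slide's description.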
Inter-Item Reliability (PC11): Motivational Comparison • Minimum competency 70% or better • Graded condition produces greater competency (Z = 5.69, p < .001).
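A comparison of competency rates like this is often done with a two-proportion Z test using a pooled standard error. The slide does not give the counts or say which test was run, so the sketch below assumes that test and uses hypothetical counts of students reaching the 70% threshold in each group.

```python
import math

def two_proportion_z(pass_a, n_a, pass_b, n_b):
    """Z statistic for the difference between two independent proportions."""
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)  # pooled proportion under H0
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_error

# Hypothetical counts: 30 of 40 graded students and 10 of 40 non-graded
# students score 70% or better.
z = two_proportion_z(30, 40, 10, 40)
print(round(z, 2))  # 4.47
```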
Inter-Item Reliability (PC11): Motivational Comparison • In the non-graded condition this measure is neither reliable nor valid • KR-20 (non-graded) = 0.29
“I am ill at these numbers.” -- Hamlet --
“When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.” -- Lord Kelvin --
“There are three kinds of lies: lies, damned lies, and statistics.” -- Benjamin Disraeli --