
Validity

Unified View of Validity. "Validity is an overall evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions on the basis of test scores or other modes of assessment" (Messick, 1995).


Presentation Transcript


    1. Chapter 6 Validity

    3. The Concept of Validity. Face validity: what a test 'seems' to measure. Transparency… honesty?

    4. Tripartite View Content Criterion-related Construct

    5. Content Validity. The adequacy of stimulus sampling; the degree to which the contents of a test reflect the domain of interest; the extent to which one can generalize from a particular collection of items to all possible items in a broader domain of items.

    6. Content Validity. How is it established? Through stimulus sampling procedures (e.g., randomly sampling items from the domain) and through rational analysis of test content during test development by the test developer: "…it is surely impossible to write items in the first place without a conceptualization of the attribute of which they are to be indicators" (McDonald, 1999).

    7. Content Validity. How is it established (contd.)? By test users, e.g., expert sorts. Lawshe's Content Validity Ratio: SMEs judge whether a test item is essential, and CVR = (ne – N/2) / (N/2), where ne is the number of SMEs rating the item essential and N is the total number of SMEs on the panel. The CVI is the average of all item CVR indexes. Additional evidence: internal consistency of items and convergent validity correlations with other measures.
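A minimal sketch of Lawshe's CVR and CVI computations in Python; the panel size and "essential" counts below are hypothetical.

```python
# Lawshe's Content Validity Ratio: CVR = (ne - N/2) / (N/2), where
# ne = number of SMEs rating the item "essential", N = panel size.
def content_validity_ratio(n_essential: int, n_judges: int) -> float:
    return (n_essential - n_judges / 2) / (n_judges / 2)

# Hypothetical panel of 10 SMEs rating a 4-item test.
essential_counts = [9, 7, 10, 5]
cvrs = [content_validity_ratio(n, 10) for n in essential_counts]
cvi = sum(cvrs) / len(cvrs)  # Content Validity Index = mean item CVR

print([round(c, 2) for c in cvrs])  # [0.8, 0.4, 1.0, 0.0]
print(round(cvi, 2))                # 0.55
```

CVR runs from –1 to +1, hitting 0 when exactly half the panel rates an item essential.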

    8. Assertiveness

    9. Content Validity

    10. Content Validity. Culture and the relativity of test validity: history class example.

    11. Criterion-Related Validity. What is a criterion? The standard against which a test or a test score is evaluated. Characteristics of a good criterion: it is relevant, valid, and uncontaminated.

    12. Contamination

    13. Criterion-Related Validity (aka Predictive Validity). Establishing that test scores relate to an external standard. Important but limited in scope: "No amount of apparently sound theory can substitute for lack of a correlation between predictor and criterion" (p. 95, Nunnally & Bernstein, 1994); "…predictive validity represents a very direct, simple, but limited issue in scientific generalization that concerns the extent to which one can generalize from scores on one variable to scores on another variable" (p. 99, Nunnally & Bernstein, 1994).

    14. How is it established? By establishing an empirical relation between predictor and criterion: the dreaded "validity coefficient." Concurrent: collect criterion data at the same time as test data. Predictive: administer the test (without using the results to make decisions) and, after a suitable period of time, collect criterion data; related topics include incremental validity and expectancy data. Postdictive: collect past criterion data.

    15. Validity Coefficient
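In practice the validity coefficient is simply the predictor-criterion correlation. A minimal sketch with hypothetical scores (statistics.correlation requires Python 3.10+):

```python
import statistics

# Hypothetical predictive design: a selection test given at hire,
# with supervisor performance ratings collected a year later.
test_scores = [52, 61, 47, 70, 58, 65, 49, 73, 55, 68]
job_ratings = [3.1, 3.8, 2.9, 4.2, 3.3, 4.0, 2.7, 4.5, 3.2, 4.1]

r = statistics.correlation(test_scores, job_ratings)
print(f"validity coefficient r = {r:.2f}")
```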

    16. The Criterion Problem. "Predictive validation accepts the criterion as a given, unlike construct validation" (p. 96, Nunnally & Bernstein, 1994). Complications: criterion reliability, criterion deficiency, criterion contamination, range restriction, and study attrition.
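Range restriction is one of these problems with a standard statistical remedy. Below is a sketch of Thorndike's Case II correction, assuming the unrestricted predictor SD is known (e.g., from applicant data); the figures are hypothetical.

```python
import math

def correct_range_restriction(r_restricted: float,
                              sd_unrestricted: float,
                              sd_restricted: float) -> float:
    """Thorndike Case II: estimate the validity coefficient in the
    unrestricted applicant pool from the restricted (incumbent) r."""
    u = sd_unrestricted / sd_restricted
    return (r_restricted * u) / math.sqrt(
        1 - r_restricted**2 + (r_restricted * u)**2)

# Observed r = .25 among hires; applicant SD is twice the incumbent SD.
print(round(correct_range_restriction(0.25, 10.0, 5.0), 2))  # ~0.46
```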

    17. The Criterion Problem

    18. Criterion-Related Validity. Incremental validity: the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
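A minimal sketch of incremental validity as a change in R² when a new predictor is added to a regression; the data and predictor names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
cognitive = rng.normal(size=n)    # predictor already in use
integrity = rng.normal(size=n)    # candidate additional predictor
performance = 0.5 * cognitive + 0.3 * integrity + rng.normal(size=n)

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    X = np.column_stack([np.ones(len(y)), X])   # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared(cognitive.reshape(-1, 1), performance)
r2_full = r_squared(np.column_stack([cognitive, integrity]), performance)
print(f"incremental validity (delta R^2) = {r2_full - r2_base:.3f}")
```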

    21. Criterion-Related Validity Expectancy data Expectancy table

    22. Criterion-Related Validity Expectancy table
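An expectancy table cross-tabulates predictor score bands against criterion success, so a user can read off the chance of success for a given score range. A sketch with simulated data and hypothetical cut points:

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(50, 10, 500)                        # test scores
latent = 0.4 * (scores - 50) / 10 + rng.normal(size=500)
success = latent > 0                                    # ~50% base rate

# One row per score band: percent of that band who succeeded.
bands = [(-np.inf, 40), (40, 50), (50, 60), (60, np.inf)]
for lo, hi in bands:
    in_band = (scores >= lo) & (scores < hi)
    pct = 100 * success[in_band].mean()
    print(f"score band [{lo}, {hi}): {pct:.0f}% successful")
```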

    23. Criterion-Related Validity. Taylor and Russell combine three quantities: the validity coefficient (the correlation between the test score and job performance), the base rate of success on the job given current selection measures, and the selection ratio (the number of people to be hired versus applicants available).

    24. Taylor-Russell Tables. See page 170, Table 6-3. Limitations: the predictor-criterion relationship must be linear, and a criterion score separating successful from unsuccessful performance must be identified. Related developments: the Naylor-Shine tables and test utility theory.
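The Taylor-Russell logic can be reproduced by simulation instead of table lookup: given a validity coefficient, a base rate, and a selection ratio, estimate the success rate among those selected. A Monte Carlo sketch with hypothetical parameters:

```python
import numpy as np

def expected_success_rate(validity: float, base_rate: float,
                          selection_ratio: float,
                          n: int = 1_000_000, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    test = rng.normal(size=n)
    # Criterion correlated r = validity with the test score.
    criterion = validity * test + np.sqrt(1 - validity**2) * rng.normal(size=n)
    success = criterion > np.quantile(criterion, 1 - base_rate)
    selected = test > np.quantile(test, 1 - selection_ratio)
    return success[selected].mean()

# r = .40, base rate .50, top 20% selected: about .73 of those hired
# succeed, versus .50 under random selection.
print(round(expected_success_rate(0.40, 0.50, 0.20), 2))
```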

    25. Decision Theory and Test Utility. Cronbach & Gleser (1965) presented: a classification of decision problems; various selection strategies ranging from single-stage processes to sequential analyses; a quantitative analysis of the relationship between test utility, the selection ratio, the cost of the testing program, and the expected value of the outcome; and a recommendation that in some instances job requirements be tailored to the applicant's ability instead of the other way around (adaptive treatment).
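A minimal sketch of the resulting utility arithmetic in the Brogden-Cronbach-Gleser tradition: the expected dollar gain over random selection depends on validity, the dollar SD of job performance, the average standardized test score of those hired, and testing costs. All figures below are hypothetical.

```python
def selection_utility(n_hired: int, tenure_years: float, validity: float,
                      sd_y_dollars: float, mean_z_selected: float,
                      cost_per_applicant: float, n_applicants: int) -> float:
    # Gain over random selection, minus the cost of testing everyone.
    gain = n_hired * tenure_years * validity * sd_y_dollars * mean_z_selected
    return gain - cost_per_applicant * n_applicants

# mean_z_selected ~ 1.40 corresponds to hiring the top 20% of applicants.
print(selection_utility(n_hired=10, tenure_years=2, validity=0.40,
                        sd_y_dollars=15_000, mean_z_selected=1.40,
                        cost_per_applicant=50, n_applicants=100))
# -> 163000.0 (an estimated $163,000 gain over random selection)
```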

    26. Decision Theory and Test Utility. Base rate: the extent to which a particular trait, behavior, characteristic, or attribute exists in the population. Hit rate: the proportion of people a test accurately identifies as possessing (or not possessing) the construct of interest. Miss rate: the proportion of people the test classifies inaccurately; misses come in two varieties, false positives and false negatives.
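These quantities fall out of a 2x2 decision table. A sketch with hypothetical counts:

```python
# Rows of the 2x2 table: test decision; columns: true status.
true_positive = 40   # test says "possesses construct"; person does (hit)
true_negative = 35   # test says "does not"; person does not (hit)
false_positive = 10  # test says "possesses"; person does not (miss)
false_negative = 15  # test says "does not"; person does (miss)

total = true_positive + true_negative + false_positive + false_negative
base_rate = (true_positive + false_negative) / total
hit_rate = (true_positive + true_negative) / total
miss_rate = (false_positive + false_negative) / total

print(f"base rate = {base_rate:.2f}")   # 0.55
print(f"hit rate  = {hit_rate:.2f}")    # 0.75
print(f"miss rate = {miss_rate:.2f}")   # 0.25
```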

    27. Construct Validity. Evidence that a variety of behaviors will correlate with one another in studies of individual differences and/or will be similarly affected by experimental manipulations. Construct validation is an obvious issue in scientific generalization. The measure must show expected patterns of relations with other variables (the nomological net). Would interpretations of test scores be similar if other measures of the construct were used? How trustworthy are score interpretations?

    28. 3 Major Aspects of Construct Validity. Specify the domain of observables related to the construct; test empirically the relations between the observables; perform individual-differences studies and/or experiments to determine the extent to which measures are consistent with a priori hypotheses.

    29. How is it established? Internal analysis of item or subtest relationships: item analysis, homogeneity, and the use of factor analysis (EFA/CFA; factor loadings and identifying factors). Also through predictive validation designs and group differences (known-groups method).
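A minimal sketch of the factor-analytic route using scikit-learn's FactorAnalysis on simulated item responses; the two-construct structure and all parameters are hypothetical.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
n = 500
f1, f2 = rng.normal(size=n), rng.normal(size=n)  # two latent constructs

# Six items: three indicators per construct, plus measurement error.
items = np.column_stack(
    [f1 + rng.normal(0, 0.5, n) for _ in range(3)] +
    [f2 + rng.normal(0, 0.5, n) for _ in range(3)])

fa = FactorAnalysis(n_components=2).fit(items)
# Items 0-2 and 3-5 should load on different factors (up to rotation),
# supporting the intended two-construct interpretation.
print(np.round(fa.components_, 2))
```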

    30. MTMMs

    31. How is it established? Correlations between measures: convergent & discriminant validity (MTMMs). Changes over time and after experimental intervention.
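A sketch of the MTMM logic with simulated data: the same trait measured by two methods should correlate strongly (convergent), while different traits should correlate weakly (discriminant). The traits, methods, and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
anxiety = rng.normal(size=n)
extraversion = rng.normal(size=n)

def measure(trait: np.ndarray, error_sd: float = 0.6) -> np.ndarray:
    # A fallible measurement of the trait by some method.
    return trait + rng.normal(0, error_sd, n)

anx_self, anx_peer = measure(anxiety), measure(anxiety)
ext_self = measure(extraversion)

convergent = np.corrcoef(anx_self, anx_peer)[0, 1]    # same trait
discriminant = np.corrcoef(anx_self, ext_self)[0, 1]  # different traits
print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```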

    32. Validity and Test Bias. The definition of "test bias": a factor inherent in a test that systematically prevents accurate, impartial measurement. The concern is systematic, not random, variation.

    33. Test Bias. Eye color example. Three characteristics of regression lines: the slope, the intercept, and the error of the estimate.

    34. Slope Bias

    35. Intercept Bias

    36. Error of the Estimate

    37. What does this tell us?
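Slides 34-37 illustrate these patterns graphically. Numerically, the check amounts to fitting the test-criterion regression separately in two groups and comparing the fitted lines. A sketch with simulated data that has intercept bias built in:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_group_line(n: int, intercept: float) -> tuple[float, float]:
    x = rng.normal(50, 10, n)                          # test scores
    y = intercept + 0.05 * x + rng.normal(0, 0.4, n)   # criterion
    slope, b0 = np.polyfit(x, y, 1)                    # fitted line
    return slope, b0

slope_a, int_a = fit_group_line(300, intercept=1.0)
slope_b, int_b = fit_group_line(300, intercept=0.5)   # same slope, lower line

print(f"group A: slope = {slope_a:.3f}, intercept = {int_a:.2f}")
print(f"group B: slope = {slope_b:.3f}, intercept = {int_b:.2f}")
# Similar slopes with different intercepts indicate intercept bias:
# a single pooled regression line would systematically over-predict
# the criterion for one group and under-predict it for the other.
```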

    38. Bias. Design of the research study: validation samples often include fewer minority participants.

    39. Validity and Test Bias. Rating error: a judgment resulting from the intentional or unintentional misuse of a rating scale (leniency, severity, central tendency). Rankings: a procedure that requires the rater to measure individuals against one another.

    40. Validity and Test Bias. Rating error: the halo effect, a tendency to give a particular ratee a 'different' rating than he/she objectively deserves because of the rater's failure to discriminate among conceptually distinct and potentially independent aspects of the ratee's behavior.

    41. Halo Effect

    42. Validity and Test Bias. Test bias: statistical considerations. Test fairness: the extent to which a test is used in an impartial, just, and equitable way.

    43. Validity and Test Bias. Common misunderstandings regarding fairness: that a test is unfair because it 'discriminates' among individuals (discriminating among individuals is what tests are designed to do); that a test is unfair if a particular sample group was not included in the validation process; and that any finding of bias renders a test unfair.

    44. Guion's View. According to Guion, construct validity is really the only kind of validity. The logic of construct validity is evident throughout: a measure must be reliable, characterized by good stimuli, and of interest, and it must show expected patterns of relations with other constructs. Guion (1998) poses nine questions to ask in evaluating tests.

    45. Questions in Test Development. 1. "Did the developer of the procedure have a clear idea of the attribute to be measured?" What are the boundaries of the attribute? Which behaviors exhibit the attribute and which do not? Which variables should it correlate with and which should it not? 2. "Are the mechanics of measurement consistent with the concept?" Consider the appropriateness of the presentation medium, the rules of standardization (e.g., time limits), and the response requirements.

    46. 3. "Is the stimulus content appropriate?" Requirements for content sampling alone to justify test use: the content must be behavior with a generally accepted meaning; the domain must be defined unambiguously; the domain must be directly relevant to the measurement purpose (sample vs. sign, à la Wernimont & Campbell, 1968); qualified judges must agree that the domain was properly sampled; and responses must be scored & evaluated reliably. 4. "Was the test carefully and skillfully developed?" "…I look for evidence that the plan was carried out well. The evidence depends on the plan." Were pilot tests used? Was development based on appropriate item analysis?

    47. Evidence Based on Reliability. 5. "Is the internal statistical evidence satisfactory?" Internal consistency; internal completeness (or relevance); in CTT, discrimination & difficulty indices. 6. "Are scores stable over time and consistent with alternative measures?" Test-retest, alternate forms, and interrater agreement, depending on the purpose/use of the test. Bear in mind that "…consistency may be due to consistent error."

    48. Evidence from Patterns of Correlates. 7. "Does empirical evidence confirm logically expected relationships with other variables?" Failure to support hypotheses casts doubt on the validity of the inference or on the conceptual & operational definitions of the attribute. 8. "Does empirical evidence disconfirm alternative meanings of test scores?" Cronbach's "strong program" of construct validation: provide an explicit theory of the attribute; identify & evaluate plausible rival inferences.

    49. Evidence Based on Outcomes 9. “Are the consequences of test use consistent with the meaning of the construct being measured?”

    50. Messick's View. Validity as a unified concept: validity cannot rely on any one form of evidence; validity does not require any one form of evidence; validity applies to all assessments; validity judgments are value judgments. Expose the values underlying tests by pitting the pros and cons of test use against alternatives. "…unintended consequences, when they occur, are also strands in the construct's nomological network that need to be taken into account in construct theory, score interpretation, and test use" (p. 744).

    51. Messick's Views (continued). Two major threats to construct validity: construct underrepresentation and construct-irrelevant variance (construct-irrelevant difficulty; construct-irrelevant easiness). "…any negative impact on individuals or groups should not derive from any source of test invalidity, such as construct underrepresentation or construct-irrelevant variance" (p. 746).

    52. Messick's Views (continued). Six aspects of construct validity: content (representativeness); substantive (theoretical rationale & findings); structural (the link between the scoring structure and the structure of the domain); generalizability (of interpretations across groups, settings, and tasks); external (convergent & discriminant evidence); and consequential (the value implications of score interpretations as a basis for action, and the actual & potential consequences of test use).
