Measuring Psychological Concepts: Understanding Variables and Validity

Chapter 5: Measurement and Sampling

Psychological Concepts • Measuring complex concepts • Psychological concepts are often abstract • We create operational definitions to measure these complex, abstract concepts • Our operational definitions have to make sense for the research questions we want to answer • Operational definitions • Our measurements represent the concepts that we cannot observe directly

Defining and Measuring Variables • Operational definition—A working definition of a concept that is based on how we measure it • Variable—An element that, when measured, can take on different values (e.g., intelligence test scores) • Hypothetical construct—A concept that helps us understand behavior but that is not directly observable

Defining and Measuring Variables • Example: One measure of stress? • Score on the Social Readjustment Rating Scale • 43 items on the scale • Each life event is worth a different number of stress units; more points reflect greater stress (and a greater likelihood of illness) • Death of a spouse: 119 points • Jail term: 79 points • Change in schools: 35 points • Christmas: 30 points • Add the points for each of the 43 events a person has experienced to get a stress score • The score on the scale represents the underlying hypothetical construct of stress
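Scoring the scale amounts to summing the stress units for the life events a person reports. The sketch below is a minimal illustration. It assumes a hypothetical dictionary holding only the four example events from this slide; the full scale has 43 items. The function name `srrs_score` is chosen for illustration only.

```python
# Hypothetical subset of the Social Readjustment Rating Scale:
# only the four example events from the slide; the full scale has 43 items.
STRESS_UNITS = {
    "death of a spouse": 119,
    "jail term": 79,
    "change in schools": 35,
    "christmas": 30,
}


def srrs_score(events_experienced):
    """Sum the stress units for each listed event the person experienced.

    Event names are matched case-insensitively. An event that is not in this
    partial table raises a KeyError instead of being silently ignored.
    """
    return sum(STRESS_UNITS[event.lower()] for event in events_experienced)


# A person who changed schools and celebrated Christmas this year.
print(srrs_score(["Change in schools", "Christmas"]))  # 65
```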

Defining and Measuring Variables • The importance of culture and context in defining variables • The Social Readjustment Rating Scale may not be good for some populations, such as young college students • A different scale includes 51 items, such as roommate problems, maintaining a steady dating relationship, and attending a football game • It is critical to understand the people you are measuring

Multiple Possible Operational Definitions • You can measure constructs in a variety of ways, depending on your research. • Considering stress: • Physiological measurements (cortisol level in the bloodstream) • Questionnaire scores • You choose your operational definition depending on the nature of your research question and on practical issues

Probability Sampling • Probability sampling—Set of sampling methods in which every person in the population has a specified probability of being selected • Generalization—Applying results of research to an entire population • Probability sampling permits researchers to generalize from their sample to the population because the means of selection does not bias toward including or excluding people, so the sample is likely to represent the entire population
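Simple random sampling is the most basic probability method: every member of the population has the same chance of being selected. The sketch below is a minimal illustration. It assumes the population is available as a list (a hypothetical sampling frame of ID numbers) and uses Python's `random.sample`, which draws without replacement. The function name is hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n people without replacement; each has probability n / len(population)."""
    rng = random.Random(seed)  # seeded generator so a draw can be reproduced
    return rng.sample(population, n)


# Hypothetical sampling frame of 500 ID numbers.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 25, seed=42)
print(len(sample), len(set(sample)))  # 25 25
```

Because the selection rule ignores every characteristic of the people in the frame, no subgroup is favored or excluded. That property is what justifies generalizing from the sample.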


Nonprobability Sampling • Nonprobability sampling—Sampling that relies on groups of people who are convenient or available to participate • Nonsampling error—Problem with nonprobability sampling in which some members of the population are systematically excluded from participation • Problem with nonprobability sampling—It is not clear to whom the results will generalize because the sample is idiosyncratic
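Made-up numbers make the risk of systematic exclusion easy to see. The sketch assumes a hypothetical population in which the people who are easy to reach differ from everyone else. A convenience sample drawn only from the reachable group misestimates the population mean, and adding more reachable people does not fix the problem.

```python
from statistics import mean

# Made-up stress scores for illustration only.
reachable = [40, 45, 50, 55, 60]      # e.g., students in the researcher's own class
unreachable = [20, 25, 30, 35, 40]    # people who are never asked or never volunteer
population = reachable + unreachable

population_mean = mean(population)    # 40
convenience_mean = mean(reachable)    # 50: the unreachable group had no chance of selection

print(population_mean, convenience_mean)
```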


Making Useful Measurements • Reliability—A characteristic of data related to the consistency of the measurement • Validity—A characteristic of data related to how useful a measurement is for the intended purpose • Measurement error—An error in measurement due to poor measuring instruments or human error, which can lead to poor conclusions

Making Useful Measurements • The relation between reliability and validity • Reliability simply means consistency but does not indicate how useful the measurements are • The validity level of measurements is limited by how reliable they are (e.g., low reliability guarantees low validity) • If measurements are reliable, they might be valid • If measurements are not reliable, they cannot be valid • If measurements are valid, they must be reliable
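Classical test theory, which these slides do not cover, offers one way to make "validity is limited by reliability" concrete. Under that theory, the correlation between two measures cannot exceed the square root of the product of their reliabilities. The sketch below applies that bound to hypothetical reliability values. The function name is illustrative.

```python
import math


def max_validity(reliability_x, reliability_y=1.0):
    """Upper bound on the correlation between measures x and y under
    classical test theory: sqrt(r_xx * r_yy).

    reliability_y defaults to 1.0, i.e., a perfectly reliable criterion.
    """
    for r in (reliability_x, reliability_y):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie between 0 and 1")
    return math.sqrt(reliability_x * reliability_y)


print(max_validity(0.5, 0.5))   # 0.5
print(max_validity(0.25))       # 0.5
print(max_validity(0.0, 0.9))   # 0.0: a wholly unreliable measure cannot correlate with anything
```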


Considering Validity in Research

Construct Validity • Construct validity: Is our measurement appropriate for what we are trying to measure? • The Beck Depression Inventory has acceptable construct validity for people from many cultures • Mexican • Portuguese • Arabic • American

Construct Validity • The Beck Depression Inventory does not have good construct validity for • Alzheimer’s patients • Seriously depressed patients • Some people with chronic disease

Construct Validity • The Beck Depression Inventory (like all measurements) may have good construct validity in some situations but not in all.

Convergent Validity • Measurements that correlate when they should correlate • Sometimes they should correlate positively • Sometimes they should correlate negatively

Divergent Validity • Measurements do not correlate (either positively or negatively) when there is no reason to expect that they should.

Internal Validity • Internal validity: Can you identify the most likely cause of a behavior and rule out alternative explanations? • Random assignment of participants to experimental groups increases the likelihood that internal validity will be high • Random assignment is useful in permitting researchers to draw cause-and-effect conclusions

How to Randomly Assign 1. Go through a random number table and write down the numbers from 1 to N (your sample size) in the order in which they occur in the table, skipping numbers larger than N and any number you have already written down. 2. Pair each person with the numbers in the order you wrote them down. 3. Put each person paired with an odd number into Group 1 and each person paired with an even number into Group 2.
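The three steps translate directly into code. In this sketch a seeded pseudorandom ordering of the numbers 1 to N stands in for reading a printed random number table; that substitution is an assumption of the sketch. The function `randomly_assign` and the participant labels are hypothetical.

```python
import random


def randomly_assign(participants, seed=None):
    """Split participants into two groups using the three-step random number procedure."""
    n = len(participants)
    rng = random.Random(seed)
    # Step 1: the numbers 1..N in random order (stands in for a random number table).
    numbers = rng.sample(range(1, n + 1), n)
    # Step 2: pair each person with the next number in that order.
    pairs = zip(participants, numbers)
    # Step 3: odd numbers go to Group 1, even numbers to Group 2.
    group_1, group_2 = [], []
    for person, number in pairs:
        (group_1 if number % 2 == 1 else group_2).append(person)
    return group_1, group_2


people = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
g1, g2 = randomly_assign(people, seed=7)
print(len(g1), len(g2))  # 4 4
```

With an odd N, Group 1 receives one extra person, because the numbers 1 to N include one more odd number than even numbers.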

External Validity • Do your results and measurements generalize to • Other settings? • Other people? • Other times?

Statistical Conclusion Validity • Are your statistics appropriate to answer your research question? • Scales of measurement • Some researchers place importance on scales of measurement in choosing statistical tests • Some researchers think that scales of measurement are generally unimportant • There are controversies regarding the value of null hypothesis significance testing

The SAT: Questions of Reliability and Validity • The SAT is fairly reliable: people taking the SAT on different occasions generally score about the same each time • But is it valid? Does your SAT score predict your grades in college? • In general, there is a reasonably high (but not perfect) correlation between SAT scores and college grades (r > .50), suggesting that the SAT shows a degree of construct validity
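A common way to read a correlation like this, not stated on the slide, is to square it. The squared correlation is the proportion of variance the two measures share. A minimal sketch with a hypothetical helper name:

```python
def shared_variance(r):
    """Proportion of variance two measures share, given Pearson's r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r ** 2


# At r = .50, SAT scores and grades share 25% of their variance,
# leaving 75% of the variation in grades unaccounted for by SAT scores.
print(shared_variance(0.50))  # 0.25
```

This is one reason a correlation above .50 can be "reasonably high" and still far from perfect.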

Controversy: The Head Start Program • How effective is the Head Start Program? • How do you operationally define “effective”? • Gains in IQ scores by children in Head Start are not long lasting, so by this measure, Head Start is not effective • Head Start children show higher grades and graduation rates than children who did not participate in Head Start, so by this measure, Head Start is effective

Controversy: The Head Start Program • Most, but not all, studies show long-term gains due to Head Start • Complex questions like this are very difficult to answer • To reach sound conclusions, we have to evaluate how adequate (i.e., valid) the measurements are, weigh all the evidence, and think critically about the issue

Scales of Measurement

Scales of Measurement • Some researchers believe that most statistical approaches in psychology require interval or ratio data • There is sometimes controversy about what scale applies to a given measurement.
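The traditional position, usually credited to S. S. Stevens, ties each scale to the statistics considered meaningful for it. As the slides note, researchers disagree about how strictly to apply it. The sketch below encodes that traditional view only. The names `Scale`, `MINIMUM_SCALE`, and `permissible` are hypothetical.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Each statistic is listed with the lowest scale at which the traditional
# view treats it as meaningful; higher scales inherit everything below them.
MINIMUM_SCALE = {
    "mode": Scale.NOMINAL,
    "median": Scale.ORDINAL,
    "mean": Scale.INTERVAL,
    "standard deviation": Scale.INTERVAL,
    "ratio of two scores": Scale.RATIO,
}


def permissible(statistic, scale):
    """True if the traditional view allows this statistic on data at this scale."""
    return scale >= MINIMUM_SCALE[statistic]


print(permissible("mean", Scale.ORDINAL))   # False
print(permissible("median", Scale.RATIO))   # True
```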

Scales of Measurement • Is an IQ score nominal, ordinal, interval, or ratio? • IQ is not simply a set of categories, so it is not nominal • IQ scores let you say, “Person A has a higher score than Person B,” so IQ scores are at least ordinal. • The differences between scores of 80 and 90 and between 130 and 140 are numerically equal, so an IQ score appears to be at least interval; but psychologically, is the difference in functioning between people with scores of 80 and 90 equal to the difference between people with scores of 130 and 140? If not, is the scale really interval with respect to psychological differences? • If somebody has a score 10% higher than another person, it does not mean that the person is 10% smarter, so the scale may not be ratio.
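The point about ratios can be checked numerically. IQ scores are conventionally scaled to a mean of 100 and a standard deviation of 15. Re-expressing them as z-scores is therefore an equally legitimate linear rescaling of the same information. The sketch shows that equal differences stay equal under that rescaling while ratios of scores change. The function name `to_z` is hypothetical.

```python
import math


def to_z(iq, mean=100.0, sd=15.0):
    """Re-express an IQ score as a z-score: a linear rescaling of the same information."""
    return (iq - mean) / sd


# Equal differences in IQ points stay equal after rescaling...
low_gap = to_z(90) - to_z(80)
high_gap = to_z(140) - to_z(130)
print(math.isclose(low_gap, high_gap))    # True

# ...but ratios do not: 132 is 10% higher than 120 in IQ points,
# yet 60% higher in z units, so "10% smarter" has no fixed meaning.
print(round(132 / 120, 2))                # 1.1
print(round(to_z(132) / to_z(120), 2))    # 1.6
```

A ratio statement survives rescaling only when the scale has a true zero. IQ has no such zero, which is why it is not treated as a ratio scale.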

Is an IQ score nominal, ordinal, interval, or ratio? • Some people have argued that IQ scores are really ordinal. • Others argue that IQ scores are interval. • In spite of the theoretical controversy, researchers generally treat measures like IQ scores as interval for purposes of data analysis.
