Estimating Interaction Effects Using Multiple Regression

Herman Aguinis, Ph.D.
Mehalchin Term Professor of Management
The Business School, University of Colorado at Denver
www.cudenver.edu/~haguinis

Overview
• What is an Interaction Effect?
• The “So What” Question: Importance of Interaction Effects for Theory and Practice
• Estimating Interaction Effects Using Moderated Multiple Regression (MMR)
• Problems with MMR
• Aguinis, Beaty, Boik, & Pierce (2005, J. of Applied Psychology)
• The “Now What” Question: Addressing Problems with MMR
• Some Conclusions

What is an Interaction Effect?
• The relationship between X and Y depends on Z (i.e., a moderator)
[Figure: X → Y, and the same X → Y relationship moderated by Z]
• Other terms used:
  • Population control variable (Gaylord & Carroll, 1948)
  • Subgrouping variable (Frederiksen & Melville, 1954)
  • Predictability variable (Ghiselli, 1956)
  • Referent variable (Toops, 1959)
  • Modifier variable (Grooms & Endler, 1960)
  • Homologizer variable (Johnson, 1966)

Importance of Interaction Effects: Theory
• Going beyond main effects
• We typically say “it depends”
• More complex models
• “If we want to know how well we are doing in the biological, psychological, and social sciences, an index that will serve us well is how far we have advanced in our understanding of the moderator variables of our field” (Hall & Rosenthal, 1991, p. 447)

Importance of Interaction Effects: Practice
For example, personnel selection:
• Test bias: the relationship between a test and a criterion depends on gender or ethnicity
• “No bias exists if the regression equations relating the test and the criterion are indistinguishable for the groups in question” (Standards, 1999, p. 79)
• In other words, test bias exists when the X-Y relationship differs depending on the value of Z (e.g., 1 = Female, 0 = Male)

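Illustration of Gender as a Moderator in Personnel Selection
[Figure: job performance (Y) plotted against test scores (X), with separate regression lines for women (Ŷwomen) and men (Ŷmen) and a common line (Ŷcommon) for both groups combined]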

Importance of Interaction Effects: Practice
• Management in general
• Does an intervention work equally well for, say, Cantonese and American employees working in Hong Kong? (categorical moderator)
• Example: a performance management system for teaching at a university in Hong Kong. Would the same evaluation methods lead to the same level of employee (i.e., faculty) satisfaction regardless of faculty members' national origin?

Estimating Interaction Effects
• Moderated Multiple Regression (MMR)
• $\hat{Y} = a + b_1X + b_2Z + b_3X \cdot Z$, where
  • Y = criterion (continuous variable)
  • X = predictor (typically continuous)
  • Z = moderator (continuous or categorical)
  • X·Z = product term carrying information about the moderating effect (i.e., the interaction between X and Z)

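Statistical Significance Test
• Step 1: $\hat{Y} = a + b_1X + b_2Z$
• Step 2: $\hat{Y} = a + b_1X + b_2Z + b_3X \cdot Z$
• $H_0: \psi_1 = \psi_2$, where $\psi_1$ and $\psi_2$ are the population squared multiple correlations of the Step 1 and Step 2 equations (tested with an F test of the change in $R^2$)
• Equivalently, $H_0: \beta_3 = 0$ (using a t-statistic)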
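A minimal Python/NumPy sketch of the two-step test, assuming ordinary least squares and simulated data; `mmr_test` and `_ols` are hypothetical helper names, not functions from any program cited in this presentation. Because only one term (X·Z) is added in Step 2, the F test for the change in $R^2$ and the t test for $b_3$ are equivalent (F = t²), which the sketch makes easy to confirm.

```python
import numpy as np


def _ols(design: np.ndarray, y: np.ndarray):
    """Ordinary least squares: returns coefficients, R^2, and residuals."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2, resid


def mmr_test(x: np.ndarray, z: np.ndarray, y: np.ndarray) -> dict:
    """Hypothetical helper: compare the Step 1 equation with the Step 2 equation."""
    n = len(y)
    ones = np.ones(n)
    step1 = np.column_stack([ones, x, z])          # Y-hat = a + b1 X + b2 Z
    step2 = np.column_stack([ones, x, z, x * z])   # ... + b3 X*Z
    _, r2_1, _ = _ols(step1, y)
    coef2, r2_2, resid2 = _ols(step2, y)

    df_resid = n - step2.shape[1]                  # N - 3 predictors - 1
    # F test for the change in R^2 (1 numerator df: one product term added)
    f_change = (r2_2 - r2_1) / ((1.0 - r2_2) / df_resid)

    # t test for b3 from the usual OLS covariance matrix of the coefficients
    mse = (resid2 @ resid2) / df_resid
    cov = mse * np.linalg.inv(step2.T @ step2)
    t_b3 = coef2[3] / np.sqrt(cov[3, 3])

    return {"R2_step1": r2_1, "R2_step2": r2_2, "b3": coef2[3],
            "F_change": f_change, "t_b3": t_b3, "df_resid": df_resid}


if __name__ == "__main__":
    # Illustrative simulated data (not from any study cited here)
    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    z = rng.integers(0, 2, size=200).astype(float)  # dummy-coded moderator
    y = 0.5 * x + 0.2 * z + 0.3 * x * z + rng.normal(size=200)
    print(mmr_test(x, z, y))
```

`np.linalg.lstsq` is used for the coefficients for numerical stability; the explicit inverse is used only to read off the standard error of $b_3$.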

Estimating Interaction Effects Using Moderated Multiple Regression (MMR)
• For example:
  • Personnel selection: Y = measure of performance, X = test score, Z = gender
  • Additional research areas: training, turnover, performance appraisal, return on investment, mentoring, self-efficacy, job satisfaction, organizational commitment, and career development, among others

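Interpreting Interactions (Z is continuous)
• $\hat{Y} = a + b_1X + b_2Z + b_3X \cdot Z$
• $b_3 = 2$ means that a one-unit increase in X (Z) increases the slope of Y on Z (Y on X) by 2 units, because the slope of Y on X is $b_1 + b_3Z$ and the slope of Y on Z is $b_2 + b_3X$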
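The interpretation of $b_3$ follows from rearranging the equation so that the slope of Y on X is $b_1 + b_3Z$ (and the slope of Y on Z is $b_2 + b_3X$). A tiny sketch with hypothetical coefficients ($b_1 = 0.5$, $b_2 = 1$, $b_3 = 2$) shows each one-unit step in Z raising the slope of Y on X by 2 (0.5, 2.5, and 4.5 at Z = 0, 1, and 2); the function names are illustrative only.

```python
def slope_of_y_on_x(b1: float, b3: float, z: float) -> float:
    """Simple slope of Y on X at a chosen value of Z: b1 + b3*Z."""
    return b1 + b3 * z


def slope_of_y_on_z(b2: float, b3: float, x: float) -> float:
    """Simple slope of Y on Z at a chosen value of X: b2 + b3*X."""
    return b2 + b3 * x


if __name__ == "__main__":
    # Hypothetical coefficients, not estimates from any study
    b1, b2, b3 = 0.5, 1.0, 2.0
    for z in (0.0, 1.0, 2.0):
        print(f"Z = {z}: slope of Y on X = {slope_of_y_on_x(b1, b3, z)}")
```

In practice these simple slopes are often reported at substantively meaningful values of Z, such as its mean and one standard deviation above and below it.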

Interpreting Interactions (Z is binary, dummy coded)
• $\hat{Y} = a + b_1X + b_2Z + b_3X \cdot Z$
• $b_3$ = estimated difference between the slope of Y on X for the group coded as 1 and the slope of Y on X for the group coded as 0
• $b_2$ = estimated difference in Y between a member of the group coded as 1 and a member of the group coded as 0, assuming the scores on X are 0 (i.e., the difference between the two groups' intercepts)
• $b_1$ = estimated slope of Y on X for members of the group coded as 0
• $a$ = estimated Y score for members of the group coded as 0 when X = 0 (i.e., that group's intercept)
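One way to check these interpretations is to fit the dummy-coded equation alongside separate regressions within each group: group 0's intercept and slope should equal $a$ and $b_1$, and group 1's should equal $a + b_2$ and $b_1 + b_3$. A minimal NumPy sketch on simulated data; the coefficient values and the helper names `ols` and `mmr_vs_subgroups` are hypothetical.

```python
import numpy as np


def ols(design: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for a design matrix that includes an intercept column."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def mmr_vs_subgroups(x, z, y):
    """Fit the dummy-coded MMR equation and separate within-group regressions."""
    n = len(y)
    a, b1, b2, b3 = ols(np.column_stack([np.ones(n), x, z, x * z]), y)
    within = {}
    for group in (0.0, 1.0):
        m = z == group
        # (intercept, slope) of Y on X inside this group only
        within[group] = ols(np.column_stack([np.ones(m.sum()), x[m]]), y[m])
    return (a, b1, b2, b3), within


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n = 300
    x = rng.normal(50, 10, size=n)                   # hypothetical test scores
    z = (rng.random(n) < 0.5).astype(float)          # 1 = group coded 1, 0 = group coded 0
    y = 10 + 0.4 * x + 5 * z + 0.3 * x * z + rng.normal(0, 5, size=n)
    (a, b1, b2, b3), within = mmr_vs_subgroups(x, z, y)
    print("group 0 intercept/slope:", within[0.0], "vs a, b1:", (a, b1))
    print("group 1 intercept/slope:", within[1.0], "vs a+b2, b1+b3:", (a + b2, b1 + b3))
```

The match is exact (up to rounding) because the dummy-coded model with a product term gives each group its own intercept and slope.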

Pervasive Use of MMR in the Organizational Sciences
• Recent review: MMR was used in over 600 attempts to detect moderating effects of categorical variables in AMJ, JAP, and PP between 1977 and 1998 (Aguinis, Beaty, Boik, & Pierce, 2005, JAP)

Selected Research on MMR
• Aguinis (2004, Regression Analysis for Categorical Moderators, Guilford Press)
• Aguinis, Beaty, Boik, and Pierce (2005, J. of Applied Psychology)
• Aguinis, Boik, and Pierce (2001, Organizational Research Methods)
• Aguinis, Petersen, and Pierce (1999, Organizational Research Methods)
• Aguinis and Pierce (1998, Organizational Research Methods)
• Aguinis and Pierce (1998, Ed. & Psychological Measurement)
• Aguinis and Stone-Romero (1997, J. of Applied Psychology)
• Aguinis, Bommer, and Pierce (1996, Ed. & Psychological Measurement)
• Aguinis (1995, J. of Management)

Methodology: Monte Carlo Simulations
• Research question: Does MMR do a good job of estimating moderating effects?
• Difficulty: We don’t know the population
• Solution: Monte Carlo methodology
  • Create a population
  • Generate random samples
  • Perform MMR analyses on samples
  • Compare population versus samples
  • Assess % of hits and misses
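The Monte Carlo steps above can be sketched in Python. This is an illustrative version under stated assumptions, not the design of the studies cited: the population model (coefficients 0.3 and 0.2, standard normal X and errors), the helper names, and the use of a normal critical value in place of the exact t critical value are all simplifications. Each replication draws a sample from the known population, runs MMR, and records whether $H_0: \beta_3 = 0$ is rejected; when the true $b_3 \neq 0$ the rejection rate estimates power (% of hits), and when $b_3 = 0$ it estimates the Type I error rate.

```python
import numpy as np
from statistics import NormalDist


def t_for_product_term(x, z, y):
    """OLS t statistic for b3 in Y-hat = a + b1 X + b2 Z + b3 X*Z."""
    n = len(y)
    design = np.column_stack([np.ones(n), x, z, x * z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    mse = (resid @ resid) / (n - 4)
    se_b3 = np.sqrt(mse * np.linalg.inv(design.T @ design)[3, 3])
    return coef[3] / se_b3


def mmr_power(n1, n2, b3, reps=1000, alpha=0.05, seed=0):
    """Share of simulated samples in which H0: beta3 = 0 is rejected."""
    rng = np.random.default_rng(seed)
    # Large-sample normal approximation to the two-sided t critical value
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    z = np.concatenate([np.ones(n1), np.zeros(n2)])  # dummy-coded moderator, fixed n1 and n2
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n1 + n2)
        # Hypothetical population model: the true moderating effect is b3
        y = 0.3 * x + 0.2 * z + b3 * x * z + rng.normal(size=n1 + n2)
        if abs(t_for_product_term(x, z, y)) > crit:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for b3 in (0.0, 0.2, 0.4):
        print(f"b3 = {b3}: estimated rejection rate = {mmr_power(100, 100, b3, reps=500)}")
```

Because `n1` and `n2` are separate arguments, the same loop can be used to explore unequal subgroup sizes.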

Problems with MMR
• We don’t find moderators
• If we find them, they are small
Why should we care?
• Theory: Failure to find support for correct hypotheses (derailment of the theory advancement process; model misspecification)
• Practice: Erroneous decision making (e.g., over- and underprediction of performance, implementation of ineffective interventions)
• Ethical implications
• Legal implications

Some Culprits for Erroneous Estimation of Moderating Effects
• Small total sample size
• Unequal sample sizes across moderator-based subgroups
• Range restriction (i.e., truncation) in predictor variable X
• Scale coarseness
• Violation of the homogeneity of error variance assumption
• Unreliability of measurement
• Artificial dichotomization/polychotomization of continuous variables
• Interactive effects among these artifacts

Unequal Sample Size Across Moderator-based Subgroups
• Applies to categorical moderators (e.g., gender, national origin)
• In many research situations, $n_1 \neq n_2$
• Two studies examined this issue (Aguinis & Stone-Romero, 1997; Stone-Romero, Alliger, & Aguinis, 1994) (see also Aguinis, 1995)
• Conclusion: $n_1$ needs to be $.3\,n_2$ or larger to detect medium moderating effects

Truncation in Predictor X
• Non-random sampling
• Pervasive in field settings (systematic in personnel selection/test validation research, where only [X, Y] | X > x is observed)
• Aguinis and Stone-Romero (1997) (categorical moderator); McClelland and Judd (1993) (continuous moderator)
• Truncation has a dramatic impact on power
  • N = 300, medium moderating effect: power = .81
  • Same conditions with truncation = .80: power = .51
• Conclusion: Even mild levels of truncation can have a substantial detrimental effect on power

Violation of Homogeneity of Error Variance Assumption
• Applies to categorical moderators
• Error variance: the variance in Y that remains after predicting Y from X
• The assumption is that this error variance is equal across subgroups (e.g., women, men)
• Distinct from the homoscedasticity assumption, which concerns equal residual variance of Y across values of X

Regression of Homoscedastic Data
[Figure: regression for the total sample, women and men combined]

Regression for Subgroups
[Figure: separate regressions for women and men]

Artificial Polychotomization of Continuous Variables
• Median splits and other common methods for “simplifying the data” before conducting ANOVAs
• Cohen (1983) showed this practice is inappropriate
• In the context of MMR, some researchers have used a median split on continuous predictor Z and compared correlations across the resulting groups
• MMR always performs better than comparing artificially created subgroups (Stone-Romero & Anderson, 1994)
• Conclusion: Do not polychotomize truly continuous predictors

Interactions Among Artifacts
• Concurrent manipulation of truncation, N, $n_1$ and $n_2$, and moderating effect magnitude (Aguinis & Stone-Romero, JAP, 1997)
• Results: Methodological artifacts have interactive effects on power
• Even if conditions are favorable to high power with respect to one factor (e.g., N), unfavorable conditions with respect to other factors (e.g., truncation) will lead to low power
• Conclusion: Relying on a single strategy (e.g., increasing N) to improve power will not be successful if other methodological and statistical artifacts are not also addressed

Aguinis, Beaty, Boik, & Pierce (2005, JAP)
• Q1: What is the size of observed moderating effects of categorical variables in published research?
• Q2: What would the size of moderating effects of categorical variables be in published research under conditions of perfect reliability?
• Q3: What is the a priori power of MMR to detect moderating effects of categorical variables in published research?
• Q4: Do MMR tests reported in published research have sufficient statistical power to detect moderating effects conventionally defined as small, medium, and large?

Method
• Review of all articles published from 1969 to 1998 in Academy of Management Journal (AMJ), Journal of Applied Psychology (JAP), and Personnel Psychology (PP)
• Criteria for study inclusion:
  • At least one MMR analysis
  • The MMR analysis included a continuous criterion Y, a continuous predictor X, and a categorical moderator Z

Effect Size and Power Computation
• Total of 636 MMR analyses
• Moderator sample sizes available for 507 (79.72%)
• Moderator group sample sizes and predictor-criterion rs available for 261 (41.04%)
• Effect size and power computations were based on the 261 MMR analyses for which ns and rs were available. We used SD information when available and assumed homogeneity of error variance when this information was not available.

Results (I) Frequency of MMR Use over Time:

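Q1: Size of Observed Effects (I)
• Effect size metric: $f^2 = (R_2^2 - R_1^2)/(1 - R_2^2)$, where $R_1^2$ and $R_2^2$ come from the equations without and with the product term
  • Median f² = .002
  • Mean (SD) = .009 (.025)
  • 95% CI = .0089 to .0091
  • 25th percentile = .0004
  • 75th percentile = .0053
• Effect size values over time: r(261) = .15, p < .05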
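Assuming $f^2$ is computed as the change in $R^2$ from adding X·Z, divided by the variance left unexplained by the Step 2 equation, the arithmetic is short. The sketch also inverts the formula to show how small the corresponding $\Delta R^2$ is; the function names and the Step 2 $R^2$ of .30 in the example are hypothetical.

```python
def f_squared(r2_step1: float, r2_step2: float) -> float:
    """f^2 for the product term: (R2_step2 - R2_step1) / (1 - R2_step2)."""
    if not 0.0 <= r2_step1 <= r2_step2 < 1.0:
        raise ValueError("expected 0 <= R2_step1 <= R2_step2 < 1")
    return (r2_step2 - r2_step1) / (1.0 - r2_step2)


def delta_r2_for(f2: float, r2_step2: float) -> float:
    """Change in R^2 implied by a target f^2, given the Step 2 R^2."""
    return f2 * (1.0 - r2_step2)


if __name__ == "__main__":
    r2_step2 = 0.30  # hypothetical Step 2 R^2
    for f2 in (0.002, 0.02):
        print(f"f^2 = {f2}: delta R^2 = {delta_r2_for(f2, r2_step2):.4f}")
```

With a Step 2 $R^2$ of .30, an $f^2$ of .002 corresponds to $\Delta R^2 = .0014$, and the conventional small effect of .02 corresponds to $\Delta R^2 = .014$.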

Q1: Size of Observed Effects (II)
• Effect sizes by journal: F(2, 258) = 4.97, p = .008, η² = .04
• Tukey HSD tests: AMJ > JAP and PP > JAP

Q1: Size of Observed Effects (III)
• Effect sizes by type of moderator: F(2, 258) = 8.71, p < .001, η² = .06
• Tukey HSD tests: Other > Ethnicity

Q1: Size of Observed Effects (IV)
• t(259) = -.226, p = ns
• t(259) = -0.95, p = ns

Q2: Construct-level Effects (I)
• Median f² = .003
  • Increase of .001 over median observed effect size
• Mean (SD) = .017
  • Increase of .008 over mean observed effect size

Q3: Statistical Power (I)

Q3: Statistical Power (II)

Q4: Power to Detect Small, Medium, and Large Effects
• Small f² (.02): mean power = .84; 72% of tests would have a power of .80 or higher
• Medium f² (.15): mean power = .98
• Large f² (.35): mean power = 1.0

Some Conclusions
• We expected effect sizes to be small, but not this small (i.e., median of .002)
• Computation of construct-level effect sizes did not improve things by much (i.e., median of .003)
• More encouraging results:
  • None of the 95% CIs around the mean effect size for the various comparisons included zero
  • Effect sizes have increased over time
  • Given the observed sample sizes, mean power is sufficient to detect effects ≥ .02
  • 72% of studies had sufficient power to detect an effect ≥ .02

Some Implications
• Are theories in dozens of research domains incorrect in hypothesizing moderators?
• Are hundreds of researchers in dozens of disparate domains wrong, and are population moderating effects really this small?
• Could be, but... more likely, methodological artifacts decrease observed effect sizes substantially vis-à-vis their population counterparts
• More attention needs to be paid to design and analysis issues that decrease observed effect sizes
• Conventional definitions of effect size (f²) for moderators should probably be revised

The “Now What” Question
• Before data are collected:
  • Larger sample size*
  • More reliable measures*
  • Avoid truncated samples*
  • Use non-coarse scales (e.g., program by Aguinis, Bommer, & Pierce, 1996, Ed. & Psych. Measurement)
  • Equalize sample size across moderator-based subgroups
  • Use computer programs in the public domain to estimate the sample size needed for the desired power level
  • Gather information on research design trade-offs
* Easier said than done!

Tools to Improve Moderating Effect Estimation (Aguinis, 2004)
• Scale coarseness
  • Aguinis, Bommer, and Pierce (1996, Educational & Psychological Measurement)
• Homogeneity of error variance
  • Aguinis, Petersen, and Pierce (1999, Organizational Research Methods)
• Power estimation and research design trade-offs
  • Aguinis, Pierce, and Stone-Romero (1994, Educational & Psychological Measurement)
  • Aguinis and Pierce (1998, Educational & Psychological Measurement)
  • Aguinis, Boik, and Pierce (2001, Organizational Research Methods)

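Assessment of Assumption Compliance
• DeShon and Alexander’s (1996) 1.5 rule of thumb: heterogeneity is a concern when the error variance in one subgroup is about 1.5 times that in another subgroup, or more
• Bartlett’s homogeneity test (M is approximately χ²-distributed with k − 1 df):
$$M = \frac{\left(\sum_{j=1}^{k} v_j\right)\ln\left(\dfrac{\sum_{j=1}^{k} v_j s_j^2}{\sum_{j=1}^{k} v_j}\right) - \sum_{j=1}^{k} v_j \ln s_j^2}{1 + \dfrac{1}{3(k-1)}\left(\sum_{j=1}^{k} \dfrac{1}{v_j} - \dfrac{1}{\sum_{j=1}^{k} v_j}\right)}$$
  • k = number of sub-groups
  • $n_j$ = number of observations in each sub-group
  • $s_j^2$ = sub-group variance on the criterion
  • $v_j$ = degrees of freedom on which $s_j^2$ is based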
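A minimal sketch of both checks from subgroup descriptive statistics ($n_k$, $s_y$, $r_{xy}$). It assumes the variances being compared are error variances (the residual variance of Y after regressing it on X within each subgroup, with $v_k = n_k - 2$), since that is the assumption at issue; the function names are hypothetical and this is not ALTMMR's code. The p-value helper covers only the two-subgroup case (1 df), where the χ² upper-tail probability equals erfc(√(M/2)).

```python
import math


def error_variance(n: int, s_y: float, r_xy: float) -> float:
    """Subgroup error variance of Y after regressing Y on X: (n-1) s_y^2 (1 - r^2) / (n-2)."""
    return (n - 1) * s_y ** 2 * (1 - r_xy ** 2) / (n - 2)


def error_variance_ratio(variances) -> float:
    """Largest-to-smallest subgroup error variance (compare with the 1.5 rule of thumb)."""
    return max(variances) / min(variances)


def bartlett_m(ns, variances, params_per_group: int = 2) -> float:
    """Bartlett's M with v_k = n_k - 2 (intercept and slope estimated in each subgroup)."""
    v = [n - params_per_group for n in ns]
    v_total = sum(v)
    k = len(v)
    pooled = sum(vk * s2 for vk, s2 in zip(v, variances)) / v_total
    numerator = v_total * math.log(pooled) - sum(vk * math.log(s2) for vk, s2 in zip(v, variances))
    correction = 1 + (sum(1 / vk for vk in v) - 1 / v_total) / (3 * (k - 1))
    return numerator / correction


def chi2_sf_df1(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 df (two subgroups only)."""
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    ns, s_y, r = [100, 60], [10.0, 12.0], [0.5, 0.3]  # hypothetical subgroup descriptives
    ev = [error_variance(n, s, rr) for n, s, rr in zip(ns, s_y, r)]
    m = bartlett_m(ns, ev)
    print("error variances:", ev)
    print("ratio:", error_variance_ratio(ev), "M:", m, "p:", chi2_sf_df1(m))
```

In this hypothetical example the error variances are about 75.8 and 133.3, a ratio of roughly 1.76, which exceeds the 1.5 rule of thumb.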

Homogeneity is not Met... Now What?
• Use alternatives to MMR:
  • Alexander and colleagues' normalized-t approximation, or
  • James's second-order approximation
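The formulas for these alternatives did not survive in this text, so the following is only a sketch of the general idea behind Alexander's normalized-t approach as applied to regression slopes, under assumptions that should be checked against Aguinis (2004) before use. Each subgroup's slope is compared with a precision-weighted common slope; each resulting t (with $n_k - 2$ df) is converted to an approximately standard normal z using the normalizing transformation from Alexander and Govern's (1994) procedure; and $A = \sum z_k^2$ is referred to a χ² distribution with k − 1 df. James's second-order approximation is not sketched, and all function names are hypothetical.

```python
import math


def slope_and_se(n: int, s_x: float, s_y: float, r_xy: float):
    """Slope of Y on X and its standard error, from subgroup descriptives."""
    b = r_xy * s_y / s_x
    mse = (n - 1) * s_y ** 2 * (1 - r_xy ** 2) / (n - 2)   # subgroup error variance
    se = math.sqrt(mse / ((n - 1) * s_x ** 2))
    return b, se


def normalize_t(t: float, nu: float) -> float:
    """Assumed normalizing transformation of a t statistic with nu df (Alexander & Govern style)."""
    a = nu - 0.5
    b = 48 * a ** 2
    c = math.sqrt(a * math.log(1 + t ** 2 / nu))
    return (c + (c ** 3 + 3 * c) / b
            - (4 * c ** 7 + 33 * c ** 5 + 240 * c ** 3 + 855 * c)
            / (10 * b ** 2 + 8 * b * c ** 4 + 1000 * b))


def alexander_a(groups):
    """groups: list of (n_k, s_x, s_y, r_xy). Returns (A, df); A is approx. chi-square(df)."""
    fits = [slope_and_se(*g) for g in groups]
    precision = [1 / se ** 2 for _, se in fits]
    total = sum(precision)
    # Precision-weighted common slope across subgroups
    b_plus = sum(p / total * b for p, (b, _) in zip(precision, fits))
    a_stat = 0.0
    for (n, *_), (b, se) in zip(groups, fits):
        t = (b - b_plus) / se
        a_stat += normalize_t(t, n - 2) ** 2
    return a_stat, len(groups) - 1


if __name__ == "__main__":
    groups = [(100, 8.0, 10.0, 0.5), (60, 9.0, 12.0, 0.3)]  # hypothetical descriptives
    a_stat, df = alexander_a(groups)
    # Chi-square tail probability via erfc is valid only for df = 1 (two subgroups)
    p = math.erfc(math.sqrt(a_stat / 2)) if df == 1 else None
    print("A:", a_stat, "df:", df, "p:", p)
```

As a sanity check on the transformation, t = 2 with 10 df maps to z of about 1.79, which matches the normal deviate with the same two-tailed probability. With very large df it maps to nearly 2.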

Program ALTMMR
• Calculates:
  • Error variance ratio (largest-to-smallest ratio if there are more than 2 subgroups)
  • Bartlett’s M
  • James’s J
  • Alexander’s A
• Uses sample descriptive data: $n_k$, $s_x$, $s_y$, $r_{xy}$
• User sets the significance level, α = .05 or .01 (for all but James’s statistic)

Program ALTMMR
• Described in detail in Aguinis (2004)
• Available at www.cudenver.edu/~haguinis/ (click on the MMR icon on the left side of the page)
• Executable online or locally

Power Estimation
• Program POWER
  • Aguinis, Pierce, and Stone-Romero (1994, Ed. & Psych. Measurement)
• Program MMRPWR
  • Aguinis and Pierce (1998, Ed. & Psych. Measurement)
• Program MMRPOWER
  • Aguinis, Boik, and Pierce (2001, Organizational Research Methods)

Program MMRPOWER
• Problems/challenges regarding the POWER and MMRPWR programs:
  • Based on extrapolation from simulations: the range of values is limited
  • Absence of factors known to affect the power of MMR (e.g., unreliability)
• Theoretical approximation to power

Program MMRPOWER
• Described in detail in Aguinis (2004)
• Available at www.cudenver.edu/~haguinis/ (click on the MMR icon on the left side of the page)
• Executable online or locally

Some Conclusions
• Observed moderating effects are very small
• MMR is a low-power test for detecting effect sizes of the magnitude typically observed
• Researchers are not aware of problems with MMR
• Implications for theory and practice
• User-friendly programs are available that allow researchers to improve moderating effect estimation
• Using these tools will allow researchers to make more informed decisions regarding the operation of moderating effects
