Methods for Multilevel Analysis

XH Andrew Zhou, PhD, Professor, Department of Biostatistics, University of Washington

Examples of Multilevel (Hierarchical) Data • Individual-family-neighborhood • Students-classroom-school-district • Patient-provider-facility (e.g., the Ambulatory Care Quality Improvement Project, ACQUIP) • Other types, e.g., multiple outcomes nested within individuals

ACQUIP Alcohol Trial • A group-randomized trial • Intervention: feedback given to providers at each visit on the patient's general perceived health status, as well as the condition-specific perceived health status for 6 common conditions: chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), hypertension, depression, diabetes, and alcohol problems. • Outcome at 1-yr follow-up: (1) Self-reports of advice about alcohol from their provider; binary outcome.

Hierarchical Nature of Data • Patients – Providers – Facilities • Patient's characteristics, e.g., advice at baseline, co-morbidity • Provider's characteristics, e.g., panel size • Facility's characteristics, e.g., urban vs. rural.

Research Questions • Whether the intervention was significantly associated with patient self-reports of advice about alcohol from their providers one year after the intervention. • Independent effects of patient-level, provider-level, and facility-level factors. • Quantification of provider-to-provider and facility-to-facility variability, and the degree to which it can be explained by patient-level, provider-level, and facility-level factors.

Research Questions, Cont • Do facilities differ in expected outcomes after controlling for individual-level, provider-level, and facility-level factors? • Do providers differ in expected outcomes after controlling for individual-level, provider-level, and facility-level factors?

Multilevel (Hierarchical) Models A hierarchical model analysis will treat the sites and the providers as random effects and will parse out the amount of total variation in the outcome that is attributable to each level of hierarchy.

An example using a two-level linear model on schools • A study of the relationship between a single student-level predictor (say, socioeconomic status, SES) and a single student-level outcome (mathematics achievement) in J schools randomly drawn from the entire population of schools.

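The SES-Achievement relationship in one school • Our regression model would be

$$Y_i = \beta_0 + \beta_1 X_i + r_i, \qquad r_i \sim N(0, \sigma^2),$$

where $Y_i$ is the mathematics achievement and $X_i$ the SES of student $i$. • Figure 2.1 provides a scatterplot of this relationship.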

Centering in covariates • 0 is defined as the expected achivement of a student whose SES is zero. • It may be helpful to scale the independent variable, X, so that the intercept will be meaningful. • We center SES by subtracting the mean SES from each score. • Figure 2.2 shows the regression model with centering.

The SES-Achievement relationship in two schools • Figure 2.3 shows separate regression models for two schools.

The two lines indicate that School 1 and School 2 differ in two ways. • (1) School 1 has a higher mean than School 2 ($\beta_{01} > \beta_{02}$). • (2) SES is less predictive of achievement in School 1 than in School 2 ($\beta_{11} < \beta_{12}$). • If students had been randomly assigned to the two schools, we could say that School 1 is both more "effective" and more "equitable". • Of course, students are not assigned at random, so such interpretations of school effects are unwarranted without taking into account other differences in student composition.

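The SES-Achievement relationship in J schools (2-level Variance Component)

$$Y_{ij} = \beta_{0j} + \beta_{1j}(X_{ij} - \bar{X}_{\cdot j}) + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^2)$$

It is often sensible and convenient to assume that the intercept and slope have a bivariate normal distribution across the population of schools:

$$\begin{pmatrix} \beta_{0j} \\ \beta_{1j} \end{pmatrix} \sim N\left( \begin{pmatrix} \gamma_0 \\ \gamma_1 \end{pmatrix}, \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix} \right)$$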

Interpretation • 0: the average school mean for the population of schools • 00: the population variance among the school means • 1: the average SES-achievement slope for the population of schools • 11:the population variance among the slopes • 01: the population covariance between slopes and intercepts

Figure 2.4 provides a scatterplot of the relationship between $\beta_{0j}$ and $\beta_{1j}$ for a hypothetical sample of 200 schools. • There is more dispersion among means than among slopes ($\tau_{00} > \tau_{11}$). • The two effects tend to be negatively correlated ($\tau_{01} < 0$): schools with high average achievement, $\beta_{0j}$, tend to have a weak SES-achievement relationship, $\beta_{1j}$.

Modeling the second level • Having examined graphically how schools vary in their intercepts and slopes, we wish to develop a model to predict $\beta_{0j}$ and $\beta_{1j}$ from school characteristics. • Let $W_j$ be an indicator that takes the value one for Catholic schools and zero for public schools.

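Two-level Linear Model, Cont

$$\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$$

with $\operatorname{Var}(u_{0j}) = \tau_{00}$, $\operatorname{Var}(u_{1j}) = \tau_{11}$, and $\operatorname{Cov}(u_{0j}, u_{1j}) = \tau_{01}$.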

Interpretation • 00: the mean achievement for public schools • 01: the mean achievement difference between Catholic and public schools • 10: the average SES-achievement slope in public schools • 11: the mean difference in SES-achievement slope in between Catholic and public schools • u1j:the unique effect of school j on mean achievement holding Wj constant • u0j: the unique effect of school j on SES-achievement slope holding Wj constant

Estimation methods • It is not possible to estimate the parameters of these regression models directly because the outcomes ($\beta_{0j}$, $\beta_{1j}$) are not observed. • However, the data contain the information needed for this estimation.

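Estimation methods, cont • Combining the models from the two stages, we obtain

$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10}(X_{ij} - \bar{X}_{\cdot j}) + \gamma_{11} W_j (X_{ij} - \bar{X}_{\cdot j}) + u_{0j} + u_{1j}(X_{ij} - \bar{X}_{\cdot j}) + r_{ij}$$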
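A sketch that generates balanced data from the combined model, keeping the fixed part and the composite error $u_{0j} + u_{1j}(X_{ij} - \bar{X}_{\cdot j}) + r_{ij}$ separate. All parameter values, the 50/50 Catholic/public split, and the helper names are hypothetical. OLS on the fixed part still gives reasonable point estimates of the $\gamma$'s here, but its standard errors ignore the within-school dependence in the composite error.

```python
import numpy as np

def simulate_two_level(J, n, gammas, T, sigma2, seed=0):
    """Draw J schools with n students each from the combined two-level model."""
    g00, g01, g10, g11 = gammas
    rng = np.random.default_rng(seed)
    school = np.repeat(np.arange(J), n)                     # school index for each student
    W = rng.integers(0, 2, size=J)[school]                  # sector: 1 = Catholic (hypothetical split)
    u = rng.multivariate_normal([0.0, 0.0], T, size=J)      # (u_0j, u_1j) per school
    x = rng.normal(size=J * n)                              # raw SES
    x_c = x - (np.bincount(school, weights=x) / n)[school]  # SES centered at the school mean
    r = rng.normal(scale=np.sqrt(sigma2), size=J * n)       # level-1 errors
    fixed = g00 + g01 * W + g10 * x_c + g11 * W * x_c
    composite_error = u[school, 0] + u[school, 1] * x_c + r
    return fixed + composite_error, x_c, W, school

def ols_fixed_part(y, x_c, W):
    """OLS point estimates of (g00, g01, g10, g11); OLS standard errors would be wrong here."""
    X = np.column_stack([np.ones_like(y), W, x_c, W * x_c])
    return np.linalg.lstsq(X, y, rcond=None)[0]

y, x_c, W, school = simulate_two_level(
    J=2000, n=20, gammas=(12.0, 1.5, 2.5, -1.0),
    T=np.array([[4.0, -0.5], [-0.5, 0.5]]), sigma2=9.0)
gamma_hat = ols_fixed_part(y, x_c, W)
```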

Estimation methods, Cont • The overall linear regression model is not the typical linear model assumed in standard ordinary least squares (OLS). • Efficient estimation and accurate hypothesis testing based on OLS require that the random errors are independent, normally distributed, and have constant variance. • In contrast, random errors in our overall model are dependent within each school and also have non-constant variances.

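Estimation methods, cont. • The variance of the random errors has the following complicated form:

$$\operatorname{Var}\left(u_{0j} + u_{1j}(X_{ij} - \bar{X}_{\cdot j}) + r_{ij}\right) = \tau_{00} + 2\tau_{01}(X_{ij} - \bar{X}_{\cdot j}) + \tau_{11}(X_{ij} - \bar{X}_{\cdot j})^2 + \sigma^2$$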
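The same algebra gives the whole within-school covariance matrix, $V_j = Z_j T Z_j^\top + \sigma^2 I$, where $Z_j$ has a column of ones and a column of centered SES and $T$ holds the $\tau$'s. The sketch below (hypothetical values; `school_covariance` is an illustrative name) shows both departures from OLS: the diagonal changes with SES, and the off-diagonal entries, $\tau_{00} + \tau_{01}(x + x') + \tau_{11} x x'$, are not zero.

```python
import numpy as np

def school_covariance(x_c, tau00, tau11, tau01, sigma2):
    """Covariance matrix V_j of the composite errors for the students in one school."""
    x_c = np.asarray(x_c, dtype=float)
    Z = np.column_stack([np.ones_like(x_c), x_c])   # random-effect design: intercept, centered SES
    T = np.array([[tau00, tau01], [tau01, tau11]])
    return Z @ T @ Z.T + sigma2 * np.eye(x_c.size)

# Three students with centered SES 1, -1, 0 (hypothetical variance components).
V = school_covariance([1.0, -1.0, 0.0], tau00=4.0, tau11=0.5, tau01=-0.5, sigma2=9.0)
# Diagonal: 12.5, 14.5, 13.0 (non-constant variance); V[0, 1] = 3.5 (dependence).
```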

Estimation methods, cont • Although standard regression analysis is not appropriate, such models can be estimated by iterative maximum likelihood procedures. • Figure 2.5 provides a graphical representation of the model specified above. • Here we see two hypothetical plots of the association between $\beta_{0j}$ and $\beta_{1j}$, one for public and a second for Catholic schools. • The plots show that Catholic schools have both higher mean achievement and weaker SES effects than public schools.

Estimation methods, Cont • Three types of parameters need to be estimated: • Fixed effects ($\gamma_{00}, \gamma_{01}, \gamma_{10}, \gamma_{11}$) • Random level-1 coefficients ($\beta_{0j}, \beta_{1j}$) • Variance-covariance components ($\sigma^2, \tau_{00}, \tau_{11}, \tau_{01}$)
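The random level-1 coefficients are usually estimated by empirical Bayes, which shrinks each school's own estimate toward the model-based mean. A minimal sketch for the intercept-only case, treating $\gamma_0$, $\tau_{00}$, and $\sigma^2$ as already estimated; the reliability weight $\tau_{00} / (\tau_{00} + \sigma^2 / n_j)$ is the standard one for this model, and the function name is illustrative.

```python
def eb_intercept(ybar_j, n_j, gamma0, tau00, sigma2):
    """Empirical Bayes estimate of school j's mean in a random-intercept model."""
    reliability = tau00 / (tau00 + sigma2 / n_j)   # near 1 for large schools, near 0 for tiny ones
    return reliability * ybar_j + (1.0 - reliability) * gamma0

# A school of 4 students with mean 20, grand mean 10, tau00 = 1, sigma2 = 4:
# reliability = 1 / (1 + 1) = 0.5, so the estimate is pulled halfway, to 15.
estimate = eb_intercept(20.0, 4, 10.0, 1.0, 4.0)
```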

Three common estimation methods • The maximum likelihood (ML) method is a general estimation procedure that produces estimates of the population parameters that maximize the probability of observing the data given the model. • Iterative generalized least squares (IGLS) and restricted iterative generalized least squares (RIGLS). • Bayesian methods

ML method • Two different likelihood functions: • Full Maximum Likelihood (FML) – both the regression coefficients and the variance components are included in the likelihood function. • Restricted Maximum Likelihood (RML) – only the variance components are included in the likelihood function, and the regression coefficients are estimated in a second estimation step.

Comparison of these two methods • FML is more efficient and provides estimates of both the variance components and the fixed-effect parameters. However, FML may produce biased (too small) estimates of the variance components, because it does not account for the degrees of freedom used in estimating the fixed effects. • RML provides less biased estimates of the variance components and, when the groups are balanced, is equivalent to the ANOVA estimates, which are optimal. • FML is still widely used because (1) its computation is generally easier, and (2) two models that differ in their fixed parameters can be compared with likelihood-based tests. With RML, only models that differ in the random part can be compared with likelihood-based tests.
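The FML/RML bias difference already shows up in the simplest case, $y_i \sim N(\mu, \sigma^2)$ with $\mu$ unknown: ML divides the sum of squares by $n$, while restricted ML divides by $n - 1$, accounting for the one degree of freedom spent on $\mu$. A small simulation sketch (hypothetical sample size and variance) shows the ML estimator centering on $\sigma^2 (n-1)/n$ rather than $\sigma^2$.

```python
import numpy as np

def ml_and_reml_variance(y):
    """Return (ML, REML) estimates of sigma^2 for an i.i.d. normal sample with unknown mean."""
    y = np.asarray(y, dtype=float)
    ss = np.sum((y - y.mean()) ** 2)
    n = y.size
    return ss / n, ss / (n - 1)

rng = np.random.default_rng(1)
draws = rng.normal(0.0, 2.0, size=(20000, 5))             # 20000 samples of size 5, sigma^2 = 4
est = np.array([ml_and_reml_variance(row) for row in draws])
mean_ml, mean_reml = est.mean(axis=0)                      # ML near 4 * 4/5 = 3.2; REML near 4.0
```

The bias factor $(n-1)/n$ matters most when $n$ is small, which in a multilevel model corresponds to having few higher-level units.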

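IGLS and RIGLS • The combined model is

$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10}(X_{ij} - \bar{X}_{\cdot j}) + \gamma_{11} W_j (X_{ij} - \bar{X}_{\cdot j}) + e_{ij}, \qquad e_{ij} = u_{0j} + u_{1j}(X_{ij} - \bar{X}_{\cdot j}) + r_{ij}$$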

IGLS and RIGLS, Cont • If , 00, 11, and 01 were known, then the covariance matrix,, could be constructed immediately, and the estimation could be performed with generalized least squares. • However, without knowledge of the covariance matrix, the estimation method is instead and iterative process known as iterative generalized least squares (IGLS).

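IGLS and RIGLS, Cont • The first step is to start with reasonable estimates of the fixed parameters. Typically these are the Ordinary Least Squares (OLS) estimates, which assume $\tau_{00} = \tau_{11} = \tau_{01} = 0$. • From these estimates, the raw residuals are formed:

$$\tilde{e}_{ij} = Y_{ij} - \left(\hat{\gamma}_{00} + \hat{\gamma}_{01} W_j + \hat{\gamma}_{10}(X_{ij} - \bar{X}_{\cdot j}) + \hat{\gamma}_{11} W_j (X_{ij} - \bar{X}_{\cdot j})\right)$$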

IGLS and RIGLS, Cont

IGLS and RIGLS, Cont • With the estimates of the variance components from GLS, the iterative procedure returns to the fixed part of the model and calculates new estimates of the fixed effects. • The procedure alternates between the fixed and random parts in this way until convergence, that is, until the parameter estimates no longer change from one iteration to the next.

IGLS and RIGLS, Cont • IGLS estimation may produce biased estimates of the random parameters because it does not take into account the sampling variation of the fixed-effect estimates used to form the residuals. • This bias is most severe in small samples. • However, unbiased estimates can be produced using Restricted Iterative Generalized Least Squares (RIGLS). • The main difference between IGLS and RIGLS is that, for normally distributed data, IGLS is equivalent to (full) maximum likelihood and RIGLS to restricted maximum likelihood.
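The IGLS/RIGLS steps can be sketched for the simplest case, a random-intercept model $y_{ij} = x_{ij}^\top\beta + u_j + r_{ij}$ with $V_j = \sigma^2 I + \tau \mathbf{1}\mathbf{1}^\top$. This is a minimal sketch under that assumption (no random slopes, no constraint keeping $\tau \ge 0$); the function names and simulated data are hypothetical. The random step regresses the cross-products of raw residuals on indicators for $\sigma^2$ (diagonal cells) and $\tau$ (all cells) by GLS; the RIGLS option adds the bias correction $X_j (X^\top V^{-1} X)^{-1} X_j^\top$ to those cross-products.

```python
import numpy as np

def _inverse_blocks(blocks, sigma2, tau):
    """V_j^{-1} for each group, where V_j = sigma2 * I + tau * 11'."""
    return [np.linalg.inv(sigma2 * np.eye(len(ix)) + tau * np.ones((len(ix), len(ix))))
            for ix in blocks]

def igls(y, X, groups, restricted=False, max_iter=100, tol=1e-8):
    """IGLS (restricted=False) or RIGLS (restricted=True) for a random-intercept model."""
    blocks = [np.flatnonzero(groups == g) for g in np.unique(groups)]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # step 0: OLS start (tau = 0)
    sigma2, tau = float(np.var(y - X @ beta)), 0.0
    for _ in range(max_iter):
        Vinv = _inverse_blocks(blocks, sigma2, tau)
        if restricted:                                   # (X' V^-1 X)^-1, for the RIGLS correction
            M = np.linalg.inv(sum(X[ix].T @ Vi @ X[ix] for ix, Vi in zip(blocks, Vinv)))
        # Random step: GLS of vec(r_j r_j') on vec(I) and vec(11'), weights V_j^-1 kron V_j^-1.
        A, b = np.zeros((2, 2)), np.zeros(2)
        for ix, Vi in zip(blocks, Vinv):
            n = len(ix)
            r = y[ix] - X[ix] @ beta                     # raw residuals
            S = np.outer(r, r)
            if restricted:
                S = S + X[ix] @ M @ X[ix].T
            Z = np.column_stack([np.eye(n).ravel(), np.ones(n * n)])
            Wt = np.kron(Vi, Vi)
            A += Z.T @ Wt @ Z
            b += Z.T @ Wt @ S.ravel()
        new_sigma2, new_tau = np.linalg.solve(A, b)
        # Fixed step: GLS for beta with the updated covariance blocks.
        Vinv = _inverse_blocks(blocks, new_sigma2, new_tau)
        lhs = sum(X[ix].T @ Vi @ X[ix] for ix, Vi in zip(blocks, Vinv))
        rhs = sum(X[ix].T @ Vi @ y[ix] for ix, Vi in zip(blocks, Vinv))
        new_beta = np.linalg.solve(lhs, rhs)
        change = max(abs(new_sigma2 - sigma2), abs(new_tau - tau),
                     float(np.max(np.abs(new_beta - beta))))
        beta, sigma2, tau = new_beta, float(new_sigma2), float(new_tau)
        if change < tol:                                 # estimates stopped changing
            break
    return beta, sigma2, tau

# Hypothetical data: 300 groups of 5, beta = (1, 2), tau = 0.5, sigma2 = 1.
rng = np.random.default_rng(42)
J, n = 300, 5
groups = np.repeat(np.arange(J), n)
x = rng.normal(size=J * n)
y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(0.5), size=J)[groups] + rng.normal(size=J * n)
X = np.column_stack([np.ones(J * n), x])
beta_ml, sigma2_ml, tau_ml = igls(y, X, groups)                          # IGLS
beta_reml, sigma2_reml, tau_reml = igls(y, X, groups, restricted=True)   # RIGLS
```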

Bayesian method • Bayesian methods combine any prior information about the parameters with the information contained in the data to produce a posterior distribution. • MCMC methods are commonly used computational methods for generating a random sample from a posterior distribution. • MCMC methods are also iterative; common samplers include Gibbs sampling and Metropolis-Hastings sampling. MCMC methods tend to produce more accurate interval estimates in small samples.

Three-level binary response models for the Alcohol Trial • Let $Y_{ijk}$ be the binary response indicating whether subject i, cared for by provider j in hospital k, received advice about drinking. • $X_{ijk}$ is the intervention status for subject i seen by provider j in hospital k.

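Three-level logistic regression

$$\operatorname{logit}(\pi_{ijk}) = \beta_0 + \beta_1 X_{ijk} + v_k + u_{jk}, \qquad v_k \sim N(0, \sigma^2_v), \quad u_{jk} \sim N(0, \sigma^2_u)$$
$$E(Y_{ijk} \mid \pi_{ijk}) = \pi_{ijk}, \qquad \operatorname{Var}(Y_{ijk} \mid \pi_{ijk}) = \sigma^2_e \, \pi_{ijk}(1 - \pi_{ijk})$$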

The parameter eis a natural test for whether the assumption of Binomial variation is valid. • If is significantly different from one, the data is said to exhibit extra-binomial variation. • If is less than one is, the data is said to be under-dispersed and if is greater than one, the data is is said to be over-dispersed.

Two estimation methods • Two estimation methods for multilevel logistic regression models: • A quasi-likelihood approach • A Bayesian approach with MCMC methods. I will briefly describe these two approaches below.

Two Quasi-likelihood methods • For the quasi-likelihood approach, the first step in the estimation is to approximate the nonlinear logistic regression equation using a Taylor series expansion. A Taylor series approximates a nonlinear function by an infinite series of terms. • If only the first term in the series is used, the estimation is known as a first-order approximation. • If the second term in the series is also used, it is referred to as a second-order approximation. • If the Taylor series is expanded about the fixed parameters only, the estimation is known as Marginal Quasi-Likelihood (MQL).

Two Quasi-likelihood methods, Cont • If the Taylor series is expanded about both the fixed and the random parameters, the estimation is known as Penalized Quasi-Likelihood (PQL). • Once the quasi-likelihood has been formed, the estimation procedures IGLS and RIGLS can be applied to estimate the parameter values.
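A first-order linearization turns each binary response into a working response $z = \eta + (y - \mu)/[\mu(1-\mu)]$ with weight $\mu(1-\mu)$, which is then treated as a normal outcome in a weighted linear multilevel model. This sketch shows only that standard first-order step (no second-order terms); the function name is illustrative. For MQL, `eta` would be the fixed part $X\hat\beta$ only; for PQL it would also include the current predicted random effects.

```python
import numpy as np

def logit_working_response(y, eta):
    """First-order Taylor linearization of the logit model around the linear predictor eta."""
    eta = np.asarray(eta, dtype=float)
    mu = 1.0 / (1.0 + np.exp(-eta))    # inverse logit
    w = mu * (1.0 - mu)                # d mu / d eta, also the binomial variance
    z = eta + (np.asarray(y, dtype=float) - mu) / w
    return z, w

# At eta = 0, mu = 0.5 and w = 0.25, so y = 1 maps to z = 2 and y = 0 to z = -2.
z, w = logit_working_response([1.0, 0.0], [0.0, 0.0])
```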

Bayesian method • The MCMC method used for the logistic regression equations in this paper will be Metropolis-Hastings sampling.
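A minimal random-walk Metropolis-Hastings sketch for a single-level logistic regression, to show the accept/reject mechanics. It is not the three-level model: there, the provider and hospital random effects and their variances would also be sampled. The $N(0, 10^2)$ priors, step size, simulated data, and burn-in length are all hypothetical choices.

```python
import numpy as np

def log_posterior(beta, X, y, prior_sd=10.0):
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))    # Bernoulli log-likelihood, logit link
    logprior = -0.5 * np.sum((beta / prior_sd) ** 2)      # independent N(0, prior_sd^2) priors
    return loglik + logprior

def metropolis_hastings(X, y, n_iter=6000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y)
    draws = np.empty((n_iter, X.shape[1]))
    accepted = 0
    for t in range(n_iter):
        proposal = beta + rng.normal(scale=step, size=beta.size)  # symmetric random-walk proposal
        cand = log_posterior(proposal, X, y)
        if np.log(rng.uniform()) < cand - current:                # accept with prob min(1, ratio)
            beta, current = proposal, cand
            accepted += 1
        draws[t] = beta
    return draws, accepted / n_iter

# Hypothetical data: intercept -1, slope 0.8, 2000 subjects.
rng = np.random.default_rng(7)
x = rng.normal(size=2000)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([-1.0, 0.8])
y = (rng.uniform(size=x.size) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
draws, acc_rate = metropolis_hastings(X, y)
posterior_mean = draws[1000:].mean(axis=0)   # discard the first 1000 draws as burn-in
```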

ACQUIP Alcohol Trial • Binary outcome at 1-yr follow-up: (1) Self-reports of advice about alcohol that patients received from their provider. • Patient-level covariates • Provider-level covariates.

The Alcohol Example, Cont • Random assignment at the firm level should ensure that, on average, the two groups are balanced on the baseline covariates. However, imbalance may still occur, and confounding may still be a problem. • Patient-level potential confounders: hypertension, liver disease, being a smoker in the past year, and the AUDIT score. • Provider-level potential confounders: the number of patients per provider (Panel Size) and provider training.

Alcohol example
