Generalized Estimating Equations (GEEs)

Presentation Transcript


  1. Generalized Estimating Equations (GEEs) • Purpose: to introduce GEEs • GEEs are used to model correlated data from longitudinal/repeated measures studies and from clustered/multilevel studies

  2. Outline • Examples of correlated data • Successive generalizations • Normal linear model • Generalized linear model • GEE • Estimation • Example: stroke data • exploratory analysis • modelling

  3. Correlated data • Repeated measures: same subjects, same measure, successive times; successive measurements are expected to be correlated • [Diagram: subjects i = 1,…,n are randomized to treatment groups A, B and C and measured at successive measurement times]

  4. Correlated data • Clustered/multilevel studies, e.g., Level 3: populations; Level 2: age-sex groups; Level 1: blood pressure measurements in a sample of people in each age-sex group • We expect correlations within populations and within age-sex groups due to genetic, environmental and measurement effects

  5. Notation • Repeated measurements: $y_{ij}$, $i = 1, \dots, N$ subjects; $j = 1, \dots, n_i$ times for subject $i$ • Clustered data: $y_{ij}$, $i = 1, \dots, N$ clusters; $j = 1, \dots, n_i$ measurements within cluster $i$ • Use “unit” for subject or cluster

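  6. Normal Linear Model • For unit $i$: $E(y_i) = \mu_i = X_i\beta$; $y_i \sim N(\mu_i, V_i)$ • $X_i$: $n_i \times p$ design matrix • $\beta$: $p \times 1$ parameter vector • $V_i$: $n_i \times n_i$ variance-covariance matrix, e.g., $V_i = \sigma^2 I$ if measurements are independent • For all units: $E(y) = \mu = X\beta$, $y \sim N(\mu, V)$, where $y$ and $X$ stack the $y_i$ and $X_i$, and $V$ is block diagonal with blocks $V_1, \dots, V_N$ • This $V$ is suitable if the units are independent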

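  7. Normal linear model: estimation • We want to estimate $\beta$ and $V$ • Use the score equations $U(\beta) = \sum_{i=1}^{N} X_i^T V_i^{-1}(y_i - X_i\beta) = 0$ • Solve this set of score equations to estimate $\beta$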

  8. Generalized linear model (GLM)

  9. Generalized estimating equations (GEE)

  10. Generalized estimating equations • $D_i$ is the matrix of derivatives $\partial\mu_i/\partial\beta_j$ • $V_i$ is the ‘working’ covariance matrix of $Y_i$ • $A_i = \mathrm{diag}\{\mathrm{var}(Y_{ik})\}$ • $R_i$ is the correlation matrix for $Y_i$ • $\phi$ is an overdispersion parameter
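
The slide's equations did not survive in this transcript. As a sketch, the standard form of the GEE that matches the symbol definitions on this slide is shown below; the exact layout of the original slide is an assumption.

```latex
% Estimating equations for beta (N units), assuming the standard form
U(\beta) = \sum_{i=1}^{N} D_i^{T} V_i^{-1} (y_i - \mu_i) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta}

% Working covariance built from the variances A_i, the working
% correlation R_i and the overdispersion parameter phi
V_i = \phi \, A_i^{1/2} R_i A_i^{1/2}
```

With $R_i$ equal to the identity and $\phi = 1$, these reduce to the score equations of a GLM fitted as if all measurements were independent.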

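  11. Overdispersion parameter • Estimated from the residuals using a formula (not preserved in this transcript) in which N is the total number of measurements and p is the number of regression parameters; note that this N differs from the number of units N used in the notation slide • The square root of the overdispersion parameter is called the scale parameter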
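
A minimal Python sketch of this estimate follows. It assumes the usual Pearson-residual form, the sum of squared Pearson residuals divided by N − p, which is consistent with the definitions of N and p on the slide but is not confirmed by the transcript. The function name, the Poisson variance function and the data are hypothetical.

```python
import math


def overdispersion(y, mu, variance, n_params):
    """Sum of squared Pearson residuals divided by (N - p).

    Assumed form; N is the total number of measurements, p the number
    of regression parameters.
    """
    n_total = len(y)
    pearson_sq = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return pearson_sq / (n_total - n_params)


# Hypothetical counts and fitted means, with Poisson variance v(mu) = mu
y = [2, 5, 1, 7, 3, 4]
mu = [3.0, 4.0, 2.0, 5.0, 3.0, 4.0]

phi = overdispersion(y, mu, variance=lambda m: m, n_params=2)
scale = math.sqrt(phi)  # the scale parameter is the square root of phi
```

A value of `phi` well above 1 would suggest more variability than the assumed variance function allows.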

  12. Estimation (1) • More generally, unless $V_i$ is known, iteration is needed to solve the score equations: (1) guess $V_i$, estimate $\beta$ by $b$ and hence estimate $\mu$; (2) calculate residuals $r_{ij} = y_{ij} - \hat\mu_{ij}$; (3) estimate $V_i$ from the residuals; (4) re-estimate $b$ using the new estimate of $V_i$ • Repeat steps 2-4 until convergence
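
The iteration above can be sketched in plain Python for one simple case. This is an illustration under stated assumptions, not the presenter's implementation: a straight-line mean (intercept and slope), an exchangeable $V_i$ with a single correlation `rho`, and a moment estimate of `rho` from the residuals. All function names and the data are hypothetical.

```python
def solve2(a, b):
    """Solve a 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def weight(v, rho):
    """Apply the inverse exchangeable correlation to v, up to a constant factor."""
    c = rho / (1.0 + (len(v) - 1) * rho)
    total = sum(v)
    return [vi - c * total for vi in v]


def estimate_b(units, rho):
    """Solve sum_i X_i' V_i^{-1} (y_i - X_i b) = 0 for b = [intercept, slope]."""
    xtwx = [[0.0, 0.0], [0.0, 0.0]]
    xtwy = [0.0, 0.0]
    for x, y in units:
        cols = [[1.0] * len(x), list(x)]  # intercept and covariate columns
        wcols = [weight(col, rho) for col in cols]
        wy = weight(y, rho)
        for r in range(2):
            xtwy[r] += sum(p * q for p, q in zip(cols[r], wy))
            for s in range(2):
                xtwx[r][s] += sum(p * q for p, q in zip(cols[r], wcols[s]))
    return solve2(xtwx, xtwy)


def estimate_rho(units, b):
    """Moment estimate of the common correlation from the residuals."""
    sum_sq, n_obs, sum_cross, n_pairs = 0.0, 0, 0.0, 0
    for x, y in units:
        r = [yj - (b[0] + b[1] * xj) for xj, yj in zip(x, y)]
        sq = sum(rj * rj for rj in r)
        sum_sq += sq
        n_obs += len(r)
        sum_cross += (sum(r) ** 2 - sq) / 2.0  # sum of r_j * r_k over j < k
        n_pairs += len(r) * (len(r) - 1) // 2
    return (sum_cross / n_pairs) / (sum_sq / n_obs)


def fit(units, tol=1e-8, max_iter=100):
    rho = 0.0                               # step 1: guess V_i = identity
    b = estimate_b(units, rho)
    for _ in range(max_iter):
        new_rho = estimate_rho(units, b)    # steps 2 and 3
        new_b = estimate_b(units, new_rho)  # step 4
        change = max(abs(p - q) for p, q in zip(new_b + [new_rho], b + [rho]))
        b, rho = new_b, new_rho
        if change < tol:
            break
    return b, rho


# Hypothetical data: (times, measurements) for four units of unequal length
units = [
    ([0, 1, 2, 3], [2.1, 3.9, 6.2, 8.0]),
    ([0, 1, 2], [0.2, 1.9, 4.1]),
    ([0, 1, 2, 3], [1.0, 3.2, 4.9, 7.1]),
    ([1, 2, 3], [3.6, 5.4, 7.7]),
]
b, rho = fit(units)
```

The variance and the factor 1/(1 − rho) are common to all units, so they cancel from the estimating equations; this is why `weight` only needs `rho`.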

  13. Estimation (2) – For GEEs

  14. Iterative process for GEEs • Start with $R_i$ = identity (i.e., independence) and $\phi = 1$: estimate $\beta$ • Use the estimates to calculate fitted values $\hat\mu_i$ and residuals $y_i - \hat\mu_i$ • These are used to estimate $A_i$, $R_i$ and $\phi$ • Then the GEEs are solved again to obtain improved estimates of $\beta$

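  15. Correlation • For unit $i$, the correlation matrix $R_i$ has elements $\rho_{lm}$ (matrix not preserved in this transcript) • For repeated measures, $\rho_{lm}$ = correlation between times $l$ and $m$ • For clustered data, $\rho_{lm}$ = correlation between measures $l$ and $m$ • For all models considered here, $V_i$ is assumed to be the same for all units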

  16. Types of correlation • 1. Independent: $V_i$ is diagonal • 2. Exchangeable: all measurements on the same unit are equally correlated; plausible for clustered data; other terms: spherical and compound symmetry

  17. Types of correlation • 3. Correlation depends on the time or distance between measurements $l$ and $m$, e.g., the first-order autoregressive model has terms $\rho$, $\rho^2$, $\rho^3$ and so on; plausible for repeated measures where correlation is known to decline over time • 4. Unstructured correlation: no assumptions about the correlations; there are many parameters to estimate, so the fit may not converge
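
The four structures can be written out as small Python helpers. This is a sketch for illustration; the function names and the example values of `rho` are hypothetical.

```python
def independent(n):
    """Identity matrix: no correlation between measurements."""
    return [[1.0 if l == m else 0.0 for m in range(n)] for l in range(n)]


def exchangeable(n, rho):
    """Every pair of measurements on a unit has the same correlation rho."""
    return [[1.0 if l == m else rho for m in range(n)] for l in range(n)]


def ar1(n, rho):
    """First-order autoregressive: correlation rho ** |l - m|."""
    return [[rho ** abs(l - m) for m in range(n)] for l in range(n)]


def n_unstructured_params(n):
    """Number of distinct correlations when nothing is assumed."""
    return n * (n - 1) // 2


first_row_ar1 = ar1(4, 0.5)[0]                   # [1.0, 0.5, 0.25, 0.125]
first_row_exchangeable = exchangeable(4, 0.5)[0]  # [1.0, 0.5, 0.5, 0.5]
params_for_8_times = n_unstructured_params(8)     # 28
```

With 8 measurement times, the unstructured form needs 28 correlations while the exchangeable and AR(1) forms need one each, which is why the unstructured fit is the one most likely to fail to converge.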

  18. Missing Data • With missing data, the working correlation can be estimated using the all-available-pairs method, in which all non-missing pairs of data are used in the estimators of the working correlation parameters
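
A simplified Python sketch of the all-available-pairs idea follows. It assumes standardized residuals with `None` marking a missing value, and it divides by the number of available pairs only; software implementations typically also adjust for the number of parameters and the overdispersion parameter. The function name and data are hypothetical.

```python
def available_pairs_mean(residuals, l, m):
    """Mean of r_il * r_im over units observed at both times l and m.

    Returns the mean product and the number of pairs that contributed.
    """
    products = [r[l] * r[m] for r in residuals
                if r[l] is not None and r[m] is not None]
    return sum(products) / len(products), len(products)


# Hypothetical standardized residuals: 4 units, 3 times, None = missing
residuals = [
    [0.5, 0.4, None],
    [-1.0, None, -0.8],
    [1.2, 1.0, 0.9],
    [None, -0.6, -0.5],
]

mean_01, n_01 = available_pairs_mean(residuals, 0, 1)
mean_12, n_12 = available_pairs_mean(residuals, 1, 2)
```

Only one unit here has complete data, but each pair of times still draws on two units, which is the point of the method.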

  19. Choosing the Best Model: Standard Regression (GLM) • AIC = −2 × (log-likelihood) + 2 × (number of parameters) • Smaller values indicate a better balance of fit and parsimony
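
As a small worked example of the AIC formula, the Python sketch below compares two models. The log-likelihoods and parameter counts are hypothetical values chosen for illustration.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2 * n_params


# Hypothetical fits: the larger model fits slightly better but uses more parameters
aic_small = aic(log_likelihood=-120.5, n_params=3)
aic_large = aic(log_likelihood=-118.9, n_params=6)

preferred = "small" if aic_small < aic_large else "large"
```

Here the extra three parameters do not improve the log-likelihood enough to offset the penalty, so the smaller model has the lower AIC.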

  20. Choosing the Best Model: GEE • QIC(V): a function of V, so it can be used to choose the best correlation structure • QICu: a measure that can be used to determine the best subset of covariates for a particular model • The best model is the one with the smallest value

  21. Other approaches – alternatives to GEEs • Multivariate modelling – treat all measurements on same unit as dependent variables (even though they are measurements of the same variable) and model them simultaneously • (Hand and Crowder, 1996) • e.g., SPSS uses this approach (with exchangeable correlation) for repeated measures ANOVA

  22. Other approaches – alternatives to GEEs • Mixed models: fixed and random effects • e.g., $y = X\beta + Zu + e$ • $\beta$: fixed effects; $u$: random effects, $u \sim N(0, G)$ • $e$: error terms, $e \sim N(0, R)$ • $\mathrm{var}(y) = ZGZ^T + R$ • So correlation between the elements of $y$ is due to the random effects • (Verbeke and Molenberghs, 1997)

  23. Example of correlation from random effects • Cluster sampling: randomly select areas (PSUs), then households within areas • $Y_{ij} = \mu + u_i + e_{ij}$ • $Y_{ij}$: income of household $j$ in area $i$ • $\mu$: average income for the population • $u_i$: random effect of area $i$, $u_i \sim N(0, \sigma_u^2)$ • $e_{ij}$: error, $e_{ij} \sim N(0, \sigma_e^2)$ • $E(Y_{ij}) = \mu$; $\mathrm{var}(Y_{ij}) = \sigma_u^2 + \sigma_e^2$; $\mathrm{cov}(Y_{ij}, Y_{km}) = \sigma_u^2$ if $i = k$ and $j \ne m$, and 0 if $i \ne k$ • So $V_i$ is exchangeable, with correlation $\rho = \sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$ = ICC (intraclass correlation coefficient)
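
The covariance structure implied by this random-intercept model can be checked numerically. The Python sketch below builds $V_i$ for one area from an area-level variance and an error variance; the function names and the variance values are hypothetical.

```python
def random_intercept_cov(n, var_u, var_e):
    """V_i for n households in one area: var_u everywhere, plus var_e on the diagonal."""
    return [[var_u + (var_e if j == m else 0.0) for m in range(n)]
            for j in range(n)]


def icc(var_u, var_e):
    """Intraclass correlation: share of total variance due to the area effect."""
    return var_u / (var_u + var_e)


# Hypothetical variances: area effect 4.0, household-level error 12.0
V = random_intercept_cov(3, var_u=4.0, var_e=12.0)
rho = icc(4.0, 12.0)                 # 0.25
off_diag_corr = V[0][1] / V[0][0]    # covariance / variance, also 0.25
```

Every off-diagonal correlation equals the ICC, which is the exchangeable structure described in the types-of-correlation slides.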

  24. Numerical example: Recovery from stroke • Treatment groups: A = new OT intervention; B = special stroke unit, same hospital; C = usual care in a different hospital • 8 patients per group • Measurements of functional ability: Barthel index measured weekly for 8 weeks • $Y_{ijk}$: patient $i$, group $j$, time $k$ • Analyses: exploratory analyses (plots); naïve analyses; modelling

  25. Numerical example: time plots Individual patients and overall regression line

  26. Numerical example: time plots for groups

  27. Numerical example: research questions • Primary question: do slopes differ (i.e. do treatments have different effects)? • Secondary question: do intercepts differ (i.e. are groups same initially)?

  28. Numerical example: Scatter plot matrix

  29. Numerical example Correlation matrix

  30. Numerical example 1. Pooled analysis ignoring correlation within patients

  31. Numerical example 2. Data reduction

  32. Numerical example 3. Repeated measures analyses using various variance-covariance structures • For the stroke data, the scatter plot matrix and correlations suggest that an autoregressive structure (e.g., AR(1)) is most appropriate • Use GEEs to fit the models

  33. Numerical example 4. Mixed/random effects model • Use the model $Y_{ijk} = (\alpha_j + a_{ij}) + (\beta_j + b_{ij})k + e_{ijk}$ • $\alpha_j$ and $\beta_j$ are fixed effects (intercept and slope) for the groups • The other effects are random, and all are independent • Fit the model and use the estimates of the fixed effects to compare the $\alpha_j$'s and $\beta_j$'s

  34. Numerical example: Results for intercepts Results from Stata 8

  35. Numerical example: Results for intercepts Results from Stata 8

  36. Numerical example: Results for intercepts Results from Stata 8

  37. Numerical example: Results for slopes Results from Stata 8

  38. Numerical example: Results for slopes Results from Stata 8

  39. Numerical example: Results for slopes Results from Stata 8

  40. Numerical example: Summary of results • All models produced similar results leading to the same conclusion: no treatment differences • Pooled analysis and data reduction are useful for exploratory analysis: they are easy to follow and give good approximations for the estimates, but the variances may be inaccurate • Random effects models give very similar results to GEEs; they do not require the variance-covariance matrix to be specified, and the model specification may or may not be more natural
