Applications of G-estimation using a new Stata command

Jonathan Sterne (jonathan.sterne@bristol.ac.uk)
Kate Tilling (kate.tilling@bristol.ac.uk)
Department of Social Medicine, University of Bristol, UK

Outline

• Time-varying confounding and G-estimation
• G-estimation in Stata
• Applications
• Discussion and future plans

A covariate is a time-varying confounder for the effect of exposure on outcome if:
• past covariate values predict current exposure
• the current covariate value predicts outcome

Example:
• people with low CD4 counts are more likely to get HAART
• low CD4 count is a risk factor for AIDS and death

If, in addition, past exposure predicts the current covariate value, then standard survival analyses with time-updated exposure will give biased estimates of the exposure effect. For example, CD4 count predicts HAART use, and HAART raises CD4 counts.

G-estimation (1)

Assume that subject i has an underlying counterfactual failure time $U_i$: the time to failure had they never been exposed. This is unobservable for subjects who were exposed at any time.

Assume that exposure accelerates failure time by a factor $\exp(-\psi)$, the causal survival time ratio. So if $\psi < 0$ exposure increases survival, and if $\psi > 0$ exposure decreases survival.

If we knew $\psi$, then for any subject who experienced the outcome event at time $T_i$, the counterfactual failure time could be derived by:

$$U_i(\psi) = \int_0^{T_i} \exp\{\psi\, e_i(t)\}\, dt$$

where $e_i(t)$ is 1 if subject i was exposed at time t and 0 otherwise.

Example: if subject i experienced the outcome event at 5 years and was exposed for 3 of those years, then $U_i = 3\exp(\psi) + 2$.
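The counterfactual-time calculation can be sketched in a few lines of Python. This is only an illustration, not part of stgest: the function name `counterfactual_time`, the `(duration, exposed)` representation of a subject's history, and the value chosen for psi (the causal parameter) are all assumptions made for the example.

```python
import math

def counterfactual_time(intervals, psi):
    """Sum duration * exp(psi * exposed) over a subject's follow-up intervals.

    intervals: list of (duration_in_years, exposed) pairs, exposed is 0 or 1
               (hypothetical data layout, chosen for this sketch).
    psi:       candidate value of the causal parameter.
    """
    return sum(duration * math.exp(psi * exposed)
               for duration, exposed in intervals)

# The slide's example: event at 5 years, exposed for 3 of them.
history = [(3.0, 1), (2.0, 0)]
psi = 0.239  # illustrative value only

u = counterfactual_time(history, psi)  # equals 3*exp(psi) + 2
print(round(u, 3))
```

With psi = 0 the exposed and unexposed years count equally, so the counterfactual time is just the observed 5 years.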

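G-estimation (2)

Assume that there are no unmeasured confounders: conditional on measured history (past and present confounders and past exposure), subjects' present exposure is independent of their counterfactual failure time $U_i$. For example, for two individuals with identical histories, the decision to quit smoking does not depend on underlying survival time.

Use logistic regression to search for the value of $\psi$ that satisfies this condition: for each candidate $\psi$, compute $U_i(\psi)$ and include it as a covariate in a logistic regression of current exposure on measured history. The G-estimate is the value of $\psi$ for which $U_i(\psi)$ has no association with current exposure.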
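To make the estimating condition concrete, here is a deliberately simplified Python sketch. It assumes a single, fixed exposure per subject, no covariates and no censoring, which is far simpler than the time-varying setting handled by stgest. In that special case the score statistic for adding the counterfactual time U(psi) to an intercept-only logistic model of exposure is the sum of (A_i - mean(A)) * U_i(psi), and the estimate is the psi at which it equals zero. The toy data and function name are hypothetical.

```python
import math

def score(psi, times, exposed):
    """Score statistic for the coefficient of U(psi) in an intercept-only
    logistic model of exposure: sum_i (A_i - mean(A)) * U_i(psi),
    with U_i(psi) = T_i * exp(psi * A_i) for a fixed exposure A_i."""
    a_bar = sum(exposed) / len(exposed)
    return sum((a - a_bar) * t * math.exp(psi * a)
               for t, a in zip(times, exposed))

# Hypothetical toy data: exposed subjects fail earlier on average.
times = [2.0, 4.0, 5.0, 7.0]
exposed = [1, 1, 0, 0]

# In this simplified case the root has a closed form:
# psi = log(mean unexposed time / mean exposed time).
psi_hat = math.log(6.0 / 3.0)

print(score(0.0, times, exposed))      # negative: psi = 0 is too small
print(score(psi_hat, times, exposed))  # approximately zero
print(score(1.0, times, exposed))      # positive: psi = 1 is too large
```

The sign change between psi = 0 and psi = 1 is what a search over psi exploits.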

Censoring

No competing risks:
Replace $U(\psi)$ with a variable indicating whether the individual would have been observed to fail both if they were exposed and if they were unexposed.

Competing risks:
Assume that, conditional on known covariates, censoring due to competing risks is independent of failure time. Estimate the cumulative probability of being free from competing risks until the end of follow-up, and weight by the inverse of this probability.
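The inverse-probability weighting step can be illustrated as follows. This is a minimal sketch, assuming that a per-visit probability of remaining free from competing risks has already been estimated for each subject; the function name and the probabilities are made up for the example.

```python
def ipc_weight(p_uncensored_by_visit):
    """Inverse of the cumulative probability of remaining uncensored.

    p_uncensored_by_visit: per-visit probabilities of NOT being censored,
    conditional on being uncensored so far (hypothetical inputs).
    """
    cumulative = 1.0
    for p in p_uncensored_by_visit:
        cumulative *= p
    return 1.0 / cumulative

# A subject with a 95%, 90% and 80% chance of staying uncensored
# at three successive visits (illustrative numbers only).
w = ipc_weight([0.95, 0.90, 0.80])
print(round(w, 3))
```

Subjects who were unlikely to remain uncensored receive larger weights, so that they also stand in for similar subjects who were lost to competing risks.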

The stgest command

• Written for Stata
• User specifies exposure, covariates (including baseline and lagged covariates) and any censoring variables
• Data are set up in Stata survival analysis format (i.e. start time, end time and failure indicator for each interval for each individual)
• Uses the interval bisection method to search for the G-estimate and 95% CI (or the user can specify a range and 'step' for a grid search)
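Interval bisection itself is a generic root-finding method, sketched below in Python. The internals of stgest are not shown in these slides, so this is not its implementation: the test statistic `z_stat` is a hypothetical linear stand-in for the statistic that would come from the logistic regression at each candidate psi, and the search range -2 to 2 simply mirrors the `range(-2 2)` option used later.

```python
def bisect_root(f, lo, hi, tol=1e-6, max_iter=100):
    """Find a root of f in [lo, hi] by repeatedly halving the interval.
    Requires f(lo) and f(hi) to have opposite signs (or one to be zero)."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2.0 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return (lo + hi) / 2.0

def z_stat(psi):
    """Hypothetical test statistic, linear in psi, for illustration only."""
    return (psi - 0.25) / 0.2

# Point estimate: where the statistic is zero.
psi_hat = bisect_root(z_stat, -2.0, 2.0)
# 95% limits: where the statistic equals -1.96 and +1.96.
ci_lower = bisect_root(lambda p: z_stat(p) + 1.96, -2.0, 2.0)
ci_upper = bisect_root(lambda p: z_stat(p) - 1.96, -2.0, 2.0)
print(round(psi_hat, 3), round(ci_lower, 3), round(ci_upper, 3))
```

Starting from the range -2 to 2, the midpoints visited are 0, 1, 0.5, 0.25 and so on, the same halving pattern that appears in the stgest search log.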

Caerphilly study

• 2512 men first examined 1979 to 1983, mean age at baseline 52 years
• Three further follow-up surveys, with ascertainment of MI and deaths to August 2000
• Data from the first examination are used to provide baseline exposure measures, so follow-up starts from the second examination
• 1756 men included in analyses
• 244 had a first MI or died from CHD between the second examination and the end of follow-up

Data

Baseline: smoking history, age, self-reported CHD, gout, diabetes, high blood pressure
Every visit: BP, BMI, smoking status, total cholesterol, CHD, gout, diabetes, fibrinogen

Censoring

Four possibilities:

Outcome                     n      %
Not censored             1175   66.9
MI or MI death            244   13.9
Death from other cause    231   13.2
Lost                      106    6.0

Multinomial logistic regression is used to estimate the probability that each individual was censored (last two categories) at each examination. The cumulative probability of remaining uncensored is the product, over examinations, of the probabilities of not being censored.

. list id visit examdat exitdate mi examdat2 cursmok if touse

         id   visit     examdat    exitdate   mi    examdat2   cursmok
 16.   1021       1   10sep1979   31jul1984    0   31jul1984         0
 17.   1021       2   31jul1984   17mar1992    0   31jul1984         0
 18.   1021       3   17mar1992   18jun1996    1   31jul1984         0
 19.   1022       1   10sep1979   19sep1984    0   19sep1984         1
 20.   1022       2   19sep1984   20nov1989    0   19sep1984         1
 21.   1022       3   20nov1989   28oct1993    0   19sep1984         1
 22.   1022       4   28oct1993   31dec1998    0   19sep1984         0
 23.   1023       1   10sep1979   03oct1984    0   03oct1984         1
 24.   1023       2   03oct1984   20nov1989    0   03oct1984         1
 25.   1023       3   20nov1989   08nov1993    0   03oct1984         1
 26.   1023       4   08nov1993   31dec1998    0   03oct1984         1

. stset exitdate, id(id) failure(mi) origin(time examdat2) scale(365.25)

                id:  id
     failure event:  mi ~= 0 & mi ~= .
obs. time interval:  (exitdate[_n-1], exitdate]
 exit on or before:  failure
    t for analysis:  (time-origin)/365.25
            origin:  time examdat2

-----------------------------------------------------------------------
     6377  total obs.
     1756  obs. end on or before enter()
-----------------------------------------------------------------------
     4621  obs. remaining, representing
     1756  subjects
      244  failures in single failure-per-subject data
 18547.87  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  14.47502

. list id visit examdat exitdate mi _t0 _t _d _st if touse, noobs nodisp

   id   visit     examdat    exitdate   mi    _t0      _t   _d   _st
 1021       1   10sep1979   31jul1984    0      .       .    .     0
 1021       2   31jul1984   17mar1992    0   0.00    7.63    0     1
 1021       3   17mar1992   18jun1996    1   7.63   11.88    1     1
 1022       1   10sep1979   19sep1984    0      .       .    .     0
 1022       2   19sep1984   20nov1989    0   0.00    5.17    0     1
 1022       3   20nov1989   28oct1993    0   5.17    9.11    0     1
 1022       4   28oct1993   31dec1998    0   9.11   14.28    0     1
 1023       1   10sep1979   03oct1984    0      .       .    .     0
 1023       2   03oct1984   20nov1989    0   0.00    5.13    0     1
 1023       3   20nov1989   08nov1993    0   5.13    9.10    0     1
 1023       4   08nov1993   31dec1998    0   9.10   14.24    0     1

. makebase cursmok hearta gout highbp diabet fibrin chol cholsq /*
> */ bpsyst bpdias obese thin, firstvis(1) visit(visit)

Baseline confounders

              storage  display     value
variable name   type   format      label      variable label
---------------------------------------------------------------------
Bcursmok        byte   %9.0g
Bhearta         byte   %9.0g
Bgout           byte   %9.0g
Bhighbp         byte   %9.0g
Bdiabet         byte   %9.0g
Bfibrin         float  %9.0g
Bchol           float  %9.0g
Bcholsq         float  %9.0g
Bbpsyst         int    %9.0g
Bbpdias         int    %9.0g
Bobese          byte   %9.0g
Bthin           byte   %9.0g

. makelag cursmok hearta gout highbp diabet fibrin chol cholsq /*
> */ bpsyst bpdias obese thin, firstvis(1) visit(visit)

Lagged confounders

              storage  display     value
variable name   type   format      label      variable label
----------------------------------------------------------------------
Lcursmok        byte   %9.0g
Lhearta         byte   %9.0g
Lgout           byte   %9.0g
Lhighbp         byte   %9.0g
Ldiabet         byte   %9.0g
Lfibrin         float  %9.0g
Lchol           float  %9.0g
Lcholsq         float  %9.0g
Lbpsyst         int    %9.0g
Lbpdias         int    %9.0g
Lobese          byte   %9.0g
Lthin           byte   %9.0g

. stcox cursmok Agegrp* hearta gout highbp diabet fibrin chol cholsq bpsyst
> bpdias obese thin B* L*

         failure _d:  mi
   analysis time _t:  (exitdate-origin)/365.25
             origin:  time examdat2
                 id:  id

No. of subjects =         1756               Number of obs   =      4621
No. of failures =          244
Time at risk    =  18547.87132
                                             LR chi2(41)     =    178.92
Log likelihood  =   -1662.3478               Prob > chi2     =    0.0000

----------------------------------------------------------------------
      _t |
      _d | Haz. Ratio   Std. Err.      z    P>|z|   [95% Conf. Interval]
---------+------------------------------------------------------------
 cursmok |   1.014992   .2085446    0.07   0.942   .6785331   1.518288
(remaining output omitted)

. stgest cursmok Agegrp* fibrin hearta gout highbp diabet chol cholsq bpsyst
> bpdias obese thin, visit(visit) firstvis(2) lagconf(cursmok fibrin hearta
> gout highbp diabet chol cholsq bpsyst bpdias obese thin) baseconf(fibrin
> hearta gout highbp cursmok chol cholsq diabet bpsyst bpdias obese thin)
> lasttime(mienddat) range(-2 2) saveres(caergestsmoknocens) replace

causvar: cursmok   visit: visit
Range: -2 2, rnum: 2
Search method: interval bisection
-2.00 2.00 0.00 1.00 0.50 0.25 0.13 0.19 0.22 0.23 0.24 0.24 0.24 0.24
0.38 0.31 0.34 0.36 0.37 0.37 0.37 0.37
-1.00 -0.50 -0.25 -0.13 -0.06 -0.03 -0.02 -0.01 -0.00 -0.00 -0.00
savres: caergestsmoknocens

G estimate of psi for cursmok: 0.239 (95% CI -0.001 to 0.368)
Causal survival time ratio for cursmok: 0.787 (95% CI 0.692 to 1.001)
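The two result lines are linked by the relation stated earlier in the talk: the causal survival time ratio is exp(-psi). The short Python check below reproduces the reported ratio and its confidence limits from the reported psi values; the function name is just a label for this sketch. Because exp(-psi) is decreasing in psi, the upper limit for psi gives the lower limit for the ratio, and vice versa.

```python
import math

def survival_time_ratio(psi, psi_lower, psi_upper):
    """Convert psi and its CI to the causal survival time ratio exp(-psi).
    The CI limits swap because exp(-psi) decreases as psi increases."""
    return math.exp(-psi), math.exp(-psi_upper), math.exp(-psi_lower)

# Values reported by stgest for cursmok in the output above.
ratio, ratio_lower, ratio_upper = survival_time_ratio(0.239, -0.001, 0.368)
print(round(ratio, 3), round(ratio_lower, 3), round(ratio_upper, 3))
# → 0.787 0.692 1.001
```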

. weibull _t cursmok Agegrp* hearta gout highbp diabet fibrin chol cholsq
> bpsyst bpdias obese thin B* L* if visit>=2, dead(_d) t0(_t0) hr

      _t | Haz. Ratio   Std Err      z    P>|z|   [95% Conf. Interval]
--------+---------------------------------------------------------
 cursmok |    1.01690  .2083929   0.08   0.935   .6805221   1.519549
(rest of output omitted)

. gesttowb
g-estimated hazard ratio 1.28 ( 1.00 to 1.47)

. * allowing for censoring due to competing risks;
. stgest cursmok Agegrp* fibrin hearta gout highbp diabet chol cholsq bpsyst
> bpdias obese thin, visit(visit) firstvis(2) lagconf(fibrin hearta gout
> highbp diabet cursmok chol cholsq bpsyst bpdias obese thin) baseconf(fibrin
> hearta gout highbp cursmok chol cholsq diabet bpsyst bpdias obese thin)
> lasttime(mienddat) saveres(caergestsmok) replace idcens(idcrcens)
> range(-2 2) pnotcens(pnotcens)

G estimate of psi for cursmok: 0.290 (95% CI -0.190 to 0.773)
Causal survival time ratio for cursmok: 0.748 (95% CI 0.462 to 1.210)

. gesttowb
g-estimated hazard ratio 1.34 ( 0.82 to 2.19)

Atherosclerosis Risk in Communities (ARIC) study

• 15,792 members of 4 communities in the USA
• baseline exam between 1987 and 1989
• 3 follow-up exams at 3-year intervals
• followed up for death, CHD and stroke

ARIC data

Baseline: smoking history, education level, age, sex, ethnicity, self-reported stroke/CHD
Every visit: BP, BMI, smoking status, total, HDL and LDL cholesterol, diabetes status, use of anti-hypertensive medication

ARIC data

• 13898 persons with data on visits 1 and 2
• 7699 (55%) female
• Mean age 54 (min 45, max 65)
• CHD present in 625 (5%)
• 9754 (70%) not on anti-hypertensive medication at visits 1 or 2

Methods

• Weibull analysis and G-estimation
• Outcomes: death, incident CHD
• CHD as outcome: exclude those with CHD at baseline/1st visit; censor those who die of other causes
• Exposures: BP, smoking, BMI, HDL, LDL
• BP: exclude those on anti-hypertensives at baseline; censor at anti-hypertensive use

Results

Published in the American Journal of Epidemiology, April 15th 2002:
Tilling K, Sterne JAC, Szklo M. G-estimation of the effects of cardiovascular risk factors on all-cause mortality and CHD: the ARIC study. AJE 2002; 155: 710-718.

Summary: effects tended to be underestimated by Weibull analysis compared with G-estimation.

Discussion - model specification

The model specified that exposure at a given visit multiplies survival time from that moment by a given amount.

Alternatives:
• the effect on survival only lasts for a given period (e.g. use of anti-hypertensives)
• the effect on survival starts after a given period (e.g. possible lagged effect of smoking)

Future work and (we hope) collaboration

• Implement marginal structural models (MSMs) in Stata
• Effect of cardiovascular risk factors (e.g. smoking, fibrinogen) and anti-hypertensives in the Caerphilly study
• Effect of treatments (e.g. anti-hypertensives, anti-platelet agents) on stroke recurrence, using the South London Stroke Register

Future work and (we hope) collaboration

Causal effect of HAART:
• When to start
• Effect of different drug combinations

This will require large collaborations between cohorts. The aim is to build on an existing collaboration between 13 cohorts involving 12500 patients starting HAART.
