Meta-Analysis of Clinical Data for Regulated Biopharmaceutical Products: Answers to Frequently Asked Questions Brenda Crowe, Research Advisor, Eli Lilly and Company With special thanks to Jesse Berlin Midwest Biopharmaceutical statistics workshop May 21, 2013
Disclaimer The views expressed herein represent those of the presenter and do not necessarily represent the views or practices of the presenter’s employer or any other party
Acknowledgements • Juergen Kuebler • Amy Xia • Jesse Berlin • Ed Whalen • Carol Koro
Agenda • Background • The 6 questions • What studies should be pooled/combined? • Method of ascertainment? • Individual patient data (vs. aggregate patient data)? • Multiple looks and/or multiple endpoints? • Heterogeneity of design and results? • Fixed-effect models or random-effects models? • Concluding remarks
Background • SPERT = Safety Planning, Evaluation and Reporting Team • During drug development, sponsors need to recognize safety signals early and adjust the development program accordingly • Crowe et al. (SPERT) gave an overview of the framework and planning of meta-analysis (MA) in drug development but did not provide details on practical issues arising during implementation • Focus here on common analytical topics (6 questions) • Emphasis on situations that arise in drug development, mostly premarketing
A little vocabulary (in today’s context) • POOL (noun): a grouping of studies used to address a specific research question • Swimming in data (avoid drowning)
Existing Guidance • FDA guidance on premarketing risk assessment
Existing Guidance • International Conference on Harmonization (ICH) M4E
Existing Guidance • Council for International Organizations of Medical Sciences VI (CIOMS VI) report
What to pool? Decisions on what to combine depend on the specific questions to be answered (duh) Often there are several questions and these might require different subsets of studies or subjects
Pools may be based on • Type of control: placebo vs. active • Dose, route, or regimen • Concomitant (background) therapy • Methods of eliciting adverse events (e.g., active vs. passive) • Disease state • Duration of treatment (and follow-up?) • Subgroups of patients based on age groups, geographies, ethnicity groups, or severity of disease, etc.
Considerations for inclusion in a pool • Usually exclude • Phase 1 pharmacokinetic and pharmacodynamic studies (because of short duration and enrollment of healthy subjects or patients with incurable end-stage disease) • Studies that cannot or will not provide individual patient-level data, if such data are required for the analysis
Considerations for inclusion in a pool It is generally most appropriate to combine data from studies that are similar. Strong similarity is not required for pooling, if the effects of treatment don’t depend on the trial characteristics being considered.
For example . . . • Suppose some studies (or arms) were conducted at a higher dose than the sponsor is proposing for the marketing label. Would you exclude those arms from the analysis? • Yes, if the goal for those analyses is to characterize adverse events from proposed indications at the proposed doses. • However, one might choose to combine the high-dose studies or arms in a different pool to help assess what could happen in an overdose situation.
Studies (or arms) at a higher or lower dose than proposed for marketing? • In general, exclude dose arms that are lower than the proposed dose for marketing, as these may dilute the effects seen at the higher marketed dose • However, events may occur in the lower dose studies that should not be ignored • Including low-dose and high-dose studies may help understand the dose-response relationship
AEs in all those who took the drug? • Can analyze ALL who took drug as a single cohort without a comparator group: useful for accounting for all events and estimating event rates for infrequent events • Can then be compared to external reference population rates • However, external population rates limited by the availability of event rates for a specific subset of the population that is comparable to the trial population • If the underlying disease increases the risk of a particular event, comparisons with an external reference could be biased against the study drug. • Conversely, if enrollment criteria are such that high-risk patients are excluded from trials, the on-study rates could appear to be artificially low.
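The single-cohort comparison with an external reference rate can be sketched as follows. This is a minimal illustration, not a recommended analysis: the event count, exposure, and reference rate are invented, and the function names are hypothetical. The caveats on the slide (non-comparable populations, enrollment criteria) apply to the resulting ratio.

```python
def rate_per_1000_py(events, patient_years):
    """Exposure-adjusted incidence rate per 1000 patient-years."""
    return 1000.0 * events / patient_years


def observed_to_expected(events, patient_years, reference_rate_per_1000_py):
    """Ratio of observed events to the number expected at the reference rate."""
    expected = reference_rate_per_1000_py * patient_years / 1000.0
    return events / expected


# Invented numbers: all drug-exposed patients across the program.
events, patient_years = 12, 4000.0
reference_rate = 2.0  # hypothetical external rate per 1000 patient-years

trial_rate = rate_per_1000_py(events, patient_years)
ratio = observed_to_expected(events, patient_years, reference_rate)
print(f"on-study rate: {trial_rate:.1f} per 1000 PY, observed/expected: {ratio:.2f}")
```

A ratio above 1 here says nothing by itself about causality; it only flags that the on-study rate exceeds the chosen reference rate.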
Hypothesis generating studies? • What if a safety signal was detected in Phase 2 that resulted in a change in ascertainment of an AE in Phase 3 (e.g., an adjudication process, special case report form)? • Create a grouping of Phase 3 studies designed for that particular event • Advantages • Studies with consistent ascertainment analyzed together • Excludes studies that generated the hypothesis being tested
Hypothesis generating studies (cont.) • Excluding the hypothesis-generating studies addresses type I error but • sacrifices statistical power • discards data from what may be studies in a closely monitored population, which may also be at differential risk due to exposure to the compound • And it can raise all kinds of red flags, so transparency is key: do the analysis with and without those studies
Caveats • Do not do a crude unstratified analysis that combines studies with a comparator and studies without a comparator. • Results can be very misleading. See Lièvre 2002, Chuang-Stein 2010 for further information on dangers of not stratifying.
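A small numerical sketch of why the crude analysis misleads. The two studies are invented: one randomized trial with identical event rates in both arms, and one single-arm study in a higher-risk population. The stratified estimate uses the Mantel-Haenszel relative risk, chosen here only as one common stratified estimator.

```python
# Each tuple: (events_drug, n_drug, events_control, n_control). Invented data.
studies = [
    (10, 500, 10, 500),   # randomized, placebo-controlled, lower-risk population
    (100, 1000, 0, 0),    # single-arm study, higher-risk population, no comparator
]


def crude_rr(studies):
    """Relative risk after simply adding up events and patients across studies."""
    a = sum(s[0] for s in studies)
    n1 = sum(s[1] for s in studies)
    c = sum(s[2] for s in studies)
    n0 = sum(s[3] for s in studies)
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(studies):
    """Mantel-Haenszel relative risk, stratified by study."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in studies)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in studies)
    return num / den


crude = crude_rr(studies)
stratified = mantel_haenszel_rr(studies)
print(f"crude RR: {crude:.2f}, stratified RR: {stratified:.2f}")
```

The single-arm study contributes nothing to the stratified estimate because it has no within-study comparison, whereas in the crude analysis its high-risk patients inflate the drug-arm rate only.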
Q2: How does the method of ascertainment impact the quality of the meta-analysis?
Ascertainment method • Can affect observed event rates, e.g., actively solicited events will have higher reporting rates than passively collected events • E.g., for drugs that cross the blood–brain barrier, use prospective tool to assess suicidal ideation and behavior (vs. post hoc adjudication)
Retrospective adjudication Even with strict criteria using previously collected data, bias could be introduced by retrospective adjudication • Important detailed clinical information may be missing • If post hoc adjudication is necessary, use an external, independent adjudication committee that • Is masked to treatment assignment AND • Adjudicates events across the entire development program
Q3: What are the advantages of using individual patient data (vs. aggregate summaries)?
Individual or aggregate-level data? • For many questions, individual patient data (IPD) and aggregate patient data (APD) give the same answer • For analyses that do not require patient-level data, including all relevant studies improves precision • May also reduce bias that could be introduced by limiting the analysis to those studies where patient-level data are available • However, there can be advantages to IPD • Much easier to detect interactions between treatment and patient-level characteristics with IPD than with APD
Advantages of patient-level data • Allows mapping all data to a common version of MedDRA (or other) increasing consistency of terminology across trials • Generally permits creation of common variables across trials • E.g., age categories may have been defined using different category boundaries • Different threshold hemoglobin values may have been used to define ‘anemia’
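A sketch of what creating common variables looks like once patient-level records are in hand. The age bins, the hemoglobin cut-off, and the record fields are all hypothetical and purely illustrative; they are not clinical definitions.

```python
# Hypothetical common definitions applied to every study in the pool.
COMMON_AGE_BINS = [(0, 18, "<18"), (18, 65, "18-64"), (65, 200, ">=65")]
ANEMIA_HGB_G_DL = 10.0  # illustrative threshold only


def age_category(age_years):
    """Map an age in years to the pool-wide age category."""
    for low, high, label in COMMON_AGE_BINS:
        if low <= age_years < high:
            return label
    raise ValueError(f"age out of range: {age_years}")


def harmonize(record):
    """Derive the common variables from one patient-level record."""
    return {
        "study": record["study"],
        "age_cat": age_category(record["age"]),
        "anemia": record["hgb_g_dl"] < ANEMIA_HGB_G_DL,
    }


# Two invented patients from studies that originally used different cut-offs.
patients = [
    {"study": "A", "age": 64, "hgb_g_dl": 9.6},
    {"study": "B", "age": 65, "hgb_g_dl": 11.2},
]
harmonized = [harmonize(p) for p in patients]
print(harmonized)
```

With aggregate data only, the original study-specific categories cannot be re-cut this way.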
More advantages of IPD • Allows specification of a common set of patient-level covariates so subgroup analyses across trials can be performed • Can define outcomes based on combinations of variables that define different specific events but may indicate a common mechanism, e.g., a combination of weight loss or appetite reduction
And still more advantages of IPD • Post hoc analyses of outcomes that require adjudication can sometimes be derived, as in the case of suicide event grading according to Columbia Classification Algorithm of Suicide Assessment (C-CASA criteria) • Creation of time-to-event variables (may not be available in publications) • Flexibility in defining time periods of interest for analyses, e.g., events occurring during “short-term” follow-up
Why not always use IPD? • Integration required to provide the database is labor intensive, especially if done in retrospect • Sometimes summary statistics may be the only information available for some studies of interest, e.g., • studies of a new therapeutic approach done by an academic group that does not share patient-level data, or • the drug of interest may have been included as an active control by another sponsor
Q4: Should we adjust for multiple looks and/or multiple endpoints in the context of meta-analysis?
Q4: Multiple comparisons Complicated by having multiple looks over time and multiple (and an unknown number of) endpoints Safety Planning, Evaluation, and Reporting Team (SPERT) defined “Tier 1 events” as those for which a prespecified hypothesis has been defined
Tier 1 Events E.g., to rule out an effect of a certain magnitude for assessing a particular risk (a noninferiority test – as for diabetes drugs) Generally, should consider performing formal adjustment for multiple looks for Tier 1 events and for multiple endpoints for other events
Diabetes drugs • Need to rule out a relative risk of 1.8 (for CV events) for conditional approval, and 1.3 for final approval • Confidence level for that specific outcome may need to be adjusted for multiple looks, which can be considered separately from non-Tier 1 events because it needs to be met for the drug to move forward • An event of interest is important regardless of the specific side effect profile and is analogous to a primary analysis in the efficacy setting
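A hedged sketch of checking a pooled relative risk against the 1.8 margin, with and without an adjustment for multiple looks. Assumptions: the criterion is applied to the upper limit of a two-sided confidence interval, the interval uses the usual large-sample standard error of the log relative risk, and the adjustment is a simple Bonferroni split of alpha across looks (group-sequential boundaries would normally be less conservative). The event counts are invented.

```python
import math
from statistics import NormalDist


def rr_upper_limit(events_drug, n_drug, events_ctrl, n_ctrl, alpha):
    """Upper limit of the two-sided (1 - alpha) CI for the relative risk."""
    rr = (events_drug / n_drug) / (events_ctrl / n_ctrl)
    se_log_rr = math.sqrt(
        1 / events_drug - 1 / n_drug + 1 / events_ctrl - 1 / n_ctrl
    )
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.exp(math.log(rr) + z * se_log_rr)


MARGIN = 1.8   # margin quoted on the slide for conditional approval
LOOKS = 2      # hypothetical number of planned looks

# Invented pooled counts: 60/3000 events on drug, 55/3000 on control.
single_look = rr_upper_limit(60, 3000, 55, 3000, alpha=0.05)
adjusted = rr_upper_limit(60, 3000, 55, 3000, alpha=0.05 / LOOKS)

print(f"upper limit, unadjusted: {single_look:.2f}")
print(f"upper limit, adjusted for {LOOKS} looks: {adjusted:.2f}")
print("margin ruled out:", adjusted < MARGIN)
```

The adjusted interval is wider, so a result that clears the margin at a single look may not clear it once the number of looks is accounted for.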
Multiplicity is a complicated issue in the safety context Often have low power, lack of a priori definitions, and extraneous variability Value in trying not to miss a safety signal, but remember that initial detection is not the same as proving that a given AE is definitively related to a given drug Worry about reducing false negative findings in drug safety given the known limitations of our tools
Q5: What is heterogeneity and what are sources of heterogeneity?
Heterogeneity refers to differences among studies and/or study results. Can be classified in 3 ways: clinical, methodological and statistical.
Clinical Heterogeneity Differences among trials in their Patient selection (e.g., disease conditions under investigation, eligibility criteria, patient characteristics, or geographic differences)
Clinical Heterogeneity Differences among trials in their Interventions (e.g., duration, dosing, nature of the control) Outcomes (e.g., definitions of endpoints, follow-up duration, cut-off points for scales)
Methodological Heterogeneity Differences in Study design (e.g., the mechanism of randomization). Study conduct (e.g., allocation concealment, blinding, extent and handling of withdrawals and loss to follow up, or analysis methods). Decisions about what constitutes clinical heterogeneity and methodological heterogeneity do not involve any calculation and are based on judgment.
Statistical heterogeneity • Numerical variability in results beyond that expected from sampling variability • May be caused by • Known (or unknown) clinical and methodological differences among trials • Chance
Clinical heterogeneity may not always result in statistical heterogeneity. If there is clinical heterogeneity but little variation in study results, this may represent robust, generalizable treatment effects.
Beware of Q (unless you are James Bond) • Cochran’s Q is a global test of heterogeneity • I² is a measure of global heterogeneity • KEY POINT: They are informative, but do not rely on either statistic alone • Apparent lack of overall heterogeneity does not rule out a specific source of heterogeneity • Conversely, large studies with clinically small variability can yield spuriously high statistical heterogeneity
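To make the last point concrete, here is a minimal sketch of Cochran's Q and I² computed from study-level estimates and standard errors with inverse-variance weights. The log odds ratios and standard errors are invented; the same three estimates are used twice, once with the large standard errors of small studies and once with the small standard errors of very large studies.

```python
def cochran_q_and_i2(estimates, std_errors):
    """Cochran's Q and I^2 using inverse-variance weights."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = (q - df) / q if q > df else 0.0
    return q, i2


# Invented log odds ratios: clinically very similar across studies.
log_or = [0.10, 0.15, 0.20]

q_small, i2_small = cochran_q_and_i2(log_or, [0.30, 0.30, 0.30])  # small studies
q_large, i2_large = cochran_q_and_i2(log_or, [0.01, 0.01, 0.01])  # huge studies

print(f"small studies: Q = {q_small:.3f}, I2 = {i2_small:.0%}")
print(f"large studies: Q = {q_large:.1f}, I2 = {i2_large:.0%}")
```

The point estimates are identical in both runs; only the precision changes, yet I² moves from 0% to a value that would conventionally be read as very high heterogeneity.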
Q6: Is it sufficient to use fixed-effects models when combining studies or do we need to consider random-effects models?
Fixed-effect vs. random-effects • Fixed = common effect across all studies • Inference is to the studies at hand • Reasonable to expect (?) when designs and populations are similar across studies • Random-effects models assume that the true underlying effects differ from study to study and that the true individual study effects follow a statistical distribution • The analytic goal is then to estimate the overall mean and variance of the distribution of true study effects
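The two models can be contrasted in a few lines. This sketch uses inverse-variance weighting for the fixed-effect estimate and the DerSimonian-Laird moment estimator for the between-study variance; DerSimonian-Laird is an assumption made here for brevity, since the slides do not name a specific estimator. The three log odds ratios and standard errors are invented, with the third study small and discordant.

```python
import math


def fixed_effect(estimates, std_errors):
    """Inverse-variance pooled estimate and its standard error."""
    w = [1.0 / se**2 for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    return pooled, math.sqrt(1.0 / sum(w))


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooled estimate, its SE, and tau^2 (DerSimonian-Laird)."""
    w = [1.0 / se**2 for se in std_errors]
    fe, _ = fixed_effect(estimates, std_errors)
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1.0 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Invented data: two large concordant studies and one small discordant study.
log_or = [0.0, 0.1, 0.8]
se = [0.1, 0.1, 0.4]

fe_est, fe_se = fixed_effect(log_or, se)
re_est, re_se, tau2 = dersimonian_laird(log_or, se)

print(f"fixed effect:   {fe_est:.3f} (SE {fe_se:.3f})")
print(f"random effects: {re_est:.3f} (SE {re_se:.3f}), tau^2 = {tau2:.4f}")
```

With these numbers the random-effects estimate moves toward the small study and its standard error is larger, because adding tau² to every study's variance flattens the weights.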
More on FE vs. RE In some situations, it may not be appropriate to produce a single overall treatment-effect estimate Goal should sometimes (often) be to model and understand sources of heterogeneity
More points on FE vs. RE • Risk differences (RD) tend to be more heterogeneous than odds ratios (OR) or relative risks (RR), a point that is also made in FDA’s draft guidance for industry on noninferiority trials • Can model on the OR scale, then convert to RD or RR to help with clinical interpretability • A constant OR implies that the RD must vary with the baseline event rate, so one must decide whether to estimate the baseline (control) event rate from external data or from the data included in the actual meta-analysis (implications for variance estimation)
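The conversion from an odds ratio to a risk difference or relative risk follows from the definition of the odds: with control risk p0, the treated risk is p1 = OR·p0 / (1 − p0 + OR·p0). A short sketch, with an invented OR and two invented baseline rates, shows how much the RD depends on the baseline rate that is plugged in.

```python
def treated_risk(odds_ratio, control_risk):
    """Risk in the treated group implied by an odds ratio and a control risk."""
    return odds_ratio * control_risk / (1.0 - control_risk + odds_ratio * control_risk)


def rd_and_rr(odds_ratio, control_risk):
    """Risk difference and relative risk implied by an OR at a given baseline."""
    p1 = treated_risk(odds_ratio, control_risk)
    return p1 - control_risk, p1 / control_risk


# Same OR, two hypothetical baseline (control) event rates.
rd_low, rr_low = rd_and_rr(2.0, 0.01)
rd_high, rr_high = rd_and_rr(2.0, 0.20)

print(f"baseline 1%:  RD = {rd_low:.4f}, RR = {rr_low:.2f}")
print(f"baseline 20%: RD = {rd_high:.4f}, RR = {rr_high:.2f}")
```

This ignores the uncertainty in the baseline rate itself, which is the variance-estimation issue raised on the slide.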
How to decide on FE or RE? • Do you expect a common effect or not? • Single indication, similar protocols, same data collection methods, definitions, etc., FE likely to be appropriate. • Different populations, etc., use RE but ALSO explore sources of heterogeneity • Enough data? • Sparse data, few studies, may not permit RE estimation • Small studies may get “up-weighted” with RE: are small study results systematically different?
Once you go Bayesian, you’ll never go back • Specify a prior probability distribution • Today’s posterior becomes tomorrow’s prior • Flexibility to deal with heterogeneity through complex modeling • Available under both FE and RE (use Deviance Information Criterion to decide?) • Bayesian inferences are based on the full ‘exact’ posterior distributions (so useful for small numbers of events)
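"Today's posterior becomes tomorrow's prior" can be illustrated with the simplest conjugate case: a Beta prior on a single event rate updated with binomial data. This is a deliberately minimal sketch, not a Bayesian meta-analysis model. It assumes one common event rate across studies (a fixed-effect-like assumption), a flat Beta(1, 1) prior chosen only for illustration, and invented event counts.

```python
def update(beta_params, events, n):
    """Posterior Beta parameters after observing `events` in `n` patients."""
    a, b = beta_params
    return (a + events, b + n - events)


def beta_mean(beta_params):
    """Mean of a Beta(a, b) distribution."""
    a, b = beta_params
    return a / (a + b)


prior = (1.0, 1.0)                              # illustrative flat prior
after_study_1 = update(prior, 2, 100)           # 2 events in 100 patients
after_study_2 = update(after_study_1, 1, 150)   # yesterday's posterior as prior

# Updating study by study gives the same posterior as pooling all data at once.
all_at_once = update(prior, 3, 250)

print("posterior:", after_study_2, "mean rate:", round(beta_mean(after_study_2), 4))
```

Because the posterior is an exact Beta distribution, no large-sample approximation is involved, which is why this style of inference remains usable when events are rare.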