Southern California Clinical and Translational Science Institute:

Biostatistics Lunch & Learn Series: Study designs and data collection strategies: scientific and logistical considerations in selecting the design to address your research question. Southern California Clinical and Translational Science Institute: Research Development and Team Science; Biostatistics, Epidemiology and Research Design (BERD). November 12, 2018

Biostatistics, Epidemiology and Research Design (BERD) Faculty: Wendy Mack, BERD Director; Christianne Lane, USC; Melissa Wilson, USC; Cheryl Vigen, USC; Ji Hoon Ryoo, CHLA. Staff: Choo Phei Wei, CHLA; Caron Park and Melissa Mert, USC

Objectives • Part 1: Formulating a sound research question and study hypotheses: hypothesis testing • Part 2 (today): Study designs and data collection strategies: scientific and logistical considerations in selecting the design to address your research question • Part 3: Sample size and study power: Why do I need so many subjects? What will my biostatistician need to know and how can I get that information? • Part 4: Statistical analysis: What statistical methods are appropriate for my study design and data collected?

Reminder: Defining the Research Question and Hypothesis Testing • What are the components of a good research question? • How do I translate my research question to a statistical question (and hypothesis) that I can test? • What is statistical hypothesis testing? What does a p-value mean? • How does the research question relate to study design? What alternative designs might be used to address my research question? (Today)

PICOT Criteria to Develop the Research Question • P: Population. What specific population will you test the intervention in? • I: Intervention (or Exposure). What is the intervention/exposure to be investigated? Intervention (clinical trial); Exposure (observational study) • C: Comparison Group. What is the main comparator to judge the effect of the intervention? • O: Outcome. What will you measure, improve, affect? • T: Time. Over what time period will the outcome be assessed?

Spectrum of Study Designs From Center for Evidence-Based Medicine (CEBM), University of Oxford http://www.cebm.net/study-designs/

Descriptive vs. Analytic Study • Descriptive Study: Research question involving “PO” (Population, Outcome) • What is the survival rate following hip fracture in community-dwelling postmenopausal women? (P: Community-dwelling postmenopausal women with hip fracture; O: Survival) • What is the rate of accumulation of Alzheimer-like pathology in an experimental mouse model? (P: Mouse model; O: Alzheimer-like pathology)

Descriptive vs. Analytic Study • Analytic Study: Research question adds I/E and C (Intervention/Exposure, Comparator Group). Questions of association and/or effect (I/E on O). Comparator group: Not “exposed”, does not get “intervention” • Does the survival rate following hip fracture differ in postmenopausal women who live with others vs alone? (E: live with others; C: live alone) • Does Alzheimer-like pathology differ in mice fed a high-fat versus standard diet? (I: high-fat diet; C: standard diet)

Observational Study Defined • Clinicaltrials.gov: A clinical study in which participants identified as belonging to study groups are assessed for biomedical or health outcomes. Participants may receive diagnostic, therapeutic, or other types of interventions, but the investigator does not assign participants to specific interventions (as in a clinical trial). Exposures (interventions) are self-selected. • Associations between exposures/interventions and outcomes may be biased (confounded) by characteristics that differ between those who choose exposure and those who do not.

Cohort Study Select persons free of outcome, including persons with and without exposure. Follow forward in time to determine outcome. Does the proportion of persons with outcome (or rates of outcome) differ in persons with versus without the exposure? Exposed (E+): Postmenopausal women with recent hip fracture, living with others. Not exposed (E-, C): Postmenopausal women with recent hip fracture, living alone. Outcome: Did they die or survive? (Note: Define the period of time over which you will follow for outcomes; e.g., the year following the hip fracture.) Compare: The survival rates in exposed (live with others) vs not exposed (live alone).
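
The "Compare" step above can be sketched as a risk ratio: the proportion with the outcome in the exposed group divided by the proportion in the unexposed group. This is a minimal sketch only; the counts and the function names `risk` and `risk_ratio` are hypothetical and are not from the slides.

```python
def risk(events: int, total: int) -> float:
    """Proportion of a group that experienced the outcome during follow-up."""
    if total <= 0:
        raise ValueError("group size must be positive")
    return events / total


def risk_ratio(events_exposed: int, n_exposed: int,
               events_unexposed: int, n_unexposed: int) -> float:
    """Risk in the exposed group divided by risk in the unexposed group.

    Raises ZeroDivisionError if the unexposed group has no events.
    """
    return risk(events_exposed, n_exposed) / risk(events_unexposed, n_unexposed)


# Hypothetical counts, invented for illustration only:
# 200 women living with others, 30 deaths in the year after hip fracture;
# 100 women living alone, 25 deaths in the same period.
rr = risk_ratio(30, 200, 25, 100)
print(f"1-year mortality risk ratio (with others vs alone): {rr:.2f}")
```

A ratio below 1 would indicate lower mortality in women living with others; in an observational cohort it remains open to confounding.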

Cohort Study • Advantages: Efficient (i.e., smaller sample size required) for “rare” exposures. Temporal sequence between “exposure” and “outcome” is firmly established (by design). Can study multiple outcomes (e.g., could also compare survival, subsequent hospitalizations, etc. in our example). Less issue with subject selection biases (as we don’t know outcomes when selecting subjects). • Disadvantages: Time and cost (often long follow-up of large numbers of persons). Inefficient for rare outcomes (need a large sample to obtain sufficient outcomes) or long latency period (between exposure and outcome). Losses to follow-up may bias results.

Case-Control Study Select persons with (cases) and without (controls) outcome (O); determine their past exposure (E; i.e., BEFORE the outcome occurred). Does the proportion of persons who were exposed differ in cases and controls? Cases (O+): Postmenopausal women who died following hip fracture. Controls (O-, C): Postmenopausal women who did not die following hip fracture. Exposure: Did they live with others after their hip fracture? Compare: The proportion of women who lived with others (vs alone) in cases vs controls.
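
Because the investigator fixes the numbers of cases and controls, outcome risks cannot be read directly from a case-control sample; the exposure comparison above is commonly summarized as an odds ratio. The sketch below assumes a simple 2x2 table; the counts and the function name `odds_ratio` are hypothetical.

```python
def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts, invented for illustration only:
# 50 cases (died): 20 lived with others, 30 lived alone;
# 100 controls (survived): 60 lived with others, 40 lived alone.
or_estimate = odds_ratio(20, 30, 60, 40)
print(f"Odds ratio for living with others (cases vs controls): {or_estimate:.2f}")
```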

Case-Control Study • Advantages: Efficient (i.e., smaller sample size required) for “rare” diseases/outcomes. Efficient when there is a long duration between “exposure” and “outcome” (don’t have to wait around to see if “exposed” persons develop disease or not). Lower time and costs. • Disadvantages: Have many possible “biases” (e.g., differential subject selection by case/control; differential recall of “exposure” by case/control). Have to be very careful in design stage. Inefficient for rare exposures.

Cross-sectional Analytic Study Find population, measure outcome and exposure at the same time. Are outcomes and exposure associated? Example: Assessing a possible blood biomarker for a clinical condition. Construct a sample of persons with and without the condition (O) and measure the blood biomarker (E). Is the average level of the biomarker different in persons with and without the condition? Does the percentage of persons above a biomarker threshold differ in persons with and without the condition?
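
The two comparisons in the biomarker example (average level, and percentage above a threshold) can be sketched as below. The biomarker values, the threshold, and the helper `proportion_above` are all hypothetical, chosen only to show the calculation; a real analysis would add a formal test and confidence intervals.

```python
from statistics import mean

# Hypothetical biomarker values (arbitrary units), invented for illustration.
with_condition = [5.1, 6.3, 7.0, 4.8, 6.9]
without_condition = [3.9, 4.2, 5.5, 3.1, 4.0]
THRESHOLD = 5.0  # assumed cut-point, not from the slides


def proportion_above(values, threshold):
    """Fraction of values strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


# Question 1: does the average biomarker level differ between groups?
mean_difference = mean(with_condition) - mean(without_condition)

# Question 2: does the percentage above the threshold differ between groups?
prop_with = proportion_above(with_condition, THRESHOLD)
prop_without = proportion_above(without_condition, THRESHOLD)

print(f"Mean difference: {mean_difference:.2f}")
print(f"Above threshold: {prop_with:.0%} vs {prop_without:.0%}")
```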

Cross-sectional Analytic Study • Advantages: Relatively quick and easy. Can study rare diseases/outcomes or exposures (assuming you have access to a sufficient number). Can show whether there is something going on (any evidence of association) that can be followed up in a more stringent design. Can often use existing data (medical records, etc.). • Disadvantages: No clue on temporal sequence between exposure and outcome (were any biomarker differences apparent before the disease developed? In what timeframe?)

Cross-sectional Analytic Study • Cohort Study Alternative: Obtain blood levels of biomarker in persons without the condition. Follow to xx (time) to ascertain development of condition. Compare disease rates by biomarker level. • Case-Control Alternative: Select samples of persons with (cases) and without (controls) the condition. Access stored tissue samples (before condition developed). Compare pre-disease biomarker in cases and controls.

Clinical Trial Defined Clinicaltrials.gov: A clinical study in which participants are assigned to receive one or more interventions (or no intervention) so that researchers can evaluate the effects of the interventions on biomedical or health-related outcomes. The assignments are determined by the study protocol. Participants may receive diagnostic, therapeutic, or other types of interventions. In effect: a cohort study where persons are “assigned” to exposures (interventions) and followed for ascertainment of outcomes. Clinical trials are not feasible when assignment to an exposure/intervention is not ethical.

Clinical Trial Sample: Postmenopausal women with recent hip fracture. Intervention (exposure): Social support intervention. Comparator (not exposed): Usual care. Assign women to Intervention or Comparator. Outcome: Mortality in xx (time) following hip fracture. Compare mortality (survival) rates in the intervention and comparator groups.

Clinical Trial Sample: Aged AD model mice. Intervention: High-fat diet. Comparator (not exposed): Standard chow. Assign mice to Intervention or Comparator. Outcome: Alzheimer-like pathology after xx (time). Compare pathology variables in the intervention and comparator groups.

Assignment to Interventions • Randomization: Greatly reduces the possibility of systematic differences between study groups that might taint (confound) your conclusions regarding the efficacy of your experimental intervention. Key advantage over observational studies. • Blinded vs open-label (do participants know their intervention?): Knowing the intervention may affect how a participant responds and, in turn, the outcomes.
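
The slides do not name a specific randomization scheme. As one common option, the sketch below uses permuted-block randomization, which keeps the arms balanced within every block. The function name `block_randomize`, the arm labels, the block size, and the seed are assumptions for illustration; a real trial would use a concealed allocation list prepared with the study biostatistician.

```python
import random


def block_randomize(n_participants, arms=("intervention", "comparator"),
                    block_size=4, seed=None):
    """Return an allocation list built from shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded generator makes the list reproducible
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_participants:
        block = list(arms) * per_arm  # equal number of slots per arm
        rng.shuffle(block)            # random order within the block
        assignments.extend(block)
    return assignments[:n_participants]


# Hypothetical example: 12 participants, blocks of 4, arbitrary seed.
allocation = block_randomize(12, seed=2018)
for participant_id, arm in enumerate(allocation, start=1):
    print(participant_id, arm)
```

With 12 participants and blocks of 4, each arm receives exactly 6 participants, and the groups never differ in size by more than 2 at any point during enrollment.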

Trial Designs • Parallel group: Each participant is assigned to one (and only one) of the trial interventions. Standard approach for most clinical trials. • Crossover: Each participant receives both the experimental and comparator interventions, usually in randomized order, with a washout period between interventions. Perfect matching: each participant acts as their own control, so fewer subjects are required. Disadvantages: Greater likelihood of dropout; must be a stable disease under study; only appropriate for interventions that wash out and have short-term (not permanent) outcomes.

Trial Designs • Factorial: Participants randomized to combinations of more than one experimental intervention. Efficient: Two trials for the price/time of one • Disadvantages: Possible interaction between treatments; complexity of treatment (may reduce adherence) • E.g., Physicians’ Health Study

Trial Designs • Cluster randomized: The unit of randomization is a group of persons, rather than a single individual. Example: Randomize hospitals to a new intervention vs. standard practice to prevent hospital-acquired pneumonia • Common in testing complex interventions in primary care, health promotion, community/public health settings • Advantage: Avoids contamination of intervention effects with cluster-related effects (e.g., private more affluent hospitals have more resources for programs to prevent hospital-acquired pneumonia) • Disadvantages: (1) Requires more subjects to account for the correlation of the outcomes between persons in the same cluster. (2) Blinding is usually not possible
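
Disadvantage (1) is often quantified with the design effect, 1 + (m - 1) x ICC, where m is the number of persons per cluster (assumed equal across clusters) and ICC is the intracluster correlation. The slides do not give this formula; it is the standard approximation, and the sample size, cluster size, and ICC below are hypothetical.

```python
import math


def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation from randomizing equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


def inflated_sample_size(n_individual: int, cluster_size: int, icc: float) -> int:
    """Total subjects needed once the individually randomized n is inflated."""
    # Round before taking the ceiling to guard against floating-point noise.
    return math.ceil(round(n_individual * design_effect(cluster_size, icc), 9))


# Hypothetical inputs: 400 subjects needed under individual randomization,
# 20 patients per hospital, ICC of 0.05.
deff = design_effect(20, 0.05)
n_cluster_trial = inflated_sample_size(400, 20, 0.05)
print(f"Design effect: {deff:.2f}; subjects needed: {n_cluster_trial}")
```

Even a small ICC nearly doubles the required sample here, which is why the ICC is one of the inputs a biostatistician will ask for when planning a cluster randomized trial.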

Data Collection and Management • A well-developed plan for collecting and managing your study data should be part of your study design. Engage your biostatistician sooner rather than TOO LATE! • Your data collection plan stems from the research question • What kind of variables are needed to test this research question? • What kind of analyses would you expect? • What kind of table/figure would you expect? • What kind of conclusions could you draw?

Translate Research Question to Data Collection Needs • What specific data will I need to test my hypotheses and answer my research question? This includes specific data needed to: • (1) describe the study population (P; sample descriptors) • (2) detail the process for recruitment, enrollment, and retention of subjects (numbers screened, reasons for screen failure, numbers enrolled, reasons why not enrolled, numbers completing, reasons for not completing) • (3) define exposures (or interventions) (E/I, C) • (4) define outcomes (O) • (5) other variables needed (e.g., to control for possible confounding biases, define subgroups for analysis)

Translate Research Question to Data Collection Needs: Animal Studies • Animal loss and reasons • Genetic relationships (e.g., littermates) • Generation • Intervention: dates of delivery, doses

Data Collection Plan • Define sources for collection of each data element: Are the data valid? Are they measured reliably? If they are existing data, are they readily accessible and available in all subjects? How much will it cost, and how much time will it take, to collect these data? • Development and testing of case report forms (to record data in a uniform and complete manner). • Software resources for entering and managing your study data. • Data management throughout your study (DON’T WAIT UNTIL THE END OF YOUR STUDY TO FIND THESE PROBLEMS): missing data, logical errors, range checks. • Involving your biostatistician collaborator in reviewing your data management plan (before and during data collection).
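
The ongoing checks named above (missing data, range checks, logical errors) can be sketched as a small routine run during data collection rather than at the end. The field names, the plausible age range, and the example records are hypothetical; a real project would take its rules from the study's data management plan.

```python
from datetime import date

# Hypothetical records, invented for illustration only.
# death_date of None means the participant was alive at last contact.
records = [
    {"id": 1, "age": 72, "fracture_date": date(2018, 1, 10), "death_date": None},
    {"id": 2, "age": None, "fracture_date": date(2018, 2, 3), "death_date": None},
    {"id": 3, "age": 172, "fracture_date": date(2018, 3, 15), "death_date": date(2018, 6, 1)},
    {"id": 4, "age": 80, "fracture_date": date(2018, 5, 20), "death_date": date(2018, 4, 2)},
]

AGE_RANGE = (40, 110)  # assumed plausible range for this population


def check_record(rec):
    """Return a list of data problems found in one record."""
    problems = []
    # Missing-data and range checks on age.
    if rec["age"] is None:
        problems.append("age missing")
    elif not AGE_RANGE[0] <= rec["age"] <= AGE_RANGE[1]:
        problems.append(f"age {rec['age']} outside {AGE_RANGE}")
    # Missing-data check on the fracture date, then a logical check on dates.
    if rec["fracture_date"] is None:
        problems.append("fracture_date missing")
    elif rec["death_date"] is not None and rec["death_date"] < rec["fracture_date"]:
        problems.append("death_date precedes fracture_date")
    return problems


# Keep only the records that have at least one problem.
issues = {rec["id"]: check_record(rec) for rec in records}
issues = {rec_id: found for rec_id, found in issues.items() if found}
for rec_id, found in issues.items():
    print(rec_id, found)
```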

Data Collection: Operationalize Study Variables • Define a construct in a manner which can be measured quantitatively • Measure variables with as much precision as possible • How are exposures and outcomes reported in the literature? • Are there standard definitions? • How will you deal with missing data? • Avoid missing data; record reasons for missing data in your database (for key variables: outcomes and exposures)

Study Database • As much as possible, use standardized definitions, measurement and coding of study variables. Data standardization tools include: Data harmonization efforts (NIH Toolbox; CDISC: Clinical Data Interchange Standards Consortium; IMPReSS: International Mouse Phenotyping Resource of Standardized Screens). Standard data collection forms (demographics, adverse events, condition-based). Coding standards (ICD diagnostic codes, CPT procedure codes). • Efficiency (number of variables, simple coding): many studies end up with bad data simply because investigators were overly ambitious, attempting to collect many more variables than needed • Ultimate usability: can the data be imported, summarized, analyzed as is? • Review with data analyst: prior to collection, quick checks ongoing

General Principles for Database Design and Management • Talk to a knowledgeable statistician before designing your database. • Only have ONE database. • Use a “real” database system. • Always make a copy of the database before updating. • Backup. Then Backup again. • A simple structure is best. • Make data easily importable to statistical software. • Make sure your data is auditable (What changes were made? By whom? When? Why?)

REDCap (projectredcap.org)

REDCap (projectredcap.org) • Research Electronic Data Capture (REDCap) • Database system focused on streamlining the process from the clinician’s and statistician’s viewpoints. • Captures data from surveys & other clinical and basic study designs • Allows for quick development of a database. • Web-based data entry • Can import data from Excel (e.g., lab data) • Easily sets up longitudinal data collection (e.g., repeated measures of outcome) • Creates case report forms • Puts researchers in charge of their database development and management. • Offers scheduling support • Provides audit trails for tracking data • Allows for limited data reports

USC REDCap supported by SC CTSI

SC CTSI BERD Data Cleaning Guide (handout: USE IT!)

Data Cleaning Guide

Please don’t give this to your statistician

A better approach

Next workshop on Jan 14: Sample size and study power: Why do I need so many subjects? What will my biostatistician need to know and how can I get that information? SC CTSI | www.sc-ctsi.org

CTSI Biostatistics (BERD): a resource for you at USC • Biostatisticians to help you with study design, sample size estimation, data management plan, statistical analyses, and summarizations of your methods and results • Recharge center • To request a consult: https://sc-ctsi.org/bbr-consult
