K. Arnold Chan, MD, ScD. Harvard School of Public Health; Channing Laboratory, Brigham & Women’s Hospital and Harvard Medical School. HIPAA and its Implications on Epidemiological Research Using Large Databases.
Brief outline of this presentation • Using large linked automated data for public health research • Data development processes to ensure HIPAA-compliance • Examples • Some thoughts
Two types of data for public health research • Primary data • Prospectively collected • Well-designed data collection tool • Informed consent • Secondary data • Data originally collected for other purposes • May be proprietary • Privacy and confidentiality (particularly important if no prior authorization) • Different data systems
Large linked healthcare databases • Health insurance claims data • Medicaid • Medicare • Managed Care Organizations (MCO) • Automated medical records • Hospital / Clinic IT systems • Availability of written records • Need to contact patients / individuals?
Public health research within MCOs • Harvard Community Health Plan (subsequently became Harvard Pilgrim HealthCare) • Kaiser Permanente (several states) • Group Health Cooperative (Seattle area) • Others • HMO Research Network • 10+ MCOs across the U.S.
Public health research within MCOs • Different types of MCOs • Group model • Staff model • Different relationship with hospitals • Implications on data access • MCOs with research programs • Separate research departments • Full-time investigators and support staff
Data elements in the MCO data • Demographic information • Membership • Start date, termination date, benefit plan, ... • Office visits • Type of visit, diagnosis(es), special procedures • Special examinations • Radiology, Laboratory examinations • Hospitalizations • Drug dispensings • Linkable by a unique ID
HIPAA and Research with Databases • Authorization from individual research subjects not feasible • Individual authorization may be waived by Institutional Review Board or Privacy Board • Minimal Risk • Data reported in aggregate fashion • No single-case report • “Minimum necessary” principle • De-identification
HIPAA and Research with Databases • Single MCO studies • Investigators and research staff are MCO employees • Multiple-MCO studies • May involve transferral of data across MCOs or to a Data Center • Other types of studies not covered in this presentation • e.g. Generate a de-identified dataset for public or commercial use
HIPAA and data development • Do not move individual level data unless absolutely necessary • Generate summary tables at each study site • Combine the tables for final report • Smalley et al. Contraindicated use of cisapride: the impact of an FDA regulatory action. JAMA 2000; 284: 3036-9.
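The "summary tables at each site, then combine" approach can be sketched roughly as follows. This is a minimal illustration, not the method used in the cited study: the stratum labels and counts are made up, and `combine_site_tables` is a hypothetical helper name.

```python
from collections import Counter


def combine_site_tables(site_tables):
    """Sum cell counts across per-site summary tables.

    Each table maps a stratum (e.g. an age-group/sex pair) to a count.
    Only these aggregate counts leave a site; no individual-level rows do.
    """
    combined = Counter()
    for table in site_tables:
        combined.update(table)  # Counter.update adds counts for matching keys
    return dict(combined)


# Made-up counts for two hypothetical study sites.
site_a = {("0-4", "F"): 120, ("0-4", "M"): 130}
site_b = {("0-4", "F"): 45, ("5-9", "M"): 60}

combined = combine_site_tables([site_a, site_b])
```

Strata that appear at only one site are carried through unchanged, so sites do not need identical table layouts as long as they agree on how strata are defined.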
HIPAA and data development • Randomly generated Study ID to replace True ID • Crosswalk between the two stored at secured location • Destroy the crosswalk after successful linkage of data and quality check • Implications for storage and back-up
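One possible way to replace a true ID with a randomly generated Study ID is sketched below. The field names (`member_id`, `study_id`) and both helper functions are assumptions for illustration; secure storage and eventual destruction of the crosswalk are procedural steps that this sketch does not cover.

```python
import secrets


def build_crosswalk(true_ids):
    """Assign each distinct true ID a random study ID unrelated to it."""
    crosswalk = {}
    used = set()
    for true_id in true_ids:
        if true_id in crosswalk:
            continue  # same person seen again: keep the first study ID
        study_id = secrets.token_hex(8)
        while study_id in used:  # extremely unlikely, but guarantee uniqueness
            study_id = secrets.token_hex(8)
        used.add(study_id)
        crosswalk[true_id] = study_id
    return crosswalk


def apply_crosswalk(records, crosswalk, id_field="member_id"):
    """Return copies of the records with the true ID swapped for the study ID."""
    out = []
    for rec in records:
        new_rec = {k: v for k, v in rec.items() if k != id_field}
        new_rec["study_id"] = crosswalk[rec[id_field]]
        out.append(new_rec)
    return out


# Hypothetical records from two source files that must stay linkable.
dispensings = [{"member_id": "A123", "drug": "amoxicillin"}]
visits = [{"member_id": "A123", "visit_type": "office"},
          {"member_id": "B456", "visit_type": "office"}]

crosswalk = build_crosswalk(r["member_id"] for r in dispensings + visits)
study_dispensings = apply_crosswalk(dispensings, crosswalk)
study_visits = apply_crosswalk(visits, crosswalk)
```

Because the study IDs are random rather than derived from the true ID (for example by hashing), linkage back to the individual is impossible once the crosswalk is destroyed.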
HIPAA and data development • Roll-up / transform variables • Age --> Age groups • National Drug Code --> Drug or Group of drugs • ICD-9 diagnosis code --> Disease e.g. A man born on Dec 10, 1934 with diagnosis code xxx.yy received drug 55555-333-22 --> • 65-70 y/o man with Heart Failure received Digoxin
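The roll-up step might look something like the sketch below. The lookup tables are tiny illustrative stand-ins: the NDC is the placeholder from the slide, the ICD-9 prefix 428 (heart failure) is used only as an example, and the non-overlapping five-year age bands and the reference date are assumptions.

```python
from datetime import date

# Illustrative lookups only; real studies would use full coding dictionaries.
NDC_TO_DRUG = {"55555-333-22": "Digoxin"}          # placeholder NDC from the slide
ICD9_PREFIX_TO_DISEASE = {"428": "Heart failure"}  # ICD-9 428.x


def age_on(birth_date, ref_date):
    """Age in completed years on the reference date."""
    years = ref_date.year - birth_date.year
    if (ref_date.month, ref_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(age, width=5):
    """Map an exact age to a non-overlapping band such as '65-69'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def roll_up(record, ref_date):
    """Replace exact birth date and detailed codes with coarser categories."""
    icd_prefix = record["icd9"].split(".")[0]
    return {
        "age_group": age_group(age_on(record["birth_date"], ref_date)),
        "sex": record["sex"],
        "disease": ICD9_PREFIX_TO_DISEASE.get(icd_prefix, "Other"),
        "drug": NDC_TO_DRUG.get(record["ndc"], "Other"),
    }


raw = {"birth_date": date(1934, 12, 10), "sex": "M",
       "icd9": "428.0", "ndc": "55555-333-22"}
rolled = roll_up(raw, ref_date=date(2003, 6, 1))
```

The rolled-up record no longer carries the birth date, the NDC, or the full diagnosis code.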
HIPAA and data development • Preserve temporal sequence of events but disguise the real dates • e.g. Drug use during pregnancy study • 29 year-old received 55555-333-22 on Nov 25, 1999 and delivered a baby on Dec 10, 1999 --> • 26-30 year-old mother delivered in 1999, baby exposed to amoxicillin at -15 days
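A minimal sketch of disguising dates while keeping the order of events: each event date is re-expressed as a day offset from an index event (here, delivery), and only the index year is retained. The dates below are hypothetical, and `to_relative_days` is an assumed helper name.

```python
from datetime import date


def to_relative_days(events, index_date):
    """Convert (name, date) pairs to (name, day offset from index_date).

    Events are returned in chronological order, so the temporal sequence
    is preserved even though the calendar dates are dropped.
    """
    ordered = sorted(events, key=lambda e: e[1])
    return [(name, (d - index_date).days) for name, d in ordered]


# Hypothetical dates for one mother in a drug-use-in-pregnancy study.
delivery = date(1999, 3, 20)
events = [("delivery", delivery), ("dispensing", date(1999, 3, 1))]

relative = to_relative_days(events, index_date=delivery)
index_year = delivery.year  # the only calendar information retained
```

Negative offsets mean the event occurred before the index event.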
HIPAA and data development • Only extract information relevant to the study • e.g. A study of osteoporosis does not require information on subjects' mental health status • Co-morbid conditions may be relevant • Use proxy measures to describe level of comorbidity • Charlson's Index (based on concomitant diagnoses) • Chronic Disease Score (based on co-medications)
HIPAA and data development • Geocoding • Describe socioeconomic status of study subjects based on census tract data • Send out (Study ID, address) to a geocoding firm • (Study ID, X1, X2, X3) returned • X1 : education level • X2 : income level • X3 : race/ethnicity information
An example Finkelstein et al. Decreasing Antibiotic Use Among US Children: The Impact of Changing Diagnosis Patterns. Pediatrics 2003; 112: 620-7. • Data elements involved • Date of birth, gender • Membership • Drug dispensings • Diagnoses in close proximity to antibiotic dispensings • Data from nine MCOs
Finkelstein et al. Pediatric antibiotic use study • Data development at each MCO • Extract antibiotic use information • Extract diagnoses of interest (infections) • Use date of birth, gender, and membership data to calculate person-time of interest • Refined, aggregate data forwarded to the Data Center • Rate of antibiotic use = number of antibiotic dispensings per 1,000 person-years, for each age-gender group
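The rate calculation can be sketched as below. This is a simplified illustration, not the study's actual code: it measures each member's person-time inside a study window from membership dates and does not split person-time as a child ages across groups. All names and numbers are hypothetical.

```python
from datetime import date


def person_years(start, end, window_start, window_end):
    """Person-years of membership falling inside the study window.

    `end` and `window_end` are treated as exclusive dates.
    """
    lo = max(start, window_start)
    hi = min(end, window_end)
    return max((hi - lo).days, 0) / 365.25


def rate_per_1000_py(n_events, total_person_years):
    """Events per 1,000 person-years."""
    return 1000.0 * n_events / total_person_years


# One hypothetical member enrolled before and after a one-year window.
py = person_years(date(1999, 6, 1), date(2002, 1, 1),
                  window_start=date(2000, 1, 1), window_end=date(2001, 1, 1))

# Hypothetical aggregate cell forwarded by a site:
# 50 dispensings over 100 person-years in one age-gender group.
rate = rate_per_1000_py(50, 100.0)
```

Only the two aggregate quantities per cell (event count and person-years) need to reach the Data Center for this calculation.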
HIPAA and data development • Individual identification is needed for certain types of research • Obtain medical records • Contact patient to conduct interview and/or request specimen • Linkage with external data • Cancer registry • National Death Index
HIPAA and data development • The process • Data extraction, transformation, reduction, and de-identification carried out at each MCO • Governed by State laws and local HIPAA-compliant Standard Operating Procedures • Principle of Limited Dataset / Minimum necessary • The goal • Highly processed and de-identified data available for concatenation across study sites and complex analyses
k-anonymity and large datasets • The goal • A de-identified dataset at a certain level of individual anonymity A 43 year-old man with hypertension, diabetes, and anxiety, taking atenolol, rosiglitazone, and lorazepam vs. A man 40-45 taking a beta-blocker and a thiazolidinedione
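A dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The sketch below measures k for a small made-up dataset before and after generalization; the field names and records are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same
    quasi-identifier values (0 for an empty dataset)."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values()) if counts else 0


# Made-up detailed records: exact age and specific drug.
detailed = [
    {"age": 43, "sex": "M", "drug": "atenolol"},
    {"age": 41, "sex": "M", "drug": "metoprolol"},
]

# The same records after generalization: age band and drug class.
generalized = [
    {"age": "40-45", "sex": "M", "drug": "beta-blocker"},
    {"age": "40-45", "sex": "M", "drug": "beta-blocker"},
]

k_detailed = k_anonymity(detailed, ["age", "sex", "drug"])
k_generalized = k_anonymity(generalized, ["age", "sex", "drug"])
```

In the detailed form each record is unique (k = 1); after generalization the two records are indistinguishable on these fields (k = 2).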
HIPAA, Data Storage and Access • Implications on Data Backup Plans • Data need to be destroyed after the report is published • Data only used to support pre-defined analyses • Ancillary analyses are possible after IRB review and approval
Epidemiology studies using large databases • In the old days ... • Give me all the data, do what I say ... • What if the investigator / reviewer wants to do THIS analysis? • Use existing datasets to test new hypotheses • Good research practice • Define necessary data elements according to research protocol • Pre-defined analytic plan
Epidemiology studies using large databases • Keys to protection of human subjects • Competent, responsible investigators and staff • IRB review and oversight • Data development guidelines • e.g. Good Epidemiology Practice • Information technology • Some reasonable rules/guidelines are better than no guidelines