Vanderbilt’s DNA Databank: BioVU
Personalized Medicine • Integration of genomic information into clinical decision making • Personalized disease treatment and also preventative therapies
What is BioVU? • The move towards personalized medicine requires very large sample sets for discovery and validation • BioVU: biobank intended to support a broad view of biology and enable personalized medicine • Contains de-identified DNA extracted from leftover blood after clinically-indicated testing of Vanderbilt patients who have not opted out • Linked to Synthetic Derivative: de-identified EMR • Current sample number: 135,765 • 120,705 adult samples • 15,099 pediatric samples
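De-identification workflow [Diagram] • The patient identifier (e.g. John Doe) passes through a one-way hash to give a research ID (A7CCF99DE65732….) • DNA is extracted from eligible samples and stored under the research ID • The EMR is scrubbed and stored under the same research ID as the “synthetic derivative” (SD), which can be updated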
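The one-way hash step can be sketched as follows. The slides do not say which hash function is used or whether it is keyed, so the keyed HMAC-SHA-512 construction, the `research_id` helper, the example record number, and the key are all assumptions made for illustration only.

```python
import hashlib
import hmac


def research_id(medical_record_number: str, secret_key: bytes) -> str:
    """Derive a stable, non-reversible research ID from a patient identifier.

    Assumption: a keyed hash (HMAC-SHA-512). The source only says "one way hash".
    """
    digest = hmac.new(secret_key, medical_record_number.encode("utf-8"), hashlib.sha512)
    return digest.hexdigest().upper()


# Hypothetical key and record number, for illustration only.
key = b"example-secret-held-outside-the-research-environment"
id_on_dna_sample = research_id("MRN-0001234", key)
id_on_scrubbed_record = research_id("MRN-0001234", key)

# The same input always yields the same ID, so the DNA sample and the
# scrubbed record can be linked without storing the patient's name.
print(id_on_dna_sample == id_on_scrubbed_record)
```

Because the hash is deterministic, a record can be re-hashed and updated later under the same ID, which is consistent with the note that the SD "can be updated".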
The Synthetic Derivative • A Derivative of the EMR - information content reduced by ‘scrubbing’ identifiers • Systematically shifted event dates • Contains ~1.9 million records • ~1 million with detailed longitudinal data • averaging 100,000 bytes in size • an average of 27 codes per record • Records updated over time and are current through 4/30/11
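One way to read "systematically shifted event dates" is that every date in a given record moves by the same offset, so intervals between events stay intact. The sketch below assumes exactly that; the shift range, the backward direction, the keyed derivation of the offset, and the names `shift_for` and `shift_events` are hypothetical and not taken from the slides.

```python
import hashlib
import hmac
from datetime import date, timedelta

MAX_SHIFT_DAYS = 364  # hypothetical upper bound on the per-record shift


def shift_for(research_id: str, secret_key: bytes) -> timedelta:
    """Return one fixed backward shift (1..MAX_SHIFT_DAYS days) per record."""
    digest = hmac.new(secret_key, research_id.encode("utf-8"), hashlib.sha256).digest()
    days = int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS + 1
    return timedelta(days=-days)


def shift_events(research_id: str, secret_key: bytes, events: list) -> list:
    """Apply the record's single shift to every event date."""
    delta = shift_for(research_id, secret_key)
    return [event + delta for event in events]


events = [date(2010, 3, 1), date(2010, 3, 15)]
shifted = shift_events("A7CC-EXAMPLE", b"example-key", events)

# The 14-day gap between the two events survives the shift.
print(shifted[1] - shifted[0] == events[1] - events[0])
```

Keeping one offset per record preserves the longitudinal structure (time between a prescription and a lab value, for example) while hiding the true calendar dates.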
Synthetic Derivative Data Types • Narratives, such as: • Clinical Notes • Discharge Summaries • History and Physicals • Problem Lists • Surgical Reports • Progress Notes • Letters • Diagnostic Codes, Procedural Codes • Forms (intake, assessment) • Reports (pathology, ECGs, echocardiograms) • Clinical Communications • Lab Values and Vital Signs • Medication Orders • TraceMaster (ECGs)
Synthetic Derivative vs. BioVU [Diagram: both resources are keyed by the same hashed research ID, A7CDE6532….] • Synthetic Derivative: scrubbed records, ~1.9 million • BioVU: DNA + scrubbed record, ~135,000
Sample accrual • Current accrual as of 2-13-2012: 135,765 samples • 15,099 pediatric
BioVU Demographics [Charts: age, gender, race]
BioVU Sample Management RTS SmaRTStore
Validation in BioVU • Sample handling algorithms • Gender match • 1/384 gender mismatches • Ancestry • Characterize sample ancestry, assess usefulness of ‘race’ as defined in EMR • Provide a panel of ancestry informative markers that define ancestry • No significant difference between the concordance of self-report or observer-report with genetic ancestry • Demonstration project – American Journal of Human Genetics, 2010 • Can known associations between genetic variants and common diseases be identified in the EMR?
The “demonstration project” • Genotype “high-value” SNPs in the first 8,000 samples accrued, including SNPs associated by replicated genome-wide experiments with common diseases & traits: atrial fibrillation, Crohn’s disease, multiple sclerosis, rheumatoid arthritis, type 2 diabetes • Develop Natural Language Processing methods to identify cases and controls • Are genotype-phenotype relations replicated?
First results [Forest plot: odds ratio per marker, x-axis ticks at 0.5, 1.0, 2.0, 5.0; markers listed as rs ID (gene / region)] • Atrial fibrillation: rs2200733 (Chr. 4q25), rs10033464 (Chr. 4q25) • Crohn's disease: rs11805303 (IL23R), rs17234657 (Chr. 5), rs1000113 (Chr. 5), rs17221417 (NOD2), rs2542151 (PTPN22) • Multiple sclerosis: rs3135388 (DRB1*1501), rs2104286 (IL2RA), rs6897932 (IL7RA) • Rheumatoid arthritis: rs6457617 (Chr. 6), rs6679677 (RSBN1), rs2476601 (PTPN22) • Type 2 diabetes: rs4506565 (TCF7L2), rs12255372 (TCF7L2), rs12243326 (TCF7L2), rs10811661 (CDKN2B), rs8050136 (FTO), rs5219 (KCNJ11), rs5215 (KCNJ11), rs4402960 (IGF2BP2)
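For readers unfamiliar with the quantity on the plot's axis, here is a minimal sketch of an odds ratio with a 95% confidence interval computed from a 2x2 table of allele counts. It uses the standard log-odds (Woolf) approximation; the slides do not state which model the project used, and the counts below are invented purely for illustration.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 allele-count table.

    Uses the Woolf (log-odds) standard error; all four counts must be > 0.
    """
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se_log_or = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: risk allele seen 20 times vs 80 reference alleles in
# cases, and 10 vs 90 in controls.
estimate, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1.0 on the same side as the published association is the kind of result a replication plot like this one is meant to show.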
Types of projects • Discovery or validation of genotype-phenotype relations for disease susceptibility or drug responses • Discovery of new disease/susceptibility genes by resequencing in patients (obesity, Cushing's, susceptibility to infection, insomnia, pre-term birth) • Access to samples without disease X, or “normals” of specified ancestry, or old normals • Phenome-wide association study (PheWAS): in development
Examples of ICD-9 codes for rare diseases
Not included in SD searches: • Bone marrow transplant • SCID. Flagged as compromised samples: • Transfusion within 2 weeks of blood draw • Leukemia • Myeloma • Lymphoma • Pre-leukemic states
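The exclusion and flagging rules above could be applied to a sample list roughly as follows. This is a sketch under stated assumptions: the `Sample` fields, the plain-text diagnosis labels, and the reading of "within 2 weeks" as a transfusion in the 14 days before the blood draw are all hypothetical, not details given in the slides.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# Labels mirror the slide; a real system would match on codes, not free text.
EXCLUDED = {"bone marrow transplant", "SCID"}
COMPROMISING = {"leukemia", "myeloma", "lymphoma", "pre-leukemic state"}
TRANSFUSION_WINDOW = timedelta(days=14)  # assumed meaning of "within 2 weeks"


@dataclass
class Sample:
    research_id: str
    draw_date: date
    last_transfusion: Optional[date] = None
    diagnoses: set = field(default_factory=set)


def classify(sample: Sample) -> str:
    """Return 'excluded', 'flagged', or 'ok' for one sample."""
    if sample.diagnoses & EXCLUDED:
        return "excluded"
    if sample.diagnoses & COMPROMISING:
        return "flagged"
    if sample.last_transfusion is not None:
        gap = sample.draw_date - sample.last_transfusion
        if timedelta(0) <= gap <= TRANSFUSION_WINDOW:
            return "flagged"
    return "ok"


samples = [
    Sample("ID-1", date(2011, 5, 20)),
    Sample("ID-2", date(2011, 5, 20), last_transfusion=date(2011, 5, 10)),
    Sample("ID-3", date(2011, 5, 20), diagnoses={"SCID"}),
]
for s in samples:
    print(s.research_id, classify(s))
```

The rationale for these rules is not spelled out in the slides; a plausible reading is that transfused blood or a marrow graft can introduce donor DNA, and hematologic malignancies can alter the DNA in circulating cells.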