BioMedical Data Everywhere: Recent Developments in Data Management and Policy at NIH. Jerry Sheehan Assistant Director for Policy Development National Library of Medicine - National Institutes of Health sheehanjr@nlm.nih.gov CASC Fall Meeting September 8, 2011, Arlington, VA.
National Library of Medicine: More than a Library • World’s largest medical library • >12 million physical artifacts (books, journals, technical reports, photographs) • >22,000 print and electronic serial subscriptions • Historical collection of rare and old medical works • Intramural research laboratories • Lister Hill Nat’l Center for Biomedical Comms. • National Center for Biotechnology Information • Extramural research and training • ~ 100 research projects per year, $36M • 18 funded research training sites, 250 trainees • Health data standards and vocabularies • Information resources and services • Publications and metadata • Genomic, chemical, clinical trial data • Environmental health and toxicology data • Disaster information services & systems • Medical images, analytical tools www.nlm.nih.gov
NLM Information Resources • Publications • Citations/metadata (PubMed) • Full-text articles (PubMed Central) • Data • Genomic (GenBank, dbGaP, GEO, GeneTests) • Clinical trials (ClinicalTrials.gov) • Drug (RxNorm, DailyMed, Pillbox) • Chemical (PubChem) • Environmental & toxicology • Images • Visible Human • Spine x-rays, cervical images • Historical photos • Synthesized information • Evidence summaries • Guidelines • Consumer health information (MedlinePlus) • Vocabulary resources • Unified Medical Language System • Standard clinical terms (SNOMED) • Health data interchange • Biomedical terms • Software & Tools • APIs • Natural language processing • Image analysis • Mobile apps
PubMed/Medline: Journal Citations http://www.pubmed.gov CONTENT • 21+ million citations and abstracts • 700,000 added per year • 50%+ link to full text • 5500+ journals • 120-130 added per year USAGE (2010) • 120+ million visitors • 2 million searches per day • 2.4 billion page views • Google, Bing, others • Content used by outside developers • Mobile version Growth in Medline, the fully indexed subset of PubMed which accounts for approximately 90% of all PubMed citations. Original graph: http://www.nlm.nih.gov/bsd/stats/cit_added.html QUALITY
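The "content used by outside developers" point refers to programmatic access to PubMed. As a rough illustration only, the sketch below builds a search URL for NCBI's public E-utilities ESearch endpoint using its `db`, `term`, `retmax`, and `retmode` parameters. The helper name `build_pubmed_search_url` and the example query are hypothetical and not from the slides, and no request is actually sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Public NCBI E-utilities ESearch endpoint (not named on the slide itself).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, retmax=20):
    """Return an ESearch URL that would search PubMed for `term`.

    Hypothetical helper for illustration: it only assembles the URL and
    does not contact NCBI.
    """
    params = {
        "db": "pubmed",      # search the PubMed citation database
        "term": term,        # free-text query
        "retmax": retmax,    # maximum number of PubMed IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return ESEARCH_BASE + "?" + urlencode(params)


# Example query chosen for illustration only.
url = build_pubmed_search_url("data sharing policy", retmax=5)
print(url)
```

A developer would then fetch this URL with any HTTP client and read the returned list of PubMed IDs; NCBI's usage guidelines on request rates apply to real traffic.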
PubMed Central: Full-Text Articles www.pubmedcentral.gov • 2.2+ million full-text articles • 26 thousand more added per month • Typical weekday usage: • 420,000 different users • 740,000 articles retrieved • Annually • ~ 99% of articles downloaded at least once • 28% downloaded more than 100 times
ClinicalTrials.gov http://clinicaltrials.gov/ • Registry and Results Database • Federally and privately supported trials • Conducted in the United States and 170+ countries • Mandatory submission for some trials • Current content • 100,000+ registered trials • 330 new registrations/week • 3,000+ results (summary) of approved products • Outcome measures • Statistical analyses • Adverse events • Usage (2010) • 28,000 visitors per day Studies Registered at ClinicalTrials.gov since May 1, 2005
Repository for NIH-funded GWA studies • As of Aug 2011: • 161 studies • 2045 data sets • 2727 documents • 5890 Analyses • 128190 Variables
Database of biological activities of small molecules • Repository for data from NIH Molecular Libraries program • As of August, 2011: • 85 million deposited substance records • Representing more than 30 million chemically unique compounds • 500 thousand bioassay records • Representing more than 130 million experimental bioactivity results
• > 170 tutorials • > 75 anatomy videos • > 125 surgery videos • Almost 900 • In English & Spanish • ~ 40,000 links • ~ 1,000 drugs • 100 supplements • 15-20 stories added daily • Since 2006 • English & bilingual issues • > 1,200 links to ClinicalTrials.gov • > 40 languages • > 250 topics • > 3,300 links • Over 100 directories of doctors, hospitals, clinics & libraries • ~ 3,500 articles • > 2,000 images
MedlinePlus: Trusted Health Information www.medlineplus.gov [Figure: map of 100+ million visits in the United States in 2010, by state] MEDLINEPLUS USAGE • 150 million visitors in 2010 • 420,000 visitors per day MEDLINEPLUS MOBILE • Streamlined content tailored to the user's particular type of cell phone or tablet MEDLINEPLUS CONNECT • Links from diagnosis, drug, and laboratory information in EHR/PHR to relevant material in MedlinePlus
Genetic test means an analysis of human DNA, RNA, chromosomes, proteins, or metabolites, if the analysis detects genotypes, mutations, or chromosomal changes. Genetic test does not include an analysis of proteins or metabolites that is directly related to a manifested disease, disorder, or pathological condition.
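The definition above has an inclusion rule and an exclusion rule, which can be easier to follow as a small predicate. This is only a sketch of the wording on the slide, not a legal or regulatory implementation: the function and parameter names are hypothetical, the sample is assumed to be human, and "detects genotypes, mutations, or chromosomal changes" is reduced to a single boolean.

```python
# Analytes named in the definition, split by whether the exclusion can apply.
GENETIC_MATERIAL = {"DNA", "RNA", "chromosomes"}
OTHER_ANALYTES = {"proteins", "metabolites"}


def is_genetic_test(analyte, detects_genetic_change,
                    related_to_manifested_condition=False):
    """Return True if an analysis fits the slide's definition of a genetic test.

    Hypothetical helper: assumes a human sample and collapses "detects
    genotypes, mutations, or chromosomal changes" into one flag.
    """
    # Inclusion: the analysis must be of one of the listed analytes ...
    if analyte not in GENETIC_MATERIAL | OTHER_ANALYTES:
        return False
    # ... and must detect genotypes, mutations, or chromosomal changes.
    if not detects_genetic_change:
        return False
    # Exclusion: protein or metabolite analyses directly related to a
    # manifested disease, disorder, or pathological condition.
    if analyte in OTHER_ANALYTES and related_to_manifested_condition:
        return False
    return True


print(is_genetic_test("DNA", True))
print(is_genetic_test("proteins", True, related_to_manifested_condition=True))
```

Under these assumptions the first call falls inside the definition and the second is carved out by the exclusion for manifested conditions.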
NLM is Not Alone: Growing interest in data at NIH “[High throughput technologies] provide us with the opportunity to ask questions that have the word ‘ALL’ in them. What are ALL the transcripts in a cell? What are ALL the protein interactions? . . . Those kinds of questions are now approachable, especially if we do the right job of making really powerful databases publicly accessible to all those who need them and empower investigators in small labs as well as big labs to plunge into that kind of mindset.” - Francis S. Collins, MD, PhD [Director, NIH]
http://report.nih.gov/biennialreport/ http://report.nih.gov/UploadDocs/Biomed_Info_Resources_FY08_09.pdf
Select NIH Data Initiatives • NDAR – National Database for Autism Research (NIMH) • Repository for NIH-funded autism studies and centers of excellence • Genomic, phenotypic, imaging data and associated information • ADNI – Alzheimer’s Disease Neuroimaging Initiative (NIA) • Multisite study, public-private partnership, validated biomarkers • Centralized fMRI and PET data, linked clinical database • NIDDK Data Repository • Archival datasets from NIDDK-funded studies (diabetes, digestive, kidney) • 29 datasets to date; more than 100 access requests in 2009-10 • BTRIS – Biomedical Translational Research Information System (CC) • Repository for data from NIH intramural clinical studies • Allows aggregation and analysis across multiple Institute studies
Data Sharing Policies • NIH Public Access Policy (journal articles) • NIH GWAS Policy: dbGaP • NIH Sequence Data Sharing Policy: GenBank, GEO • Clinical Trials Info: ClinicalTrials.gov • IC or domain-specific policies • Autism Research – National Database for Autism Research • NIAAA Genetics of Alzheimer’s • Alzheimer’s Disease Neuroimaging Initiative (LONI Repository) • Others. . . • NIH Data Sharing Policy (data sharing plan)
Recent Guidance for NIH Data Sharing Plans http://grants.nih.gov/grants/sharing_key_elements_data_sharing_plan.pdf