Topics at the Interface of Privacy and Genomics

Anthony Philippakis, MD PhD, Chief Data Officer, Broad Institute. June 4th, 2018.

Presentation Transcript


  1. Topics at the Interface of Privacy and Genomics Anthony Philippakis, MD PhD Chief Data Officer, Broad Institute June 4th, 2018

  2. Who am I? • Chief Data Officer of the Broad Institute • Lead the Data Sciences Platform • Previously studied pure math, but that’s ancient history • Trained as a cardiologist at BWH • Care for patients with rare, genetic CV diseases • Venture Partner at GV (Venture Capital) • Invest at intersection of tech and life sciences Disclaimer: I am not a crypto person!!!

  3. The Challenge of Scalability in Genomics Globally, genomic data doubles every 8 months

  4. Inverting the Model of Genomic Data Sharing Traditional Approach: Bring data to researchers • Problems: • Data sharing = data copying • Security (data handoffs) • Huge infrastructure needed • Siloed compute Opportunity: Bring researchers to the data (public cloud) • Advantages: • Cost • Threat detection and auditing • Increased accessibility • Shared & elastic compute

  5. Opportunities in Security & Compliance Areas technology can have a big impact: • Management of data use • Multiparty Computation • Differential Privacy

  6. Our Current Protocol for Data Access Data Depositors submit Data Use Limitations and Data Requestors submit Project Request Forms to the Data Access Committee. Data use limitation: "This data is available for cancer research in a non-profit setting." Request: "I am studying breast cancer at a company." Decision: No!

  7. Our Current Protocol for Data Access Data Depositors submit Data Use Limitations and Data Requestors submit Project Request Forms to the Data Access Committee. Data use limitation: "This data is available for cancer research in a non-profit setting." Request: "I am studying breast cancer at a non-profit." Decision: Yes!

  8. Human review of data access does not scale Data Depositors submit Data Use Limitations and Data Requestors submit Data Access Requests to the Data Access Committee. Scales poorly!! O(N^2) dbGaP as of July 1, 2017: • 826 studies in dbGaP • 5,344 PIs requesting data • 46 PI countries • 1500+ publications resulting from secondary data use • Data access requests: 50,167 submitted, 34,16 approved Source: dbGaP at PRIM&R 2017
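One way to read the O(N^2) claim on slide 8, assuming (this is an interpretation, not something the slide states) that every requesting PI could in principle ask for every study and each pairing needs its own human decision:

```python
# Figures quoted on the slide (dbGaP, as of July 1, 2017).
NUM_STUDIES = 826
NUM_REQUESTING_PIS = 5_344


def possible_reviews(num_studies: int, num_requestors: int) -> int:
    """Upper bound on (study, requestor) pairs a committee might have to review.

    Assumption: one human decision per pair. Real workloads are smaller,
    but the bound grows with the product of the two counts.
    """
    return num_studies * num_requestors


pairs = possible_reviews(NUM_STUDIES, NUM_REQUESTING_PIS)
print(f"Possible study/requestor pairs: {pairs:,}")

# Doubling both sides quadruples the bound, which is the quadratic growth.
doubled = possible_reviews(2 * NUM_STUDIES, 2 * NUM_REQUESTING_PIS)
print(f"After doubling both: {doubled:,}")
```

The point of the sketch is only the shape of the growth: doubling both the number of studies and the number of requestors multiplies the worst-case review load by four.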

  9. Problem: Data Use is not Coded! Data Use Restrictions: What are you doing with the data? “The donor wants her data used only for non-commercial cancer research” Permissions: Who are you? “Only consortium members can READ this data until it is published.” Main Question: Can Data Use Restrictions be made machine-readable?

  10. DUOS: Broad Data Use Oversight System What is DUOS? • Interfaces to transform data use restrictions and data access requests into machine-readable code (ADA-M & Consent Codes) • A matching algorithm that checks whether data access requests are compatible with data use restrictions • Interfaces for the Data Access Committee to adjudicate whether structuring and matching have been done appropriately https://duos.broadinstitute.org/#/home
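To make the matching idea concrete, here is a minimal sketch of how a structured data use restriction might be checked against a structured data access request. The schema, the tiny disease hierarchy, and the function names are all hypothetical; this is not the DUOS implementation, the ADA-M format, or the Consent Codes vocabulary.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, heavily simplified disease hierarchy (child -> parent).
DISEASE_PARENT = {
    "breast cancer": "cancer",
    "cancer": None,
    "diabetes": None,
}


@dataclass(frozen=True)
class DataUseRestriction:
    disease: Optional[str]      # None means no disease restriction
    commercial_allowed: bool


@dataclass(frozen=True)
class DataAccessRequest:
    disease: str
    commercial: bool


def is_subtype(term: Optional[str], ancestor: str) -> bool:
    """True if `term` equals `ancestor` or sits below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = DISEASE_PARENT.get(term)
    return False


def matches(restriction: DataUseRestriction, request: DataAccessRequest) -> bool:
    """Check whether a request is compatible with a restriction."""
    if request.commercial and not restriction.commercial_allowed:
        return False
    if restriction.disease is None:
        return True
    return is_subtype(request.disease, restriction.disease)


# The two scenarios from slides 6 and 7.
cancer_nonprofit_only = DataUseRestriction(disease="cancer", commercial_allowed=False)
company_request = DataAccessRequest(disease="breast cancer", commercial=True)
nonprofit_request = DataAccessRequest(disease="breast cancer", commercial=False)

print(matches(cancer_nonprofit_only, company_request))    # False
print(matches(cancer_nonprofit_only, nonprofit_request))  # True
```

A real system would presumably draw disease terms from a maintained ontology and cover more dimensions (special populations, future use), but the core check is the same: every structured field of the request must fall within what the restriction permits.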

  11. Validation of DUOS Claim: Data Use Can Be Structured Review of ~150 Data Use Limitations Letters at Broad demonstrated that ~90% can be structured with the following ontologies: • Diseases: Diabetes research only, Breast cancer research only, etc. • Commercial Use: allowed/not allowed • Special populations: Ethnicities, gender, pediatric, etc. • Future use for Methods Development, Aggregate Statistics, Controls Test: Run a trial! We have formed a DAC to compare automated review of access to the traditional mode. Data Access Committee: • Pearl O’Rourke (Partners) • Laura Rodriguez (NIH) • John Wilbanks (Sage) • Stacey Donnelly (Broad) • Anthony Philippakis (Broad)

  12. Validation of DUOS Initial results are very promising! >90% of data use restrictions were approved in structured form by the DAC

  13. Opportunities in Security & Compliance Areas technology can have a big impact: • Management of data use • Multiparty Computation • Differential Privacy

  14. Multiparty Computation in Genomics Areas where secure multiparty computation (SMC) could have a big impact in genomics: The meta-analysis problem (Cohort 1, Cohort 2) • Large cohorts have been assembled over many years as part of clinical trials and epidemiology research. • Many researchers (especially in industry) are reluctant to share the whole cohort with another group. • However, it is often mutually advantageous to do a focused meta-analysis (e.g., enrichment of a variant).
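As an illustration of the meta-analysis scenario, the sketch below uses additive secret sharing so that two cohorts can learn the combined carrier count for a variant without revealing their individual counts. It simulates all parties in one process, assumes honest-but-curious participants, and uses an arbitrarily chosen modulus; the function names are illustrative and not taken from the talk.

```python
import secrets

# Assumed modulus; it only needs to exceed any realistic total count.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine shares; any incomplete subset looks uniformly random."""
    return sum(shares) % MODULUS


def secure_sum(private_values: list[int]) -> int:
    """Simulate a secure sum in which party i holds private_values[i]."""
    n = len(private_values)
    # Each party splits its own value and sends one share to every party.
    all_shares = [share(v, n) for v in private_values]
    # Party j locally adds the shares it received (column j).
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are published; together they reveal the total.
    return reconstruct(partial_sums)


# Hypothetical carrier counts for one variant in two cohorts.
cohort_1_carriers = 37
cohort_2_carriers = 12
print(secure_sum([cohort_1_carriers, cohort_2_carriers]))  # 49
```

With only two parties, each side can of course subtract its own count from the total, so the protection is meaningful mainly for richer statistics or for more than two cohorts; the sketch is only meant to show the mechanics of sharing and recombining.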

  15. Multiparty Computation in Genomics Areas where SMC could have a big impact in genomics: Geographic restrictions on data storage • Many countries are passing laws requiring that data from human subjects be physically stored in that country. • Clearly, we want to be able to cross-analyze large genetic cohorts from different countries. • However, most would agree that storing encrypted data is acceptable (i.e., secret-sharing paradigm).

  16. Secure Data Exchange Secure Multiparty computation • Challenges to software-based implementations • Requires learning new, specialized programming languages • Increased computational overhead (a big deal, given size of genomic datasets) • With N parties, need to keep N copies of the data

  17. Secure Data Exchange Secure Multiparty computation • Idea: Hardware-based approach • My group is in the early days of exploring hardware-based SMC with Intel • Idea of building a “Data Switzerland” for life sciences

  18. Opportunities in Security & Compliance Areas technology can have a big impact: • Management of data use • Multiparty Computation • Differential Privacy

  19. Differential Privacy [Figure: data matrix with O(10^6) individuals (but growing fast) as rows, and O(10^7) genotypes and O(10^1 – 10^4) phenotypes as columns] • Potential for Differential Privacy in human genetics • Huge interest in correlating genotypes and phenotypes to discover the genetic basis of disease. • Real appetite for the idea of a trusted and trustworthy database that researchers can query against without risk of re-identifying participants

  20. Differential Privacy [Figure: data matrix with O(10^6) individuals (but growing fast) as rows, and O(10^7) genotypes and O(10^1 – 10^4) phenotypes as columns] • Why hasn’t it happened??? (my $0.02) • I’m told that allowing arbitrary computations requires adding a LOT of noise... • How much privacy is ok to leak? How do you dole out the privacy budget to researchers? • Is there a robust, production-grade system that is ready to use (and at this scale)?
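The noise and privacy-budget questions can be made concrete with a small sketch of the Laplace mechanism for a counting query, paired with a simple per-researcher budget tracker. The class and function names, the budget sizes, and the example count are all assumptions for illustration; this is not a production design.

```python
import random
from typing import Optional


class PrivacyBudget:
    """Tracks how much epsilon a researcher has left to spend (hypothetical)."""

    def __init__(self, total_epsilon: float) -> None:
        self.remaining = total_epsilon

    def spend(self, epsilon: float) -> None:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """The difference of two i.i.d. exponentials is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    true_count: int,
    epsilon: float,
    budget: PrivacyBudget,
    rng: Optional[random.Random] = None,
) -> float:
    """Answer a counting query with epsilon-differential privacy.

    A count changes by at most 1 when one person is added or removed
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    if rng is None:
        rng = random.Random()
    budget.spend(epsilon)
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical: a researcher with total budget 1.0 asks two queries.
budget = PrivacyBudget(total_epsilon=1.0)
print(private_count(1_204, epsilon=0.5, budget=budget))
print(private_count(1_204, epsilon=0.5, budget=budget))
print(budget.remaining)  # 0.0, so a third query would be refused
```

The trade-off on the slide shows up directly: a smaller epsilon per query stretches the budget over more queries but increases the noise scale 1/epsilon, and queries with higher sensitivity than a simple count need proportionally more noise.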

  21. Closing thoughts • The Broad Data Sciences Platform is a team of nearly 150 people who focus on making robust software products. • We are organized more like a tech company than an academic research group. • We are heavily involved in applied security efforts as part of large, national sequencing initiatives. • I would love for us to be more involved in innovation at the intersection of life sciences and data sciences. • Please email me if you think you might want to collaborate! (aphilipp@broadinstitute.org)
