Strategic Health IT Advanced Research Projects (SHARP) Area 4: Secondary Use
Dr. Friedman on-site visit, Mayo Clinic, 3 September 2010
SHARP Area 4: Secondary Use of EHR Data
• 14 academic and industry partners
• Develop tools and resources that influence and extend secondary uses of clinical data
• Cross-integrated suite of projects and products:
  • Clinical Data Normalization
  • Natural Language Processing (NLP)
  • Phenotyping (cohorts and eligibility)
  • Common pipeline tooling (UIMA) and scaling
  • Data Quality (metrics, missing-value management)
  • Evaluation Framework (population networks)
Collaborations
• Agilex Technologies
• CDISC (Clinical Data Interchange Standards Consortium)
• Centerphase Solutions
• Deloitte
• Group Health, Seattle
• IBM Watson Research Labs
• University of Utah
• Harvard Univ. & i2b2
• Intermountain Healthcare
• Mayo Clinic
• Minnesota HIE (MN HIE)
• MIT and i2b2
• SUNY and i2b2
• University of Pittsburgh
• University of Colorado
Major Achievements
• Fostered social connections across projects
• Recognition by team members that not all problems must be solved within their own team:
  • NLP and phenotypes
  • Phenotypes and CEM normalization
• Shared responsibility for overlapping dependencies
The bookends – Projects 1 & 6: Data Normalization & Evaluation
Christopher G. Chute, Stan Huff (Peter Haug)
Overview
• Build a generalizable data normalization pipeline
• Establish a globally available resource for health terminologies and value sets
• Establish and expand a modular library of normalization algorithms
• Iteratively test normalization pipelines, including NLP where appropriate, against normalized forms, and tabulate discordance
• Use cohort identification algorithms on both EMR data and EDW data (normalized against CEMs)
Progress
• Designation of Clinical Element Models (CEMs) as the canonical form
• Utilizing use-case scenarios (PAD, CPNA, etc.) for CEM normalization
• Exploration into generalizable CEM models – diagnoses, medications, labs
• Development of processes/tools to identify relevant existing CEM models within CEM libraries
• Development of processes to identify missing CEMs for data (and classes of data) in use cases
• Preliminary population of phenotype use cases
Planned
• Adopt eMERGE eleMap tooling for CEMs to populate the canonical model
• Formalize Meaningful Use vocabularies into the LexGrid server
• Design other components of the Data Normalization framework (Terminology Services – NHIN connections)
• Model the end-to-end flow needed to produce normalized data from structured data and unstructured (natural language) data (sketched below):
  • High-level description of the process for taking "wild-type" data instances to canonical CEM instances
  • Applicability to use-case data as well as to general classes of data
• Adopt UIMA data flows for normalization services
• Examine Regenstrief and SHARP 3 modules
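To make the planned "wild-type to canonical CEM" flow concrete, here is a minimal sketch assuming a toy value set and simplified field names. The CEM name, codes, and lookup table below are illustrative stand-ins for the real CEM libraries and LexGrid terminology services, not the project's actual models.

```python
# Illustrative sketch only: a toy normalization step mapping a "wild-type"
# lab result into a simplified CEM-like canonical record. Field names, codes,
# and the value-set lookup stand in for the real CEM models and LexGrid
# terminology services described in the slides.

# Toy value set: local lab codes -> standard (LOINC-style) codes.
GLUCOSE_VALUE_SET = {
    "GLU": {"code": "2345-7", "display": "Glucose [Mass/volume] in Serum or Plasma"},
    "GLUC_SER": {"code": "2345-7", "display": "Glucose [Mass/volume] in Serum or Plasma"},
}

def normalize_lab_result(raw: dict) -> dict:
    """Map a source-specific lab record onto a CEM-like canonical shape."""
    mapping = GLUCOSE_VALUE_SET.get(raw["local_code"])
    if mapping is None:
        raise ValueError(f"No value-set mapping for local code {raw['local_code']!r}")
    return {
        "cemType": "StandardLabObs",          # hypothetical CEM name
        "key": mapping,                        # normalized coded concept
        "value": {"quantity": float(raw["result"]), "unit": raw.get("units", "mg/dL")},
        "effectiveTime": raw["collected_at"],
        "subject": {"patientId": raw["mrn"]},
    }

if __name__ == "__main__":
    wild_type = {"mrn": "12345", "local_code": "GLU", "result": "108",
                 "units": "mg/dL", "collected_at": "2010-08-15T07:30:00"}
    print(normalize_lab_result(wild_type))
```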
Project 2: Clinical Natural Language Processing (cNLP)
Dr. Guergana Savova
Overview
• Overarching goal: high-throughput phenotype extraction from clinical free text, based on standards and the principle of interoperability
• Focus:
  • Information extraction (IE): transformation of unstructured text into structured representations (CEMs)
  • Merging clinical data extracted from free text with structured data
Progress
• Detailed 4-year project plan
• Tasks in execution:
  • Investigative tasks: (1) defining CEMs and attributes as normalization targets for NLP, (2) defining the set of clinical named entities (cNEs) and their attributes, (3) methods for cNE discovery
  • Engineering tasks: (1) defining users, (2) incorporating site NLP tools into cTAKES and UIMA, (3) common conventions and requirements, (4) de-identification flow and data sharing
• Forging cross-SHARP collaborations (SHARP 3, PIs Kohane and Mandl)
Planned (Y1)
• Gold standard for cNEs, relations, and CEMs
• Focus on methods for cNE discovery and populating relevant CEMs (many subtasks)
• Projected module releases:
  • Medication extraction (Nov '10) (see the sketch below)
  • CEM OrderMedAmb population (Mar '11)
  • Deep parser for cTAKES (Nov '10)
  • Dependency parser for cTAKES (Jan '11)
• Collaboration with SHARP 3 by providing medication extraction capabilities for the medication SMaRT app
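As a rough illustration of the planned medication-extraction capability, the sketch below pulls a drug mention with dose and frequency out of free text and fills a simplified CEM-like structure. The drug list, regular expression, and field names are assumptions made for illustration; the actual module is built as cTAKES/UIMA annotators rather than standalone code like this.

```python
import re

# Illustrative only: a toy extractor that finds a drug mention with dose and
# frequency in clinical free text and fills a simplified, CEM-like medication
# structure. The real SHARP module is implemented as cTAKES/UIMA annotators.

DRUG_LEXICON = {"metformin", "lisinopril", "aspirin"}   # stand-in dictionary

PATTERN = re.compile(
    r"(?P<drug>\b(?:%s)\b)\s+(?P<dose>\d+\s*mg)\s+(?P<freq>once daily|twice daily|bid|qd)"
    % "|".join(sorted(DRUG_LEXICON)),
    re.IGNORECASE,
)

def extract_medications(note: str) -> list:
    """Return CEM-like medication records found in a clinical note."""
    records = []
    for m in PATTERN.finditer(note):
        records.append({
            "cemType": "NotedDrug",                 # hypothetical CEM name
            "drug": {"text": m.group("drug").lower()},
            "dose": m.group("dose"),
            "frequency": m.group("freq").lower(),
            "evidenceSpan": m.span(),               # character offsets in the note
        })
    return records

if __name__ == "__main__":
    note = "Patient restarted on metformin 500 mg twice daily for diabetes."
    print(extract_medications(note))
```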
Project 3: High-Throughput Phenotyping (HTP)
Dr. Jyoti Pathak
Overview
• Overarching goal: to develop techniques and algorithms that operate on normalized EMR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings
• Focus:
  • Portability of phenotyping algorithms
  • Representation of phenotyping logic
  • Measuring the goodness of EMR data
Progress
• Explored use-case phenotypes from the eMERGE network for HTP process validation
• Representation of phenotype descriptions and data elements using Clinical Element Models
• Preliminary execution of phenotyping algorithms (Peripheral Arterial Disease) to compare aggregate data
Planned
• Interaction and collaboration with the Data Normalization and NLP teams to develop "data collection widgets"
• Representation of phenotyping execution logic in a machine-processable format/language (sketched below)
• Development of machine learning methods for semi-automatic cohort identification
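One way to picture "machine-processable" phenotyping logic is as declarative criteria evaluated against normalized records, as in the minimal sketch below. The ICD-9 code, ankle-brachial index threshold, and record layout are illustrative assumptions, not the actual eMERGE or SHARP phenotype definitions.

```python
# Illustrative only: phenotyping logic expressed as data rather than code, so
# the same criteria could be shared and executed across sites. The codes,
# thresholds, and record layout below are assumptions for the sketch, not the
# actual eMERGE/SHARP phenotype definitions.

PAD_LIKE_CRITERIA = {
    "anyOf": [
        {"field": "diagnosisCodes", "contains": "443.9"},   # ICD-9 example code
        {"field": "ankleBrachialIndex", "lessThan": 0.9},
    ]
}

def matches(criteria: dict, record: dict) -> bool:
    """Evaluate a simple declarative criterion against a normalized record."""
    if "anyOf" in criteria:
        return any(matches(c, record) for c in criteria["anyOf"])
    value = record.get(criteria["field"])
    if value is None:
        return False
    if "contains" in criteria:
        return criteria["contains"] in value
    if "lessThan" in criteria:
        return value < criteria["lessThan"]
    return False

if __name__ == "__main__":
    cohort = [
        {"patientId": "A", "diagnosisCodes": ["443.9"], "ankleBrachialIndex": 1.1},
        {"patientId": "B", "diagnosisCodes": [], "ankleBrachialIndex": 0.8},
        {"patientId": "C", "diagnosisCodes": [], "ankleBrachialIndex": 1.0},
    ]
    eligible = [p["patientId"] for p in cohort if matches(PAD_LIKE_CRITERIA, p)]
    print(eligible)   # ['A', 'B']
```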
Project 4: Infrastructure & Scalability
Jeff Ferraro, Marshal Schor, Calvin Beebe
UIMA exploitation
Some initial discussions on UIMA were held in a meeting at MIT attended by Peter Szolovits (MIT), Guergana Savova (Harvard), and some of their team members. A plan is underway for a UIMA "deep dive" for other members from Intermountain Healthcare and Mayo. A discussion is pending to understand how UIMA might fit with RPE (in particular, BPEL). RPE (Retrieve Process for Execution) is an IHE (Integrating the Healthcare Enterprise) profile to automate collaborative workflow between the healthcare and secondary-use domains. A conceptual sketch of the UIMA pipeline idea follows below.
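For readers unfamiliar with UIMA, the sketch below shows only the core idea: a shared analysis structure passed through a chain of annotators. It is a conceptual stand-in, not the Apache UIMA Java API used by cTAKES; the class and annotator names are invented for illustration.

```python
import re

# Conceptual sketch only: the UIMA idea of a common analysis structure (CAS)
# passed through a chain of annotators. The real pipeline uses Apache UIMA's
# Java APIs (as in cTAKES); the classes and names here are illustrative.

class CAS:
    """Minimal stand-in for UIMA's common analysis structure."""
    def __init__(self, text: str):
        self.text = text
        self.annotations = []   # (type, begin, end) tuples

class SentenceDetector:
    """Toy annotator: mark sentence spans ending at a period."""
    def process(self, cas: CAS) -> None:
        start = 0
        for i, ch in enumerate(cas.text):
            if ch == ".":
                cas.annotations.append(("Sentence", start, i + 1))
                start = i + 1

class TokenAnnotator:
    """Toy annotator: mark word tokens."""
    def process(self, cas: CAS) -> None:
        for match in re.finditer(r"\w+", cas.text):
            cas.annotations.append(("Token", *match.span()))

def run_pipeline(text: str, annotators) -> CAS:
    """Pass one shared CAS through each annotator in order (the UIMA pattern)."""
    cas = CAS(text)
    for annotator in annotators:
        annotator.process(cas)
    return cas

if __name__ == "__main__":
    cas = run_pipeline("Chest pain resolved. Discharged home.",
                       [SentenceDetector(), TokenAnnotator()])
    for ann in cas.annotations:
        print(ann)
```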
Infrastructure Progress
• Code repository: reviewed requirements (e.g., SVN); need pre-release work areas for project teams; the bulk of materials will be in a public repository
• Licensing compatibility: initial discussions on Open Source licensing consistent with UIMA and other project teams' tooling; will need to survey teams
• Initial platform discussions: still working on a Sandbox ("Shared") environment; need to consider the Cloud in later phases of the project
Planned
• Review repository options with ONC, SourceForge, Open Health Tools
• Establish a straw-man proposal for the Sandbox configuration
• Conduct cross-project discussions:
  • Inventory tools that can be shared
  • Inventory data that can be shared
  • Identify a shared environment site location
  • Initiate high-level requirements gathering
Project 5: Data Quality
Dr. Kent Bailey (Kim Lemmerman)
Overview
• Support data quality and ascertain data quality issues across projects
• Deploy and enhance methods for resolving missing or conflicting data
• Integrate methods into UIMA pipelines
Progress & Planned
• Integrate across projects and gather requirements and standards to establish a data quality plan and metrics (see the sketch below)
• Compare expected data quality to actual data quality
• Provide recommendations and methods to improve data quality and/or possible outcomes
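As an illustration of the kind of metrics such a plan might include, the sketch below computes per-field completeness and flags conflicting values across normalized records. The required fields, missing-value markers, and record layout are assumptions for the sketch, not the project's actual metrics.

```python
# Illustrative only: toy data-quality metrics of the kind the project could
# compute over normalized records -- per-field completeness and simple
# conflict detection. Field names and missing-value markers are assumptions.

from collections import Counter

REQUIRED_FIELDS = ["patientId", "birthDate", "sex", "effectiveTime"]

def completeness(records: list) -> dict:
    """Fraction of records with a non-missing value for each required field."""
    n = len(records) or 1
    filled = Counter()
    for rec in records:
        for field in REQUIRED_FIELDS:
            if rec.get(field) not in (None, "", "UNK"):
                filled[field] += 1
    return {field: filled[field] / n for field in REQUIRED_FIELDS}

def conflicting_sex(records: list) -> set:
    """Patient IDs recorded with more than one distinct sex value."""
    seen = {}
    conflicts = set()
    for rec in records:
        pid, sex = rec.get("patientId"), rec.get("sex")
        if pid is None or sex in (None, "", "UNK"):
            continue
        if pid in seen and seen[pid] != sex:
            conflicts.add(pid)
        seen.setdefault(pid, sex)
    return conflicts

if __name__ == "__main__":
    records = [
        {"patientId": "A", "birthDate": "1950-01-01", "sex": "F", "effectiveTime": "2010-08-01"},
        {"patientId": "A", "birthDate": "1950-01-01", "sex": "M", "effectiveTime": "2010-08-02"},
        {"patientId": "B", "birthDate": "", "sex": "M", "effectiveTime": None},
    ]
    print(completeness(records))
    print(conflicting_sex(records))   # {'A'}
```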
Cross-Area 4 Program Efforts Lacey Hart
Progress
• Started early with face-to-face collaboration and cross-project knowledge pollination
• Individual project efforts synergized, with timelines in sync; use cases vetted and determined for the first six months of focus
• IRB & data sharing issues have been raised, with best-practice sharing and a review of existing agreements between institutions
Planned
• Best practices for IRB submissions and template protocol material will be made available, with applicable state implications
• Data use agreements will be completed across sites where needed in the short term; an effort toward a 'consortium' agreement will commence for long-term data sharing needs
Cross-ONC Efforts Dr. Christopher Chute
SHARP Area Synergies
• Security: ensure pipelined data does not have its integrity compromised
• Cognitive: explore how normalized data and phenotypes can contribute to decisions
• Applications: potential for shared architectural strategies
Beacon Synergies
• High-throughput data normalization and phenotyping (SHARP)
• Applied to a population laboratory (Beacon)
• Validate on consented sub-samples
• Potential to include ALL patients in the population area, regardless of provider
SHARP Area 4: More information… http://sharpn.org
SE MN Beacon: More information… http://informatics.mayo.edu/beacon