The UMLS Semantic Network for Natural Language Processing Thomas C. Rindflesch, Ph.D. Lister Hill National Center for Biomedical Communications Workshop on the Future of the UMLS Semantic Network
Goal • Sophisticated access to online information • Supplement document retrieval with: • Information extraction • Automatic summarization • Question answering • Literature-based discovery • Central concern of informatics research
Challenge: Language Complexity The average age of participants (approximately 63 years), the predominance of women, and the high prevalence of comorbid conditions (for example, hypertension and cardiovascular disease) reflect typical characteristics of patients with osteoarthritis. • Language encodes a lot of information
Natural Language Processing • Various approaches • Correspond to levels of linguistic expression • Words • Phrases • Relations
Words The average age of participants (approximately 63 years), the predominance of women, and the high prevalence of comorbid conditions (for example, hypertension and cardiovascular disease) reflect typical characteristics of patients with osteoarthritis.
Words age • approximately • average • cardiovascular • characteristics • comorbid • conditions • disease • example • high • hypertension • osteoarthritis • participants • patients • predominance • prevalence • reflect • typical • women • years
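A word-level view of the example sentence can be approximated in a few lines of Python. This is a minimal sketch, not the method used in the talk: the stopword list and the `content_words` helper are assumptions made for illustration.

```python
import re

# Hypothetical stopword list, chosen only for this sentence (an assumption).
STOPWORDS = {"the", "of", "and", "for", "with"}

SENTENCE = (
    "The average age of participants (approximately 63 years), the "
    "predominance of women, and the high prevalence of comorbid conditions "
    "(for example, hypertension and cardiovascular disease) reflect typical "
    "characteristics of patients with osteoarthritis."
)

def content_words(text):
    """Return the sorted set of alphabetic tokens that are not stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())  # digits such as "63" are skipped
    return sorted(set(tokens) - STOPWORDS)

words = content_words(SENTENCE)
print(words)
```

The result is an unordered bag of terms: it records that "hypertension" and "osteoarthritis" occur, but nothing about how they relate.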
Phrases The average age of participants (approximately 63 years), the predominance of women, and the high prevalence of comorbid conditions (for example, hypertension and cardiovascular disease) reflect typical characteristics of patients with osteoarthritis.
Phrases average age • participants • approximately 63 years • predominance • women • high prevalence • comorbid conditions • example • hypertension • cardiovascular disease • typical characteristics • patients • osteoarthritis
Semantic Predications The average age of participants (approximately 63 years), the predominance of women, and the high prevalence of comorbid conditions (for example, hypertension and cardiovascular disease) reflect typical characteristics of patients with osteoarthritis.
Semantic Predications • Cardiovascular Diseases CO-OCCURS_WITH Degenerative polyarthritis • Hypertension CO-OCCURS_WITH Degenerative polyarthritis
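One way to hold such predications in a program is as subject-predicate-object triples. The sketch below is illustrative only: the `Predication` class and its field names are assumptions, not SemRep's actual output format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Predication:
    """A semantic predication: two concept arguments joined by a predicate."""
    subject: str    # concept name (argument 1)
    predicate: str  # relation name
    obj: str        # concept name (argument 2)

    def __str__(self):
        return f"{self.subject} {self.predicate} {self.obj}"

# The two predications shown for the osteoarthritis sentence.
predications = [
    Predication("Cardiovascular Diseases", "CO-OCCURS_WITH", "Degenerative polyarthritis"),
    Predication("Hypertension", "CO-OCCURS_WITH", "Degenerative polyarthritis"),
]

for p in predications:
    print(p)
```

Because the class is frozen, instances are hashable and can be counted or placed in sets.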
Semantic Interpretation • Map syntactic structures to structured domain knowledge • Concepts • Relations • Output is semantic predication • Arguments and a predicate in a relationship • Supports enhanced access to online information
Related Research in Biomedicine • MedLEE, GENIES [Friedman, et al.] • Semantic grammar • AQUA [Johnson, Campbell] • Definite clause grammar • MPLUS [Haug, et al.] • Chart parser • MEDSYNDIKATE [Hahn, et al.] • Dependency grammar
SemRep • Interpret semantic predications in Medline • Exploit the UMLS • Concepts: Metathesaurus • Relations: Semantic Network • Syntax: SPECIALIST Lexicon • Use other resources at NLM • MetaMap • UMLS Knowledge Source Server
Minimal Commitment Approach • Focused processing • Syntax • Semantics • Incremental development • Useful results
SemRep: System Overview [Diagram: Medical Text → Lexical Look-up → Resolve Ambiguity → Parser → MetaMap → Construct Relation → Semantic Predication. Resources shown: SPECIALIST Lexicon, MedPost Tagger, Metathesaurus, Semantic Network]
Input The aim of this study was the characterization of the specific effects of alprazolam versus imipramine in the treatment of panic disorder with agoraphobia and the delineation of dose-response and possible plasma level-response relationships.
Syntactic Processing [Diagram: Text → Lexical Look-up → Resolve Ambiguity → Parser. Resources shown: SPECIALIST Lexicon, MedPost Tagger]
Syntactic Processing The aim of this study was the characterization of the specific effects NP[of alprazolam] [versus] NP[imipramine] NP[in the treatment] (nominalization) NP[of panic disorder] NP[with Agoraphobia] and the delineation of dose-response and possible plasma level-response relationships.
MetaMap: Metathesaurus Concepts [Diagram: Text → Lexical Look-up → Resolve Ambiguity → Parser → MetaMap. Resources shown: SPECIALIST Lexicon, MedPost Tagger, Metathesaurus]
MetaMap: Metathesaurus Concepts The aim of this study was the characterization of the specific effects NP[of Alprazolam] [versus] NP[Imipramine] NP[in treatment] (nominalization) NP[of Panic Disorder] NP[with Agoraphobia] and the delineation of dose-response and possible plasma level-response relationships.
Semantic Types The aim of this study was the characterization of the specific effects NP[of phsu] [versus] NP[phsu] NP[in treatment] (nominalization) NP[of dsyn] NP[with dsyn] and the delineation of dose-response and possible plasma level-response relationships. • phsu: Pharmacologic Substance • dsyn: Disease or Syndrome
Construct Relation [Diagram: Medical Text → Lexical Look-up → Resolve Ambiguity → Parser → MetaMap → Construct Relation → Semantic Predication. Resources shown: SPECIALIST Lexicon, MedPost Tagger, Metathesaurus, Semantic Network]
Semantic Interpretation • Indicator rules • Establish a link between words and predicates in the Semantic Network • Argument identification rules • Syntactic constraints • Validation of semantic predications • Semantic Network
Semantic Network Predicates • associated_with • physically_related_to • spatially_related_to • temporally_related_to • conceptually_related_to • functionally_related_to • occurs_in • affects • brings_about
Core SemRep Predicates (uppercase) • associated_with • physically_related_to • spatially_related_to • LOCATION_OF • temporally_related_to • CO-OCCURS_WITH • conceptually_related_to • functionally_related_to • OCCURS_IN • affects • TREATS • PREVENTS • brings_about • CAUSES
Semantic Network Predication • Semantic types: Occupational Activity > Health Care Activity > Therapeutic or Preventive Procedure • Semantic types: Biologic Function > Pathologic Function > Disease or Syndrome • Relations: associated_with > functionally_related_to > affects > treats • Other relations shown: physically, spatially, temporally, and conceptually related_to; occurs_in; brings_about
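The relation hierarchy can be represented as a child-to-parent map, which makes it easy to ask which broader relations a predicate falls under. The sketch below covers only the relations named on these slides and assumes the nesting follows the standard Semantic Network hierarchy; `PARENT` and `ancestors` are illustrative names.

```python
# Child -> parent links for the relations named on the slides (an assumption
# about the nesting; the full Semantic Network contains many more relations).
PARENT = {
    "physically_related_to": "associated_with",
    "spatially_related_to": "associated_with",
    "temporally_related_to": "associated_with",
    "conceptually_related_to": "associated_with",
    "functionally_related_to": "associated_with",
    "occurs_in": "functionally_related_to",
    "affects": "functionally_related_to",
    "brings_about": "functionally_related_to",
    "treats": "affects",
}

def ancestors(relation):
    """Return the chain of broader relations, nearest first."""
    chain = []
    while relation in PARENT:
        relation = PARENT[relation]
        chain.append(relation)
    return chain

print(ancestors("treats"))
```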
Indicator Rules: Overview • Establish a correspondence between a syntactic item and a Semantic Network predicate
Item | Semantic Network | Structure
nominalization: treatment | TREATS | Drugs for the treatment of schizophrenia
preposition: in | TREATS | Hemofiltration in digoxin overdose
preposition: in | HAS_LOCATION | Severe infections in both feet
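Indicator rules of this kind amount to a lookup from a syntactic item to candidate predicates. The following is a minimal sketch using only the three rules shown on the slide; the `INDICATOR_RULES` table and the `candidate_predicates` helper are hypothetical names, not SemRep's implementation.

```python
# (syntactic category, lexical item, Semantic Network predicate)
INDICATOR_RULES = [
    ("nominalization", "treatment", "TREATS"),
    ("preposition", "in", "TREATS"),
    ("preposition", "in", "HAS_LOCATION"),
]

def candidate_predicates(category, item):
    """Return every predicate an indicator may signal, in rule order."""
    return [
        predicate
        for rule_category, rule_item, predicate in INDICATOR_RULES
        if rule_category == category and rule_item == item.lower()
    ]

print(candidate_predicates("nominalization", "treatment"))
print(candidate_predicates("preposition", "in"))
```

The preposition "in" yields two candidates, so the indicator alone does not settle the predicate; the semantic types of the arguments are needed to choose between them.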
Semantic Types The aim of this study was the characterization of the specific effects NP[of phsu] [versus] NP[phsu] NP[in treatment] (nominalization) NP[of dsyn] NP[with dsyn] and the delineation of dose-response and possible plasma level-response relationships. • phsu: Pharmacologic Substance • dsyn: Disease or Syndrome
Apply Indicator Rule The aim of this study was the characterization of the specific effects NP[of phsu] [versus] NP[phsu] NP[in treatment] (nominalization) NP[of dsyn] NP[with dsyn] and the delineation of dose-response and possible plasma level-response relationships. • Predicate: TREATS
Argument Constraints The aim of this study was the characterization of the specific effects NP[of phsu] [versus] NP[phsu] NP[in treatment] (nominalization) NP[of dsyn] NP[with dsyn] and the delineation of dose-response and possible plasma level-response relationships. • Predicate: TREATS
Semantic Network Predication • phsu-TREATS-dsyn • medd-TREATS-dsyn • topp-TREATS-dsyn • topp-TREATS-inpo • The aim of this study was the characterization of the specific effects NP[of phsu] [versus] NP[phsu] NP[in treatment] (nominalization) NP[of dsyn] NP[with dsyn] and the delineation of dose-response and possible plasma level-response relationships.
Match Semantic Types • phsu-TREATS-dsyn • medd-TREATS-dsyn • topp-TREATS-dsyn • topp-TREATS-inpo • The aim of this study was the characterization of the specific effects NP[of phsu] [versus] NP[phsu] NP[in treatment] (nominalization) NP[of dsyn] NP[with dsyn] and the delineation of dose-response and possible plasma level-response relationships.
Substitute Concepts The aim of this study was the characterization of the specific effects NP[of phsu] [versus] NP[Alprazolam] NP[in treatment] (nominalization) NP[of Panic Disorder] NP[with dsyn] and the delineation of dose-response and possible plasma level-response relationships. • Alprazolam-TREATS-Panic Disorder
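The type-matching and concept-substitution steps walked through above can be sketched as follows. This is a simplified illustration under stated assumptions: it treats every noun phrase to the left of the indicator as a possible subject and every one to the right as a possible object, and it omits the syntactic argument constraints, so it produces more candidates than the single predication shown on the slide. The names `ALLOWED`, `LEFT`, `RIGHT`, and `construct` are hypothetical.

```python
# Semantic Network triples listed on the slide: (subject type, predicate, object type).
ALLOWED = {
    ("phsu", "TREATS", "dsyn"),
    ("medd", "TREATS", "dsyn"),
    ("topp", "TREATS", "dsyn"),
    ("topp", "TREATS", "inpo"),
}

# Noun phrases as (Metathesaurus concept, semantic type), split at the
# indicator "treatment" (an assumption about how arguments are collected).
LEFT = [("Alprazolam", "phsu"), ("Imipramine", "phsu")]
RIGHT = [("Panic Disorder", "dsyn"), ("Agoraphobia", "dsyn")]

def construct(left, predicate, right, allowed=ALLOWED):
    """Keep only argument pairs whose semantic types the network licenses."""
    results = []
    for subject, subject_type in left:
        for obj, obj_type in right:
            if (subject_type, predicate, obj_type) in allowed:
                # Substitute the concept names for the matched types.
                results.append(f"{subject}-{predicate}-{obj}")
    return results

for predication in construct(LEFT, "TREATS", RIGHT):
    print(predication)
```

Argument identification rules would narrow this candidate list; the sketch shows only how the Semantic Network validates a proposed pair of types.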
Evaluation • Developing a test collection • 2,000 sentences from MEDLINE • Mainly drug therapies • TREATS, OCCURS_IN, LOCATION_OF, ISA • Preliminary results • TREATS: 49% recall, 78% precision • ISA: 83% precision
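For reference, precision and recall follow the usual definitions over true positives, false positives, and false negatives. The sketch below also combines the reported TREATS figures into an F1 score; that score is derived here for illustration and is not reported in the talk.

```python
def precision(true_pos, false_pos):
    """Fraction of extracted predications that are correct."""
    return true_pos / (true_pos + false_pos)

def recall(true_pos, false_neg):
    """Fraction of correct predications that were extracted."""
    return true_pos / (true_pos + false_neg)

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Reported preliminary TREATS results: 78% precision, 49% recall.
treats_f1 = f1(0.78, 0.49)
print(round(treats_f1, 2))
```

With these figures the harmonic mean comes to roughly 0.60, which reflects the gap between the comparatively high precision and the lower recall.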
Applications • Automatic summarization • Marcelo Fiszman • Machine translation • Halil Kilicoglu • Discovery • Information extraction in genomics • Bisharah Libbus • Question answering • Dina Demner-Fushman
Semantic Medline: Enhanced Information Management [Diagram components: UMLS, Medline, Semantic Processing, PubMed]
Automatic Summarization • PubMed search with query “migraine” • Retain 500 most recent citations • Process with SemRep • Summarize SemRep output • Condense list of predications • Visualize results (Halil Kilicoglu) • Translate summarized results (using MeSH)
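The "condense list of predications" step can be pictured as collapsing repeated predications and keeping the frequent ones. The talk does not say how condensing is done, so the frequency threshold below is an assumption, and the input predications are made-up examples rather than real SemRep output.

```python
from collections import Counter

# Made-up predications standing in for SemRep output (illustrative only).
raw = [
    ("Sumatriptan", "TREATS", "Migraine Disorders"),
    ("Sumatriptan", "TREATS", "Migraine Disorders"),
    ("Migraine Disorders", "CO-OCCURS_WITH", "Depressive disorder"),
    ("Sumatriptan", "TREATS", "Migraine Disorders"),
]

def condense(predications, min_count=1):
    """Collapse duplicates and keep predications seen at least min_count times."""
    counts = Counter(predications)
    return [(p, n) for p, n in counts.most_common() if n >= min_count]

summary = condense(raw, min_count=2)
for (subject, predicate, obj), n in summary:
    print(f"{subject} {predicate} {obj} ({n} occurrences)")
```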
Summarization for Discovery • Investigate “unexpected” connections • PubMed search with • sleep AND gastrointestinal • Run SemRep and summarize on • Gastroesophageal reflux disease • New PubMed search on “cpap AND gerd”
Marked improvement in nocturnal gastroesophageal reflux in a large cohort of patients with obstructive sleep apnea treated with continuous positive airway pressure
Semantic Network for NLP • Very useful as is • Issues noted while developing SemRep • Missing relations • Infelicitous relations • Semantic type hierarchy • Recommendations for development • Theory and practice • Incremental development • Maintenance
Missing Relations • Genomics • “Genes” CAUSE “Disease” • “Genes” INTERACT_WITH “Genes” • Treatment • “Intervention” TREAT “Patients” / “Organism” • Donepezil for patients with Alzheimer’s
Semantic Type Hierarchy • Groups and group members (organisms) • dsyn OCCURS_IN podg • Adults with acidosis • dsyn PROCESS_OF mamm • Dogs with acidosis