
Biomedical Named Entity Recognition

Biomedical Named Entity Recognition. Ramakanth Kavuluru. NLP Seminar – 8/21/2012.


Presentation Transcript


  1. Biomedical Named Entity Recognition • Ramakanth Kavuluru • NLP Seminar – 8/21/2012

  2. What are named entities? • The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes. • Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells

  3. What are named entities? • The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes. • Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells • Entity type labels shown on the slide: Drug, Biologically Active Substance, Disorder, Enzyme, Organic Chemical, Cell

  4. What are named entities? • The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes. • Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells • Labels shown on the slide: Drug, Cholesterol lowering drugs, Biological Function

  5. Why do we need to extract them? • To provide effective semantic search • Clinical Trial Recruitment: Find all discharge summaries of patients that have a history of diabetes and obesity and have taken statins as part of their treatment. • Literature Review: Find all biomedical articles that discuss the dopamine neurotransmitter in the context of depressive disorders.

  6. Why do we need to extract them? • To use as features in machine learning for effective text classification • To build semantic clusters of textual documents to understand evolving themes • Reduce noise by avoiding key words that are not indicative of the classes or clusters • Recently, as a first step in relation extraction and hence in knowledge discovery

  7. A major task in text mining • Extract information from textual data • Use this information to solve problems • What type of information? • Relevant concepts – a medical condition or finding, a drug, a gene or protein, an emotion (hope, love, …) • Relevant (binary) relations – drug TREATS a condition, protein CAUSES a disease • What are the typical questions? • Does a pathology report indicate a reportable case? • Which patients satisfy the criteria for a clinical trial?

  8. Knowledge Discovery • VIP Peptide – increases – Catecholamine Biosynthesis • Catecholamines – induce – β-adrenergic receptor activity • β-adrenergic receptors – are involved – fear conditioning • VIP Peptide – affects – fear conditioning? • Labels shown on the slide: In Cattle, In Rats, In Humans
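
The chain on the Knowledge Discovery slide can be sketched as a small graph search over extracted relations. This is only an illustration: the `CONCEPT` table that maps surface mentions to shared concept keys is hand-made and hypothetical (it stands in for the NER and normalization step), and the rule "any indirect path is a candidate hypothesis" is an assumption, not a method stated in the slides.

```python
from collections import defaultdict

# Relations copied from the slide.
TRIPLES = [
    ("VIP Peptide", "increases", "Catecholamine Biosynthesis"),
    ("Catecholamines", "induce", "β-adrenergic receptor activity"),
    ("β-adrenergic receptors", "are involved", "fear conditioning"),
]

# Hypothetical normalization table: without it the mentions would not
# line up as strings, which is why entity recognition comes first.
CONCEPT = {
    "VIP Peptide": "vip_peptide",
    "Catecholamine Biosynthesis": "catecholamine",
    "Catecholamines": "catecholamine",
    "β-adrenergic receptor activity": "beta_adrenergic_receptor",
    "β-adrenergic receptors": "beta_adrenergic_receptor",
    "fear conditioning": "fear_conditioning",
}

def candidate_links(triples, concept):
    """Return (start, end) pairs connected only through intermediate concepts."""
    graph = defaultdict(set)
    for subj, _pred, obj in triples:
        graph[concept[subj]].add(concept[obj])
    direct = {(s, o) for s, objs in graph.items() for o in objs}

    hypotheses = set()
    for start in list(graph):
        stack, seen = list(graph[start]), set()
        while stack:  # depth-first walk over everything reachable from start
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
        for node in seen:
            if node != start and (start, node) not in direct:
                hypotheses.add((start, node))
    return hypotheses

links = candidate_links(TRIPLES, CONCEPT)
print(("vip_peptide", "fear_conditioning") in links)
```

The pair `("vip_peptide", "fear_conditioning")` is the question the slide raises; whether such a link holds is a matter for experiments, not for the graph.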

  9. Clinical NER

  10. Why is NER Hard?

  11. Linguistic Variation • Derivational variation: cranial, cranium • Inflectional variation: coughed, coughing • Synonymy • Neurofibromin 2, merlin, NF2 protein, and schwannomin • Addison’s disease, adrenal insufficiency, hypocortisolism, bronzed disease • Feeding problems in newborn – The mother said she was having trouble feeding the baby.

  12. Polysemy • Merlin – both a bird and protein in UMLS • Discharge • Patient was prescribed codeine upon discharge • The discharge was yellow and purulent • Abbreviations • APC: Activated protein C, Adenomatosis polyposis coli, antigen presenting cell, aerobic plate count, advanced pancreatic cancer, age period cohort, antibody producing cells, atrial premature complex

  13. Negation • Nearly half of all clinical concepts in dictated narratives are negated • There is no maxillary sinus tenderness • Implied absence without negation • Lungs are clear upon auscultation, so: • Rales: Absent • Rhonchi: Absent • Wheezing: Absent
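
A minimal sketch of explicit negation detection, assuming a simple trigger-word rule: a concept counts as negated if one of a few trigger words appears within a fixed window before it. The trigger list and the window size are illustrative assumptions, not values from the slides, and the rule does nothing for implied absence such as "Lungs are clear upon auscultation".

```python
import re

NEGATION_TRIGGERS = ("no", "not", "without", "denies")  # assumed, deliberately small
WINDOW = 5  # assumed number of tokens inspected before the concept

def is_negated(sentence, concept):
    """True if `concept` occurs in `sentence` with a trigger word shortly before it."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    concept_tokens = re.findall(r"[a-z]+", concept.lower())
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            preceding = tokens[max(0, i - WINDOW):i]
            return any(t in NEGATION_TRIGGERS for t in preceding)
    return False  # concept not mentioned at all

print(is_negated("There is no maxillary sinus tenderness", "maxillary sinus tenderness"))
print(is_negated("The discharge was yellow and purulent", "discharge"))
print(is_negated("Lungs are clear upon auscultation", "rales"))
```

The last call returns `False` because "rales" never appears in the sentence, which is exactly the implied-absence case the slide points out: it needs domain knowledge rather than a trigger list.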

  14. Controlled Terminologies • Controlled vocabularies or taxonomies • Gene Ontology (gene products) • most cited, 450 per year in PubMed • Total of 33000+ terms • SNOMED CT (about 300K+ concepts) • NCI Thesaurus, ICD-9/10, ICD-O-3, LOINC, MedlinePlus • UMLS Metathesaurus (integration of 140+ vocabularies) • 2.3 million concepts

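  15. More Metathesaurus • CUIs (Concept Unique Identifiers) • LUIs (Lexical, or term, Unique Identifiers) • SUIs (String Unique Identifiers) • AUIs (Atom Unique Identifiers)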
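
The four identifier levels nest: an atom (AUI) is one occurrence of a string in one source vocabulary, a string (SUI) is a distinct spelling, a term (LUI) groups lexical variants of a string, and a concept (CUI) groups terms with the same meaning. The sketch below shows that nesting with made-up data. All identifiers and vocabulary names are placeholders, and grouping these synonyms (taken from the Linguistic Variation slide) under one concept is an assumption for illustration, not a lookup in the real Metathesaurus.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    aui: str     # one occurrence of a string in one source vocabulary
    sui: str     # the exact string
    lui: str     # the term: lexical variants share a LUI
    cui: str     # the concept: synonymous terms share a CUI
    source: str
    text: str

# Placeholder identifiers and sources, not real UMLS values.
ATOMS = [
    Atom("A-demo-1", "S-demo-1", "L-demo-1", "C-demo-1", "VOCAB_A", "NF2 protein"),
    Atom("A-demo-2", "S-demo-1", "L-demo-1", "C-demo-1", "VOCAB_B", "NF2 protein"),
    Atom("A-demo-3", "S-demo-2", "L-demo-1", "C-demo-1", "VOCAB_A", "NF2 proteins"),
    Atom("A-demo-4", "S-demo-3", "L-demo-2", "C-demo-1", "VOCAB_B", "merlin"),
]

def summarize(atoms):
    """Count distinct identifiers at each level of the hierarchy."""
    return {
        "concepts": len({a.cui for a in atoms}),
        "terms": len({a.lui for a in atoms}),
        "strings": len({a.sui for a in atoms}),
        "atoms": len({a.aui for a in atoms}),
    }

print(summarize(ATOMS))
```

The counts grow from one concept to four atoms, which is the general shape of the hierarchy: many atoms per string, possibly several strings per term, and several terms per concept.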

  16. Semantic Types and Relations • NLM Semantic Network, the type system behind UMLS Metathesaurus • Semantic Types (135) • Semantic Groups (15) • Semantic Relations (54) • Specialist Lexicon • Malaria, malarial • Hyperplasia, hyperplastic • How do we extract named entities?

  17. MetaMap from NLM • Identify phrases: use the SPECIALIST parser • Map to CUIs: use the SPECIALIST Lexicon, Metathesaurus, and Semantic Network

  18. Output of syntactic analysis • Syntactic Analysis – “ocular complications of myasthenia gravis” • Ocular (adj), complications (noun), of (prep), myasthenia (noun), gravis (noun) • gives noun phrases (NP): “Ocular complications” and “Myasthenia gravis” • Prepositions are ignored • In a given NP, you have a head and modifiers: • Ocular (mod) and complications (head) • How about “male pattern baldness”?
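
A rough sketch of the phrase-splitting step, assuming the input is already part-of-speech tagged as on the slide. Two simplifications are assumptions of this sketch and not the behaviour of the SPECIALIST parser: noun phrases are cut at every preposition, and the last word of a phrase is taken as the head. A real parser may treat a multiword term such as "myasthenia gravis" as a single unit, so only the first phrase is analysed here.

```python
# Tagged tokens as listed on the slide.
TAGGED = [
    ("ocular", "adj"),
    ("complications", "noun"),
    ("of", "prep"),
    ("myasthenia", "noun"),
    ("gravis", "noun"),
]

def split_noun_phrases(tagged):
    """Cut the token sequence at prepositions, dropping the prepositions."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag == "prep":
            if current:
                phrases.append(current)
            current = []
        else:
            current.append((word, tag))
    if current:
        phrases.append(current)
    return phrases

def head_and_modifiers(phrase):
    """Naive rule: the last word is the head, everything before it modifies it."""
    words = [word for word, _tag in phrase]
    return words[-1], words[:-1]

phrases = split_noun_phrases(TAGGED)
head, modifiers = head_and_modifiers(phrases[0])
print(head, modifiers)
```

Under the same naive rule, "male pattern baldness" would get the head "baldness" with modifiers "male" and "pattern"; whether that is the right analysis is the question the slide leaves open.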

  19. Variant Generation

  20. Variant Generation

  21. Candidate identification • Look for all variants in Metathesaurus strings and identify those candidate concepts (CUIs) that contain at least one variant as a substring • Example: For ocular complication, obtain all Metathesaurus strings that contain any of the following as substrings • Optic complication • Eyes complication • Ophthalmic complicated • ….
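
Candidate identification as described on this slide can be sketched as a case-insensitive substring scan. The string table below is a toy stand-in for the Metathesaurus: its identifiers and strings are invented for the example, and a real system would use an index rather than a linear scan.

```python
# Hypothetical concept-to-strings table (placeholder CUIs, invented strings).
TOY_STRINGS = {
    "C-demo-1": ["Ocular complication", "Eyes complication of disease"],
    "C-demo-2": ["Optic complication, late"],
    "C-demo-3": ["Myasthenia gravis"],
}

# Variants of the phrase "ocular complication", following the slide's example.
VARIANTS = [
    "ocular complication",
    "optic complication",
    "eyes complication",
    "ophthalmic complicated",
]

def find_candidates(variants, strings_by_cui):
    """Return {cui: matching strings} for concepts whose strings contain a variant."""
    lowered = [v.lower() for v in variants]
    hits = {}
    for cui, strings in strings_by_cui.items():
        matched = [s for s in strings if any(v in s.lower() for v in lowered)]
        if matched:
            hits[cui] = matched
    return hits

candidates = find_candidates(VARIANTS, TOY_STRINGS)
print(sorted(candidates))
```

The result is deliberately over-inclusive: every concept with any matching string becomes a candidate, and ranking them is left to the evaluation step.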

  22. Mapping and Evaluation • So now we have a bunch of candidate CUIs based on presence of variants of the given phrase in Metathesaurus strings. How do we select the best candidate? • Use several measures to compute a rank • Centrality (involvement of head) • Variation (average of inverse distance scores) • Coverage • Cohesiveness

  23. Final Score
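
The transcript does not preserve the formula from this slide. As a hedged sketch, the four measures from the Mapping and Evaluation slide can be combined as a weighted average. The weights used here (coverage and cohesiveness counted twice, result scaled to 0..1000) follow my understanding of the published MetaMap description and should be treated as an assumption, as should the requirement that each measure is already scaled to the range 0..1.

```python
def final_score(centrality, variation, coverage, cohesiveness):
    """Weighted average of the four measures, scaled to an integer in 0..1000.

    Assumed weights: centrality 1, variation 1, coverage 2, cohesiveness 2.
    """
    for name, value in (
        ("centrality", centrality),
        ("variation", variation),
        ("coverage", coverage),
        ("cohesiveness", cohesiveness),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    weighted = (centrality + variation + 2 * coverage + 2 * cohesiveness) / 6
    return round(1000 * weighted)

print(final_score(1.0, 1.0, 0.5, 0.5))
```

Candidates would then be ranked by this score, highest first; a perfect match on all four measures gives 1000.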

  24. MetaMap Options • Types of variants: include or exclude derivational variants • Word sense disambiguation • Discharge (bodily secretion vs. releasing the patient) • Concept gaps • Obstructive apnea mapping to “obstructive sleep apnea” or “obstructive neonatal apnea” • Term processing • Process the input string as a single concept, that is, don’t split it into noun phrases

  25. Output options • Human readable format • XML format • Restrictions based on certain vocabularies: consider only ICD-9 • Restrictions based on certain types: consider only pharmacological substances (i.e., drugs) DEMO TIME: Daniel Harris

  26. References • An Overview of MetaMap: Historical Perspective and Recent Advances, Alan Aronson and Francois Lang • Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program, Alan Aronson • Comparison of LVG and MetaMap Functionality, Alan Aronson • Lexical, Terminological, and Ontological Resources for Biological Text Mining, Olivier Bodenreider
