Explore the intricate world of Knowledge Organization Systems in biomedicine, including MeSH, UMLS, and associated tools. Learn about current translations of MeSH, UMLS applications, and how the UMLS benefits various industries beyond biomedicine.
CENDI Staff Workshop Knowledge Organization Systems: Current and Future Uses September 16, 2004 National Library of Medicine Betsy L. Humphreys Associate Director for Library Operations NLM, NIH, HHS blh@nlm.nih.gov
NLM “Knowledge Organization Systems” • Name and Series/Journal Authority Files • Library Materials Classification • Individual Controlled Vocabularies • MeSH, MedlinePlus Health Topics, NCBI Taxonomy, RxNorm clinical drug vocabulary • Unified Medical Language System (UMLS) Knowledge Sources • Metathesaurus – many vocabularies in a common, integrated format • Semantic Network • Lexicon • Associated tools
NLM “Knowledge Organization Systems” • Common Characteristics • Searchable on the Web, often interlinked with other NLM resources • Distributed in one or more electronic formats • Used within NLM for: • Information retrieval and display • Data creation • Natural language interpretation • Heavily used outside NLM for a wide range of applications • Most built and maintained with custom systems
Medical Subject Headings (MeSH) • Structure of MeSH upgraded in 2000 • Descriptor Class – closely related concepts grouped to enhance retrieval • Concept – distinct meaning • Term – concept name http://www.nlm.nih.gov/mesh/meshrels.html
Known Translations of MeSH • In UMLS - Dutch, Finnish, French, German, Italian, Japanese, Portuguese, Russian, Spanish, Swedish • Other Complete Translations • Arabic, Chinese, Czech, Greek, Thai, Turkish • In Progress or Planned or Hoped For • Korean, Slovenian, Vietnamese, Lithuanian, Polish, Slovakian, Norwegian, Kiswahili
Coordinating Translations: How? • Single Database - Web Interface • Add Language as a Term Property • Translated Terms added to Concept • Non-English Concepts added to Descriptor
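The Descriptor / Concept / Term structure, with language carried as a property of each Term, can be sketched roughly as follows. This is an illustrative model only, not NLM's actual schema: the class names, the `language` codes, and the sample headache terms are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    name: str
    language: str = "ENG"  # hypothetical language code stored on the term


@dataclass
class Concept:
    terms: List[Term] = field(default_factory=list)

    def add_translation(self, name: str, language: str) -> None:
        # A translated term joins the existing concept it names.
        self.terms.append(Term(name, language))


@dataclass
class Descriptor:
    heading: str
    concepts: List[Concept] = field(default_factory=list)

    def names_in(self, language: str) -> List[str]:
        # Collect every term name in one language across all concepts.
        return [t.name for c in self.concepts for t in c.terms
                if t.language == language]


# Illustrative data, not an actual MeSH record.
headache = Concept([Term("Headache")])
descriptor = Descriptor("Headache", [headache])
headache.add_translation("Kopfschmerz", "GER")

# A concept that exists only in a non-English language is attached
# directly to the descriptor.
descriptor.concepts.append(Concept([Term("Céphalée", "FRE")]))

german_names = descriptor.names_in("GER")
french_names = descriptor.names_in("FRE")
```

Keeping language on the term, rather than maintaining one database per language, is what lets a single descriptor record serve every translation group.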
Status of Use • Current Active Groups • German, French, Italian, Vietnamese • Groups Beginning Work with MTMS (MeSH Translation Maintenance System) • Dutch, Finnish, Japanese, Polish, Slovakian • Groups Starting Soon • Czech, Portuguese, Korean, Norwegian, Russian, Spanish
The UMLS in practice • Database • Series of relational files • Interfaces • Web interface: Knowledge Source Server (UMLSKS) • Application programming interfaces (Java and XML-based) • Applications • lvg (lexical programs) • MetamorphoSys (installation and customization) • SOON: Metathesaurus browser • Note: The UMLS is not an end-user application
UMLS 3 components • Metathesaurus • Concepts • Inter-concept relationships • Semantic Network • Semantic types • Semantic network relationships • Lexical resources • SPECIALIST Lexicon • Lexical tools
Metathesaurus Source Vocabularies (2004AB) • 134 source vocabularies • 126 contributing concept names • 73 families of vocabularies • multiple translations (e.g., MeSH, ICPC, ICD-10) • variants (American-English equivalents, Australian extension/adaptation) • subsequent editions usually considered distinct families (ICD: 9-10; DSM: IIIR-IV) • Broad coverage of biomedicine • Common presentation
Metathesaurus Concepts (2004AB) • Concept (> 1M) CUI • Set of synonymous concept names • Term (> 3.8M) LUI • Set of normalized names • String (> 4.3M) SUI • Distinct concept name • Atom (> 5.1M) AUI • Concept name in a given source
[Diagram: concept C0000001 • Term L0000001 • String S0000001: A0000001 headache (source 1), A0000002 headache (source 2) • String S0000002: A0000003 Headache (source 1), A0000004 Headache (source 2) • Term L0000002 • String S0000003: A0000005 Cephalgia (source 1)]
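The four identifier levels nest as Concept (CUI) > Term (LUI) > String (SUI) > Atom (AUI). A minimal sketch of that grouping, using the headache example from this slide, might look like the following. The tuple layout and function names are assumptions made for illustration and do not correspond to any UMLS API.

```python
from collections import defaultdict

# (CUI, LUI, SUI, AUI, string, source) rows from the slide's example.
ATOMS = [
    ("C0000001", "L0000001", "S0000001", "A0000001", "headache", "source 1"),
    ("C0000001", "L0000001", "S0000001", "A0000002", "headache", "source 2"),
    ("C0000001", "L0000001", "S0000002", "A0000003", "Headache", "source 1"),
    ("C0000001", "L0000001", "S0000002", "A0000004", "Headache", "source 2"),
    ("C0000001", "L0000002", "S0000003", "A0000005", "Cephalgia", "source 1"),
]


def build_hierarchy(atoms):
    """Nest atoms as tree[CUI][LUI][SUI] -> list of (AUI, string, source)."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for cui, lui, sui, aui, string, source in atoms:
        tree[cui][lui][sui].append((aui, string, source))
    return tree


def count_levels(atoms):
    """Count distinct identifiers at each level."""
    return {
        "concepts": len({a[0] for a in atoms}),
        "terms": len({a[1] for a in atoms}),
        "strings": len({a[2] for a in atoms}),
        "atoms": len({a[3] for a in atoms}),
    }


tree = build_hierarchy(ATOMS)
counts = count_levels(ATOMS)
```

In this example one concept holds two terms, three strings, and five atoms, which mirrors why the release-wide counts grow from concepts (> 1M) to atoms (> 5.1M).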
Metathesaurus Relationships • Symbolic relations: ~9M pairs of concepts • Statistical relations: ~7M pairs of concepts (co-occurring concepts) • Mapping relations: ~100,000 pairs of concepts • Categorization: Relationships between concepts and semantic types from the Semantic Network
Why you might care about the UMLS • Content with applicability outside of biomedicine • Tools generally useful in NLP and data mining • New Metathesaurus Rich Release Format • Potentially useful as a format for distributing any set of vocabularies/ontologies and for robust purpose-specific mappings between such systems • May well lead to development of a variety of tools that can output or ingest the format
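To give a sense of what a tool that ingests the Rich Release Format might do, here is a minimal sketch of reading one row of the concept-names file (MRCONSO.RRF). It assumes pipe-delimited fields with a trailing pipe and the column order documented for later UMLS releases; the exact columns in a given release should be checked against that release's documentation. The sample row is synthetic and reuses the placeholder identifiers from the headache example, and `SRC1` and `X1` are invented values.

```python
# Assumed MRCONSO.RRF column order (verify against the release in use).
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf_line(line, columns=MRCONSO_COLUMNS):
    """Split one pipe-delimited RRF row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields = fields[:-1]  # drop the empty piece after the trailing pipe
    if len(fields) != len(columns):
        raise ValueError(
            f"expected {len(columns)} fields, got {len(fields)}"
        )
    return dict(zip(columns, fields))


# Synthetic row for illustration only, not real Metathesaurus data.
SAMPLE = "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||SRC1|PT|X1|headache|0|N||\n"

row = parse_rrf_line(SAMPLE)
```

Because every row carries its own CUI, LUI, SUI, and AUI, a reader like this needs no source-specific logic, which is part of what makes the format a candidate for distributing vocabularies in other domains.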