
Abstraction Networks for Terminologies

Yehoshua Perl, Computer Science Dept., New Jersey Institute of Technology, Newark, NJ 07102 USA, yehoshua.perl@gmail.com. 09/12/12.


Presentation Transcript


  1. Abstraction Networks for Terminologies • Yehoshua Perl • Computer Science Dept. • New Jersey Institute of Technology • Newark, NJ 07102 USA • yehoshua.perl@gmail.com

  2. Overview • What are abstraction networks of terminologies? • Characteristics of the abstraction networks • Examples of abstraction networks derived for UMLS, SNOMED CT and the MED • Uses of abstraction networks in visual summarization, orientation, auditing and navigation of terminologies

  3. Motivation • Terminologies are playing major roles in healthcare information systems. • They are large, complex and difficult to maintain. • Graphical displays are needed for better orientation to aid terminology use and maintenance. • We have introduced abstraction networks as a way to support orientation.

  4. Nature of Abstraction Networks • Most terminologies have a network structure, with a backbone of IS-A relationships. • An abstraction network is a secondary network that provides a compact view of the structure and content of the primary terminology. [Figure: a terminology network and its abstraction network]

  7. Derivation of Abstraction Networks • Abstraction of a terminology is the process by which subsets of concepts are each replaced by a higher-level conceptual entity called a node. • These nodes are interconnected by child-of hierarchical relationships. [Figure: a terminology of concepts and its abstraction network of nodes; each node models a subset of concepts]
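As a minimal sketch of this derivation, assuming the assignment of concepts to nodes is already given, the Python below induces child-of links between nodes from the IS-A links of the terminology. The function name `derive_child_of`, the dictionary layout, the toy concepts, and the rule that every IS-A link crossing a node boundary yields a child-of link are illustrative assumptions, not the method of any specific abstraction network.

```python
def derive_child_of(node_of: dict[str, str],
                    is_a: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Induce child-of links (child_node, parent_node) between nodes.

    node_of: maps each concept to the node that models it (assumed disjoint).
    is_a:    (child_concept, parent_concept) IS-A links of the terminology.
    Assumption: any IS-A link whose endpoints lie in different nodes
    induces a child-of link between those nodes.
    """
    links = set()
    for child, parent in is_a:
        child_node, parent_node = node_of[child], node_of[parent]
        if child_node != parent_node:  # IS-A links inside one node are abstracted away
            links.add((child_node, parent_node))
    return links


# Toy terminology with hypothetical concepts c1..c4 grouped into nodes N1 and N2.
node_of = {"c1": "N1", "c2": "N1", "c3": "N2", "c4": "N2"}
is_a = [("c2", "c1"), ("c3", "c1"), ("c4", "c3")]
print(derive_child_of(node_of, is_a))  # {('N2', 'N1')}
```

Three IS-A links collapse into a single child-of link, which is the compaction an abstraction network aims for.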

  8. Abstraction Network Characteristics (1) • Three characteristics: • Disjointness • Derivation origin • Abstraction ratio • Disjointness: Does an abstraction network divide the underlying terminology into disjoint parts? [Figure: a disjoint abstraction network vs. an intersection abstraction network]

  9. Abstraction Network Characteristics (2) • Derivation origin: Are the nodes derived from the terminology (intrinsic) or are they formulated based on some external knowledge (extrinsic)? [Figure: intrinsic derivation vs. extrinsic derivation] • Abstraction ratio = (# concepts of terminology) / (# nodes of abstraction network)
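The abstraction ratio is a single division. The small Python helper below (a hypothetical name, not from the original work) reproduces two ratios quoted later in this presentation from their stated concept and node counts.

```python
def abstraction_ratio(num_concepts: int, num_nodes: int) -> float:
    """Abstraction ratio = # concepts of terminology / # nodes of abstraction network."""
    if num_nodes <= 0:
        raise ValueError("an abstraction network must have at least one node")
    return num_concepts / num_nodes


# Counts as quoted in the slides on the MED and on the SNOMED CT area taxonomy.
print(round(abstraction_ratio(43_000, 90)))  # 478  (MED, 1996 version)
print(round(abstraction_ratio(1_330, 23)))   # 58   (area taxonomy)
```

A higher ratio means a more compact summary; the price is that each node hides more internal detail.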

  10. An abstraction network is disjoint if each concept of the terminology is mapped to a unique node. An abstraction network is an intersection abstraction network if some concepts belong to multiple nodes. [Figure: an intersection abstraction network with nodes Anatomical Abnormality and Disease; example concept: Dynamic subaortic stenosis]
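Disjointness can be checked mechanically once each concept's node memberships are known. A minimal Python sketch, assuming memberships are stored as a dictionary from concept to a set of node names (a hypothetical layout with placeholder data):

```python
def is_disjoint(nodes_of: dict[str, set[str]]) -> bool:
    """True if every concept is mapped to exactly one node."""
    return all(len(nodes) == 1 for nodes in nodes_of.values())


def shared_concepts(nodes_of: dict[str, set[str]]) -> dict[str, set[str]]:
    """Concepts that belong to several nodes (what makes a network 'intersection')."""
    return {c: nodes for c, nodes in nodes_of.items() if len(nodes) > 1}


# Hypothetical memberships: c2 belongs to two nodes.
memberships = {"c1": {"A"}, "c2": {"A", "B"}, "c3": {"B"}}
print(is_disjoint(memberships))              # False
print(sorted(shared_concepts(memberships)))  # ['c2']
```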

  11. More on Orientation • An abstraction network offers a high-level view of the terminology for orientation into its content. • The orientation problem has two facets: • Orientation on the macro level to provide context for the content and structure of the whole terminology. • Orientation on the micro level into details of small portions of the terminology. • Without an orientation on the macro level, it is difficult to obtain an orientation on the micro level due to lack of context. • Abstraction networks provide macro level orientation.

  12. Example Abstraction Networks • We cover abstraction networks for some well-known terminological systems: • UMLS • SNOMED CT • MED • We describe the derivation for each example. • We categorize them according to the 3 characteristics above: disjointness, derivation origin and abstraction ratio.

  13. An Abstraction Network for the UMLS Metathesaurus • The two major knowledge sources of the UMLS: • Metathesaurus (META) • The Semantic Network (SN) • The META is a large repository of concepts compiled from more than 160 source vocabularies. • The 2011AB release of META comprises about 8.6 million terms mapped into more than 2.6 million concepts.

  14. Semantic Network Excerpt [Figure: excerpt of the SN hierarchy, including Entity, Event, Physical Object, Conceptual Entity, Phenomenon or Process, Anatomical Structure, Anatomical Abnormality, Congenital Abnormality, Acquired Abnormality, Biologic Function, Pathologic Function, Disease or Syndrome, Experimental Model of Disease, Mental or Behavioral Dysfunction, and Neoplastic Process]

  15. Semantic Network • SN consists of 133 semantic types (high-level categories). • The SN is organized through IS-A hierarchical relationships in two trees rooted at Entity and Event, respectively.

  16. Characteristics of the SN abstraction network • The SN is an extrinsic abstraction network for META, since it is not derived from META. • Each concept in META is assigned one or more of SN's semantic types. • Thus, SN is an intersection abstraction network, since a concept may be assigned multiple semantic types. • SN exhibits an abstraction ratio of about 19,500:1. • SN has been used in conjunction with the underlying META in a variety of applications. • A PubMed search for “Metathesaurus Semantic Network” returns 95 papers.

  17. In the SN intersection abstraction network, concepts with a single category have simple semantics. Concepts with multiple categories have compound semantics, elaborated by the respective combination of categories. Concepts with compound semantics are complex since they are both “a this and a that”. [Figure: Simple & Compound Semantics. Eyelid Diseases (Disease or Syndrome) and Deformity (Anatomical Abnormality) have simple semantics; Lacrimal Duct Obstruction, assigned both types, has compound semantics]

  18. Intersection of Semantic Types • The extent of a semantic type S is the set of concepts assigned S. • There are 73 concepts in the extent of Experimental Model of Disease (EMD). • Experimental Model of Disease has an intersection with Neoplastic Process (NP). [Figure: the extents of EMD and NP overlapping in EMD∩NP (26)]

  19. Non-Uniform Semantics • Within EMD’s extent, 26 concepts are both experimental models of disease and neoplastic processes, and 47 are only experimental models of disease. • This non-uniformity makes the extent of EMD difficult to comprehend. [Figure: EMD (47) and EMD ∩ NP (26)]
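The extent and its non-uniform split can be computed directly from semantic-type assignments. A minimal Python sketch over hypothetical toy data (three placeholder concepts, not real META content), assuming assignments are stored as a dictionary from concept to its set of semantic types:

```python
def extent(assignments: dict[str, set[str]], sem_type: str) -> set[str]:
    """All concepts assigned sem_type, alone or together with other types."""
    return {c for c, types in assignments.items() if sem_type in types}


def split_extent(assignments: dict[str, set[str]],
                 sem_type: str) -> tuple[set[str], set[str]]:
    """Split an extent into concepts assigned only sem_type and those with more types."""
    ext = extent(assignments, sem_type)
    only = {c for c in ext if assignments[c] == {sem_type}}
    return only, ext - only


# Toy assignments using the EMD / NP abbreviations from the slides.
assignments = {"m1": {"EMD"}, "m2": {"EMD", "NP"}, "m3": {"NP"}}
only, mixed = split_extent(assignments, "EMD")
print(sorted(only), sorted(mixed))  # ['m1'] ['m2']
```

On the real data, the two parts would correspond to the 47 and 26 concepts above.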

  20. Refined Semantic Network (RSN) • To address this non-uniformity, we introduced the “Refined Semantic Network” (“RSN”) [Gu, JAMIA 2000]. • RSN comprises two kinds of types: pure semantic types and intersection types. • The extent of a pure semantic type S is the subset of concepts assigned S, exclusively. • The pure semantic type Experimental Model of Disease is assigned to the 47 concepts.

  21. Intersection Types • An intersection type is a reification of a non-empty intersection of the extents of semantic types. • Example: the RSN contains an intersection type EMD ∩ NP with an extent of 26 concepts. [Figure: EMD, NP, and the intersection type EMD ∩ NP (26)]
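One way to sketch the RSN's two kinds of types is to group concepts by their exact set of assigned semantic types: a singleton set gives a pure semantic type, and a set of two or more gives an intersection type. The Python below is an illustrative reading of the definitions above, not the published RSN derivation [Gu, JAMIA 2000]; function names and data are hypothetical.

```python
from collections import defaultdict


def refine(assignments: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Group concepts by their exact set of semantic types (one group per RSN type)."""
    groups: dict[frozenset[str], set[str]] = defaultdict(set)
    for concept, types in assignments.items():
        groups[frozenset(types)].add(concept)
    return dict(groups)


def kind(rsn_type: frozenset[str]) -> str:
    """A singleton set is a pure semantic type; anything larger is an intersection type."""
    return "pure semantic type" if len(rsn_type) == 1 else "intersection type"


assignments = {"m1": {"EMD"}, "m2": {"EMD", "NP"}, "m3": {"NP"}, "m4": {"EMD"}}
for t, ext in sorted(refine(assignments).items(), key=lambda kv: sorted(kv[0])):
    print(" & ".join(sorted(t)), "|", kind(t), "|", sorted(ext))
# EMD | pure semantic type | ['m1', 'm4']
# EMD & NP | intersection type | ['m2']
# NP | pure semantic type | ['m3']
```

Because every concept lands in exactly one group, the result is disjoint, matching the RSN's characterization as a disjoint abstraction network.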

  22. Excerpt of the Refined Semantic Network [Figure: excerpt of the RSN showing semantic types such as Entity, Event, Phenomenon or Process, Natural Phenomenon or Process, Human-caused Phenomenon or Process, Physical Object, Biologic Function, Anatomical Structure, Pathologic Function, Experimental Model of Disease, Disease or Syndrome, Anatomical Abnormality, Acquired Abnormality, Congenital Abnormality, Mental or Behavioral Dysfunction, and Neoplastic Process, plus intersection semantic types such as Acquired Abnormality ∩ Disease or Syndrome, Anatomical Abnormality ∩ Disease or Syndrome, Congenital Abnormality ∩ Disease or Syndrome, Experimental Model of Disease ∩ Neoplastic Process, and Natural Phenomenon or Process ∩ Human-caused Phenomenon or Process]

  23. Characteristics of the RSN • The RSN is an intrinsic abstraction network derived automatically from the SN and its semantic-type assignments to the concepts of META. • The RSN is a disjoint abstraction network. • The RSN contains a total of 539 types: 406 intersection types and 133 semantic types. • The abstraction ratio is approximately 4,800:1.

  24. RSN Properties • The RSN hierarchy is a directed acyclic graph (DAG) due to the multiple parents of intersection types. • The RSN’s hierarchical depth is 11, compared to depth 9 for the SN. • In the description of the first version of the SN, McCray & Hole state: • “The current scope of the [Semantic] Network is quite broad, yet the depth is fairly shallow. • We expect to make future refinements and enhancements to the Network, based on actual use and experimentation.” • The introduction of the RSN abstraction network is a step in the planned direction.
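Because intersection types have several parents, the RSN depth is the longest path in a DAG rather than the height of a tree. A minimal Python sketch, assuming a parent map and counting depth as the number of types on the longest path from a root (one possible convention; the slides do not say which convention they use). The excerpt is hand-built for illustration, and the parents given to the intersection type are an assumption.

```python
from functools import lru_cache


def hierarchy_depth(parents: dict[str, list[str]]) -> int:
    """Levels on the longest root-to-type path of an acyclic IS-A hierarchy."""

    @lru_cache(maxsize=None)
    def level(t: str) -> int:
        ps = parents.get(t, [])
        # A root is level 1; any other type sits one below its deepest parent.
        return 1 if not ps else 1 + max(level(p) for p in ps)

    all_types = set(parents) | {p for ps in parents.values() for p in ps}
    return max(level(t) for t in all_types)


excerpt = {
    "Phenomenon or Process": ["Event"],
    "Natural Phenomenon or Process": ["Phenomenon or Process"],
    "Human-caused Phenomenon or Process": ["Phenomenon or Process"],
    # Assumed parents of an intersection type (a type with two parents):
    "Natural & Human-caused": ["Natural Phenomenon or Process",
                               "Human-caused Phenomenon or Process"],
}
print(hierarchy_depth(excerpt))  # 4
```

Memoizing `level` keeps the walk linear in the number of IS-A links even when many paths reconverge at intersection types.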

  25. Uses of RSN (1) • The RSN has proven to be an excellent vehicle for supporting UMLS auditing. • Intersection types with very small extents (1-6 concepts) proved to have a high likelihood of errors. • Structural group auditing was introduced for the extents of the RSN [Chen, JBI 2009, JAMIA 2011].
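Selecting intersection types with very small extents for review is a simple filter. The Python below is a hypothetical sketch: it assumes each RSN type is represented by the frozenset of its semantic types mapped to its extent, and it uses the 1-6 concept range quoted above as the default threshold. Type and concept names are placeholders.

```python
def audit_candidates(extents: dict[frozenset[str], set[str]],
                     max_size: int = 6) -> dict[frozenset[str], int]:
    """Intersection types (two or more semantic types) with 1..max_size concepts."""
    return {
        rsn_type: len(concepts)
        for rsn_type, concepts in extents.items()
        if len(rsn_type) >= 2 and 1 <= len(concepts) <= max_size
    }


extents = {
    frozenset({"A"}): {"c1", "c2"},                           # pure type: never flagged
    frozenset({"A", "B"}): {"c3"},                            # small intersection type
    frozenset({"B", "C"}): {f"c{i}" for i in range(10, 20)},  # 10 concepts: too large
}
print(list(audit_candidates(extents).values()))  # [1]
```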

  26. Uses of RSN (2) • The RSN can aid in efficient navigation of the content of META. • The “Chemical Specialty Semantic Network” is an abstraction network focused on the chemical concepts of the UMLS [Morrey, Cheminformatics 2012]. • The RSN framework supports accurate modeling of complex and conjugate chemicals [Chen, JAMIA 2009].

  27. Taxonomies for SNOMED CT • Three related kinds of taxonomies have been formulated as abstraction networks for description-logic-based (DL) terminologies. • They are the area taxonomy, the partial-area taxonomy, and the disjoint partial-area taxonomy. • Examples of DL terminologies: SNOMED CT and NCIt • The taxonomies are also applicable to similarly modeled terminologies: • Convergent Medical Terminology (CMT) of Kaiser Permanente • Enterprise Reference Terminology (ERT) of the VA

  28. Area Taxonomy • The nodes of the area taxonomy are derived from a partition of a terminology based on the relationships of its concepts. • Concepts with exactly the same set of relationships are grouped together into an area. • In the area taxonomy, each area is a node. [Figure: the area {morphology, topography} shown as a node with 3 concepts]
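Grouping concepts into areas amounts to keying each concept by the exact set of relationships it has. A minimal Python sketch, assuming that "same relationships" means the same set of relationship types (as in area labels such as {substance}) and that these have already been extracted per concept; the concept names are hypothetical.

```python
from collections import defaultdict


def derive_areas(rel_types: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Partition concepts into areas: one area per distinct set of relationship types."""
    areas: dict[frozenset[str], set[str]] = defaultdict(set)
    for concept, rels in rel_types.items():
        areas[frozenset(rels)].add(concept)
    return dict(areas)


# Hypothetical concepts and their relationship types.
rel_types = {
    "s1": {"substance"},
    "s2": {"substance"},
    "s3": {"morphology", "topography"},
    "s4": set(),  # no relationships: the area with the empty label
}
for area, concepts in sorted(derive_areas(rel_types).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(area), len(concepts))
# [] 1
# ['morphology', 'topography'] 1
# ['substance'] 2
```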

  29. Area Taxonomy for Specimen

  30. Area Taxonomy • The area taxonomy is disjoint since each concept has exactly one set of relationships and therefore belongs to exactly one area. • Areas are connected with links called child-of relationships. • A root is a top-level concept in an area whose parents all reside in other areas. • There can be multiple roots per area. [Figure: an IS-A link from a root in area A to a concept in area B yields a child-of link from area A to area B]
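The root and child-of definitions above translate into two short functions. A minimal Python sketch, assuming each concept's area is already known and IS-A parents are given as a dictionary; the toy hierarchy, area names, and function names are hypothetical.

```python
def area_roots(area_of: dict[str, str],
               parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Roots of each area: concepts whose IS-A parents all lie in other areas."""
    roots: dict[str, set[str]] = {}
    for concept, area in area_of.items():
        if all(area_of[p] != area for p in parents.get(concept, set())):
            roots.setdefault(area, set()).add(concept)
    return roots


def child_of_links(area_of: dict[str, str],
                   parents: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Area A is child-of area B whenever a root of A has an IS-A parent in B."""
    links = set()
    for area, roots in area_roots(area_of, parents).items():
        for root in roots:
            for p in parents.get(root, set()):
                links.add((area, area_of[p]))
    return links


area_of = {"top": "A0", "x": "A1", "y": "A1", "z": "A2"}
parents = {"x": {"top"}, "y": {"x"}, "z": {"x", "top"}}
print(sorted(child_of_links(area_of, parents)))
# [('A1', 'A0'), ('A2', 'A0'), ('A2', 'A1')]
```

Concept `y` is not a root because its parent `x` lies in the same area, so it contributes no child-of link.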

  31. Partial-Area Taxonomy • The partial-area taxonomy refines the area taxonomy by considering local hierarchical configurations within an area. • A partial-area is a division of an area consisting of a root with all its descendants in the area. • Each partial-area is a node within the area. • The partial-area taxonomy is not disjoint. [Figure: an area with roots A, B, and C divided into partial-areas A (4), B (6), and C (3)]
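A partial-area can be collected by walking down the IS-A hierarchy from a root without leaving the area. A minimal Python sketch with a hypothetical two-root area, assuming a child map is available; it also shows why partial-areas overlap when a concept descends from more than one root.

```python
def partial_areas(area_concepts: set[str], roots: set[str],
                  children: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each root to itself plus all of its descendants inside the area."""
    result: dict[str, set[str]] = {}
    for root in roots:
        members: set[str] = set()
        stack = [root]
        while stack:  # iterative depth-first walk
            c = stack.pop()
            if c in members:
                continue
            members.add(c)
            # Only follow IS-A children that stay inside the area.
            stack.extend(ch for ch in children.get(c, set()) if ch in area_concepts)
        result[root] = members
    return result


area = {"r1", "r2", "a", "b"}
children = {"r1": {"a"}, "r2": {"a"}, "a": {"b", "outside"}}
pas = partial_areas(area, {"r1", "r2"}, children)
print(sum(len(m) for m in pas.values()), len(area))  # 6 4
```

The partial-area sizes sum to 6 while the area holds only 4 concepts, because `a` and `b` are counted under both roots; this is the same effect described for the {substance} area.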

  32. Partial-Area Taxonomy

  33. Summary Visualization • A partial-area taxonomy refines the visualization of the area taxonomy. • For example, inside area {substance}, there are 11 white boxes, each with the name of the respective partial-area and its number of concepts. • Each partial-area is named after its root, which represents the overarching semantics of the group.

  34. Overlap of Partial Areas • The partial-area taxonomy provides a summarization of the 102 concepts that exhibit only the substance relationship. • The sum of the cardinalities of the four large partial-areas, 137, is greater than the cardinality of the entire area, 102. • This occurs due to the overlap among these four non-disjoint partial-areas.

  35. Auditing Small Partial Areas • In the partial-area taxonomy, we see many small partial-areas of one or two concepts. • As shown in [Halper, AMIA 2007], partial-areas with very few concepts have a higher likelihood of containing concepts in error. • The partial-area taxonomy visualization thus serves to enhance a quality-assurance framework.

  36. Overlaps of Partial Areas • Concepts in multiple partial-areas complicate the categorization of the partial-area taxonomy. • In a given partial-area, some concepts belong solely to that partial-area, elaborating the semantics of its root only; others belong to multiple partial-areas. • We get a partition of the concepts of an area into disjoint partial-areas with no overlaps. [Figure: an area partitioned into disjoint partial-areas A (3), B (5), C (3), and D (1)]

  37. Disjoint Partial-Area Taxonomy • A disjoint partial-area taxonomy is a refinement of the partial-area taxonomy. • The disjoint partial-areas are the nodes. • These nodes are connected via child-of links, in a manner similar to (but more complex than) that in a partial-area taxonomy. • The partitioning is carried out recursively due to the potential for “hierarchical tangling” within an area (see [Wang, JBI 2012]).
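A first step toward disjoint partial-areas is to group the concepts of an area by the exact set of partial-areas that contain them. The Python below is a simplified sketch under that assumption only: as noted above, the published method partitions recursively to handle hierarchical tangling [Wang, JBI 2012], whereas this sketch stops at the first level. Names and data are hypothetical.

```python
from collections import defaultdict


def overlap_groups(partials: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Partition an area by the exact set of partial-area roots each concept falls under."""
    membership: dict[str, set[str]] = defaultdict(set)
    for root, members in partials.items():
        for concept in members:
            membership[concept].add(root)
    groups: dict[frozenset[str], set[str]] = defaultdict(set)
    for concept, roots in membership.items():
        groups[frozenset(roots)].add(concept)
    return dict(groups)


partials = {"r1": {"r1", "a", "b"}, "r2": {"r2", "a", "b"}, "r3": {"r3"}}
for roots, concepts in sorted(overlap_groups(partials).items(),
                              key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(roots), sorted(concepts))
# ['r1'] ['r1']
# ['r2'] ['r2']
# ['r3'] ['r3']
# ['r1', 'r2'] ['a', 'b']
```

Unlike the partial-areas, these groups add up to exactly the number of concepts in the area (5 here), so no concept is counted twice.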

  38. Excerpt of the disjoint partial-area taxonomy for the {substance} area

  39. Better Orientation • This figure illustrates how the disjoint partial-area taxonomy supports orientation to the most tangled parts of a SNOMED CT hierarchy, such as area {substance} of the Specimen hierarchy. • Six color-coded overlapping partial-areas are on Level 1. • The overlaps among these six partial-areas are displayed using combinations of their color coding. • They are arranged in layers according to the number of overlapping partial-areas.

  40. Orientation into a Tangled Hierarchy • There are 7 disjoint partial-areas, with 30 concepts in total, that inherit from both partial-areas Body substance sample and Fluid sample. • The largest disjoint partial-area, Body fluid sample, has 15 concepts, which were previously counted twice: once with respect to Body substance sample (55) and once with respect to Fluid sample (44). • The other six disjoint partial-areas (on Level 3) are overlaps of three partial-areas, where Blood specimen (25) is the third; their 15 concepts were each counted three times in the partial-area taxonomy. • By arranging these 30 concepts into disjoint partial-areas, the figure gives a picture of their actual nature and grouping; the largest of the Level 3 disjoint partial-areas is Acellular blood (serum or plasma) specimen (9).

  41. Use in Auditing and Orientation • In [Wang, JBI 2012], such overlapping concepts were shown to have a statistically significantly higher ratio of errors. • This taxonomy yields insights into the modeling of tangled portions of a hierarchy that can lead to improvements.

  42. Characteristics of the Taxonomies • All three of these abstraction networks are intrinsic, as they are derived strictly from the terminology. • The area taxonomy and the disjoint partial-area taxonomy are disjoint; the partial-area taxonomy is not. • The abstraction ratios for the area taxonomy and the partial-area taxonomy are about 58 (= 1,330 / 23) and 3.27 (= 1,330 / 407), respectively. For the disjoint partial-area taxonomy, the ratio is about 2.73 (= 1,330 / 487).

  43. An Abstraction Network for the MED • In 2000, we presented an abstraction network for the Medical Entities Dictionary (MED) of Columbia. • The group of all concepts with the same set of properties (i.e., attributes and relationships) is represented by a node with the same attributes and relationships. [Figure: concepts a, b, and c, each with property x, represented by a single node a with property x]

  44. Root of a Node • A concept is a root of a given node if none of its parent concepts belongs to the node. • A child-of relationship is defined from node A to node B to reflect an IS-A relationship from the root concept of A to a concept in B. • A root names the node since it generalizes all its concepts. [Figure: example nodes and their root r]

  45. The MED Abstraction Network Has Two Kinds of Nodes • The first kind, called a property-introduction node, has a unique root for which new properties are defined. • The second kind, called an intersection node, has multiple parents from different nodes. • It inherits properties from each of its parents and thus has more properties than any single parent.
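The two kinds of nodes can be told apart by comparing a node's properties with the properties it inherits. A minimal Python sketch, assuming properties are modeled as sets of names and that a node's inherited properties are the union of its parent nodes' properties; the function name and example properties are hypothetical.

```python
def node_kind(props: frozenset[str], parent_props: list[frozenset[str]]) -> str:
    """Classify an abstraction-network node by comparing its properties to its parents'."""
    inherited = frozenset().union(*parent_props)
    if props - inherited:
        # The node's root defines properties that no parent node has.
        return "property-introduction"
    if len(parent_props) >= 2:
        # All properties are inherited, combined from several parent nodes.
        return "intersection"
    raise ValueError("a node must introduce properties or have multiple parents")


x, y = frozenset({"x"}), frozenset({"y"})
print(node_kind(x, []))          # property-introduction
print(node_kind(x | y, [x, y]))  # intersection
```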

  46. Excerpt from MED Abstraction Network [Figure: excerpt of the MED abstraction network, with nodes such as Medical Entity, Event Component, Anatomic Entity, Sampleable Entity, Measurable Entity, Diagnostic Procedure, Etiologic Agent, Disease or Syndrome, Laboratory or Test Result, ICD9 Element, Chemical, Antibiotics, Hypoglycemia, and Adrenal Calcification]

  47. Deriving the MED Abstraction Network • The abstraction network obtained is disjoint, since descendants of more than one property-introduction root are defined to be concepts of a unique intersection node. • A program to create such an abstraction network for a given terminology satisfying Cimino’s desiderata is given in [Liu, Distributed and Parallel Databases 1999].

  48. Properties of the MED Abstraction Network • For the MED, consisting of about 43,000 concepts (1996 version), the abstraction network contains 90 nodes: 53 introduction nodes and 37 intersection nodes. • For the InterMED (a small offshoot of the MED of about 2,800 concepts), an abstraction network of 28 nodes was derived. • The abstraction ratios for these two terminologies are 478:1 and 89:1, respectively. • The MED exhibits the characteristic of a unique introduction concept for each property. • Thus, the number of introduction nodes is bounded by the number of properties in the MED.

  49. Abstraction Network from MED Excerpt [Figure: abstraction network derived from an excerpt of the MED, with nodes such as Medical Entity, Diagnostic Procedure, Specimen, Pharmacy Item (Drug and Nondrug), Laboratory or Test Result, Disease or Syndrome, Antihistamine Drug, Heart Disease, Pancreatin, Allen Serum Amylase Measurement, and Calcified Pericardium]

  50. Excerpt from MED [Figure: excerpt of the MED concept hierarchy, from Medical Entity down to concepts such as Serum Amylase Test, Benadryl 25 MG Cap, Pancreatin, Allen Serum Amylase Measurement, and Calcified Pericardium]
