Knowledge-Based Integration of Neuroscience Data Sources



Presentation Transcript


  1. Knowledge-Based Integration of Neuroscience Data Sources
     Amarnath Gupta, Bertram Ludäscher, Maryann Martone
     University of California San Diego

  2. A Standard Information Mediation Framework
     [Diagram labels, top to bottom: Client Query; Integrated XML View; Mediator (with View Definition); XML View (×3); Wrapper (×2); Data Source, XML Data Source, Data Source]

  3. A Neuroscience Question
     Cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
     [Diagram labels, top to bottom: Integrated View; Mediator (with View Definition); Wrapper (×4); sources: WWW (CaBP, Expasy), protein localization, morphometry, neurotransmission]

  4. Integration Issues
     • Structural Heterogeneity
       • Resolved by converting to a common semistructured data model
     • Heterogeneity in Query Capabilities
       • Resolved by writing wrappers with binding patterns and other capability-definition languages
     • Semantic Heterogeneity
       • Schema conflicts
         • Partially resolved by mapping rules in the mediator
       • Hidden Semantics?

  5. Hidden Semantics: Protein Localization
     [Image labels: Purkinje Cell layer of Cerebellar Cortex; Molecular layer of Cerebellar Cortex; Fragment of dendrite]
     <protein_localization>
       <neuron type="purkinje cell" />
       <protein channel="red">
         <name>RyR</>
         ….
       </protein>
       <region h_grid_pos="1" v_grid_pos="A">
         <density>
           <structure fraction="0.8">
             <name>spine</>
             <amount name="RyR">0</>
           </>
           <structure fraction="0.2">
             <name>branchlet</>
             <amount name="RyR">30</>
           </>

  6. Hidden Semantics: Morphometry
     [Annotations: Branch level beyond 4 is a branchlet; Must be dendritic because Purkinje cells don't have somatic spines]
     <neuron name="purkinje cell">
       <branch level="10">
         <shaft> … </shaft>
         <spine number="1">
           <attachment x="5.3" y="-3.2" z="8.7" />
           <length>12.348</>
           <min_section>1.93</>
           <max_section>4.47</>
           <surface_area>9.884</>
           <volume>7.930</>
           <head>
             <width>4.47</>
             <length>1.79</>
           </head>
         </spine>
         …
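The two annotations on this slide are rules that the data itself never states. A minimal Python sketch of applying them is given here; it is an illustration, not the KIND implementation. The XML string is a well-formed stand-in for the slide's fragment (full closing tags, and a second branch at level 3, are assumptions added for the example), and `annotate_spines` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the slide's fragment. The closing tags and the
# second branch (level 3) are assumptions added for illustration only.
MORPHOMETRY = """
<neuron name="purkinje cell">
  <branch level="10">
    <spine number="1"><length>12.348</length></spine>
  </branch>
  <branch level="3">
    <spine number="2"><length>9.1</length></spine>
  </branch>
</neuron>
"""

BRANCHLET_MIN_LEVEL = 4  # slide annotation: "branch level beyond 4 is a branchlet"


def annotate_spines(xml_text):
    """Attach the hidden semantics (structure, compartment) to each spine."""
    neuron = ET.fromstring(xml_text)
    # Slide annotation: Purkinje cells have no somatic spines, so any spine
    # recorded for a Purkinje cell must sit on a dendrite.
    is_purkinje = neuron.get("name") == "purkinje cell"
    annotated = []
    for branch in neuron.iter("branch"):
        level = int(branch.get("level"))
        structure = "branchlet" if level > BRANCHLET_MIN_LEVEL else "branch"
        for spine in branch.iter("spine"):
            annotated.append({
                "spine": int(spine.get("number")),
                "structure": structure,
                "compartment": "dendrite" if is_purkinje else "unknown",
            })
    return annotated
```

With these annotations, a spine from the morphometry source can be matched to the "branchlet" and "spine" structure names used by the protein localization source.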

  7. The Problem
     • Multiple Worlds Integration
       • compatible terms not directly joinable
       • complex, indirect associations among schema elements
       • unstated integrity constraints
     • Why not use ontologies?
       • typical ontologies associate terms along a limited number of dimensions
     • What's needed
       • a "theory" under which non-identical terms can be "semantically" joined

  8. Our Approach
     • Modify the standard Mediation Architecture
       • Wrapper
         • Extend to encode an object-version of the structure schema
       • Mediator
         • Redesign to incorporate auxiliary knowledge sources to
           • Correlate object schema of sources
           • Define additional objects not specified but derivable from sources
     • At the Mediator
       • Use a logic engine to
         • Encode the mapping rules between sources
         • Define integrated views using a combination of exported objects from sources and the auxiliary knowledge sources
         • Perform query decomposition
     • We still use the Global-as-View form of mediation

  9. The KIND Architecture
     [Diagram labels: Integrated User View; View Definition Rules; Logic Engine; Integration Logic; Auxiliary Knowledge Source 1; Auxiliary Knowledge Source 2; Schema of Registered Sources; Materialized Views; Object Wrapper (×2); Structure Wrapper (×2); Src 1; Src 2]

  10. The Knowledge-Base
     • Situate every data object in its anatomical context
       • An illustration
     • New data is registered with the knowledge-base
     • Insertion of new data reconciles the current knowledge-base with the new information by:
       • Indexing the data with the source as part of registration
       • Extending the knowledge-base
       • Creating new views with complex rules to encode additional domain knowledge

  11. F-Logic for the Mediation Engine
     • Why F-Logic?
       • Provides the power of Datalog (with negation) and object creation through Skolem IDs
       • Correct amount of "notational sugar" and rules to provide object-oriented abstraction
       • Schema-level reasoning
       • Expressing variable arity
     • F-Logic in KIND
       • Source schema wrapped into F-Logic schema
       • Knowledge-sources programmed in F-Logic
       • Definition of Integrated Views

  12. Wrapping into Logic Objects
     • Automated Part
         <!ELEMENT Studies (Study)*>
         <!ELEMENT Study (study_id, … animal, experiments, experimenters)>
         <!ELEMENT experiments (experiment)*>
         <!ELEMENT experiment (description, instrument, parameters)>

         studyDB[studies =>> study].
         study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
         …
     • Non-automated Part
       • Subclasses
           mushroom_spine::spine
       • Rules
           S:mushroom_spine IF S:spine[head -> _; neck -> _].
       • Integrity Constraints
           ic1(S):alert[type -> "invalid spine"; object -> S] IF S:spine[undef ->> {head, neck}].
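The subclass rule and the integrity constraint on this slide can be read procedurally. The following is a minimal Python sketch of that reading, not the F-Logic evaluation used in KIND. It assumes a spine is a dict with an `id` and optional `head` and `neck` entries; that representation and the name `classify_spine` are hypothetical.

```python
def classify_spine(spine):
    """Return (classes, alerts) for one spine record.

    Mirrors the two slide rules under an assumed dict representation:
    - a spine with both a head and a neck is also a mushroom_spine;
    - a spine with neither head nor neck raises an "invalid spine" alert.
    """
    classes = {"spine"}
    alerts = []
    has_head = spine.get("head") is not None
    has_neck = spine.get("neck") is not None
    if has_head and has_neck:
        # S:mushroom_spine IF S:spine[head -> _; neck -> _].
        classes.add("mushroom_spine")
    if not has_head and not has_neck:
        # ic1(S):alert[...] IF S:spine[undef ->> {head, neck}].
        alerts.append({"type": "invalid spine", "object": spine["id"]})
    return classes, alerts
```

A spine that has only a head (or only a neck) stays a plain `spine` and raises no alert, since the constraint fires only when both parts are undefined.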

  13. Computing with Auxiliary Sources
     • Creating Mediated Classes
       • Union view:
           animal[M => R] IF S:source, S.animal[M => R].
           animal[taxon => 'TAXON'.taxon].
       • Association rule:
           X[taxon -> T] IF X:'PROLAB'.animal[name -> N], words(N,[W1,W2|_]), T:'TAXON'.taxon[genus -> W1; species -> W2].
     • Reasoning with Schema
       • Schema at Mediator:
           taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].
           subspecies::species::genus:: … kingdom::superkingdom
       • Class creation by schema reasoning:
           T:TR, TR::TR1 IF T:'TAXON'.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
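The association rule on this slide joins an animal's free-text name to a taxon by taking the first two words of the name as genus and species. A minimal Python sketch of that join follows; the dict layout, the sample records, and the name `link_taxa` are assumptions made for illustration.

```python
def link_taxa(animals, taxa):
    """Map each animal name to the taxon whose (genus, species) equals
    the first two words of the name; unmatched animals are left out."""
    by_genus_species = {(t["genus"], t["species"]): t for t in taxa}
    links = {}
    for animal in animals:
        words = animal["name"].split()
        if len(words) < 2:
            continue  # words(N, [W1, W2 | _]) needs at least two words
        taxon = by_genus_species.get((words[0], words[1]))
        if taxon is not None:
            links[animal["name"]] = taxon
    return links


# Hypothetical sample records, standing in for PROLAB animals and TAXON taxa.
ANIMALS = [
    {"name": "Rattus norvegicus Sprague-Dawley"},
    {"name": "unlabelled"},
]
TAXA = [
    {"genus": "Rattus", "species": "norvegicus"},
    {"genus": "Mus", "species": "musculus"},
]
```

Like the rule it mirrors, the match is exact and case-sensitive, so names that abbreviate the genus would not join under this sketch.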

  14. Integrated View Definition
     • Views are defined between sources and knowledge base
     • Example: protein_distribution
       • given: organism, protein, brain_region
       • KB Anatom:
         • recursively traverse the has_a paths under brain_region and collect all anatomical_entities
       • Source PROLAB:
         • join with anatomical structures and collect the value of attribute "image.segments.features.feature.protein_amount" where "image.segments.features.feature.protein_name" = protein and "study_db.study.animal.name" = organism
       • Mediator:
         • aggregate over all parents up to brain_region
         • report distribution
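The three steps of the protein_distribution view (traverse has_a, join with the source, aggregate upward) can be sketched in Python as follows. This is an illustrative reading only: the `HAS_A` fragment, the flat `MEASUREMENTS` records, and their amounts are hypothetical sample data, and the sketch assumes the has_a relation is a tree (no shared children, no cycles).

```python
# Hypothetical has_a fragment of the anatomical knowledge base (assumed tree).
HAS_A = {
    "cerebellum": ["cerebellar cortex"],
    "cerebellar cortex": ["purkinje cell layer", "molecular layer"],
    "purkinje cell layer": [],
    "molecular layer": [],
}

# Hypothetical flattened source records; the amounts are made up.
MEASUREMENTS = [
    {"organism": "rat", "protein": "RyR", "structure": "molecular layer", "amount": 30.0},
    {"organism": "rat", "protein": "RyR", "structure": "purkinje cell layer", "amount": 12.0},
    {"organism": "mouse", "protein": "RyR", "structure": "molecular layer", "amount": 5.0},
]


def descendants(region, has_a):
    """Return region plus everything reachable from it along has_a."""
    found = [region]
    for child in has_a.get(region, []):
        found.extend(descendants(child, has_a))
    return found


def protein_distribution(organism, protein, brain_region, has_a, measurements):
    """Total protein amount per anatomical entity under brain_region."""
    # Source step: amounts recorded directly at each structure.
    direct = {}
    for m in measurements:
        if m["organism"] == organism and m["protein"] == protein:
            direct[m["structure"]] = direct.get(m["structure"], 0.0) + m["amount"]
    # Mediator step: each entity's total includes all of its has_a descendants.
    return {
        entity: sum(direct.get(d, 0.0) for d in descendants(entity, has_a))
        for entity in descendants(brain_region, has_a)
    }
```

If the has_a relation were a general graph rather than a tree, the traversal would need a visited set to avoid counting a shared structure twice.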

  15. Query Evaluation Example
     • protein distribution of Human NCS-1 homologue
       • from wrapped CaBP website:
         • get the amino acid sequence for human NCS-1
       • from wrapped Expasy website:
         • submit amino acid sequence, get ranked homologues
       • at Mediator:
         • select homologues H found in rat with homology > 0.70
       • at Mediator:
         • for each h in H
           • from previous view:
             • protein_distribution(rat, h, cerebellum, distribution)
         • Construct the result as a second integrated view
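The evaluation plan on this slide is a short pipeline: fetch a sequence, rank homologues, filter at the mediator, then call the earlier view once per surviving homologue. A minimal Python sketch of that control flow follows. The three `fake_*` functions are hypothetical stand-ins for the wrapped CaBP site, the wrapped Expasy site, and the protein_distribution view; their names, return shapes, and values are assumptions, not real APIs or real data.

```python
def homologue_distributions(get_sequence, rank_homologues, protein_distribution,
                            organism="rat", region="cerebellum", threshold=0.70):
    """Run the slide's plan and return {homologue name: distribution}."""
    sequence = get_sequence("human NCS-1")   # wrapped CaBP step
    ranked = rank_homologues(sequence)       # wrapped Expasy step
    # Mediator step: keep homologues found in the organism above the threshold.
    selected = [h for h in ranked
                if h["organism"] == organism and h["homology"] > threshold]
    # Mediator step: evaluate the previously defined view per homologue.
    return {h["protein"]: protein_distribution(organism, h["protein"], region)
            for h in selected}


# Hypothetical stand-ins so the sketch runs without any network access.
def fake_cabp(protein_name):
    return "<sequence of %s>" % protein_name


def fake_expasy(sequence):
    return [
        {"protein": "homologue_A", "organism": "rat", "homology": 0.92},
        {"protein": "homologue_B", "organism": "rat", "homology": 0.65},
        {"protein": "homologue_C", "organism": "mouse", "homology": 0.88},
    ]


def fake_view(organism, protein, region):
    return {region: 1.0}
```

Passing the sources in as functions keeps the mediator logic independent of how each wrapper reaches its site, which is the separation the mediation architecture relies on.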

  16. Implementation
     • System
       • Flora as F-Logic Engine
       • Communicate with ODBC databases through underlying XSB Prolog
       • XML wrapping and Web querying through XMAS, our XML query language, and custom-built wrappers
     • Data
       • Human Brain Project sites
       • NPACI Neuroscience Thrust sites

  17. Work in Progress
     • Architecture
       • plug-in architecture for
         • domain knowledge sources
         • conceptual models from data sources
     • Functionality
       • better handling of large data
       • operations
         • expressive query language
         • operators for domain knowledge manipulation
       • query evaluation
         • query optimization using domain knowledge
     • Demonstration
       • at VLDB 2000
