
Professor Carole Goble University of Manchester mygrid.uk

Part 4: Semantics and Metadata. Semantic publication and discovery; provenance metadata; Semantic Web and the Grid. Professor Carole Goble, University of Manchester, http://www.mygrid.org.uk.


Presentation Transcript


  1. Part 4: Semantics and Metadata
     • Semantic publication and discovery
     • Provenance metadata
     • Semantic Web and the Grid
     Professor Carole Goble, University of Manchester, http://www.mygrid.org.uk
     GGF Summer School, 24th July 2004, Italy

  2. Virtual organisations and (Re)use
     Diagram labels: Registries; Workflow; Information; mIRs; Resources; Service
     Stakeholders: Service & Platform Administrators; Bioinformaticians; Service Providers; Annotation providers; Biologists; Tool & middleware developers

  3. Finding and selecting services
     Activation energy gradient
     Unregistered services
     • Scavenging
     • URLs and Soaplab endpoints
     • Introspection
     Registered services
     • Word-based searching
     • Semantic annotation for later discovery and (re)use by friends and strangers in your VO (Part 3)
     Drag and drop services onto the Taverna workbench

  4. Registry View Service
     • Registry
     • Third party registries
     • Third party services
     • Third party annotation (RDF)
     • Views over federated registries
     • UDDI interfaces extended with RDF
     • Federated views
     • Updated via Notification Service
     • Personalized based on Annotation
     • Authorisation and IPR

  5. Semantic discovery
     • User chooses services
     • A common ontology is used to annotate and query any myGrid object, including services.
     • Discover workflows and services described in the registry via Taverna.
     • Look for all workflows that accept an input of semantic type nucleotide sequence.
     • Aim to have semantic discovery over a public view on the Web.
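The query on this slide ("all workflows that accept an input of semantic type nucleotide sequence") can be sketched as a lookup over annotated registry entries. This is a minimal illustration, not the myGrid or Feta implementation: the toy ontology, the workflow names and the matching rule (an entry matches if its declared input type is the requested concept or a more general one) are all assumptions made for the example.

```python
# Hypothetical toy ontology: child concept -> parent concept.
SUBCLASS_OF = {
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
    "sequence": "bioinformatics_data",
}

# Hypothetical registry: workflow name -> semantic type of its input.
REGISTRY = {
    "mask_and_blast": "nucleotide_sequence",
    "align_any_sequence": "sequence",
    "predict_protein_fold": "protein_sequence",
}


def ancestors(concept):
    """Return the concept itself plus every concept that subsumes it."""
    chain = [concept]
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def workflows_accepting(data_type):
    """Find workflows whose input type is data_type or a more general concept."""
    acceptable = set(ancestors(data_type))
    return sorted(
        name for name, input_type in REGISTRY.items() if input_type in acceptable
    )


print(workflows_accepting("nucleotide_sequence"))
```

A word-based search would miss `align_any_sequence` here; the annotation against a shared ontology is what lets the more general entry be found.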

  6. Workflow and service annotation
     • Adding structured metadata to a workflow registration enables others to discover and reuse it more effectively, for example by stating which semantic type of input it accepts.

  7. Can you guess what it is yet?

  8. Service Registration http://pedro.man.ac.uk

  9. Semantic Discovery
     • Drag a workflow entry into the explorer pane and the workflow loads.
     • Drag a service or workflow to the scavenger window for inclusion in the workflow.

  10. Annotation
      Diagram labels: Service Providers; Ontologists; Others; Ontology Store; Description extraction; WSDL; Interface Description; Vocabulary; Soaplab; Pedro Annotation tool; Annotation providers; Annotation/description; Taverna Workbench; Registry plug-in; Registry (Personalised View); Registry (several instances)

  11. Annotation
      Diagram labels: Ontologists; Ontology Store; Vocabulary; Haystack Provenance Browser; Pedro Annotation tool; Annotation providers; Annotation/description; Scientists; Taverna Workbench; mIR; Store plug-in

  12. Diagram labels: Feta plug-in; Registry plug-in; Service Providers; Ontology Store; Ontologists; Others; Vocabulary; WSDL; Feta Semantic Discovery; Soaplab; Bioinformaticians; Taverna Workbench; Registry (Personalised View); Registry (several instances); Workflow Execution; FreeFluo WfEE; invoking; mIR; Store data & metadata

  13. Layered Semantics
      • Domain semantics layered on top of a domain-neutral but scientific data model
      • Reducing the activation energy, lowering barriers to entry.
      Diagram labels: Domain Semantics; Ontologies; Data; Metadata; IMv2; Workflow metadata; Experiment Semantics; Format; XSD types; MIME types; Service Metadata; Provenance metadata; Syntax; Workflow; OGSA-DQP

  14. Model of services
      Service: name, description, author, organisation
      Operation: name, description, task, method, resource, application
      Parameter: name, description, semantic type, format, transport type, collection type, collection format
      Relations: hasInput, hasOutput, subclass
      Subclasses shown: WSDL based operation; WSDL based Web service; workflow; bioMoby service; Soaplab service; local Java code
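One way to read this model is as three record types linked by hasInput and hasOutput. The sketch below uses the attribute names from the slide; attaching the inputs and outputs to Operation, giving Service a list of operations, and the example BLAST entry are assumptions made for illustration, not the actual myGrid schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    description: str = ""
    semantic_type: str = ""
    format: str = ""
    transport_type: str = ""
    collection_type: str = ""
    collection_format: str = ""


@dataclass
class Operation:
    name: str
    description: str = ""
    task: str = ""
    method: str = ""
    resource: str = ""
    application: str = ""
    inputs: List[Parameter] = field(default_factory=list)   # hasInput (assumed placement)
    outputs: List[Parameter] = field(default_factory=list)  # hasOutput (assumed placement)


@dataclass
class Service:
    name: str
    description: str = ""
    author: str = ""
    organisation: str = ""
    operations: List[Operation] = field(default_factory=list)  # assumed link


class SoaplabService(Service):
    """One of the service subclasses named on the slide."""


def operations_with_input(service, semantic_type):
    """Names of operations that take a parameter of the given semantic type."""
    return [
        op.name
        for op in service.operations
        if any(p.semantic_type == semantic_type for p in op.inputs)
    ]


# Hypothetical example entry.
blast = SoaplabService(
    name="blast",
    operations=[
        Operation(
            name="run",
            task="sequence similarity search",
            inputs=[Parameter("query", semantic_type="nucleotide_sequence")],
            outputs=[Parameter("report", semantic_type="BLAST_report")],
        )
    ],
)
print(operations_with_input(blast, "nucleotide_sequence"))
```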

  15. Tiered specifications
      Tiers: Task (sequence similarity search); Service class (BLAST service); Specific services (IBM Life Sciences service; BLAST SOAPLAB service)
      Operations shown: setProgram(); createJob(); setDatabase(); run() or setE_value(); getResults(); blastQuery()
      Classes of services: Domain; "semantic"; "Unexecutable"; "Potentials"
      Instances of services: Business; "operational"; "Executable"; "Actuals"

  16. Matrix of metadata in workflow lifecycle

  17. Stratified metadata
      • Service Type and Class (OWL)
      • Service Instance (RDF)

  18. Service and Workflow registration
      • Description scheme
      • RDFS & DAML+OIL / OWL ontologies of services & biology
      • Based on DAML-S
      • Reasoning over OWL descriptions
      • Query over RDF
      • Aim to have semantic discovery over a public view on the Web.
      Workflow registration allows peer review and publication of e-Science methods.
      Diagram labels: Scufl; URI; Workflow registry entry; RDF Store; Workflow; Executive Summary Descriptions (inputs, outputs, tasks, component resources); Invokable Interface descriptions, e.g. XML data types stored; WSDL; Syntactic descriptions, e.g. MIME types; RDF; Conceptual descriptions; OWL encoded; Operational Descriptions (cost, QoS, access rights…); OWL/RDF; Provenance Descriptions (authors, creation date, institution…)

  19. Service Ontology Suite
      • Parameters: input, output, precondition, effect
      • Relations: performs_task, uses-resource, is_function_of
      • Upper level ontology, inspired by DAML-S
      • Ontologies: Informatics ontology; Molecular biology ontology; Publishing ontology; Organisation ontology; Task ontology; Bioinformatics ontology; Web service ontology
      • Current work: joint development of an Open Biological Ontologies BioService Ontology. http://obo.sourceforge.net/

  20. Reflections
      • Adverts for services and workflows turn out to be tricky
      • Describing different executable objects
      • Workflows and Services
      • Stratification of metadata
      • Classes and Instances of services and workflows
      • Service execution
      • Complex state-based invocation models
      • Parametric polymorphism of services
      • Executable process models vs discovery process models
      • Multiple dimensions of service composition.

  21. Reflections
      • Multiple descriptions, multiple interfaces
      • Users' needs vs machine needs
      • The dimensions of Service Class substitution
      • Biologists choose experimentally meaningful services and do not want "semantically similar" substitutions; they only substitute one instance for another
      • Experimentally neutral "glue" services that can be substituted are comparatively few
      • If users are choosing services, you don't need many kinds of metadata to eliminate 90% of options.

  22. Reuse and Repurposing
      • Describing for reuse is challenging
      • Reuse depends on semantic descriptions and these are costly to produce
      • Describing for someone else's benefit
      • Reuse by multiple stakeholders
      • Licensing workflows for reuse.
      • Authorisation models
      • But reuse does happen!
      • Metadata pays off but it needs a network effect and there is a cost.

  23. So far, Using Concepts
      • Controlled vocabulary for advertisements for workflows and services
      • Indexes into registries and mIR
      • Semantic discovery of services and workflows
      • Semantic discovery of repository entries
      • Type management for composition
      • Semantic workflow construction: guidance and validation
      • Navigation paths between data and knowledge holdings
      • Semantic "glue" between repository entries
      • Semantic annotation and linking of workflow provenance logs

  24. Part 4: Semantics and Metadata
      • Semantic publication and discovery
      • Provenance metadata
      • Semantic Web and the Grid

  25. Provenance
      In silico experiments: experiments are performed repeatedly, at different sites, at different times, by different users or groups.
      A large repository of records about experiments!
      • verification of data;
      • "recipes" for experiment designs;
      • explanation for the impact of changes;
      • ownership;
      • performance of services;
      • data quality;
      (Diagram label: Scientists)

  26. Provenance Web
      Diagram labels: Genomic Project; data1; WSDL; serviceInvocation1; data2; data3; dataAnother; serviceInvocation2; data4
      Kinds of provenance shown: Process provenance; Data provenance; Organisation provenance; Knowledge provenance

  27. Representing links
      urn:lsid:taverna.sf.net:datathing:45fg6
      urn:lsid:taverna.sf.net:datathing:23ty3
      • Identify each resource
      • Life Science Identifier (LSID): a URI with associated data and metadata retrieval protocols.
      • Understanding that the underlying data will not change
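An LSID is a URN of the form `urn:lsid:authority:namespace:object`, optionally followed by a revision. The sketch below splits the identifiers shown on the slide into those parts. It is only a syntactic illustration: the class and function names are hypothetical, and the resolution protocols for retrieving data and metadata are not modelled.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(urn: str) -> LSID:
    """Split an LSID URN into authority, namespace, object and optional revision."""
    parts = urn.split(":")
    is_lsid = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
    )
    if not is_lsid:
        raise ValueError(f"not an LSID: {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


print(parse_lsid("urn:lsid:taverna.sf.net:datathing:45fg6"))
```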

  28. Representing links II
      http://www.mygrid.org.uk/ontology#derived_from
      urn:lsid:taverna.sf.net:datathing:45fg6
      urn:lsid:taverna.sf.net:datathing:23ty3
      • Identify link type
      • Again use URI
      • Allows us to use RDF infrastructure
      • Repositories
      • Ontologies
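Once both the resources and the link type are URIs, a link is simply a subject, predicate, object triple. The sketch below stores such triples as plain tuples and follows `derived_from` transitively. The direction of the link between the two slide identifiers, the third "raw" identifier and the `lineage` helper are assumptions for illustration; a real deployment would use an RDF store rather than a Python list.

```python
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"

A = "urn:lsid:taverna.sf.net:datathing:45fg6"
B = "urn:lsid:taverna.sf.net:datathing:23ty3"
RAW = "urn:lsid:taverna.sf.net:datathing:hypothetical-raw"  # invented for the example

# Assumed direction: the subject was derived from the object.
triples = [
    (A, DERIVED_FROM, B),
    (B, DERIVED_FROM, RAW),
]


def lineage(item, graph):
    """Return every item that `item` was directly or indirectly derived from."""
    found = []
    frontier = [item]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in graph:
            if subject == current and predicate == DERIVED_FROM and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


print(lineage(A, triples))
```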

  29. Provenance Pyramid
      Levels shown: Knowledge Level; Data Level; Organisation Level; Process Level

  30. Organisation level provenance; Process level provenance; Data/knowledge level provenance
      Organisation level labels: Project; Person; Organisation; Experiment design
      Process level labels: Service; runBy, e.g. BLAST @ NCBI; Process; Workflow design; componentProcess, e.g. web service invocation of BLAST @ NCBI; partOf; Event; instanceOf; componentEvent, e.g. completion of a web service invocation at 12.04pm; Workflow run; run for
      Data/knowledge level labels: Data item (several); knowledge statements, e.g. similar protein sequence to; data derivation, e.g. output data derived from input data
      User can add templates to each workflow process to determine links between data items.

  31. Provenance tracking
      • Automated generation of this web of links
      • Workflow enactor generates
      • LSIDs
      • Data derivation links
      • Knowledge links
      • Process links
      • Organisation links
      Panel A: Relationship BLAST report has with other items in the repository
      Panel B: Other classes of information related to BLAST report
      Diagram nodes: project; organisation; group; person; experiment definition; workflow definition; workflow invocation; service description; service invocation; urn:lsid:taverna:datathing:13; urn:lsid:taverna:datathing:15
      Diagram link types: ..masked_sequence_of; ..nucleotide_sequence; ..part_of; rdf:type; ..author; ..works_for; ..invocation_of; ..BLAST_Report; ..similar_sequences_to; ..run_for; ..run_during; ..described_by; ..created_by; ..filtered_version_of
      Example data shown: the FASTA record ">gi|19747251|gb|AC005089.3| Homo sapiens BAC clone CTA-315H11 from 7, complete sequence" and a BLAST report listing 16 hits, beginning "19747251 AC005089.3 831 Homo sapiens BAC clone CTA-315H11 from 7, complete sequence" and "15145617 AC073846.6 815 Homo sapiens BAC clone RP11-622P13 from 7, complete sequence".
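The idea that the enactor generates identifiers and links as a side effect of running each step can be sketched as a wrapper around a service call. This is a hypothetical illustration, not FreeFluo or Taverna code: the `ProvenanceLog` class, the `example.org` authority, the "mask_service" name and the stand-in service function are invented, while the link names (`invocation_of`, `run_for`, `created_by`, `derived_from`) are taken from the slides.

```python
import itertools


class ProvenanceLog:
    """Mints LSID-style identifiers and records provenance links as triples."""

    def __init__(self, authority="example.org"):  # hypothetical authority
        self._counter = itertools.count(1)
        self.authority = authority
        self.triples = []

    def mint(self, namespace):
        """Create a fresh identifier in the given namespace."""
        return f"urn:lsid:{self.authority}:{namespace}:{next(self._counter)}"

    def record_invocation(self, service, inputs, run_for, fn):
        """Run fn on the input values and log the links the run implies.

        inputs is a list of (identifier, value) pairs.
        """
        invocation = self.mint("invocation")
        self.triples.append((invocation, "invocation_of", service))  # process link
        self.triples.append((invocation, "run_for", run_for))        # organisation link
        result = fn(*[value for _, value in inputs])
        output = self.mint("datathing")
        self.triples.append((output, "created_by", invocation))      # process link
        for input_id, _ in inputs:
            self.triples.append((output, "derived_from", input_id))  # data derivation link
        return output, result


log = ProvenanceLog()
seq_id = log.mint("datathing")
# str.upper stands in for a real sequence-masking service.
out_id, masked = log.record_invocation(
    "mask_service", [(seq_id, "AAGCTTnnn")], "some_person", str.upper
)
for triple in log.triples:
    print(triple)
```

The scientist never writes these links by hand; every invocation contributes its own small piece of the provenance web.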

  32. Haystack (IBM/MIT)
      Screenshot captions: GenBank record; Portion of the Web of provenance; Managing collection of sequences for review


  34. Reflections
      • Visualisation of results usually domain specific
      • Provenance browsing and querying needs to fit with that visualisation
      • Generic graphical presentation limited to small, low complexity result sets
      • Layered provenance for different purposes and different stakeholders
      • Detailed process for debugging and usage statistics for QoS
      • Data and Knowledge for the Scientist
      • Migration with data objects
      • Versioning
      • Using provenance to its maximum potential

  35. Map of Context
      Diagram labels: OWL Ontologies; mapping between objects; LSID; HTML; XML; XML; URI; LSID; XML; RDF; PDF
      Linked resources:
      • Literature relevant to provenance study or data in this workflow
      • Provenance record of a workflow run
      • Interlinking graph of the workflow that generates the provenance logs
      • Web page of people who have interests related to those of the workflow owner
      • Experiment notes

  36. Provenance metadata
      • Outside objects: RDF store
      • Within objects: LSID metadata
      Diagram labels: URI; LSID; LSID; URI; LSID metadata

  37. Linked Provenance Resources
      Screenshot captions:
      • The subsumed concepts: link to the log annotated with more general concept
      • The subsuming concepts: link to the log annotated with more specific concept

  38. Generating Links
      Screenshot captions: The concept; The generated link to related provenance document; The name of the data

  39. Semantics
      • Ontology-aided workflow construction
      • RDF-based service and data registries
      • RDF-based metadata for ALL experimental components
      • RDF-based provenance graphs
      • OWL-based controlled vocabularies for database content
      • OWL-based integration of experiment entities
      • RDF-based semantic mark-up of results, logs, notes, data entries

  40. Standards
      • By tapping into (de facto) standards (LSID, RDF, WS-I) and communities we can leverage others' results and tools
      • Haystack, Pedro, Jena, CHEF/Sakai.
      • The Grid standards are confusing and volatile
      • The choice of vanilla Web Services was good.
      • We didn't jump to OGSI. We won't jump to WSRF until it's necessary.
      • And workflow standards have been untimely.

  41. Role of Ontologies
      Diagram labels: Controlling contents of metadata and data; Describing & linking provenance records; Resource annotations; Change & event notification topics; Service matching and provisioning; Composing and validating workflows and service compositions & negotiations; Service & resource registration & discovery; Help; Knowledge-based guidance and recommendation; Schema mediation

  42. Part 4: Semantics and Metadata
      • Semantic publication and discovery
      • Provenance metadata
      • Semantic Web and the Grid

  43. Semantics in and on the Grid
      A pioneer of the…
      The Semantic Grid is an extension of the current Grid in which information and services are given well-defined and explicitly represented meaning, better enabling computers and people to work in cooperation.

  44. The semantics of knowledge
      • Semantic Grids
      • Grids and Grid middleware that make use of semantics for their installation, deployment, running etc.
      • I.e. Semantics IN the Grid FOR the Grid.
      • Knowledge Grids
      • A virtual knowledge base derived by using the Grid resources, in the same spirit as a data grid is a virtual data resource and a compute grid a virtual computer. Knowledge Grids include services for knowledge mining.
      • I.e. Semantics ON the Grid arising from the USE of the Grid.

  45. Sources of Knowledge
      Diagram labels: Scientific Applications; Scientists; Grid platform and resources; Grid Middleware; Security policies; standards; Service Providers; Computer Scientists; Knowledge Stakeholders; Knowledge for the Grid; Application Semantics for the Grid

  46. “The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. It is based on the idea of having data on the Web defined and linked such that it can be used for more effective discovery, automation, integration, and reuse across various applications.” Hendler, J., Berners-Lee, T., and Miller, E. Integrating Applications on the Semantic Web, 2002, http://www.w3.org/2002/07/swint.

  47. Big Vision
      The Web today is:
      • A hypermedia digital library: collection of linked web pages
      • Ubiquitous interface to applications: Amazon.com
      • A platform for multimedia: BBC Radio 4 in my room!
      • A naming scheme: unique identity for resources
      A place where people do the work, filtering, linking and interpreting. Computers do the presentation.
      Why not make the computers do the work?
      From machine readable resources for humans to computable resources for machines

  48. Expose the meaning of resources by assertions in a common data model…
      • Publish and share consensually agreed ontologies so we can share the metadata and add in background knowledge
      • Then we can query, filter, integrate and aggregate the metadata …
      • and reason over it to infer more metadata using rules …
      • and attribute trust to the metadata.
      Diagram resources: http://www.marriott.com/epp/...; http://www.amia.org/meetings/...; http://www.amia.org/about/; Washington; USA
      Diagram concepts: event; conference; hotel; period; dates; city; country
      Diagram link types: hasvenue; haslocation; organisedby; locatedin
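The steps on this slide (assert, query, infer with rules) can be sketched with triples held in a plain Python set. The link names come from the diagram; the short node names, the two inference rules and the helper functions are assumptions chosen for illustration, not part of any Semantic Web standard or toolkit.

```python
# Assertions in a common data model: (subject, predicate, object) triples.
# Node names are shortened stand-ins for the URLs in the diagram.
facts = {
    ("meeting", "hasvenue", "hotel"),
    ("hotel", "haslocation", "Washington"),
    ("Washington", "locatedin", "USA"),
}


def infer(facts):
    """Apply two illustrative rules until no new triples appear."""
    known = set(facts)
    while True:
        new = set()
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if b != c:
                    continue
                # Rule 1 (assumed): an event is located where its venue is located.
                if p1 == "hasvenue" and p2 == "haslocation":
                    new.add((a, "haslocation", d))
                # Rule 2 (assumed): location propagates up the locatedin hierarchy.
                if p1 == "haslocation" and p2 == "locatedin":
                    new.add((a, "haslocation", d))
        if new <= known:
            return known
        known |= new


def query(known, predicate, obj):
    """Subjects that have the given predicate and object."""
    return sorted(s for s, p, o in known if p == predicate and o == obj)


closure = infer(facts)
print(query(closure, "haslocation", "USA"))
```

Nothing in the asserted facts says the meeting is in the USA; that statement only appears after the rules have been applied, which is the "infer more metadata" step.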

  49. Infrastructure enablers for e-Research
      Grid Computing
      • On demand transparently constructed multi-organisational federations of distributed services
      • Distributed computing middleware
      • Computational integration
      • Sharing resources
      Semantic Web
      • An automatically processable, machine understandable web
      • Distributed knowledge and information management
      • Information integration
      • Sharing information

  50. Semantic Web layers
      Diagram labels: Trust; Rules (illustrated with repeated "p -> a; p=a" rule text); Agents; Ontologies; Metadata; Annotation; Search engines and filters; Web Applications; Deep web
