
Finding knowledge, data and answers on the Semantic Web

Finding knowledge, data and answers on the Semantic Web. Tim Finin University of Maryland, Baltimore County http://ebiquity.umbc.edu/resource/html/id/223/ Joint work with Li Ding, Anupam Joshi, Cynthia Parr, Joel Sachs, Andriy Parafiynyk and Lushan Han.


Presentation Transcript


  1. Finding knowledge, data and answers on the Semantic Web. Tim Finin, University of Maryland, Baltimore County. http://ebiquity.umbc.edu/resource/html/id/223/ Joint work with Li Ding, Anupam Joshi, Cynthia Parr, Joel Sachs, Andriy Parafiynyk and Lushan Han. http://creativecommons.org/licenses/by-nc-sa/2.0/ This work was partially supported by DARPA contract F30602-97-1-0215 and NSF grants CCR007080 and IIS9875433.

  2. This talk • Motivation • Semantic Web background • Swoogle Semantic Web search engine • Use cases and applications • Social Semantic Web • Conclusions

  3. Google has made us smarter

  4. But what about our agents? Agents still have a very minimal understanding of text and images. [Diagram labels: tell, register]

  5. But what about our agents? A Google for knowledge on the Semantic Web is needed by software agents and programs. [Diagram labels: Swoogle, tell, register]

  6. This talk • Motivation • Semantic Web background • Swoogle Semantic Web search engine • Use cases and applications • Social Semantic Web • Conclusions

  7. Brief history of the Semantic Web. Tim Berners-Lee’s original 1989 WWW proposal described a web of relationships among named objects unifying many info. management tasks. • Guha’s MCF (~94) • XML+MCF=>RDF (~96) • Semantic Web coined (~97) • RDF+OO=>RDFS (~99) • RDFS+KR=>DAML+OIL (00) • W3C’s SW activity (01) • W3C’s OWL (03) • SPARQL (06) • Rules, RDFa, …. http://www.w3.org/History/1989/proposal.html

  8. Interest is high • Interest in industry, government and VCs is high • RDF is in Adobe’s products, Oracle 10g and 11g, Microsoft Vista, and Yahoo’s food portal • Several high-visibility startups use RDF • Joost (internet TV), Teranode (Bioinformatics), Garlik (personal info monitoring) • And, if you want more evidence that interest is high …

  9. $1795 $695 CD Only

  10. What do we mean by “Semantic Web”? [Diagram labels: PowerSet, “NLP”, Folksonomies, “a smarter Google”, XML, Tags, topic maps, ad hoc approaches, Microformats, other structured, Freebase, Google Base, Semantic Web, explicit semantics, KR based, RDF+OWL]

  11. RDF is the first SW language. RDF Data Model: Graph (good for human viewing); XML Encoding <rdf:RDF ……..> <….> <….> </rdf:RDF> (good for machine processing); Triples stmt(docInst, rdf_type, Document) stmt(personInst, rdf_type, Person) stmt(inroomInst, rdf_type, InRoom) stmt(personInst, holding, docInst) stmt(inroomInst, person, personInst) (good for reasoning) • RDF is a simple language for building graph based representations • Grounded in web standards • With terms to support ontologies, description logic, rules and much of first order logic
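The triple view on this slide can be illustrated with a minimal Python sketch. It stores the five `stmt(...)` statements as plain tuples and answers a simple pattern query over them. The function names `match` and `held_documents` are hypothetical helpers invented for this illustration; they are not part of RDF or of any tool mentioned in the talk.

```python
# Triples from the slide, stored as (subject, predicate, object) tuples.
TRIPLES = [
    ("docInst", "rdf_type", "Document"),
    ("personInst", "rdf_type", "Person"),
    ("inroomInst", "rdf_type", "InRoom"),
    ("personInst", "holding", "docInst"),
    ("inroomInst", "person", "personInst"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def held_documents(triples):
    """Find (person, document) pairs where a Person is holding a Document."""
    persons = {t[0] for t in match(triples, p="rdf_type", o="Person")}
    documents = {t[0] for t in match(triples, p="rdf_type", o="Document")}
    return [
        (s, o) for s, _, o in match(triples, p="holding")
        if s in persons and o in documents
    ]


if __name__ == "__main__":
    print(held_documents(TRIPLES))
```

Because every statement has the same three-part shape, a single generic matcher is enough to query the graph, which is roughly what makes the triple form convenient for reasoning.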

  12. IMHO • Better NLP will help search engines, but it’s a long-term, incremental project • We need a well-defined and extensible representation system for explicit knowledge • It should be backed by open, non-proprietary standards supported by industry, government and other interested parties • The W3C approach is not perfect • But “The perfect is the enemy of the good.” • “Semantic Web” vs. “semantic web”

  13. This talk • Motivation • Semantic Web background • Swoogle Semantic Web search engine • Use cases and applications • Social Semantic Web • Conclusions

  14. http://swoogle.umbc.edu/ • Running since summer 2004 • 2.1M RDF docs, 420M triples, 10K ontologies, 15K namespaces, 1.5M classes, 185K properties, 49M instances, 800 registered users

  15. Swoogle Architecture [Diagram labels: Analysis (SWD classifier, Ranking, …); Index (IR Indexer, SWD Indexer); Search Services (Web Server, Web Service); Discovery (SwoogleBot, Bounded Web Crawler, Google Crawler); Semantic Web metadata; document cache; Candidate URLs; the Web; Semantic Web; html; rdf/xml; human; machine. Legend: Information flow, Swoogle’s web interface]

  16. A Hybrid Harvesting Framework [Diagram labels: Swoogle Sample Dataset; Submissions & pings; Inductive learner (true, would); Seeds R; Seeds M; Seeds H; Meta crawling; Bounded HTML crawling; RDF crawling; Google API call; crawl; the Web]

  17. This talk • Motivation • Semantic Web background • Swoogle Semantic Web search engine • Use cases and applications • Social Semantic Web • Conclusions

  18. Applications and use cases. (1) Supporting Semantic Web developers • Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc. (2) Searching specialized collections • Spire: aggregating observations and data from biologists • InferenceWeb: searching over and enhancing proofs • SemNews: Text Meaning of news stories (3) Supporting SW tools • Triple shop: finding data for SPARQL queries

  19. 1

  20. 80 ontologies were found that had these three terms By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size. Let’s look at this one

  21. Basic Metadata hasDateDiscovered:  2005-01-17 hasDatePing:  2006-03-21 hasPingState:  PingModified type:  SemanticWebDocument isEmbedded:  false hasGrammar:  RDFXML hasParseState:  ParseSuccess hasDateLastmodified:  2005-04-29 hasDateCache:  2006-03-21 hasEncoding:  ISO-8859-1 hasLength:  18K hasCntTriple:  311.00 hasOntoRatio:  0.98 hasCntSwt:  94.00 hasCntSwtDef:  72.00 hasCntInstance:  8.00

  22. Who uses this ontology and how do they access it?

  23. rdfs:range was used 41 times to assert a value. owl:ObjectProperty was instantiated 28 times time:Cal… defined once and used 24 times (e.g., as range)

  24. These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace. All of this is available in RDF form for the agents among us.

  25. Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.

  26. We can also search for terms (classes, properties) like terms for “person”.

  27. 10K terms associated with “person”! Ordered by use. Let’s look at foaf:Person’s metadata

  28. Metadata stored for a term is information about its definition – both what and by whom

  29. 10K terms associated with “person”! Ordered by use.

  30. How do other terms use foaf:Person? 100 documents assert that foaf:publication is a property of a foaf:Person

  31. 87K documents used foaf:gender with a foaf:Person instance as the subject

  32. 3K documents used dc:creator with a foaf:Person instance as the object

  33. Swoogle’s archive saves every version of a SWD it’s seen.

  34. 2 • An NSF ITR collaborative project with • University of Maryland, Baltimore County • University of Maryland, College Park • U. of California, Davis • Rocky Mountain Biological Laboratory

  35. An invasive species scenario • Nile Tilapia fish have been found in a California lake. • Can this invasive species thrive in this environment? • If so, what will be the likely consequences for the ecology? • So… we need to understand the effects of introducing this fish into the food web of a typical California lake

  36. Food Webs • A food web models the trophic (feeding) relationships between organisms in an ecology • Food web simulators are used to explore the consequences of changes in the ecology, such as the introduction or removal of a species • A location’s food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them. • Goal: automatically construct a food web for a new location using existing data and knowledge • ELVIS: Ecosystem Location Visualization and Information System

  37. East River Valley Trophic Web http://www.foodwebs.org/

  38. Species List Constructor Click a county, get a species list

  39. The problem • We have data on what species are known to be in the location and can further restrict and fill in with other ecological models • But we don’t know which of these the Nile Tilapia eats or who might eat it. • We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.

  40. Food Web Constructor. Predict food web links using database and taxonomic reasoning. In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected
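One way to picture the taxonomic reasoning described here is the Python sketch below. It is only an illustration under stated assumptions, not the ELVIS Food Web Constructor: the taxonomy, the known trophic links, and the functions `lineage`, `taxonomic_distance`, `predict_prey` and `competitors` are all invented for this example, and the rule used (borrow the diet of the closest relative with a known diet) is a deliberately simple stand-in for whatever the real system does.

```python
# Toy taxonomy as child -> parent. Illustrative data only (assumption).
PARENT = {
    "Nile tilapia": "Oreochromis",
    "Mozambique tilapia": "Oreochromis",
    "Oreochromis": "Cichlidae",
    "largemouth bass": "Micropterus",
    "Micropterus": "Centrarchidae",
    "Cichlidae": "fish",
    "Centrarchidae": "fish",
}

# Known trophic links as (predator, prey). Illustrative data only (assumption).
EATS = {
    ("Mozambique tilapia", "algae"),
    ("ostracod", "algae"),
    ("largemouth bass", "minnow"),
}


def lineage(taxon):
    """Return [taxon, parent, grandparent, ...] up to the root of the toy taxonomy."""
    chain = [taxon]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def taxonomic_distance(a, b):
    """Steps from a and from b up to their nearest shared ancestor, or None."""
    up_a, up_b = lineage(a), lineage(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    return None


def predict_prey(species):
    """Borrow the diet of the closest relative(s) that have a known diet."""
    known = {pred for pred, _ in EATS if pred != species}
    scored = [(taxonomic_distance(species, k), k) for k in known]
    scored = [(d, k) for d, k in scored if d is not None]
    if not scored:
        return set()
    best = min(d for d, _ in scored)
    relatives = {k for d, k in scored if d == best}
    return {prey for pred, prey in EATS if pred in relatives}


def competitors(species, local_species):
    """Local species whose known prey overlaps the predicted prey of `species`."""
    prey = predict_prey(species)
    return {pred for pred, p in EATS if p in prey and pred in local_species}


if __name__ == "__main__":
    print(predict_prey("Nile tilapia"))
    print(competitors("Nile tilapia", {"ostracod", "largemouth bass"}))
```

With this toy data the sketch reproduces the shape of the slide's example: the newcomer is predicted to eat algae, so ostracods show up as a competitor.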

  41. Evidence Provider

  42. Status • ELVIS (Ecosystem Location Visualization and Information System) is available as an integrated set of web services for constructing food webs for a given location. • Background ontologies • SpireEcoConcepts: concepts and properties to represent food webs, and ELVIS related tasks, inputs and outputs • ETHAN (Evolutionary Trees and Natural History): concepts and properties for ‘natural history’ information on species derived from data in the Animal Diversity Web and other taxonomic sources. 250K classes on plants and animals • Under development • Connect to visualization software • Connect to triple shop to discover more data

  43. Supporting SW Tools 3 • Semantic Web applications can access Swoogle through a REST-based Web interface or via SQL. • Two examples: • A system to help scientists construct datasets from RDF documents on the Web • Tools to manage Semantic Web data in Blogs and other forms of social media

  44. UMBC Triple Shop • http://sparql.cs.umbc.edu/ • Online SPARQL RDF query processing with several interesting features • Automatically finds SWDs for given queries using Swoogle backend database • Datasets, queries and results can be saved, tagged, annotated, shared, searched for, etc. • RDF datasets as first class objects • Can be stored on our server or downloaded • Can be materialized in a database or (soon) as a Jena model

  45. What’s SPARQL? • SPARQL is the standard language (& protocol) for querying RDF graphs • Think: SQL for RDF PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?person ?name ?email FROM <http://rdf.example.org/people.rdf> WHERE { ?person a foaf:Person . ?person foaf:name ?name . OPTIONAL {?person foaf:mbox ?email} . }
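To make the "SQL for RDF" analogy concrete, here is a small Python sketch of how the query on this slide could be evaluated over an in-memory list of triples. It is a toy evaluator written for this illustration, not a SPARQL engine: the data in `GRAPH` is made up, the functions `match_pattern`, `match_all` and `select` are hypothetical, and only basic triple patterns plus a single OPTIONAL group are handled. In SPARQL, `a` is shorthand for `rdf:type`, which is why the first pattern uses `RDF_TYPE`.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Made-up data standing in for the graph named in the FROM clause (assumption).
GRAPH = [
    ("ex:alice", RDF_TYPE, FOAF + "Person"),
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:alice", FOAF + "mbox", "mailto:alice@example.org"),
    ("ex:bob", RDF_TYPE, FOAF + "Person"),
    ("ex:bob", FOAF + "name", "Bob"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # a variable such as ?person
            if result.setdefault(term, value) != value:
                return None               # clashes with an earlier binding
        elif term != value:               # a constant that does not match
            return None
    return result


def match_all(patterns, graph, bindings):
    """Join a list of triple patterns against the graph, one pattern at a time."""
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def select(graph, required, optional, variables):
    """Evaluate required patterns, then left-join the OPTIONAL patterns."""
    rows = []
    for binding in match_all(required, graph, [{}]):
        # OPTIONAL keeps the row even when its patterns find no match.
        extended = match_all(optional, graph, [binding]) or [binding]
        rows.extend(tuple(e.get(v) for v in variables) for e in extended)
    return rows


ROWS = select(
    GRAPH,
    required=[("?person", RDF_TYPE, FOAF + "Person"),
              ("?person", FOAF + "name", "?name")],
    optional=[("?person", FOAF + "mbox", "?email")],
    variables=["?person", "?name", "?email"],
)

if __name__ == "__main__":
    for row in ROWS:
        print(row)
```

The second row comes back with no email value, which is the behaviour OPTIONAL is meant to give: a person without a `foaf:mbox` is still reported.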

  46. Who knows Anupam Joshi? Show me their names, email address and pictures

  47. The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles

  48. No FROM clause! PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT DISTINCT ?p2name ?p2mbox ?p2pix FROM ??? WHERE { ?p1 foaf:surname "Joshi" . ?p1 foaf:firstName "Anupam" . ?p1 foaf:mbox ?p1mbox . ?p2 foaf:knows ?p3 . ?p3 foaf:mbox ?p1mbox . ?p2 foaf:name ?p2name . ?p2 foaf:mbox ?p2mbox . OPTIONAL { ?p2 foaf:depiction ?p2pix } . } ORDER BY ?p2name
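The idea of finding data for a query that has no FROM clause can be sketched roughly as follows. This is a guess at the general shape, not the Triple Shop implementation: `TERM_INDEX` is a hypothetical in-memory stand-in for a lookup against the Swoogle backend, the `example.org` document URLs are invented, and the ranking rule (prefer documents that use more of the query's terms) is an assumption. The `FROM ???` placeholder is left out of the query text so that generated FROM clauses can be inserted.

```python
import re

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
WHERE {
  ?p1 foaf:surname "Joshi" .
  ?p1 foaf:firstName "Anupam" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name"""

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical index from term URI to documents that use the term (assumption).
TERM_INDEX = {
    FOAF + "knows": {"http://example.org/a.rdf", "http://example.org/b.rdf"},
    FOAF + "mbox": {"http://example.org/a.rdf"},
    FOAF + "name": {"http://example.org/a.rdf", "http://example.org/c.rdf"},
}


def query_terms(query):
    """Expand every prefixed name in the query into a full term URI."""
    prefixes = dict(re.findall(r"PREFIX\s+(\w+):\s*<([^>]+)>", query))
    terms = set()
    for prefix, local in re.findall(r"\b(\w+):(\w+)\b", query):
        if prefix in prefixes:
            terms.add(prefixes[prefix] + local)
    return terms


def candidate_documents(terms, index, limit=10):
    """Rank documents by how many of the query's terms they use."""
    counts = {}
    for term in terms:
        for doc in index.get(term, ()):
            counts[doc] = counts.get(doc, 0) + 1
    ranked = sorted(counts, key=lambda doc: (-counts[doc], doc))
    return ranked[:limit]


def add_from(query, documents):
    """Insert one FROM clause per document just before the WHERE keyword."""
    from_lines = "".join(f"FROM <{doc}>\n" for doc in documents)
    return query.replace("WHERE", from_lines + "WHERE", 1)


if __name__ == "__main__":
    docs = candidate_documents(query_terms(QUERY), TERM_INDEX, limit=2)
    print(add_from(QUERY, docs))
```

Treating the resulting document list as a dataset in its own right fits the earlier slide's point that datasets can be saved, tagged and shared independently of the query.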
