Tearing down walls and Building bridges Principles and pragmatics of a Semantic Culture Web
Overview • Virtual collections and Semantic Web • Semantic collection-search demonstrator • For cultural heritage objects • Metadata & vocabulary representation and enrichment • Principles for knowledge engineering on the Web
Acknowledgements • Part of the large Dutch knowledge-economy project MultimediaN • Partners: VU, CWI, UvA, DEN, ICN • People: Alia Amin, Lora Aroyo, Mark van Assem, Victor de Boer, Lynda Hardman, Michiel Hildebrand, Laura Hollink, Marco de Niet, Borys Omelayenko, Marie-France van Orsouw, Jacco van Ossenbruggen, Guus Schreiber, Jos Taekema, Annemiek Teesing, Anna Tordai, Jan Wielemaker, Bob Wielinga • Artchive.com, Rijksmuseum Amsterdam, Dutch ethnology musea (Amsterdam, Leiden), National Library (Bibliopolis)
Hypothesis • Semantic Web technology is particularly useful in knowledge-rich domains • Or, formulated differently: if we cannot show added value in knowledge-rich domains, then it may have no value at all
The Web: resources and links • [Figure: URL, Web link, URL]
The Semantic Web: typed resources and links • [Figure: URL, Web link, URL, now with types] • Resource typed as Painting: “Woman with hat” (SFMOMA) • Link typed as Dublin Core creator • Resource from ULAN: Henri Matisse
Principle 1: semantic annotation • Description of web objects with “concepts” from a shared vocabulary
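The annotation idea can be sketched as a handful of subject/predicate/object statements. This is only an illustration: the prefixed identifiers (sfmoma:, ulan:, ex:) and the helper name objects_of are assumptions made for the sketch, not identifiers from the demonstrator.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers: a web object described with concepts
# taken from shared vocabularies (Dublin Core property, ULAN artist).
annotations = [
    Triple("sfmoma:woman-with-hat", "rdf:type", "ex:Painting"),
    Triple("sfmoma:woman-with-hat", "dc:creator", "ulan:henri-matisse"),
]


def objects_of(triples, subject, predicate):
    """Return every value stated for (subject, predicate)."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


creators = objects_of(annotations, "sfmoma:woman-with-hat", "dc:creator")
```

Because the creator is a vocabulary concept rather than a free-text string, any other collection that points to the same concept becomes linked to this painting.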
Principle 2: semantic search • Search for objects which are linked via concepts (semantic link) • Use the type of semantic link to provide meaningful presentation of the search results • Example: the query “Paris” also finds objects annotated with Montmartre, because Montmartre is PartOf Paris
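A minimal sketch of the “Paris” example, assuming a tiny in-memory part-of hierarchy without cycles. The work identifiers and the function name semantic_search are made up for illustration; the returned path is what a presentation layer could use to explain why a result matched.

```python
# Hypothetical data: child place -> parent place (assumed acyclic).
PART_OF = {"Montmartre": "Paris"}

# Hypothetical works, each annotated with one place concept.
WORKS = {"work:1": "Montmartre", "work:2": "Paris", "work:3": "Amsterdam"}


def semantic_search(query):
    """Return (work, path) pairs where the path of part-of links
    leads from the work's place up to the queried place."""
    results = []
    for work, place in WORKS.items():
        path = [place]
        # Climb the part-of hierarchy until the query is reached
        # or there is no broader place left.
        while path[-1] != query and path[-1] in PART_OF:
            path.append(PART_OF[path[-1]])
        if path[-1] == query:
            results.append((work, path))
    return results


hits = semantic_search("Paris")
```

Here work:1 is found through the path Montmartre, Paris, while work:2 matches directly, so the two hits can be presented in separate groups.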
The myth of a unified vocabulary • In large virtual collections there are always multiple vocabularies • In multiple languages • Every vocabulary has its own perspective • You can’t just merge them • But you can use vocabularies jointly by defining a limited set of links • “Vocabulary alignment” • It is surprising what you can do with just a few links
Principle 3: vocabulary alignment • Example query: “Tokugawa” • AAT style/period: Edo (Japanese period), also labelled Tokugawa • SVCN period: Edo • SVCN is a local in-house ethnology thesaurus • AAT is Getty’s Art & Architecture Thesaurus
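The effect of a single alignment link can be sketched as follows. The concept identifiers, the object identifier and the function names are assumptions for illustration only; the point is that a query for “Tokugawa” reaches an object that was annotated with the in-house SVCN term.

```python
# Hypothetical concept identifiers with their labels.
LABELS = {
    "aat:edo": {"Edo (Japanese period)", "Tokugawa"},
    "svcn:edo": {"Edo"},
}

# One alignment link between the two vocabularies.
ALIGNMENTS = [("aat:edo", "svcn:edo")]

# A museum object annotated only with the local SVCN term.
ANNOTATIONS = {"museum:object-42": "svcn:edo"}


def concepts_for(term):
    """Concepts whose label equals the term, plus directly aligned ones."""
    direct = {c for c, labels in LABELS.items() if term in labels}
    expanded = set(direct)
    for a, b in ALIGNMENTS:  # expand one step, in both directions
        if a in direct:
            expanded.add(b)
        if b in direct:
            expanded.add(a)
    return expanded


def search(term):
    wanted = concepts_for(term)
    return [obj for obj, concept in ANNOTATIONS.items() if concept in wanted]


found = search("Tokugawa")
```

Without the alignment link the same query would return nothing, since “Tokugawa” is not a label in SVCN.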
Levels of interoperability • Syntactic interoperability • using data formats that you can share • XML family is the preferred option • Semantic interoperability • How to share meaning / concepts • Technology for finding and representing semantic links
Distributed vs. centralized collection data • Minimal requirement: collection object has image URI • Preference for external metadata, accessed through protocol such as OAI • In practice, external metadata access is still cumbersome
Search strategies • Basic search: keyword-oriented • Advanced search: • Tweaking default search parameters • Time-related queries • Faceted search • Relation search • How are two URIs related?
Keyword search with semantic clustering • Btree of literals plus Porter stem and metaphone index • Find resources with matching labels • Default resources are “Work”s • Find related resources by one-way graph traversal • owl:inverseOf is used • Threshold used for constraining search • Cluster results (group instances)
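The traversal and clustering steps above can be sketched roughly as below. This is not the demonstrator's implementation: the B-tree, Porter stemming, metaphone index and owl:inverseOf handling are replaced by a plain lowercase substring match and a reverse index, and the property ex:style and all identifiers are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical labels and graph.
LABELS = {"ulan:matisse": "Henri Matisse", "aat:fauve": "Fauve"}
TRIPLES = [
    ("work:1", "dc:creator", "ulan:matisse"),
    ("work:2", "dc:creator", "ulan:derain"),
    ("ulan:matisse", "ex:style", "aat:fauve"),
    ("ulan:derain", "ex:style", "aat:fauve"),
]
WORKS = {"work:1", "work:2"}  # the default target resources


def keyword_search(keyword, max_steps=2):
    """Match labels, walk links backwards towards works, and cluster
    the works by the chain of properties that connects them."""
    needle = keyword.lower()
    starts = [r for r, label in LABELS.items() if needle in label.lower()]

    incoming = defaultdict(list)  # object -> [(property, subject)]
    for s, p, o in TRIPLES:
        incoming[o].append((p, s))

    clusters = defaultdict(set)
    queue = deque((r, ()) for r in starts)
    while queue:
        node, path = queue.popleft()
        if node in WORKS:
            clusters[path].add(node)
            continue
        if len(path) < max_steps:  # threshold constrains the search
            for p, s in incoming[node]:
                queue.append((s, (p,) + path))
    return dict(clusters)


by_path = keyword_search("fauve")
```

Each cluster key is the property path from a work to the matched concept, so results can be grouped under headings such as “works by artists with this style”.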
Search: WordNet patterns that increase recall without sacrificing precision
Term disambiguation is a key issue in semantic search • Post-query • Sort search results based on different meanings of the search term • Mimics Google-type search • Pre-query • Ask user to disambiguate by displaying list of possible meanings • Interface is more complex, but more search functionality can be offered
Faceted search • Use Dublin Core scheme to formulate complex queries • Navigate through relevant metadata
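Faceted navigation can be sketched as repeated filtering plus counting of the remaining values per facet. The records and field names below are invented for illustration and only loosely follow Dublin Core element names.

```python
from collections import Counter

# Hypothetical records with Dublin Core-like fields.
RECORDS = [
    {"id": "work:1", "creator": "Henri Matisse", "type": "painting"},
    {"id": "work:2", "creator": "André Derain", "type": "painting"},
    {"id": "work:3", "creator": "Henri Matisse", "type": "drawing"},
]


def select(records, **constraints):
    """Keep the records that satisfy every facet constraint."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in constraints.items())]


def facet_counts(records, facet):
    """Count the values still available for one facet."""
    return Counter(r[facet] for r in records if facet in r)


matisse = select(RECORDS, creator="Henri Matisse")
remaining_types = facet_counts(matisse, "type")
```

After each selection the counts are recomputed on the narrowed result set, which is what lets a user navigate only through metadata values that still lead to results.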
What do you need to do to make your collection part of a Semantic Culture Web? Four activities
From metadata to semantic metadata 1. Make vocabulary interoperable 2. Align metadata schema 3. Enrich metadata 4. Align vocabulary
Activity 1: syntactic vocabulary interoperability • Making vocabularies available in the Web standard RDF • Many organizations already do this • W3C provides the SKOS template to make this almost straightforward • Effort required: at most a few days
Semantic relation: broader and narrower • No subclass semantics assumed!
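A thesaurus record could be written out as SKOS along the following lines. This is a sketch under stated assumptions: the ex: prefix, the @en language tag, the parent term japanese-periods and the function name are hypothetical, prefix declarations are omitted, and labels are not escaped.

```python
def to_skos_turtle(term_id, pref_label, alt_labels=(), broader=()):
    """Serialize one thesaurus term as a SKOS concept in Turtle.

    Assumes labels contain no double quotes and that the ex: and
    skos: prefixes are declared elsewhere in the document.
    """
    pairs = [("a", "skos:Concept"),
             ("skos:prefLabel", f'"{pref_label}"@en')]
    pairs += [("skos:altLabel", f'"{label}"@en') for label in alt_labels]
    # skos:broader is a thesaurus hierarchy link, not rdfs:subClassOf.
    pairs += [("skos:broader", f"ex:{parent}") for parent in broader]
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    return f"ex:{term_id} {body} ."


turtle = to_skos_turtle("edo", "Edo (Japanese period)",
                        alt_labels=["Tokugawa"],
                        broader=["japanese-periods"])
```

Keeping the hierarchy as skos:broader preserves only the originally intended thesaurus semantics, in line with the “no subclass semantics” remark.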
Activity 2: aligning the metadata schema • Specify your collection metadata scheme as a specialization of Dublin Core • With RDF/OWL this is easy/trivial! • Cf. DC Application Profiles
Aligning VRA with Dublin Core • VRA is a specialization of Dublin Core for visual resources • VRA properties “material.medium” and “material.support” are specializations of Dublin Core property “format” • vra:material.medium rdfs:subPropertyOf dc:format . • vra:material.support rdfs:subPropertyOf dc:format .
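What the subproperty declarations buy can be sketched with a small lookup: a query phrased in Dublin Core terms also returns values recorded with the more specific VRA properties. The work identifier, the literal values and the function name are assumptions for illustration.

```python
# Specializations stated in the schema alignment.
SUB_PROPERTY_OF = {
    "vra:material.medium": "dc:format",
    "vra:material.support": "dc:format",
}

# Hypothetical collection metadata using the VRA properties.
TRIPLES = [
    ("work:1", "vra:material.medium", "oil paint"),
    ("work:1", "vra:material.support", "canvas"),
]


def values(subject, prop):
    """Values of prop for subject, including values stated through
    any property that specializes prop."""
    found = []
    for s, p, o in TRIPLES:
        if s != subject:
            continue
        q = p
        # Climb the subproperty chain until prop is reached or it ends.
        while q != prop and q in SUB_PROPERTY_OF:
            q = SUB_PROPERTY_OF[q]
        if q == prop:
            found.append(o)
    return found


formats = values("work:1", "dc:format")
```

A generic Dublin Core client therefore sees both values as formats, while a VRA-aware client can still tell medium from support.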
Activity 3: enriching the metadata • Extracting additional concepts from an annotation • Matching the string “Paris” to a vocabulary term • Information-extraction techniques exist (and continue to be developed) • Effort required can be up to a few weeks • The more concepts, the better, but no need to be perfect!
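The simplest form of this enrichment is a label lookup over the words of a free-text annotation. The sketch below assumes a toy label index with made-up concept identifiers; real information-extraction techniques go well beyond exact token matching.

```python
import re

# Hypothetical label index: lowercase label -> candidate concepts.
LABEL_INDEX = {
    "paris": ["ex:paris-city", "ex:paris-mythological-figure"],
    "montmartre": ["ex:montmartre"],
}


def extract_concepts(annotation):
    """Map each word of a free-text annotation to candidate concepts.

    Ambiguous words keep all their candidates; choosing between
    them is left to a later disambiguation step.
    """
    found = {}
    for token in re.findall(r"\w+", annotation.lower()):
        if token in LABEL_INDEX:
            found[token] = LABEL_INDEX[token]
    return found


candidates = extract_concepts("Street scene, Paris")
```

Keeping every candidate reflects the “no need to be perfect” point: an imperfect link is still useful, and ambiguity can be resolved later or shown to the user.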
RDFa: embedding RDF in (X)HTML • [Figure: regular HTML, HTML with RDFa, resulting RDF statements]
Activity 4: aligning the vocabulary • Find semantic links between vocabulary terms • Derain (ULAN) related-to Fauve (AAT) • Automatic techniques exist, but performance varies • Often combination of automatic and manual alignment • Effort strongly dependent on vocabularies • But “a little semantics goes a long way” (Hendler)
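One common automatic starting point is to propose candidate links wherever two vocabularies share a normalized label, and then have a person review them. The sketch below assumes that approach; the concept identifiers are hypothetical, and stripping parenthetical qualifiers is a deliberately crude normalization that will also produce false matches.

```python
import re


def normalize(label):
    """Drop parenthetical qualifiers and case, e.g. 'Edo (Japanese period)' -> 'edo'."""
    return re.sub(r"\s*\(.*?\)", "", label).strip().lower()


def candidate_alignments(vocab_a, vocab_b):
    """Propose (concept_a, concept_b) pairs that share a normalized label.

    These are candidates for manual review, not confirmed alignments.
    """
    index = {}
    for concept, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(concept)
    pairs = set()
    for concept, labels in vocab_a.items():
        for label in labels:
            for other in index.get(normalize(label), ()):
                pairs.add((concept, other))
    return sorted(pairs)


# Hypothetical extracts of the two vocabularies.
AAT = {"aat:edo": ["Edo (Japanese period)", "Tokugawa"]}
SVCN = {"svcn:edo": ["Edo"]}

proposed = candidate_alignments(SVCN, AAT)
```

Links such as Derain related-to Fauve cannot be found this way, since the labels differ; those need other techniques, such as the text-based learning described on the next slide.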
Learning alignments • Learning relations between art styles in AAT and artists in ULAN through NLP of art historic texts • “Who are Impressionist painters?”
Principle 1: Be modest! • Ontology engineers should refrain from developing their own idiosyncratic ontologies • Instead, they should make the available rich vocabularies, thesauri and databases available in web format • Initially, only add the originally intended semantics
Principle 2: Think large! • "Once you have a truly massive amount of information integrated as knowledge, then the human-software system will be superhuman, in the same sense that mankind with writing is superhuman compared to mankind before writing." (Doug Lenat)
Principle 3: Develop and use patterns! • Don’t try to be (too) creative • Ontology engineering should not be an art but a discipline • Patterns play a key role in methodology for ontology engineering • See for example patterns developed by the W3C Semantic Web Best Practices group http://www.w3.org/2001/sw/BestPractices/ • SKOS can also be considered a pattern
Principle 4: Don’t recreate, but enrich and align • Techniques: • Learning ontology relations/mappings • Semantic analysis, e.g. OntoClean • Processing of scope notes in thesauri