Metadata Modularization: Concepts and Tools Carl Lagoze CS502 2001-03-14
Metadata Structured data about data….
Why is Metadata important? Key to organizing, managing, preserving, and locating content and services in digital libraries
Why is Metadata difficult? • Cost • Interoperability • Syntax • Semantics • Customizability • Extensibility • Distribution • Integrity, Authenticity, Quality • Human and Machine Factors • Naming
Metadata Thoughts • Metadata takes a variety of forms • descriptive cataloging • specialized • terms and conditions • administrative • content ratings • provenance • linkage
More Metadata Thoughts • New metadata sets will continually evolve • Many metadata sets are “community-specific” • administration • use • Human and machine use
Dublin Core • Metadata Set for Simple Resource Discovery • 15 elements allowing simple descriptive sentences about document-like objects: • “Document has title Hamlet” • “Document has creator William Shakespeare” • “Document has subject love and anguish”
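The "simple descriptive sentences" idea can be sketched as a list of element/value pairs. This is only an illustration: the `record` structure and the `to_sentences` helper are hypothetical names, not part of Dublin Core itself.

```python
# Hypothetical representation: a Dublin Core record as (element, value) pairs.
record = [
    ("title", "Hamlet"),
    ("creator", "William Shakespeare"),
    ("subject", "love and anguish"),
]


def to_sentences(pairs, resource="Document"):
    """Render each element/value pair as a simple descriptive sentence."""
    return [f"{resource} has {element} {value}" for element, value in pairs]


sentences = to_sentences(record)
for sentence in sentences:
    print(sentence)
```

Each pair stands alone, which is what keeps the element set simple: no pair depends on any other.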
The Dublin Core 15 • Title • Creator • Subject/Keywords • Description • Publisher • Other Contributor • Date • Resource Type • Format • Resource Identifier • Source • Language • Relation • Coverage • Rights Management
A Scope for the Dublin Core • Increase or decrease number of elements? • Structured or Unstructured value syntax? • Accommodate community extensions?
Warwick Framework • Provide context for Dublin Core effort • Integrate multiple sets of metadata addressing issues of: • individual integrity • distinct audiences • separate realms of responsibility and management
Warwick Framework Design • Containers for aggregating … • Packages of typed metadata sets • General principles - information hiding: • only operation defined at container level returns sequence of contained packages • packages are opaque at the container level • access to package contents subject to terms and conditions
Package Types • Simple metadata set • segregating distinct metadata into separate packages • Recursive container • nesting semantically related metadata sets • Indirect reference • allowing distribution and sharing of metadata sets
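The container and package types above might be modeled roughly as follows. This is a minimal sketch under stated assumptions: the class names, the `add` helper used to build the example, and the example URI are all hypothetical, and access control through terms and conditions is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class MetadataSet:
    """Simple package: one typed metadata set, opaque to the container."""
    type_name: str   # e.g. "Dublin Core" or "MARC"
    payload: object  # never interpreted at the container level


@dataclass
class IndirectReference:
    """Package that points to a metadata set stored elsewhere."""
    uri: str


@dataclass
class Container:
    """Aggregates packages; a container can itself be a package."""
    contents: List["Package"] = field(default_factory=list)

    def add(self, package: "Package") -> None:
        # Construction helper for this sketch only.
        self.contents.append(package)

    def packages(self) -> List["Package"]:
        # The one operation the design defines at the container level.
        return list(self.contents)


Package = Union[MetadataSet, IndirectReference, Container]

inner = Container()
inner.add(MetadataSet("MARC", {"100": "Shakespeare, William"}))

outer = Container()
outer.add(MetadataSet("Dublin Core", {"title": "Hamlet"}))
outer.add(inner)  # recursive container
outer.add(IndirectReference("http://example.org/terms-and-conditions"))

kinds = [type(p).__name__ for p in outer.packages()]
print(kinds)
```

The container only returns its packages in sequence; anything that wants to read a payload has to know the package type, which is the information-hiding principle in practice.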
Metadata Container [Figure: a container holding packages: a Dublin Core package, a MARC record package, an indirect reference package (URI), and a Terms and Conditions package]
Open Implementation Issues • Data encoding • Semantic interaction of overlapping sets • between semantically-related packages • between semantically distinct packages • Type registry
Modeling & Encoding Metadata Components: XML Namespaces • Prevent term clash: • record?, creator? • Establish concept spaces through URIs
xmlns:dc="http://purl.org/dc"
xmlns:abc="http://ilrt.ac.uk/abc"
<dc:creator>Herbert Van de Sompel</dc:creator>
<abc:organization>Cornell University</abc:organization>
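How namespaces prevent term clash can be seen by parsing the fragment with Python's standard `xml.etree.ElementTree`. The enclosing `<record>` element is an assumption added so the fragment is a complete document; the namespace URIs are the ones on the slide.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc"
ABC = "http://ilrt.ac.uk/abc"

# The <record> wrapper is hypothetical; the slide shows only the children.
doc = f"""
<record xmlns:dc="{DC}" xmlns:abc="{ABC}">
  <dc:creator>Herbert Van de Sompel</dc:creator>
  <abc:organization>Cornell University</abc:organization>
</record>
"""

root = ET.fromstring(doc)

# ElementTree expands every prefix into {namespace-URI}localname, so two
# vocabularies that both define "creator" would still have distinct tags.
tags = [child.tag for child in root]
creator = root.find(f"{{{DC}}}creator").text

print(tags)
print(creator)
```

The prefix (`dc`, `abc`) is only local shorthand; the URI is what identifies the concept space.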
Modeling & Encoding Metadata Components: RDF • RDF (Resource Description Framework) • The instantiation of the Warwick Framework on the Web • Provides enabling technology for richly-structured metadata • Rich data model supporting notions of distinct entities and properties • Syntax expressed in XML
RDF Components • Formal data model • Syntax for interchange of data • Schema type system (schema model)
RDF Data Model • Directed labeled graphs • Model elements • Resource • Property • Value • Statement • Containers
RDF Model Primitives [Figure: a Statement consists of a Resource connected by a Property to a Value]
RDF Syntax Example [Figure: resource URI:R with dc:Title “CIMI Presentation” and dc:Creator “Eric Miller”]
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <dc:Creator> Eric Miller </dc:Creator>
  </Description>
</RDF>
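To show how this syntax maps back onto the data model, the sketch below reads the example with the standard `xml.etree.ElementTree` parser and lists one (resource, property, value) statement per child element. It handles only this flat form and is not a general RDF parser; the `statements` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"
DC_NS = "http://purl.org/dc/elements/1.0/"

xml_text = """
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/">
  <Description about="URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <dc:Creator> Eric Miller </dc:Creator>
  </Description>
</RDF>
"""


def statements(text):
    """Return (resource, property, value) triples for flat Descriptions."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get("about")  # unprefixed attribute, no namespace
        for prop in desc:
            triples.append((subject, prop.tag, prop.text.strip()))
    return triples


triples = statements(xml_text)
for triple in triples:
    print(triple)
```

Each child of `Description` becomes one arc in the graph: the `about` attribute names the resource, the element name is the property, and the element text is the value.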
RDF Model Example #2 [Figure: URI:R has dc:Title “CIMI Presentation” and oa:Creator URI:ERIC; URI:ERIC has bib:Name “Eric Miller”, bib:Email “emiller@oclc.org”, and bib:Aff URI:OCLC (“OCLC”)]
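The graph in this example can be sketched with three small types for the model primitives. The class names and the `outgoing` helper are hypothetical; the node and property labels are taken from the slide.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class Literal:
    text: str


@dataclass(frozen=True)
class Statement:
    resource: Resource
    prop: str
    value: Union[Resource, Literal]  # a value may itself be a resource


r = Resource("URI:R")
eric = Resource("URI:ERIC")
oclc = Resource("URI:OCLC")

graph = [
    Statement(r, "dc:Title", Literal("CIMI Presentation")),
    Statement(r, "oa:Creator", eric),
    Statement(eric, "bib:Name", Literal("Eric Miller")),
    Statement(eric, "bib:Email", Literal("emiller@oclc.org")),
    Statement(eric, "bib:Aff", oclc),
]


def outgoing(graph, node):
    """Labeled edges leaving one node of the directed graph."""
    return {s.prop: s.value for s in graph if s.resource == node}


creator = outgoing(graph, r)["oa:Creator"]
name = outgoing(graph, creator)["bib:Name"].text
print(name)
```

Because the creator is a resource rather than a string, further statements (name, email, affiliation) can attach to it.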
RDF Syntax Example #2
<RDF xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
     xmlns:dc="http://purl.org/dc/elements/1.0/"
     xmlns:bib="http://www.bib.org/persons#">
  <Description about="URI:R">
    <dc:Title> CIMI Presentation </dc:Title>
    <oa:Creator>
      <Description>
        <bib:Name> Eric Miller </bib:Name>
        <bib:Email> emiller@oclc.org </bib:Email>
        <bib:Aff resource="http://www.oclc.org" />
      </Description>
    </oa:Creator>
  </Description>
</RDF>
RDF Containers • Permit the aggregation of several values for a property • Express multiple aggregation semantics • unordered • sequential or priority order • alternative
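The three aggregation semantics could be illustrated as below. The class names echo the RDF container names (Bag, Seq, Alt), but the methods are hypothetical, and treating the first alternative as the default is an assumption of this sketch.

```python
class Bag:
    """Unordered: only membership matters."""

    def __init__(self, *members):
        self.members = list(members)

    def same_as(self, other):
        return sorted(self.members) == sorted(other.members)


class Seq:
    """Sequential or priority order: position matters."""

    def __init__(self, *members):
        self.members = list(members)

    def same_as(self, other):
        return self.members == other.members


class Alt:
    """Alternatives: any single member can stand for the value."""

    def __init__(self, *members):
        self.members = list(members)

    def default(self):
        # Assumption: the first listed alternative is the preferred one.
        return self.members[0]


bag_equal = Bag("love", "anguish").same_as(Bag("anguish", "love"))
seq_equal = Seq("love", "anguish").same_as(Seq("anguish", "love"))
preferred = Alt("text/html", "text/plain").default()
print(bag_equal, seq_equal, preferred)
```

The same list of values means different things depending on the container type, which is why the type has to travel with the data.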
RDF Schemas • Declaration of vocabularies • properties defined by a particular community • characteristics of properties and/or constraints on corresponding values • Schema Type System - Basic Types • Property, Class, SubClassOf, Domain, Range • Minimal (but extensible) at this time • minimize significant clashes with typing system designed for XML Schema WG • Expressible in the RDF model and syntax
Relationships among vocabularies [Figure: related terms from different vocabularies: dc:Creator, marc:100, ms:director, bib:Author]
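One way a schema can relate these terms is to declare some properties as refinements of a more general one. The sketch below assumes, purely for illustration, that marc:100, ms:director, and bib:Author each refine dc:Creator; the slide does not state the direction of the relationships, and the `REFINES` table and sample statements are hypothetical.

```python
# Assumed refinement relationships (specific property -> general property).
REFINES = {
    "marc:100": "dc:Creator",
    "ms:director": "dc:Creator",
    "bib:Author": "dc:Creator",
}


def generalizations(prop):
    """Return the property followed by everything it refines."""
    chain = [prop]
    while chain[-1] in REFINES:
        chain.append(REFINES[chain[-1]])
    return chain


def query(statements, prop):
    """Match statements whose property is `prop` or a refinement of it."""
    return [(s, v) for s, p, v in statements if prop in generalizations(p)]


# Hypothetical statements drawn from three different vocabularies.
statements = [
    ("R1", "bib:Author", "Shakespeare"),
    ("R2", "ms:director", "Hypothetical Director"),
    ("R1", "dc:Title", "Hamlet"),
]

creators = query(statements, "dc:Creator")
print(creators)
```

A query phrased in the general vocabulary then finds statements written in the specific ones, without either community changing its own terms.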
Bringing it together • RDF Data Model • Support consistent encoding, exchange and processing of metadata… critical when aggregating data from multiple sources • RDF Schema • Declare, define, reuse vocabularies • RDF Metadata transmission • XML encoding
Interoperability among Metadata Vocabularies [Figure: core classes with projections to application-specific metadata vocabularies: MARC, Dublin Core, IMS, INDECS]
Attribute/Value approaches to metadata… [Figure: R1 has dc:title “Hamlet” and dc:creator.playwright “Shakespeare”. In “Hamlet has a creator Shakespeare”, Hamlet is the subject, “has” the implied verb, “creator” the metadata noun, and “Shakespeare” the literal; in “The playwright of Hamlet was Shakespeare”, “playwright” is the metadata adjective.]
…run into problems for richer descriptions… [Figure: R1 has dc:creator.playwright “Shakespeare” and dc:creator.birthplace “Stratford”. “The playwright of Hamlet was Shakespeare, who was born in Stratford” flattens to “Hamlet has a creator Shakespeare” and “Hamlet has a creator birthplace Stratford”.]
…because of their failure to model entity distinctions [Figure: R1 has title “Hamlet” and creator R2; R2 has name “Shakespeare” and birthplace “Stratford”]
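The difference between the flat and the entity-based descriptions can be sketched as follows. The second creator (R3) is a hypothetical placeholder, added only to show that each birthplace stays attached to its own creator; the helper names are also assumptions.

```python
# Flat attribute/value form: both facts hang off the work itself, so
# nothing records that "Stratford" describes the creator, not the work.
flat = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity form: the creator is its own node (R2) with its own properties.
graph = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
    # Hypothetical second creator, for illustration only.
    ("R1", "creator", "R3"),
    ("R3", "name", "Hypothetical Coauthor"),
    ("R3", "birthplace", "Elsewhere"),
]


def objects(graph, subject, prop):
    """All values of `prop` for `subject`."""
    return [v for s, p, v in graph if s == subject and p == prop]


def creators_with_birthplace(graph, work):
    """Pair each creator's name with that same creator's birthplace."""
    result = {}
    for creator in objects(graph, work, "creator"):
        name = objects(graph, creator, "name")[0]
        place = objects(graph, creator, "birthplace")[0]
        result[name] = place
    return result


pairs = creators_with_birthplace(graph, "R1")
print(pairs)
```

In the flat form a second creator would add another `dc:creator.birthplace` value with no way to tell whose it is; in the graph the pairing is explicit.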
Understanding Metadata based on Query Capabilities • Simple boolean tags? • Agent, time, place questions? • Who was responsible for what and when?
Applying a Model-Centric Approach • Formally define common entities and relationships underlying multiple metadata vocabularies • Describe them (and their inter-relationships) in a simple logical model • Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
Events are key to understanding metadata relationships? • Recognizing inherent lifecycle aspects of digital content - transformation of “input” resources to “output” resources and of their descriptions. (e.g., IFLA model) • Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles. • Clarifying attachment points facilitates mapping across common entities in different vocabularies.
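Treating events as first-class objects might look like the sketch below, where each event carries its input and output resources and gives agents, roles, time, and place somewhere to attach. The `Event` class, its field names, and the translation event are hypothetical illustrations, not a published model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    kind: str                                              # what happened
    inputs: List[str] = field(default_factory=list)        # "input" resources
    outputs: List[str] = field(default_factory=list)       # "output" resources
    agents: Dict[str, str] = field(default_factory=dict)   # role -> agent
    time: Optional[str] = None                             # left unset here
    place: Optional[str] = None


events = [
    Event("creation", outputs=["R1"], agents={"playwright": "Shakespeare"}),
    # Hypothetical later event that transforms R1 into a new resource.
    Event("translation", inputs=["R1"], outputs=["R1-translated"],
          agents={"translator": "Hypothetical Translator"}),
]


def responsibility(events, resource):
    """Answer 'who was responsible for what and when' for one resource."""
    return [
        (agent, role, event.kind, event.time)
        for event in events
        if resource in event.outputs
        for role, agent in event.agents.items()
    ]


def derived_from(events, resource):
    """Resources that were inputs to the events producing `resource`."""
    return [i for e in events if resource in e.outputs for i in e.inputs]


answer = responsibility(events, "R1")
source = derived_from(events, "R1-translated")
print(answer, source)
```

A vocabulary term such as "playwright" then maps to a role on a creation event, which gives different vocabularies a common point to map through.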