Dublin Core and metadata: a tutorial Lorcan Dempsey Andy Powell UKOLN, University of Bath (with a little help from our friends) http://www.ukoln.ac.uk/metadata
Questions for you ... • Metadata • EAD, CIMI, TEI • PICS, XML, RDF • MARC • 856 • Dublin Core • you are • geeks/people with sensible shoes • goers/doers
Overview • UKOLN and metadata • Metadata landscape • Dublin Core • Metadata management • Interoperability • Harvesting • Future
UKOLN and metadata: Initiatives • ROADS: subject gateways, WHOIS++ templates • BIBLINK: CIP for electronic data, Dublin Core (+ MARC) • DESIRE: WHOIS++, GILS, Dublin Core, Z39.50/WHOIS++ • NewsAgent: current awareness • Ariadne: Dublin Core, DC-dot • MODELS: collection description?? • Agora • PRIDE
What is metadata …? • It’s just cataloguing, isn’t it … ? • Yes and no … • Data which supports operations carried out on information objects … • discover, buy, ... • In the company of strangers (Brody) • Relieves the user of needing full advance knowledge of the characteristics of resources … • … variety
Metadata model: the library example (picture by Stu Weibel) • Semantics, syntax, content: MARC, ISO 2709, AACR2
Variety of formal and informal metadata models (picture by Stu Weibel) • Commerce, home pages, libraries, geospatial, Internet commons, scientific data, museums, whatever...
Variety of operations ... • Discovery • Location • Selection (fit for use) • Acquire (terms) • Manipulate • Exploit (IPR) • Document • Contextualise • Preserve • Manage (dates, people, structures, …) • Agent/client access …
Variety of sectors ... • Curatorial traditions • ‘cataloguing’/documentation • libraries, archives, text archives, museums, geospatial data, etc • Network resource discovery • directory services, search engines, etc • influence from computer science • Network information management • web developments, W3C, database • sitemap, time to live, ... • pragmatic - market needs, vendor push
Variety of creation models ... • Author/creator • web pages? • Repository/site manager • effective disclosure • better management • Third party creator • e.g. eLib subject gateways • Library
Metadata ... • Variety of metadata models • syntax, semantics, content • scope • sectors/domains • Variety of operations supported • Variety of creation models • Variety of architectures for disclosure/discovery • Search and retrieve • Disclosure/distribution • Management … complex
Some formats richer… semantics, structure, domain-specific, ... (diagram: FGDC, MARC, Museum ... alongside Dublin Core)
Dublin Core • Metadata model • Simple element set • focus on semantics - several target syntaxes • Operations • resource discovery on the web • Explicitly cross sector/domain • No constraint on creation model or application architecture … simple and intuitive
Dublin Core - why a success? • Simple • Coincides with strategic needs in each of the sectors we identified • Curatorial: semantic interoperability between richer metadata models • Resource discovery: a simple format for descriptive metadata (DLOs) • Web management: associate metadata with Web resources • Inclusive (countries/domains/traditions) • Stu Weibel
Dublin Core - elements • 15-element core metadata set • Title, Subject, Description, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
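Because the core set is small and fixed, a tool can check element names mechanically. A minimal Python sketch, assuming the "DC.Element" naming used in the HTML examples; the function name `is_dc_element` and the case-insensitive matching are choices made for this illustration, not part of any DC specification.

```python
# The 15 Dublin Core elements, in the order listed on the slide.
DC_ELEMENTS = (
    "Title", "Subject", "Description", "Creator", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)
_DC_LOWER = {e.lower() for e in DC_ELEMENTS}


def is_dc_element(meta_name: str) -> bool:
    """True if a META NAME such as 'DC.Title' names one of the 15 elements.

    Case-insensitive matching is an assumption of this sketch; a trailing
    sub-element (e.g. 'DC.Creator.Address') is ignored for the check.
    """
    prefix, _, rest = meta_name.partition(".")
    if prefix.upper() != "DC" or not rest:
        return False
    return rest.split(".")[0].lower() in _DC_LOWER


if __name__ == "__main__":
    for name in ("DC.Title", "DC.Creator.Address", "DC.Author", "Keywords"):
        print(name, is_dc_element(name))
```

"DC.Author" is rejected because the core set uses Creator, which is exactly the kind of naming question the interoperability slides come back to.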
Dublin Core - HTML Example
<HTML><HEAD>
<TITLE>UKOLN Home Page</TITLE>
<META NAME="DC.Title" CONTENT="UKOLN: UK Office for Library and Information Networking">
<META NAME="DC.Subject" CONTENT="national centre, network information support, library community, awareness, research, information services, public library networking, bibliographic management, distributed library systems, metadata, resource discovery, conferences, lectures, workshops">
<META NAME="DC.Description" CONTENT="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services">
<META NAME="DC.Creator" CONTENT="Isobel Stark">
</HEAD> ...
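Embedded tags like these are what a robot or indexer has to pull back out of the page. A minimal sketch using Python's standard `html.parser`; the class name `DCMetaExtractor` and helper `extract_dc` are hypothetical, and the sketch assumes attribute values are delimited by straight ASCII quotes, as valid HTML requires.

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collect (NAME, CONTENT) pairs from <META NAME="DC.x" ...> tags."""

    def __init__(self):
        super().__init__()
        self.records = []  # (name, content) in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # arrives as tag 'meta' with an attribute called 'name'.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.upper().startswith("DC."):
            self.records.append((name, attributes.get("content") or ""))


def extract_dc(html_text):
    parser = DCMetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


SAMPLE = """<HTML><HEAD>
<TITLE>UKOLN Home Page</TITLE>
<META NAME="DC.Title" CONTENT="UKOLN: UK Office for Library and Information Networking">
<META NAME="DC.Creator" CONTENT="Isobel Stark">
</HEAD></HTML>"""

if __name__ == "__main__":
    for name, content in extract_dc(SAMPLE):
        print(f"{name}: {content}")
```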
Data creation: practical issues of using Dublin Core for Internet resource description... • UKOLN metadata system • Requirements • 3 models for metadata management • Implementation at UKOLN
UKOLN metadata system requirements • Easy to use • Work with a variety of methods of creating HTML • Simple migration to future metadata formats • Separate metadata from resource
Managing Dublin Core (1): HTML authoring tool • Embed by hand using HTML or text editor • Pros… Simple; may be useful for training and familiarisation • Cons… May not be possible with all editors; maintenance problems; easy to make errors
DC-dot • A Web based tool for creating Dublin Core <meta> tags • Automatic generation of some tags based on content of the resource • Forms based editing of tags • Cut-and-paste output into HTML • Conversion to other formats… • SOIF, ROADS/WHOIS++, USMARC, GILS... Run demo http://www.ukoln.ac.uk/metadata/dcdot/
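The idea of generating some tags automatically from a resource's content can be sketched briefly. This is not DC-dot's code or logic; it is a hypothetical Python illustration, assuming a page's `<title>` can seed DC.Title and an existing `keywords` META tag can seed DC.Subject, with the output meant for the author to review and paste into the HTML.

```python
from html import escape
from html.parser import HTMLParser


class _Harvester(HTMLParser):
    """Pick up the <title> text and any existing 'keywords' META tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "keywords":
                self.keywords = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def suggest_dc_tags(html_text):
    """Return suggested DC <META> tags (hypothetical heuristics)."""
    h = _Harvester()
    h.feed(html_text)
    h.close()
    pairs = []
    if h.title.strip():
        pairs.append(("DC.Title", h.title.strip()))
    if h.keywords.strip():
        pairs.append(("DC.Subject", h.keywords.strip()))
    # escape() protects quotes and '<' inside CONTENT values.
    return [f'<META NAME="{n}" CONTENT="{escape(v)}">' for n, v in pairs]
```

Anything a heuristic like this proposes still needs human checking, which is why the slide pairs automatic generation with forms-based editing.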
Managing Dublin Core (2): Web-site management tool • Use a Web-site management tool, for example NetObjects Fusion • Pros… Use of Web-site management tools likely to increase; object-oriented database approach • Cons… Proprietary formats; early days, too early to evaluate use for metadata yet?
Managing Dublin Core (3): on-the-fly generation • Hold Dublin Core separately and embed it on the fly using a server-side include (SSI) • Pros… Separates metadata from resource; future migration fairly simple • Cons… Performance; lack of integration with HTML tools; server-specific
UKOLN metadata system (1) • Embed on the fly • Apache SSI script • Store metadata using SOIF records • Use MS-Access as a tool to create the records • Associate metadata with resource by co-locating them in the Web server filestore
UKOLN metadata system (2)
• intro.html, created with an HTML editor, calls a server-side script using Apache syntax: <!--#exec cmd="getmeta" -->
<html> <head> <title>…</title> <!--#exec cmd="getmeta" --> </head> ...
• intro.html.soif, created from the MS-Access database:
@FILE { http://www.ukoln.ac. ...
keywords{13}: xxx, yyy, zzz
description{14}: blah blah b
author{13}: Stark, Isobel
... }
UKOLN metadata system (3) • MS-Access front end... • Filename browser • Text boxes • Name choosers • UKOLN-specific metadata
UKOLN metadata system (4)
• A Web robot requests intro.html from the UKOLN Web server; the server runs the SSI script named in the page, which reads intro.html.soif and returns the metadata for embedding in the page sent back to the robot (steps numbered 1 to 6 in the original diagram)
• intro.html: <html> <head> <title>…</title> <!--#exec cmd="getmeta" --> </head> ...
• intro.html.soif: @FILE { http://www.ukoln.ac. ... keywords{13}: xxx, yyy, zzz description{14}: blah blah b author{13}: Stark, Isobel ... }
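The number in braces in each SOIF attribute, as in `keywords{13}`, is the length of the value ("xxx, yyy, zzz" is 13 characters). A minimal Python sketch of writing such a record; `to_soif` is a hypothetical helper, the URL in the usage note is a made-up example, and the tab after the colon and UTF-8 byte counting are assumptions to check against the Harvest SOIF documentation.

```python
def to_soif(url, attributes, template_type="FILE"):
    """Serialise one metadata record as a SOIF object.

    Each attribute is written as  name{N}:<TAB>value  where N is the
    value's length in bytes (UTF-8 assumed in this sketch).
    """
    lines = [f"@{template_type} {{ {url}"]
    for name, value in attributes.items():
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_soif("http://example.org/intro.html",
                  {"keywords": "xxx, yyy, zzz", "author": "Stark, Isobel"}))
```

Storing the byte count lets a reader take the value verbatim, so values may contain newlines or colons without any escaping.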
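The on-the-fly step can be simulated outside Apache. The slides do not show the getmeta script itself, so this is a hypothetical Python stand-in: it parses a SOIF record using its byte counts and replaces the SSI directive with DC `<META>` tags. The SOIF-to-DC name mapping in `SOIF_TO_DC` is an assumption for illustration.

```python
import re
from html import escape

SSI_DIRECTIVE = '<!--#exec cmd="getmeta" -->'

# Hypothetical mapping from SOIF attribute names to DC META names.
SOIF_TO_DC = {"title": "DC.Title", "keywords": "DC.Subject",
              "description": "DC.Description", "author": "DC.Creator"}

# name{N}: followed by one space or tab before the value.
_PAIR = re.compile(rb"([^{\s]+)\{(\d+)\}:[ \t]")


def parse_soif(text):
    """Parse one SOIF object into (url, {attribute: value})."""
    raw = text.encode("utf-8")
    header_end = raw.index(b"\n")
    header = re.match(r"@(\S+)\s*\{\s*(\S+)", raw[:header_end].decode("utf-8"))
    if header is None:
        raise ValueError("not a SOIF object")
    attrs, pos = {}, header_end + 1
    while True:
        while pos < len(raw) and raw[pos:pos + 1] in (b"\n", b"\r", b" "):
            pos += 1  # skip separators between attribute lines
        if pos >= len(raw) or raw[pos:pos + 1] == b"}":
            break
        m = _PAIR.match(raw, pos)
        if m is None:
            raise ValueError(f"malformed SOIF attribute at byte {pos}")
        start, size = m.end(), int(m.group(2))
        # Read exactly N bytes, as declared in the {N} count.
        attrs[m.group(1).decode("utf-8")] = raw[start:start + size].decode("utf-8")
        pos = start + size
    return header.group(2), attrs


def expand_getmeta(html_text, soif_text):
    """Replace the SSI directive with DC META tags built from the SOIF record."""
    _, attrs = parse_soif(soif_text)
    tags = "\n".join(
        f'<META NAME="{SOIF_TO_DC[k]}" CONTENT="{escape(v)}">'
        for k, v in attrs.items() if k in SOIF_TO_DC
    )
    return html_text.replace(SSI_DIRECTIVE, tags)
```

Doing this work on every request is the performance cost listed against on-the-fly generation.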
Issues • Performance • Interaction with Web caches • Dublin Core vs Alta Vista-style metadata <META NAME="Description" CONTENT="blah, blah"> <META NAME="Keywords" CONTENT="xxx, yyy, zzz"> • Granularity • Which pages should have metadata?
A short history: Dublin to Helsinki We have borrowed some of this material from Stu Weibel, with permission
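One pragmatic answer to the Dublin Core vs Alta Vista question is to emit both forms from the same data. A minimal Python sketch, assuming DC.Description maps to plain `Description` and DC.Subject to `Keywords`, as the slide's example pairs suggest; the function name is hypothetical.

```python
from html import escape

# Assumed correspondence between DC elements and the plain META names
# indexed by Alta Vista-style search engines (per the slide's example).
DC_TO_PLAIN = {"DC.Description": "Description", "DC.Subject": "Keywords"}


def with_search_engine_tags(dc_pairs):
    """Emit each DC tag, plus a plain duplicate where one is assumed useful."""
    out = []
    for name, content in dc_pairs:
        value = escape(content)
        out.append(f'<META NAME="{name}" CONTENT="{value}">')
        plain = DC_TO_PLAIN.get(name)
        if plain:
            out.append(f'<META NAME="{plain}" CONTENT="{value}">')
    return out
```

The cost is duplicated data in every page header, which adds to the granularity and maintenance questions on the same slide.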
Dublin Core Workshop Series .. • DC-1: OCLC/NCSA Metadata Workshop Mar, 1995 • Limited Scope: Discovery of document-like objects • 13 element Dublin Core • Interdisciplinary consensus • DC-2: OCLC/UKOLN Warwick Workshop April, 1996 • Warwick Framework - modularity • Syntax issues
.. Dublin Core Workshop Series • DC-3: CNI/OCLC Image Metadata Workshop, Sep, 1996 • Images are in scope • 15 element core; some element name changes • DC-4: Canberra Metadata Workshop Mar, 1997 • Minimalists and Structuralists • Canberra Qualifiers (additional information useful for interpretation of metadata)
Dublin Core - qualifiers • Language of element value • Scheme • specifies a context for interpretation <META NAME="DC.Subject" SCHEME="ddc.21" CONTENT="170.42"> • Sub-element • specifies a facet - narrows <META NAME="DC.Creator.Address" CONTENT="l.dempsey@ukoln.ac.uk">
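A consumer of qualified tags has to separate the element from its sub-element and keep the scheme alongside the value. A minimal Python sketch; the `DCStatement` record and `parse_qualified` helper are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DCStatement:
    element: str
    sub_element: Optional[str]
    scheme: Optional[str]
    value: str


def parse_qualified(name, content, scheme=None):
    """Split a META NAME like 'DC.Creator.Address' into element and sub-element."""
    parts = name.split(".")
    if len(parts) < 2 or parts[0].upper() != "DC":
        raise ValueError(f"not a Dublin Core META name: {name!r}")
    sub = ".".join(parts[2:]) or None
    return DCStatement(parts[1], sub, scheme, content)


if __name__ == "__main__":
    print(parse_qualified("DC.Subject", "170.42", scheme="ddc.21"))
    print(parse_qualified("DC.Creator.Address", "l.dempsey@ukoln.ac.uk"))
```

Because a sub-element only narrows its element, an application that does not recognise `Address` can still treat the value as a Creator statement.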
DC-5 • DC-5: National Library of Finland/OCLC Workshop, October 1997 • Formal Data Model (expressed in RDF) • many other problems are hereby made simpler • Resource Description Framework • The return of modularity • Finnish finish (of unqualified DC) • minimalist DC is done and will not be changed • Semantics for additional sub-structure • a small number of sub-elements will be established • Closer DC-W3C collaboration
Working groups • Data Model • Relationships • Typology • Sub-elements • Date • (diagram labels: date, relationship, source; what is a resource?; 1:1; RDF)
RFCs in preparation • Simple DC semantics (the minimalist position) • Simple DC syntax for embedded HTML • DC semantics with qualifiers • DC syntax with qualifiers • HTML 2.0 • HTML 4.0 • RDF
Projects • 30 projects; 10 countries http://purl.org/metadata/dublin_core/projects.html • “Interdisciplinary and international recognition as the lingua franca for resource discovery metadata for electronic resources” Stu Weibel • Support for use with non-digital objects
The HTML 2.0 “kludge” • Convention for simple embedded metadata • Bootstrapping early Dublin Core deployments • META tags and standard HTML syntax • Useful for simple metadata without qualifiers • Can support Dublin Core qualifiers, but with risks for interoperability and indexing purity • <META NAME="DC.Subject" CONTENT="(SCHEME=LCSH) Information technology -- higher education">
HTML 4.0 - DC influences the web • Richer <META> tag attributes • LANG (language of the metadata) • SCHEME (formal qualifier) • SUB-ELEMENTS (dot syntax extensions) • Allows syntactically “clean” implementation of metadata with qualifiers <META NAME="DC.Subject" SCHEME="LCSH" CONTENT="Information technology -- higher education">
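The interoperability risk is easy to see: an indexer unaware of the convention would index "(SCHEME=LCSH)" as part of the subject text. A minimal Python sketch of a convention-aware reader; the helper name `split_kludge` is hypothetical and the regular expression assumes the prefix appears exactly as in the slide's example.

```python
import re

# '(SCHEME=xxx)' at the very start of CONTENT, then the actual value.
_KLUDGE = re.compile(r"^\s*\(SCHEME=([^)]+)\)\s*(.*)$", re.DOTALL)


def split_kludge(content):
    """Return (scheme, value); scheme is None if there is no prefix."""
    m = _KLUDGE.match(content)
    if m is None:
        return None, content
    return m.group(1).strip(), m.group(2).strip()


if __name__ == "__main__":
    print(split_kludge("(SCHEME=LCSH) Information technology -- higher education"))
```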
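With HTML 4.0 the qualifiers move into real attributes, so generating them is straightforward. A minimal Python sketch; `html4_meta` is a hypothetical helper, and sub-elements are still expressed through the dot syntax in NAME because there is no separate attribute for them.

```python
from html import escape


def html4_meta(element, value, scheme=None, lang=None, sub_element=None):
    """Render one DC statement as an HTML 4.0 <META> tag."""
    name = f"DC.{element}" + (f".{sub_element}" if sub_element else "")
    attrs = [f'NAME="{escape(name)}"']
    if lang:
        attrs.append(f'LANG="{escape(lang)}"')      # language of the metadata
    if scheme:
        attrs.append(f'SCHEME="{escape(scheme)}"')  # formal qualifier
    attrs.append(f'CONTENT="{escape(value)}"')
    return "<META " + " ".join(attrs) + ">"


if __name__ == "__main__":
    print(html4_meta("Subject", "Information technology -- higher education",
                     scheme="LCSH"))
```

Unlike the HTML 2.0 convention, the CONTENT value here is only the subject term, so a simple indexer sees clean text.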
Some quick statistics • UK (academic sites only; information provided by Dave Beckett) • Total pages: ~1.5M (a guess!) • Embedded DC: ‘a few hundred’ http://www.cs.ukc.ac.uk/people/staff/djb1/ • Sweden (information provided by Sigfrid Lundberg) • Total pages: 1.4M • Embedded DC: ‘a few dozen’ http://www.lub.lu.se/nwiPaper/
Interoperability • What do we mean by interoperability? • Issues • Z39.50 and Dublin Core • Metadata registries
Interoperability? • Unify access to data in different domains - Web, library, museums, archives, ... • Issues • Protocols - Z39.50, WHOIS++, … • gateways • Attribute names - author/creator/... • Semantic interoperability - mapping tables • Format of results • format converters • In real life these can all get mixed up
Protocol Gateways - an example • ZEXI - a Z39.50 to WHOIS++ gateway • Based on CNIDR's Isite • Accepts Z39.50 searches • Converts them to WHOIS++ • Returns SUTRS records http://roads.ukoln.ac.uk/cgi-bin/egwcgi/egwirtcl/targets.egw
Attribute names • Different databases may use different ‘names’ for the same thing • ‘creator’ vs ‘author’ • Need to be able to construct searches that ‘work’ against different databases irrespective of the ‘names’ in use • Dublin Core provides a minimal set of agreed ‘names’ with which we can construct searches
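A query phrased once in DC names can be rewritten for each target through a per-database mapping table. A minimal Python sketch; the three database names and their local field names are invented for illustration, and only the DC element names come from the slides.

```python
# Hypothetical local field names; only the DC keys come from the slides.
LOCAL_NAMES = {
    "library_opac": {"Creator": "author", "Title": "title", "Subject": "subject"},
    "museum_db": {"Creator": "maker", "Title": "object_name", "Subject": "classification"},
    "web_index": {"Creator": "creator", "Title": "title", "Subject": "keywords"},
}


def translate_query(dc_query, target):
    """Rewrite {DC element: term} into the target database's own field names."""
    mapping = LOCAL_NAMES[target]
    translated = {}
    for element, term in dc_query.items():
        if element not in mapping:
            raise KeyError(f"{target} has no field for DC element {element!r}")
        translated[mapping[element]] = term
    return translated


if __name__ == "__main__":
    for db in LOCAL_NAMES:
        print(db, translate_query({"Creator": "Dempsey"}, db))
```

The same DC search for Creator becomes `author`, `maker` or `creator` depending on where it is sent.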
Format of results • Different databases may return results in different formats • USMARC, GRS-1, SUTRS, IAFA, ... • Early stages of searching ideally need results to be returned in single ‘simple’ format • Dublin Core provides a minimal set of agreed data elements with which we can construct results
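On the results side, a format converter reduces each native record to the same simple DC view. A minimal Python sketch under stated assumptions: the `catalogue` crosswalk uses MARC tag-plus-subfield codes (245a title, 100a personal name, 650a topical subject), while the `template` field names are hypothetical and do not follow any real template specification.

```python
# Hypothetical crosswalks to unqualified DC.
CROSSWALKS = {
    "catalogue": {"245a": "Title", "100a": "Creator", "650a": "Subject"},
    "template": {"Title": "Title", "Author-Name": "Creator", "Keywords": "Subject"},
}


def to_simple_dc(fields, source_format):
    """Convert a list of (field, value) pairs into {DC element: [values]}."""
    crosswalk = CROSSWALKS[source_format]
    dc = {}
    for field, value in fields:
        element = crosswalk.get(field)
        if element is None:
            continue  # no DC equivalent: dropped from this simple view
        dc.setdefault(element, []).append(value)
    return dc
```

Dropping unmapped fields is deliberate for early-stage searching; the full native record can still be fetched once the user has selected it.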
Z39.50 and DC - searching • Version 2 • Searches phrased in terms of single attribute set only • Either need to • add DC attributes to Bib-1 • map DC to Bib-1 • Version 3 • Multiple attribute sets allowed for searching • New simple DC attribute set to be proposed • Other attributes taken from Bib-1 http://cypress.dev.oclc.org:12345/~rrl/docs/dublincoreandz3950.html
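For version 2, the "map DC to Bib-1" option amounts to a table from DC elements to Bib-1 Use attributes (attribute type 1). A minimal Python sketch; the mapping values below are an illustrative assumption based on common Bib-1 Use values and should be checked against the Bib-1 attribute set documentation, and elements without an obvious equivalent are left unmapped on purpose.

```python
# Bib-1 attribute set object identifier.
BIB1_OID = "1.2.840.10003.3.1"
USE = 1  # Bib-1 attribute type 1 = Use

# Assumed DC -> Bib-1 Use mapping (verify against the Bib-1 documentation).
DC_TO_BIB1_USE = {
    "Title": 4,
    "Creator": 1003,     # Author
    "Subject": 21,       # Subject heading
    "Description": 62,   # Abstract
    "Publisher": 1018,
    "Date": 30,
    "Language": 54,      # Code--language
}


def dc_term_to_bib1(element, term):
    """Map one DC search term to (attribute set OID, [(type, value)], term)."""
    try:
        use = DC_TO_BIB1_USE[element]
    except KeyError:
        raise ValueError(f"no Bib-1 Use attribute assumed for DC.{element}") from None
    return BIB1_OID, [(USE, use)], term
```

Under version 3, a DC attribute set would let the client send DC names directly, so a table like this would only be needed where other attributes are still taken from Bib-1.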