The OAI PMH (Open Archives Initiative Protocol for Metadata Harvesting) MetaScholar Initiative All-Project Meeting Atlanta, GA 6/18/2002. Edward A. Fox fox@vt.edu CS DLRL Virginia Tech, Blacksburg, VA, USA. Acknowledgements.
The OAI PMH (Open Archives Initiative Protocol for Metadata Harvesting) MetaScholar Initiative All-Project Meeting Atlanta, GA 6/18/2002 Edward A. Fox fox@vt.edu CS DLRL Virginia Tech, Blacksburg, VA, USA
Acknowledgements • Sponsors: Mellon Foundation, SOLINET, NSF, DLF, CNI, UK’s JISC, Virginia’s CIT, … • OAI Team: Steering Committee, Technical Committee, Developers, Data Providers, Service Providers • Emory Team, Partners around Southeast • VT Colleagues: Hussein Suleman, Rohit Kelapure, Ming Luo, Ryan Richardson, Marcos Goncalves, Priya Shivakumar, Baoping Zhang, students working on term projects, …
Contents • Early history • Key concepts • Examples • ODL, XOAI • OAI Tools • Technical Plan • Conclusion
Open Archives Initiative OAI www.openarchives.org openarchives@openarchives.org
Open Archives Initiative (OAI) • xxx@LANL, high-energy physics (Ginsparg, 1991) • CSTR + WATERS = NCSTRL (Lagoze, 1994) • xxx + NCSTRL = CoRR collaboration (1998) • Universal Preprint Service protoproto, Oct. 21-22, 1999, Santa Fe – led by LANL, CNI, DLF, Mellon --> OAi • Santa Fe Convention (see Feb 2000 D-Lib Magazine article) • Archives -> Open Archives • Support unique archive identifiers • Implement metadata set(s) (DC, using XML) • Implement OA harvesting protocol • Register the archive • Build tools, layer other services: linking, searching, …
OAi Philosophy • Self-archiving = submission mechanism • Long-term storage system = archive • Open interface = harvesting mechanism • Data provider + service provider • Start with “gray literature” • e-prints/pre-prints, reports, dissertations, …
Open Archives (protoproto) • ArXiv & Los Alamos National Lab • CogPrints & U. Southampton • NACA & NASA (reports) • NCSTRL & Cornell U. • NDLTD & Virginia Tech • RePEc & U. Surrey • Total of around 200K records
Original Open Archives Members • American Physical Society • California Digital Library • Caltech • Coalition for Networked Info. • Cornell University • Harvard University • Library of Congress • Los Alamos Nat’l Lab • Mellon Foundation • NASA Langley Research Cntr • Old Dominion University • Stanford University • U. of Ghent • U. of Surrey • U. of Southampton • Vanderbilt University • Virginia Tech • Washington University
Now is a Technical Umbrella for Practical Interoperability… • Metadata Harvesting • Reference Libraries • Museums • Publishers • E-Print Archives • …that can be exploited by different communities
The World According to OAI • Service Providers: Discovery, Current Awareness, Preservation • Metadata harvesting • Data Providers
Aggregation through OAI Harvesting – Black Box Perspective • Figure nodes: OA 1, OA 2, OA 3, OA 4, OA 5, OA 6, OA 7
Aggregation through OAI Harvesting – By Organization • Figure labels: Theology Emory GA UGA U FL UTK AmSo Library
Aggregation through OAI Harvesting – By Topic • Figure labels: Confederate Constitution Civil War History Oral Sports Culture AmSo Diaries
Approaches to Aggregation • Build By Institution • Build By Discipline
Types of Access Possible • Build By Institution • Build By Discipline • Access by: Year, Category, Personage, Author, Genre, Query, …
OAI Repository • Required: Protocol • Set Structure • URI Scheme • Required: DC • Figure: layers of MDO boxes above a row of DO boxes
Metadata vs. Data • Data refers to digital objects or digital representations of objects • Metadata is information about the objects (e.g. title, author, etc.) • OAI focuses on metadata, with the implicit understanding that metadata usually contains useful links to the source digital objects
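The distinction above can be illustrated with a minimal sketch of a metadata record that points at its digital object. The Dublin Core namespace is the real one; the `record` wrapper element, the field values, and the `example.org` URL are assumptions made up for illustration and are not the schema any particular archive uses.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields):
    """Build a small XML metadata record from (element, value) pairs.

    The <record> wrapper is a hypothetical container, not the OAI schema.
    """
    root = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


def object_links(record):
    """Return dc:identifier values that look like links to the digital object."""
    links = []
    for element in record.findall(f"{{{DC_NS}}}identifier"):
        text = element.text or ""
        if text.startswith(("http://", "https://")):
            links.append(text)
    return links


# Hypothetical metadata for one item; only the metadata is stored here,
# the digital object itself stays at the URL.
SAMPLE_FIELDS = [
    ("title", "An Example Technical Report"),
    ("creator", "A. Author"),
    ("identifier", "http://example.org/reports/TR-1.pdf"),
]

if __name__ == "__main__":
    record = dc_record(SAMPLE_FIELDS)
    print(ET.tostring(record, encoding="unicode"))
    print(object_links(record))
```

A harvester that collects such records never needs the object itself; a service can follow the link when a user asks for the full item.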
Metadata: Complex to Simple MARC (>$50) Dublin Core (DC)
Figure: harvester and repository connected by the OAI protocol (labels: items, support data, harvesting data)
Registered URI Scheme • Identifiers: locally unique key for extracting a record from a repository • oai-identifier = oai:archive-identifier:record-identifier • Archive Identifier: registered within OAI • Record identifier: unique ID within archive (syntax is archive-specific) • example = oai:ncstrl:ncstrl.cornellcs/TR94-1418
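As a rough sketch of how the identifier syntax on this slide could be taken apart in code, the function below splits an oai-identifier into its archive and record parts. The names `OaiIdentifier` and `parse_oai_identifier` are hypothetical, and the validation is deliberately minimal.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    archive: str  # archive identifier, registered within OAI
    record: str   # record identifier, syntax is archive-specific


def parse_oai_identifier(identifier: str) -> OaiIdentifier:
    """Split 'oai:archive-identifier:record-identifier' into its parts."""
    # Split at most twice so that colons inside the record part survive.
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return OaiIdentifier(archive=parts[1], record=parts[2])


if __name__ == "__main__":
    print(parse_oai_identifier("oai:ncstrl:ncstrl.cornellcs/TR94-1418"))
```

Because the record part is archive-specific, the sketch treats everything after the second colon as opaque.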
Selective harvesting – datestamps • Harvest within date range • Figure: repository holding records
Selective harvesting – sets • Harvest within set • Figure: repository holding records grouped into sets S1 and S2
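The two kinds of selective harvesting can be sketched as a filter over an in-memory repository. This is a toy model under stated assumptions: the `Record` class, the `select_records` function, and the sample identifiers and dates are all invented for illustration; a real repository would apply the same criteria on the server side.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    identifier: str
    datestamp: date                      # when the record was created or changed
    sets: FrozenSet[str] = frozenset()   # sets the record belongs to


def select_records(
    records: Iterable[Record],
    from_date: Optional[date] = None,
    until: Optional[date] = None,
    set_name: Optional[str] = None,
) -> List[Record]:
    """Keep records inside the date range (inclusive) and in the given set."""
    selected = []
    for record in records:
        if from_date is not None and record.datestamp < from_date:
            continue
        if until is not None and record.datestamp > until:
            continue
        if set_name is not None and set_name not in record.sets:
            continue
        selected.append(record)
    return selected


# Hypothetical repository contents.
SAMPLE = [
    Record("oai:example:1", date(2002, 1, 15), frozenset({"S1"})),
    Record("oai:example:2", date(2002, 3, 1), frozenset({"S2"})),
    Record("oai:example:3", date(2002, 5, 20), frozenset({"S1", "S2"})),
]

if __name__ == "__main__":
    for record in select_records(SAMPLE, from_date=date(2002, 2, 1), set_name="S2"):
        print(record.identifier)
```

Harvesting by datestamp is what lets a service provider come back later and fetch only what changed since its last visit.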
Summary: Protocol for Metadata Harvesting • Service Requests: Identify, ListMetadataFormats, ListSets, GetRecord, ListIdentifiers, ListRecords • Metadata Multiplicity • Date (and Time) Ranges • Resumption Tokens
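The summary above can be sketched in a few lines: one helper that builds a request URL for any of the six service requests, and one loop that follows resumption tokens until a list is complete. The argument names (`metadataPrefix`, `from`, `until`, `set`, `resumptionToken`) and the `oai_dc` prefix come from the protocol specification rather than from these slides; the base URL, `build_request`, `harvest_all`, and the fake paged responses are assumptions for illustration, and no network access or XML parsing is attempted.

```python
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "GetRecord", "ListIdentifiers", "ListRecords",
}


def build_request(base_url: str, verb: str, **arguments: str) -> str:
    """Return the URL for one service request.

    'from' is a Python keyword, so callers pass it as from_; a trailing
    underscore is stripped from every argument name.
    """
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    query = {"verb": verb}
    for name, value in arguments.items():
        query[name.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


# A page is (records, resumption token); the token is None on the last page.
Page = Tuple[List[str], Optional[str]]


def harvest_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield every record, requesting further pages while a token is returned."""
    token: Optional[str] = None
    while True:
        records, token = fetch_page(token)
        yield from records
        if not token:
            break


# Hypothetical repository that answers in three incomplete lists.
_FAKE_PAGES = {
    None: (["rec1", "rec2"], "t1"),
    "t1": (["rec3", "rec4"], "t2"),
    "t2": (["rec5"], None),
}


def demo_fetch_page(token: Optional[str]) -> Page:
    return _FAKE_PAGES[token]


if __name__ == "__main__":
    print(build_request("http://example.org/oai", "ListRecords",
                        metadataPrefix="oai_dc", from_="2002-01-01"))
    print(list(harvest_all(demo_fetch_page)))
```

In a real harvester, `fetch_page` would issue the request built by `build_request` (passing `resumptionToken` on later calls) and parse the XML response.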
Harvesting vs. Federation • Competing approaches to interoperability • Federation is when services are run remotely on remote data (e.g., federated searching) • Harvesting is when data/metadata is transferred from the remote source to the destination where the services are located (e.g., union catalogues) • Federation requires more effort at each remote source but is easier for the local system and vice versa for harvesting • OAI (currently) focuses on harvesting
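A toy contrast between the two approaches, assuming in-memory stand-ins for the remote sources: with federation, every query is run by each remote source; with harvesting, metadata is copied once and queries are answered locally. All class and function names here are hypothetical.

```python
from typing import Iterable, List


class RemoteSource:
    """Stand-in for a remote archive holding a list of titles."""

    def __init__(self, name: str, titles: Iterable[str]) -> None:
        self.name = name
        self._titles = list(titles)
        self.search_calls = 0  # how often the source had to run a query

    def list_records(self) -> List[str]:
        """Harvesting interface: hand over the metadata."""
        return list(self._titles)

    def search(self, term: str) -> List[str]:
        """Federation interface: the source runs the query itself."""
        self.search_calls += 1
        return [t for t in self._titles if term.lower() in t.lower()]


def federated_search(sources: Iterable[RemoteSource], term: str) -> List[str]:
    """Send the query to every remote source and merge the answers."""
    hits: List[str] = []
    for source in sources:
        hits.extend(source.search(term))
    return hits


class UnionCatalogue:
    """Local service built on harvested metadata."""

    def __init__(self) -> None:
        self._titles: List[str] = []

    def harvest(self, sources: Iterable[RemoteSource]) -> None:
        for source in sources:
            self._titles.extend(source.list_records())

    def search(self, term: str) -> List[str]:
        return [t for t in self._titles if term.lower() in t.lower()]


if __name__ == "__main__":
    sources = [
        RemoteSource("A", ["Digital Libraries", "Metadata Basics"]),
        RemoteSource("B", ["Harvesting Metadata", "Preprint Servers"]),
    ]
    print(federated_search(sources, "metadata"))
    catalogue = UnionCatalogue()
    catalogue.harvest(sources)
    print(catalogue.search("metadata"))
```

In the sketch, the remote source needs a working `search` only for federation; for harvesting it needs nothing beyond `list_records`, which mirrors the trade-off in effort described above.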
Example 1: Union Collection of ETDs (Electronic Theses and Dissertations, for Networked Digital Library of Theses and Dissertations, NDLTD)
Example 2: NSDL Information Architecture (essentially as developed by the Technical Infrastructure Workgroup) • Figure labels: User Interfaces, Core NSDL “Bus”, Usage Enhancement, Collection Building • Portals & Clients • NSDL Services, Other NSDL Services • NSDL Collections, Special Databases, referenced items & collections • Core Services: information retrieval, metadata gathering • Core Collection-Building Services: protocols, harvesting • CI Services: browsing, authentication, personalization, discussion, annotation
Example 2: CITIDEL -> NSDL • Computing and Information Technology Interactive Digital Education Library • A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL • www.nsdl.nsf.gov • www.nsdl.org
Example 2: CITIDEL Distributed repository structure
Example 2: NSDL Collections (themes relevant to our projects) • Discovery of content • Classification and cataloguing • Acquisition and/or linking; referencing • Disciplinary-based themes define a natural body of content, but other possibilities are also encouraged • Software tool suites for analysis, modeling, simulation, or visualization • Reviewed commentary on pedagogy
Open Digital Libraries XOAI-PMH • Dissertation work of Hussein Suleman (member of OAI technical committee) • Extending the OAI protocol • Supporting rapid development of DLs using networks of components • Demonstrated with NDLTD, CSTC • Described in Dec. 2001 D-Lib Magazine article, and article scheduled for publication
Open Digital Libraries Components • Running now • XML-File (data provider from file system) • Union, search, browse, recent, filter • E-journal support system • Class projects • High performance multilingual search • Recommender • User rating • Others discussed • Classification/categorization and browsing
Component System Approach • (Open) DL = Network of Extended OAs • Figure components: Data Input, Resource Discovery, Search, Browse, Recommend, Local Archive, Metadata Repository, Remote Archive • Legend: User Interface, OAI/ODL archive, OAI/ODL protocol
Example Architecture (NDLTD) • Figure labels: Virginia Tech, User Interface, PhysNet, Humboldt, Search, Browse, Recent, Duisburg, CalTech, Union Catalog, Dresden, MIT, Filter • Legend: User Interface, OAI/ODL archive, OAI/ODL protocol
OAI Tools • Related resources, e.g., XML, Unicode • Submission / author support • XML Schema Validator • Servers and utilities, e.g., ARC, Kepler, EPrints • Repository Explorer • Interactive Browsing • Testing of parameters • Multiple views of data • Multilingual support • Automatic test suite
Author's tools www.physik.uni-oldenburg.de/EPS/mmm
VT Tool: Repository Explorer • The Repository Explorer is a tool for browsing and testing Open Archives, by Hussein Suleman • You issue commands and see the results • You also can perform a sequence of automatic tests • http://purl.org/net/oai_explorer
VT Tool: RE 1.3
What will central service look like? (1 of 2) • Harvesting from local sites • Rich content, drawn from all participating sites • Data management • Logging and reporting • Repository/preservation/mirroring • Adding/updating/deleting • User interface and support for digital librarians and data providers