Central Registry for Digitized Objects: Linking Production and Bibliographic Control

Ralf Stockmann, Göttinger Digitization Center

As things are now
• Huge ventures in
  • Digitization: Google, Microsoft, national programs, local centers
  • Accessibility: World Digital Library, European Digital Library, national portals, Google Book Search

As things are now
• Mass digitization is only just dawning
• We are leaving behind the stage of craft production
• and entering industrialization:
  • Scanning robots
  • Accessible full text (OCR)

Lack of …
• Coordination in digitization activities
  • Who scans what, where, when, in which quality, and how will it be made accessible?
  • How is "quality" defined?
  • Do we agree on "what"?

Facing the Consequences
[Figure: costs / value plotted against the number of digitized items per volume; labels: Technical Improvements, Costs, Additional Benefit, Waste of Resources]

The Solution
• Central registry for digitized objects
• Focused on the production context (no user frontend)
• API driven (Application Programming Interface)
  • Query / ingest
  • Simple integration into existing workflow tools
  • Batch mode (lists)
• Open source / free service
• Matching on volume level
  • Score / probability
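The API-driven, batch-mode query/ingest cycle with volume-level scoring could look roughly like the following minimal Python sketch. Everything in it is hypothetical: the `Record` fields, the `Registry.ingest` / `Registry.query` methods, the 0.8 threshold, and the scoring rule (averaged `difflib.SequenceMatcher` ratios on title and author, zero if the years differ) are illustrative stand-ins, not the registry's actual API or matching method.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Record:
    """Hypothetical volume-level entry; a real model would carry far more fields."""
    title: str
    author: str
    year: int
    institution: str
    status: str  # assumed values: "digitized", "in progress", "intended"


def score(a: Record, b: Record) -> float:
    """Illustrative match probability in [0, 1]: mean string similarity of
    title and author, or 0.0 if the publication years differ."""
    if a.year != b.year:
        return 0.0
    title = SequenceMatcher(None, a.title.lower(), b.title.lower()).ratio()
    author = SequenceMatcher(None, a.author.lower(), b.author.lower()).ratio()
    return (title + author) / 2


@dataclass
class Registry:
    """In-memory stand-in for the central registry (API only, no user frontend)."""
    records: list[Record] = field(default_factory=list)

    def ingest(self, batch: list[Record]) -> int:
        """Batch ingest: append a list of records, return how many were added."""
        self.records.extend(batch)
        return len(batch)

    def query(self, batch: list[Record], threshold: float = 0.8) -> list[list[tuple[Record, float]]]:
        """Batch query: for each input record, return (match, score) pairs
        at or above the threshold, best score first."""
        results = []
        for q in batch:
            hits = [(r, score(q, r)) for r in self.records]
            hits = [h for h in hits if h[1] >= threshold]
            hits.sort(key=lambda h: h[1], reverse=True)
            results.append(hits)
        return results


if __name__ == "__main__":
    reg = Registry()
    reg.ingest([Record("Faust", "Goethe, Johann Wolfgang von", 1808, "SUB Göttingen", "digitized")])
    found = reg.query([
        Record("Faust", "Goethe, Johann Wolfgang", 1808, "Other library", "intended"),
        Record("Faust", "Goethe, Johann Wolfgang", 1832, "Other library", "intended"),
    ])
    for hits in found:
        print([(r.institution, r.status, round(s, 2)) for r, s in hits])
```

Because both calls take lists, a workflow tool could submit a whole shelf list in one request. The threshold trades missed duplicates against false alarms; borderline scores are where a manual comparison step would still be needed.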

Implementation
[Architecture diagram]
• Backend services (EROMM / EDL / OCLC / …) → Aggregator / Normalizer / Mapping → Registry / Metadata Store
• Collections / projects ↔ API (query "?" and ingest "!"):
  • Notice of intent
  • Running project
  • Present collections
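The aggregator / normalizer / mapping layer between the backend services and the metadata store could be sketched as below. The source names and field mappings are hypothetical placeholders (they are not the actual EROMM, EDL, or OCLC schemas), and the only normalization shown is whitespace trimming and extracting a four-digit year from free-text dates.

```python
from __future__ import annotations

import re

# Hypothetical per-source mappings: source field name -> registry field name.
# These do NOT reproduce any real backend schema; they only illustrate the idea.
MAPPINGS: dict[str, dict[str, str]] = {
    "source_a": {"ti": "title", "au": "author", "py": "year"},
    "source_b": {"Title": "title", "Creator": "author", "Date": "year"},
}


def normalize_year(value: str) -> int | None:
    """Return the first four-digit number in a free-text date, or None."""
    m = re.search(r"\d{4}", value)
    return int(m.group()) if m else None


def aggregate(source: str, raw_records: list[dict[str, str]]) -> list[dict]:
    """Map raw records from one backend source onto the common registry fields."""
    mapping = MAPPINGS[source]
    out = []
    for raw in raw_records:
        # Rename known fields and trim whitespace; unmapped fields are dropped.
        rec: dict = {dst: raw[src].strip() for src, dst in mapping.items() if src in raw}
        if "year" in rec:
            rec["year"] = normalize_year(rec["year"])
        rec["source"] = source  # keep provenance for later judging
        out.append(rec)
    return out


if __name__ == "__main__":
    print(aggregate("source_b", [{"Title": " Faust ", "Creator": "Goethe", "Date": "[ca. 1808]"}]))
    print(aggregate("source_a", [{"ti": "Faust", "au": "Goethe", "py": "s.a."}]))
```

Keeping the mapping as data rather than code means a new backend source only needs a new table entry, which fits the "implement mapping tools" step.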

Metadata Store
• Bibliographic: title, author, date, place of publication, number of pages (?), language, print / format, edition
• Technical: resolution, color depth, file type / compression
• Accessibility: institution, persistent identifier, rights, URL
• Status: digitized, in progress, intended (timeline?), requested?
Role of each group:
• Bibliographic data → matching / score ("what")
• Technical and accessibility data → additional judging ("who, where, which quality, how accessible")
• Status → decisive factor ("when")
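One possible way to model the four groups of the metadata store, with status as the decisive factor, is sketched below in Python. The class and field names, the units (dpi, bits), and the `recommend` decision rule are assumptions for illustration; the slide only lists the fields and which question each group answers.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # The slide's open question "Requested?" is left out of this sketch.
    DIGITIZED = "digitized"
    IN_PROGRESS = "in progress"
    INTENDED = "intended"


@dataclass
class Bibliographic:  # used for matching / scoring ("what")
    title: str
    author: str
    date: str
    place: str
    language: str
    edition: str = ""
    print_format: str = ""
    pages: int | None = None


@dataclass
class Technical:  # additional judging ("which quality")
    resolution_dpi: int
    color_depth_bits: int
    file_type: str


@dataclass
class Accessibility:  # additional judging ("who, where, how accessible")
    institution: str
    persistent_id: str
    rights: str
    url: str


@dataclass
class RegistryEntry:
    bib: Bibliographic
    tech: Technical | None  # may be unknown for intended projects
    access: Accessibility
    status: Status  # decisive factor ("when")
    planned_for: str | None = None  # timeline for INTENDED entries (format assumed)


def recommend(matches: list[RegistryEntry]) -> str:
    """Hypothetical decision rule based on the status of already matched entries."""
    statuses = {m.status for m in matches}
    if Status.DIGITIZED in statuses:
        return "reuse existing copy"
    if Status.IN_PROGRESS in statuses:
        return "wait for running project"
    if Status.INTENDED in statuses:
        return "coordinate with announcing institution"
    return "digitize"
```

In this sketch a digitized copy always wins over a merely announced one; a real rule would also weigh the technical and accessibility fields before deciding that an existing copy is good enough.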

Obstacles
• (Open source) tools for automated matching / scoring?
• Interface for manual comparison / decision making
• Multivolume works: low rate of uniformity (nearly 50% of the physical SUB stock before 1900)
• Unicode
• Transliteration tables
• Randomly bound books
• Reliable identifiers
  • ISBNs for old books?
• Anticipated rate of accuracy: 50 to 70%
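Unicode variants and transliteration are one reason the anticipated accuracy is modest. Below is a minimal Python sketch of normalizing strings before matching; the German-only folding table is a hypothetical example rather than a real transliteration standard, and token-set Jaccard similarity is just one simple choice of score.

```python
import re
import unicodedata

# Hypothetical folding table: expand German umlauts and ß before stripping other
# diacritics, so "Göttingen" and "Goettingen" fold to the same key. Real data
# would need proper transliteration tables per script; this is only illustrative.
FOLD = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
                      "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})


def normalize(text: str) -> str:
    """Fold a title or name into a comparison key."""
    # Compose first so a decomposed "o" + combining diaeresis is also caught by FOLD.
    text = unicodedata.normalize("NFC", text).translate(FOLD)
    # Decompose remaining accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Case-fold, turn punctuation into spaces, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", stripped.casefold())
    return " ".join(cleaned.split())


def token_score(a: str, b: str) -> float:
    """Jaccard similarity of the normalized word sets, from 0.0 to 1.0."""
    ta, tb = set(normalize(a).split()), set(normalize(b).split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    print(normalize("Göttinger Gelehrte Anzeigen"))
    print(token_score("Göttinger Gelehrte Anzeigen", "Goettinger gelehrte Anzeigen."))
```

The English form "Gottingen" would still not match "Göttingen" under this table, which shows why transliteration remains an open obstacle rather than a solved problem.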

Appreciation of Values
• The goal is NOT to build a database that is reliable by library standards,
• but to prevent further waste of resources.
• If we manage to achieve just 50% precision,
• we will already have saved at least 50% of the funding!

Work Packages
• Define metadata model
• Set up database
• Implement mapping tools
• Define API calls
• Implement API
• Build some connectors to popular mass digitization workflow tools (e.g. "Goobi")
• Establish ISBN workflow
• Harvest existing sources
• Start with a community of actual projects
• Get some (!) funding
• Estimated schedule: 6 months

Thank you (stockmann@uni-goettingen.de)
