Existing records Problems and Solutions
The current situation • 2.5 million bib records in six libraries@cambridge databases • 1 million short records • At 10 books/hour, recataloguing these would take one cataloguer 62 years (or 10 cataloguers about 6 years) • Total cost (salary alone) approx. £1.5 million
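The arithmetic behind these figures can be checked with a short sketch. The record count, cataloguing rate and 62-year figure come from the slide; the length of a working year is not stated, so it is derived here from those numbers and should be read as an assumption.

```python
# Figures taken from the slide.
SHORT_RECORDS = 1_000_000
BOOKS_PER_HOUR = 10
YEARS_ONE_CATALOGUER = 62

total_hours = SHORT_RECORDS / BOOKS_PER_HOUR

# Working hours per year implied by the 62-year figure (derived, not stated).
implied_hours_per_year = total_hours / YEARS_ONE_CATALOGUER


def years_needed(cataloguers: int) -> float:
    """Elapsed years if the work is split evenly between cataloguers."""
    return total_hours / (implied_hours_per_year * cataloguers)


print(f"Total effort: {total_hours:,.0f} hours")
print(f"Implied working year: {implied_hours_per_year:,.0f} hours")
print(f"10 cataloguers: {years_needed(10):.1f} years")
```

On these assumptions, ten cataloguers would need a little over six years, which agrees with the bracketed figure on the slide.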
Deduplication • Move to single bibliographic model? • Deciding on ‘best’ record, manually relinking holdings/deleting surplus bibs • Colleges – c. 50% duplication • Departments/Others – c. 25% duplication • A lot of relinking (1M records affected?)
For a single cataloguer, another 40 years' work? Overall, 100 years? • Or for 10 cataloguers, 10 years? • Total cost rises to £2.5 million • And that’s a lot of money …
Prohibitively expensive/time consuming to handle these problems by manual recataloguing alone • Particularly in the light of a likely migration to a new Library Management System (Ex Libris or otherwise) in the medium term • A non-migratable situation? • Solutions?
Automated Cataloguing Tools! • update@cambridge - short record enrichment • Automated MARC correction • Deduplication routines • Order important – need full, well coded records to deduplicate effectively
update@cambridge • Record enrichment program • Web interface for use in libraries • Looks for match in UL database • If found, corrects MARC in UL record (if necessary) and overlays local record • Match rates 60-70% on average
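The match-and-overlay flow described above might look roughly like the sketch below. It is not the update@cambridge code: the `Bib` class, the `enrich` function and the use of ISBN as the match key are all assumptions made for illustration, since the slides do not say how a match is found.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Bib:
    """Hypothetical, much simplified bibliographic record."""
    bib_id: str
    isbn: Optional[str]
    fields: dict = field(default_factory=dict)  # MARC tag -> field content


def enrich(local: Bib, ul_index: dict,
           correct: Callable[[Bib], Bib] = lambda b: b) -> tuple:
    """Return (record, matched).

    ul_index maps ISBN -> full UL record (assumed match key).
    correct stands in for the MARC correction step.
    """
    full = ul_index.get(local.isbn) if local.isbn else None
    if full is None:
        return local, False  # no match: the record is left for manual work
    full = correct(full)
    # Overlay: keep the local bib id, take the content of the UL record.
    return Bib(local.bib_id, full.isbn, dict(full.fields)), True
```

Because the overlay replaces the content of the local bib, anything local that lives only in the bib record would be lost in a scheme like this.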
In testing with 8 libraries • So far: 34,000 bibs processed, 21,000 bibs enriched • Match rate of 62% • If all 1M short records were fed through, 620,000 records would be updated, leaving only 380,000 for manual recataloguing.
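The projection follows directly from the test figures. A minimal sketch, assuming the match rate seen in testing would hold across all short records:

```python
# Test results reported on the slide.
processed = 34_000
enriched = 21_000
short_records = 1_000_000

match_rate = round(enriched / processed, 2)      # rounded to 62%
updated = round(short_records * match_rate)      # records enriched automatically
remaining = short_records - updated              # left for manual recataloguing

print(f"Match rate: {match_rate:.0%}")
print(f"Updated: {updated:,}  Remaining: {remaining:,}")
```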
MARC correction • Like the Bib Checker program, but corrects errors instead of just warning you • Already built into the update@cambridge program • Could be rolled out into other areas for large scale MARC correction
How to get from this • =LDR 00472nam\\2200157\a\4500 • =001 662002 • =005 20071205064734.0 • =008 071129s1985\\\\nyua\\\\\\\\\\001\0\eng\d • =020 \\$a9780961751111 • =100 1\$aBroecker, W.S.,$d1931- • =245 10$aHow to build a habitable planet ;$cBy Wallace S. Broecker. • =260 \\$aNew York ;$bEldigio Press,$cc1985 • =300 \\$a291p $bill $c23cm • =504 \\$aIncludes index. • =650 \0$aAstronomy. • =650 \0$aAstrophysics.
to this! • =LDR 00453nam 2200157 a 4500 • =001 662002 • =005 20071205064734.0 • =008 071129s1985\\\\nyua\\\\\\\\\\001\0\eng\d • =020 \\$a9780961751111 • =100 1\$aBroecker, W. S.,$d1931- • =245 10$aHow to build a habitable planet /$cby Wallace S. Broecker. • =260 \\$aNew York :$bEldigio Press,$cc1985. • =300 \\$a291 p. :$bill. ;$c23 cm. • =504 \\$aIncludes index. • =650 \0$aAstronomy. • =650 \0$aAstrophysics.
Lists corrections • Bib id: 662002 • How to build a habitable planet ; By Wallace S. Broecker. • 100: UPDATE: Spaces inserted between initials in subfield _a • 245: UPDATE: By uncapitalised at start of subfield c • 245: UPDATE: Space forward slash inserted before subfield _c • 260: UPDATE: Full stop inserted at end of field • 260: UPDATE: Space colon inserted before subfield _b • 300: UPDATE: Full stop inserted after the p in pagination • 300: UPDATE: Full stop inserted at end of field • 300: UPDATE: Illustration abbreviation has been corrected • 300: UPDATE: Space colon inserted before subfield _b • 300: UPDATE: Space inserted between digits and cm • 300: UPDATE: Space inserted between digits and p in pagination • 300: UPDATE: Space semi-colon inserted before subfield c
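Several of the corrections listed above are simple pattern rewrites. The sketch below reproduces them for the example record using regular expressions on the mnemonic field text. It is an illustration only, not the actual correction program: `correct_field` and its helpers are hypothetical names, and real punctuation rules have exceptions (for example, fields ending in a bracket or hyphen) that are ignored here.

```python
import re


def _punct_before(value: str, code: str, mark: str) -> str:
    """Put ' <mark>' directly before subfield $<code>, replacing any punctuation there."""
    return re.sub(r"\s*[;:,/]?\s*\$" + code, f" {mark}${code}", value)


def _final_stop(value: str) -> str:
    """Add a full stop at the end of the field if it is missing."""
    return value if value.endswith(".") else value + "."


def correct_field(tag: str, value: str) -> str:
    """Apply a few illustrative punctuation fixes to one field."""
    if tag == "100":
        # Space between initials: W.S. -> W. S.
        value = re.sub(r"(?<=[A-Z]\.)(?=[A-Z])", " ", value)
    elif tag == "245":
        value = _punct_before(value, "c", "/")
        value = re.sub(r"\$cBy\b", "$cby", value)  # uncapitalise "By"
    elif tag == "260":
        value = _final_stop(_punct_before(value, "b", ":"))
    elif tag == "300":
        # Pagination: 291p -> 291 p.
        value = re.sub(r"(\d)\s*p\.?(?=\s|\$|$)", r"\1 p.", value)
        # Illustration abbreviation: ill -> ill.
        value = re.sub(r"\$bill\.?(?=\s|\$|$)", "$bill.", value)
        value = _punct_before(value, "b", ":")
        value = _punct_before(value, "c", ";")
        # Dimensions: 23cm -> 23 cm.
        value = re.sub(r"(\d)\s*cm\.?$", r"\1 cm.", value)
        value = _final_stop(value)
    return value


print(correct_field("300", r"\\$a291p $bill $c23cm"))
```

Keeping each rule as a separate substitution makes it straightforward to log one message per change, as in the list of corrections above.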
Deduplication • Routines and algorithms: • Find duplicate records • Find ‘best record’ • Relink holdings to this record • Run it through MARC correction routine • Delete duplicate bibs
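The five steps above can be sketched as a single pass over the records. Everything specific here is an assumption for illustration: the slides do not say how duplicates are detected or how the 'best' record is chosen, so this sketch groups records on a precomputed match key and treats the record with the most fields as the best one.

```python
from collections import defaultdict


def deduplicate(bibs: dict, holdings: dict, correct=lambda fields: fields):
    """Merge duplicate bibs in place.

    bibs:     {bib_id: {"key": match_key, "fields": {tag: content}}}
    holdings: {holding_id: bib_id}
    correct:  stands in for the MARC correction routine
    """
    # 1. Find duplicate records: group bib ids by match key.
    groups = defaultdict(list)
    for bib_id, bib in bibs.items():
        groups[bib["key"]].append(bib_id)

    for ids in groups.values():
        if len(ids) < 2:
            continue
        # 2. Find the 'best' record (assumed: the one with the most fields).
        best = max(ids, key=lambda i: len(bibs[i]["fields"]))
        surplus = set(ids) - {best}
        # 3. Relink holdings to the best record.
        for holding_id, bib_id in list(holdings.items()):
            if bib_id in surplus:
                holdings[holding_id] = best
        # 4. Run the surviving record through MARC correction.
        bibs[best]["fields"] = correct(bibs[best]["fields"])
        # 5. Delete the duplicate bibs.
        for bib_id in surplus:
            del bibs[bib_id]
    return bibs, holdings
```

The quality of the grouping depends entirely on the match key, which is why full, well coded records need to be in place before deduplication.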
Tools for Cataloguers, not replacements! • They do the things programs do well, leaving you to concentrate on what humans do well • They won’t do all the work, but they make the project feasible
What you can do • Record sharing • Adhere to the Bibliographic Standard • Make sure local information is in the holding record or correctly coded • If you are interested in short record enrichment, MARC correction or deduplication, get in touch