
The Other Security: A New and Nimble Approach to Digital Preservation

UCCSC 2009: Focus on Security, UC Davis, June 16–17, 2009. Stephen Abrams and Perry Willett, Digital Preservation Program, California Digital Library, University of California.


Presentation Transcript


  1. The Other Security: A New and Nimble Approach to Digital Preservation
     UCCSC 2009: Focus on Security, UC Davis, June 16–17, 2009
     Stephen Abrams, Perry Willett
     Digital Preservation Program, California Digital Library, University of California

  2. Focus on Security: “Traditional” security risks
     • Natural disaster
     • Infrastructure failure
     • Storage failure
     • Server failure
     • Operating system failure
     • Application failure
     • Human error
     • Malicious attack

  3. Focus on Security: The “other” security risks
     • Legal encumbrances
     • External dependencies
     • Media obsolescence
     • Format obsolescence
     • Staff competencies
     • Institutional commitment
     • Financial stability
     • Changing user expectations

  4. Focus on Security: The “other” security risks
     • Anything that interferes with the usability of managed digital assets, now or in the future

  5. Libraries Have a Long Time Horizon
     The UC Melvyl union catalog holds over 28 million items; 11,000 of them are more than 500 years old.

  6. Libraries Have a Long Time Horizon
     What can we do to ensure that today’s digital assets are still usable 500 years from now?

  7. Agenda
     • What is digital curation?
     • Redefining the repository: A micro-services approach to curation
     • Web archiving
     • CDL/campus curation collaborations
     • Trusted digital curation services
     • Summary

  8. Digital Curation
     Activities focused on maintaining and adding value to trusted digital content
     Encompasses preservation and access, which are complementary rather than disparate functions:
     • Preservation ensures access over time
     • Access depends on preservation up to a point in time
     How can we make the “Save” button really mean “save”?

  9. Curation Imperatives
     Integrated business process:
     • Robust technological infrastructure, and
     • Human analysis and decision-making
     Programmatic (not project-oriented)
     Services (not systems)
     Content (not repositories)

  10. Agenda  What is digital curation? Redefining the repository: A micro-services approach to curation Web archiving CDL/campus curation collaborations Trusted digital curation services Summary

  11. D'où venons nous, que sommes nous, où allons nous? Paul Gauguin, 1897-98, Museum of Fine Arts Boston, 32.270

  12. Where are we from, what are we, where are we going? Paul Gauguin, 1897-98, Museum of Fine Arts Boston, 32.270

  13. Where [is our stuff] from, what [is it], where are we going [with it]? Paul Gauguin, 1897-98, Museum of Fine Arts Boston, 32.270

  14. Where From? What? Where To?
      • Producer
      • Repository
      • Consumer

  15. Where From? What? Where To?
      • Producer: Ingest
      • Repository: Data management / archival storage
      • Consumer: Access / preservation planning

  16. Where From? What? Where To?
      • Producer: Ingest; Provenance
      • Repository: Data management / archival storage; Characterization
      • Consumer: Access / preservation planning; View paths

  17. Information Landscape
      • Increasing diversity in types and uses of content
      • Content arising from non-library contexts
      • Inevitable technological change

  18. Infrastructure Design Goals
      Devolve repository function into a set of independent but interoperable services:
      • Because each service is small and self-contained, it is more easily developed and maintained
      • Because the level of investment is lower, it is more easily replaced
      Provide complex function through the flexible combination of atomistic services

  19. Infrastructure Design Goals
      Support interaction through procedural APIs, command line applications, and web interfaces:
      • Let content managers and curators interact with the services without requiring changes to existing work practices
      Rather than force content to come to the services, push the services out to the content:
      • Easy deployment centrally or locally, either independently or in strategic combinations

  20. Infrastructure Design Goals
      Defer implementation decisions until needs and outcomes are clearly articulated:
      • Requirements are first stated as sets of values and strategies that promote those values
      • Strategies are then embodied as abstract services and, finally, instantiated in technical systems

  21. Object-Centric Values and Strategies

  22. Service-Centric Values and Strategies

  23. Micro-Services

  24. Design Process
      • What are the conceptual entities underlying the service?
      • What are their state properties?
      • What are their behaviors?

  25. Storage Service
      • Storage service: an aggregation of storage nodes
      • Storage node: a particular configuration of object storage
      • Object: an aggregation of files over time
      • Version: a particular configuration of files at a point in time
      • File: a formatted bit stream
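To make the five storage entities and their containment relationships concrete, here is a minimal Python sketch of them as in-memory data classes. The class names (`StorageService`, `StorageNode`, `DigitalObject`, `Version`, `File`), the `add_version` helper, and the sample identifiers are hypothetical illustrations of the definitions on the slide, not the CDL implementation, and nothing here touches the file system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class File:
    """A formatted bit stream."""
    name: str
    data: bytes


@dataclass
class Version:
    """A particular configuration of files at a point in time."""
    number: int
    files: Dict[str, File] = field(default_factory=dict)


@dataclass
class DigitalObject:
    """An aggregation of files over time, held as an ordered list of versions."""
    identifier: str
    versions: List[Version] = field(default_factory=list)

    def add_version(self, files: Dict[str, bytes]) -> Version:
        # Each call appends a new numbered configuration; earlier versions are kept.
        version = Version(
            number=len(self.versions) + 1,
            files={name: File(name, data) for name, data in files.items()},
        )
        self.versions.append(version)
        return version


@dataclass
class StorageNode:
    """A particular configuration of object storage."""
    node_id: str
    objects: Dict[str, DigitalObject] = field(default_factory=dict)


@dataclass
class StorageService:
    """An aggregation of storage nodes."""
    nodes: Dict[str, StorageNode] = field(default_factory=dict)


# Usage with hypothetical identifiers: one node, one object, two versions.
service = StorageService()
node = service.nodes.setdefault("node-1", StorageNode("node-1"))
obj = node.objects.setdefault("example-id", DigitalObject("example-id"))
obj.add_version({"page1.tif": b"..."})
obj.add_version({"page1.tif": b"...", "page2.tif": b"..."})
```

Each entity only knows about the level directly beneath it, which mirrors the "aggregation of" wording on the slide.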

  26. Storage Service Methods
      • Help [idempotent, safe]
      • Get-state [idempotent, safe]
      • Get-node-state [idempotent, safe]
      • Get-object-state [idempotent, safe]
      • Get-object [idempotent, unsafe]
      • Get-version-state [idempotent, safe]
      • Get-version [idempotent, unsafe]
      • Get-file-state [idempotent, safe]
      • Get-file [idempotent, unsafe]
      • Add-version [non-idempotent, unsafe]
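The idempotent/non-idempotent distinction matters to clients: an idempotent call can be repeated without changing its outcome, so it can be retried after a timeout, whereas retrying a non-idempotent call such as Add-version could create an unintended extra version. The sketch below records the slide's classification in a table and uses only the idempotent flag to decide whether to retry. `call_with_retry`, its `attempts` parameter, and the use of `TimeoutError` as the failure signal are assumptions for illustration, not part of the service's published interface.

```python
from typing import Callable, Dict, NamedTuple, TypeVar

T = TypeVar("T")


class MethodTraits(NamedTuple):
    idempotent: bool
    safe: bool


# Classification copied from the Storage Service Methods slide.
STORAGE_METHODS: Dict[str, MethodTraits] = {
    "Help": MethodTraits(idempotent=True, safe=True),
    "Get-state": MethodTraits(idempotent=True, safe=True),
    "Get-node-state": MethodTraits(idempotent=True, safe=True),
    "Get-object-state": MethodTraits(idempotent=True, safe=True),
    "Get-object": MethodTraits(idempotent=True, safe=False),
    "Get-version-state": MethodTraits(idempotent=True, safe=True),
    "Get-version": MethodTraits(idempotent=True, safe=False),
    "Get-file-state": MethodTraits(idempotent=True, safe=True),
    "Get-file": MethodTraits(idempotent=True, safe=False),
    "Add-version": MethodTraits(idempotent=False, safe=False),
}


def call_with_retry(method: str, call: Callable[[], T], attempts: int = 3) -> T:
    """Retry an idempotent method on TimeoutError; call any other method once.

    `call` is a hypothetical zero-argument function that performs the request.
    """
    tries = max(1, attempts) if STORAGE_METHODS[method].idempotent else 1
    for attempt in range(1, tries + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == tries:
                raise  # out of retries, or not safe to retry: surface the timeout
    raise AssertionError("unreachable")
```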

  27. Storage Service Interfaces

  28. Technological Change and Invariance
      • Circa 1989: FTP, POSIX, SQL
      • Circa 2029?: HTTP, URI, XML
      Due to their inherent abstracting nature, protocols and interfaces last longer than systems.

  29. Storage Service Implementation
      Using the file system as the controlling managerial abstraction, what is the thinnest smear of additional functionality that will make it an effective object store?
      • Namaste
      • CAN
      • Pairtree
      • Dflat
      • ReDD

  30. Name As Text (Namaste) Tags
      Directory-level signature files extending Dublin Core Kernel metadata:
      • [Tag h0]: 0=name_version
      • Who h1: 1=who
      • What h2: 2=what
      • When h3: 3=when
      • Where h4: 4=where
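A minimal sketch of writing and reading Namaste tags in Python, assuming that each tag is a file whose name is the tag number, `=`, and a file-name-safe form of the value, and whose content is the full value. The character-replacement rule and the 64-character cap (`MAX_NAME_VALUE`) are hypothetical simplifications; consult the Namaste specification for the actual rules on shortening values. The function names are likewise hypothetical.

```python
import re
from pathlib import Path
from typing import Dict

# Tag numbers as listed on the slide: 0 = name_version, 1 = who, 2 = what,
# 3 = when, 4 = where.
NAMASTE_TAGS = {"name_version": 0, "who": 1, "what": 2, "when": 3, "where": 4}

MAX_NAME_VALUE = 64  # hypothetical cap on the value part of the file name


def namaste_filename(tag: int, value: str) -> str:
    # Simplification (assumption): characters outside a conservative set are
    # replaced with "_" and the result is truncated to keep the name safe.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", value)[:MAX_NAME_VALUE]
    return f"{tag}={safe}"


def write_namaste(directory: Path, kind: str, value: str) -> Path:
    """Write a tag file: the name carries the value, the content holds it in full."""
    path = directory / namaste_filename(NAMASTE_TAGS[kind], value)
    path.write_text(value + "\n", encoding="utf-8")
    return path


def read_namaste(directory: Path) -> Dict[int, str]:
    """Return {tag number: full value}; assumes at most one file per tag number."""
    tags: Dict[int, str] = {}
    for entry in directory.iterdir():
        prefix, sep, _ = entry.name.partition("=")
        if sep and prefix.isdigit() and entry.is_file():
            tags[int(prefix)] = entry.read_text(encoding="utf-8").rstrip("\n")
    return tags
```

For example, `write_namaste(d, "name_version", "dflat_0.11")` creates the file `0=dflat_0.11` seen in the directory layouts on the following slides, so a directory's type can be read straight from a listing without opening any file.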

  31. Content Access Node (CAN)
      File system conventions (structure and reserved names) for an object store:
      can/
        0=can_0.2
        can-info.txt
        log/
        store/
          pairtree...

  32. Pairtree
      Use a bigram decomposition of an object’s identifier to determine its file system path:
      pairtree/
        0=pairtree_0.1
        pairtree-info.txt
        pairtree_root/
          id/
            en/
              ti/
                fi/
                  er/
                    dflat...
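The bigram decomposition can be sketched in a few lines of Python. Because the path is derived from the identifier alone, an object can be located without a separate lookup table. This sketch assumes the identifier contains only characters that are already safe in file names; the character-escaping step that the Pairtree specification applies first is deliberately left out, and the function names are hypothetical.

```python
def pairtree_path(identifier: str, root: str = "pairtree_root") -> str:
    """Map an identifier to a directory path made of two-character segments."""
    # Assumption: `identifier` is already "clean" (safe for file names); the
    # Pairtree spec's character-escaping step is not implemented here.
    if not identifier:
        raise ValueError("identifier must be non-empty")
    # Split into consecutive pairs; a final odd character forms its own segment.
    pairs = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return "/".join([root, *pairs]) + "/"


def identifier_from_path(path: str, root: str = "pairtree_root") -> str:
    """Invert pairtree_path for a path that ends at the last identifier segment."""
    parts = [part for part in path.split("/") if part]
    if not parts or parts[0] != root:
        raise ValueError(f"path must start with {root}/")
    return "".join(parts[1:])


print(pairtree_path("identifier"))  # pairtree_root/id/en/ti/fi/er/
```

The printed path matches the `id/en/ti/fi/er/` chain on the slide; the object's own directory (here `dflat...`) would then sit inside the last segment.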

  33. Dflat
      A “digital flat” for object data and metadata:
      dflat/
        0=dflat_0.11
        dflat-info.txt
        v001/
          d-manifest.txt
          delta/
            redd...
        v002/
          f-manifest.txt
          full/
            data/
            metadata/
            enrichment/
            annotation/

  34. Reverse Delta Directory (ReDD)
      File-level reverse delta compression:
      redd/
        0=redd_0.1
        add/
        delete.txt
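Reading the Dflat layout above as the newest version (v002) held in full/ and the previous version (v001) held as a delta/ in ReDD form, a file-level reverse delta needs only two things: the whole files required to turn the newer version back into the older one (add/) and the paths to remove (delete.txt). The following Python sketch works on in-memory dictionaries rather than directories; the function names and the dictionary representation are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Files = Dict[str, bytes]  # relative path -> file content


def reverse_delta(older: Files, newer: Files) -> Tuple[Files, List[str]]:
    """Return (add, delete) such that applying them to `newer` rebuilds `older`."""
    # Whole files (file-level, not byte-level) that are missing from, or differ
    # in, the newer version are kept with their older content (cf. add/).
    add = {path: data for path, data in older.items() if newer.get(path) != data}
    # Paths that exist only in the newer version are listed for deletion (cf. delete.txt).
    delete = sorted(path for path in newer if path not in older)
    return add, delete


def apply_reverse_delta(newer: Files, add: Files, delete: List[str]) -> Files:
    """Rebuild the older version from the newer one plus its reverse delta."""
    doomed = set(delete)
    older = {path: data for path, data in newer.items() if path not in doomed}
    older.update(add)
    return older


v001 = {"data/a.txt": b"one", "data/b.txt": b"two"}
v002 = {"data/a.txt": b"one", "data/b.txt": b"TWO", "data/c.txt": b"three"}
add, delete = reverse_delta(v001, v002)
```

Because the newest version is always stored whole, reading the current state requires no reconstruction; only requests for older versions pay the cost of applying deltas, one version step at a time.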

  35. Performance Scaling
      Modern file systems, e.g. ZFS, exhibit good performance characteristics at reasonable scale:
      • 2,272,000 files = 28.5 TB
      • 127,058,820 files = 25.7 TB
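Dividing each total by its file count shows how different the two test sets are. Assuming decimal units (1 TB = 10^12 bytes), the mean file sizes work out as follows:

```latex
\begin{aligned}
\bar{s}_1 &= \frac{28.5 \times 10^{12}\ \text{B}}{2{,}272{,}000} \approx 1.25 \times 10^{7}\ \text{B} \approx 12.5\ \text{MB} \\
\bar{s}_2 &= \frac{25.7 \times 10^{12}\ \text{B}}{127{,}058{,}820} \approx 2.02 \times 10^{5}\ \text{B} \approx 202\ \text{kB}
\end{aligned}
```

The second set holds about 56 times as many files in slightly less total space.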

  36. Status
      We are completing development of the foundational Storage and Identity services:
      • Identity is based on the N2T (name-to-thing) and Noid systems
      We are planning for the Ingest, Catalog, and Characterization services:
      • Characterization is based on JHOVE2
      As these services become available, they will be deployed centrally and locally on campuses.

  37. Agenda  What is digital curation?  Redefining the repository: A micro-services approach to curation Web archiving CDL/campus curation collaborations Trusted digital curation services Summary

  38. Today’s Web Is History’s Source Material
      • The web is indispensable to science, commerce, education, entertainment, and culture
      • Yet it is highly volatile
      • UC faculty and researchers have their own web publications
      • Libraries and archives wish to preserve important websites
      How can we secure this valuable content into the future?

  39. Web Archiving Service (WAS)
      • Provides open source tools for curators to select and preserve content from the free web
      • Allows curators to define the scope of a collection and the frequency of crawling, and to work collaboratively
      • Content is saved in “projects,” grouped by common subject matter or publisher

  40. Crawl operation in WAS

  41. WAS Public Access
      • Starting in July, curators will be able to provide public access to their projects
      • Rights based on the recommendations of the Section 108 Study Group:
        • 6-month embargo
        • Opportunities for content owners to opt out
      • Libraries will add links to documents and websites in their online catalogs
      • Advantages: curated collections, persistent access and URLs, full-text searching

  42. WAS Partners
      • Library of Congress: grant funding for development
        • UC campuses, University of North Texas, and others
      • Internet Archive: software and experience
        • Heritrix crawler, Wayback display, Nutch indexing
      • National Library of France: standards and leadership
        • IWAW (International Web Archiving Workshop)
        • IIPC (national libraries consortium) commitment

  43. Agenda  What is digital curation?  Redefining the repository: A micro-services approach to curation  Web archiving CDL/campus curation collaborations  Trusted digital curation services Summary

  44. CDL Curation Collaborations
      • DataOne: NSF-funded project to preserve distributed scientific data and develop infrastructure for distributed scientific research on global change
        • University of New Mexico, UC Santa Barbara
      • Media Vault Program
        • UC Berkeley
      • Historical Newspapers
        • UC Riverside

  45. Agenda  What is digital curation?  Redefining the repository: A micro-services approach to curation  Web archiving  CDL/campus curation collaborations Trusted digital curation services Summary

  46. Trusted Digital Repositories
      Trustworthy Repositories Audit and Certification (TRAC):
      • Criteria for evaluating repository trustworthiness
      • Developed by RLG, OCLC, NARA, and CRL
      • Based on the Open Archival Information System (OAIS) reference model (ISO 14721)

  47. TRAC
      Basic approach:
      • The TRAC checklist provides the framework
      • The organization documents its planning and policies
      Allows organizations to self-audit and identify gaps
      Allows other organizations to perform an external audit
