
Curation, Preservation, & Information Lifecycle Management

Mairéad Martin, Penn State University. Common Solutions Group Storage Workshop, May 2010. Designing and implementing storage architectures and systems to support data curation and preservation needs: what does this entail?


Presentation Transcript


  1. Curation, Preservation, & Information Lifecycle Management • Mairéad Martin, Penn State University • Common Solutions Group Storage Workshop, May 2010

  2. Designing and implementing storage architectures and systems to support data curation and preservation needs • What does this entail? • Who’s thinking about this? • Who’s doing anything about this?

  3. Definitions of terms • Digital Preservation • Managed activities to ensure long-term retention and retrieval of, and access to, data • Digital Curation • Maintaining, preserving, and enhancing data throughout its lifecycle • Archival storage • Depends on who you talk to • Information Lifecycle Management • Storage industry term for the above • Object-based storage • Data with metadata “container”

  4. Drivers & Incentives • eScience/eResearch data management needs • NSF requirement for data management plans • Compliance • e-Discovery, FERPA, HIPAA, Sarbanes-Oxley • Institutional record retention regulations and policies • Storage services for libraries, archives, cultural heritage entities • Great efficiencies

  5. Expectations • Storage is cheap • Storage is smart • Stuff on the Internet is persistent • Digital safer than analog • Storage provider = curators and preservation experts • Repositories take care of preservation • Metadata will take care of it • Libraries will take care of it • The Cloud will take care of it

  6. The Reality • New roles, new responsibilities, new collaborations, practices, workflows • Intellectual capital requirements – digital preservation/curation policy determination and implementation • Bar for trust is rising • Cloud antithetical to preservation? • Increased storage management requirements • Scaling issues with preservation requirements

  7. Preservation requirements • More likely to meet these today at the system level – DR & BC practices and tiered storage architectures • Immutable storage • Data integrity checking • Mitigation of bit rot • Auditing function • Mitigation of obsolescence • File format migration • Deposition as important as retention • Need for storage management metadata • Technical – file size, name, location, ACL, date, time, versioning • Biggest need: system-independence
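
The data integrity checking and auditing items on this slide are commonly implemented as periodic fixity checks: record a checksum for each object at ingest, then recompute and compare it later to detect bit rot. The following is a minimal sketch of that idea, not something taken from the presentation. The function names (`sha256_of`, `build_manifest`, `audit`) and the manifest layout are hypothetical, and SHA-256 is an assumed choice of digest.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record technical metadata (size and checksum) for each file at ingest.

    The manifest layout is an assumption made for this sketch.
    """
    return {
        path: {"size": os.path.getsize(path), "sha256": sha256_of(path)}
        for path in paths
    }


def audit(manifest):
    """Return the paths that are missing or whose checksum has changed."""
    failures = []
    for path, recorded in manifest.items():
        if not os.path.exists(path) or sha256_of(path) != recorded["sha256"]:
            failures.append(path)
    return failures
```

In use, `build_manifest` would run once when content is deposited and `audit` would run on a schedule. Keeping the manifest separate from the storage system it describes is one way to approach the system-independence the slide calls for.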

  8. Standards/Technologies • iRODS (Integrated Rule-Oriented Data System) • Storage Resource Broker (SRB) • Content Addressable Storage (CAS) • Fixed content storage, retrieval based on content rather than location • eXtensible Access Method (XAM) • Emerging SNIA standard for an API for content-addressable storage objects
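
To illustrate "retrieval based on content rather than location", here is a minimal in-memory sketch of content-addressable storage in which an object's address is the hash of its bytes. It is an illustration only and does not reflect the XAM API or any vendor product. The `ContentStore` class and its methods are hypothetical names, and SHA-256 is an assumed choice of digest.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the address is the SHA-256 of the data."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store fixed content and return its content-derived address."""
        address = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address, so it is stored once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        """Retrieve content by address, verifying it still matches that address."""
        data = self._objects[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content no longer matches its address")
        return data
```

Because the address is derived from the content, the same bytes deposited twice yield the same address, and any change to the stored bytes is detectable on read. That property is why this model suits fixed content.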

  9. Initiatives: Preservation Networks • NSF DataNet Program • Data Conservancy project – JHU lead with 23 institutions to create curation, discovery, and preservation network • Chronopolis • SDSC, UCSD, UMIACS, NCAR: Federated data grid using SRB/iRODS • LOCKSS (Lots of Copies Keep Stuff Safe) • Replication of licensed journals and other content • MetaArchive • A private LOCKSS archive • Internet Archive

  10. Initiatives • National Digital Information Infrastructure & Preservation Program (NDIIPP) • Library of Congress program “to develop a national strategy to collect, preserve and make available significant digital content” via a preservation network of over 130 partners.

  11. Initiatives • California Digital Library • Curation Micro-services • DuraSpace • DuraCloud project to implement a preservation-oriented cloud storage service • HathiTrust • Repository and storage infrastructure initiated for CIC Google book project • Sun Preservation and Archiving SIG (PASIG) • Storage Networking Industry Association

  12. Penn State activities • Content Stewardship Program – strategic collaboration between University Libraries and Information Technology Services (ITS) • Goal: a suite of services to support the lifecycle of the digital object – creation, discovery, access, storage, preservation and archiving • Hired Digital Library Architect and Digital Collections Curator • Governance in place

  13. Penn State Activities • Anchor projects/activities: • Storage and Preservation strategy development • Prototyped the XAM standard for archival storage • Institutional record repository • Research data prototype • Best practices for data management • ETD platform replacement • Sponsoring curation technology workshop in August • LOCKSS member, recently joined MetaArchive • Exploration of California Digital Library’s curation micro-services • Application of service management principles and processes to the above

  14. Closing • What are CSG member institutions doing in this space?
