Archive-It: Archiving & Preserving Digital Content

LionelDale

Presentation Transcript


  1. Archive-It: Archiving & Preserving Digital Content

  2. Internet Archive • We are a digital library • Founded in 1996 by Brewster Kahle • Located in San Francisco, California

  3. www.archive.org • Largest publicly available web archive in existence • Accessible starting in 2001 • 400 Billion+ URLs • 80+ million websites • Content in 40+ languages • Collects a snapshot of the web every 60-90 days • 361 Billion pages saved

  4. Web Archiving Service: Archive-It • Archive-It is a subscription service launched in February 2006 • Web-based application that allows users to create, manage, access, and store collections of digital content • The service is a fully hosted solution and includes access and storage • Provides tools for selection and scoping, including cataloging with metadata • Ability to capture content using 10 different time frequencies • Archived content includes: HTML, text, videos, audio, social media, PDF, images, online newspapers • Archived content can be browsed 24 hours after a capture is complete, and full-text search is available within 7 days • Restricted access options are available
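
The access timing on this slide (browsing 24 hours after a capture completes, full-text search within 7 days) can be worked through with a small calculation. This is only a sketch: the function name `availability` and the sample capture time are made up for illustration and are not part of Archive-It.

```python
from datetime import datetime, timedelta

# Delays taken from the slide: browsing 24 hours after a capture
# completes, full-text search within 7 days of completion.
BROWSE_DELAY = timedelta(hours=24)
SEARCH_DELAY = timedelta(days=7)


def availability(capture_completed: datetime) -> dict:
    """Return when a capture becomes browsable and the latest time
    by which full-text search should be available."""
    return {
        "browsable_from": capture_completed + BROWSE_DELAY,
        "searchable_by": capture_completed + SEARCH_DELAY,
    }


# Hypothetical capture that finished on 1 March 2014 at 09:00.
windows = availability(datetime(2014, 3, 1, 9, 0))
print(windows["browsable_from"])
print(windows["searchable_by"])
```

Note that the 7-day figure is an upper bound ("within 7 days"), so the second value is a deadline rather than a guaranteed start time.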

  5. Archive-It Partners

  6. What is Web Archiving? Web archiving is the process of collecting portions of web content, preserving the collections, and then providing access to the archives for use and reuse. A web archive is a collection of archived URLs grouped by theme, event, subject area, or web address.
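
The definition above (archived URLs grouped by theme, event, subject area, or web address) can be pictured as a simple data structure. The sketch below is an illustration only: the `ArchivedUrl` record, its fields, and the sample URLs are assumptions and do not reflect how Archive-It stores data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedUrl:
    """One captured URL (hypothetical record layout)."""
    url: str
    captured_on: date
    group: str  # theme, event, subject area, or web address


def build_archive(records):
    """Group archived URLs by their group label."""
    archive = defaultdict(list)
    for record in records:
        archive[record.group].append(record)
    return dict(archive)


# Made-up sample captures on placeholder domains.
records = [
    ArchivedUrl("http://example.org/news/storm", date(2014, 1, 5), "Storm response"),
    ArchivedUrl("http://example.com/blog/storm-notes", date(2014, 1, 6), "Storm response"),
    ArchivedUrl("http://example.edu/", date(2014, 2, 1), "Institutional website"),
]

archive = build_archive(records)
for group, items in archive.items():
    print(group, [item.url for item in items])
```

Keeping the capture date on each record matters because the same URL can be archived many times, and each capture is a separate item in the collection.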

  7. Challenge: a lot of data • Amount of content that is being archived • Amount of data being created by content providers • http://www.helenbrowngroup.com/2011/02/rescue-from-the-digital-firehose/gushing-firehose-by-joseph-robertson/ • http://www.chaitalag.com/new/s/tubig

  8. Challenge: What to archive? • What is important to you? • What do you want people to know about? • What are your organization’s collecting activities? Vision?

  9. Archive-It Use Cases • Create a thematic/topical web archive on a specific subject or event • Different perspectives and social commentary (tweets, blogs, comments) • Can include spontaneous events • Often related to traditional collecting activity around the same focus • Mandate to capture/preserve institutional memory and history: construct a historical record of an institution’s web presence over time • Support an electronic records system to meet records retention requirements • Capture publications that aren’t being deposited in print form • Closure crawls

  10. Access to Public Collections • Partners: can view through a private web application with login/password • General public: can view from the Archive-It website: http://www.archiveit.org/ • Landing pages: view from the organization’s website through a branded page that links back to Archive-It hosted data • Integration with existing systems and catalogs

  11. Storage & Preservation • Multiple ways to store and preserve • Storage: 2 copies of the archived data (primary and backup) are stored at the San Francisco data center • Collections are transferred to the General Archive as a third copy • A copy of archived data can be shipped on a hard drive • Files can be downloaded from Internet Archive servers • Digital preservation: 2008: LOCKSS; 2013: DuraCloud

  12. Web Archiving Life Cycle Model http://www.archive-it.org/publications

  13. Questions & Answers • Lori Donovan • lori@archive.org • Thank you!
