Digital Immortality

Presentation Transcript


  1. Digital Immortality OR Keeping Digital Data for Ever Dr David Holdsworth <ecldh@leeds.ac.uk> http://www.leeds.ac.uk/cedars/

  2. Obsolete(?) Data • 1 Things that must be kept by law • 2 Things that must be destroyed by law • 3 Things that we choose to keep • 4 Things that we are certain can be thrown away

  3. Obsolete(?) Data • 5 Things that we would like to keep if we have room • 6 Things that we would like to throw away, but are not sure about • 7 Things that we think we have kept but cannot find • 8 Things that we have kept but now cannot decipher • 9 Things that we have not kept but now wish that we had

  4. What to Keep • All of 1 and 3 • 1 Things that must be kept by law • 3 Things that we choose to keep • As much of 5 and 6 as is cost-effective • 5 Things that we would like to keep if we have room • 6 Things that we would like to throw away, but are not sure about • Data discarded from 5 and 6 has the potential to be in 9 in the future • 9 Things that we have not kept but now wish that we had • Minimise cost per item

  5. Some Pitfalls • Errors are usually not correctable • Failure to index adequately puts data into category 7 • 7 Things that we think we have kept but cannot find • Failure to know the format puts data into category 8 • 8 Things that we have kept but now cannot decipher

  6. Personal Involvement: CEDARS • CURL Exemplars in Digital ARchiveS • Collaborative project for libraries • Funded by HEFCE/JISC • Oxford, Cambridge and Leeds

  7. Personal Involvement - contd. CAMiLEON • Creative Archiving at Michigan and Leeds: Emulating the Old on the New • Collaborative project on emulation • Funded by NSF/JISC

  8. Challenges to digital preservation • Deteriorating media • Magnetic dropout • Obsolete equipment • Obsolete data formats • EBCDIC • UNICODE has established itself • Machine code software is an extreme example
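The EBCDIC bullet on this slide can be illustrated with a short sketch. It assumes the bytes use EBCDIC code page 037, one of many EBCDIC variants (the slides do not name one), and the sample bytes are made up for the illustration. The same bytes are readable only when the reader knows which encoding was used.

```python
# Hypothetical sample: five bytes read from an old file.
# Assumption: they are EBCDIC code page 037 (Python's "cp037" codec).
raw = b"\xc8\xc5\xd3\xd3\xd6"

as_ebcdic = raw.decode("cp037")     # decoded with knowledge of the format
as_latin1 = raw.decode("latin-1")   # decoded with the wrong assumption

print(as_ebcdic)  # HELLO
print(as_latin1)  # ÈÅÓÓÖ
```

The byte-stream is identical in both cases; only the recorded knowledge of the format separates readable text from category 8.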

  9. Philips LaserVision

  10. Challenges to digital preservation contd • Needles in haystacks • ISBN • Meta-data • Deteriorating Institutions • Where are the digital legal deposits? • .. Or even Digital Equipment Corporation • Proprietary systems become obsolete • leaving data inaccessible

  11. Compatibility - Friend or Foe • e.g. z/OS evolves from OS/360 • Windows Vista evolves from 16-bit Windows 3.1 • Modern machines run old software …… but faster • Who keeps old versions? • Computer Museum in California • Microsoft -- ?

  12. Times Change • People don’t always want to process their old data using the tools of yesteryear

  13. THIS IS GEORGE 3 MARK 8.67 ON 31DEC99 10.19.03_ TIMED OUT 10.19.33 THE SYSTEM HAS TEMPORARILY CLOSED DOWN

  14. Times Change • People don’t always want to process their old data using the tools of yesteryear • Need to bridge the gap between data’s origins and the time of access

  15. Use the Past to Illuminate the Future • In 1987 EBCDIC was king • In 2007 UNICODE is heir apparent • In 2027 ……. • In 2038 UNIX time_t overflows 31 bits • What has survived the decades?
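The 2038 bullet can be checked with a little arithmetic. This is a minimal sketch assuming time_t is a signed 32-bit count of seconds since 1970-01-01 UTC; `wrap_int32` is a helper written for this illustration, not a library function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_TIME_T = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range, as overflow would."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


# Last second that fits in 31 bits of magnitude.
last_moment = EPOCH + timedelta(seconds=MAX_TIME_T)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00

# One second later the counter wraps to a large negative number.
wrapped = EPOCH + timedelta(seconds=wrap_int32(MAX_TIME_T + 1))
print(wrapped.isoformat())  # 1901-12-13T20:45:52+00:00
```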

  16. Survival of the Abstract • Character sets • Bytes • Unstructured Files (stream of bytes) • Hierarchical file tree • Associative mappings • Programming languages

  17. All is not lost • We can keep a byte-stream for ever • The abstract data separated from the medium is technology-neutral • i.e. files can be kept for ever • Copies are perfect • File formats do not last for ever • ….. Remember WORDSTAR

  18. Non-File Objects • e.g. CDs, DVDs, magnetic tapes, web sites • Map each digital object into a byte-stream and then preserve • Multiple files (e.g. websites) can go in a ZIP or tar archive
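Packing several files into one byte-stream, as the last bullet suggests, might look like the following sketch. It uses Python's standard `tarfile` module; `pack_files`, `unpack_files` and the sample web-site contents are hypothetical names and data chosen for the illustration.

```python
import io
import tarfile


def pack_files(files):
    """Pack a {path: bytes} mapping into a single tar byte-stream."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def unpack_files(stream):
    """Recover the {path: bytes} mapping from a tar byte-stream."""
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


# Made-up example of a tiny web site.
site = {"index.html": b"<h1>Hello</h1>", "img/logo.gif": b"GIF89a"}
stream = pack_files(site)
restored = unpack_files(stream)
```

The resulting `stream` is an ordinary sequence of bytes, so it can be preserved like any other file.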

  19. Abstraction • Identify significant properties of the object • represent them in a byte stream

  20. Example -- magnetic tape • Significant properties • blocks of data • tape marks • start and end of tape • Representation • block-- raw bytes, preceded by 32-bit byte count • tape mark -- 4 bytes all ones • start & end -- ends of stream
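The tape representation on this slide can be sketched in a few lines. The slide does not give the byte order of the 32-bit count, so big-endian is an assumption here, and `encode_tape` and `decode_tape` are hypothetical names. Because a tape mark is four bytes of all ones, a block length of 0xFFFFFFFF has to be rejected to keep the stream unambiguous.

```python
import struct

TAPE_MARK = b"\xff\xff\xff\xff"  # tape mark: 4 bytes, all ones


def encode_tape(items):
    """Encode a tape as a byte-stream.

    items is a list in tape order: bytes for a data block, None for a
    tape mark. Start and end of tape are simply the ends of the stream.
    """
    out = bytearray()
    for item in items:
        if item is None:
            out += TAPE_MARK
        else:
            if len(item) >= 0xFFFFFFFF:
                raise ValueError("block length would collide with the tape mark")
            # Assumption: the 32-bit byte count is big-endian.
            out += struct.pack(">I", len(item)) + item
    return bytes(out)


def decode_tape(stream):
    """Inverse of encode_tape: recover blocks and tape marks."""
    items = []
    pos = 0
    while pos < len(stream):
        header = stream[pos:pos + 4]
        if len(header) < 4:
            raise ValueError("truncated header")
        pos += 4
        if header == TAPE_MARK:
            items.append(None)
            continue
        (count,) = struct.unpack(">I", header)
        block = stream[pos:pos + count]
        if len(block) != count:
            raise ValueError("truncated block")
        items.append(block)
        pos += count
    return items


# Made-up tape: a label block, a tape mark, two data blocks, a double mark.
tape = [b"HDR1", None, b"first record", b"", None, None]
image = encode_tape(tape)
```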

  21. When to convert • Conversion is inevitable • a) as soon as the format becomes obsolete • b) only when we want to read the data • c) never - emulate the original system

  22. Convert as soon as Obsolete • Copying to new technology is no longer trivial • Any errors are cast in stone • Digital signatures are lost • Only viable when the number of different formats is small

  23. Convert when we want to read • Preserve the original by simply copying onto current technology • Record the format of each stored object • Keep an index of all the formats held • Maintain access to conversion software from the old to the current • Treasure open-source conversion software
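The bullets above can be sketched as a toy archive that keeps the original bytes untouched, records each object's format, and converts only on access. The `Archive` class, its methods and the format name "ebcdic-cp037" are all hypothetical, made up for this illustration rather than taken from any real system.

```python
class Archive:
    """Toy store: original bytes are kept as-is and converted on read."""

    def __init__(self):
        self._objects = {}     # object id -> (format name, original bytes)
        self._converters = {}  # format name -> function(bytes) -> current form

    def store(self, object_id, format_name, data):
        # Preserve the original by simply copying the bytes.
        self._objects[object_id] = (format_name, bytes(data))

    def register_converter(self, format_name, converter):
        # Maintain access to conversion software from old to current.
        self._converters[format_name] = converter

    def formats_held(self):
        # Index of all the formats held.
        return sorted({fmt for fmt, _ in self._objects.values()})

    def read(self, object_id):
        fmt, data = self._objects[object_id]
        if fmt not in self._converters:
            raise LookupError(f"no converter registered for format {fmt!r}")
        return self._converters[fmt](data)


archive = Archive()
archive.store("memo-1", "ebcdic-cp037", b"\xc8\xc5\xd3\xd3\xd6")
archive.register_converter("ebcdic-cp037", lambda raw: raw.decode("cp037"))
text = archive.read("memo-1")
```

Keeping the converter separate from the stored bytes means a better converter can replace an old one later without touching the preserved original.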

  24. Format Registries • National Archives PRONOM • Harvard Global Digital Format Registry • OAIS ISO14721:2003 Representation Information

  25. Emulation of Yesteryear • Today’s desktop machine far exceeds the mainframe of the 1970s or even 80s • George3 • Emulate the George3 executive • i.e. order code + system calls + peripherals • BBC micro • Publicly available emulation on WWW

  26. Abstraction for Emulation of 1900 system • George3 sits on 1900 instruction set plus executive calls • Executive sits on 1900 instruction set plus Fancy I/O stuff • George3 provides lots of embellishment of 1900 instruction set • Emulate executive + 1900 instruction set

  27. George3 demo

  28. Malawi Census Data • Data stored on ICL magnetic tapes • Rescued by using emulated ICL 1900

  29. Standards • Open Archival Information System • OAIS ISO14721:2003 • Originated by Space Data Community • Proprietary “standards” • Big enough to be reverse engineered e.g. MS Word • XYZ Software Ltd • Open standards, e.g. RFCs

  30. Really Long-Term • Look back 20 years to see how things have changed • Today’s Vista is not the final scene • Ensure that systems can accommodate new formats • Even the standards are likely to change

  31. Domesday 1986 • 900th anniversary of William the Conqueror’s version • BBC collects data (inc pictures) • Data written on 12" LaserVision discs • Discs last 100 years, but not the drives • Access is via BBC Master computer • That won’t last 100 years either • Can we preserve it until the 1000th anniversary?

  32. Stewardship • Copies of the discs are lodged with: • BBC • British Library • National Archives (ex PRO) • Abstract data held by: • DH / Leeds University • Longlife Data Ltd

  33. Stewardship • Current archival activity stresses retention of media • Retention of digital media is useless • Need digital safe deposits

  34. Digital Immortality OR Keeping Digital Data for Ever Dr David Holdsworth <ecldh@leeds.ac.uk> http://www.leeds.ac.uk/cedars/
