
Supporting SPs in a working archive: Software Tools


masako

Presentation Transcript


  1. Supporting SPs in a working archive: Software Tools

  2. Challenge
     Reality: It is infeasible to maintain large numbers of objects manually. Software is required that can extract and maintain SPs for large numbers of objects.
     Requirements:
     • Object analysis tools
       • Support requisite formats
       • Identify all/some SPs
       • Support batch analysis
       • Ideally well supported and documented
     • Description schemas to record SPs
       • Flexible
       • Machine and format independent
     • Conversion/emulation tools capable of maintaining SPs

  3. Format identification
     • File identification through magic number and a ‘light touch’ scan of the encoding structure.
     • Recognises hundreds (potentially thousands) of formats.
     • Provides basic encoding information, but not detailed structure.
     • Examples:
       • file(1): Free version created in 1986 and available for all operating systems. http://gnuwin32.sourceforge.net/packages/file.htm (Windows)
       • DROID: Java application developed by TNA. Integrates with PRONOM. Performs format identification and assigns a PUID, which can be linked to preservation planning. http://droid.sourceforge.net/
       • FFIdent: Java library that identifies formats and extracts basic information. Recognises 27 encoding formats using header information (magic number and common structural information).
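As a minimal sketch of the magic-number approach described on this slide, the Python below matches the first bytes of a file against a small signature table. The table, the function names `identify_bytes` and `identify_file`, and the 16-byte read size are illustrative assumptions, not how file(1), DROID or FFIdent are implemented; those tools use far larger registries and also inspect structure beyond the header.

```python
# Illustrative signature table: a handful of well-known magic numbers.
# This is an assumption for demonstration, not a registry from any real tool.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"fLaC", "FLAC"),
]


def identify_bytes(header: bytes) -> str:
    """Return a format name if the header starts with a known signature."""
    for signature, name in MAGIC_SIGNATURES:
        if header.startswith(signature):
            return name
    return "unknown"


def identify_file(path: str, read_size: int = 16) -> str:
    """Read only the first few bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(read_size))
```

Because only a short header is read, a check like this is cheap enough to run over a large batch of objects, but it yields a format label only, not the detailed structure that the analysis tools on the later slides provide.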

  4. Detailed Analysis
     Perform detailed analysis of the internal structure of one or more files.
     • Email:
       • Aperture: Java framework able to decode structured text and convert it to another format
       • ReadPST: Open source tool for processing Outlook PSTs
       • XENA: Java tool developed by NAA
     • Audio:
       • MP3Info: Technical information viewer and ID3 1.x tag editor for the MP3 file format
       • SoX/SOXI (Sound eXchange): Extracts descriptive metadata and technical information
       • MetaFlac: Extractor tool for FLAC audio
     • Images:
       • TiffInfo
       • ImageMagick
       • JHOVE
     See the InSPECT Testing Reports available at http://www.significantproperties.org.uk/ for further information on these tools.

  5. JHOVE 1/2
     JHOVE (http://hul.harvard.edu/jhove/)
     • Format-specific digital object validation API written in Java
     • Functionality: format identification, format validation, format characterisation
     • Supports: AIFF, ASCII, Bytestream, GIF, HTML, JPEG, JPEG 2000, PDF, TIFF, UTF-8, WAV, and XML
     JHOVE2 (https://confluence.ucop.edu/display/JHOVE2Info/Home)
     • Supports: JPEG 2000, PDF, SGML, Shapefile, TIFF, ASCII & UTF-8 encoded text, WAVE, XML, ICC color profile
     • Functionality: format identification, validation, feature extraction and policy-based assessment

  6. JHOVE Demo

  7. XCL (eXtensible Characterization Language)
     • Content extraction
       • Extracts content and technical properties using XCEL and saves them as XCDL.
       • Format support: PNG, TIFF, GIF, BMP, JPEG, JP2, PBM, PCD, PCX, PICT, PPM, PSD, SVG, TGA, XBM and XPM, MS DOC, DocX, PDF
     • Content comparison
       • Compares two objects, e.g. TIFF & PNG, PDF & Doc
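The comparison step can be pictured as checking two sets of extracted properties against each other, for example before and after a format migration. The sketch below is only an illustration of that idea: it uses plain Python dictionaries rather than XCDL, and the property names and the `compare_properties` function are hypothetical, not part of the XCL tools.

```python
def compare_properties(first: dict, second: dict) -> dict:
    """Compare two property sets and report each property's status.

    Returns a mapping of property name to (status, first_value, second_value),
    where status is "equal", "different" or "missing".
    """
    report = {}
    for name in sorted(set(first) | set(second)):
        a = first.get(name)
        b = second.get(name)
        if name not in first or name not in second:
            report[name] = ("missing", a, b)
        elif a == b:
            report[name] = ("equal", a, b)
        else:
            report[name] = ("different", a, b)
    return report


# Hypothetical properties extracted from a TIFF and its PNG migration.
tiff_props = {"imageWidth": 640, "imageHeight": 480, "compression": "LZW"}
png_props = {"imageWidth": 640, "imageHeight": 480, "compression": "deflate"}

for name, (status, a, b) in compare_properties(tiff_props, png_props).items():
    print(f"{name}: {status} ({a!r} vs {b!r})")
```

In this example the dimensions match while the compression scheme differs, which is expected for a TIFF to PNG migration; deciding which differences matter is a question for the significant properties being preserved, not for the comparison code.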

  8. XCL Extract & compare

  9. XCL Demo

  10. Final thoughts
      • Analysis tools are useful, but have problems:
        • Limited format support
        • Variable access methods (GUI, CLI, APIs)
        • Inconsistent reporting process
        • Different metrics (e.g. text vs. number)
        • Metric variations (e.g. milliseconds)
      • Partial solution: wrap tools into services
        • PLANETS Interoperability Framework
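One small part of wrapping tools behind a common interface is normalising the values they report. The sketch below assumes, purely for illustration, that different tools report a duration either as a bare number or as text with a unit suffix, and converts both to seconds. The function name `to_seconds` and the unit table are hypothetical and are not taken from the PLANETS Interoperability Framework.

```python
import re

# Conversion factors to seconds for the units this sketch understands.
_UNIT_TO_SECONDS = {"ms": 0.001, "s": 1.0, "min": 60.0}


def to_seconds(value, default_unit="s"):
    """Normalise a duration reported as a number or as text to seconds."""
    if isinstance(value, (int, float)):
        # Bare numbers carry no unit, so the caller must say what they mean.
        return float(value) * _UNIT_TO_SECONDS[default_unit]

    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z]*)\s*", value)
    if match is None:
        raise ValueError(f"cannot parse duration: {value!r}")

    number, unit = match.groups()
    unit = unit.lower() or default_unit
    if unit not in _UNIT_TO_SECONDS:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(number) * _UNIT_TO_SECONDS[unit]
```

A wrapper service would apply a conversion like this to each tool's output before recording the value, so that a duration given in milliseconds as text by one tool and in seconds as a number by another end up comparable.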
