
Data Integrity Issues: How to Proceed


Presentation Transcript


    1. Data Integrity Issues: How to Proceed?
       Engineering Node
       Elizabeth Rye

    2. 3 August 2006 Data Integrity Issues 2 of 19
       PDS Requirements for Data Integrity
       The PDS has made a commitment to ensure the integrity of its data archives. This commitment is primarily spelled out in the Level 3 requirement 4.1.2:
       - “PDS will develop and implement procedures for periodically ensuring the integrity of the data.”
       Several other Level 3 requirements suggest additional implications for data integrity assurance.

    3. PDS Requirements for Data Integrity
       The PDS is responsible for assisting data providers in determining how to validate the data they provide:
       - “PDS will provide criteria for validating archival products” (1.3.3)
       The PDS is responsible for ascertaining that the data we deliver to the NSSDC is valid:
       - “PDS will meet U.S. federal regulations for the preservation and management of data.” (2.8.3)
       - “PDS will meet U.S. federal regulations for preservation and management of the data through its Memorandum of Understanding (MOU) with the National Space Science Data Center (NSSDC)” (4.1.5)

    4. PDS Requirements for Data Integrity
       The PDS is responsible for enabling our users to verify the integrity of the data they receive from us:
       - “PDS will develop and maintain online mechanisms allowing users to download portions of the archive” (3.2.1)
       - “PDS will develop and maintain a mechanism for offline delivery of portions of the archive to users” (3.2.2)
       - “PDS will provide mechanisms to ensure that data have been transferred intact” (3.2.3)
       The PDS needs to ensure the maintenance of data integrity through the media refreshing process:
       - “PDS will develop and implement procedures for periodically refreshing the data by updating the underlying storage technology” (4.1.3)

    5. PDS Requirements for Data Integrity
       The PDS has a stated goal of utilizing standardized procedures in areas that affect inter-node data transfers:
       - “PDS will provide standard protocols for accessing data, metadata and computing resources across the distributed archive” (2.7.3)

    6. PDS Requirements for Data Integrity
       From the above requirements, we can derive several areas of concern for data integrity:
       - Verifying the integrity of data stored on physical media
       - Detecting errors introduced during transfer of data to newer media
       - Detecting errors that occur during transmission of data:
         - From data providers to the PDS
         - Between PDS nodes
         - From the PDS to the NSSDC
         - From the PDS to end users

    7. PDS Requirements for Data Integrity
       There are two additional areas not derivable from existing PDS requirements where data integrity issues are involved:
       - The re-delivery of non-archived data during the operations phase of a mission
       - The potential updating of data to newer formats long after it has been archived

    8. Mitch Gordon Survey
       For each numbered item, do you think that it is an important issue for us to address?
       Section A - It is critical that the PDS be able to ascertain the integrity of its archive. This includes (but is not limited to):
       - detecting errors that occur during the transmission of data from providers to the PDS,
       - detecting errors that occur during the transmission of data between PDS nodes,
       - detecting errors that occur during the transmission of data from the PDS to end users,
       - detecting errors that occur during the transmission of data from the PDS to the NSSDC,
       - verifying the integrity of data stored on various types of external physical media (all of which have finite life spans),
       - detecting errors introduced during transfer of data to newer media.

    9. Mitch Gordon Survey

    10. Possible Solutions to the Problem
       - Checksums are widely accepted in the broader community as a means for ensuring data integrity
       - MD5 checksums, in particular, are well suited to this purpose
       - No node has suggested any mechanism other than checksums as a means for detecting changes in data
       - There is no consensus within the PDS as to whether we should limit ourselves to the MD5 checksum algorithm
       - There is little consensus within the PDS as to whether we should use a standardized approach to utilizing checksums to verify data integrity
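
As an illustration of the checksum idea on this slide, the following is a minimal sketch, not a PDS tool. It computes an MD5 checksum for a file using Python's standard `hashlib` module and shows that changing a single byte produces a different checksum. The function names and the file name `product.dat` are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo_detect_change():
    """Record a checksum, overwrite one byte, and return (recorded, recomputed)."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "product.dat")  # hypothetical file name
        with open(path, "wb") as handle:
            handle.write(b"PDS archive product\n" * 1000)
        recorded = md5_of_file(path)

        # Simulate corruption by overwriting a single byte in place
        with open(path, "r+b") as handle:
            handle.seek(10)
            handle.write(b"X")

        return recorded, md5_of_file(path)


if __name__ == "__main__":
    recorded, recomputed = demo_detect_change()
    print("recorded:  ", recorded)
    print("recomputed:", recomputed)
    print("file changed:", recorded != recomputed)
```

Reading in chunks keeps memory use constant regardless of file size, which matters for large data products. A matching checksum indicates the file is very likely unchanged; it does not by itself say where or how a mismatch arose.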

    11. Mitch Gordon Survey
       Section B - Identify a tool that can help (not necessarily be sufficient) with any, or hopefully all, of the above.
       - Use a single tool, MD5, for generating and validating checksums
       Section C - Establish policies for the use of the tool in a variety of situations.

    12. Mitch Gordon Survey

    13. Issues to be Addressed

    14. Standardization Issue
       Should we have a standardized approach across the PDS for storing and accessing checksums, or should each node be permitted to use whatever mechanism it chooses?
       - Some flexibility is needed to deal with the variety of ways in which data providers deliver data to the PDS
       - Standardization permits the development of tools for generating, accessing, and periodically validating against checksums
       - Standardization permits existing interfaces (like PDS-D and the NSSDC delivery mechanism) to be extended with tools that utilize and validate against checksums

    15. Urgency
       - Volume of data returned from missions is increasing exponentially every couple of years
       - Going back and calculating checksums for every file already in the PDS holdings is currently feasible, but will become a significantly more difficult task with each passing year

    16. Policy Questions to be Answered
       - At what level of detail should checksums be required?
       - For what parts of the archiving process should checksums be required?
       - To what degree should standardization among nodes be insisted upon?
       - When should we begin requiring checksums?

    17. Current Proposal (SCR 3-1034, V9)
       - Mandates generation of file checksums for every file on every archive volume
       - Mandates standardized format and location for storage of checksums
       - Is insufficient to solve all data integrity problems, but is a necessary part of the solution
       - Required for all missions archiving to v3.8 or higher of the Standards Reference (roughly, missions starting the process late this year)
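
The two mandates above (a checksum for every file on a volume, stored in a standard format and location) can be sketched as follows. This is only an illustration under stated assumptions: the manifest file name `checksums.md5`, its placement at the volume root, and the line layout (checksum, two spaces, volume-relative path) are all hypothetical choices for the example and are not taken from the SCR. The function names are likewise invented here.

```python
import hashlib
import os

MANIFEST_NAME = "checksums.md5"  # hypothetical name and location, not from the SCR


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_volume_files(volume_root):
    """Yield the volume-relative path of every file except the manifest itself."""
    for dirpath, dirnames, filenames in os.walk(volume_root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), volume_root)
            rel = rel.replace(os.sep, "/")  # same path text on every platform
            if rel != MANIFEST_NAME:
                yield rel


def write_manifest(volume_root):
    """Write one 'checksum  relative/path' line per file on the volume."""
    lines = [
        f"{md5_of_file(os.path.join(volume_root, rel))}  {rel}\n"
        for rel in iter_volume_files(volume_root)
    ]
    manifest_path = os.path.join(volume_root, MANIFEST_NAME)
    with open(manifest_path, "w", encoding="utf-8") as out:
        out.writelines(lines)
    return manifest_path


def validate_volume(volume_root):
    """Compare the volume to its manifest; return (missing, changed, unlisted)."""
    expected = {}
    manifest_path = os.path.join(volume_root, MANIFEST_NAME)
    with open(manifest_path, encoding="utf-8") as manifest:
        for line in manifest:
            checksum, rel = line.rstrip("\n").split("  ", 1)
            expected[rel] = checksum
    present = set(iter_volume_files(volume_root))
    missing = sorted(set(expected) - present)
    unlisted = sorted(present - set(expected))
    changed = sorted(
        rel
        for rel in present & set(expected)
        if md5_of_file(os.path.join(volume_root, rel)) != expected[rel]
    )
    return missing, changed, unlisted
```

Calling `write_manifest(volume_root)` once at archive time and `validate_volume(volume_root)` periodically, or after a media refresh or transfer, would report files that have disappeared, files whose contents no longer match, and files that were never listed. A fixed format and location is what lets one such tool work on volumes from any node, which is the point the standardization mandate makes.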

    18. Most Recent Votes on Checksum SCR

    19. Options for Next Step
       - Proceed with MC vote on version 9 of SCR
       - Form new working group to come up with a new proposal
       - MC draft policy on data integrity to provide further guidance to Tech group
       - Drop the issue (fails to meet our requirements)
       - Other?
