Persistent Identifiers Solving a number of problems through a simplistic mechanism
Agenda • What are Persistent Identifiers (PIDs)? • Extended PIDs • How to use them
Persistent identifiers come in various formats • 10876/abc123 • 10.1594/WDCC/CMIP5.NCCNMpc • ark:/13030/tf5p30086k • http://purl.org/dc/elements/1.1/ • urn:lsid:ubio.org:namebank:11815
Why do we need PIDs? • DRS Syntax • Tracking ID • CIM ID • ...
PIDs point to resources • 10876/abc123 → Resolution service → http://example.com/xyz567 → 101010101010101
The resource is a black box • 101010101010101 → ? • Metadata • Data • Software code • Document
PIDs are globally unique • 10876/abc123 → 101010101010101 • 10876/abc123 → 101010101010101
URLs are not persistent over time (“link rot”) • Today: http://example.com → 101010101010101 • 2015: http://example.com → 101010101010101 • 2020: http://example.com → 404 Not found
PIDs are persistent over time • Today: 10876/abc123 → 101010101010101 • 2015: 10876/abc123 → 101010101010101 • 2020: 10876/abc123 → 101010101010101
PIDs establish a redirection layer • Stable: 10876/abc123 • Unstable: http://... • http://... • http://... • 101010101010101
Operations on a PID • Create PID • Update the URL the PID points to • (Delete PID)
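The three operations, together with resolution, can be sketched as a toy in-memory registry. This is only an illustration of the redirection idea, not the API of any real PID system; the class name `PidRegistry` and its method names are hypothetical.

```python
class PidRegistry:
    """Toy in-memory stand-in for a PID resolution service (hypothetical)."""

    def __init__(self):
        self._records = {}  # PID string -> URL it currently points to

    def create(self, pid, url):
        # A PID must be unique, so creating it twice is an error.
        if pid in self._records:
            raise ValueError(f"PID already exists: {pid}")
        self._records[pid] = url

    def update_url(self, pid, new_url):
        # Only the target changes; the PID itself stays stable.
        if pid not in self._records:
            raise KeyError(pid)
        self._records[pid] = new_url

    def delete(self, pid):
        # Listed in parentheses on the slide: deletion is the exception.
        del self._records[pid]

    def resolve(self, pid):
        return self._records[pid]


registry = PidRegistry()
registry.create("10876/abc123", "http://example.com/xyz567")
# The resource moves; citations of the PID remain valid.
registry.update_url("10876/abc123", "http://example.com/xyz789")
resolved = registry.resolve("10876/abc123")
```

Anyone who cited `10876/abc123` before the move still reaches the resource afterwards, which is the point of the redirection layer.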
There are many PID systems / infrastructures • Handle System • Archival Resource Key (ARK) • Life Science Identifier (LSID) • Persistent URL (PURL) • Uniform Resource Name (URN) • ...
Extended PIDs • We need to go beyond the simple redirection view
Some information must be stored persistently • Today: 10876/A, Checksum: 7D01E436 → 101010101010101 • 2015: 10876/A, Checksum: 7D01E436 → 101010101110101 • ! Verify...
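A minimal sketch of the verification step, assuming the checksum is a CRC-32 rendered as eight hex digits. The slide does not name the algorithm, so `zlib.crc32` is an assumption, and the values computed here are not claimed to match the checksum shown on the slide. The record layout and the helper names `crc32_hex` and `verify` are hypothetical.

```python
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC-32 of `data` as eight upper-case hex digits."""
    return f"{zlib.crc32(data):08X}"


# The checksum is stored with the PID when the object is registered.
original = b"101010101010101"
record = {"pid": "10876/A", "checksum": crc32_hex(original)}


def verify(record, data):
    """Compare the data found today with the checksum stored in the PID record."""
    return crc32_hex(data) == record["checksum"]


ok_today = verify(record, original)             # data unchanged
ok_later = verify(record, b"101010101110101")   # one character has changed
```

Because the checksum lives in the PID record rather than next to the data, a silent change of the data can be detected later.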
Build more complex information structures • A: 1 • B: 4 • C: 3 • D: 7 • [1, 5, 13, 9, 12]
Collections of PIDs are required for our use cases • 10876/collection1 • 10876/B • 10876/B • 10876/B • 10876/B • 10876/A
Graphs or trees of PIDs are required as well • 10876/A • 10876/B • 10876/C
The graph nodes and edges may be typed • Nodes: 10876/A (Data object), 10876/B (Data object), 10876/C (Metadata object) • Edges: olderversion, hasmetadata
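A typed PID graph could be sketched as follows. The node and relation names come from the slide, but the edge directions (B has metadata C, B has older version A) are an assumption read off the flattened diagram, and the `PidNode` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PidNode:
    pid: str
    node_type: str                              # e.g. "data object"
    links: list = field(default_factory=list)   # (relation, target PID) pairs

    def add_link(self, relation, target_pid):
        self.links.append((relation, target_pid))

    def targets(self, relation):
        """Return the PIDs reachable over edges of the given type."""
        return [target for rel, target in self.links if rel == relation]


nodes = {
    "10876/A": PidNode("10876/A", "data object"),
    "10876/B": PidNode("10876/B", "data object"),
    "10876/C": PidNode("10876/C", "metadata object"),
}
# Assumed directions: B points to its metadata and to its older version.
nodes["10876/B"].add_link("hasmetadata", "10876/C")
nodes["10876/B"].add_link("olderversion", "10876/A")

metadata_of_b = nodes["10876/B"].targets("hasmetadata")
```

Storing each link as a (relation, target) pair keeps the edge type next to the edge, so a client can follow only the relations it understands.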
The graph structure must be stored persistently • 10876/B (Data object) hasmetadata 10876/C (Metadata object) • One combined entity • 101010101010101 • 101010101010101
Collections can be realized through graphs • 10876/collection1 • 10876/B • 10876/B • 10876/A
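One way to read this slide: a collection is just another PID whose outgoing links carry a membership relation. The sketch below assumes a relation called `hasmember`, which is a hypothetical name and not one defined in the talk.

```python
# Adjacency list: source PID -> list of (relation, target PID) edges.
links = {
    "10876/collection1": [("hasmember", "10876/A"), ("hasmember", "10876/B")],
    "10876/A": [],
    "10876/B": [],
}


def members(collection_pid):
    """Follow the membership edges of a collection PID."""
    return [target for relation, target in links[collection_pid]
            if relation == "hasmember"]


def collections_of(pid):
    """Find every collection that lists `pid` as a member."""
    return [source for source, edges in links.items()
            if ("hasmember", pid) in edges]


collection_members = members("10876/collection1")
parents_of_b = collections_of("10876/B")
```

No separate collection mechanism is needed: membership is one more typed edge in the same graph.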
What must be stored persistently? • Minimal metadata (key-metadata), from static to dynamic: • Checksum • PID creation time stamp • Graph structure (links) • Collection membership
Levels of preservation • Minimal metadata • 10876/abc123 • Primary level of preservation • Secondary level of preservation • 101010101010101
PIDs are a topic for international collaboration • Relation types must be standardized. • Research Data Alliance • WG ‘PID Information Types’ • WG ‘Type Registry’ • collections? • (diagram: collection 10876/collection1 and graph of 10876/A, 10876/B, 10876/C)
Usage scenario: Provenance as a DAG • Data object • PID • Link “was derived from” • t • cdo
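A minimal sketch of walking such a provenance DAG, assuming each PID record stores its “was derived from” links. The example PIDs (including 10876/D) and the function name `provenance` are hypothetical.

```python
# PID -> PIDs it "was derived from" (edges point back in time).
derived_from = {
    "10876/D": ["10876/B", "10876/C"],
    "10876/B": ["10876/A"],
    "10876/C": ["10876/A"],
    "10876/A": [],
}


def provenance(pid):
    """Return every PID that `pid` was directly or indirectly derived from."""
    seen = []
    stack = list(derived_from.get(pid, []))
    while stack:
        current = stack.pop()
        if current not in seen:   # a DAG may reach the same ancestor twice
            seen.append(current)
            stack.extend(derived_from.get(current, []))
    return seen


ancestors_of_d = provenance("10876/D")
```

The `seen` check matters because 10876/A is reachable through both 10876/B and 10876/C.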
Software • How do we actually use PIDs?
I am biased towards the Handle System • For technical reasons • key-metadata is a unique feature that is required for PID graphs • For practical reasons – examples: • ARKs and URNs lack wide adoption and support • PURL maintenance is not clear • LSIDs in older literature are not persistent • Handle System has an operational perspective
What is the Handle System? • Developed by CNRI • Corporation for National Research Initiatives • Registered trademark • Fee for registering new prefixes (e.g. 10876) • Customers e.g. • US military • International DOI Foundation
How does the Handle System work? • 10876/100 • 100 • URL: www.dkrz.de • Checksum: ... • 10876 • Prefix DB • 1234 • 1001 • Central resolution service
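The two-step lookup suggested by this slide can be sketched as follows: a central service maps the prefix to the responsible local service, which then looks up the suffix. This is a simplified reading of the diagram, not the actual Handle protocol; the service name `local-service-10876` and the dictionary layout are hypothetical, and the elided checksum value is left out.

```python
# Central resolution service: prefix -> responsible local handle service.
prefix_db = {"10876": "local-service-10876"}   # service name is hypothetical

# Local handle services: suffix -> handle record with typed values.
local_services = {
    "local-service-10876": {
        "100": {"URL": "www.dkrz.de"},
    },
}


def resolve(handle):
    """Split a handle into prefix and suffix and look up both in turn."""
    prefix, suffix = handle.split("/", 1)
    service = prefix_db[prefix]               # step 1: central prefix lookup
    return local_services[service][suffix]    # step 2: local suffix lookup


record = resolve("10876/100")
```

Splitting on the first slash only means that a suffix may itself contain slashes.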
What are Digital Objects? • http://example.com/xyz789 • 10876/abc123
Build a stack of lightweight components • LAPIS • API for Persistent Identifier Services (on GitHub)
Further reading • Weigel et al.: “A framework for extended persistent identification of scientific assets” (submitted to the Data Science Journal) • Duerr et al., doi:10.1007/s12145-011-0083-6
Thank you. All slides available here: redmine.dkrz.de/seminar
The greater plan • LTA application: Q4 2012, Q1 2013 • EUDAT integration: 2014 • CMIP6+: 2014