PIDs in Data Infrastructures
Peter Wittenburg
CLARIN Research Infrastructure
EUDAT Data Infrastructure
Automatic Workflows
• most data are created automatically as part of workflows
• manual operations are exceptions
• at data creation time it is not obvious what the data's future life will be
• associating metadata and PIDs later is troublesome and costly
• thus metadata and PIDs should be generated immediately as part of automated workflows
• data resources need to be referable and often citable (published)
• need reliable and high-performance machinery (registration + resolution) based on stable standards
 - typically Handles via EPIC
 - typically DOIs via DataCite
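A minimal sketch of the point above: a workflow step that registers a PID and basic metadata at the moment the data object is created. The `PidRegistry` class is an in-memory stand-in written for this illustration, not the EPIC or DataCite API, and the prefix `12345` and the example URL are made up.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class PidRegistry:
    """In-memory stand-in for a PID service (hypothetical, not a real API)."""
    prefix: str
    records: dict = field(default_factory=dict)

    def register(self, location: str, checksum: str, metadata: dict) -> str:
        # Mint an opaque suffix; the PID string itself carries no meaning.
        pid = f"{self.prefix}/{uuid.uuid4().hex}"
        self.records[pid] = {
            "URL": location,
            "CHECKSUM": checksum,
            "METADATA": metadata,
        }
        return pid

    def resolve(self, pid: str) -> dict:
        return self.records[pid]


def workflow_step(payload: bytes, location: str, registry: PidRegistry) -> str:
    """Create a data object and register its PID in the same automated step."""
    checksum = hashlib.sha256(payload).hexdigest()
    metadata = {"size_bytes": len(payload)}
    return registry.register(location, checksum, metadata)


registry = PidRegistry(prefix="12345")  # made-up prefix
pid = workflow_step(b"raw sensor output", "https://repo.example.org/obj/1", registry)
record = registry.resolve(pid)
```

Because registration happens inside the step that produces the data, no later manual pass is needed to attach a PID or a checksum.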
PID usage in our domain
• assume that we have a recording of an extinct language and some annotations that tell us what someone said about medicine etc.
• researchers create relations that need to be preserved
[Diagram: a Recording Session Metadata Record links a Video Recording, a Sound Recording, and Annotations held in Repositories A, B, and C]
How long, stable and persistent?
We are using Handles from the EPIC service
PID usage in our domain
[Example ePublication text shown on the slide:]
Biological and cultural processes have evolved together, in a symbiotic spiral; they are now indissolubly linked, with human survival unlikely without such culturally produced aids as clothing, cooked food, and tools. The twelve original essays collected in this volume take an evolutionary perspective on human culture, examining the emergence of culture in evolution and the underlying role of brain and cognition. The essay authors, all internationally prominent researchers in their fields, draw on the cognitive sciences -- including linguistics, developmental psychology, and cognition -- to develop conceptual and methodological tools for understanding the interaction of culture and genome. They go beyond the "how" -- the questions of behavioral mechanisms -- to address the "why" -- the evolutionary origin of our psychological functioning. What was the "X-factor," the magic ingredient of culture -- the element that took humans out of the general run of mammals and other highly social organisms? Several essays identify specific behavioral and functional factors that could account for human culture, including the capacity for "mind reading" that underlies social and cultural learning and the nature of morality and inhibitions, while others emphasize multiple partially independent factors -- planning, technology, learning, and language. The X-factor, these essays suggest, is a set of cognitive adaptations for culture.
[Diagram: ePublication in Repository 1 linked to eResource in Repository 2]
How long, etc.?
Handles from EPIC
Data Object World
• let's isolate the external properties of our data objects and collections and ignore the content (structure, semantics, packaging, etc.) for a moment
• goes back to a paper by Kahn & Wilensky, 2006
[Diagram labels, grouped as far as the extraction allows:]
 - actors: originator, depositor, user, repository A, repository B, handle generator
 - actions: stores, hands-over, deposits via RAP, maintains, replicates, requests, receives disseminations via RAP requests
 - work: data, metadata (Key-MD), PID, access rights, ownership
 - registered DO: data, metadata (Key-MD), location, PID
 - property record: access rights, type (from central registry), ROR flag, mutable flag
 - transaction record
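The external properties listed on this slide can be sketched as a small data model. This is an illustrative reading, not the Kahn & Wilensky specification: the class and method names are invented here, the `deposit` and `access` methods merely stand in for RAP requests, and treating a replica as a copy with its ROR flag cleared is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyRecord:
    access_rights: str
    object_type: str   # would be taken from a central type registry
    ror_flag: bool     # assumed meaning: this copy is the reference copy
    mutable: bool


@dataclass
class RegisteredDO:
    pid: str
    data: bytes
    key_metadata: dict
    properties: PropertyRecord
    transactions: list = field(default_factory=list)


class Repository:
    """Toy repository; deposit/access stand in for RAP requests."""

    def __init__(self, name: str):
        self.name = name
        self.objects = {}

    def deposit(self, pid, data, key_metadata, properties) -> RegisteredDO:
        do = RegisteredDO(pid, data, key_metadata, properties)
        do.transactions.append(("deposit", self.name))
        self.objects[pid] = do
        return do

    def access(self, pid: str, user: str) -> bytes:
        do = self.objects[pid]
        do.transactions.append(("access", user))  # every access is logged
        return do.data

    def replicate_to(self, pid: str, other: "Repository") -> RegisteredDO:
        src = self.objects[pid]
        props = PropertyRecord(
            src.properties.access_rights,
            src.properties.object_type,
            ror_flag=False,  # the replica is not the reference copy
            mutable=src.properties.mutable,
        )
        src.transactions.append(("replicate", other.name))
        return other.deposit(pid, src.data, dict(src.key_metadata), props)


repo_a = Repository("repository A")
repo_b = Repository("repository B")
props = PropertyRecord("open", "audio/wav", ror_flag=True, mutable=False)
original = repo_a.deposit("12345/abc", b"...", {"title": "session 1"}, props)
copy = repo_b.replicate_to("12345/abc", repo_a) if False else repo_a.replicate_to("12345/abc", repo_b)
payload = repo_a.access("12345/abc", user="researcher")
```

Note that nothing in the model looks inside `data`: all bookkeeping uses only the external properties.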
2 DO flavours in our domain
• this is how we organize data
• other variants are possible
[Diagram labels:]
 - DO: bit sequence (instance) with metadata and PID; access via metadata or access via PID; immediate access?
 - MDO: bit sequence (instance) with metadata and PID; access via metadata or access via PID; search/browse access
collections in our domain (similar to MPEG21 containers, items, sub-items)
- grouping of related data
- large variety of reasons
 - versions of a DO
 - presentations of a DO
 - same interview/experiment
 - many others
- a DO can be part of many collections
[Diagram labels:]
 - ISOcat Registry (ISO 12620, compl. ISO 11179): category 1 - assoc info, category 2 - assoc info
 - metadata (collection): category 1, category 2, ..., category N; PID1, PID2, ..., PID K
 - metadata: category 1, category 2, ..., category N; PID
 - bit sequence: PID
 - PID Registry: PID collection - assoc info, PID1 - assoc info, PID2 - assoc info
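As a rough sketch of the collection idea above, a collection can be modelled as a metadata record that carries its own PID plus the PIDs of its members, so that one DO can appear in several collections. The PID strings and the `reason` field are invented for this example.

```python
from collections import defaultdict

# collection PID -> collection record (metadata plus member PIDs); all values are made up
collections = {
    "hdl:COLL/versions": {
        "metadata": {"reason": "versions of a DO"},
        "members": ["hdl:OBJ/1", "hdl:OBJ/2"],
    },
    "hdl:COLL/interview": {
        "metadata": {"reason": "same interview"},
        "members": ["hdl:OBJ/2", "hdl:OBJ/3"],
    },
}


def memberships(colls: dict) -> dict:
    """Invert the collection records: member PID -> list of collection PIDs."""
    index = defaultdict(list)
    for coll_pid, record in colls.items():
        for member in record["members"]:
            index[member].append(coll_pid)
    return dict(index)


index = memberships(collections)
```

Here `index["hdl:OBJ/2"]` lists both collections, which mirrors the slide's remark that a DO is part of many collections.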
EUDAT - common services
• two major tracks:
 - understanding data organization & practices in communities
 - provide first common services after 12 months
PID Use V1 in EUDAT Federation
[Diagram: DO1 is held in repository X (domain X), repository Y (domain Y), and repository Z (domain Z). A single PID under prefix prefx, PIDx, carries: URL, URLy, URLz, CKSM, Rights, ....]
PID Use V2 in EUDAT Federation
[Diagram: DO1 is held in repository X (domain X), repository Y (domain Y), and repository Z (domain Z), each with its own prefix (prefx, prefy, prefz) and its own PID record:]
 - PIDx: URL, RoR, HDL, CKSM, Rights, ....
 - PIDy: URL, RoR, HDL, CKSM, Rights, ....
 - PIDz: URL, RoR, CKSM, Rights, ....
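The difference between the two variants can be sketched with plain dictionaries. The field names URL, RoR, CKSM, and Rights are taken from the slides; everything else is an assumption made for illustration, in particular the example URLs, the checksum value, and the reading of RoR as a pointer to the PID of the reference copy.

```python
# V1: one PID, one record, one URL per replica.
v1 = {
    "prefx/PIDx": {
        "URL": [
            "https://x.example.org/DO1",
            "https://y.example.org/DO1",
            "https://z.example.org/DO1",
        ],
        "CKSM": "abc123",
        "Rights": "open",
    }
}

# V2: one PID per replica; each record points back to the reference copy (assumed).
v2 = {
    "prefx/PIDx": {"URL": "https://x.example.org/DO1", "RoR": "prefx/PIDx",
                   "CKSM": "abc123", "Rights": "open"},
    "prefy/PIDy": {"URL": "https://y.example.org/DO1", "RoR": "prefx/PIDx",
                   "CKSM": "abc123", "Rights": "open"},
    "prefz/PIDz": {"URL": "https://z.example.org/DO1", "RoR": "prefx/PIDx",
                   "CKSM": "abc123", "Rights": "open"},
}


def locations_v1(records: dict, pid: str) -> list:
    """All replica locations come from the single record."""
    return list(records[pid]["URL"])


def locations_v2(records: dict, pid: str) -> list:
    """Collect the locations of every record sharing the same reference copy."""
    ror = records[pid]["RoR"]
    return [rec["URL"] for rec in records.values() if rec["RoR"] == ror]


def replica_is_intact(records: dict, pid: str) -> bool:
    """Compare a replica's checksum with that of its reference copy."""
    rec = records[pid]
    return rec["CKSM"] == records[rec["RoR"]]["CKSM"]


same_locations = locations_v1(v1, "prefx/PIDx") == locations_v2(v2, "prefy/PIDy")
intact = replica_is_intact(v2, "prefz/PIDz")
```

In this reading, V2 lets each domain administer its own PID record while the RoR entry keeps the replicas tied together.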
EUDAT relying on EPIC + Handles
• EPIC (European PID Consortium: CSC, SARA, GWDG, more)
• large data centers with national/organizational (MPS) support
• applying redundancy schemes (persistence, availability)
• reliability, robustness, performance (registration, resolution)
• all the same API (agreement on the associated information)
• thus PID syntax is not crucial, but storing/finding information is
• feasible business model for science
• security of administration DB for system
• persistent and balanced governance for HS
• need a worldwide registry of agreed information types to feed our "stupid" machines
Information types in discussion
• multiple links to resources
• checksum
• link to metadata
• citation metadata
• RoR statement
• mutability flag
• persistency statement
• pointers to presentation versions
• provenance statement
• collection statement
• pointer to rights
• (support for parts/fragments)
• (actionable PIDs)
- need agreements
- need standard APIs
for EUDAT this is crucial
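To illustrate why agreed information types matter for machines, here is a minimal sketch of a PID record checked against a type registry. The registry contents, key names, and value types are all hypothetical; no agreed set exists in the slides beyond the list of candidates above.

```python
# Hypothetical registry of agreed information types: key -> expected Python type.
TYPE_REGISTRY = {
    "URL": list,           # multiple links to resources
    "CHECKSUM": str,
    "METADATA_URL": str,   # link to metadata
    "ROR": str,            # RoR statement
    "MUTABLE": bool,       # mutability flag
    "RIGHTS_URL": str,     # pointer to rights
}


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record is acceptable."""
    errors = []
    for key, value in record.items():
        expected = TYPE_REGISTRY.get(key)
        if expected is None:
            errors.append(f"unregistered type: {key}")
        elif not isinstance(value, expected):
            errors.append(f"{key}: expected {expected.__name__}")
    return errors


good = {
    "URL": ["https://repo.example.org/obj/1"],
    "CHECKSUM": "abc123",
    "MUTABLE": False,
}
bad = {"MUTABLE": "no", "COLOUR": "red"}

good_errors = validate(good)
bad_errors = validate(bad)
```

A client that only knows the registry can process any record from any service, which is the sense in which agreement on types and APIs matters more than PID syntax.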