GEOTRACES International Data Assembly Centre Edward Mawji GDAC/BODC ezm@bodc.ac.uk
Discussion Points • Important concepts in data management… • What and who are GDAC (GEOTRACES International Data Assembly Centre)? • Attempt to bring together data managers for WG3 meeting • Issues with management of GEOTRACES data across Europe • Membership of WG • Dates for first WG3 meeting
Guiding principles of data management • quality assurance of data • treat all information as data • data that lack sufficient metadata have limited value beyond the research program for which they were collected • metadata should include sufficient information to support discovery, value assessment and accurate re-use of data. “Data stewardship”: the data collection generated by a research project is a valuable component of its legacy.
What and who are GDAC? The GEOTRACES International Data Assembly Centre (GDAC) was initially created in 2008 to serve the data management needs of the GEOTRACES community. At present GDAC is jointly funded by NSF and NERC. • Sole responsibility is data management and storage • Starting premise: data must be secure and readily usable in the short term for GEOTRACES participants, and in the long term without reference to the originator. GDAC is located at the British Oceanographic Data Centre, Liverpool, UK. The website is http://www.bodc.ac.uk. The GDAC data manager is Edward Mawji.
GDAC role • The main role of GDAC will be to compile global datasets for all key GEOTRACES parameters • Provide PIs with guidance on metadata requirements • Capture and record supporting documentation (metadata) • Make data easily accessible to participating scientists and the larger science community • Communicate with national data centres • The policy for data release will be determined by the Scientific Steering Committee (SSC) • The website for data delivery is under construction; the delivery aspect will be developed once data have been submitted to GDAC.
GDAC responsibilities • Maintain contact with national data centres • Provide advice and/or assistance for PIs • prior to planning workshops • prior to cruise … • cruise metadata forms • discuss data management strategy to support research • post-cruise … data publishing … • data inventory • cruise documentation • data contributions … • To meet the requirements of the GEOTRACES programme, GEOTRACES cruises require a high level of data management
GEOTRACES Data Management set up: Flow of Data [Diagram; labels: Ship-based TEI (trace element and isotope) measurements • National Data Centres • GDAC: store data and all metadata for all GEOTRACES cruises]
Policy • Before cruise • PI/PSO informs GDAC of the intended cruise • Download the pre-cruise metadata form and scientific sampling event log forms from the website • Identify the appropriate DAC • If there is no DAC, inform GDAC, which will act as DAC (at the cruise planning stage) • After cruise, the Chief Scientist • Submits metadata forms and event logs to GDAC and DAC (1 week) • Submits underway navigational files and data (1 week) • Submits CTD data to GDAC (1 week) • Submits cruise report (8-16 weeks) • DAC carries out data tracking and submits final data to GDAC; if there is no DAC, GDAC carries out data tracking
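As an illustration only, the post-cruise deadlines in the policy above can be turned into calendar dates. This is a minimal sketch: the item labels and the function name `submission_windows` are hypothetical, and it assumes each deadline is counted from the day the cruise ends, which the slide does not state.

```python
from datetime import date, timedelta

# Deadlines in weeks after the end of the cruise, as listed in the policy.
# The labels are hypothetical shorthand, not official GDAC form names.
# Each entry is (earliest, latest); a single "1 week" deadline uses (1, 1).
SUBMISSION_WEEKS = {
    "metadata forms and event logs": (1, 1),
    "underway navigational files and data": (1, 1),
    "CTD data": (1, 1),
    "cruise report": (8, 16),
}


def submission_windows(cruise_end: date) -> dict:
    """Return {item: (earliest_due, latest_due)} for each post-cruise submission.

    Assumes deadlines run from the cruise end date (an assumption of this sketch).
    """
    return {
        item: (
            cruise_end + timedelta(weeks=first),
            cruise_end + timedelta(weeks=last),
        )
        for item, (first, last) in SUBMISSION_WEEKS.items()
    }


if __name__ == "__main__":
    # Hypothetical cruise end date, for demonstration only.
    for item, (first, last) in submission_windows(date(2009, 3, 1)).items():
        print(f"{item}: due between {first} and {last}")
```

A data manager could use a table like this to chase overdue submissions, which is the "data tracking" task the policy assigns to the DAC or GDAC.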
Progress to date Initial steps: designed and published a GDAC website, www.bodc.ac.uk/geotraces/. Please have a look; feedback would be welcome.
Website Progress and issues • Things to do • Create links under the relevant section for metadata forms and example event logs (waiting for feedback from DMC) • Link to IMBER's data management cook book (once published) • Add a full description of every parameter measured on each cruise. This can only be achieved when PIs or national data centres pass on this information. A cruise report would help • Unified description of key GEOTRACES parameters • Add cruises to POGO • Development of the data delivery function will be put on hold until it is necessary.
Accessing Data • GEOTRACES data specifications • Ask for user name and details when a data request is made • Provide information about the data with the data file (standard processing file or metadata form) • Link from GEOTRACES website to GDAC website • No data will be made public without appropriate approval
Attempt to bring together data managers for WG3 meeting • Initially a meeting was planned for May 2009 • Poor response, so cancelled • Why? • Lack of contact with national data centres • Too early for a data meeting?
Why a WG3 meeting is important • Centralisation of data • Version control • Communication • Metadata forms and requirements • Advisable that all GEOTRACES cruises have an on-board data manager • Greatly helps the organisation of data. IMBER has developed an online guide, which will be made available and/or adapted for GEOTRACES once the IMBER SSC approves the draft. Data Management Issues
IMBER Data Management Cook Book • Feedback from the GEOTRACES community would be useful for our own cookbook • http://planktondata.net/imber/
Likely Problems • GEOTRACES problems • Lack of cooperation from scientists and data centres • Lead nations need to identify GEOTRACES data managers • Mistrust, e.g. Australia: IPY data will not be made available until the data are published • Integrate GEOTRACES data into GDAC’s database • Version control of data: potential problem from holding GEOTRACES data in multiple locations • Measures are needed to ensure international data management works smoothly • Data managers should get together regularly to discuss progress.
Contacts and progress with IPY GEOTRACES data • The following contacts have been made • Germany: PANGAEA (DE) (Hannes.Grobe@awi.de). Contact via email. Not much data in PANGAEA (Sea-Bird bottle data and some radionuclide data for cruise ARK-XXII/2). Issues over version control exist; these will hopefully be resolved before data are transferred. • Netherlands: Taco de Bruin (Taco.de.Bruin@nioz.nl). IPY data manager, but will also deal with GEOTRACES IPY. He is new to the job, and GDAC have only just received his details (March 2009). Hein de Baar and Michiel Rutgers van der Loeff are confident they can make good progress on submitting data for the Polarstern cruises ARK-XXII/2 and ANT XXIV-3 in the next few months. • France: Marie-Paule Torre (FR) (torre@obs-vlfr.fr). Contact via email; Ed Mawji will arrange a meeting in the next few months to discuss BONUS-GOOD HOPE data. Marie-Paule Torre expects data to start arriving around May 2009. • USA: Cyndy Chandler (cchandler@whoi.edu). Very helpful, no data at present. Will probably start with metadata.
• New Zealand: Philip Boyd (Pboyd@chemistry.otago.ac.nz) will handle all NZ GEOTRACES data. Ed Mawji is in the process of setting up arrangements for GDAC to act as local data centre for the GEOTRACES process study FeCycle II (March 2009). • Australia: Andy Bowie and Edward Butler are in charge of IPY-GEOTRACES data. Butler’s group is still analysing samples. Bowie has finalised data for SAZ-SENSE, but will not transfer the data until they are published. • No contact with Norway and Sweden, but GDAC has been advised that the following people are in charge of IPY data: • Norway: Oystein Godoy (o.godoy@met.no) • Sweden: Jan Szaron (jan.szaron@smhi.se), SMHI oceanography • No contact with Japan or China.
Summary of progress and problems • Without a detailed cruise report or metadata forms, tracking data becomes difficult (no record of what parameters were measured). • Metadata forms have been developed; waiting for feedback from Reiner and Chris. • Distrust between some lead PIs and data centres, and between data centres. • Data retrieval methods have not been tested but are likely to vary between data centres and countries. No details at present. • No data sets have arrived at GDAC yet. It is not possible to set up BODC parameter codes until data with detailed methods are received. • Future progress • The most important task is mapping all GEOTRACES parameters measured on all IPY cruises
Membership of WG • Need to increase the data management membership • Please nominate an appropriate person from each nation • Dates for the first meeting will depend on increased membership
Feedback • Feedback, comments and suggestions welcome and encouraged • Names of relevant data managers for GEOTRACES data Thank You
Data management support for large, oceanographic research projects • Facilitate the interchange of data within the project community • Work up and quality assure data • Assemble project data into a single high quality coherent data set, maintaining spatial and temporal relationships • Ensure documentation of data sets via developed metadata forms • Final banking and publication of the project data set • Encourage future utilisation of data
Quality Control of GEOTRACES Data • Reformat data to BODC internal format • Check parameters, units, time zone • Visually inspect data using in-house software • Data checked for spikes, gaps, physically unreasonable values • Data compared with that previously received from a site • Adjacent stations compared to check unusual signals • Problems discussed and resolved with data suppliers • Accompanying documentation compiled • Data loaded to BODC data bank • Data made available (via the Web, or on request)
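The checks for spikes, gaps and physically unreasonable values listed above can be sketched in a few lines. This is not BODC's in-house software; the function names, thresholds and the simple neighbour-based spike test are all assumptions made for illustration.

```python
def find_gaps(times, max_step):
    """Return index pairs (i, i + 1) where consecutive sample times
    are further apart than max_step (same units as times)."""
    return [
        (i, i + 1)
        for i in range(len(times) - 1)
        if times[i + 1] - times[i] > max_step
    ]


def find_out_of_range(values, low, high):
    """Return indices of values outside the physically plausible range [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


def find_spikes(values, threshold):
    """Return indices of interior points that stand out from BOTH neighbours,
    in the same direction, by more than threshold (a deliberately simple test)."""
    spikes = []
    for i in range(1, len(values) - 1):
        d_prev = values[i] - values[i - 1]
        d_next = values[i] - values[i + 1]
        same_direction = d_prev * d_next > 0
        if same_direction and min(abs(d_prev), abs(d_next)) > threshold:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Hypothetical temperature series (degrees C) and sample times (seconds).
    temps = [10.0, 10.1, 25.0, 10.2, 10.3, -5.0]
    times = [0, 10, 20, 50, 60, 70]
    print("gaps:", find_gaps(times, max_step=15))
    print("out of range:", find_out_of_range(temps, low=-2.0, high=35.0))
    print("spikes:", find_spikes(temps, threshold=5.0))
```

Automated tests like these only nominate candidates; the slide makes clear that visual inspection and discussion with the data supplier remain part of the process.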
Mission statement of GDAC • To operate world class data management for GEOTRACES • providing data management support for all cruises • maintaining and developing a TEI database • international exchange and management of oceanographic data • making high quality TEI data readily available to research scientists in academia, government and industry
Quality control and data integration The processing stage includes five main steps: 1) Initial quality control of metadata • Reported metadata are checked against other available sources (e.g. ship’s log, scientific log, cruise report, other data centres) • Disagreements are investigated and the originator may be contacted if any doubt remains • The originator may also be contacted if there appears to be a problem with the data or if information about methodology is insufficient to start banking the data. GEOTRACES metadata forms should eliminate this problem 2) Reformatting • Attribution of BODC parameter codes • CTD and underway data are transferred to BODC internal binary format
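Step 1 above, checking reported metadata against another source, amounts to a field-by-field comparison. The sketch below assumes metadata are held as plain dictionaries; the field names and values are hypothetical and do not reflect the actual GEOTRACES metadata forms.

```python
def metadata_disagreements(reported, reference):
    """Compare two metadata records and return the fields that disagree.

    Only fields present in both records are compared. The result maps each
    disagreeing field to a (reported_value, reference_value) pair.
    """
    shared_fields = reported.keys() & reference.keys()
    return {
        field: (reported[field], reference[field])
        for field in shared_fields
        if reported[field] != reference[field]
    }


if __name__ == "__main__":
    # Hypothetical records: a PI's metadata form versus the ship's log.
    form = {"cruise": "EXAMPLE-01", "station": 12, "latitude": 53.41}
    ships_log = {"cruise": "EXAMPLE-01", "station": 12, "latitude": 53.14}
    for field, (ours, theirs) in metadata_disagreements(form, ships_log).items():
        print(f"{field}: form says {ours}, ship's log says {theirs}")
```

Each disagreement returned here would be one to investigate, and possibly to raise with the originator.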
Quality control at GDAC/BODC • Quality control • Visualisation of data – “screening” • Anomalous data points marked • Developing more complex automated systems • Metadata assembly • Oracle tables • Link data to time, position, originator, restrictions, XML documentation… • Audit • Banking • Final version to a secure location • Visible via GDAC/BODC web software / NDG
Screening of CTD and underway data • Visual screening on high speed graphics workstation using BODC’s graphics editor Serplo • Serplo enables the display of multiple parameters and rapid zooming
Screening of CTD and underway data • So a suspect value is NOT edited but a 1-byte quality control flag becomes associated with it
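The principle that a suspect value is flagged rather than edited can be illustrated with a value array and a parallel array of one-byte flags. The class name and the flag codes below are hypothetical and are not BODC's actual flag scheme.

```python
from array import array

# Hypothetical one-byte flag codes, chosen for this sketch only.
GOOD = ord("G")
SUSPECT = ord("S")


class FlaggedSeries:
    """Measured values plus one quality-control byte per value.

    Flagging never changes the stored measurement; it only updates the flag.
    """

    def __init__(self, values):
        self.values = array("d", values)              # original measurements
        self.flags = bytearray([GOOD] * len(values))  # one byte per value

    def mark_suspect(self, index):
        """Associate a 'suspect' flag with the value at index, leaving the value as is."""
        self.flags[index] = SUSPECT

    def good_values(self):
        """Return only the values whose flag is still GOOD."""
        return [v for v, f in zip(self.values, self.flags) if f == GOOD]


if __name__ == "__main__":
    series = FlaggedSeries([10.0, 99.9, 10.2])  # hypothetical readings
    series.mark_suspect(1)
    print(list(series.values))   # the original data are unchanged
    print(series.good_values())  # the flagged value is filtered out on request
```

Keeping the flag separate from the value means a later user can still see the original measurement and make their own judgement about it.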
GEOTRACES DAC • BODC is taking on responsibility for assembling and delivering GEOTRACES data • Who are BODC? • A national facility for storing and distributing data concerning the marine environment (started in 1969) • BODC deals with biological, chemical, physical and geophysical data, and our databases contain measurements of over 10,000 different variables • This includes quality control, dissemination and long-term archival • BODC is involved in international projects such as Argo, CLIVAR, WOCE, GLOSS and GEBCO, and now GEOTRACES