A presentation for the 4th COPS Workshop, September 25-26, 2006, Hohenheim, Germany

Overview of Atmospheric Radiation Measurements (ARM) Data Management and Archiving in NetCDF formats. A presentation for the 4th COPS Workshop, September 25-26, 2006, Hohenheim, Germany. Raymond McCord, Oak Ridge National Laboratory*, Oak Ridge, Tennessee, USA. Assisted by Dave Turner.


Presentation Transcript


  1. Overview of Atmospheric Radiation Measurements (ARM) Data Management and Archiving in NetCDF formats A presentation for the 4th COPS Workshop September 25-26, 2006 Hohenheim, Germany Raymond McCord Oak Ridge National Laboratory* Oak Ridge, Tennessee, USA Assisted by Dave Turner University of Wisconsin Madison, Wisconsin, USA *Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

  2. Overview • Data management • Objectives • Policy • ARM data and systems description • Systems overview • Data storage strategy • About Data Files and Formats • Features • Header attributes • Data structure • Access and Analytical Tools • ARM Data and Information Types • Beyond “the data file” • Where are the metadata?? • Web tour of www.arm.gov • ARM Data Access • Overview of Archive • Demo of ARM Archive user interfaces (time allowing)

  3. Quotes from Raymond • “Storing data is EASY. Finding and using data later is NOT…” • Data accessibility and usage, not storage, are the primary metrics of an Archive • “Systematically and consistently organized data does not occur without cost. Consider the results from previous science projects with no extra effort for data archiving.” • “The natural tendency over time for data and information is chaos. Effort must be exerted to overcome this.” • “Successfully managed data by projects may not be ready to be archived.” • Scientific data systems must be designed to accommodate changes (content, access, users, etc.). This is noticeably different from business systems – the origin of most of our technology.

  4. Data Management: Objectives • ARM Objectives • Create a data product that is: • Logically and structurally consistent through time • Capable of accommodating changes (scope, content, quality information, etc.) • Accessible both “now” and in the future • Develop and operate a data system that is: • Quick to develop and able to process data in a timely manner • Modular for expansion and change • Able to withstand external review (mostly scientific and quality issues) • COPS Objectives • When possible create data products “like ARM” • When possible attain the same data management objectives as ARM

  5. ARM Data Policy • Provide open data access: • To maximize exchange of data • between collaborating programs • to be available for scientific objectives • In a timely manner (known and minimal delays) • To data of “known and reasonable” quality • From routine instrument operations • With delayed and restricted access for experimental implementations • Record data usage and users • Retrospective notifications of new quality information or reprocessed data • Important for documenting “worth” of data to sponsoring organization • Required for “National User Facility” status • Provides access to operational funding beyond research programs

  6. Data Systems

  7. ARM Data Systems: Overview ARM Scientists 25 GB/Day • Geographically Dispersed • Enabled by Internet Technology • Continuous availability • Today: >2000 Different Data Streams • Availability/Quality/Meaning Southern Great Plains (Oklahoma, USA) Tropical Western Pacific ARM Archive 70 TB Data Mgt & Processing Facility (Manus, Nauru, Darwin) (ORNL) (PNNL) North Slope Of Alaska • External • Model • Satellite • GIS (Alaska) General Scientific Community (2100 users, 140 universities, 44 countries) Mobile Facility (BNL) Aerial Vehicles

  8. ARM Data Systems: Detail Very Limited User Access laptop Research / Data Quality system Research User system laptop ARM DMF continuous hourly Shared disk Site data systems External disk (shipped) hourly daily hourly Data logger ARM Archive As needed

  9. ARM Data Storage Strategy • ARM data are stored in Data Streams • A “data stream” is a series of files (daily) that have similar contents and structure. • Files can be concatenated across time if needed. • Daily files are created as a convenience for processing, review, transfer, and distribution. • The same instruments at different locations create files with the same data stream structure. • Automated QC flags are contained within the data files.
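The concatenation property of a data stream can be sketched in Python. This is a minimal illustration under stated assumptions, not ARM software: each daily file is modeled as a plain dictionary mapping field names to lists of values, and `concatenate_daily` is a hypothetical helper name.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for one daily file: field name -> values.
DailyFile = Dict[str, List[float]]


def concatenate_daily(files: List[DailyFile]) -> DailyFile:
    """Join daily files of one data stream end to end, in the order given."""
    if not files:
        return {}
    fields = list(files[0].keys())
    combined: DailyFile = {name: [] for name in fields}
    for daily in files:
        # Files in a data stream share the same structure; refuse anything else.
        if set(daily.keys()) != set(fields):
            raise ValueError("file structure differs from the rest of the data stream")
        for name in fields:
            combined[name].extend(daily[name])
    return combined


# Two made-up daily files with identical structure.
day1 = {"time": [0.0, 60.0], "temp": [281.2, 281.0]}
day2 = {"time": [86400.0, 86460.0], "temp": [279.9, 279.7]}
stream = concatenate_daily([day1, day2])
print(stream["time"])
```

The structure check is the point of the sketch: concatenation across time only works because every file in a data stream has the same fields.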

  10. About Data Files and Formats

  11. NetCDF File “features” • Processed ARM data files are stored in NetCDF format • Self-contained data documentation • Header block • Data arrays • Non-proprietary format (open source) • Efficient binary format • Directly accessible by application software (IDL, MATLAB) • Libraries available for data creation and access from your own software • available for Fortran, C, C++, Perl, Java • http://www.unidata.ucar.edu/software/netcdf/index.html

  12. NetCDF File Structure (Header) • File-specific information • creation time, dimension values for arrays • Data definition attributes • Data field names (varname) • Data field description (longname) • Data limits • min, max – optional • Measurement info • units, resolution, missing value code, etc. – optional?? • Global values (attributes) • Descriptive information that is valid for a portion of the data stream • Location name, reference for retrieval algorithm, long term calibration information, contact information, etc.
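The header layout described on this slide can be pictured as a nested mapping. The sketch below is only an illustration in plain Python, not a real NetCDF reader: the dictionary contents are invented, the attribute names `long_name`, `units`, and `missing_value` follow common NetCDF usage rather than anything stated on the slide, and `describe_field` and `is_missing` are hypothetical helpers. It shows why optional attributes need a fallback.

```python
from typing import Any, Dict

# Hypothetical header contents; all values are made up for illustration.
header: Dict[str, Any] = {
    "dimensions": {"time": 1440},
    "global_attributes": {"location": "example site", "contact": "example contact"},
    "variables": {
        "temp": {"long_name": "Air temperature", "units": "K", "missing_value": -9999.0},
        "rh": {"long_name": "Relative humidity"},  # units deliberately omitted
    },
}


def describe_field(header: Dict[str, Any], varname: str) -> str:
    """Build a one-line description, tolerating optional attributes."""
    attrs = header["variables"][varname]
    long_name = attrs.get("long_name", varname)
    units = attrs.get("units", "unknown units")
    return f"{varname}: {long_name} [{units}]"


def is_missing(header: Dict[str, Any], varname: str, value: float) -> bool:
    """True if the value equals the field's missing value code, when one is defined."""
    code = header["variables"][varname].get("missing_value")
    return code is not None and value == code


print(describe_field(header, "temp"))
print(describe_field(header, "rh"))
```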

  13. Examples of ARM Header Information Online Demo Link Here

  14. NetCDF File Structure (Data) • Data are stored in “array” records after the header. • ARM data are “dimensioned” by Time and sometimes Height • Time recording is very important. • ARM uses base time + time offset and composite time • Multi dimensional arrays are possible, but rare. • Data fields are stored in the same order as defined in the header. • Data are accessible by “array number” • Avoid using this!!! • Single and multiple dimension data arrays can occur in any order within a data stream.
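The "base time + time offset" scheme can be sketched as follows. The slide does not spell out the units, so this assumes that the base time is a count of seconds since 1970-01-01 00:00:00 UTC and that each time offset is a count of seconds after the base time; `sample_times` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Sequence


def sample_times(base_time: float, time_offset: Sequence[float]) -> List[datetime]:
    """Convert base time plus per-sample offsets into UTC datetimes.

    Assumes base_time is seconds since the Unix epoch (UTC) and each
    offset is seconds elapsed since base_time.
    """
    start = datetime.fromtimestamp(base_time, tz=timezone.utc)
    return [start + timedelta(seconds=offset) for offset in time_offset]


# Made-up example: one-minute samples starting at midnight UTC.
base_time = datetime(2006, 9, 25, tzinfo=timezone.utc).timestamp()
time_offset = [0.0, 60.0, 120.0]
times = sample_times(base_time, time_offset)
print(times[-1].isoformat())
```

Storing one absolute reference per file and small offsets per sample keeps the per-sample values short and makes daily files easy to concatenate once the offsets are converted to absolute times.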

  15. NetCDF Data Access and Analysis • Applications using NetCDF can: • Access data by filename / data field name • Concatenate similar files (e.g., from a time series) • Merge values based on similar dimension values • Links to NetCDF tools can be found at: • http://www.arm.gov/data/tools.stm
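Merging values on a shared dimension can be sketched in a few lines of Python. This is a simplified illustration with invented data: `merge_on_time` is a hypothetical helper, and it pairs samples only where the time values match exactly, whereas real tools may also match "similar" values within a tolerance.

```python
from typing import List, Sequence, Tuple


def merge_on_time(
    times_a: Sequence[float],
    values_a: Sequence[float],
    times_b: Sequence[float],
    values_b: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Pair up samples from two fields wherever their time values coincide."""
    lookup_b = dict(zip(times_b, values_b))
    return [
        (t, a, lookup_b[t])
        for t, a in zip(times_a, values_a)
        if t in lookup_b
    ]


# Made-up series that overlap at t = 60 and t = 120.
merged = merge_on_time(
    [0.0, 60.0, 120.0], [1.0, 2.0, 3.0],
    [60.0, 120.0, 180.0], [10.0, 20.0, 30.0],
)
print(merged)
```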

  16. ARM Data/Information Structure Going to a “higher” view!!

  17. ARM Data Types - overview • Continuous data (stored offline, accessible by requests from user interface) • ARM collected data • Value added products • External data • Special data (stored online, accessible from web interface) • Field Campaign (IOP) data • Beta data • PI generated data products

  18. ARM Data Types – more detail armarchive@ornl.gov 1-888-ARM-DATA • ARM collected data • RAW data files • Available upon request, but not accessible from User Interface • Minimal documentation; user beware • Wide variety of formats; many are binary • Processed data files • Accessible from user interfaces • Common formats include NetCDF and HDF • Value added products (VAPs) • Include one or more of the following • Advanced algorithms • Multiple data inputs • Input from long-time periods • ARM produces some VAPs to improve the quality of existing measurements. In addition, when more than one measurement is available, ARM also produces "best estimate" VAPs.

  19. Types of Quality Information • Automated products • QC flags • inserted in data files during processing • QA flags • Summaries of flags (data color) • Manual products • Data Quality Reports (DQRs) • web accessible reports • delivered as html files after data requests • event driven and problem-based • Mentor Instrument Reports • web accessible (http://www.db.arm.gov/IMMS/ ) • Also linked to instrument web pages.
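How automated QC flags stored alongside the data might be used can be sketched as follows. The flag encoding here is an assumption made for illustration (0 means the sample passed all automated tests, any other value means at least one test failed); `apply_qc` and `fraction_good` are hypothetical helper names, and the data are invented.

```python
from typing import List, Optional, Sequence


def apply_qc(values: Sequence[float], qc_flags: Sequence[int]) -> List[Optional[float]]:
    """Replace every sample whose QC flag is nonzero with None."""
    if len(values) != len(qc_flags):
        raise ValueError("values and qc_flags must have the same length")
    return [v if flag == 0 else None for v, flag in zip(values, qc_flags)]


def fraction_good(qc_flags: Sequence[int]) -> float:
    """Summarize the flags as the fraction of samples that passed."""
    if not qc_flags:
        return 0.0
    return sum(1 for flag in qc_flags if flag == 0) / len(qc_flags)


temps = [281.2, -50.0, 280.9, 280.7]
flags = [0, 1, 0, 2]
print(apply_qc(temps, flags))
print(fraction_good(flags))
```

A per-file summary such as `fraction_good` is one simple way to picture the "summaries of flags" mentioned on the slide.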

  20. Beyond the Data File!! • Overview of Information Structure • “Patience… Please… getting ready for a Web Tour” • You will benefit from our “logic”. • You will need our “content”. • We will need to know your “content”. • Your structural “logic” will also be helpful to us. • A “sneak attack” on Metadata Issues

  21. ARM Information Structure Sites “Instruments” Data streams Measurements Location, etc Categories + metadata VAPs Data stream “Family” metadata Documentation + Categories + Metadata ???? Guest

  22. Tour of www.arm.gov Instruments Data streams Measurements What do you see now??

  23. Data Access (user interfaces) How many doors are enough??

  24. Accessing Data from the Archive • User interface options • Overall scheme of user interfaces • Logical view of interfaces • More details and demo (time allowing) • ARM Data Browser • Web Shopping Cart • Catalog Interface • Thumbnail Browser • IOP Data Browser • Contact Us….. • 1-888-ARM-DATA, armarchive@ornl.gov • Continuous data distribution • “Standing Orders”

  25. You are NOT alone... Request Statistics From Archive • 3 sites • 10’s facilities • 100’s data sources • 100’s data users • 1000’s measurement types • 1,000,000’s data files • 1,000,000,000’s measurements • 10,000,000,000,000’s bytes Archive Data Flow

  26. Comparison of User Interface Options

  27. Overall Interface Scheme • Identify “data of interest” (answer questions) • Display summary results from search (# files, # DQRs, # QLs) • Display detailed information (file list, DQRs, color map, QLs) • Order files

  28. You and the Archive (Simplified view) FTP host Requested files User copy (FTP) query specifications E-mail notification query results Mass Storage System Database File Retrieval Processor File list and tracking Archive web-based User Interface End Start

  29. User Interface “Demo” use presentation Go to web interface

  30. Display Thumbnails

  31. Thumbnail Browser – Catalog Interface Thumbnail Page

  32. IOP Data Browser – IOP View Click for access to more data sub-directories

  33. IOP Data Browser – Data Selection

  34. Standing Order Processing FTP host ftp.so.archive.arm.gov Email specifications to Archive Delivery Directories User copy (FTP) E-mail notification Order specifications Notification Processor Data base Temporary copy New File Processor New Data files

  35. Questions? Comments?

  36. Detailed Reference Slides

  37. Data access policy “goals” (1) • Data exchange between ARM and COPS as open and complete as needed • (more comments on next slide) • Provide online documentation about • Measurement technology • Installation and site information • Data structure • Basic QA review methods and results • Generate data products in a “timely” manner • Predictable schedule for generation and access • Retain complete and comprehensive records of data inventory, usage, and users. • In a searchable database • Distribute to data users updated information for data quality and data revisions (reprocessing) as needed

  38. Data access policy “goals” (2) • Assume that fully open access has the best potential for overall scientific output • No cost for data exchange and access • Protect “rights” of data generators • Provide initial opportunity for publication and evaluation • Especially for data from “new” instruments. • Offer co-authorship or acknowledgement to instrument PI’s. • Prevent premature access of data • Very early access only as needed for operational planning (forecasting) • Before initial QC evaluation is complete • Recipients of data have unrestricted use. • Within an “access group” all requestors have equal access • No favorites between groups (??) • Data file format (netCDF) and structure will match ARM when possible (??)

  39. ARM Archive Systems users users ARM Web documentation User interface IOP Data system Metadata Database DMF system Retrieval processing daily External Data system Archive Storage Processing users Standing Orders External disk (shipped) FTP host As needed Mass Storage System Radar Spectral data

  40. Logical Structure of ARM Metadata Daily files nim1metM1.b1 1met Daily files MET zcc1metM1.b1 30met Daily files nim30metM1.b1 SKYRAD skyrad20s zcc30metM1.b1 Daily Files Storage processing skyrad60s Instrument Class description Inventories of Stored and Retrieved files Instrument Code description Data stream Measurement metadata Site / facility list Web Info Meas type Date range User Interface
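The data stream names on this slide (for example `nim1metM1.b1`) appear to combine several pieces of metadata. The sketch below splits such a name into parts. The decomposition is an assumption, not something the slide states: a three-letter site code, a data stream class such as `1met`, a facility code such as `M1`, and a data level such as `b1`. `DataStream` and `parse_datastream` are hypothetical names.

```python
import re
from typing import NamedTuple


class DataStream(NamedTuple):
    site: str          # assumed: three lowercase letters, e.g. "nim"
    stream_class: str  # assumed: lowercase letters and digits, e.g. "1met"
    facility: str      # assumed: one capital letter plus digits, e.g. "M1"
    level: str         # assumed: data level after the dot, e.g. "b1"


_PATTERN = re.compile(r"([a-z]{3})([a-z0-9]+)([A-Z][0-9]+)\.([a-z0-9]+)")


def parse_datastream(name: str) -> DataStream:
    """Split a full data stream name into its assumed components."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a full data stream name: {name!r}")
    return DataStream(*match.groups())


for name in ["nim1metM1.b1", "zcc30metM1.b1"]:
    print(parse_datastream(name))
```

Names on the slide that lack a site, facility, and level (such as `skyrad60s`) are rejected by this sketch, since it only handles full names.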
