Strategies for Adding EML Support to the GCE Data Toolbox for Matlab
Wade Sheldon
Georgia Coastal Ecosystems LTER (WWW: gce-lter.marsci.uga.edu/lter)
Background
• Needed universal solution for processing tabular data sets (majority of IM work)
• Goals:
  • Import from various data sources
  • Standardize units, date formats, attribute names
  • Assign metadata descriptors
  • Validate/QAQC
  • Generate statistical summaries, plots, maps
  • Export to various data/metadata formats
  • Support sub-setting & queries, super-setting (unions/joins)
  • Support automation of all steps
  • Automatically capture metadata throughout interactive processing
Background
• Developed Matlab data structure specification for storing data table tightly coupled with metadata
• Developed ‘Toolbox’ (function library) for working with data structures
• Many roles in GCE IS:
  • Primary tool for acquisition, QAQC of data from monitoring network, PI submissions
  • Data/metadata packaging (linked to RDMS)
  • Data distribution (flexible formats)
  • New Role: Automated harvesting/processing/QC/web posting of remote data stores (USGS, NOAA) and post-processing of CSI arrays downloaded via modem
• Began public distribution of toolbox in 2002 (primarily for end-user analysis of GCE data)
Toolbox Metadata Standard
• Full implementation of FLED (+ user-extensible content)
• Attribute-level metadata managed with data
• General documentation descriptors stored in simple array format (Category, Field, Value); designed for pre-formatted metadata, but parseable/updateable
• Simple user-editable style definition tables used to produce formatted ASCII metadata
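The (Category, Field, Value) descriptor array described above can be sketched as follows. This is only an illustration written in Python rather than Matlab, not toolbox code: the function names (`get_value`, `set_value`, `render_ascii`), the sample categories and fields, and the fixed output layout are all hypothetical, and the real toolbox drives formatting from its user-editable style tables instead.

```python
from typing import List, Optional, Tuple

# One metadata row: (category, field, value). All names here are hypothetical.
Row = Tuple[str, str, str]


def get_value(meta: List[Row], category: str, field: str) -> Optional[str]:
    """Return the value stored for a category/field pair, or None if absent."""
    for cat, fld, val in meta:
        if cat == category and fld == field:
            return val
    return None


def set_value(meta: List[Row], category: str, field: str, value: str) -> List[Row]:
    """Return a new array with the pair updated in place, or appended if new."""
    updated: List[Row] = []
    found = False
    for cat, fld, val in meta:
        if cat == category and fld == field:
            updated.append((cat, fld, value))
            found = True
        else:
            updated.append((cat, fld, val))
    if not found:
        updated.append((category, field, value))
    return updated


def render_ascii(meta: List[Row], indent: str = "   ") -> str:
    """Format rows as plain text: a category heading, then indented fields."""
    lines: List[str] = []
    current = None
    for cat, fld, val in meta:
        if cat != current:
            lines.append(cat.upper())
            current = cat
        lines.append(f"{indent}{fld}: {val}")
    return "\n".join(lines)


# Made-up sample content, for illustration only.
example_meta: List[Row] = [
    ("Dataset", "Title", "Example data set"),
    ("Dataset", "Abstract", "Placeholder abstract"),
    ("Data", "Columns", "Date, Salinity"),
]

if __name__ == "__main__":
    print(render_ascii(set_value(example_meta, "Dataset", "Abstract", "Revised abstract")))
```

Keeping the descriptors as flat rows is what makes them both easy to print as pre-formatted text and easy to look up or update programmatically.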
EML Differences
• Higher granularity
• Hierarchical structure (vs. the toolbox's flatter 3-tier structure)
• Different delineation of semantic/numerical attribute descriptors (much overlap, but different philosophy)
• New unit dictionary requirements for validation run contrary to existing units/unit conversion conventions (at odds with the non-IM end-user focus of the toolbox)
• XML-based (requires extra steps for presentation)
Strategy
• Short term: develop XSLT to convert EML (primarily dataset, entity, attribute) to ASCII headers for importing metadata along with data
• Medium term: switch to an EML-oriented metadata schema (e.g. use similar arrays, but support direct EML schema mapping by using XPath syntax for category/field info)
• Long term: add support for direct caching of EML docs, and include native XML routines for syncing metadata during processing (requires that more users adopt the latest Matlab version, R13)
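The short- and medium-term ideas above can be sketched together: path expressions locate values in an EML document, and the results are flattened into (Category, Field, Value) rows and ASCII header lines. This is a rough Python illustration under stated assumptions, not the planned implementation. It uses plain `xml.etree.ElementTree` traversal in place of XSLT, the sample document is a simplified fragment that is not schema-valid EML, and the mapping table, function names, and header layout are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified EML-like fragment for illustration only (not schema-valid EML).
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.0.0"
    packageId="example.1.1" system="example">
  <dataset>
    <title>Example salinity data set</title>
    <dataTable>
      <entityName>salinity_table</entityName>
      <attributeList>
        <attribute>
          <attributeName>Date</attributeName>
          <attributeDefinition>Observation date</attributeDefinition>
        </attribute>
        <attribute>
          <attributeName>Salinity</attributeName>
          <attributeDefinition>Water salinity</attributeDefinition>
        </attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml:eml>"""

# Hypothetical mapping: (category, field) keyed to a path below the EML root.
FIELD_MAP = [
    ("Dataset", "Title", "dataset/title"),
    ("Data", "Entity", "dataset/dataTable/entityName"),
]

ATTRIBUTE_PATH = "dataset/dataTable/attributeList/attribute"


def eml_to_descriptors(xml_text):
    """Flatten selected EML elements into (category, field, value) rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for category, field, path in FIELD_MAP:
        # findtext returns the default when the path matches nothing
        value = root.findtext(path, default="")
        rows.append((category, field, value.strip()))
    names = [
        attr.findtext("attributeName", default="").strip()
        for attr in root.iterfind(ATTRIBUTE_PATH)
    ]
    rows.append(("Data", "Columns", ", ".join(names)))
    return rows


def format_header(rows):
    """Render rows as 'Category_Field: value' header lines (layout assumed)."""
    return "\n".join(f"{cat}_{fld}: {val}" for cat, fld, val in rows)


if __name__ == "__main__":
    print(format_header(eml_to_descriptors(SAMPLE_EML)))
```

Storing the path itself as the category/field key, as the medium-term bullet suggests, would let the same table drive both import from and export to EML.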
Significance
• Allow the IM community to take full advantage of these tools/capabilities for their own sites' data with minimal re-mastering (EML + ASCII/Matlab table)
• Allow the LTER IM community to showcase research-oriented, metadata-driven tools to bolster support for EML efforts immediately
• If full EML support is achieved, the toolbox could become a useful mechanism for automatically producing EML-documented/validated data sets (datalogging -> harvest -> process -> QC -> dataset+EML -> validation)