Three Flavors of Data

Presentation Transcript


  1. Three Flavors of Data
     • Science Data
       • Simulations and sensor readings
     • Catalog Data
       • Metadata: descriptors of datasets, data products, and other processing artifacts
     • Active Data
       • Data associated with logging, monitoring, and scheduling compute tasks

  2. Three Flavors of Data (1)
     • Science Data
       • Simulation Data: solutions to the partial differential equations governing the physics of the Columbia River Estuary
       • Sensor Data: measurements of the physical characteristics used to guide and validate simulations
     • Wanted:
       • A simple means for specifying new data products from these raw data and computing them efficiently
     • Approach:
       • A data manipulation language based on a GridField data model

  3. Three Flavors of Data (2)
     • Catalog Data
       • Explicit metadata to describe system artifacts
     • Wanted:
       • Tools to locate artifacts given descriptors (query)
       • A metadata collection facility that tolerates change
         • The metadata we wish to collect may change (e.g., new product ‘lines’ are developed)
         • The source of the metadata may change (e.g., file naming conventions or directory structures evolve)
     • Approach:
       • Generic database; custom collection scripts

  4. Three Flavors of Data (3)
     • Active Data
       • Data describing past, current, and future compute tasks
     • Wanted:
       • Tools for scheduling, monitoring, and managing...
         • individual tasks (e.g., a single data product derivation)
         • groups of interdependent tasks (e.g., a daily forecast run)
         • campaigns (e.g., a series of calibration runs followed by a re-computation of the runs of 2002 with a different implicitness)
     • Approach:
       • Undecided

  5. Simulation Data: GridFields
     • The data product suite exhibits recurring processing idioms:
       • Larger grids reduced to smaller grids. Ex: ‘estuary’ data products vs. ‘far’ data products
       • Grids mapped to other grids. Ex: 3D grid mapped to a 2D slice
       • Grids combined. Ex: 1D depth grid ‘crossed’ with a 2D horizontal grid
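
The three idioms on this slide can be illustrated with a deliberately simplified Python sketch. It treats a grid as a bare list of node coordinates and ignores cells and bound data, so it is not the GridField data model itself. The function names `restrict`, `slice_at`, and `cross` and the sample coordinates are hypothetical, chosen only for illustration.

```python
from itertools import product

def cross(horizontal, depths):
    """Combine a 2D horizontal grid with a 1D depth grid into 3D nodes."""
    return [(x, y, z) for (x, y), z in product(horizontal, depths)]

def restrict(nodes, keep):
    """Reduce a larger grid to the nodes that satisfy a predicate."""
    return [n for n in nodes if keep(n)]

def slice_at(nodes3d, depth):
    """Map a 3D grid to the 2D slice at one depth level."""
    return [(x, y) for (x, y, z) in nodes3d if z == depth]

# Hypothetical sample data: three horizontal nodes and three depth levels.
horizontal = [(0, 0), (1, 0), (0, 1)]
depths = [0, 5, 10]

grid3d = cross(horizontal, depths)                 # grids combined
subregion = restrict(grid3d, lambda n: n[0] == 0)  # larger grid reduced
surface = slice_at(grid3d, 0)                      # 3D grid mapped to 2D
```

Each function only states what the result should contain, which loosely mirrors the idea that operators prescribe semantics rather than implementation.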

  6. Simulation Data: GridFields (2)
     • We’re expressing these idioms as operators over a grid-based data model. Advantages:
       • Simpler recipes
         • 5 ops for all the data products (plus helper functions)
       • Flexible model; fewer maintenance troubles
         • N dimensions
           • uniform handling of space and time (maybe more...)
         • Any cell type
           • segments, triangles, quadrangles, arbitrary polytopes
       • Optimization opportunities
         • operators prescribe semantics, but not implementation
         • topological equivalences exposed and exploited

  7. Simulation Data: GridFields (3)
     • Status:
       • Core operators functional
       • Simple examples hooked to XMVIS for viewing
     • Todo:
       • Examples hooked to VTK
       • Write/test examples from the current product suite
       • Support GridFields too large for memory
       • Expose a nice syntax for writing recipes

  8. Catalog Data: Collection
     Where is the metadata?
     • File path and file name: /forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif
     • File content (1_salt.63): Version: 1.04, Variable: salt, ...
     • Other files?

  9. Collection scripts
     • The meta-data collection mechanism is different for each file type:
       • gifs
       • binary output
       • Param.in
     • Use one script per file type to emit the meta-data for that type of file.
     • Only these simple scripts need to change as the system evolves.

  10. Example: gif animation
      /forecasts/2003-184/.../isosal_estuary7/anim-sal_estuary_7.gif
      Metadata attached to this path:
      • CorieDate = “2003-184”
      • product line = “isoline”
      • Variable = “Salinity”
      • Region = “Estuary”
      • Depth = “7”
      • Type = “Animation”
      • Lat = xxxx
      • Long = xxxx
      Here, a script can just parse the path and file name.
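
A collection script of this kind might look like the following Python sketch (the slides describe Perl scripts; Python is used here only for illustration). It uses the full path shown on slide 8. The regular expressions, the two lookup tables, and the function name `collect_gif_metadata` are assumptions, since the slides do not spell out the naming conventions. Lat and Long are omitted because the slide leaves them unspecified.

```python
import re
from pathlib import PurePosixPath

# ASSUMPTION: hypothetical lookup tables mapping name fragments to descriptors.
PRODUCT_LINES = {"iso": "isoline"}
VARIABLES = {"sal": "Salinity"}

def collect_gif_metadata(path):
    """Derive descriptors from the path and file name alone."""
    p = PurePosixPath(path)
    meta = {}

    # The forecast directory carries the date, e.g. 2003-184.
    date = re.search(r"/forecasts/(\d{4}-\d{3})/", path)
    if date:
        meta["CorieDate"] = date.group(1)

    # ASSUMPTION: directory names look like <line><variable>_<region><depth>.
    directory = re.fullmatch(r"([a-z]{3})([a-z]+)_([a-z]+)(\d+)", p.parent.name)
    if directory:
        line, variable, region, depth = directory.groups()
        meta["product line"] = PRODUCT_LINES.get(line, line)
        meta["Variable"] = VARIABLES.get(variable, variable)
        meta["Region"] = region.capitalize()
        meta["Depth"] = depth

    # ASSUMPTION: an "anim-" prefix on a .gif marks an animation.
    if p.name.startswith("anim-") and p.suffix == ".gif":
        meta["Type"] = "Animation"
    return meta

example = "/forecasts/2003-184/run/images/isosal_estuary7/anim-sal_estuary_7.gif"
metadata = collect_gif_metadata(example)
```

If the directory layout changes, only the patterns in this one function would need updating.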

  11. Example: Binary output
      /forecasts/2003-184/run/1_salt.gif
      • Variable = “Salinity”
      • What about number of nodes? Mean sea level?
      1_salt.63
      • nodes: 55817
      • msl: 4285
      • ...
      We need to access the file’s content. This needs a different mechanism than gif animations; it might be convenient to implement it in a different script.
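
A content-based collector could be sketched as below, again in Python for illustration. The slides do not give the layout of files such as 1_salt.63, so the header format here (ASCII `key: value` lines ended by a blank line, followed by binary data) is purely an assumption, as is the function name `collect_header_metadata`.

```python
import io

def collect_header_metadata(stream, wanted=("nodes", "msl")):
    """Read 'key: value' lines from the start of a binary stream.

    ASSUMPTION: the header is ASCII text terminated by a blank line.
    The real file layout is not described in the slides.
    """
    meta = {}
    for raw in stream:
        line = raw.decode("ascii", errors="replace").strip()
        if not line:
            break  # a blank line ends the hypothetical header
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            meta[key.strip()] = value.strip()
    return meta

# Hypothetical file content: a text header followed by binary payload bytes.
sample = io.BytesIO(
    b"version: 1.04\nvariable: salt\nnodes: 55817\nmsl: 4285\n\n\x00\x01\x02"
)
header = collect_header_metadata(sample)
```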

  12. Architecture
      [Diagram: the Reflector invokes a Collection Script, which returns meta-data; the Reflector writes the meta-data to XML and to the DB.]
      • Reflector creates an XML file containing the meta-data for each file and also stores the meta-data in the database
      • Reflector determines the file type (based on regular expressions) and calls the appropriate collection script
      • Collection script uses an “AddItem” Perl function to return the meta-data to the reflector
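
The control flow described here can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual Reflector: the XML element names, the two regular expressions, the placeholder collectors, and the dictionary standing in for the database are all hypothetical. Only the four `add_item` arguments follow the AddItem(Name, Value, Notes, Type) signature given on slide 14.

```python
import re
import xml.etree.ElementTree as ET

def collect_gif(path, add_item):
    # Placeholder collector; a real one would parse the path.
    add_item("Type", "Animation", "from file extension", "string")

def collect_binary(path, add_item):
    # Placeholder collector; a real one would read the file content.
    add_item("Variable", "salt", "placeholder value", "string")

# ASSUMPTION: regular expressions that pick the collector for a file type.
COLLECTORS = [
    (re.compile(r"\.gif$"), collect_gif),
    (re.compile(r"\.63$"), collect_binary),
]

def reflect(path, store):
    """Dispatch to a collector, then emit XML and record the items."""
    items = []

    def add_item(name, value, notes="", type_="string"):
        items.append((name, value, notes, type_))

    for pattern, collector in COLLECTORS:
        if pattern.search(path):
            collector(path, add_item)
            break

    root = ET.Element("artifact", path=path)  # hypothetical XML layout
    for name, value, notes, type_ in items:
        ET.SubElement(root, "item", name=name, type=type_, notes=notes).text = value

    store[path] = items  # stands in for the database insert
    return ET.tostring(root, encoding="unicode")

db = {}
xml_text = reflect("/forecasts/2003-184/run/1_salt.63", db)
```

Passing `add_item` into each collector keeps the collectors unaware of where the metadata ends up.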

  13. Metadata in XML and DB?
      • These XML files give you filesystem-based access to the metadata for an artifact
      • Use “info” to present the XML in a readable form:
          /../run> info 1_salt.63
          variable: salt
          version: 1.04
          msl: 4285
          nodes: 55817
      • Also useful if the DB is inaccessible
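
A tool like “info” could be approximated by the short Python sketch below. The XML layout (an `artifact` element with `item` children carrying a `name` attribute) is an assumption, since the slides do not show the actual file format; only the printed values come from the slide.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: hypothetical XML layout for one artifact's metadata.
SAMPLE_XML = """<artifact path="1_salt.63">
  <item name="variable">salt</item>
  <item name="version">1.04</item>
  <item name="msl">4285</item>
  <item name="nodes">55817</item>
</artifact>"""

def info(xml_text):
    """Render each metadata item as a 'name: value' line."""
    root = ET.fromstring(xml_text)
    return "\n".join(
        f"{item.get('name')}: {item.text}" for item in root.iter("item")
    )

print(info(SAMPLE_XML))
```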

  14. Minor Technical Change
      • Previously we had suggested that the collection scripts should emit metadata on standard output
      • We have instead provided a Perl function, AddItem(Name, Value, Notes, Type)

  15. How does this help?
      • Find artifacts via descriptors (query)
        • ‘find animations showing the estuary where we used a constant bottom friction coefficient’
        • where region = “estuary” and type = “animation” and ntau = “0”
      • Write robust metadata-driven programs
        • Chris’ low-bandwidth zoom web app
        • Stay-Fresh PowerPoint slides
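
The descriptor query above can be sketched against a generic name/value table using Python's built-in `sqlite3` module. The table name `metadata`, its three columns, the sample rows, and the helper `find_artifacts` are assumptions for illustration; the slides do not describe the actual schema. The sketch also assumes each artifact has at most one value per descriptor name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ASSUMPTION: one generic table holding (artifact, name, value) rows.
conn.execute("CREATE TABLE metadata (artifact TEXT, name TEXT, value TEXT)")
rows = [  # hypothetical sample rows
    ("anim-sal_estuary_7.gif", "region", "estuary"),
    ("anim-sal_estuary_7.gif", "type", "animation"),
    ("anim-sal_estuary_7.gif", "ntau", "0"),
    ("1_salt.63", "region", "estuary"),
    ("1_salt.63", "type", "binary"),
]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)

def find_artifacts(conn, **descriptors):
    """Return artifacts whose metadata matches every given descriptor."""
    clauses = " OR ".join("(name = ? AND value = ?)" for _ in descriptors)
    params = [x for pair in descriptors.items() for x in pair]
    sql = (
        f"SELECT artifact FROM metadata WHERE {clauses} "
        "GROUP BY artifact HAVING COUNT(*) = ? ORDER BY artifact"
    )
    return [r[0] for r in conn.execute(sql, params + [len(descriptors)])]

matches = find_artifacts(conn, region="estuary", type="animation", ntau="0")
```

Because the table is generic, new descriptor names can be collected without a schema change, which fits the stated goal of tolerating change.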
