Unidata’s Common Data Model

Unidata’s Common Data Model. John Caron, Unidata/UCAR, Nov 2006. Goals / Overview: look at the landscape of scientific datasets from a few thousand feet up. What semantics are needed to make these useful? Georeferencing and specialized subsetting.

Presentation Transcript


  1. Unidata’s Common Data Model John Caron Unidata/UCAR Nov 2006

  2. Goals / Overview • Look at the landscape of scientific datasets from a few thousand feet up. • What semantics are needed to make these useful? • georeferencing • specialized subsetting

  3. What’s a Data Model? • An Abstract Data Model describes data objects and the methods you can use on them. • An API is the interface to the Data Model for a specific programming language. • A file format is a way to persist the objects in the Data Model. • An Abstract Data Model removes the details of any particular API and persistence format.

  4. Common Data Model Layers • Scientific Datatypes: Point, Trajectory, Station, Profile, Radial, Grid, Swath • Coordinate Systems • Data Access

  5. NetCDF-Java version 2.2 architecture (diagram labels): THREDDS Catalog.xml, Application, Scientific Datatypes, Datatype Adapter, NetcdfDataset, CoordSystem Builder, ADDE, NetcdfFile, I/O service provider, OPeNDAP, NetCDF-3, NIDS, NcML, NetCDF-4, GRIB, HDF5, GINI, Nexrad, DMSP, …

  6. NetCDF-4 and Common Data Model (Data Access Layer)

  7. I/O Service Provider Implementations • General: NetCDF, HDF5, OPeNDAP • Gridded: GRIB-1, GRIB-2 • Radar: NEXRAD level 2 and 3, DORADE • Point: BUFR, ASCII • Satellite: DMSP, GINI • In development • NOAA: GOES (Knapp/Nelson), many others

  8. Coordinate Systems needed • NetCDF, OPeNDAP, and HDF data models do not have integrated coordinate systems • so georeferencing is not part of the API • Need conventions to specify them (e.g. CF-1, COARDS) • Contrast GRIB, HDF-EOS, and other specialized formats

  9. NetCDF Coordinate Variables dimensions: lat = 64; lon = 128; variables: float lat(lat); float lon(lon); double temperature(lat,lon);

  10. Coordinate Variables • One-dimensional variable with the same name as its dimension • Strictly monotonic values • No missing values • The coordinates of a point (i,j,k) are {CV1(i), CV2(j), CV3(k)}
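The three criteria on the slide above can be checked mechanically. The following is a minimal sketch in plain Python, not NetCDF-Java code; the function names `is_coordinate_variable` and `point_coordinates` and the sample `lat`/`lon` values are hypothetical and chosen only for illustration.

```python
import math


def is_coordinate_variable(name, dims, values):
    """Return True if a variable meets the three slide criteria.

    name   -- variable name, e.g. "lat"
    dims   -- tuple of dimension names the variable is defined on
    values -- list of floats (None or NaN stands for a missing value)
    """
    # 1. One-dimensional, with the same name as its dimension.
    if dims != (name,):
        return False
    # 2. No missing values.
    if any(v is None or math.isnan(v) for v in values):
        return False
    # 3. Strictly monotonic (increasing or decreasing).
    increasing = all(a < b for a, b in zip(values, values[1:]))
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    return increasing or decreasing


def point_coordinates(axes, index):
    """Coordinates of grid point `index`: (CV1(i), CV2(j), ...)."""
    return tuple(axis[i] for axis, i in zip(axes, index))


# Hypothetical sample axes.
lat = [-45.0, 0.0, 45.0]
lon = [0.0, 90.0, 180.0, 270.0]
```

With these axes, `point_coordinates((lat, lon), (2, 1))` looks up one value per axis, which is exactly the {CV1(i), CV2(j)} rule for 1D coordinate variables.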

  11. Limitations of 1D Coordinate Variables • Non lat/lon horizontal grids: float temperature(y,x); float lat(y,x); float lon(y,x); • Trajectory data: float NKoreaRadioactivity(pt); float lat(pt); float lon(pt); float altitude(pt); float time(pt);

  12. General Coordinates in CF-1.0 float P(y,x); P:coordinates = "lat lon"; float lat(y,x); float lon(y,x); float Sr90(pt); Sr90:coordinates = "lat lon altitude time";

  13. Coordinate Systems (abstract) • A Coordinate System for a data variable is a set of Coordinate Variables such that the coordinates of the (i,j,k) data point are {CV1(i,j,k), CV2(i,j,k), CV3(i,j,k), CV4(i,j,k), …}; previously they were {CV1(i), CV2(j), CV3(k)} • The dimensions of each Coordinate Variable must be a subset of the dimensions of the data variable.
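The generalized definition can be illustrated with a small sketch. It assumes coordinate variables are stored as nested Python lists keyed by name; the helper names `is_valid_coordinate_system` and `coordinates_at` and the sample values are hypothetical, not part of any library.

```python
def is_valid_coordinate_system(data_dims, coord_dims_by_name):
    """Each coordinate variable's dimensions must be a subset of the
    data variable's dimensions."""
    data = set(data_dims)
    return all(set(dims) <= data for dims in coord_dims_by_name.values())


def coordinates_at(data_dims, coords, index):
    """Look up every coordinate of one data point.

    data_dims -- dimension names of the data variable, e.g. ("y", "x")
    coords    -- name -> (dims, nested list of values)
    index     -- one integer per data dimension, e.g. (1, 0)
    """
    position = dict(zip(data_dims, index))
    result = {}
    for name, (dims, values) in coords.items():
        value = values
        # Index only along the dimensions this coordinate actually uses.
        for dim in dims:
            value = value[position[dim]]
        result[name] = value
    return result


# Hypothetical 2x2 curvilinear grid: lat and lon both depend on (y, x).
coords = {
    "lat": (("y", "x"), [[40.0, 40.1], [41.0, 41.1]]),
    "lon": (("y", "x"), [[-105.0, -104.0], [-105.2, -104.2]]),
}
point = coordinates_at(("y", "x"), coords, (1, 0))
```

A 1D coordinate variable is the special case where `dims` has a single entry, so the same lookup covers both the old and the revised definition.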

  14. Need Coordinate Axis Types float gridData(t,z,y,x); float time(t); float y(y); float x(x); float lat(y,x); float lon(y,x); float height(t,z,y,x); float radialData(radial, gate) float distance(gate) float azimuth(radial) float elevation(radial) float time(radial)

  15. The same?? float stationObs(pt); float lat(pt); float lon(pt); float z(pt); float time(pt); float trajectory(pt); float lat(pt); float lon(pt); float z(pt); float time(pt);

  16. Revised Coordinate Systems • Specify Coordinate Variables • Specify Coordinate Types (time, lat, lon, projection x, y, height, pressure, z, radial, azimuth, elevation) • Specify connectivity (implicit or explicit) between data points • Implicit: Neighbors in index space are (connected) neighbors in coordinate space. Allows efficient searching.

  17. Gridded Data • Cartesian coordinates • All dimensions are connected • Connected means neighbors in index space are neighbors in coordinate space float gridData(t,z,y,x); float time(t); // Time float y(y); // GeoY float x(x); // GeoX float z(t,z,y,x); // Height or Pressure

  18. Coordinate Systems UML

  19. Scientific Data Types • Based on datasets Unidata is familiar with • APIs are evolving • How are data points connected? • Intended to scale to large, multifile collections • Intended to support “specialized queries” • Space, Time • Corresponding “standard” NetCDF file conventions

  20. Gridded Data • Cartesian coordinates • All dimensions are connected • x, y, z, time • recently added runtime and ensemble • refactored into GridDatatype interface float gridData(t,z,y,x); float time(t); float y(y); float x(x); float lat(y,x); float lon(y,x); float height(t,z,y,x);

  21. GridDatatype methods CoordinateAxis getTaxis(); CoordinateAxis getXaxis(); CoordinateAxis getYaxis(); CoordinateAxis getZaxis(); Projection getProjection(); int[] findXYindexFromCoord( double x_coord, double y_coord); LatLonRect getLatLonBoundingBox(); Array getDataSlice (Range[] …) GridDatatype makeSubset (Range[] …)
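Because grid dimensions are connected and 1D axes are strictly monotonic, a coordinate-to-index lookup such as `findXYindexFromCoord` can be done with a binary search per axis. The sketch below is an assumption about how such a lookup could work, not the NetCDF-Java implementation: it assumes strictly increasing axes, returns the nearest grid point, and uses -1 for coordinates outside the axis range.

```python
from bisect import bisect_left


def find_index(axis, value):
    """Nearest index on a strictly increasing 1D axis, or -1 if outside."""
    if value < axis[0] or value > axis[-1]:
        return -1
    i = bisect_left(axis, value)  # first position with axis[i] >= value
    if i == 0:
        return 0
    # Pick whichever neighbor is closer to the requested value.
    return i if axis[i] - value < value - axis[i - 1] else i - 1


def find_xy_index_from_coord(x_axis, y_axis, x_coord, y_coord):
    """Return (x_index, y_index) for a projection coordinate pair."""
    return (find_index(x_axis, x_coord), find_index(y_axis, y_coord))


# Hypothetical projection axes.
x_axis = [0.0, 10.0, 20.0, 30.0]
y_axis = [100.0, 200.0, 300.0]
xy = find_xy_index_from_coord(x_axis, y_axis, 12.0, 290.0)
```

Each lookup costs O(log n) per axis, which is the kind of efficient searching that implicit connectivity makes possible; unconnected point data would need a full scan or an external index.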

  22. Radial Data • Polar coordinates • All dimensions are connected • No separate time dimension radialData(radial, gate): distance(gate), azimuth(radial), elevation(radial), time(radial)

  23. Swath • lat/lon coordinates • No separate time dimension • All dimensions are connected swathData(line,cell): lat(line,cell), lon(line,cell), time(line), z(line,cell) ??

  24. Point Observation Data • Set of measurements at the same point in space and time • Point dimension not connected float obs1(pt); float obs2(pt); float lat(pt); float lon(pt); float z(pt); float time(pt); Structure { lat, lon, z, time; v1, v2, ... } obs( pt);

  25. PointObsDataset Methods // Iterator<StructureData> Iterator getData( LatLonRect boundingBox, Date start, Date end);
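A space/time query like `getData(boundingBox, start, end)` amounts to filtering unconnected points. The following sketch assumes in-memory records and a simple bounding box that does not cross the date line; the `LatLonRect` and `PointObs` classes here are hypothetical stand-ins, not the NetCDF-Java types of the same name.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LatLonRect:
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat, lon):
        # Assumption: the box does not wrap around the date line.
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


@dataclass
class PointObs:
    lat: float
    lon: float
    z: float
    time: datetime
    values: dict


def get_data(observations, bounding_box, start, end):
    """Yield observations inside the box and the [start, end] interval."""
    for obs in observations:
        if bounding_box.contains(obs.lat, obs.lon) and start <= obs.time <= end:
            yield obs


# Hypothetical sample observations.
observations = [
    PointObs(40.0, -105.0, 1600.0, datetime(2006, 11, 1, 0), {"v1": 1.0}),
    PointObs(41.0, -104.0, 1500.0, datetime(2006, 11, 1, 12), {"v1": 2.0}),
    PointObs(10.0, 20.0, 0.0, datetime(2006, 11, 1, 6), {"v1": 3.0}),
]
box = LatLonRect(35.0, 45.0, -110.0, -100.0)
hits = list(get_data(observations, box,
                     datetime(2006, 11, 1, 0), datetime(2006, 11, 1, 6)))
```

Returning an iterator rather than a list mirrors the slide's signature and lets a caller stream through collections that do not fit in memory.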

  26. Time series Station Data Structure { name; lat, lon, z; Structure{ time; v1, v2, ... } obs(*); // connected } stn(stn); // not connected

  27. StationObs Methods // List<Station> List getStations( LatLonRect boundingBox); // Iterator<StructureData> Iterator getData( Station s, Date start, Date end);
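The two-step station query can be sketched as follows. Stations are not connected, so selecting them by bounding box is a scan; each station's time series is connected, so a time range can be found by binary search. The `Station` class and function names are hypothetical, and the sketch assumes each station's observations are stored sorted by time.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Station:
    name: str
    lat: float
    lon: float
    z: float
    times: list = field(default_factory=list)   # sorted datetimes
    values: list = field(default_factory=list)  # one record per time


def get_stations(stations, lat_min, lat_max, lon_min, lon_max):
    """Stations inside the box (linear scan: stations are not connected)."""
    return [s for s in stations
            if lat_min <= s.lat <= lat_max and lon_min <= s.lon <= lon_max]


def get_data(station, start, end):
    """Observations with start <= time <= end (binary search on time)."""
    lo = bisect_left(station.times, start)
    hi = bisect_right(station.times, end)
    return list(zip(station.times[lo:hi], station.values[lo:hi]))


# Hypothetical sample stations.
hours = [datetime(2006, 11, 1, h) for h in (0, 6, 12, 18)]
stations = [
    Station("DEN", 39.8, -104.7, 1655.0, hours, [1.0, 2.0, 3.0, 4.0]),
    Station("LAX", 33.9, -118.4, 38.0, hours, [5.0, 6.0, 7.0, 8.0]),
]
selected = get_stations(stations, 35.0, 45.0, -110.0, -100.0)
series = get_data(selected[0], datetime(2006, 11, 1, 6), datetime(2006, 11, 1, 12))
```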

  28. Trajectory Data • pt dimension is connected • Collection dimension not connected Structure { lat, lon, z, time; v1, v2, ... } obs(pt); // connected Structure { name; Structure { lat, lon, z, time; v1, v2, ... } obs(*); // connected } traj(traj) // not connected

  29. Profiler/Sounding Station Data Structure { name; lat, lon, time; Structure { z; v1, v2, ... } obs(*); // connected } loc(nloc); // not connected Structure { name; lat, lon; Structure { time, Structure { z; v1, v2, ... } obs(*); // connected } time(*); // connected } stn(stn); // not connected

  30. Unstructured Grid • Pt dimension not connected • Looks the same as point data • Need to specify the connectivity explicitly float unstructGrid(t,z,pt); float lat(pt); float lon(pt); float time(t); float height(z);

  31. Data Types Summary • Data access through a standard API • Convenient georeferencing • Specialized subsetting methods • Efficiency for large datasets

  32. CDM Payoff • N + M instead of N * M things on your TODO list! • Inputs: File Format #1, File Format #2, …, File Format #N • Outputs: Visualization & Analysis, NetCDF file, OPeNDAP Server, WCS Service, Web Service
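The N + M argument can be made concrete with a toy example. Everything here is hypothetical: two made-up text formats stand in for the N file formats, two made-up consumers stand in for the M services, and a plain dict stands in for the common data model.

```python
def read_comma(text):
    """Reader for a made-up comma-separated format."""
    return {"temperature": [float(x) for x in text.split(",")]}


def read_space(text):
    """Reader for a made-up whitespace-separated format."""
    return {"temperature": [float(x) for x in text.split()]}


def summarize(dataset):
    """Consumer 1: mean of the variable."""
    values = dataset["temperature"]
    return sum(values) / len(values)


def list_values(dataset):
    """Consumer 2: human-readable listing."""
    return ", ".join(f"{v:.1f}" for v in dataset["temperature"])


# N readers map each format INTO the common model;
# M consumers work only FROM the common model.
READERS = {"comma": read_comma, "space": read_space}
CONSUMERS = {"summary": summarize, "listing": list_values}


def serve(fmt, text, consumer):
    """Any format can reach any consumer through the common model."""
    return CONSUMERS[consumer](READERS[fmt](text))


pieces_written = len(READERS) + len(CONSUMERS)   # N + M
paths_supported = len(READERS) * len(CONSUMERS)  # N * M
```

Adding a new format means writing one more reader, after which every existing consumer works with it unchanged.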

  33. THREDDS Data Server HTTP Tomcat Server Catalog.xml Application THREDDS Server • OPeNDAP • HTTPServer • WCS NetCDF-Java library hostname.edu Datasets IDD Data

  34. Next: DataType Aggregation • Work at the CDM DataType level, know (some) data semantics • Forecast Model Collection • Combine multiple model forecasts into single dataset with two time dimensions • With NOAA/IOOS (Steve Hankin) • Point/Station/Trajectory/Profile Data • Allow space/time queries, return nested sequences • Start from / standardize “Dapper conventions”
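One way to picture a forecast model collection with two time dimensions is a table indexed by model run time and forecast offset, where valid time = run time + offset. The sketch below is an illustration under that assumption, not the THREDDS aggregation code; `build_collection`, `forecasts_valid_at`, and the sample runs are hypothetical.

```python
from datetime import datetime, timedelta


def build_collection(runs):
    """Combine model runs into one dataset with two time dimensions.

    runs -- dict: run_time -> dict: forecast offset in hours -> value
    Returns (run_times, offsets, data, valid_times) where data[i][j]
    is the value for run i at offset j (None if that run lacks it).
    """
    run_times = sorted(runs)
    offsets = sorted({h for run in runs.values() for h in run})
    data = [[runs[rt].get(h) for h in offsets] for rt in run_times]
    valid_times = [[rt + timedelta(hours=h) for h in offsets]
                   for rt in run_times]
    return run_times, offsets, data, valid_times


def forecasts_valid_at(collection, when):
    """All (run_time, offset, value) forecasts valid at one instant."""
    run_times, offsets, data, valid_times = collection
    return [(run_times[i], offsets[j], data[i][j])
            for i in range(len(run_times))
            for j in range(len(offsets))
            if valid_times[i][j] == when and data[i][j] is not None]


# Hypothetical 00Z and 06Z runs with 0, 6 and 12 hour forecasts.
run00 = datetime(2006, 11, 1, 0)
run06 = datetime(2006, 11, 1, 6)
runs = {
    run00: {0: 10.0, 6: 11.0, 12: 12.0},
    run06: {0: 11.5, 6: 12.5, 12: 13.5},
}
collection = build_collection(runs)
at_noon = forecasts_valid_at(collection, datetime(2006, 11, 1, 12))
```

Here `at_noon` holds two forecasts for the same valid time, one from each run, which is the kind of query a single-time-dimension dataset cannot express.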

  35. Forecast Model Collections

  36. Conclusion • Standardized Data Access in good shape • HDF5, NetCDF, OPeNDAP • Write an IOSP for proprietary formats (Java) • But that’s not good enough! • To do: • Standard representations of coordinate systems • Classifications of data types, standard services for them
