290 likes | 451 Views
A Data Quality Screening Service for Remote Sensing Data. Christopher Lynnes, NASA/GSFC Edward Olsen, NASA/JPL Peter Fox, RPI Bruce Vollmer, NASA/GSFC Robert Wolfe, NASA/GSFC Shahin Samadi , NASA/GSFC.
E N D
A Data Quality Screening Service for Remote Sensing Data Christopher Lynnes, NASA/GSFC Edward Olsen, NASA/JPL Peter Fox, RPI Bruce Vollmer, NASA/GSFC Robert Wolfe, NASA/GSFC ShahinSamadi, NASA/GSFC Contributions from: N. Most, Y.-I. Won, T. Hearty, R. Strub, S. Ahmad, S. Zednik, M. Hegde and A. Rezaiyan-Nojani Funded by: NASA’s Advancing Collaborative Connections for Earth System Science (ACCESS) Program
Outline Satellite Data Quality: Why So Difficult? Making Quality Information Usable via the Data Quality Screening Service (DQSS) How does DQSS Scale? Future Directions for DQSS
Earth Observing System Satellite sensors collect data for for climate science research
EOS satellite data is archived and distributed by a network of data centers
Satellite-borne collection breeds a tendency to keep low-quality data around • AIRS – Atmospheric Infrared Sounder • atmospheric temperature • atmospheric moisture • trace gas composition Satellites are very expensive to launch and operate. It takes years to understand the data characteristics. Data are used for many different purposes.
Quality schemes can be relatively simple… Total Column Precipitable Water Quality Level kg/m2 Best Good Do Not Use
…or they can be fairly complicated Hurricane Ike, viewed by the Atmospheric Infrared Sounder (AIRS) Air Temperature at 300 mbar PBest : Maximum pressure of “Best” quality values in temperature profiles
Quality flags are also sometimes packed together into bytes Cloud Mask Status Flag 0=Undetermined1=Determined Day / Night Flag 0=Night1=Day Snow/ Ice Flag 0=Yes1=No SunglintFlag 0=Yes1=No Cloud Mask Cloudiness Flag 0=Confident cloudy 1=Probably cloudy 2=Probably clear 3=Confident clear Surface Type Flag 0=Ocean, deep lake/river 1=Coast, shallow lake/river 2=Desert 3=Land Bitfield arrangement for the Cloud_Mask_SDS variable in atmospheric products from Moderate Resolution Imaging Spectroradiometer (MODIS)
Current user scenarios... Repeat for each user • Nominal scenario • Search for and download data • Locate documentation on handling quality • Read & understand documentation on quality • Write custom routine to filter out bad pixels • Equally likely scenario (especially in certain user communities) • Search for and download data • Assume that quality has a negligible effect
Using bad quality data is in general not negligible Total Column Precipitable Water Quality Best Good Do Not Use kg/m2
Heavy cloud impairs retrievals AIRS True Color, 2008-09-10
Neglecting quality may introduce bias (a more subtle effect) AIRS Relative Humidity Comparison against Dropsonde with and without Applying PBest Quality Flag Filtering
Making Quality Information Usable via the Data Quality Screening Service (DQSS).
The DQSS filters out bad pixels for the user • Default user scenario • Search for data • Select science team recommendation for quality screening (filtering) • Download screened data • More advanced scenario • Search for data • Select custom quality screening parameters • Download screened data
DQSS segregates bad-quality pixels into a separate array Original data array (Total column precipitable water) Mask based on user criteria (Quality level < 2) Good quality data pixels retained Rejected data points Output file has the same format and structure as the input file (except for the extra rejected-data fields)
Visualizations help users see the effect of different quality filters Best quality only Best + Good quality
DQSS encodes the science team recommendations on quality screening Straightforward: “Use Best-only for data assimilation uses; Use Best+Good for climatic studies” More complicated: “Use only VeryGood over land; Use Marginal+Good+VeryGood over ocean”
Ontology-driven design helps scale with diversity of quality schemes Quality View Variable Types & Properties
The Quality View links data variables with quality variables.
The code queries the ontology to determine the type of quality variable needed and selects the corresponding module to generate the quality mask.
Execution at distributed data centers helps to scale with demand Screening takes place at the data centers where the data reside to avoid excessive data transfers The jury is out as to whether Java performance will scale to meet demand.
Operationalize DQSS • Promotion to operations for AIRS data is scheduled for Aug 2010 • DQSS will be offered through the main data search interface at the data center • Usage Metrics will be collected: • Basic usage • What criteria are used • Screening invocation is a simple URL GET • What the “ yield” was for the screening
Extend DQSS to Other Datasets Moderate Resolution Imaging Spectroradiometer (MODIS) Microwave Limb Sounder (MLS) Ozone Monitoring Instrument (OMI) Cloudsat Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO)
Collaborative Screening Use Case? Define custom criteria Tag custom criteria Browse tagged criteria Dr. Alice Modify tagged criteria Dr. Bob
DQSS Recap • Screening satellite data currently is troublesome and time consuming for users • The Data Quality Screening System will provide an easy-to-use service • The result should be: • More attention to quality on users’ part • More accurate handling of quality information… • …With less user effort