Local Enhancement of Global Estimation

Local Enhancement of Global Estimation • Molly Leecaster, Ph.D. • Kerry Ritter, Ph.D. • DAMARS and STARMAP 2nd Annual Conference • Oregon State University, Corvallis, OR • August 11, 2003

Acknowledgement PROJECT FUNDING • The work reported here was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and STARMAP, the Program they represent. EPA does not endorse any products or commercial services mentioned in this presentation.

Outline of Presentation • Introduction • Two-stage sample design • Spatial modeling of binary EMAP data • Indicator kriging • Conditional autoregressive model • Simulation Example • Future work

Introduction • EMAP developed for estimation of areal extent of resources • Sample locations are spatially separated • EMAP participants are interested in global estimation but also have local concerns • Spatial modeling • EMAP data do not provide information on the local spatial structure required for good spatial models • Therefore, augment the EMAP design to improve spatial modeling

Goals • Present enhancement to EMAP design • Use of enhanced sample in spatial models of indicator data • Indicator kriging • Conditional autoregressive model

Outline of Presentation • Introduction • Two-stage sample design • Spatial modeling of EMAP data • Simulation Example • Future work

Two-stage: Systematic Grid Plus Star Cluster Sample Design • Two-stage because two goals • Systematic (EMAP) grid for global structure • Star cluster sample for variogram estimation • Enhance EMAP design with additional sample locations • Ideal for areal extent and prediction • Ideal for variogram estimation

Two-Stage Design • Legend: pink = absence; blue = presence; black = systematic; green = star clusters 1; orange = star clusters 2

Stage One: Systematic Component (EMAP) • Based on global estimation requirements • e.g., 30 spatially separated locations per stratum

Stage Two: Star Cluster Component • Star clusters of sample sites around stage-one locations • Star clusters provide estimate of small-scale pair-wise variance • Star clusters also provide many added pairs of samples at various distance lags • Star clusters provide directional information at small scale • How to specify star clusters?

Stage Two: Star Cluster Component • Location of star clusters • Adaptive, locate at specified observed response • Does this bias the variogram estimation? • Random stage-one locations • Systematic subset of stage-one locations • Size of star clusters • Diameter of star = variogram range • Diameter of star > variogram range • Number of star clusters • At least two, but how many more?
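One way to picture a star cluster is as a set of short arms radiating from a stage-one location. The sketch below is only an illustration: the function name `star_cluster`, the six arm directions, and the two radii are assumptions, not the geometry used in the talk. The simulation slides report n=30, 54, and 78 for zero, two, and four clusters, i.e. 12 added sites per cluster, and six arms with two sites each is one hypothetical arrangement that gives that count.

```python
import math


def star_cluster(center, n_arms=6, radii=(1.0, 2.0)):
    """Return sample sites on straight arms radiating from `center`.

    n_arms and radii are illustrative choices, not values from the talk.
    """
    cx, cy = center
    sites = []
    for k in range(n_arms):
        angle = 2.0 * math.pi * k / n_arms  # evenly spaced arm directions
        for r in radii:
            sites.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return sites


# Hypothetical stage-one location at (10, 10).
cluster = star_cluster((10.0, 10.0))
```

The largest radius controls the star diameter, so the "diameter = range" versus "diameter > range" question above amounts to choosing `radii` relative to the variogram range.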

Outline of Presentation • Introduction • Two-stage sample design • Spatial modeling of EMAP data • Simulation Example • Future work

Spatial Models for Binary Data • Indicator kriging for geo-referenced data • Conditional autoregressive model for binary lattice data

Indicator Kriging • Binary geo-referenced data • Spatial correlation structure modeled from data • Precision of predictions depends on sample spacing and variogram parameters

Ordinary Indicator Kriging • Estimate local indicator mean at each location • Apply simple IK estimator using estimated mean
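The two steps can be written out in a standard textbook form. The notation here is assumed rather than taken from the slide: $I(s_i)$ is the 0/1 indicator at sample site $s_i$, $m(s_0)$ is the local indicator mean at prediction location $s_0$, and $w_i$, $\lambda_i$ are kriging weights derived from the indicator variogram.

```latex
% Step 1: estimate the local indicator mean with weights that sum to one
\hat{m}(s_0) = \sum_{i=1}^{n} w_i \, I(s_i), \qquad \sum_{i=1}^{n} w_i = 1

% Step 2: apply the simple indicator kriging estimator with that mean
\hat{p}(s_0) = \hat{m}(s_0) + \sum_{i=1}^{n} \lambda_i \,\bigl[ I(s_i) - \hat{m}(s_0) \bigr]
```

Here $\hat{p}(s_0)$ is read as the predicted probability of presence at $s_0$.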

Conditional Autoregressive Model for Binary Data • Binary lattice data • Spatial correlation structure assumed: locally (neighborhood) dependent Markov random field • Neighborhood defined as fixed pattern of surrounding grid points • Precision of predictions depends on neighborhood structure, grid size, and variance of response
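As an illustration of a "fixed pattern of surrounding grid points", the sketch below builds the six-neighbor structure of a hexagon lattice. The function name `hex_neighbors`, the offset-row indexing, and the flattened `adj`/`num` layout are assumptions made for this example; the source does not say how the authors encode their lattice.

```python
def hex_neighbors(n_rows, n_cols):
    """Map each site index to the sorted indices of its adjacent hexagon sites.

    Assumes an offset-row layout: odd-numbered rows are shifted half a
    cell to the right. Sites are numbered row by row starting at 0.
    """
    even_offsets = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]
    odd_offsets = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]
    neighbors = {}
    for r in range(n_rows):
        offsets = odd_offsets if r % 2 else even_offsets
        for c in range(n_cols):
            adjacent = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                # Keep only neighbors that fall inside the lattice.
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = sorted(adjacent)
    return neighbors


# 50x50 lattice, matching the grid size used in the simulation example.
nbrs = hex_neighbors(50, 50)
num = [len(nbrs[i]) for i in range(50 * 50)]         # neighbor count per site
adj = [j for i in range(50 * 50) for j in nbrs[i]]   # neighbor lists, concatenated
```

Interior sites have six neighbors and edge sites have fewer, which is one reason predictions from a neighborhood model can behave differently near the lattice boundary.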

Conditional Autoregressive Model for Binary Data

Comparison of Models • Ordinary Indicator Kriging • Advantages • Knowledge of spatial relationship improves prediction • Assumed spatial relationship based on data • Disadvantages • Not robust to variogram mis-specification • Requires strong stationarity assumption • Conditional autoregressive • Advantages • No need to estimate or model variogram • Can be used without geo-referenced data • Disadvantages • Assumed spatial relationship based on a grid size that could be inaccurate

Outline of Presentation • From last year to now … progress & new directions • Two-stage sample design • Spatial modeling of EMAP data • Simulation Example • Future work

Simulation Example • Used simulation so spatial structure was known • Simulated response from specific variogram model onto a 50x50 hexagon grid of points • Specified presence/absence cutoff • Applied two-stage sample design (2 realizations) • Estimated and modeled variogram from sample data • For some, did two manual and one automatic fit • Predicted probability of presence using indicator kriging and conditional autoregressive model
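The "estimated variogram" step can be sketched with the classical (Matheron) estimator applied to 0/1 indicator values. This is a generic illustration, not the authors' S-Plus code; the function name `empirical_variogram`, the fixed-width lag bins, and the four-site example data are all hypothetical.

```python
import math
from collections import defaultdict


def empirical_variogram(sites, values, lag_width):
    """Classical semivariogram estimate, binned by distance lag.

    Returns {lag_bin_index: (mean_distance, semivariance, n_pairs)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # distance sum, squared-diff sum, pairs
    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            d = math.dist(sites[i], sites[j])
            acc = sums[int(d // lag_width)]
            acc[0] += d
            acc[1] += (values[i] - values[j]) ** 2
            acc[2] += 1
    return {b: (sd / n, ss / (2.0 * n), n) for b, (sd, ss, n) in sorted(sums.items())}


# Hypothetical transect: two presence sites followed by two absence sites.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
indicators = [1, 1, 0, 0]
gamma = empirical_variogram(sites, indicators, lag_width=1.0)
```

The pair count in each bin shows why added star-cluster sites matter: with only widely spaced systematic sites, the short-lag bins contain few or no pairs.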

Simulation Methods • Simulated data from Gaussian random field (S-Plus) • Spherical variogram, range = 22, sill = 0.4, nugget = 0 • Simulated value > 2 => presence • Sample Designs • Systematic sample (n=30) • Systematic sample plus 2 star clusters (n=54) • Systematic sample plus 4 star clusters (n=78) • Models • Indicator kriging • Conditional autoregressive model
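For reference, the spherical variogram with the parameters listed above can be evaluated as follows. This is a minimal sketch: the function name is hypothetical, and `sill` is treated as the partial sill added on top of the nugget, which makes no difference here because the nugget is 0.

```python
def spherical_variogram(h, range_=22.0, sill=0.4, nugget=0.0):
    """Spherical semivariogram at lag distance h.

    `sill` is taken as the partial sill (an assumption of this sketch).
    """
    if h <= 0.0:
        return 0.0
    if h >= range_:
        return nugget + sill  # flat beyond the range
    ratio = h / range_
    return nugget + sill * (1.5 * ratio - 0.5 * ratio ** 3)


half_range_value = spherical_variogram(11.0)   # lag at half the range
at_range_value = spherical_variogram(22.0)     # lag at the range
```

Beyond a lag of 22 the function stays at 0.4, so sites farther apart than the range carry no modeled spatial dependence.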

Data Simulation with Sample Sites • Legend: pink = absence; blue = presence; black = systematic; green = star clusters 1; orange = star clusters 2

Variograms for Sample Designs • Panels: Systematic; Systematic + 2 Stars; Systematic + 4 Stars

Systematic Sample Results

Systematic Sample with 2 Stars

Three Fits: Systematic + 2 Stars • Panels: Automatic Fit; Manual Fit #1; Manual Fit #2 • Range Sill Nugget • 17 0.3 0 • 0.4 0 • 0.27 0 • All use correct model

Predictions from 3 Variogram Fits • Panels: Automatic Fit; Manual Fit #1; Manual Fit #2

Comparison of Prediction Errors • Sensitivity • Proportion of presence sites predicted to be present • Specificity • Proportion of absence sites predicted to be absent • True Positive Rate • Proportion of predicted presence sites that truly are present (positive predictive value) • True Negative Rate • Proportion of predicted absence sites that truly are absent (negative predictive value)
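The four measures can be computed from true presence/absence and predicted probabilities as sketched below. The function name, the dictionary keys, and the five-site example data are hypothetical. The keys `predicted_presence_correct` and `predicted_absence_correct` correspond to what this slide labels "True Positive Rate" and "True Negative Rate".

```python
def prediction_summary(truth, prob, cutoff=0.5):
    """Classify sites as present when prob > cutoff and summarize the errors."""
    tp = fp = tn = fn = 0
    for y, p in zip(truth, prob):
        predicted_present = p > cutoff
        if y and predicted_present:
            tp += 1
        elif y:
            fn += 1
        elif predicted_present:
            fp += 1
        else:
            tn += 1

    def ratio(a, b):
        return a / b if b else float("nan")  # undefined when the denominator is 0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "predicted_presence_correct": ratio(tp, tp + fp),
        "predicted_absence_correct": ratio(tn, tn + fn),
    }


# Hypothetical data: two presence sites and three absence sites.
truth = [1, 1, 0, 0, 0]
prob = [0.8, 0.4, 0.6, 0.2, 0.1]
summary_05 = prediction_summary(truth, prob, cutoff=0.5)
summary_03 = prediction_summary(truth, prob, cutoff=0.3)
```

In this toy case, lowering the cutoff from 0.5 to 0.3 raises sensitivity from 0.5 to 1.0 because the second presence site (probability 0.4) is now classified as present.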

Comparison of Predictions (Data1F) (positive if probability > 0.5) (Auto, Manual #2)

Comparison of Predictions (Data1F) (positive if probability > 0.3) (Auto, Manual #2)

Data Simulation with Sample Sites • Legend: pink = absence; blue = presence; black = systematic; green = star clusters 1; orange = star clusters 2

Variograms for Sample Designs • Panels: Systematic; Systematic + 2 Stars; Systematic + 4 Stars

Systematic Sample Results

Three Fits: Systematic • Panels: Automatic Fit; Manual Fit #1; Manual Fit #2 • Range Sill Nugget • 30 0.25 0.21 • 15 0.27 0 • 0.22 0 • All use correct model

Predictions from 3 Variogram Fits • Panels: Automatic Fit; Manual Fit #1; Manual Fit #2

Comparison of Predictions (Data3F) (positive if probability > 0.5) (Auto, Manual #2)

Comparison of Predictions (Data3F) (positive if probability > 0.3) (Auto, Manual #2)

Simulation Conclusions - Design • Two star clusters improved small-scale features of variogram • Two star clusters improved prediction accuracy • Four star clusters offered little improvement over two stars

Simulation Conclusions - Models • Variogram model affects predictions • Kriging tends toward overall mean probability of presence, i.e. it smooths • Kriging builds patches whose diameter is approximately the range of the variogram • Conditional autoregressive model attempts to connect observed presence • Neither model had consistently higher sensitivity or specificity

Outline of Presentation • From last year to now … progress & new directions • Two-stage sample design • Spatial modeling of EMAP data • Simulation Example • Future work

Future Work • Further simulation studies on two-stage design • Effect of sample size • Number of star clusters necessary to improve variogram estimation • Effect of size of star clusters • Bias from adaptive second-stage sampling • Advantages of indicator kriging and conditional autoregressive model • Sensitivity of conditional autoregressive model to initial values, prior distributions, and grid size • Sensitivity of kriging to variogram model specification

Future Work • Apply two-stage sample design to real data • DDT data from Santa Monica Bay, CA • EMAP data and local monitoring data • Freely distribute functions for applying the conditional autoregressive model on a hexagon lattice • Functions in R to produce hexagon lattice input for WinBUGS • File in WinBUGS to apply model • Investigate optimal grid size to achieve EMAP and spatial modeling goals

Systematic (EMAP) Grid Based on Variogram Model • Kriging variance • Analog for conditional autoregressive model

Systematic (EMAP) Grid Based on Variogram Model • Prediction variance is minimized by large covariance between prediction location and sample locations • For kriging, grid refers to sample locations • For conditional autoregressive, grid refers to sample locations and prediction locations • Want sample locations “close” together • Samples too far apart: • Kriging correctly uses no spatial relationship • Conditional autoregressive incorrectly uses assumed spatial relationship • Samples too close together: waste of resources
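The first point can be seen from the simple kriging variance, shown here as a textbook form rather than a formula from the slides. The notation is assumed: $C(0)$ is the variance of the response, $\mathbf{C}$ is the covariance matrix among the $n$ sample locations, and $\mathbf{c}_0$ is the vector of covariances between the sample locations and the prediction location $s_0$.

```latex
\sigma^2_{SK}(s_0) = C(0) - \mathbf{c}_0^{\top} \mathbf{C}^{-1} \mathbf{c}_0
```

Because $\mathbf{C}$ is positive definite, the subtracted term is never negative, and it is the only place where the prediction location enters. If every sample lies beyond the variogram range from $s_0$, then $\mathbf{c}_0 = \mathbf{0}$ and the variance equals $C(0)$, which is the "samples too far apart" case. Ordinary kriging adds a term for estimating the mean, but the dependence on $\mathbf{c}_0$ is similar.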
