240 likes | 406 Views
Hidden Error Variances and the optimal combination of static and flow dependent variances. Craig H. Bishop Elizabeth A Satterfield Kevin T. Shanley , David Kuhl and Tom Rosmond Naval Research Laboratory Monterey CA PDP WG , June 2012 Reading, UK. Introduction.
E N D
Hidden Error Variancesand the optimal combination of static and flow dependent variances Craig H. Bishop Elizabeth A Satterfield Kevin T. Shanley, David Kuhl and Tom Rosmond Naval Research Laboratory Monterey CA PDP WG, June 2012 Reading, UK
Introduction • Error Variance: Mean of a large number of squared forecast errors. • Flow Dependent Error Variance: Mean of a large number of squared forecast errors given a particular flow. (In order to obtain a large number of errors the “flow” or “condition” must repeat itself). • Hidden Error Variance: A flow dependent error variance that is formally unobservable because the particular flow does not repeat itself. “A conundrum of predictability research is that while the prediction of flow dependent error distributions is one of its main foci, chaos hides flow dependent forecast error distributions from empirical observation.” Bishop and Satterfield (2012a,b, MWR, in review), Satterfield and Bishop (2012ab, to be submitted)
Introduction • Accurate predictions of error variance are vital todata assimilation, ensemble weather forecasting, climate change assessment, and adaptive observational network design. • It is difficult to find good historical analogs with which one could attempt to characterize the mean square error of forecasts similar to the particular forecast of interest. • Ensemble forecast systems provide imperfect estimates of error variance as a function of the flow of the day.
7 60 a) 850-hpa T (oC) b) 850-hpa Td (oC) 45o 45o 6 50 5 40 4 30 ET-ctl 3 ET-ens 20 2 10 1 0 0 0 1 2 3 4 5 6 7 0 10 20 30 40 50 60 Binned Ensemble variance Binned Ensemble variance 50 c) 850-hpa u (m s-1) d) 850-hpa v (m s-1) 45o 45o 40 40 30 30 20 20 10 10 0 0 0 10 20 30 40 0 10 20 30 40 50 Binned Ensemble variance Binned Ensemble variance Previous work: spread-skill diagrams Collect innovations (ob – fcst) corresponding to similar ensemble variances into a bin. Compute bin averaged squared innovation. It should increase with ensemble variance. 600 Binned squared Error (mm)^2 48-h total precipitation (mm) 45o ET-ctl 500 400 ET-ens 300 200 Mean ET-ctl = 119.41 100 Mean ET-ens = 100.82 0 0 100 200 300 400 500 600 These diagrams do not reveal the climatological range of true error variances, nor the degree of variation of ensemble variance given a true error variance. Binned Ensemble variance (mm^2) Spread-skill plot for COAMPS simulations similar to for 48-h total accumulated precipitation (mm)^2. Significant spread-skill relationships were found for all variables – including precipitation Spread-skill plot for COAMPS simulations relative to the control (solid line) and Mean (dashed line)
Introduction • UKMO, NRL and NCEP have all found that forecast error covariance models based on a linear combination of a static and ensemble based estimate of the forecast error covariance outperform covariance models based on static or ensemble covariances on their own. • Currently, weights for the static and ensemble components of such Hybrid error covariance models must be obtained from trial and error. Each “trial” is the cost of a 1-6 month DA/fcst exp. • Could information about the relative accuracies of the static and flow dependent estimates of forecast equation covariance be used to speed the tuning of Hybrids?
Goals • Develop reasonable analytic forms for the distribution of true error variances given an imperfect ensemble variance. • Derive formulae for deducing the analytic forms for the prior, likelihood and posterior distributions from a large archive of innovations and corresponding ensemble variances. • Justify and improve Hybrid data assimilation schemes that linearly combine static variances with ensemble variances.
What is the true flow dependent error variance ? (Slartibartfast – Magrathean designer of planets, D. Adams, Hitchhikers …) • Imagine an unimaginably large number of quasi-identical Earths.
25000 Lorenz Model ReplicatesReveal Hidden Error Variance • We use 10 variable Lorenz ’96 model with additive model error and 20 member Ensemble Transform Kalman Filter (ETKF). • We create 25,000 independent time series of analyses and forecasts, each having the same true state and differing only in random draws of observation error. • Error Variances are computed for each spatio-temporal point by averaging the squared forecast error across the 25,000 replicates. First demonstration of ETKF accurately predicting true error variance in non-linear system. Scatter plot of ETKF ensemble variance from a single replicate system as a function of true error variance. The true error variance is estimated from all 25,000 replicate systems. The linear fit to the points on the scatter plot is governed by the equation .
(a) M=8 (b) M=4 Controlled accuracy of ensemble variances by degrading ETKF variances • A primary objective was to show how the distribution of true error variances given an imperfect ensemble variance changes as the accuracy of the error variance prediction changes. • We created degraded ensemble variances by sampling a Gamma distribution with the same mean as the ETKF variance and a relative variance determined by an “effective ensemble size”. Fig. 3: Examples of assumed likelihood gamma pdfsof ensemble sample variances with a mean of unity . Panel (a) is for an effective ensemble size of M=8, or equivalently, a relative variance of 2/7. Panel (b) is for an effective ensemble size of 4, or equivalently, a relative variance of 2/3.
Histograms of true error variance given an imperfect ensemble variance The histograms give an empirical estimate of the pdf of true error variances given a constrained range of sample variances for an 8 member ensemble. The ranges are given on each figure; they correspond to the 2nd and 34th bins, respectively, of the 35 bins of true error variance. The solid lines give the fit of an inverse-gamma function to the data in each bin. M=8 M=8 Inverse-gamma distribution is a very good fit to empirically derived histogram of true error variances given an ensemble variance for all ensemble variance categories.
Climatological pdf of true error variances M=8 Inverse-gamma distribution gives a reasonable fit to empirically derived prior climatological pdf of true error variances. M=8 Prior climatological distribution of true error variances. Bars show the probability density histogram of forecast error variances. Solid line shows the fit of the pdf (eq 4) to the data. The thick dashed line marks the mean of both the pdf and the data.
Empirical estimation of pdf of true error variance given ensemble variance from 25000 trials (a) M=8, empirical (b) M=2, empirical Red lines depict empirical estimate of pdf of true error variance (ordinate axis) given fixed values of ensemble variance (abscissa axis). Thin green and blue lines give the mode and mean of the empirical estimates of the mode and mean of these estimates. Panels (a) and (b) show the empirical estimates for random sample ensembles of sizes M=2 and M=8, respectively. The grey shading gives an inverse-gamma pdf fit to the climatological pdf of true error variances.
An analytic model of hidden error variance • The error of the deterministic forecast is a random draw from a Gaussian distribution, whose true variance i2 is a random draw from a prior climatological inverse gamma pdfof error variances. • We assume an imperfect ensemble prediction is drawn from a likelihood gamma pdfof ensemble variances with mean a(i2 - min2)-s2min • Bayes’ Theorem is used to define the posteriorinverse gamma pdfof error variances given an imperfect ensemble variance si2 True Error Variance sensitivity stochastic Climatological Prior Distribution Likelihood distribution of s2 given a particular
(a) M=8, empirical (b) M=2, empirical Green lines give mode Blue lines give mean Given an ensemble variance, there are a broad range of possible true error variances. Current DA schemes require a single value. For the minimum error variance estimate, use the posterior mean. For the maximal likelihood estimate, … For QC, … (c) M=8, analytic (d) M=2, analytic
Posterior mean error variance is a Hybrid combination of static and ensemble variances Implications for Ensemble DA? Implications for 4DVAR? Flow dependent ensemble variance Static climatological mean error variance As the stochastic variation of ensemble variance about the true variance goes to zero, the weight on the ensemble variance goes to 1. If there is any imperfection in the flow-dependent ensemble variance, the optimal error variance estimate gives weight to the climatological covariance. If there is no variance of the true error variance, the weight on the static variance goes to 1. Purely flow dependent error variance models are sub-optimal
Problem Static covariance Flow-dependent ensemble prediction Covariance matrix of unavoidable errors
Solution: Equations that define hidden parameters from data assimilation output
Equations recover hidden parameters “observed” by replicate systems • “Observed” values are obtained from 175 DA cycles of the 25,000 “replicate systems” • Minimum, mean and maximum of the values retrieved from 21 single system independent time series of • with n=200,000. • Retrieved hidden parameters, var(s2),s2, a and M are shown in plots (a), (b), (c) and (d), respectively • “Light blue bars: M=2, Dark blue bars: M=8 • The “given” ensemble sizes in (d) are the random sample ensemble sizes used to degrade the quality of the ETKF ensemble variance.
Recovery of min(sigma^2) is inaccurate when min(sigma^2) is small The values marked “specified” are equal to values retrieved from Lorenz model experiments with a “given” M=8 and differing values of the model error q. Each retrieval is from 2,000,000 (innovation, ensemble-variance) pairs synthetically generated from specified distributions. Each plot summarizes informationfrom 60 independent retrievals. The values marked as min, mean, max and std are the minimum, mean, maximum and standard-deviation of the values retrieved from 60 completely independent synthetically generated data sets.
Variation of optimal weights with model error and ensemble size, M Ensemble variance weight in dark grey. Static variance weight in light grey. q gives model error variance parameter M gives an “effective ensemble size” corresponding to the relative variance of a random normal ensemble of size M. Variation of weights for mean of posterior distribution of true error variances with model error q and given effective ensemble size M. Black bars give the weights for the de-biased flow-dependent ensemble variance while grey bars give the corresponding weights for the static mean of the climatological error variances. The weight on the ensemble variance increases with ensemble size The weight on the static variance increases as model error variance increases
Application to Hybrid DA: Lorenz model 1, perturbed observations. • A suboptimal M=32 member ensemble is generated using a perturbed observations update. • A climatological error covariance matrix (Pfclimatology) is formed by collecting forecast errors for 100,000 time steps (using an 100% ensemble based error covariance matrix) • Pfhybrid is computed at each time step and used in the ETKF DA scheme to obtain an analysis, which is cycled. • We compute the “best practice” hybrid and the “standard” hybrid for all alpha values for comparison. Hybrid based on weights from theory performs as well as that obtained from brute force tuning of the weights. “Best Practice” hybrid: The ensemble based Pf is corrected by a factor of
NAVDAS-AR-Hybrid Resultsfigures provided by David Kuhl Alpha=0.5 Alpha=We computed for 6 regions Hybrid based on weights from theory performs as well as that obtained from brute force tuning of the weights.
Possible approaches to concerns in application of theory to Hybrid DA • The eq’s are for variances not correlations. • Assume that what applies for variances also applies to correlations. • Apply eq’s to coefficients of eigenvectors of approximation to true forecast error covariance matrix. • Develop theory for correlations. • The eq’s include a kurtosis term which is likely to be sensitive to data QC decisions based on the size of innovations. • Fortunately, it turns out that the weight for the ensemble variances is entirely independent of this term . The weight for the static term can then be obtained by insisting that the average of Hybrid variance be consistent with innovation variance. • It is difficult to estimate min(sigma^2) when min(sigma^2) is small. • If the recovered min(sigma^2) is small, just set it and s^2_min to zero.
Conclusions • A simple theory of the relationship between ensemble variances and error variances has been developed. • This theory provides a new method for estimating from an archive of pairs of innovations and variances (s2,ω) • the climatological pdf of error variances, • the pdf of ensemble variances given a true error variance, and • The Posterior Mean of the pdf of error Variances (PMV) • Our approximations of an inverse-gamma PDF for the prior climatological distribution of innovation variances and a gamma PDF for the likelihood distribution of ensemble variances given an innovation variance are reasonably accurate for the Lorenz ’96 system. • Equation for PMV provides a theoretical justification for Hybrid DA systems, which linearly combine static and flow-dependent covariances. • Recovery and application of optimal weights of flow dependent and climatological variances has been demonstrated in the Lorenz ’96 system (model 1) and in a low resolution Hybrid-4DVAR version of the Navy’s operational data assimilation scheme. • Enables Hybrid weights to be defined regionally at a fraction of the cost of weights obtained via trial and error. • QC and ensemble post-processing applications are also possible.