Benchmark database: inhomogeneous data, surrogate data and synthetic data. Victor Venema
Goals of COST-HOME working group 1 • Literature survey • Benchmark dataset • Known inhomogeneities • Test the homogenisation algorithms (HA)
Benchmark dataset • Real (inhomogeneous) climate records • Most realistic case • Investigate whether the various HAs find the same breaks • Good metadata • Synthetic data • For example, Gaussian white noise • Insert known inhomogeneities • Test performance • Surrogate data • Empirical distribution and correlations • Insert known inhomogeneities • Compare to synthetic data: test of assumptions
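To make the synthetic branch concrete, here is a minimal sketch, assuming numpy, of generating a Gaussian white-noise series with one known inserted break. The series length and the break-size standard deviation of 0.8 °C are taken from later slides; everything else is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(42)

# 100 years of monthly Gaussian white-noise "temperature" anomalies
n_months = 100 * 12
homogeneous = rng.normal(0.0, 1.0, n_months)

# Insert one known break: shift the mean from a random date onwards.
break_idx = rng.integers(int(0.1 * n_months), int(0.9 * n_months))
break_size = rng.normal(0.0, 0.8)   # sigma = 0.8 degC, from the break slide
inhomogeneous = homogeneous.copy()
inhomogeneous[break_idx:] += break_size

# A homogenisation algorithm sees only `inhomogeneous`;
# `break_idx` and `break_size` are the known truth used for scoring.
```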
Creating the benchmark – Talk outline • Start with (in)homogeneous data • Multiple surrogate and synthetic realisations • Mask surrogate records • Add a global trend • Insert inhomogeneities in the station time series • Published on the web • Homogenisation by COST participants and third parties • Analyse the results and publish
1) Start with homogeneous data • Monthly mean temperature and precipitation • Later also daily data (WG4), maybe other variables • Homogeneous • No missing data • Longer surrogates are based on multiple copies of the input data • Generated networks are 100 years long
1) Start with inhomogeneous data • Distribution • Years with breaks are removed • The mean of each section between breaks is adjusted to the global mean • Spectrum • Longest period without any breaks in the stations • The surrogate is divided into overlapping sections • Fourier coefficients and phases are adjusted for every small section • No adjustments on large scales!
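A minimal sketch of the mean-adjustment step described above, assuming numpy and that the documented break dates are given as indices; the removal of the break years themselves and the section-wise spectral adjustment are not shown.

```python
import numpy as np

def adjust_section_means(series, break_indices):
    """Shift every section between documented breaks so that its mean
    equals the global mean of the record (the 'Distribution' step).
    `break_indices` are assumed to be positions of documented breaks."""
    out = series.astype(float).copy()
    global_mean = out.mean()
    edges = [0] + sorted(break_indices) + [len(out)]
    for lo, hi in zip(edges[:-1], edges[1:]):
        out[lo:hi] += global_mean - out[lo:hi].mean()
    return out
```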
2) Multiple surrogate realisations • Temporal correlations • Station cross-correlations • Empirical distribution function • The annual cycle is removed beforehand and added back at the end • Number of stations between 5 and 20 • Cross-correlations vary as much as possible
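The slide does not name the surrogate algorithm, but the iterative amplitude-adjusted Fourier transform (IAAFT) is a standard method that reproduces exactly the two single-station constraints listed here: the power spectrum (temporal correlations) and the empirical distribution. A single-station sketch, assuming numpy; reproducing the station cross-correlations requires a multivariate extension that adjusts the Fourier phases of all stations jointly, which is not shown.

```python
import numpy as np

def iaaft(x, n_iter=100, seed=None):
    """IAAFT surrogate: same empirical distribution as `x` and
    (approximately) the same power spectrum, i.e. the same linear
    temporal correlations."""
    rng = np.random.default_rng(seed)
    amplitudes = np.abs(np.fft.rfft(x))   # target spectrum
    sorted_x = np.sort(x)                 # target distribution
    y = rng.permutation(x)                # random shuffle as starting point
    for _ in range(n_iter):
        # Enforce the spectrum: keep the phases, impose target amplitudes.
        phases = np.angle(np.fft.rfft(y))
        y = np.fft.irfft(amplitudes * np.exp(1j * phases), n=len(x))
        # Enforce the distribution: rank-remap onto the sorted data.
        ranks = np.argsort(np.argsort(y))
        y = sorted_x[ranks]
    return y
```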
5) Insert inhomogeneities in stations • Independent breaks • Determined at random for every station and time • 5 breaks per 100 years • Slightly different perturbations for every calendar month • Temperature • Additive • Size: Gaussian distribution, σ = 0.8 °C • Rain • Multiplicative • Size: Gaussian distribution, mean 1, σ = 10 %
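A minimal sketch of the independent-break insertion, assuming numpy. The slide gives the rate (5 breaks per 100 years) and the size distributions, but not whether the number of breaks per station is fixed or random, so a Poisson count is an assumption here; the month-dependent perturbations are omitted.

```python
import numpy as np

def insert_independent_breaks(series, variable, rng, rate_per_century=5):
    """Insert breaks at random dates: additive for temperature
    (sigma = 0.8 degC), multiplicative for rain (mean 1, sigma = 10 %).
    `series` holds monthly values."""
    out = series.astype(float).copy()
    n = len(out)
    n_breaks = rng.poisson(rate_per_century * n / 1200)  # assumption: Poisson count
    for idx in rng.integers(0, n, n_breaks):
        if variable == "temperature":
            out[idx:] += rng.normal(0.0, 0.8)
        else:  # rain
            out[idx:] *= rng.normal(1.0, 0.10)
    return out
```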
5) Insert inhomogeneities in stations • Correlated break in the network • One break in 50 % of the networks • Simultaneously in 30 % of the stations • Random position • At least 10 % of the data points on either side
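A sketch of the correlated network break under the same assumptions as above; the slide does not say whether the affected stations share one break size or each get their own, so per-station sizes with the same σ = 0.8 °C as the independent breaks are assumed.

```python
import numpy as np

def insert_network_break(network, rng):
    """With 50 % probability insert one simultaneous break in a random
    30 % of the stations, keeping at least 10 % of the data points on
    either side.  `network` has shape (n_stations, n_months)."""
    out = network.astype(float).copy()
    if rng.random() < 0.5:
        n_sta, n = out.shape
        idx = rng.integers(int(0.1 * n), int(0.9 * n))  # >= 10 % on each side
        hit = rng.choice(n_sta, max(1, round(0.3 * n_sta)), replace=False)
        # Assumption: each affected station gets its own break size.
        out[hit, idx:] += rng.normal(0.0, 0.8, size=(len(hit), 1))
    return out
```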
5) Insert inhomogeneities in stations • Outliers • Size • Temperature: below the 1st or above the 99th percentile • Rain: below the 0.1th or above the 99.9th percentile • Frequency • 50 % of the networks: 1 % • 50 % of the networks: 3 %
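A sketch of the outlier insertion, assuming numpy. The slide fixes the percentile thresholds and frequencies but not how far beyond the threshold an outlier lands, so a random one-standard-deviation excess is an assumption.

```python
import numpy as np

def insert_outliers(series, variable, frequency, rng):
    """Replace a fraction `frequency` (1 % or 3 % per network) of the
    values with outliers beyond the tail percentiles of the series."""
    out = series.astype(float).copy()
    n_out = round(frequency * len(out))
    idx = rng.choice(len(out), n_out, replace=False)
    if variable == "temperature":
        lo, hi = np.percentile(out, [1.0, 99.0])
    else:  # rain
        lo, hi = np.percentile(out, [0.1, 99.9])
    # Assumption: outliers exceed the threshold by |N(0, std)|.
    excess = np.abs(rng.normal(0.0, out.std(), n_out))
    tail = rng.choice([-1.0, 1.0], n_out)  # low or high tail at random
    out[idx] = np.where(tail < 0, lo - excess, hi + excess)
    return out
```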
5) Insert inhomogeneities in stations • Local trends (only temperature) • Linear increase or decrease in one station • Duration: 30 or 60 years • Maximum size: 0.2 to 1.5 °C • Frequency: once in 10 % of the stations • Also for rain?
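A sketch of the local-trend insertion under the same assumptions; the slide does not say whether the accumulated offset persists after the trend ends, so persistence is assumed here.

```python
import numpy as np

def insert_local_trend(series, rng):
    """Add a linear local trend: duration 30 or 60 years, final size
    between 0.2 and 1.5 degC with random sign, random start date."""
    out = series.astype(float).copy()
    n = len(out)                                   # monthly values
    dur = int(rng.choice([30, 60])) * 12
    start = int(rng.integers(0, n - dur))
    size = rng.uniform(0.2, 1.5) * rng.choice([-1.0, 1.0])
    out[start:start + dur] += np.linspace(0.0, size, dur)
    out[start + dur:] += size   # assumption: the offset persists afterwards
    return out
```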
6) Published on the web • The inhomogeneous data will be published on the COST-HOME homepage • Everyone is welcome to download and homogenise the data • http://www.meteo.uni-bonn.de/mitarbeiter/venema/themes/homogenisation
7) Homogenisation by participants • Return the homogenised data • Should be in the COST-HOME file format (next slide) • Return the break detections • BREAK • OUTLI • BEGTR • ENDTR • Multiple breaks at one date are possible
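Purely as an illustration of what a detection return might look like: the real COST-HOME file format is specified in the PDF linked on the next slide, and only the four type codes above come from the talk, so the field layout below is a hypothetical placeholder, not the actual format.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    station_id: str   # hypothetical field layout, not the real format
    year: int
    month: int
    kind: str         # one of "BREAK", "OUTLI", "BEGTR", "ENDTR"

def write_detections(path, detections):
    """Write one detection per line; several detections may share a date."""
    with open(path, "w") as fh:
        for d in detections:
            fh.write(f"{d.station_id} {d.year:04d} {d.month:02d} {d.kind}\n")
```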
7) Homogenisation by participants • COST-HOME file format: http://www.meteo.uni-bonn.de/venema/themes/homogenisation/costhome_fileformat.pdf • For the benchmark & the COST homogenisation software • New since Vienna: • Station files include the station height • Many clarifications
Work in progress • Preliminary benchmark: http://www.meteo.uni-bonn.de/venema/themes/homogenisation/ • Write a report on the benchmark dataset • More input data • Set a deadline for the availability of the benchmark • Set a deadline for the return of the homogenised data • Agree on the details of the benchmark • Daily data: other, realistic, fair inhomogeneities