Daiwen Kang 1 , Rohit Mathur 2 , Brian Eder 2 , Kenneth Schere 2 , and S. Trivikrama Rao 2

Five-year Progress in the Performance of Air Quality Forecast Models: Analysis on Categorical Statistics for the National Air Quality Forecast Capability (NAQFC)
Daiwen Kang (1), Rohit Mathur (2), Brian Eder (2), Kenneth Schere (2), and S. Trivikrama Rao (2)
(1) Computer Sciences Corporation
(2) Atmospheric Modeling and Analysis Division, NERL/U.S. EPA
8th Annual CMAS Conference, Chapel Hill, NC, October 19–21, 2009

Motivations
• Assess the progress in performance improvements for categorical metrics of the NAQFC system for O3 forecasts over the past 5 years.
• Identify categorical metrics that characterize AQF performance well for categorical forecasts.
• Assess AQI-based categorical performance.
• Propose guidelines for AQF categorical evaluations based on the analysis of KF bias-adjusted forecasts and human forecasts.

Traditional Categorical Metrics
Observed Exceedances & Non-Exceedances versus Forecast Exceedances & Non-Exceedances
[Table: 2x2 contingency table of Forecast Exceedance (No/Yes) against Observed Exceedance (No/Yes), with cells labeled a, b, c, d; axes labeled Forecast and Observation]
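The accuracy, bias, false alarm ratio, hit rate, and critical success index discussed on the following slides are conventionally computed from the four cells of such a contingency table. The sketch below uses the standard forecast-verification definitions. Because the transcript does not preserve which of a, b, c, d is which cell, the cells are named descriptively here (`hits`, `false_alarms`, `misses`, `correct_negatives`); these names and the example counts are illustrative assumptions, not values from the study.

```python
def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def categorical_metrics(hits, false_alarms, misses, correct_negatives):
    """Standard exceedance metrics from a 2x2 contingency table.

    hits              -- exceedance forecast and observed
    false_alarms      -- exceedance forecast but not observed
    misses            -- exceedance observed but not forecast
    correct_negatives -- non-exceedance forecast and observed
    """
    total = hits + false_alarms + misses + correct_negatives
    forecast_yes = hits + false_alarms
    observed_yes = hits + misses
    return {
        "A": _ratio(hits + correct_negatives, total),      # accuracy
        "B": _ratio(forecast_yes, observed_yes),           # bias (>1: over-forecast)
        "FAR": _ratio(false_alarms, forecast_yes),         # false alarm ratio
        "H": _ratio(hits, observed_yes),                   # hit rate
        "CSI": _ratio(hits, hits + false_alarms + misses)  # critical success index
    }


# Hypothetical counts: exceedances are rare, so correct negatives dominate.
example = categorical_metrics(
    hits=40, false_alarms=160, misses=60, correct_negatives=9740
)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is 0.978 even though the false alarm ratio is 0.8, which is why accuracy alone says little about exceedance forecasts.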

AQI Definition and Categories
$$I_p = \frac{I_{Hi} - I_{Lo}}{BP_{Hi} - BP_{Lo}}\,(C_p - BP_{Lo}) + I_{Lo}$$
Where:
I_p = the index for pollutant p (O3 in this case)
C_p = the rounded concentration of pollutant p
BP_Hi = the breakpoint that is ≥ C_p
BP_Lo = the breakpoint that is ≤ C_p
I_Hi = the AQI value corresponding to BP_Hi
I_Lo = the AQI value corresponding to BP_Lo
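As a minimal sketch of this definition, the function below linearly interpolates the index between the two breakpoints that bracket the rounded concentration and also returns the category number. The breakpoint table is an assumption for illustration (8-hour O3 values in ppb as commonly tabulated around the time of the study); the slide does not list them, so substitute the official table for any real use. The names `O3_BREAKPOINTS_PPB` and `aqi_from_concentration` are hypothetical.

```python
# ASSUMED breakpoints, not taken from the slides:
# (BP_Lo, BP_Hi, I_Lo, I_Hi) for 8-hour O3 in ppb, categories 1..5.
O3_BREAKPOINTS_PPB = [
    (0, 59, 0, 50),        # 1: green
    (60, 75, 51, 100),     # 2: yellow
    (76, 95, 101, 150),    # 3: orange
    (96, 115, 151, 200),   # 4: red
    (116, 374, 201, 300),  # 5: purple
]


def aqi_from_concentration(conc_ppb, breakpoints=O3_BREAKPOINTS_PPB):
    """Return (AQI value, category number) for a concentration in ppb."""
    c_p = round(conc_ppb)  # C_p is the rounded concentration
    for category, (bp_lo, bp_hi, i_lo, i_hi) in enumerate(breakpoints, start=1):
        if bp_lo <= c_p <= bp_hi:
            index = (i_hi - i_lo) / (bp_hi - bp_lo) * (c_p - bp_lo) + i_lo
            return round(index), category
    raise ValueError(f"concentration {conc_ppb} ppb is outside the breakpoint table")


aqi_value, aqi_category = aqi_from_concentration(68.0)
print(aqi_value, aqi_category)
```

Returning the category alongside the index is a convenience for the category-based statistics used later in the deck.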

AQI-based Metrics Definition
[Equations and their symbols are not preserved in the transcript.]
In these definitions, i is the AQI category (1, 2, 3, 4, 5), equivalently the color scheme (green, yellow, orange, red, purple). The remaining quantities are the number of observed instances and the number of forecast instances in the ith category, the number of correctly forecast instances in the ith category, and the total number of records.
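Since the slide's equations did not survive, the sketch below shows one plausible reading of the quantities it describes, and should not be taken as the authors' exact formulas. It assumes that the category hit rate (cH) is the correctly forecast count divided by the observed count in category i, that the category CSI (cCSI) divides the correct count by observed plus forecast minus correct, that the overall hit rate (oH) is total correct over total records, and that the overall CSI (oCSI) pools the per-category numerators and denominators. The function name and sample data are hypothetical.

```python
from collections import Counter


def aqi_category_metrics(observed, forecast, categories=(1, 2, 3, 4, 5)):
    """Per-category and overall hit rate / CSI from paired AQI categories.

    ASSUMED definitions (the slide's own equations are not available).
    """
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must be paired")
    n_obs = Counter(observed)    # observed instances per category
    n_fcst = Counter(forecast)   # forecast instances per category
    n_correct = Counter(o for o, f in zip(observed, forecast) if o == f)

    per_category = {}
    pooled_union = 0
    for i in categories:
        union = n_obs[i] + n_fcst[i] - n_correct[i]
        pooled_union += union
        per_category[i] = {
            "cH": n_correct[i] / n_obs[i] if n_obs[i] else None,
            "cCSI": n_correct[i] / union if union else None,
        }

    total_correct = sum(n_correct[i] for i in categories)
    overall = {
        "oH": total_correct / len(observed) if observed else None,
        "oCSI": total_correct / pooled_union if pooled_union else None,
    }
    return per_category, overall


# Hypothetical paired daily AQI categories.
obs_cat = [1, 1, 1, 2, 2, 3, 1, 2]
fcst_cat = [1, 1, 2, 2, 3, 3, 1, 1]
by_category, overall = aqi_category_metrics(obs_cat, fcst_cat)
print(by_category[1], overall)
```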

Categorical Stats over 3x domain (1): Accuracy (A) and Bias (B)
The accuracy is always high (>90%) because correctly forecast non-exceedance points dominate. The bias indicates that the model has consistently overestimated exceedances through the years.

Categorical Stats over 3x domain (2): eH and eFAR
False alarm ratios are high in all years, ranging from 70 to 90% on average. Mean hit rates are generally greater than 40%, except in 2006, when the meteorological model underwent a major transition from Eta to WRF.

Categorical Stats over 3x domain (3): Critical Success Index (CSI)
The critical success index reflects the combination of false alarm ratio and hit rate. A forecast system with high FAR and high H, or with low FAR and low H, ends up with a low CSI in either case. High CSI values indicate moderate FAR and reasonable H.
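The way CSI combines the two can be written out explicitly under the standard definitions. Here h, f, and m denote hits, false alarms, and misses; these labels are chosen for this sketch and are not the slide's a, b, c, d notation.

```latex
\[
H = \frac{h}{h+m}, \qquad
\mathrm{FAR} = \frac{f}{h+f}, \qquad
\mathrm{CSI} = \frac{h}{h+f+m}
\]
\[
\frac{1}{\mathrm{CSI}} = \frac{h+f+m}{h}
  = \frac{h+m}{h} + \frac{h+f}{h} - 1
  = \frac{1}{H} + \frac{1}{1-\mathrm{FAR}} - 1
\]
```

For example, H = 0.9 with FAR = 0.9 and H = 0.1 with FAR = 0.1 both give 1/CSI ≈ 10.1, so CSI ≈ 0.10, whereas H = 0.5 with FAR = 0.5 gives CSI = 1/3.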

Metropolitan Statistical Area (MSA)
Local forecasters generally forecast the maximum AQI value that they expect to occur anywhere within an MSA, and then verify this forecast against the maximum monitored value within that area. For example, the Charlotte MSA comprises 8 counties (7 in NC, 1 in SC), which contain 8 AQS monitors (7 in NC, 1 in SC), and the NAQFC represents the MSA with 103 12-km grid cells.
[Figure: AQI O3]

MSAs used in this research
• Atlanta
• Charlotte
• Dallas
• Houston
• Washington DC

Kalman Filter Bias-adjustment
• A Kalman filter (KF) was used to bias-adjust the raw model forecasts for the continental U.S. domain during the 2005-2008 summer seasons at all locations where AIRNow monitoring data were available.
• The categorical performance of both the raw model and KF forecasts was assessed over:
  1. all sites (paired observation-model grid cell) within the domain,
  2. sites within all MSAs, and
  3. the MSA value (the maximum value over all the sites within the MSA for each day).
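The slides do not give the filter configuration, so the following is only a generic sketch of scalar Kalman-filter bias adjustment at a single site, not the setup used in the study. It assumes the forecast bias follows a random walk, that each day's forecast-minus-observation difference is a noisy measurement of that bias, and that a forecast is corrected using only the bias estimated from earlier days. The function name and the parameters `error_ratio`, `obs_var`, and the initial variance are assumptions.

```python
def kf_bias_adjust(forecasts, observations, error_ratio=0.06, obs_var=1.0):
    """Bias-adjust a daily forecast series with a scalar Kalman filter.

    error_ratio -- ASSUMED ratio of process-noise to measurement-noise variance
    obs_var     -- ASSUMED measurement-noise variance
    """
    process_var = error_ratio * obs_var
    bias = 0.0       # current estimate of the forecast bias
    bias_var = 1.0   # variance of that estimate (assumed starting value)
    adjusted = []
    for forecast, observed in zip(forecasts, observations):
        # Correct today's forecast with the bias learned from previous days.
        adjusted.append(forecast - bias)

        # Predict step: random-walk bias, so only the variance grows.
        prior_var = bias_var + process_var
        # Update step: today's error is a noisy measurement of the bias.
        gain = prior_var / (prior_var + obs_var)
        bias = bias + gain * ((forecast - observed) - bias)
        bias_var = (1.0 - gain) * prior_var
    return adjusted


# Hypothetical series with a constant +10 ppb forecast bias.
obs_series = [50.0] * 60
raw_series = [60.0] * 60
kf_series = kf_bias_adjust(raw_series, obs_series)
print(kf_series[0], round(kf_series[-1], 2))
```

The first forecast is left uncorrected because no error has been seen yet; with a persistent bias the adjusted series converges toward the observations, which matches the slides' remark that the adjustment helps most when the raw model is systematically biased.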

NAQFC Categorical Performance vs. Human Forecast
[Figure panels: Exceedance Hit Rate; Exceedance False Alarm Rate. Legend: Human, NAQFC]
Because the NAQFC is positively biased, it tends to achieve higher exceedance hit rates than the human forecasts, but also higher false alarm ratios. The critical success index results were mixed across MSAs, but on average the NAQFC performed better than the human forecasts.

cH for the raw model and KF forecasts at all sites and MSAs
• Domain All Sites: all AIRNow sites within the domain are included in the calculation.
• MSA All Sites: all AIRNow sites located in one of the MSAs listed earlier.
• MSA: the maximum values from both the AIRNow sites and the model forecasts within each of the MSAs are used to generate the statistics.
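The difference between site-level and MSA-level pairing can be sketched as follows. The helper reduces site-level pairs to one pair per MSA and day by taking the maximum observation and the maximum forecast independently. The record layout, the function name, the sample values, and the 76 ppb threshold are all hypothetical choices for illustration.

```python
def msa_daily_max(site_records):
    """Collapse (day, msa, observed, forecast) site records to MSA maxima.

    Returns {(day, msa): (max observed, max forecast)}. The two maxima may
    come from different sites within the MSA.
    """
    obs_max = {}
    fcst_max = {}
    for day, msa, observed, forecast in site_records:
        key = (day, msa)
        obs_max[key] = max(observed, obs_max.get(key, observed))
        fcst_max[key] = max(forecast, fcst_max.get(key, forecast))
    return {key: (obs_max[key], fcst_max[key]) for key in obs_max}


# Hypothetical day with two sites in one MSA (values in ppb).
site_records = [
    ("2008-07-01", "Charlotte", 80.0, 70.0),  # observed exceedance, missed
    ("2008-07-01", "Charlotte", 70.0, 82.0),  # false alarm
]
THRESHOLD_PPB = 76.0  # assumed exceedance threshold for this example

msa_pairs = msa_daily_max(site_records)
site_hits = sum(
    1 for _, _, o, f in site_records
    if o >= THRESHOLD_PPB and f >= THRESHOLD_PPB
)
msa_hits = sum(
    1 for o, f in msa_pairs.values()
    if o >= THRESHOLD_PPB and f >= THRESHOLD_PPB
)
print(site_hits, msa_hits)
```

In this made-up case a miss and a false alarm at individual sites become a single hit at the MSA level, because the exceedance is forecast somewhere in the MSA even though not at the right monitor.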

cCSI for the raw model and KF forecasts at all sites and MSAs

eH for the raw model and KF forecasts at all sites and MSAs
The hit rates are significantly higher when evaluated over MSAs than over individual sites. KF bias-adjusted forecasts improved the hit rate, especially when the raw model was significantly flawed by systematic biases, as in 2006.

eFAR for the raw model and KF forecasts at all sites and MSAs
False alarm ratios are significantly lower when evaluated over MSAs than over individual sites. The KF bias-adjusted forecasts significantly reduced FAR in all situations and in all years.

eCSI for the raw model and KF forecasts at all sites and MSAs
eCSI values almost doubled when evaluated over MSAs compared to those evaluated over individual sites. The KF bias-adjusted forecasts had larger eCSI values than the raw model forecasts, especially when evaluated over individual sites.

oH for the raw model and KF forecasts at all sites and MSAs
The overall hit rates were consistent and stable, and slowly improved over the years, for both the KF and raw model forecasts. KF forecasts always had larger oH values than the raw model. Compared with those evaluated over individual sites, oH values decreased when evaluated over MSAs (but remained > 50%) because of overestimation at low AQIs.

oCSI for the raw model and KF forecasts at all sites and MSAs
The overall critical success index (oCSI) is quite consistent and increases over the years. The oCSI values are lower when evaluated over MSAs than over individual sites because the MSA values are the maximum over all sites within the MSA, which lowers the hit rate for low AQI values (low AQIs are overestimated).

Minimum values of H and CSI during the years 2005-2008 over the continental US domain and MSAs
• (1) MSA-based analysis provides a more objective assessment of the practical use of the guidance, consistent with the way local forecasts are typically developed.
• (2) Bias-adjustment further improves the predictive skill of the system, thereby improving the utility of the forecast products.

Guidelines for AQF models
These guideline values lie between the (rounded) minimum values of the raw model forecasts and those of the KF-adjusted forecasts. They serve (1) as targets that raw models can realistically achieve through model improvements in the short term, and (2) as a reference level that any AQF model should reach when combined with KF adjustment.

Conclusions
• Comparisons indicate that the NAQFC performed at least as well as, if not better than, the human forecasts over MSAs.
• The categorical performance of the NAQFC has been consistent and stable over the years from 2005 to 2008, with the exception of 2006, when the model underwent significant changes that degraded categorical performance.
• Kalman filter bias-adjustment improved almost all categorical statistics, especially when the raw model was systematically biased in 2006.

Conclusions
• Hit Rate (H), False Alarm Ratio (FAR), and Critical Success Index (CSI) are the three most appropriate metrics for gauging the categorical performance of an AQF; CSI is even better than H and FAR because it reflects the combination of the two.
• The AQI-based H and CSI over all sites and MSAs are good indicators of overall performance for categorical forecasts.
• Based on the analysis in this study, the following guidelines are proposed: eH >= 30%, eCSI >= 20%, oH and oCSI >= 50% for all sites; eH and oH >= 50%, eCSI and oCSI >= 30% for MSAs.
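The proposed guidelines are simple lower bounds, so checking a model against them can be sketched as a table lookup. The thresholds below are the ones stated in the conclusions; the dictionary layout, the function name, and the sample metric values are hypothetical.

```python
# Guideline minimums in percent, as proposed in the conclusions.
GUIDELINES = {
    "all_sites": {"eH": 30.0, "eCSI": 20.0, "oH": 50.0, "oCSI": 50.0},
    "msa": {"eH": 50.0, "eCSI": 30.0, "oH": 50.0, "oCSI": 30.0},
}


def check_guidelines(metrics, scope):
    """Return {metric name: True if the value meets the guideline}.

    metrics -- dict of metric values in percent
    scope   -- "all_sites" or "msa"
    """
    thresholds = GUIDELINES[scope]
    return {name: metrics[name] >= minimum for name, minimum in thresholds.items()}


# Hypothetical MSA-level results for one season (percent).
sample_msa_metrics = {"eH": 62.0, "eCSI": 28.5, "oH": 55.0, "oCSI": 37.0}
result = check_guidelines(sample_msa_metrics, "msa")
print(result)
```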

Acknowledgements
The authors would like to thank the NOAA/EPA air quality forecast program and the EPA’s AIRNow program for providing forecast and observed O3 data. Thanks also go to Scott Jackson for providing the human forecast data.

Disclaimer
The United States Environmental Protection Agency, through its Office of Research and Development, funded and managed the research described here. It has been subjected to the Agency’s administrative review and approved for presentation.
