Putting non-parametric methods in the service of public health
Seydou Doumbia, MD, PhD, Professor of Epidemiology, Department of Public Health & Deputy Director of NIAID/NIH Research Program at Malaria Research & Training Center, Faculty of Medicine, University of Bamako, Mali
INTRODUCTION
• Importance of forecasting
• "Life has to be lived forward but can only be understood backward"
• The basic and ultimate purpose of forecasting is to predict what will happen in the near term in order to avoid substantial cost or loss
• The cost of poor prediction may be the loss of soldiers in war, jobs in the economy, or profit in business
• With informed opinions on future probabilities, the planner can mobilize and deploy the necessary resources and reduce the substantial cost of miscalculation
INTRODUCTION (continued)
• Predicting infectious diseases can maximize intervention impact and minimize cost
• The cost-benefit of an epidemiological intervention may be measured a posteriori or estimated a priori
• Optimum predictions may improve outcomes
• Outcome (predicted or measured) = cost-benefit, where costs are both financial and non-financial
There are myriad predictive approaches in statistical and mathematical epidemiology, ranging in complexity and generalizability.
• Most approaches are parametric and hence often difficult to optimize, disease-specific, sensitive to outliers, and setting-dependent.
• A toolbox encapsulating general-purpose approaches, applicable to different diseases and settings, is needed.
• Thus, let's discuss a few unorthodox predictive approaches that may become part of such a toolbox.
General-purpose methods
• Disease independent
• Easily operated
• Versatile
• Adaptable
Unorthodox approaches
• Non-parametric methods
• Fuzzy logic methods
• Artificial intelligence
Predicting infectious diseases
• Endemic, meso-endemic, or epidemic
• Multi- or univariate requirements
• Temporally or spatio-temporally extended
Example 1: Non-parametric approach
• Exponential smoothing methods:
  • Econometric tradition (e.g., inventory control)
  • Capture non-linearity for endemic and meso-endemic time-series (climates, geography, demography)
  • Learn from experience (adapt to time-series perturbations)
  • Usually univariate, yet covariates may be introduced
• District of Niono, Mali:
  • Meso-endemic time-series: diarrhea, acute respiratory infection (ARI), malaria
  • Endemic time-series: schistosomiasis
• Sub-optimum for epidemic time-series
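To make the exponential smoothing idea concrete, here is a minimal sketch of a seasonal multiplicative Holt-Winters forecaster. It is an illustration only, not the implementation used in the Niono studies: the function name, the initialisation of level, trend, and seasonal indices, and the smoothing parameters `alpha`, `beta`, and `gamma` are all assumptions, and the series is assumed to be strictly positive.

```python
def holt_winters_multiplicative(series, period, alpha, beta, gamma, horizon):
    """Return `horizon` forecasts from a seasonal multiplicative Holt-Winters fit.

    Assumptions of this sketch: `series` is strictly positive and covers at
    least two full seasons; initial values come from the first two seasons.
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    first, second = series[:period], series[period:2 * period]
    level = sum(first) / period                         # initial level
    trend = (sum(second) - sum(first)) / period ** 2    # initial per-step trend
    seasonal = [x / level for x in first]               # initial seasonal indices

    # Update level, trend, and the matching seasonal index at each time step.
    for t in range(period, len(series)):
        y = series[t]
        s = seasonal[t % period]
        prev_level = level
        level = alpha * (y / s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (y / level) + (1 - gamma) * s

    # h-step-ahead forecast: extrapolated level times the seasonal index
    # of the month being forecast.
    n = len(series)
    return [
        (level + h * trend) * seasonal[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]
```

With monthly consultation rates one would typically call it with `period=12` and `horizon=2` or `horizon=3`; in practice the three smoothing parameters would be chosen by minimising forecast error on held-out data.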
Irrigation system and stagnant water reservoirs in the district of Niono, Mali.
Observed diarrhea consultation rate time-series are depicted as black lines while red and blue traces correspond to contemporaneous 2- and 3-month horizon forecasts, respectively; their 95% prediction interval bounds are symbolized by dots of the same colors. Forecasts and prediction interval bounds are calculated with a bootstrap-coupled seasonal multiplicative Holt-Winters method. Panel A: 0–11 months; Panel B: 1–4 years; Panel C: 5–15 years; and, Panel D: >15 years. Medina DC et al. (2007) Forecasting Non-Stationary Diarrhea, Acute Respiratory Infection, and Malaria Time-Series in Niono, Mali. PLoS ONE 2(11): e1181.
Observed ARI consultation rate time-series are depicted as black lines while red and blue traces correspond to contemporaneous 2- and 3-month horizon forecasts, respectively; their 95% prediction interval bounds are symbolized by dots of the same colors. Forecasts and prediction interval bounds are calculated with a bootstrap-coupled seasonal multiplicative Holt-Winters method. Panel A: 0–11 months; Panel B: 1–4 years; Panel C: 5–15 years; and, Panel D: >15 years. Medina DC et al. (2007) Forecasting Non-Stationary Diarrhea, Acute Respiratory Infection, and Malaria Time-Series in Niono, Mali. PLoS ONE 2(11): e1181.
Observed malaria consultation rate time-series are depicted as black lines while red and blue traces correspond to contemporaneous 2- and 3-month horizon forecasts, respectively; their 95% prediction interval bounds are symbolized by dots of the same colors. Forecasts and prediction interval bounds are calculated with a bootstrap-coupled seasonal multiplicative Holt-Winters method. Panel A: 0–11 months; Panel B: 1–4 years; Panel C: 5–15 years; and, Panel D: >15 years. Medina DC et al. (2007) Forecasting Non-Stationary Diarrhea, Acute Respiratory Infection, and Malaria Time-Series in Niono, Mali. PLoS ONE 2(11): e1181.
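The captions above mention bootstrap-based 95% prediction intervals without detailing the procedure. One common way to obtain such bounds, sketched here purely as an assumption, is to resample past h-month-ahead forecast errors and add them to the point forecast. The function name, the additive error model, and the percentile rule are hypothetical choices for illustration, not the published method.

```python
import random


def bootstrap_prediction_interval(point_forecast, past_errors,
                                  n_boot=1000, level=0.95, seed=0):
    """Percentile prediction interval from resampled forecast errors.

    `past_errors` are assumed to be (observed - forecast) values collected
    at the same horizon as `point_forecast`.
    """
    if not past_errors:
        raise ValueError("past_errors must not be empty")
    rng = random.Random(seed)
    # Simulate future values by adding a randomly drawn past error.
    simulated = sorted(point_forecast + rng.choice(past_errors)
                       for _ in range(n_boot))
    lower_idx = int(round((1 - level) / 2 * (n_boot - 1)))
    upper_idx = int(round((1 + level) / 2 * (n_boot - 1)))
    return simulated[lower_idx], simulated[upper_idx]
```

For a multiplicative model, resampling ratios (observed / forecast) and multiplying instead of adding would be the analogous choice.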
Thus, the accuracy of SA3 degrades faster than that of the multiplicative Holt-Winters (MHW) method as the forecast horizon increases. Medina DC et al. (2007) Forecasting Non-Stationary Diarrhea, Acute Respiratory Infection, and Malaria Time-Series in Niono, Mali. PLoS ONE 2(11): e1181.
Observed Schistosoma haematobium consultation rate time-series in the district of Niono, Mali, are depicted as black lines in this composite panel while red traces correspond to contemporaneous h-month horizon forecasts; 95% prediction interval bounds are symbolized by red dots. Forecasts were generated with exponential smoothing (ES) methods, which are encapsulated within the state-space forecasting framework. Panels A, B, C, and D correspond to 2-, 3-, 4-, and 5-month horizon forecasts, respectively. Medina DC et al. (2008) State–Space Forecasting of Schistosoma haematobium Time-Series in Niono, Mali. PLoS Negl Trop Dis 2(8): e276.
Mean absolute percentage error (MAPE) values between Schistosoma haematobium time-series observations for the district of Niono, Mali, and their corresponding h-month horizon forecasts measure external accuracy, that is, the skill of the h-month horizon forecasts. MAPE values for 1–5 month horizon forecasts were circa 25%. Therefore, this panel demonstrates that forecast accuracy is reasonable for short horizons. Medina DC et al. (2008) State–Space Forecasting of Schistosoma haematobium Time-Series in Niono, Mali. PLoS Negl Trop Dis 2(8): e276.
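For reference, MAPE is the average of the absolute forecast errors expressed as a percentage of the observed values. A minimal sketch follows; the function name and the choice to reject zero observations are assumptions of this example.

```python
def mape(observed, forecast):
    """Mean absolute percentage error, in percent."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        # MAPE is undefined when an observed value is zero.
        raise ValueError("observed values must be non-zero")
    total = sum(abs((o - f) / o) for o, f in zip(observed, forecast))
    return 100.0 * total / len(observed)
```

For example, observations of 100 and 200 with forecasts of 75 and 250 are each off by 25% of the observed value, giving a MAPE of 25.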
Example 2: Knowledge-driven approach
• Fuzzy logic functions (e.g., trigonometric, weighted, etc.):
  • Engineering tradition
  • Attempt to assign membership to an item with different degrees of certainty
  • Knowledge- and/or data-driven
  • Capture non-linearity (climates, geography, demography)
  • Learn from experience
  • Usually multivariate
  • Optimum for spatially extended systems with scarce data
• African continent:
  • Rift Valley fever
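As an illustration of a trigonometric fuzzy membership function, the sketch below maps a raw environmental value onto a degree of membership between 0 and 1 using a sine-squared ramp, then rescales it to a 0 to 255 byte score. The function names and the control points `low` and `high` are hypothetical; real control points would come from expert knowledge or data.

```python
import math


def sigmoidal_membership(x, low, high):
    """Monotonically increasing fuzzy membership in [0, 1].

    Values at or below `low` are fully unsuitable (0), values at or above
    `high` are fully suitable (1), and a sine-squared curve joins the two.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    fraction = (x - low) / (high - low)
    return math.sin(fraction * math.pi / 2) ** 2


def to_byte_score(membership):
    """Rescale a membership in [0, 1] to an integer score in 0..255."""
    return int(round(255 * membership))
```

A value halfway between the two control points receives a membership of 0.5, so partial suitability is expressed as a degree rather than a yes/no classification.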
Endemic suitability map for Rift Valley fever in Africa based on ordered weighted averages analysis. Suitability scores range from 0 (completely unsuitable) to 255 (completely suitable). Clements et al. International Journal of Health Geographics 2006, 5:57
Epidemic suitability map for Rift Valley fever in Africa based on ordered weighted averages analysis. Suitability scores range from 0 (completely unsuitable) to 255 (completely suitable). Clements et al. International Journal of Health Geographics 2006, 5:57
Overlay of observed serological prevalence and estimated endemic suitability for Rift Valley fever in Senegal (ruminant). Suitability estimates were derived using weighted linear combination. Clements et al. International Journal of Health Geographics 2006, 5:57
Overlay of observed serological prevalence and estimated epidemic suitability for Rift Valley fever in Senegal (ruminant). Suitability estimates were derived using weighted linear combination. Clements et al. International Journal of Health Geographics 2006, 5:57
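The two aggregation rules named in these captions can be sketched for a single map cell as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the factor scores are assumed to be already standardised to 0 to 255, and the ordered weighted average here applies order weights to scores ranked from lowest to highest, omitting the factor-weighting step that full GIS implementations usually apply first.

```python
def _check_weights(values, weights):
    if len(values) != len(weights) or not values:
        raise ValueError("scores and weights must be non-empty and of equal length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")


def weighted_linear_combination(scores, factor_weights):
    """Each factor's score is multiplied by that factor's own weight."""
    _check_weights(scores, factor_weights)
    return sum(w * s for w, s in zip(factor_weights, scores))


def ordered_weighted_average(scores, order_weights):
    """Weights attach to rank positions (lowest score first), not to factors."""
    _check_weights(scores, order_weights)
    ranked = sorted(scores)
    return sum(w * s for w, s in zip(order_weights, ranked))
```

Under this convention, order weights of `[1, 0, 0]` return the minimum score (a risk-averse, AND-like rule), `[0, 0, 1]` return the maximum (an OR-like rule), and equal order weights return the plain mean.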
Example 3: Artificial intelligence approach
• Support vector machines:
  • Artificial intelligence tradition: kernel methods, support vector machines (regression, classification, anomaly detection), neural networks
  • Solve problems for which analytical treatment is lacking or intractable
  • Capture non-linearity (climates, geography, demography)
  • Learn from experience
  • Univariate or multivariate
  • Temporally or spatio-temporally extended
• Support vector regression (SVR):
  • Kernel-based: transforms the data set into a linear feature space
  • Large data sets; automatic regularization
  • Highly generalizable
A support vector machine effectively kernel-transforms non-linear input data into a high-dimensional feature space in which a simple linear regression can be carried out. The output is always in the original dimension.
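The kernel idea can be illustrated with a small, dependency-free sketch. Note that this is kernel ridge regression, a simpler relative of SVR that uses a squared-error loss instead of SVR's epsilon-insensitive loss; it is shown only because it makes the "linear regression in a kernel-induced feature space" step explicit. The function names and the `gamma` and `ridge` defaults are assumptions.

```python
import math


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_kernel_ridge(points, targets, gamma=10.0, ridge=1e-3):
    """Fit kernel ridge regression and return a prediction function."""
    n = len(points)
    # Kernel matrix plus a ridge term on the diagonal (the regularization).
    k = [[rbf_kernel(points[i], points[j], gamma) + (ridge if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    coefficients = solve_linear_system(k, targets)

    def predict(x):
        # Prediction is a kernel-weighted sum over the training points,
        # returned in the original units of `targets`.
        return sum(c * rbf_kernel(p, x, gamma)
                   for c, p in zip(coefficients, points))

    return predict
```

The fitted function is non-linear in the inputs even though the fit itself only solves a linear system, which is the point the slide makes about working in the transformed space.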
Somalia:
• Ruminant IgG sero-prevalence
• Two-stage cluster-randomized serological survey
• Spatial estimates with SVR
• Built-in bootstrap for dispersion estimation
Figure 8. Spatially resolved ruminant serological prevalence. Centrality and dispersion were calculated via B = 100 ordinary bootstraps of multivariate observations, SVR-based spatially resolved prevalence estimation for each re-sample, and finally computation of adequate order statistics. A) median, B) maximum, C) IQR, and D) minimum. Courtesy of Daniel Medina.
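The bootstrap procedure in the caption can be sketched generically: resample the observations with replacement B times, re-estimate on each re-sample, and summarise the estimates with order statistics. In this sketch the estimator is passed in as a function; the function name is hypothetical, and any stand-in estimator (such as a simple proportion) is an assumption, not the SVR-based spatial estimator used for the figure.

```python
import random
import statistics


def bootstrap_order_statistics(observations, estimator, n_boot=100, seed=0):
    """Median, minimum, maximum, and IQR of an estimator over bootstrap re-samples."""
    if not observations:
        raise ValueError("observations must not be empty")
    rng = random.Random(seed)
    n = len(observations)
    estimates = []
    for _ in range(n_boot):
        # Ordinary bootstrap: draw n observations with replacement.
        resample = [observations[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return {
        "median": statistics.median(estimates),
        "min": min(estimates),
        "max": max(estimates),
        "iqr": q3 - q1,
    }
```

For a spatial map, the same summary would be computed cell by cell, yielding one map each for the median, maximum, IQR, and minimum.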
Conclusion
• Non-parametric approaches may be applied to multiple diseases and settings without parametric disadvantages such as multicollinearity and sensitivity to outliers.
• Although non-parametric approaches resemble a "black box", they are robust, simply interpreted, and easily optimized.
• Fuzzy logic is ideal for spatially extended areas for which transmission is epidemic and/or data are scarce (thus minimizing data collection needs).
• The general-purpose nature of non-parametric, fuzzy logic, and artificial intelligence approaches implies that studies of multiple diseases and sites could be better compared.
• Adequate predictions maximize intervention impact and minimize costs.
Acknowledgements Thanks to the organizers, participants, Malaria Research & Training Center, Mali; Columbia University, US; and the District Hospital of Niono, Mali.