Limitations of Statistical Measures of Error in Assessing the Accuracy of Glucose Sensors

Craig Kollman1, Darrell Wilson2, Tim Wysocki3, Rosanna Fiallo-Scharer4, Eva Tsalikian5, William Tamborlane6, Roy Beck1, Katrina Ruedy1, and the Diabetes Research In Children Network (DirecNet) Study Group. 1Jaeb Center for Health Research, Tampa, FL; 2Division of Pediatric Endocrinology and Diabetes, Stanford University, Stanford, CA; 3Nemours Children’s Clinic, Jacksonville, FL; 4Barbara Davis Center for Childhood Diabetes, University of Colorado, Denver, CO; 5Department of Pediatrics, University of Iowa, Carver College of Medicine, Iowa City, IA; 6Department of Pediatrics, Yale University School of Medicine, New Haven, CT.

Abstract • The appropriate set of accuracy measures to evaluate near continuous glucose monitoring remains to be developed. Traditional methods applied to glucose meters do not adequately capture the time dimension of glucose sensor data. Moreover, some of these methods have substantial limitations that should be understood in order to place analysis results in proper context. We highlight these limitations using data from an inpatient study conducted by the DirecNet Study Group. • Error grid analyses and the area under the curve (AUC) for the detection of hypoglycemia are commonly cited statistics. The percentage of values within error grid zones A+B is usually quite high, even for inaccurate sensors, potentially giving a false sense of accuracy. When we simulated artificially inaccurate sensors by randomly shuffling the pairing of sensor readings with laboratory reference glucose values, 76% and 78% of pairs still fell within zones A+B for the Clarke and modified error grids, respectively. The mean AUC value was 62%. • Correlation is also frequently used to quantify the accuracy of glucose sensors. This measure, however, is sensitive to the variation in true glucose levels. Simulations were run distributing the “true” glucose levels uniformly over the ranges indicated below (N=10,000 per sensor) and adding a normally distributed error (standard deviation 25 mg/dL in each case) to obtain the sensor value. These simulated sensors all have identical levels of accuracy, but their correlation values vary considerably. • In summary, the zone A+B percentage in error grid analysis and the AUC statistic can give misleading notions of sensor accuracy. The correlation coefficient is not a consistent measure of sensor accuracy. Novel statistical approaches are needed to better characterize the near continuous nature of these sensors.

Statistical Methods for Assessing Glucose Accuracy • Originally developed for glucose meters. • Do not capture the near continuous nature of glucose sensors. • Difficult to assess trends. • How well do sensors characterize acute changes in glucose?

Traditional Measures of Accuracy • Error Grid Analysis • Receiver Operating Characteristics (ROC) • Area Under the Curve (AUC) • Correlation • Differences between Reference and Sensor Glucose Values: difference, absolute difference, relative difference, relative absolute difference (RAD)
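The four difference-based measures listed above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function name `difference_measures`, the sign convention (sensor minus reference), and the choice to express relative measures as fractions of the reference value are all assumptions.

```python
import numpy as np

def difference_measures(reference, sensor):
    """Return the four paired-difference measures for reference/sensor glucose (mg/dL).

    Assumed convention: difference = sensor - reference, and relative
    measures are fractions of the reference value.
    """
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    diff = sensor - reference
    return {
        "difference": diff,
        "absolute_difference": np.abs(diff),
        "relative_difference": diff / reference,
        "relative_absolute_difference": np.abs(diff) / reference,  # RAD
    }

# Three hypothetical reference/sensor pairs (illustrative values only).
measures = difference_measures([100, 200, 50], [110, 180, 60])
median_rad = float(np.median(measures["relative_absolute_difference"]))
print(f"median RAD = {median_rad:.2f}")
```

Summaries such as the mean or median of each measure are a common way to report these quantities; which summary to use is left open here.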

Goals of Error Grid Analysis • Want to distinguish clinically meaningful vs. less important errors in glucose measurements. • When would an erroneous value lead to an incorrect treatment decision? • Sensors not approved for real time treatment decisions. • Divide measurement errors into zones to distinguish increasing clinical significance of errors.

Potential Problems with Error Grid Analysis • Zones A and B are narrower from 50-70 mg/dL. Many studies include few or no glucose values in this range. • Sensor accuracy often measured by the percentage of points falling in zones A+B. • This percentage can appear high, even for very inaccurate sensors. • Can give a misleading notion of sensor accuracy through chance agreement.

Area Under the Curve (AUC) • Often used to measure how accurately hypoglycemia (or hyperglycemia) is detected. • Receiver Operating Characteristics (ROC) analysis. • Look at different alarm levels that could be used and assess the sensitivity/specificity trade-off
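The ROC construction described above can be sketched as follows: sweep every candidate alarm level, record the sensitivity and false-positive rate at each, and integrate the resulting curve. The hypoglycemia cutoff of 70 mg/dL on the reference value, the rule that an alarm fires when the sensor reads at or below the alarm level, and the names `roc_points` and `area_under_curve` are assumptions made for illustration; the data are made up.

```python
import numpy as np

def roc_points(reference, sensor, hypo_threshold=70.0):
    """Sensitivity and false-positive rate for each candidate alarm level.

    Assumptions: true hypoglycemia means reference <= hypo_threshold, and
    an alarm fires whenever the sensor value is <= the alarm level.
    """
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    is_hypo = reference <= hypo_threshold
    alarm_levels = np.unique(sensor)  # sorted candidate alarm settings
    sensitivity = [np.mean(sensor[is_hypo] <= a) for a in alarm_levels]
    false_pos = [np.mean(sensor[~is_hypo] <= a) for a in alarm_levels]
    # Start the curve at (0, 0): an alarm level below every sensor value.
    return (np.concatenate(([0.0], false_pos)),
            np.concatenate(([0.0], sensitivity)))

def area_under_curve(false_pos, sensitivity):
    """Trapezoidal area under the ROC curve."""
    widths = np.diff(false_pos)
    heights = (sensitivity[1:] + sensitivity[:-1]) / 2.0
    return float(np.sum(widths * heights))

# Hypothetical paired values in mg/dL (illustrative only).
reference = [50, 60, 65, 90, 120, 150, 200]
sensor = [55, 80, 62, 75, 130, 140, 210]
fpr, sens = roc_points(reference, sensor)
area = area_under_curve(fpr, sens)
print(f"AUC = {area:.3f}")
```

Because the area integrates over all alarm levels, including very high ones, a single AUC number says little about performance at the alarm settings that would actually be used.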

Simulation Experiment • Use data from the DirecNet Inpatient Accuracy Study. • Make sensors artificially inaccurate by randomly shuffling the pairings with the reference glucose. • Resulting “sensors” still have realistic glucose distribution. • Look at error grid and AUC analyses on resulting simulated data set. • Repeat 10,000 times.
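The shuffling procedure above can be sketched as a permutation baseline. This is a rough illustration under several assumptions: the DirecNet data are not reproduced here, so synthetic values stand in for them; sensor values are permuted across all pairs (the source does not say whether shuffling was restricted, for example within subject); and the statistic `within_20_percent` is a simple stand-in, not the Clarke or modified error grid. The numbers it produces are therefore not comparable to the reported results.

```python
import numpy as np

def within_20_percent(reference, sensor):
    """Stand-in accuracy statistic (NOT an error grid): share of sensor
    values within 20% of the paired reference value."""
    return float(np.mean(np.abs(sensor - reference) <= 0.2 * reference))

def shuffled_baseline(reference, sensor, statistic, n_repeats=1000, seed=0):
    """Recompute `statistic` after randomly re-pairing sensor and reference.

    Each repeat permutes the sensor values, so the marginal glucose
    distribution is preserved while the pairing carries no information.
    """
    rng = np.random.default_rng(seed)
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    values = np.empty(n_repeats)
    for i in range(n_repeats):
        values[i] = statistic(reference, rng.permutation(sensor))
    return values

# Synthetic stand-in data (assumed ranges and error size, mg/dL).
rng = np.random.default_rng(1)
reference = rng.uniform(60, 300, size=500)
sensor = reference + rng.normal(0, 25, size=500)

observed = within_20_percent(reference, sensor)
baseline = shuffled_baseline(reference, sensor, within_20_percent, n_repeats=200)
print(f"observed = {observed:.2f}, shuffled mean = {baseline.mean():.2f}")
```

Comparing the observed statistic with its shuffled baseline shows how much of the headline figure is attributable to chance agreement alone.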

Results for Shuffled Sensors • Clarke Error Grid: 76% in Zones A+B • Modified Error Grid: 78% in Zones A+B • AUC: 62% (mean value)

Remarks • Zones A and B on error grids are large enough that even inaccurate sensors will hit them the majority of the time. • Much of the ROC curve involves alarm levels that would not realistically be used in practice. • Resulting AUC value puts too much weight on high alarm settings.

Correlation • Statistical measure of association. • Number between –1 and +1. • Often used as a measure of sensor accuracy. • Sensitive to the amount of variation in the true glucose.

Another Simulation • Create 4 simulated sensors so that each has identical accuracy. • Do this by taking the sensor value to be the “true” value plus a normally distributed error with standard deviation = 25 mg/dL. • Average value of the “true” glucose is 200 mg/dL for all 4 simulated sensors. • Vary the range of true glucose values for each sensor.
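The simulation described above can be sketched as follows, using the four ranges, the error standard deviation of 25 mg/dL, and N = 10,000 stated in the text. The function name `simulate_sensor` and the random seed are assumptions, so individual values will differ slightly from any published run.

```python
import numpy as np

def simulate_sensor(low, high, error_sd=25.0, n=10_000, seed=0):
    """Uniform 'true' glucose on [low, high] plus normal sensor error."""
    rng = np.random.default_rng(seed)
    true_glucose = rng.uniform(low, high, size=n)
    sensor = true_glucose + rng.normal(0.0, error_sd, size=n)
    return true_glucose, sensor

# Four ranges, all centered at 200 mg/dL, differing only in width.
ranges = [(175, 225), (150, 250), (100, 300), (50, 350)]
results = {}
for low, high in ranges:
    true_glucose, sensor = simulate_sensor(low, high)
    pearson_r = float(np.corrcoef(true_glucose, sensor)[0, 1])
    rmse = float(np.sqrt(np.mean((sensor - true_glucose) ** 2)))
    results[(low, high)] = (pearson_r, rmse)
    print(f"{low}-{high} mg/dL: r = {pearson_r:.2f}, RMSE = {rmse:.1f} mg/dL")
```

The root-mean-square error stays near 25 mg/dL for every range, while the correlation rises as the range widens, which is the point of the exercise: the error is the same, only the spread of true glucose changes.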

Simulated Sensors with Identical Accuracy (N = 10,000 data pairs per sensor)

Sensor #   Range of True Glucose (mg/dL)   Pearson Correlation   Intraclass Correlation
1          175-225                         0.50                  0.40
2          150-250                         0.76                  0.73
3          100-300                         0.92                  0.91
4          50-350                          0.96                  0.96
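The tabulated correlations can also be checked analytically. The sketch below assumes the true glucose $T$ is uniform on $[a, b]$ and the sensor error $\varepsilon$ is independent of $T$ with standard deviation $\sigma_\varepsilon = 25$ mg/dL. The intraclass formula assumes a form that pools the two variances; the source does not state which intraclass correlation was used, but this form reproduces the tabulated values.

```latex
\sigma_T^2 = \frac{(b-a)^2}{12}, \qquad
r = \frac{\operatorname{Cov}(T,\, T+\varepsilon)}
         {\sigma_T \sqrt{\sigma_T^2 + \sigma_\varepsilon^2}}
  = \sqrt{\frac{\sigma_T^2}{\sigma_T^2 + \sigma_\varepsilon^2}}, \qquad
\mathrm{ICC} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_\varepsilon^2 / 2}

% Sensor 1 (175-225 mg/dL): \sigma_T^2 = 50^2/12 \approx 208.3,\ \sigma_\varepsilon^2 = 625
r = \sqrt{208.3 / 833.3} \approx 0.50, \qquad
\mathrm{ICC} = 208.3 / 520.8 \approx 0.40

% Sensor 4 (50-350 mg/dL): \sigma_T^2 = 300^2/12 = 7500
r = \sqrt{7500 / 8125} \approx 0.96, \qquad
\mathrm{ICC} = 7500 / 7812.5 = 0.96
```

Both statistics depend on $\sigma_T^2$, the spread of the true glucose values, even though $\sigma_\varepsilon$ is held fixed.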

Summary • Error grid analysis and AUC values can give inflated notions of sensor accuracy. • Important to understand that “baseline” values of these statistics from inaccurate sensors are already high. • Correlation is not a consistent measure of sensor accuracy.

Further Research • Develop statistical measures that can incorporate the near continuous nature of sensor values. • Assess sensors’ ability to detect acute changes in glucose. • Measure any time lag in glucose readings.

DirecNet Study Group • Stanford University: Bruce Buckingham, Darrell Wilson, Jennifer Block, Paula Clinton • Yale University: William Tamborlane, Stuart Weinzimer, Elizabeth Boland, Kristen Sikes, Amy Steffen • Jaeb Center for Health Research: Roy Beck, Katrina Ruedy, Craig Kollman, Dongyuan Xing, Cynthia Silvester • Barbara Davis Center: H. Peter Chase, Rosanna Fiallo-Scharer, Jennifer Fisher, Barbara Tallant • University of Iowa: Eva Tsalikian, Michael Tansey, Linda Larson, Julie Coffey, Amy Sheehan • Nemours Children’s Clinic: Tim Wysocki, Nelly Mauras, Keisha Bird, Kelly Lofton
