Learn about quality control processes, data cleanliness methods, and tools to enhance data reliability in health informatics projects.
Data Quality Data Cleaning Beverly Musick, M.S. May 20, 2010 This module was recorded at the health informatics training course (data management series) offered by the Regional East African Centre for Health Informatics (REACH-Informatics) in Eldoret, Kenya. Funding was made possible by NIH’s Fogarty Center. The training was held at the Academic Model Providing Access to Healthcare (AMPATH), a USAID-funded program, supported by the Regenstrief Institute at Indiana University. The modules were created in collaboration with the School of Informatics at IUPUI. Creative Commons Attribution-ShareAlike 3.0 Unported License
Quality Control • Quality Control is the process of monitoring and maintaining the reliability, accuracy, and completeness of the data during the conduct of the project. • Requires a multidisciplinary team which includes clinicians, data entry staff, statisticians, systems administrators, and data managers. • Requires sharing knowledge about disease progression, clinical practice patterns, effects of medical treatments, relationships between variables, and expected timing of events.
Ensuring Data Quality • Point of Assessment • Collection: review form before patient leaves the clinic • Entry: range restrictions, logical checks • Post-entry clean-up queries • Statistical Analysis: data trends
Ensuring Data Quality (cont.) • To ensure data quality the data manager needs to understand: • Goals of program • Standards of operation • Impact of intervention or program • Relationships between variables • Expected timing of events
Clean-up Queries Missing Data • Generate reports regarding the percent of missing data for each item on the data collection forms • Highlight differences between programs or specific groups of patients in order to identify methods to minimize missing data
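A missing-data report of the kind described above could be sketched as follows. This is a minimal illustration, not the course's own tooling: the field names (`weight_kg`, `cd4`) and the sample records are hypothetical, and a value is treated as missing when it is `None` or an empty string.

```python
def percent_missing(records, fields):
    """Return {field: percent of records where the field is None or ''}."""
    total = len(records)
    result = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        result[field] = 100.0 * missing / total if total else 0.0
    return result


# Hypothetical example records; field names are illustrative only.
records = [
    {"patient_id": 1, "weight_kg": 61.0, "cd4": 350},
    {"patient_id": 2, "weight_kg": None, "cd4": 410},
    {"patient_id": 3, "weight_kg": 58.5, "cd4": None},
    {"patient_id": 4, "weight_kg": None, "cd4": 290},
]

report = percent_missing(records, ["weight_kg", "cd4"])
for field, pct in report.items():
    print(f"{field}: {pct:.1f}% missing")
```

Running the same function separately on the records of each program or patient group gives the side-by-side comparison the slide suggests.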
Clean-up Queries Date Comparison • Ensure that the date of birth precedes all other dates. • Calculate age and verify that the date of birth makes sense. • For patients who have died, ensure that the date of death follows all other dates.
Clean-up Queries Date Comparison (cont.) • Generate a clean-up list for observation dates that are after today’s date or, preferably, the date of data entry. • Generate a similar list for observation dates that precede the date of inception of your program. • Examine the interval between observation/visit dates to ensure that the expected time frame is reflected.
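The date comparisons from the two slides above can be combined into one clean-up list per patient. The sketch below assumes a hypothetical record layout (`date_of_birth`, `date_of_death`, `visit_dates`); the program start date is an illustrative placeholder, not a date taken from the course.

```python
from datetime import date


def date_problems(patient, program_start, entry_date):
    """Return a list of human-readable date problems for one patient."""
    problems = []
    dob = patient["date_of_birth"]
    death = patient.get("date_of_death")
    for visit in patient["visit_dates"]:
        if visit < dob:
            problems.append(f"visit {visit} precedes date of birth")
        if visit > entry_date:
            problems.append(f"visit {visit} is after data entry date")
        if visit < program_start:
            problems.append(f"visit {visit} precedes program start")
        if death is not None and visit > death:
            problems.append(f"visit {visit} follows date of death")
    return problems


# Hypothetical values for illustration only.
PROGRAM_START = date(2005, 1, 1)
ENTRY_DATE = date(2010, 5, 20)
patient = {
    "date_of_birth": date(1980, 5, 1),
    "date_of_death": None,
    "visit_dates": [date(2008, 6, 1), date(2011, 1, 15)],
}

problems = date_problems(patient, PROGRAM_START, ENTRY_DATE)
print(problems)
```

Comparing against the date of data entry rather than today's date, as the slide recommends, keeps the check valid when old records are re-run later.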
Clean-up Queries Checks on Numeric Data • Confirm all values are within the expected range. • Investigate possible outliers by verifying against the source document, comparing with other values for the same subject, or cross-referencing with other variables, such as current illnesses in the case of an elevated lab result. • Confirm that values make sense with respect to the patient’s age, gender, disease status, etc.
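A range check like the first bullet above might look like this. The ranges and field names are assumptions chosen for illustration; they are not clinical reference limits, and a real project would set them with its clinicians.

```python
# Hypothetical plausible ranges per field: (low, high), inclusive.
EXPECTED_RANGES = {
    "weight_kg": (20, 200),
    "hemoglobin_g_dl": (3, 20),
}


def out_of_range(record, ranges=EXPECTED_RANGES):
    """Return {field: value} for values outside their expected range.

    Missing values (None) are skipped; they belong in the missing-data report.
    """
    flagged = {}
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            flagged[field] = value
    return flagged


# 610 is likely a misplaced decimal point for 61.0 kg.
flagged = out_of_range({"weight_kg": 610, "hemoglobin_g_dl": 12.1})
print(flagged)
```

A flagged value is only a candidate for review: as the slide notes, it still has to be verified against the source document before it is changed.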
Clean-up Queries Checks on Adult Heights/Weights • Calculate BMI from height and weight (BMI = weight (kg) / [height (m)]²). • Most values should fall between 10 and 40. • Flag unexpected weight fluctuations.
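The adult height/weight checks can be sketched as below, using the standard BMI definition (weight in kilograms divided by the square of height in metres) and the 10 to 40 range from the slide. The 20% threshold for a "weight fluctuation" between consecutive visits is an assumption made for this example only.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def flag_bmi(weight_kg, height_m, low=10, high=40):
    """Return (bmi, flagged); flagged is True when BMI is outside [low, high]."""
    value = bmi(weight_kg, height_m)
    return value, not (low <= value <= high)


def weight_jumps(weights_kg, max_change=0.20):
    """Return index pairs of consecutive weights that differ by more than
    max_change, expressed as a fraction of the earlier weight."""
    return [
        (i, i + 1)
        for i in range(len(weights_kg) - 1)
        if abs(weights_kg[i + 1] - weights_kg[i]) / weights_kg[i] > max_change
    ]


print(flag_bmi(70, 1.75))   # plausible adult
print(flag_bmi(70, 175))    # height entered in cm instead of m: flagged
print(weight_jumps([62.0, 61.5, 16.0, 60.8]))  # 16.0 is probably a typo for 61.0
```

An implausible BMI often points to a unit error (centimetres for metres, pounds for kilograms) rather than a wrong measurement, which is why the check is useful even when both height and weight pass their own range checks.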
Clean-up Queries Checks on Pediatric Heights/Weights • Calculate weight-for-age Z-scores using Epi Info NutStat software (http://www.cdc.gov/epiinfo/) or SAS software (http://www.cdc.gov/nccdphp/dnpao/growthcharts/resources/sas.htm) • Review date of birth, visit date, age and weight for Z-scores less than -5 or greater than 5. • Similar checks can be made with height-for-age and weight-for-height Z-scores.
Clean-up Queries Checks on Numeric Data (cont.) • Review longitudinal data. • If special missing values are coded, ensure that the codes do not overlap with valid data. • For lab results, a qualifier such as < or > should be stored in a separate variable.
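Two of the points above lend themselves to a short sketch: checking that special missing-value codes cannot collide with valid data, and storing a lab qualifier separately from the number. The codes, the valid range, and the raw string format (`"<50"`) are hypothetical examples.

```python
def overlapping_codes(missing_codes, valid_range):
    """Return the missing-value codes that fall inside the valid data range."""
    low, high = valid_range
    return [code for code in missing_codes if low <= code <= high]


def split_qualifier(raw):
    """Split a lab result such as '<50' into (qualifier, numeric value).

    The qualifier is '' when the result carries no '<' or '>' prefix.
    Raises ValueError if the remaining text is not a number.
    """
    text = raw.strip()
    qualifier = ""
    if text[:1] in ("<", ">"):
        qualifier, text = text[0], text[1:].strip()
    return qualifier, float(text)


# 99 as a "missing" code for weight collides with a real weight of 99 kg.
print(overlapping_codes([99, 999, -9], (0, 150)))
print(split_qualifier("<50"))
print(split_qualifier("1200"))
```

Keeping the qualifier in its own variable leaves the numeric column usable for range checks and analysis, while still recording that the true value lies below or above the reported limit.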
Clean-up Queries Cross-Variable Checks • Confirm that there is consistency between gender and other variables such as pregnancy. • Look for contraindicated medication combinations. • Look for data that may have been recorded under the wrong patient ID.
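As one example of a cross-variable check, the gender and pregnancy consistency rule could be written as follows. The field names and the coding (`"M"`/`"F"`, a boolean `pregnant` flag) are assumptions for illustration; real data sets will code these differently.

```python
def gender_pregnancy_conflicts(records):
    """Return patient IDs recorded as male and also recorded as pregnant."""
    return [
        r["patient_id"]
        for r in records
        if r.get("gender") == "M" and r.get("pregnant") is True
    ]


# Hypothetical records for illustration only.
records = [
    {"patient_id": 101, "gender": "F", "pregnant": True},
    {"patient_id": 102, "gender": "M", "pregnant": True},
    {"patient_id": 103, "gender": "M", "pregnant": False},
]

conflicts = gender_pregnancy_conflicts(records)
print(conflicts)
```

A conflict does not say which field is wrong: the gender may be miscoded, or the pregnancy observation may have been entered under the wrong patient ID, so each case goes back to the source document.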