Information Management Framework Data Quality
What is quality • Quality is a dynamic concept that changes continuously in response to changing customer requirements • Defined in 3 ways: • Conformance to specifications (DQA) • Fitness for use (Surveys)
Quality issues • Problems can result from: • Human error • Machine error • Process error
Conformance to specifications: • Quality Plan • Data Quality Assessments
DQ Assessment and Remediation Process • [Figure: process diagram with the elements Data Remediation, Approval / Priority Process, Part 3 Data Management Plan, Data Quality Assessment, Audit Recommendations, Data Store, Data Collection, Data Access, Historic data] • Information lifecycle phases: Collection, Storage, Access & Use, Archive/Disposal
Business rules • Each business rule should have an expected outcome (benchmark) • Business rules need to align to quality ANZLIC elements
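One way to picture a business rule with a benchmark is as a small record holding a check and an expected pass rate. The sketch below is illustrative only: the `BusinessRule` class, the `assess` function, the field names and the "completeness" label are assumptions made for this example, not part of the framework described in the slides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping


@dataclass
class BusinessRule:
    """Hypothetical container for one business rule."""
    name: str
    quality_element: str              # label of the quality element the rule aligns to (assumed)
    check: Callable[[Mapping], bool]  # returns True when a record passes the rule
    benchmark: float                  # expected pass rate, between 0.0 and 1.0


def assess(rule: BusinessRule, records: Iterable[Mapping]) -> dict:
    """Compare the observed pass rate of a rule against its benchmark."""
    records = list(records)
    passed = sum(1 for record in records if rule.check(record))
    # With no records there is nothing to fail, so treat the rule as passing.
    pass_rate = passed / len(records) if records else 1.0
    return {
        "rule": rule.name,
        "quality_element": rule.quality_element,
        "pass_rate": pass_rate,
        "meets_benchmark": pass_rate >= rule.benchmark,
    }
```

Keeping the benchmark next to the check means an assessment can report both the measured rate and whether it falls short of what the Data Manager agreed to beforehand.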
Findings - DQ Processes • The processes and guidelines are good! • The Data Management Plan is important • Needs to be completed for all data sets prior to assessment • Benchmarks for quality established with Data Managers before DQA
Soil Profile • Very large and varied data set (millions of soil properties) • Where data exists, it is mostly good • Many missing values • Data transformation errors • Data on forms differs from values in database • Missing values set to default values in load program
Data Analysis – Soil Properties • Examples of problems: • Location Accuracy - Invalid grid references for a grid zone • Mandatory Fields missing data • Nature of Exposure - 1269 records missing value • Logical Inconsistencies • If Horizon Code begins with 'B' And ACS Order is 'SO' (Sodosol) Then pH >= 5.5 • 238 records in error
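The soil rules above can be sketched as simple predicates over records. This is a minimal illustration, not the assessment code actually used: the field names (`horizon_code`, `acs_order`, `ph`, `nature_of_exposure`) are hypothetical, and the pH threshold of 5.5 is this sketch's reading of the rule on the slide.

```python
PH_THRESHOLD = 5.5  # assumed reading of the slide's rule


def violates_ph_rule(record):
    """True when a 'B' horizon in a Sodosol (ACS Order 'SO') has pH below the threshold."""
    horizon = record.get("horizon_code") or ""
    if not (horizon.startswith("B") and record.get("acs_order") == "SO"):
        return False  # the rule does not apply to this record
    ph = record.get("ph")
    if ph is None:
        return False  # a missing pH is a completeness problem, counted separately
    return ph < PH_THRESHOLD


def count_missing(records, field):
    """Count records where a mandatory field is absent or empty."""
    return sum(1 for record in records if record.get(field) in (None, ""))
```

Separating the "rule does not apply" and "value is missing" cases from a true violation keeps the error count for the logical rule from being inflated by completeness problems.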
Data Analysis – Ground Water • Minimal spatial data (point locations only) • Data where present is mostly good • Many missing values
Examples of problems • Invalid Key fields • Work Number of non-standard format • Location Accuracy • Invalid grid references for a grid zone • Logical Inconsistencies • Jobs completed before they started • Hole depth of 36km • Mandatory Fields missing data • Work Type Code - 1503 records missing value.
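Several of the ground water problems listed above are simple to detect per record. The sketch below shows one possible shape for such checks; the field names and the maximum plausible depth are assumptions chosen for illustration and would need to be agreed with the data managers.

```python
from datetime import date

# Assumed upper bound for a plausible bore depth, in metres (illustrative only).
MAX_PLAUSIBLE_DEPTH_M = 2000.0


def find_problems(job):
    """Return a list of problem descriptions for one job record."""
    problems = []

    start = job.get("start_date")
    end = job.get("completion_date")
    if start is not None and end is not None and end < start:
        problems.append("completed before started")

    depth = job.get("hole_depth_m")
    if depth is not None and not (0 < depth <= MAX_PLAUSIBLE_DEPTH_M):
        problems.append("implausible hole depth")

    if not job.get("work_type_code"):
        problems.append("missing work type code")

    return problems
```

A record with a completion date earlier than its start date, or a depth of 36,000 m, would be flagged here, while records with missing dates or depths are left to the mandatory-field checks.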
Data Analysis – Ground Water
Region Code | Region Name | GW Licenses in LAS | GW Licenses in GDS | GW Licenses not in GDS | Percentage Missing
10 | Sydney - South Coast | 3622 | 3420 | 202 | 5%
20 | Hunter | 2000 | 1280 | 720 | 36%
30 | North Coast | 3201 | 3162 | 39 | 1%
40 | Murrumbidgee | 1912 | 1807 | 105 | 5%
50 | Murray | 2350 | 911 | 1439 | 61%
60 | Lower Murray / Darling | 84 | 42 | 42 | 50%
70 | Lachlan | 1913 | 1371 | 542 | 28%
80 | Macquarie - Western | 2345 | 2002 | 343 | 14%
90 | Barwon | 4526 | 4445 | 81 | 2%
• Database Issues: • No Load or creation date in database (only update date) • Impossible to apply date based business rules • GW licenses mandatory from 2001 onwards. • Logical Inconsistencies: • License Form A received and no GDS record (1000’s) • Needs investigation
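The "not in GDS" and "percentage missing" figures amount to comparing the set of licenses in one system against the other. A minimal sketch of that comparison follows; the function names and the idea of matching on a shared license identifier are assumptions, since the slides do not say how LAS and GDS records were matched.

```python
def licenses_missing_from_gds(las_ids, gds_ids):
    """Return the license ids present in LAS but absent from GDS, sorted."""
    return sorted(set(las_ids) - set(gds_ids))


def percentage_missing(in_las, in_gds):
    """Share of LAS licenses with no GDS record, as a percentage."""
    if in_las == 0:
        return 0.0
    return 100.0 * (in_las - in_gds) / in_las
```

For example, `percentage_missing(2350, 911)` reproduces the Murray row's share of roughly 61 percent, and the id list from `licenses_missing_from_gds` is the natural starting point for an investigation list per region.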
Data Analysis • Action Lists • Generated for each data set • Scope of Remedies • Improving data quality goes beyond identifying, measuring and fixing the data in the IT systems. • Improve data capture • Train entry staff • Replace entry processes • Provide meaningful feedback • Change motivations to encourage quality • Add defensive checkers, periodic DQ assessments, data cleansing
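As one illustration of a defensive checker at load time, the sketch below keeps a blank value as missing and rejects unparseable input, rather than silently substituting a default. The function name is hypothetical and this is not the load program discussed in the slides.

```python
def parse_optional_float(raw):
    """Parse a numeric field from a form.

    Blank input is returned as None so that "missing" stays visible in the
    database; anything else that is not a number is rejected outright.
    """
    if raw is None or raw.strip() == "":
        return None
    try:
        return float(raw)
    except ValueError:
        raise ValueError(f"not a number: {raw!r}") from None
```

Storing `None` instead of a default such as 0 lets later assessments count the value as missing instead of mistaking it for a measurement.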
Data Quality Reporting • Data Quality Portal • General DQ information • Statistical Reporting and Monitoring • Data Quality Exception Reporting • Management of Data Quality issues
Ways of improving quality • Tackle quality at source, not downstream in the lifecycle • Train data collectors in the importance of getting it right • Continual improvement with a quality method
Links among Process Groups in a Phase (PMBOK 2000, Fig 3-1, p31) • [Figure: Initiating process, Planning process, Executing process (do), Controlling process (check) and Closing process; arrows represent flow of information]