Chapter 1

Presentation Transcript


  1. Chapter 1 Introduction to Data Quality

  2. Data Quality Characteristics
     • Data quality affects several attributes associated with data:
       • Accuracy – Is it realistic or believable?
       • Integrity – Is it structured and managed?
       • Consistency – Is it consistently defined and maintained?
       • Validity – Is the data valid, based on business or industry rules and standards?

  3. What Causes Poor Data Quality?
     • These factors can contribute to poor data quality:
       • Business rules do not exist or there are no standards for data capture.
       • Standards may exist but are not enforced at the point of data capture.
       • Inconsistent data entry (incorrect spelling, use of nicknames, middle names, or aliases) occurs.
       • Data entry mistakes (character transposition, misspellings, and so on) happen.
       • Integration of data from systems with different data standards is present.
       • Data quality issues are perceived as time-consuming and expensive to fix.

  4. Primary Sources of Data Quality Problems
     Source: The Data Warehousing Institute, Data Quality and the Bottom Line, 2002

  5. How Is Clean Data Achieved?
     • Clean data is the result of a combination of efforts:
       • making sure that data entered into the system is clean
       • cleaning up problems after the data is accepted.
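The two efforts on this slide can be sketched as a gate at the point of capture plus a reject queue for later cleanup. This is a minimal illustration, not a method from the presentation: the field names, the `RULES` table, and the `validate` and `accept` helpers are all hypothetical, and a real system would take its rules from its own business or industry standards.

```python
import re

# Hypothetical capture rules, invented for illustration only.
RULES = {
    "name": lambda v: bool(v.strip()),
    "state": lambda v: re.fullmatch(r"[A-Z]{2}", v) is not None,
    "zip": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
}


def validate(record):
    """Return the names of the fields that break a rule (empty list = clean)."""
    return [field for field, rule in RULES.items()
            if not rule(record.get(field, ""))]


def accept(record, store, rejects):
    """Enforce the rules at capture: store clean rows, set the rest aside."""
    problems = validate(record)
    if problems:
        # Kept for later cleanup instead of being silently accepted.
        rejects.append((record, problems))
    else:
        store.append(record)
    return not problems


store, rejects = [], []
accept({"name": "Mark Craver", "state": "NC", "zip": "27513"}, store, rejects)
accept({"name": "Mark Craver", "state": "N.C.", "zip": ""}, store, rejects)
```

Here the second record is set aside with the failing fields listed, so the cleanup step knows what to repair.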

  6. Typical Data Quality Issues
     • The most common processes in a data quality initiative are
       • Data Analysis and Standardization
         • consistency analysis
         • standardization schemes
         • gender analysis
         • entity analysis
         • data parsing and casing.
     continued...

  7. Typical Data Quality Issues
     • The most common processes in a data quality initiative are
       • Matching and Merging
         • de-duplication
         • householding
       • Address Verification – against a CASS certified database
       • Geocoding – data enrichment using third-party data elements.
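As a rough sketch of how casing, de-duplication, and householding fit together, the snippet below normalizes values before comparing them, drops exact duplicates, and then groups the survivors by surname and address. The records, field names, and helper functions are made up for illustration, and grouping on surname plus address is only one possible householding rule. It assumes every record has a non-empty name.

```python
from collections import defaultdict


def normalize(text):
    """Casing and punctuation cleanup so equivalent values compare equal."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def deduplicate(records):
    """Keep the first record for each normalized (name, address) pair."""
    seen, unique = set(), []
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["address"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def households(records):
    """Group names that share a surname and an address."""
    groups = defaultdict(list)
    for rec in records:
        surname = normalize(rec["name"]).split()[-1]
        groups[(surname, normalize(rec["address"]))].append(rec["name"])
    return dict(groups)


# Invented sample data.
people = [
    {"name": "Ann Lee", "address": "12 Oak St."},
    {"name": "ANN LEE", "address": "12 oak st"},
    {"name": "Tom Lee", "address": "12 Oak St"},
]
unique = deduplicate(people)
print(households(unique))
```

The two spellings of Ann Lee collapse into one record, and both remaining people land in the same household.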

  8. Analysis and Standardization Example
     Who is the biggest supplier?

       Anderson Construction   $ 2,333.50
       Briggs,Inc              $ 8,200.10
       Brigs Inc.              $12,900.79
       Casper Corp.            $27,191.05
       Caspar Corp             $ 6,000.00
       Solomon Industries      $43,150.00
       The Casper Corp         $11,500.00
       ...                     ...

  9. Standardization Scheme

       Briggs, Inc       →  Briggs Inc.
       Brigs Inc.        →  Briggs Inc.
       Casper Corp.      →  Casper Corp.
       Caspar Corp       →  Casper Corp.
       The Casper Corp   →  Casper Corp.
       ...                  ...

  10. Supplier Spending
      [Bar chart: $ Spent (0 to 50,000) by supplier, showing Casper Corp., Solomon Ind., Briggs Inc., and Anderson Cons.]
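The supplier example can be worked through in a few lines: apply the standardization scheme to each name, then total the spending per standard name. This is a sketch using only the rows visible on the slides (the elided rows are left out). The `SCHEME` and `SPENDING` names and the `totals_by_supplier` helper are illustrative, and `"Briggs,Inc"` without a space is added to the scheme as an assumption because that is how the spending table spells it.

```python
from collections import defaultdict
from decimal import Decimal

# Each observed spelling maps to one standard name.
# Names without an entry are kept unchanged.
SCHEME = {
    "Briggs, Inc": "Briggs Inc.",
    "Briggs,Inc": "Briggs Inc.",   # assumed variant, as spelled in the table
    "Brigs Inc.": "Briggs Inc.",
    "Casper Corp.": "Casper Corp.",
    "Caspar Corp": "Casper Corp.",
    "The Casper Corp": "Casper Corp.",
}

SPENDING = [
    ("Anderson Construction", "2333.50"),
    ("Briggs,Inc", "8200.10"),
    ("Brigs Inc.", "12900.79"),
    ("Casper Corp.", "27191.05"),
    ("Caspar Corp", "6000.00"),
    ("Solomon Industries", "43150.00"),
    ("The Casper Corp", "11500.00"),
]


def totals_by_supplier(rows, scheme):
    """Sum spending per standardized supplier name."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for name, amount in rows:
        totals[scheme.get(name, name)] += Decimal(amount)
    return dict(totals)


totals = totals_by_supplier(SPENDING, SCHEME)
for supplier, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{supplier:<22} ${total:>12,.2f}")
```

On the raw rows, the largest single line is Solomon Industries at $43,150.00. After standardization, the three Casper spellings add up to $44,691.05, so Casper Corp. becomes the biggest supplier, followed by Solomon Industries, Briggs Inc. at $21,100.89, and Anderson Construction at $2,333.50.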

  11. Data Matching Example
      Operational System of Records:
        Mark Carver SAS SAS Campus Drive Cary, N.C.
        Mark W. Craver Mark.Craver@sas.com
        Mark Craver Systems Engineer SAS
      Data Warehouse:
        01 Mark Carver SAS SAS Campus Drive Cary, N.C.
        02 Mark W. Craver Mark.Craver@sas.com
        03 Mark Craver Systems Engineer SAS
        ... ...
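One way to see why these three warehouse rows are match candidates is to score their names for similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a matching engine. It is not the technique the presentation's tooling uses, and the `name_key` normalization, the helper names, and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_key(name):
    """Lowercase, strip punctuation, and drop single-letter initials."""
    words = "".join(ch if ch.isalnum() else " " for ch in name.lower()).split()
    return " ".join(w for w in words if len(w) > 1)


def similarity(a, b):
    """Similarity of two names in [0, 1] after normalization."""
    return SequenceMatcher(None, name_key(a), name_key(b)).ratio()


def candidate_matches(records, threshold=0.85):
    """Return pairs of record ids whose names score at or above the threshold."""
    return [(id_a, id_b)
            for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2)
            if similarity(name_a, name_b) >= threshold]


warehouse = {"01": "Mark Carver", "02": "Mark W. Craver", "03": "Mark Craver"}
print(candidate_matches(warehouse))
```

With this threshold all three pairs are flagged: the middle initial is removed by normalization, and the transposed "Carver" still scores close to "Craver". A production matcher would usually compare more fields than the name.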

  12. Data Quality Process
      Operational System of Records:
        Mark Carver SAS SAS Campus Drive Cary, N.C.
        Mark W. Craver Mark.Craver@sas.com
        Mark Craver Systems Engineer SAS
      DQ
      Data Warehouse:
        01 Mark Craver Systems Engineer SAS SAS Campus Drive Cary, N.C. 27513 Mark.Craver@sas.com
        ... ...
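The merge step that turns three matched records into one can be sketched with a simple survivorship rule. Everything here is an assumption made for illustration: the field names, the split of each source row into fields, and the rule itself (majority vote on the normalized name, first non-empty value for every other field). The ZIP code 27513 on the slide does not appear in any source record, so it presumably comes from a separate enrichment step such as address verification and is not reproduced here.

```python
from collections import Counter


def name_key(name):
    """Lowercase, strip punctuation, and drop single-letter initials."""
    words = "".join(ch if ch.isalnum() else " " for ch in name.lower()).split()
    return " ".join(w for w in words if len(w) > 1)


def merge(records):
    """Collapse already-matched records into one surviving record."""
    # Name: majority vote over normalized spellings, then restore casing.
    votes = Counter(name_key(rec["name"]) for rec in records)
    merged = {"name": votes.most_common(1)[0][0].title()}
    # Other fields: the first non-empty value wins.
    for rec in records:
        for field, value in rec.items():
            if field != "name" and value:
                merged.setdefault(field, value)
    return merged


# Hypothetical field split of the three source rows.
matched = [
    {"name": "Mark Carver", "company": "SAS",
     "street": "SAS Campus Drive", "city": "Cary", "state": "N.C."},
    {"name": "Mark W. Craver", "email": "Mark.Craver@sas.com"},
    {"name": "Mark Craver", "title": "Systems Engineer", "company": "SAS"},
]
print(merge(matched))
```

Two of the three spellings normalize to "mark craver", so the vote settles on "Mark Craver", and the remaining fields are pooled from whichever record supplies them first.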