IBM Information Server


Presentation Transcript


  1. IBM Information Server: Data Quality Everywhere

  2. Increasing Focus on Data Quality • Businesses are beginning to realize that data quality issues not only cost them time and money, but also inhibit their ability to address core strategic projects • More and more businesses are establishing programs for data quality, to measure and improve the reliability of information • Analysts contend that companies with focused data quality programs will find more opportunities to outperform their peers

  3. Business Drivers for Investment Depend on Data Quality • Empowering risk & compliance initiatives with the information they require • Optimizing Revenue Opportunities by ensuring effective and efficient interactions with customers, partners, and suppliers • Enabling collaborative business processes with consistent and trustworthy information • Reducing the total cost of ownership for maintaining consistent information across the enterprise

  4. What is the Impact of Poor Data Quality? • "If you look at...any business function in your company, you're going to find some direct cost there attributed to poor data quality." - Gartner 2006 • Lost Sales Opportunity • “Hard” Losses (1.5%, 1.7%): SKU misplaced or hard to find; out of stocks attributed to the store • “Soft” Losses (2-4%, 1-3%, 1-2%): lost potential for cross-sell and up-sell (staff not trained or available); reduced store visit frequency; abandoned carts (poor service or excessive queues) • Total: 7.2%-12% • Source: GMA/FMI/CIES 2003 (US grocery), ECR Europe 2003, Lineraires.com, California Management Review, IBM case studies, interviews, and IBM Institute for Business Value analysis

  5. Data Quality is a Subjective Business Standard • Data = facts used as a basis for decision making, suitable for storage on a computer • Quality = the general standard or grade of something • Data Quality = a subjective standard used to determine if a set of facts is suitable for a particular business purpose: Relevant? Accurate? Valid? Complete? • Ultimately, Data Quality = Trust

  6. So, What Constitutes Data Quality? • Data is standardized • Data is fit for purpose (conforms to rules) • Each record is unique • View of information is complete • Records are certified against authoritative sources • Lineage is understood • Data quality is measured over time

  7. Common Data Problems • Lack of information standards • Different formats & structures across different systems • Data surprises in individual fields • Data misplaced in the database • Information buried in free-form fields • Data myopia • Lack of consistent identifiers inhibits a single view • The redundancy nightmare • Duplicate records with a lack of standards

     Example data shown on the slide:

     Kate A. Roberts, 416 Columbus Ave #2, Boston, Mass 02116
     Catherine Roberts, Four sixteen Columbus APT2, Boston, MA 02116
     Mrs. K. Roberts, 416 Columbus Suite #2, Suffolk County 02116

     Columns: Name, Tax ID, Telephone
     J Smith DBA Lime Cons. 228-02-1975 6173380300
     Williams & Co. C/O Bill 025-37-1888 415-392-2000
     1st Natl Provident 34-2671434 3380321
     HP 15 State St. 508-466-1200 Orlando

     WING ASSY DRILL 4 HOLE USE 5J868A HEXBOLT 1/4 INCH
     WING ASSEMBY, USE 5J868-A HEX BOLT .25” - DRILL FOUR HOLES
     USE 4 5J868A BOLTS (HEX .25) - DRILL HOLES FOR EA ON WING ASSEM
     RUDER, TAP 6 WHOLES, SECURE W/KL2301 RIVETS (10 CM)

     19-84-103 RS232 Cable 6' M-F CandS
     CS-89641 6 ft. Cable Male-F, RS232 #87951
     C&SUCH6 Male/Female 25 PIN 6 Foot Cable

     90328574 IBM 187 N.Pk. Str. Salem NH 01456
     90328575 I.B.M. Inc. 187 N.Pk. St. Salem NH 01456
     90238495 Int. Bus. Machines 187 No. Park St Salem NH 04156
     90233479 International Bus. M. 187 Park Ave Salem NH 04156
     90233489 Inter-Nation Consults 15 Main Street Andover MA 02341
     90345672 I.B. Manufacturing Park Blvd. Bostno MA 04106

  8. Why Does this Problem Exist? • Most enterprises are running distinct sales, services, marketing, manufacturing and financial applications, each with its own “master” reference data. • No one system is the universally agreed-to system of record. • Enterprise application vendors do not guarantee a complete & accurate integrated view; they point to their dependence on the quality of the raw input data. • Data quality continues to erode at the point of entry, though it is not a data entry problem.

  9. What Do You Need to Establish a Data Quality Program? • A foundation platform that centralizes quality rules and provides auditable data quality • Business-driven, data-centric design environment for data quality rules • An ongoing process for data quality • A way to measure quality over time • Universal deployment of quality rules across all points of entry • Data quality ownership and data governance • Management sponsorship and a corporate mandate for data quality improvement

  10. IBM Information Server: A Platform for Data Quality • Understand: Discover, model, and govern information structure and content • Cleanse: Standardize, merge, and correct information • Transform: Combine and restructure information for new uses • Deliver: Synchronize, virtualize and move information for in-line delivery • Unified Deployment • Unified Metadata Management • Parallel Processing • Rich Connectivity to Applications, Data, and Content

  11. A Process For Data Quality • Establish Data Quality Ownership & Sponsorship • Analyze Source Data • Measure & Baseline Data Quality • Standardize • Certify & Enrich • Match • Link or Survive • Re-Measure • Report

  12. Understanding the Problem: Source System Analysis • Quality Controls for Completeness and Validity of data values • Incomplete or Invalid values set by value, range, or reference sources • Consistency checks for data formats
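
The checks on this slide (completeness, validity by value, range, or reference source, and format consistency) can be sketched in a few lines of Python. This is only an illustration: the field names, the reference set, the age range, and the ZIP pattern are all hypothetical and do not come from the presentation or from IBM Information Server.

```python
import re

# Hypothetical rules for illustration only; none of these values come
# from the presentation.
VALID_STATES = {"FL", "GA", "OH", "MA", "NH"}  # validity by reference source
AGE_RANGE = (0, 120)                           # validity by range
ZIP_FORMAT = re.compile(r"^\d{5}$")            # format consistency


def profile(records):
    """Count how many records fail each completeness/validity/format rule."""
    counts = {
        "missing_state": 0,
        "invalid_state": 0,
        "age_missing_or_out_of_range": 0,
        "bad_zip_format": 0,
    }
    low, high = AGE_RANGE
    for rec in records:
        state = (rec.get("state") or "").strip()
        if not state:
            counts["missing_state"] += 1
        elif state not in VALID_STATES:
            counts["invalid_state"] += 1

        age = rec.get("age")
        if age is None or not low <= age <= high:
            counts["age_missing_or_out_of_range"] += 1

        if not ZIP_FORMAT.match(rec.get("zip") or ""):
            counts["bad_zip_format"] += 1
    return counts
```

Counting failures per rule, instead of rejecting records outright, gives the kind of numbers that can later serve as a quality baseline.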

  13. Measuring & Resolving: Designing Data Quality Rules • Data quality rules should be embedded into data flows • Investigate source data • Standardize information • Match records together • Survive the best data across sources into a new record

  14. Investigation • Parsing: separating multi-valued fields into individual pieces ("123 St. Virginia St." becomes 123 | St. | Virginia | St.) • Lexical analysis: determining the business significance of individual pieces (Number | Street Type | Alpha | Street Type) • Context sensitive: identifying various data structures and content (House Number = 123, Street Name = St. Virginia, Street Type = St.) • “The instructions for handling the data are inherent within the data itself.”
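
The three steps on this slide can be illustrated with a small Python sketch applied to the slide's own example, "123 St. Virginia St.". The street-type lookup table and the single context rule below are assumptions made for the example; the presentation does not describe how the product implements these steps.

```python
# Hypothetical lookup table; a real rule set would be far larger.
STREET_TYPES = {"ST", "AVE", "RD", "BLVD"}


def classify(token):
    """Lexical analysis: assign a class to one parsed token."""
    bare = token.rstrip(".").upper()
    if bare.isdigit():
        return "NUMBER"
    if bare in STREET_TYPES:
        return "TYPE"
    return "ALPHA"


def parse_address(line):
    """Parse a line into tokens, classify them, then apply one context rule.

    Context rule (an assumption for this sketch): a leading NUMBER is the
    house number, a trailing TYPE is the street type, and everything in
    between is the street name, even if it contains a TYPE token.
    """
    tokens = line.split()                       # parsing
    classes = [classify(t) for t in tokens]     # lexical analysis

    house = tokens[0] if classes and classes[0] == "NUMBER" else ""
    street_type = tokens[-1] if len(tokens) > 1 and classes[-1] == "TYPE" else ""
    start = 1 if house else 0
    end = len(tokens) - 1 if street_type else len(tokens)
    fields = {
        "house_number": house,
        "street_name": " ".join(tokens[start:end]),
        "street_type": street_type,
    }
    return classes, fields
```

The first "St." is classified as a street type in isolation, but its position makes it part of the street name, which is the point of the context-sensitive step.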

  15. Standardization - Address

     Input File (Address Line 1, Address Line 2):
     639 N MILLS AVENUE ORLANDO, FLA 32803
     306 W MAIN STR, CUMMING, GA 30130
     3142 WEST CENTRAL AV TOLEDO OH 43606
     843 HEARD AVE AUGUSTA-GA-30904
     1139 GREENE ST ACCT #1234 AUGUSTA GEORGIA 30901
     4275 OWENS ROAD SUITE 536 EVANS GA 30809

     Result File:
     House # | Dir | Str. Name | Type | Unit | No. | NYSIIS | City | SOUNDEX | State | Zip | ACCT#
     639 | N | MILLS | AVE | | | MAL | ORLANDO | O645 | FL | 32803 |
     306 | W | MAIN | ST | | | MAN | CUMMING | C552 | GA | 30130 |
     3142 | W | CENTRAL | AVE | | | CANTRAL | TOLEDO | T430 | OH | 43606 |
     843 | | HEARD | AVE | | | HAD | AUGUSTA | A223 | GA | 30904 |
     1139 | | GREENE | ST | | | GRAN | AUGUSTA | A223 | GA | 30901 | 1234
     4275 | | OWENS | RD | STE | 536 | ON | EVANS | E152 | GA | 30809 |

     Results in strongly “typed” fixed fielded standardized data
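
The SOUNDEX column in the result file holds phonetic codes for the city names. As an illustration, the standard American Soundex algorithm, sketched below in Python, reproduces the codes shown on the slide (O645, C552, T430, A223, E152). The presentation does not say how the product computes the column, so treat this as a reference sketch and not as the product's implementation.

```python
# Standard American Soundex letter groups.
SOUNDEX_CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code for a word ('' if no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad with zeros or truncate to 4 characters
```

Because AUGUSTA always encodes to A223, the two Augusta rows share a code however the input spelled the rest of the address, which is what makes such codes useful as blocking or matching keys.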

  16. Effective Matching • Matching is the most beneficial and technically challenging part of data quality • Matching should be based on statistical probability • Match rules should take into account frequency, discriminating values, & reliability of fields when determining which fields to weight in a match • Matching against more fields of data produces higher quality matches • Matching logic is a very business-sensitive issue – business users should be involved in the design of matching rules • Matching is a science that requires careful calibration of match rules – design should be iterative, and should give results based on real data • Matching design should allow for baseline comparison to ensure rule changes are improving quality • The matching engine should provide clerical review capabilities • Setting up clerical review and match cutoffs should be intuitive
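
One common way to base matching on statistical probability is Fellegi-Sunter style field weighting, sketched below in Python. The slide does not name a specific method, so the choice of technique is an assumption, and the field names, the m and u probabilities, and the cutoffs are invented for the example.

```python
import math

# Hypothetical per-field probabilities (not from the presentation):
#   m = P(field agrees | the two records are a true match)  -> reliability
#   u = P(field agrees | the two records are not a match)   -> chance agreement,
#       which is low for discriminating fields with many distinct values
FIELDS = {
    "tax_id": (0.95, 0.001),
    "name": (0.90, 0.01),
    "zip": (0.85, 0.10),
}


def field_weight(m, u, agrees):
    """Log-likelihood-ratio weight for one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(a, b):
    """Sum of field weights; exact equality stands in for a real comparator."""
    return sum(
        field_weight(m, u, a.get(field) == b.get(field))
        for field, (m, u) in FIELDS.items()
    )


def decide(score, match_cutoff, clerical_cutoff):
    """Apply the two cutoffs: match, clerical review, or non-match."""
    if score >= match_cutoff:
        return "match"
    if score >= clerical_cutoff:
        return "clerical review"
    return "non-match"
```

Agreement on a discriminating field such as the tax ID contributes far more than agreement on a ZIP code, and pairs that score between the two cutoffs go to clerical review instead of being linked automatically.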

  17. Designing Data Quality Rules • Holding area allows experimental match rules to be retained • Visual Histogram allows users to understand results • Pass Composer provides an intuitive overview of match passes • Decision Rules define match criteria • Cutoff Tuning allows match & clerical cutoffs to be visually fine-tuned • Data Viewer provides immediate feedback on match rule effects, using actual data

  18. What Do You Do with Match Results? • Clerical review • Record linkage • Survivorship • Append/Fix sources • Cross-reference
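
Survivorship and cross-referencing can be sketched as follows in Python. The rule used here (take each field from the most trusted source that has a non-empty value) is only one possible survivorship rule and is an assumption, as are the record layout and the source names.

```python
def survive(matched_records, source_priority):
    """Merge a group of matched records into one surviving record.

    Hypothetical rule: for each field, keep the first non-empty value,
    trying sources in the order given by source_priority.
    Each record is a dict with 'source' and 'id' keys plus data fields.
    """
    rank = {source: i for i, source in enumerate(source_priority)}
    ranked = sorted(matched_records, key=lambda r: rank[r["source"]])

    survivor = {}
    for record in ranked:
        for field, value in record.items():
            if field in ("source", "id"):
                continue
            if value not in (None, "") and field not in survivor:
                survivor[field] = value

    # Cross-reference: which source records the survivor was built from.
    cross_reference = [(r["source"], r["id"]) for r in ranked]
    return survivor, cross_reference
```

Keeping the cross-reference alongside the surviving record preserves the link back to each source system, so corrections can later be appended to or fixed in the sources.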

  19. Deployment Models for Data Quality Rules • Data quality rules need to be applied universally • In bulk movement and consolidation of data • Applied when data changes in source systems • Available as data quality services in a SOA • Embedded in federated queries • Callable directly from enterprise applications (diagram labels: Request, Response, Logic Reuse, Query)

  20. Measuring Data Quality Over Time • Complete analysis of structure and content • View differences between current state and the baseline • Analysis can be run on a scheduled basis, or embedded in batch processes
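
Comparing the current state with the baseline can be as simple as the Python sketch below. The metric names, the representation of each metric as a failure rate, and the tolerance value are assumptions for illustration.

```python
def compare_to_baseline(baseline, current, tolerance=0.005):
    """Classify each metric as improved, regressed, or unchanged.

    Both arguments map a metric name to a failure rate between 0.0 and 1.0,
    so a lower value is better. Metrics missing from the current run are
    skipped.
    """
    report = {}
    for metric, base_rate in baseline.items():
        if metric not in current:
            continue  # metric was not measured in this run
        delta = current[metric] - base_rate
        if delta < -tolerance:
            status = "improved"
        elif delta > tolerance:
            status = "regressed"
        else:
            status = "unchanged"
        report[metric] = (round(delta, 4), status)
    return report
```

A function like this can run after each scheduled analysis or at the end of a batch job, so that a regression shows up as soon as it appears.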

  21. Lessons Learned & Best Practice: Involve the Business Early • Recruit an executive sponsor • Signals that the initiative is important • Assures that funds continue to be available • Discourages other business units from implementing conflicting projects • Convene a data quality working group • Assess and report on quality early in the process • May coincide with implementation teams or data warehousing teams • Business leads, but IT coordinates and facilitates • Strive for consensus • Have the business appoint a data quality steward for each business unit • For business units with large user populations, several stewards are appropriate

  22. Lessons Learned & Best Practice: Control Scope Ruthlessly / Focus on Benefits • Business must own scope • Business should be owners, not renters • IT maintains its independence by not taking sides • Controlling scope encourages project discipline • Iterate • Projects which try to do it all in one pass generally fail • Measure, Report, and Deliver benefits regularly • Initial projects must provide some benefit within 6-9 months at the minimum (even if a small benefit) • Subsequent phases should provide benefits every 3-6 months

  23. Summary • Data quality is becoming an increasingly important organizational issue • Most critical business initiatives depend on quality information • Improving data quality requires a focused programmatic approach • The IBM Information Server provides all of this in a unified platform • At the core of any data quality program is a platform capable of providing auditable data quality services

  24. How Can IBM Help? • Comprehensive platform for data quality • Experience and repeatable process for helping organizations set up data quality programs • Domain and industry-specific expertise in establishing repeatable data quality services • Data quality assessment offering to report on existing data quality and establish the business value of a data quality program • Contact your IBM representative for more information

  25. Information On Demand 2006 • Register Now: www.ibm.com/events/informationondemand • Why attend: • Participate in the PREMIER discussion on the future of Information Management • Learn how the transformation to Information as a Service will help you unlock business value and drive competitive advantage • Hear how your peers are realizing ROI • Understand the roadmap to long term strategic advantage • Learn best practices in your industry • Receive the best in technical education and free certification • Extensive opportunities for networking with both your peers and industry experts • IBM Information On Demand 2006, October 15-20, 2006, Anaheim, California • The premier information management event for business and IT executives, managers, professionals, DBAs and developers • Select from over 800 sessions: a 2 1/2 day business leadership track with 180 sessions and a 5 day technical track with 650 sessions • Latest strategy and product announcements • Large Expo Center, Hands on labs • One on ones with executives and specialists • Birds of a Feather roundtables
