
How well do you know your DATA?

Glenn Wiebe, May 15, 2012


Presentation Transcript


  1. How well do you know your DATA? Glenn Wiebe May 15, 2012

  2. Is Data a Liability? • $$$ for Data Storage • $$$ for Data Backups • $$$ for Data Archiving • $$$ for Data Replication • $$$ for Data Synchronization • $$$ for Disaster Recovery Planning

  3. Is Data an Asset? • Helps in making decisions • Provides a 360-degree view across the enterprise • Helps to understand the customer • Helps in building effective marketing campaigns • Predictive Analysis • Statistical Analysis • Sentiment Analysis

  4. Data Governance Program • People: Organizations need executive sponsorship • Process: Documented, repeatable processes and procedures • Technology: Data Integration, Data Quality, Data Synchronization, and Data Management

  5. iWay Data Integration Enablement (300+ Adapters) • ERP/Financials: Ariba, I2, JD Edwards, Lawson, Manugistics, Microsoft, Oracle, SAP • Industry: HIPAA, CIDX, HL7, RNIF, SWIFT, 1Sync • Legacy Systems: CICS, IMS, VSAM, .NET, Java, TUXEDO, etc. • SFA/CRM: Amdocs/Clarify, BMC/Remedy, MSDynamics, Oracle/Siebel, Salesforce.com, SAP • Data Warehouse: DB2, ETL, Oracle/Essbase, MS SSAS/OLAP, Netezza, SAP BW, Teradata • B2B: Internet EDI, Legacy EDI, MFT, Online B2B, XML

  6. Data Profiling • Statistical Analysis: An overview of summary values, such as extremes, distribution and frequency analysis. • Domain Analysis: A configurable analysis of data types. • Mask and Group Analysis: An overview of value formats, groups and dimensions. • Business Rules: An analysis of the results of user-defined business rules. • Foreign Key and Dependency Analyses: An inside look into complex connections in the data. • Drill Through: The option to display individual records that correspond to aggregated results. • Data Mart: Reporting and analysis across multiple data set analyses; web and/or hardcopy report viewing and distribution.
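The statistical and mask analyses described on this slide can be illustrated with a minimal sketch. It is not the iWay profiler; the function names `mask` and `profile_column`, the mask symbols (`9` for a digit, `A` for a letter) and the sample postal codes are all assumptions made for illustration.

```python
from collections import Counter


def mask(value: str) -> str:
    """Replace every digit with '9' and every letter with 'A'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Summarize one column: counts, extremes, value frequencies and mask frequencies."""
    non_empty = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "min": min(non_empty) if non_empty else None,
        "max": max(non_empty) if non_empty else None,
        "frequencies": Counter(non_empty),
        "masks": Counter(mask(v) for v in non_empty),
    }


# Hypothetical sample column: postal codes entered in inconsistent formats.
postal_codes = ["V3R 2A9", "M4X 1V5", "v3r2a9", "", "V3R 2A9"]
report = profile_column(postal_codes)
print(report["masks"])
```

A mask that occurs only rarely (here the code typed without a space) usually points at records worth drilling into.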

  7. Data Quality Management Cycle (diagram labels) • Profiling • Deviance identification • Metadata understanding • Ongoing monitoring • Issue causes identification • Monitoring and reporting • Data understanding • KPI definition • Parsing • Association (householding) • Format correction • Content evaluation • Deduplication / identification • Automatic correction • Unification • Data enhancement • Data cleansing • Enrichment • Context-based cleansing • Standardization

  8. iWay Data Quality Center • Parsing: Decomposition of fields into component parts. • Cleansing: Modification of data values to meet domain restrictions, integrity constraints or other business rules that define sufficient data quality for the organization. • Standardization: Formatting of values into consistent layouts based on industry standards, local standards, user-defined business rules and knowledge bases of values and patterns. • Validation: Verification of values against industry standards, local standards, user-defined business rules and knowledge bases of values and patterns. • Enrichment: Enhancing the value of internally held data by appending related attributes from external sources. • Matching: Identification, linking or merging related entries within or across sets of data.
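As a rough illustration of parsing, standardization and validation, the sketch below processes an address string in the semicolon-separated layout shown later on the Merge slide. The field order (postal code; province; city; street), the function names and the simplified postal code pattern are assumptions, not iWay Data Quality Center behavior.

```python
import re

# Simplified Canadian postal code layout: letter-digit-letter, space, digit-letter-digit.
# The real standard excludes some letters; that refinement is left out here.
POSTAL_RE = re.compile(r"^[A-Z]\d[A-Z] \d[A-Z]\d$")


def parse_address(raw: str) -> dict:
    """Parsing: decompose one delimited field into its component parts."""
    postal, province, city, street = [part.strip() for part in raw.split(";")]
    return {"postal": postal, "province": province, "city": city, "street": street}


def standardize_postal(code: str) -> str:
    """Standardization: upper-case the code and use a single space in the middle."""
    compact = re.sub(r"\s+", "", code).upper()
    if len(compact) != 6:
        return compact  # leave unexpected lengths for validation to reject
    return f"{compact[:3]} {compact[3:]}"


def is_valid_postal(code: str) -> bool:
    """Validation: check the value against the expected layout."""
    return bool(POSTAL_RE.match(code))


record = parse_address("v3r2a9; bc ;Surrey;14618 110 Avenue")
record["postal"] = standardize_postal(record["postal"])
record["province"] = record["province"].upper()
print(record, is_valid_postal(record["postal"]))
```

Keeping the three steps as separate functions mirrors the slide's separation of concerns: a value can be standardized and still fail validation.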

  9. Mastering Master Data • What is Master Data? Data describing your main business entities; data duplicated in multiple systems; data reused by multiple business processes • Examples: Customer/Citizen/Patient • Company/Partner/Agency • Products/Items/Equipment • Vendors/Suppliers • Cost Centers/Employees • Etc., etc., …

  10. Master Data – Match & Merge • Unification: identification of the set of records connected to one person, address, vehicle, contact, etc. • Deduplication: golden record creation (the best representation of the identified subject) • Identification: for new data entries, identifying the subject (person, address, etc.) to which the new record is connected (matched) • Complex business rules using sophisticated algorithms and functions, including: Levenshtein distance, Hamming distance, edit distance, data quality score values, date stamps of last modification, source system originating data, etc.
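Two of the distance measures named on this slide can be sketched in a few lines. These are textbook implementations for illustration only; the `similarity` helper and its normalization are assumptions, not the matching rules of any particular product.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical match score in [0, 1]: 1 minus the normalized Levenshtein distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("Smith", "Smyth"), hamming("karolin", "kathrin"))
```

In a match rule, a score like `similarity` would typically be compared against a threshold and combined with other evidence such as date of birth or source system.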

  11. Data Quality Portal - Complex Exception Handling (diagram labels) • Portal • KPI / DQI calculation • DQ plan • Reports • Invalid data extraction • Resolution queue • Resolution Queue Workflow • Exception DB • Exception management

  12. Human Mind vs. Computer Systems Hahaha raed tihs! i cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonemnel pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh?

  13. Original data – before cleansing

  14. Prepared data (after cleansing)

  15. Match

  16. Merge • Merged record: John Smith; M; 095242434; 1978-12-16 • Addresses: V3R 2A9;BC;Surrey;14618 110 Avenue • M4X 1V5;ON;Toronto;25 Linden Street • Selection rules: the newest permanent address • the most frequent address
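The two selection rules on this slide (newest address, most frequent address) can be sketched as simple survivorship functions. The source records, their `updated` dates and the function names below are hypothetical, and the "permanent" qualifier is left out for brevity.

```python
from collections import Counter
from datetime import date


def most_frequent(records, field):
    """Survivorship rule: the value that appears in the most source records."""
    return Counter(r[field] for r in records).most_common(1)[0][0]


def newest(records, field):
    """Survivorship rule: the value from the most recently updated source record."""
    return max(records, key=lambda r: r["updated"])[field]


# Hypothetical source records already matched to the same person.
source_records = [
    {"name": "John Smith", "address": "M4X 1V5;ON;Toronto;25 Linden Street",
     "updated": date(2009, 3, 1)},
    {"name": "John Smith", "address": "M4X 1V5;ON;Toronto;25 Linden Street",
     "updated": date(2010, 6, 15)},
    {"name": "John Smith", "address": "V3R 2A9;BC;Surrey;14618 110 Avenue",
     "updated": date(2011, 11, 30)},
]

golden = {
    "name": most_frequent(source_records, "name"),
    "newest_address": newest(source_records, "address"),
    "most_frequent_address": most_frequent(source_records, "address"),
}
print(golden)
```

Which rule wins for a given attribute is a business decision; the sketch only shows that each rule is a small, reusable function over the matched group.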

  17. Merged records – before update

  18. Merged records – after update. One updated source record may cause modifications in several records in MDC

  19. Real World Use Case The Goal • Major hospital group is building a Master Patient Index • Need to bring in acquired systems • Cleanse, Standardize, Deduplicate The Challenge • Previously processed manually by hiring temporary staff • Current phase projected to take a temporary staff of 20 over 18 months The Strategy • Automate the cleansing, matching and merging business rules • Data Stewardship provides human oversight of the automated process The Benefits • Identifies duplicate records according to very complex business rules • Reusable rules for future phases • Significantly reduced project time – from 18 down to 4 months • Over 400% ROI projected

  20. Real World Use Case The Goal • Performance Management • Business Intelligence • Change Management Process The Challenge • 100 Locations • 14 Systems with out-of-sync master data The Strategy • Cleanse, Standardize, Match • Master Data Management – Directorate, Borough, Site, Service Type, Service Point, Team, Staff, Patient • Master Data Governance Workflow The Benefits • Dynamic organizational change to support strategic initiatives • Complete visibility into performance of organization vs. goals

  21. Real World Use Case The Goal • Services organization supporting the airline industry sells decision support information to industry members The Challenge • Data quality was adversely affecting customer satisfaction • Data quality was impacting new revenue generation opportunities The Strategy • Profile analysis according to specific business validation rules • Monitor a rolling 13-month window comparison of monthly data profiles • Accumulate and report analysis to data providers The Benefits • Improves customer satisfaction and confidence in the information • Increases reliability of the information as new data sources are added • Documents and audits quality-control processes for customer review • Reduces the dependency on human resources to detect and correct data quality issues
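The slide does not say how the monthly profiles in the rolling 13-month window are compared. One possible reading, sketched below, compares the latest month's profile metric with the preceding twelve months in the window; the metric (a null rate), the `tolerance` parameter and the function name are assumptions.

```python
from statistics import mean, pstdev


def flag_deviation(monthly_values, window=13, tolerance=3.0):
    """Return True if the latest month deviates from the earlier months in the window.

    monthly_values: one profile metric per month, oldest first (e.g. null rate).
    The latest month is compared with the mean of the preceding months, using
    `tolerance` population standard deviations as the allowed band.
    """
    recent = monthly_values[-window:]
    if len(recent) < 2:
        return False  # nothing to compare against yet
    *history, latest = recent
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > tolerance * sigma


# Hypothetical null rates for one field: twelve ordinary months, then the newest month.
steady = [0.02, 0.03] * 6 + [0.026]
spike = [0.02, 0.03] * 6 + [0.20]
print(flag_deviation(steady), flag_deviation(spike))
```

Flagged months would then be accumulated and reported back to the data provider, as the strategy on the slide describes.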

  22. Summary of considerations • Access to a variety of data sources • Ability to influence data improvement anywhere in the process • Usable in batch and/or (real) real-time processing mode • Extensible by customized business rules • Access to third-party data and services • Historical and distributable analysis • Reusability across multiple phases and projects • Integrated data stewardship • Platform flexibility for deployment and licensing • Vendor partnership and support. Areas covered: Information Access • Data Quality • Master Data Management • Data Governance

  23. iWay Software Benefits • Integrate All Information: Any Data, Any System, Any Protocol, Any Platform • Real-time, Online, and Batch • Data Integration • Application Integration • Business Integration • Service Oriented Architecture • Any Process Latency: Scheduled, Process Driven, Event Driven, User Driven • Single Solution Platform: Single Engine, Fast and Scalable, Secure and Reliable, Fully Extensible

  24. Questions?
