Doctoral Consortium

Doctoral Consortium. “Solving Data Inconsistencies and Data Integration with a Data Quality Manager.” Presented by Maria del Pilar Angeles and Lachlan M. MacKinnon, School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, EH14 4AS. {pilar,lachlan}@macs.hw.ac.uk

Presentation Transcript


  1. Doctoral Consortium. “Solving Data Inconsistencies and Data Integration with a Data Quality Manager.” Presented by Maria del Pilar Angeles and Lachlan M. MacKinnon, School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, EH14 4AS. {pilar,lachlan}@macs.hw.ac.uk

  2. Agenda. Introduction; Proposal; Data Quality Manager Components (Reference Model, Measurement Model, Assessment Model, Quality Metadata); Information Integration Process (Classification of Data Sources, Selection of Best Data Sources, Query Planning, Data Fusion, Ranking of Query Results); Questions. Solving Data Inconsistencies and Data Integration with a Data Quality Manager. Angeles Maria del Pilar, MacKinnon Lachlan M.

  3. Introduction. [Diagram: classification of conflicts (Sheth92). Labels: Data Value (Known inconsistency, Temporal inconsistency, Acceptable inconsistency); Domain definition (Naming, Data Representation, Data scaling, Data Precision, Default value, Attribute integrity constraints); Entity definition (Database id, Naming, Union compatibility, Schema isomorphism, Missing data item); Abstract (Generalization, Aggregation); Schematic (Data value attribute, Attribute entity discrepancy, Data value entity); Structural. Approached by: Ontology, Metadata, Transformation rules, Mapping.]

  4. Introduction. [Diagram: data sources DS 1, DS 2, DS 3.]

  5. Proposal. We propose the development of a Data Quality Manager (DQM) to establish communication between the information integration process, the user, and the application, in order to deal with semantic heterogeneity.

  6. Proposal. [Architecture diagram. Labels: Local Users 1 to N; Local Schemas 1 to N; Data Sources 1 to N, each with a Wrapper; Export Schemas 1 to N; Mediator; Global Schema; Global Users 1 to M; Applications. Information Integration Process: Selection of data sources, Query Planning, Detection and Fusion of data inconsistencies, Query Integration, Ranking query results. Data Quality Manager: DQ Criteria Model, DQ Measure, DQ Assessment, DQ Metadata.]

  7. DQM Components • Definition of Quality Criteria (Reference Model)

  8. DQM Components • Definition of Quality Criteria (Reference Model) • Definition of Metrics (Measurement Model)

  9. DQM Components • Definition of Quality Criteria (Reference Model) • Definition of Metrics (Measurement Model) • Definition of Assessment Methods (Assessment Model)

  10. DQM Components • Definition of Quality Criteria (Reference Model) • Definition of Metrics (Measurement Model) • Definition of Assessment Methods (Assessment Model) • Definition of Quality Metadata (QMD)

  11. QMD Population. Based on the DQM components, classify the data sources. Completeness: # incomplete / # total. Accuracy: # errors / # total. Currency: age + delivery time - input time. Measurement sources: survey, queries, benchmarks. (DQM: Data Quality Manager; QMD: Quality Metadata.)
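The three measures on this slide can be sketched in a few lines of Python. This is only an illustration: the `SourceStats` record, its field names, and the choice of hours as the time unit are assumptions, not part of the presentation. Note that, as written on the slide, the completeness and accuracy ratios count incomplete and erroneous items, so lower values indicate better quality.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Raw measurements for one data source (hypothetical structure)."""
    total: int            # number of data items inspected
    incomplete: int       # items with missing values
    errors: int           # items found to be erroneous
    age: float            # age of the data when received (hours, assumed unit)
    delivery_time: float  # time the data was delivered (hours)
    input_time: float     # time the data was entered (hours)


def _check_total(stats: SourceStats) -> None:
    if stats.total <= 0:
        raise ValueError("total must be positive")


def completeness(stats: SourceStats) -> float:
    """# incomplete / # total, as on the slide (lower is better)."""
    _check_total(stats)
    return stats.incomplete / stats.total


def accuracy(stats: SourceStats) -> float:
    """# errors / # total, as on the slide (lower is better)."""
    _check_total(stats)
    return stats.errors / stats.total


def currency(stats: SourceStats) -> float:
    """age + delivery time - input time, as on the slide."""
    return stats.age + stats.delivery_time - stats.input_time


# Made-up numbers for illustration only.
ds1 = SourceStats(total=200, incomplete=10, errors=4,
                  age=2.0, delivery_time=30.0, input_time=24.0)
qmd_entry = {
    "completeness": completeness(ds1),
    "accuracy": accuracy(ds1),
    "currency": currency(ds1),
}
```

An entry like `qmd_entry` is the kind of record a QMD could hold per data source, although the presentation does not specify the QMD's storage format.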

  12. Information Integration Process (Data Quality Manager): Selection of Best Data Sources.

  13. Information Integration Process (Data Quality Manager): Selection of Best Data Sources; Query Planning.

  14. Information Integration Process (Data Quality Manager): Selection of Best Data Sources; Query Planning; Fusion of Data Inconsistencies.

  15. Information Integration Process (Data Quality Manager): Selection of Best Data Sources; Query Planning; Fusion of Data Inconsistencies; Query Integration.

  16. Information Integration Process (Data Quality Manager): Selection of Best Data Sources; Query Planning; Fusion of Data Inconsistencies; Query Integration; Ranking of Query Results.

  17. Selection of Best Data Sources. [Diagram with markers 1 to 4. Labels: Quality User Priorities; User Query; Mapping Local/Global Schemas; Data Sources Involved in the Query; QMD; Ranking of Best Data Sources.] 1. The quality user priorities are given by the user. 2. The ranking of the best data sources involved in the query is produced before execution.
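One way to combine the user's quality priorities with the QMD is a weighted sum per data source. The slide does not say which ranking technique is used, so the sketch below is an assumption throughout: the function name `rank_sources`, the dictionary layout, and the convention that QMD scores are normalized to [0, 1] with higher meaning better are all hypothetical.

```python
def rank_sources(qmd, priorities, candidates):
    """Rank candidate data sources by a weighted quality score.

    qmd:        {source: {criterion: score in [0, 1], higher is better}}
    priorities: {criterion: non-negative weight given by the user}
    candidates: sources involved in the query
    Weighted-sum scoring is an assumption, not the presenters' method.
    """
    total_weight = sum(priorities.values())
    if total_weight <= 0:
        raise ValueError("at least one priority must have a positive weight")

    def score(source):
        weighted = sum(weight * qmd[source].get(criterion, 0.0)
                       for criterion, weight in priorities.items())
        return weighted / total_weight

    return sorted(candidates, key=score, reverse=True)


# Made-up QMD entries and priorities for illustration only.
qmd = {
    "DS1": {"completeness": 0.9, "accuracy": 0.6, "currency": 0.5},
    "DS2": {"completeness": 0.7, "accuracy": 0.9, "currency": 0.8},
    "DS3": {"completeness": 0.5, "accuracy": 0.5, "currency": 0.9},
}
priorities = {"accuracy": 3, "currency": 1}  # this user cares most about accuracy
ranking = rank_sources(qmd, priorities, ["DS1", "DS2", "DS3"])
# ranking == ["DS2", "DS3", "DS1"]
```

Because the ranking depends only on the QMD and the priorities, it can be computed before the query runs, which matches point 2 on the slide.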

  18. Query Planning. [Diagram. Labels: Quality User Priorities; QMD; User Query; Query Partition (QueryA, QueryB, QueryC); Plan 1, Plan 2, Plan 3, ..., Plan N; Top-ranking Query Plan.]
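The planning step on this slide (partition the user query, enumerate plans, pick the top-ranked one) might look roughly like the sketch below. It assumes that each sub-query can be answered by one of several sources, that a plan assigns one source to each sub-query, and that a plan's score is the mean of its sources' quality scores. None of these choices come from the presentation, and `rank_plans` is a hypothetical name.

```python
from itertools import product


def rank_plans(subquery_sources, source_scores):
    """Enumerate and rank query plans.

    subquery_sources: {sub-query name: [sources able to answer it]}
    source_scores:    {source: overall quality score, higher is better}
    Returns a list of (plan, score) pairs, best first, where a plan maps
    each sub-query to one source. Mean-score ranking is an assumption.
    """
    names = list(subquery_sources)
    if not names:
        return []
    plans = []
    for combo in product(*(subquery_sources[name] for name in names)):
        plan = dict(zip(names, combo))
        score = sum(source_scores[source] for source in combo) / len(combo)
        plans.append((plan, score))
    plans.sort(key=lambda pair: pair[1], reverse=True)
    return plans


# Made-up partition and scores for illustration only.
subquery_sources = {
    "QueryA": ["DS1", "DS2"],
    "QueryB": ["DS2", "DS3"],
    "QueryC": ["DS3"],
}
source_scores = {"DS1": 0.5, "DS2": 0.9, "DS3": 0.6}
plans = rank_plans(subquery_sources, source_scores)
top_plan, top_score = plans[0]
# top_plan == {"QueryA": "DS2", "QueryB": "DS2", "QueryC": "DS3"}
```

Exhaustive enumeration grows multiplicatively with the number of sub-queries, so a real planner would likely prune; the sketch keeps the full list only to show the ranking.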

  19. Data Fusion. [Diagram. Labels: Quality User Priorities; QMD; Execute Query Plan (ResultX, ResultY, ResultZ); Data Inconsistencies Detection; Inconsistent Query Result; Data Fusion; Consistent Query Result.] Because the DQM stores where the data comes from, it is possible to make decisions at data fusion time.
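Since each result carries its provenance, one simple fusion policy is to detect keys whose values disagree across sources and keep the value from the highest-quality source. The slide does not name a fusion technique, so this policy, the `fuse` function, and the `(key, value, source)` tuple layout are assumptions made for illustration.

```python
def fuse(results, source_scores):
    """Detect and resolve value-level inconsistencies.

    results:       iterable of (key, value, source) tuples; values must be hashable
    source_scores: {source: overall quality score, higher is better}
    Returns (fused, conflicts): fused maps each key to the value reported by
    the best-scoring source, and conflicts lists keys whose sources disagreed.
    Picking the best source's value is an assumed policy.
    """
    by_key = {}
    for key, value, source in results:
        by_key.setdefault(key, []).append((value, source))

    fused, conflicts = {}, []
    for key, candidates in by_key.items():
        distinct_values = {value for value, _ in candidates}
        if len(distinct_values) > 1:
            conflicts.append(key)  # inconsistency detected for this key
        best_value, _ = max(candidates, key=lambda c: source_scores[c[1]])
        fused[key] = best_value
    return fused, conflicts


# Made-up results for illustration only.
results = [
    ("emp42", 31000, "DS1"),
    ("emp42", 32500, "DS2"),  # disagrees with DS1
    ("emp7", 28000, "DS1"),
    ("emp7", 28000, "DS3"),   # agrees with DS1
]
source_scores = {"DS1": 0.5, "DS2": 0.9, "DS3": 0.6}
fused, conflicts = fuse(results, source_scores)
# fused == {"emp42": 32500, "emp7": 28000}; conflicts == ["emp42"]
```

Other policies, such as averaging numeric values or preferring the most current source, would fit the same interface by swapping the `max` selection.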

  20. Ranking Query Result. [Diagram. Labels: Quality User Priorities; QMD; Data Fusion; Consistent Query Result; Query Integration (ResultJ, ResultK, ResultL); Query Result Ranking.]

  21. Conclusion
      • Using the Data Quality Manager we can:
        • Approach data-value-level inconsistencies during the information integration process, using data quality properties.
        • Let the user demand different quality priorities at query time.
        • Manage user quality priorities and data quality properties together to give the query result quality the user expects.
      • What we need to do now:
        • Identify tools for measurement and assessment, and develop a QMD.
        • Store the quality of the data sources involved in the heterogeneous system.
        • Identify techniques for:
          • Ranking of data sources and plans involved in the query
          • Inconsistency detection
          • Fusing data using data source and data level properties
          • Ranking of query results

  22. Questions?

  23. Thanks!
