1 / 40

Data Exchange with  Data-Metadata Translations

Data Exchange with  Data-Metadata Translations. MAD Algorithm. Paolo  Papotti. Mauricio A.  Hernández. Wang-Chiew Tan. Data Exchange. “ Scientia potentia est ” What is Data Exchange?:

istas
Download Presentation

Data Exchange with  Data-Metadata Translations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Exchange with Data-Metadata Translations MAD Algorithm Paolo  Papotti Mauricio A.  Hernández Wang-Chiew Tan

  2. Data Exchange • “Scientiapotentiaest” • What is Data Exchange?: • The process of taking data built under a source schema and transforming it into data built under a target schema • Data Exchange is the restructuring of data

  3. Data Exchange – why? Today when companies merge they also merge information sources.

  4. Data Exchange – why? 2. When several institutions are working on a joint venture – a combined database is

  5. Data Exchange – why? 3. Refreshing and updating data base scheme

  6. Few problems with data exchange • The labels in the Source Schema and the values Target Schema could be very different • Data could be kept in a plethora of ways • For instance: Car price could be stored in Shekels and in U.S dollars • Data could be lost in the exchange process if the Source Schema and Target Schema don’t correspond well

  7. Data Exchange • In the past Data Exchange was done manually, taking many resources such as time and money. • Many researchers struggle with ways of improving data exchange

  8. Antique Car Dealership Schema Buy-A-Wreck cars Car AGENTS Schema Clunkers –R-Us Clunker table

  9. Matching Examples Schema Buy-A-Wreck Schema Clunkers –R-Us

  10. Matching Examples Schema Buy-A-Wreck Schema Clunkers –R-Us

  11. Matching Examples Schema Buy-A-Wreck cars Car AGENTS Schema Clunkers –R-Us

  12. How do we match? • Creating mappings: • schema matching: find matches • create query expressions: for automated data translation or exchange Create Query expressions Schema Matching

  13. Data Exchange There may be no way to transform an instance given all of our constraints. There may be numerous ways to transform the instance (possibly infinitely many). We must identify and justify a best suited choice of solutions for our need.

  14. Data Exchange - Summery To conclude: Data exchange is exchanging data from a Source Schema to a Target Schema It is a greatly dealt problem in the computerized world Some Data exchange scenarios deal with Metadata Source schema S Target schema T S T

  15. What is Metadata? • Metadata: Data on Data. • Metadata can come as: • Video • Audio • Image • Text

  16. Why Do we need Meta – Data? Meta-Data helps us to understand data Can anyone tell what these numbers mean? Jan 120 223 89 Feb 83 168 56

  17. Why Do we need Meta – Data? After adding Meta-Data… Umbrella Sales Month USA UK Italy Jan 120 223 89 Feb 83 168 56

  18. Why Do we need Meta – Data? We all know this picture…

  19. Why Do we need Meta – Data? What is this picture all about?

  20. Why Do we need Meta – Data? Sir Edward Carson signing the Ulster Covenant

  21. Why Do we need Meta – Data?

  22. Why Do we need Meta – Data? Wall Street, New York City, New York.

  23. Data exchange scenarios may involve metadata transformations. Data-Metadata Translations • Transforming the data in the Stock Ticker table to metadata in the Stock Quotes table is vital in the stock exchange world. 23

  24. Data-Metadata Translations • Mapping systems support Data-to-Data transformations with fixed schemas (Clio). • Goal: Extend mapping systems to support Data-Metadata Translations.

  25. Data Exchange Clio • One software developed for simple graphic data exchange is “Clio” • Clio corresponded values between the source scheme and the target scheme • However, the Clio solution did not provide answers for possible data exchange scenarios that involve Metadata • the solution involving Metadatais based on Clio

  26. Clio interface

  27. m1: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “USA” and $t.units = $s.USA Metadata-to-Data How can we transform the following source data into the corresponding target? Target.Sales month country units Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Source.Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 Schema mapping m1 Source: Rcd Sales: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units “USA” 27

  28. m2: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “UK” and $t.units = $s.UK Metadata-to-Data How can we transform the following source data into the corresponding target? Target.Sales month country units Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Source.Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 m1: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “USA” and $t.units = $s.USA Schema mapping m2 Source: Rcd Sales: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units “UK” 28

  29. m2: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “UK” and $t.units = $s.UK m3: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “Italy” and $t.units = $s.Italy Metadata-to-Data How can we transform the following source data into the corresponding target? Target.Sales month country units Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Source.Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 m1: for $s in Source.Sales exists $t in Target.Sales where $t.month = $s.month and $t.country = “USA” and $t.units = $s.USA Schema mapping m3 Source: Rcd Sales: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units “Italy” 29

  30. Metadata-to-Data: Our solution MetadatA-Data (MAD) mapping: Source: Rcd Sales: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units Target.Sales Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Source.Sales Jan 120 223 89 Feb 83 168 56 countries label value Select the elements to group Placeholder Copy elements’ labels Copy elements’ values for $s in Source.Sales, $c in{“USA”, “UK”, “Italy”} exists $t in Target.Sales where $t.month = $s.month and$t.country = $c and$t.units = $s.($c) Set of labels (strings) Is a label value Dynamic selection of the source element 30

  31. Data-to-Metadata Now we want to support the opposite operation The target schema depends on the source data We define a target template: Nested Dynamic Output Schemas (ndos) Source: Rcd StockTicker: SetOf Rcd time symbol price Target: Rcd Stockquotes: SetOf Rcd time symbols label value Dynamic element Run-time: The dynamic element defines the target instance and the target schema. 31

  32. Source: Rcd StockTicker: SetOf Rcd time symbol price Target: Rcd Stockquotes: SetOf Rcd time symbols label value Data-to-Metadata: Heterogeneous records Consider this mapping and this source instance: Source Instance There are two possible interpretations for the target ndos: StockTicker (time: 0900, Symbol : MSFT, Price: 27.20 )StockTicker (time: 0900, Symbol : IBM, Price: 120.00 )StockTicker (time: 0905, Symbol : MSFT, Price: 27.30 ) First alternative: Heterogeneous target records Computed Target Schema Computed Target Instance Target: Rcd Stockquotes: SetOf Rcd time symbols:Choice MSFT IBM Stockquotes (time: 0900, MSFT: 27.20 )Stockquotes (time: 0900, IBM: 120.00 )Stockquotes (time: 0905, MSFT: 27.30 )

  33. Source: Rcd StockTicker: SetOf Rcd time symbol price Target: Rcd Stockquotes: SetOf Rcd time symbols label value Data-to-Metadata: Homogenous records Consider this mapping and this source instance: Source Instance There are two possible interpretations for the target ndos: StockTicker (time: 0900, Symbol : MSFT Price: 27.20 )StockTicker (time: 0900, Symbol : IBM Price: 120.00 )StockTicker (time: 0905, Symbol : MSFT Price: 27.30 ) Second alternative: Homogeneous target records Computed Target Schema Computed Target Instance Target: Rcd Stockquotes: SetOf Rcd time MSFT IBM Stockquotes (time: 0900, MSFT: 27.20, IBM: null )Stockquotes (time: 0900, MSFT: null , IBM: 120.00 )Stockquotes (time: 0905, MSFT: 27.30, IBM: null )

  34. Source: Rcd StockTicker: SetOf Rcd time symbol price Target: Rcd Stockquotes: SetOf Rcd time symbols label value Data-to-Metadata: Homogenous records Natural solution for semi-structured data models (XSD, DTD, JSON) Stockquotes(time: 0900, MSFT : 27.20 ) Stockquotes(time: 0900, IBM : 120.00 ) Stockquotes(time: 0905, MSFT : 27.30 ) Homogeneity Constraint: “For every pair of tuples t1 and t2, if a is a label in t1, then a is a label in t2” Stockquotes(time: 0900, MSFT : 27.20, IBM: null )Stockquotes(time: 0900, MSFT : null , IBM: 120.00)Stockquotes(time: 0905, MSFT : 27.30, IBM: null ) The Homogenous approach is a MAD improvemnet 34

  35. MAD Mapping MetadatA-Data(MAD) mapping three steps: • Preliminary mapping • How do we map the Source schema to the Target schema • Preliminary mapping for <<D>> includes the metadata label and the value label of <<D>>.

  36. Source.Sales month country units Jan USA 120 Jan UK 223 Jan Italy 89 Feb USA 83 Feb UK 168 Feb Italy 56 Target.Sales month USA UK Italy Jan 120 223 89 Feb 83 168 56 36 { $x1  Source.SalesByCountries, $x2<<countries>>; $x3=$x1.($x2) } PreliminaryMapping Source: Rcd SalesByCountries: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units Label Value Transfer countries label value

  37. MAD Mapping • Skeletons: • n x m matrix of skeletons is constructed for the set of source preliminary mapping and the set of target preliminary mapping while each entry(i,j) can be potential mapping. • Creating MAD Mapping: • At this stage, the value correspondences need to be matched against the preliminary mapping in order to factor them into the appropriate skeletons. Source.Sales.country Target.CountrySales.country Matched against one or more source mappings Matched against one or more target mappings 37

  38. MAD Mapping Generation Example Source : { $x1  Source.SalesByCountry, $x2<<countries>>; $x3:=$x1.($x2) } Target : { $y1  Target.Sales} Source.SalesByCountries.<<countries>>  Target.Sales.country Source.SalesByCountries.&<<countries>>  Target.Sales.units Source: Rcd SalesByCountry: SetOf Rcd month USA UK Italy Target: Rcd Sales: SetOf Rcd month country units countries label value

  39. MAD vs Clio Data exchange with data-metadata support: Data to Data is a special case • New construct to iterate over elements’ labels: placeholder • Target schema can be incomplete: nested dynamic output schema (ndos) GUI Source schema S Target schema T Declarative (internal) representation • New mapping & query generation algorithms Executable code (XSLT, XQuery, Java)

  40. Fin. 40

More Related