
Introduction



Presentation Transcript


  1. Introduction. Name: Muaz Khan, Roll No: 2498, Class: BSIT 5th semester

  2. ETL stands for Extraction, Transformation, Loading: the process of updating the data warehouse. Data are moved from source to target databases. It is a very costly, time-consuming part of data warehousing.
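As a rough illustration of those three stages, here is a minimal Python sketch; the CSV source file, the customer table, and the column names are assumptions made for the example, not details from the slides.

```python
# Minimal ETL sketch: extract rows from a source file, transform them, load a target table.
import csv
import sqlite3

def extract(path):
    # Extraction: read the raw rows exported from the source system.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transformation: trim whitespace and standardize the city name's case.
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        row["city"] = row["city"].title()
        cleaned.append(row)
    return cleaned

def load(rows, db_path):
    # Loading: write the cleansed rows into the target (warehouse) database.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS customer (name TEXT, city TEXT)")
    con.executemany("INSERT INTO customer (name, city) VALUES (?, ?)",
                    [(row["name"], row["city"]) for row in rows])
    con.commit()
    con.close()

load(transform(extract("customers.csv")), "warehouse.db")
```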

  3. Data Extraction. Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated metadata). Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of "dirty data" in the source systems). Increasingly performed by specialized ETL software.

  4. Data Cleansing. Source systems contain "dirty data" that must be cleansed. ETL software contains only rudimentary data cleansing capabilities, so specialized data cleansing software is often used; it is important for performing name and address correction and householding functions. Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric).

  5. Data Cleansing steps: Parsing • Correcting • Standardizing • Matching • Consolidating

  6. Parsing. Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files. Examples include parsing the first, middle, and last name; street number and street name; and city and state.
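A minimal Python sketch of this parsing step; the helper names, the name layout, and the street-address pattern are assumptions for illustration only.

```python
import re

def parse_name(full_name):
    # Isolate first, middle, and last name from a free-form name field.
    parts = full_name.split()
    return {"first": parts[0],
            "middle": " ".join(parts[1:-1]),   # empty string if there is no middle name
            "last": parts[-1]}

def parse_street(street_line):
    # Isolate the street number from the street name, e.g. "123 Main Street".
    match = re.match(r"\s*(\d+)\s+(.*)", street_line)
    if match:
        return {"street_number": match.group(1), "street_name": match.group(2)}
    return {"street_number": "", "street_name": street_line.strip()}

print(parse_name("John Q Public"))      # {'first': 'John', 'middle': 'Q', 'last': 'Public'}
print(parse_street("123 Main Street"))  # {'street_number': '123', 'street_name': 'Main Street'}
```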

  7. Correcting. Corrects parsed individual data components using sophisticated data algorithms and secondary data sources. Examples include replacing a vanity address and adding a ZIP code.
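A hedged sketch of correcting; the tiny lookup dictionaries below stand in for the secondary data sources (postal reference files and the like) that real tools would use.

```python
# Reference data standing in for secondary data sources (illustrative values only).
VANITY_ADDRESSES = {"One Microsoft Way": "1 Microsoft Way"}
ZIP_BY_CITY_STATE = {("Redmond", "WA"): "98052"}

def correct_address(record):
    # Replace a vanity address with its standard postal form.
    street = record.get("street", "")
    record["street"] = VANITY_ADDRESSES.get(street, street)
    # Add a missing ZIP code, looked up from a secondary source keyed by (city, state).
    if not record.get("zip"):
        record["zip"] = ZIP_BY_CITY_STATE.get((record.get("city"), record.get("state")), "")
    return record

print(correct_address({"street": "One Microsoft Way", "city": "Redmond", "state": "WA", "zip": ""}))
# {'street': '1 Microsoft Way', 'city': 'Redmond', 'state': 'WA', 'zip': '98052'}
```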

  8. Standardizing. Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules. Examples include adding a pre-name, replacing a nickname, and using a preferred street name.
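A sketch of standardizing; the nickname and street-suffix rule tables are illustrative custom business rules, not part of the slides.

```python
NICKNAMES = {"Bob": "Robert", "Bill": "William", "Liz": "Elizabeth"}
STREET_SUFFIXES = {"St": "Street", "Ave": "Avenue", "Rd": "Road"}

def standardize(record):
    # Replace a nickname with the formal first name.
    record["first"] = NICKNAMES.get(record["first"], record["first"])
    # Convert street suffixes to the preferred form.
    words = record["street_name"].split()
    record["street_name"] = " ".join(STREET_SUFFIXES.get(w, w) for w in words)
    # Add a pre-name (salutation) when it can be derived.
    if record.get("gender") == "M":
        record.setdefault("prename", "Mr.")
    elif record.get("gender") == "F":
        record.setdefault("prename", "Ms.")
    return record

print(standardize({"first": "Bob", "street_name": "Main St", "gender": "M"}))
# {'first': 'Robert', 'street_name': 'Main Street', 'gender': 'M', 'prename': 'Mr.'}
```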

  9. Matching. Searching and matching records within and across the parsed, corrected, and standardized data, based on predefined business rules, to eliminate duplication. Examples include identifying similar names and addresses.
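A simple sketch of matching using plain string similarity; commercial tools apply far richer name-and-address rules, so the difflib-based score and the threshold here are only assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Rough string similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_records(records, threshold=0.8):
    # Compare every pair of records on name + address and flag probable duplicates.
    matches = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            key_i = records[i]["name"] + " " + records[i]["address"]
            key_j = records[j]["name"] + " " + records[j]["address"]
            if similarity(key_i, key_j) >= threshold:
                matches.append((i, j))
    return matches

people = [
    {"name": "Robert Smith", "address": "123 Main Street"},
    {"name": "Bob Smith",    "address": "123 Main St"},
]
print(match_records(people))  # [(0, 1)] with this threshold
```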

  10. Consolidating. Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
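A sketch of consolidating the pairs found by matching into one surviving record, using a deliberately simple survivorship rule (keep the first non-empty value per field); real consolidation logic is richer than this.

```python
def consolidate(records, matched_pairs):
    # Merge each matched pair into ONE representation:
    # keep the first record's values and fill any empty fields from its duplicate.
    merged, consumed = {}, set()
    for i, j in matched_pairs:
        survivor = dict(records[i])
        for field, value in records[j].items():
            if not survivor.get(field):
                survivor[field] = value
        merged[i] = survivor
        consumed.add(j)
    # Records that matched nothing pass through unchanged.
    return [merged.get(k, row) for k, row in enumerate(records) if k not in consumed]

records = [
    {"name": "Robert Smith", "address": "123 Main Street", "phone": ""},
    {"name": "Bob Smith",    "address": "123 Main St",     "phone": "555-0100"},
]
print(consolidate(records, [(0, 1)]))
# [{'name': 'Robert Smith', 'address': '123 Main Street', 'phone': '555-0100'}]
```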

  11. Introduction. Name: Umair Ilyas, Roll No: 2809, Class: BSIT 5th semester

  12. Transformation

  13. Transforms the data in accordance with the business rules and standards that have been established. Examples include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates.
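A sketch of a few of those transformation rules; the field names, the date format, and the status-code table are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

STATUS_CODES = {"A": "Active", "I": "Inactive"}   # code-replacement table (assumed)

def transform_row(row):
    # Format change: normalise the order date to ISO format.
    row["order_date"] = datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat()
    # Splitting up a field: "City, ST" becomes two columns.
    row["city"], row["state"] = [part.strip() for part in row.pop("city_state").split(",")]
    # Replacement of codes.
    row["status"] = STATUS_CODES.get(row["status"], row["status"])
    # Derived value.
    row["line_total"] = float(row["quantity"]) * float(row["unit_price"])
    return row

def total_by_state(rows):
    # Aggregate: total sales per state.
    totals = defaultdict(float)
    for row in rows:
        totals[row["state"]] += row["line_total"]
    return dict(totals)

print(transform_row({"order_date": "31/01/2024", "city_state": "Lahore, PB",
                     "status": "A", "quantity": "2", "unit_price": "9.50"}))
```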

  14. Transformation. Data quality paradigm: Correct, Unambiguous, Consistent, Complete. Data quality checks are run at two places: after extraction, and after cleaning and conforming; additional checks are run at this point.

  15. Transformation - Cleaning Data. Anomaly Detection • Data sampling - e.g. a count(*) of the rows for a department column • Column Property Enforcement • Null values in required columns • Numeric values that fall outside of expected highs and lows • Columns whose lengths are exceptionally short/long • Columns with certain values outside of discrete valid value sets • Adherence to a required pattern / membership in a set of patterns
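A sketch of those column property checks over in-memory rows; the check names and thresholds mirror the bullet list above, but the functions and their parameters are assumptions.

```python
import re

def row_counts_by(rows, column):
    # Data sampling: the in-memory equivalent of SELECT column, COUNT(*) ... GROUP BY column.
    counts = {}
    for row in rows:
        counts[row.get(column)] = counts.get(row.get(column), 0) + 1
    return counts

def column_property_check(rows, column, lo=None, hi=None, valid_set=None,
                          pattern=None, min_len=1, max_len=100):
    # Flag rows whose value in `column` violates the expected properties.
    problems = []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value is None or value == "":
            problems.append((i, "null value in a required column"))
            continue
        text = str(value)
        if not (min_len <= len(text) <= max_len):
            problems.append((i, "exceptionally short/long value"))
        if valid_set is not None and value not in valid_set:
            problems.append((i, "value outside the discrete valid value set"))
        if pattern is not None and not re.fullmatch(pattern, text):
            problems.append((i, "value does not adhere to the required pattern"))
        if lo is not None and hi is not None:
            try:
                if not (lo <= float(value) <= hi):
                    problems.append((i, "numeric value outside the expected range"))
            except ValueError:
                problems.append((i, "non-numeric value in a numeric column"))
    return problems
```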

  16. Transformation - Conforming. Structure Enforcement • Tables have proper primary and foreign keys • Obey referential integrity • Data and rule value enforcement • Simple business rules • Logical data checks
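A sketch of the structure-enforcement checks (unique primary keys and referential integrity) over plain Python rows; in practice these would usually be declared as constraints in the target RDBMS.

```python
def primary_key_violations(rows, pk):
    # A proper primary key must be present and unique in every row.
    seen, violations = set(), []
    for row in rows:
        key = row.get(pk)
        if key is None or key in seen:
            violations.append(row)
        seen.add(key)
    return violations

def referential_integrity_violations(fact_rows, dim_rows, fk, pk):
    # Every foreign key in the fact rows must exist as a primary key in the dimension.
    dim_keys = {row[pk] for row in dim_rows}
    return [row for row in fact_rows if row[fk] not in dim_keys]
```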

  17. Introduction. Name: Waqas Ahmad, Roll No: 2481, Class: BSIT 5th semester

  18. Loading: Loading Dimensions • Loading Facts

  19. Loading Dimensions. Dimension tables are physically built to have the minimal set of components. The primary key is a single field containing a meaningless unique integer (a surrogate key); the data warehouse owns these keys and never allows any other entity to assign them. Dimensions are de-normalized flat tables - all attributes in a dimension must take on a single value in the presence of the dimension primary key. A dimension should also possess one or more other fields that compose the natural key of the dimension.
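A sketch of surrogate-key assignment during dimension loading; the column names are placeholders.

```python
def assign_surrogate_keys(dim_rows, natural_key, next_key=1):
    # The warehouse owns the surrogate keys: each distinct natural key gets a
    # meaningless unique integer that no source system is ever allowed to assign.
    key_map = {}
    for row in dim_rows:
        nk = row[natural_key]
        if nk not in key_map:
            key_map[nk] = next_key
            next_key += 1
        row["surrogate_key"] = key_map[nk]
    return dim_rows, key_map

rows, mapping = assign_surrogate_keys(
    [{"customer_id": "C-17", "name": "Ali"}, {"customer_id": "C-42", "name": "Sara"}],
    natural_key="customer_id")
print(mapping)  # {'C-17': 1, 'C-42': 2}
```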

  20. The data loading module consists of all the steps required to administer slowly changing dimensions (SCD) and write the dimension to disk as a physical table in the proper dimensional format with correct primary keys, correct natural keys, and final descriptive attributes. Creating and assigning the surrogate keys occur in this module. The table is definitely staged, since it is the object to be loaded into the presentation system of the data warehouse.
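As one concrete reading of "administering slowly changing dimensions", here is a sketch of Type 2 SCD handling (expire the current row and insert a new one when a tracked attribute changes); the columns valid_from, valid_to, and is_current are assumptions, and the slides do not commit to a particular SCD type.

```python
from datetime import date

def apply_scd_type2(dimension, incoming, natural_key, tracked_attrs, next_sk):
    # Index the currently active row for each natural key.
    current = {row[natural_key]: row for row in dimension if row["is_current"]}
    for new in incoming:
        old = current.get(new[natural_key])
        changed = old is None or any(old[a] != new[a] for a in tracked_attrs)
        if changed:
            if old is not None:
                # Expire the existing row instead of overwriting it, so history is kept.
                old["is_current"] = False
                old["valid_to"] = date.today()
            dimension.append({**new,
                              "surrogate_key": next_sk,
                              "valid_from": date.today(),
                              "valid_to": None,
                              "is_current": True})
            next_sk += 1
    return dimension, next_sk
```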

  21. Loading Facts. Fact tables hold the measurements of an enterprise. The relationship between fact tables and measurements is extremely simple: if a measurement exists, it can be modeled as a fact table row; if a fact table row exists, it is a measurement.

  22. Loading Fact Tables - Managing Indexes • Indexes are performance killers at load time • Drop all indexes in the pre-load step • Segregate updates from inserts • Load updates • Rebuild indexes
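A sketch of that load sequence against a SQLite connection; the fact table, index name, and SQL statements are placeholders, and a production warehouse would use its own RDBMS's bulk-load and index-rebuild facilities.

```python
import sqlite3

def load_fact_table(db_path, update_rows, insert_rows):
    con = sqlite3.connect(db_path)
    cur = con.cursor()
    # Indexes are performance killers at load time: drop them before the load.
    cur.execute("DROP INDEX IF EXISTS idx_sales_date")
    # Segregate updates from inserts, and apply the updates first.
    cur.executemany("UPDATE sales_fact SET amount = ? WHERE sale_id = ?",
                    [(r["amount"], r["sale_id"]) for r in update_rows])
    # Bulk-insert the new fact rows.
    cur.executemany("INSERT INTO sales_fact (sale_id, date_key, amount) VALUES (?, ?, ?)",
                    [(r["sale_id"], r["date_key"], r["amount"]) for r in insert_rows])
    # Rebuild the indexes once the load is complete.
    cur.execute("CREATE INDEX idx_sales_date ON sales_fact (date_key)")
    con.commit()
    con.close()
```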

  23. THANKS
