1 / 93

The Warehouse Designer’s School Of Hard Knocks

International Oracle Users Group. April 15, 2002. The Warehouse Designer’s School Of Hard Knocks. The Data Warehouse Designer’s School of Hard Knocks. A Graduate’s Perspective. David Stanford Sr. Vice President Cognicase Inc. david.stanford@cognicase.com. Objectives.

Download Presentation

The Warehouse Designer’s School Of Hard Knocks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. International Oracle Users Group April 15, 2002 The Warehouse Designer’s School Of Hard Knocks The Data Warehouse Designer’s School of Hard Knocks A Graduate’s Perspective David Stanford Sr. Vice President Cognicase Inc. david.stanford@cognicase.com

  2. Objectives • Obtain a clear understanding of data warehouse design ‘hot points’ • Identify solutions and alternatives for these ‘hot points’ • See how real world solutions are implemented

  3. Agenda • Top 10 Gotchya’s • Warehouse Design • Surrogate Keys • Tracking History • Row Level Security In The Warehouse • BAM Rules, Audit and Administrative Fields • Other Tidbits of Advice

  4. Dave’s Top 10 Gotchya’s • Failing to model for both a) view of the data when the event occurred and b) view of the data as of today’s reality • Limiting the number of dimensions • Failing to model and populate a meta data repository • Failing to provide sufficient audit capabilities to verify loads against source systems • Not using surrogate keys for everything

  5. Dave’s Top 10 Gotchya’s (cont’d) • Failing to design an error correction process • Normalizing too much • Not using a staging area • Failing to load ALL of the fact data • Failing to classify incorrect data

  6. Warehouse Design

  7. Overall Architecture • At the 20,000 foot level we must decide on the use of: • The Operational Data Store (ODS) • The Staging Area • The Data Warehouse proper • The Data Mart(s) • All of which are being coined as “The Corporate Information Factory” or CIF

  8. Design Considerations • The ODS vs the Warehouse • The ODS vs the Staging Area • The Warehouse vs the Data Mart • The lines become blurred • Plan for the world, design for the future, and build for today

  9. Users HR Source OLTP Systems Sales Management Manufacturing Not Designing For The Future… Stove Pipe data marts...

  10. Constituency …leads to trouble …become legamarts

  11. Basic Data Warehouse Architecture Source OLTP Systems Staging Area Data Warehouse

  12. Data Mart DW Architecture Data Marts Source OLTP Systems Staging Area Data Warehouse Source: Enterprise Group

  13. Data Warehouse Process Data Characteristics • Raw Detail • No/Minimal History • Integrated • Scrubbed • History • Summaries • Targeted • Specialized (OLAP) Source OLTP Systems Data Marts Staging Area Data Warehouse • Design • Mapping • Extract • Scrub • Transform • Load • Index • Aggregation • Replication • Data Set Distribution • Access & Analysis • Resource Scheduling & Distribution Meta Data System Monitoring Source: Enterprise Group

  14. Where The Work Is Source OLTP Systems Data Marts Over 80% of the work is here Staging Area Data Warehouse • Design • Mapping • Extract • Scrub • Transform • Load • Index • Aggregation • Replication • Data Set Distribution • Access & Analysis • Resource Scheduling & Distribution Meta Data System Monitoring Source: Enterprise Group

  15. Legacy Legacy MDB Data Warehouse Architecture ETL Tool Warehouse Central Staging Admin. Data Area Warehouse Mid Mid - - Relational Relational Tier Tier Data Local Transform Metadata Extract and Load Process ERP Data Mid - Central Mart Tier e - Commerce Meta Data RDBMS Local Metadata Metadata Exchange External Data Data Data Data Data Cleansing Modeling Modeling Mart Mart Tool Tool Tool Data Local Local Cleansing RDBMS RDBMS MDB MDB Metadata Metadata Tool Source Data Data Data Transforma - Central Data Architected Data Access Databases Extraction Staging tion & Load Warehouse Data Marts and Analysis Components of a Data Warehousing Architecture Components of a Data Warehousing Architecture

  16. The Staging Area • Holds a mirror copy of the extract files • Allows pre-processing of the data before loading • Allows easier reloading (you WILL do this) • Keeps more control with the DW team, rather than an external group (the extract team) • Facilitates easier audit processes • Can facilitate error correction processes

  17. Modelling is not straight forward Marital Status Age Member Donation Time Gender Campaign Income Location

  18. Should These Dimensions Be Combined? MaritalStatus Age Member Donation Time Gender Campaign Income Location

  19. The 10 Step Process-Data Model Design • Identify major subject areas or topics • Add element of time to the tables • Create appropriate names for tables, columns, and views • Add derived fields where applicable • Add administrative fields • Consider security and privacy in design • Make sure data model answers the critical business questions • Consider meta data • Consider error correction • Performance considerations: Tune, Tune, Tune

  20. Independent of Approach… …the goal of the data model is to satisfy two primary criteria: 1. Meet Business Objectives 2. Provide Good Performance

  21. Warehouse Design • Normalized (Relational) Design • Dimensional Design • Hybrid Design • Behind the Scenes

  22. Normalized/Relational Schema • Usually As Normalized as Possible • Used mostly in OLTP databases • Uses entities and relations to describe data • Fast for Inserts and Updates

  23. Relational Schemas

  24. The Star Schema STAR Schema • Used in OLAP (BI) and DWH • Uses FACT and DIMENSION Tables • Normalized FACT table • Dimensions Denormalized

  25. Sample Star Schema Dimension Fact Table

  26. Snowflake Schema • Contains FACT and DIMENSION Tables • Dimension Tables can be FACT for other STAR • Dimension Hierarchies are normalized

  27. Sample Snowflake Schema SnowFlake Dimensions

  28. Hybrid • In reality, the DW is more normalized but has elements of dimensional design • The data marts are star schemas but have elements of normalization

  29. Behind The Scenes • There are several aspects of a design that users don’t directly see: • Meta Data • Error Correction • Audit • Load Control (if not using a scheduling tool) • Transformation Tables (used for transforming the data prior to being loaded into the DW)

  30. Error Correction Audit Load Control Transform Tables Meta Data Behind The Scenes Data Marts Source OLTP Systems Staging Area Data Warehouse

  31. Surrogate Keys

  32. Surrogate Keys • A surrogate key is a single column, unique identifier for each row within a table • Always use surrogate keys for dimensions • Always use surrogate keys for the time dimension • Always use surrogate keys for facts • Always use surrogate keys for transformation tables • Always use surrogate keys for EVERY table

  33. Surrogate Keys Avoid… • Duplicate keys from different source systems • Recycling of primary keys • Use of the same key for different business rows • Lengthy composite key joins • Space in fact tables • Application changes or upgrades in source systems

  34. Using Surrogates In Fact Tables • You will need a surrogate key on the fact table if you allow ‘unknown’ values into the fact table (which is recommended by the way) • The PK of a fact is typically the combination of the base dimensions

  35. Surrogates In Fact Tables

  36. Surrogates In Fact Tables

  37. Surrogates In Fact Tables

  38. Surrogates In Fact Tables • This results in a duplicate primary key in the table

  39. Surrogates In Fact Tables • Thus the need for a surrogate primary key

  40. Tracking History

  41. Tracking History in Dimensions • Type 1 – No history • Type 2 – All history • Type 3 – Some history

  42. Type 1 – No History

  43. Type 1 – No History

  44. Type 2 – All History

  45. Type 3 – Some History

  46. Type 3 – Some History

  47. More Dimension Types…Combinations • Type 3 Prime – Types 1 and 2 (the most common) • Type 4 – Types 1 and 3 • Type 5 – Types 2 & 3 • Type 6 – Types 1, 2, and 3 (the second most common)

  48. Trigger Fields • Trigger Fields are fields within a table that you want to track history • Non-Trigger fields are those which you do not want to track history

  49. Type 3 Prime –All and No History Non Trigger Fields Trigger Field

  50. Type 3 Prime –All and No History

More Related