1 / 52

ISQS 6339, Data Management and Business Intelligence Cubism – Measures and Dimensions

ISQS 6339, Data Management and Business Intelligence Cubism – Measures and Dimensions. Zhangxi Lin Texas Tech University. Outline. Measures Where we’ve been Populating fact table Types of dimensions. Structure and Components of Business Intelligence. SSMS. SSIS. SSAS. SSRS. SAS EG.

tausiq
Download Presentation

ISQS 6339, Data Management and Business Intelligence Cubism – Measures and Dimensions

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ISQS 6339, Data Management and Business Intelligence Cubism – Measures and Dimensions Zhangxi Lin Texas Tech University

  2. Outline • Measures • Where we’ve been • Populating fact table • Types of dimensions

  3. Structure and Components of Business Intelligence SSMS SSIS SSAS SSRS SAS EG SAS EM

  4. Snowflake Schema of the Data Mart Manufacturingfact DimBatch 10 9 DimMachine 8 DimProduct 3 DimMachineType DimPlant 7 5 DimProductSubType 2 DimMaterial DimCountry 6 4 DimProductType 1

  5. Where we’ve been and where we are now • Exercise 1: Getting started • Exercise 2: Creating a data mart with SSMS • Exercise 3: Creating data mart with BIDS • Exercise 4: Populating dimensions of a data mart • Exercise 5: Loading fact tables • Exercise 6: Create and customize a cube

  6. What we need to do with the half-done data mart? • Populate DimBatch dimenstion table • Populate ManufacturingFact table • Build an OLAP cube (we already did this before) • Check measures • Check dimensions

  7. MEASURES

  8. Facts • Facts are measurements associated with a specific business process. • Many facts can be derived from other facts, including additive and semiadditive facts. • Non-additive facts can be avoided by calculating it from additive facts. • Measures are clustered together in a group, called measure group.

  9. Types of measures • Three types • Additive measures. Most facts are additive (calculative), such as sum • Semiadditive measures. The measures that can be added along some dimensions, but not along others. For example, inventory level can be added along product dimension but not time dimension. • Non-additive (such as max, average), or descriptive (e.g. factless fact table). • Aggregate functions • Additive: Sum • Semiadditive: ByAccount, Count, FirstChild, FirstNonEmpty, LastChild, LastNonEmpty, Max, Min • Nonadditive: DistinctCount, None.

  10. Measures and dimensions • Dimensions are used to aggregate measures. Therefore, they must be somehow related to measures • Granularity • Important for the analysis • There could be missing values in the fact table

  11. LOADING FACT TABLES

  12. Exercise 5: Loading Fact Tables • Project name: MMMFactLoad-lastname • Package name: FactLoad.dtsx • Tasks • Create Inventory Fact table • Load Dim Batch • Load Manufacturing Fact • Load Inventory Fact • Deliverable: email a screenshot of the “green” outcome of the ETL project to isqs6347@gmail.com, with a subject title “ISQS 6339 EX5 - <lastname>”

  13. Inventory Fact Table • Create a Table InventoryFact in database MMM-2014-lastname. • Compound primary key: DateOfInventory, ProductCode, and Material • Define two foreign keys

  14. Data Sources for Loading Fact • For loading DimBatch table and ManufacturingFact table • BatchInfo.CSV • For loading InventortyFact table • OREDB.OrderProcessingSystem.Inventory

  15. Control Flow for Loading Facts and the Remaining Dimension • Note: to ease debugging, you may use three packages and test them one by one, instead of doing everything in one package

  16. Flat File Connection • Data types • BatchNumber, MachinNumber: four-byte signed integer [DT_I4] • ProductCode, NumberProduced, NumberRejected: four-byte signed integer [DT_I4] • TimeStarted, TimeStopped: database timestamp [DT_DBTimeStamp] • Only check BatchNumber as the input of Dim Batch • All columns are needed for fact tables

  17. Some Frequently Used Nodes

  18. Load DimBatch Data Flow

  19. Load DimBatch Data Flow Note: Because of duplication in the source file, we may insert An Aggregate item after the Flat File Source item.

  20. The Flat File Source

  21. Sort Transformation In the Aggregate item, Define “Group-by” BatchNumber. In Derived column item, Define BatchName From BatchNumber Use the expression (DT_WSTR, 50)[BatchNumber] To change the data type Of BatchName.

  22. Load Fact Data Flow

  23. Derived Columns for the Fact table

  24. Expressions for the Derived Columns • AcceptedProducts • [NumberProduced] – [NumberRejected] • ElapsedTimeForManufacture • DATEDIFF(“mi”, [TimeStarted],[TimeStopped]) • DateOfManufacture • (DT_DBTIMESTAMP)SUBSTRING((DT_WSTR,25)[TimeStarted],1,10) • This expression converts TimeStarted into a string and selects the first ten characters of that string. This string is then converted back into a date time, without the time portion.

  25. OLE DB Destination For loading the fact table

  26. Load Inventory Fact • OLE DB Source • OrderProcessingSystem.InventoryFact • OLE DB Destination • MaxMinManufacturingDM-lastname.InventoryFact • No transformation • There are two ways to loading the table • Create the table and use ETL to load it • Import directly from the source to the database MaxMinManufacturingDM-lastname

  27. Debugging Results Loading ManufacturingFact Loading DimBatch

  28. BUILDING AN OLAP CUBE

  29. Exercise 6a: Design a Cube • Project name: MMMCube2014_lastname • Tasks • Add in new date items (year, quarter, and month) to two fact tables • Create time dimension using Manufacturing Fact table • Define calculated measures (Total Products, Percent Rejected) • Define hierarchies of attributes in dimension tables • Create a cube from the MaxMinManufacturing data mart with hierarchical date dimension • Deliverable: • Screenshots: dimension hierarchies, dimensions, relationships of facts and dimensions, deployment result, format of measures, and browsing results. 29

  30. Three Steps to Create a Cube from Data Sources • Defining data source • Defining data source view • Add in three new columns of year, quarter, and month for the two fact tables • Building a cube. • Define a new dimension Dim Time from Manufacturing Fact table • Customize the cube: • Link two fact tables in a cube • Define new primary key for Dim Time • Define calculated measures • Relate dimensions to measures

  31. T-SQL Expressions for DS View Definition - Manufacture • YearOfManufacture CONVERT(char(4),YEAR(DateOfManufacture)) • QuarterOfManufacture CONVERT(char(4), YEAR(DateOfManufacture)) + CASE WHEN MONTH (DateOfManufacture) BETWEEN 1 AND 3 THEN 'Q1' WHEN MONTH (DateOfManufacture) BETWEEN 4 AND 6 THEN 'Q2' WHEN MONTH (DateOfManufacture) BETWEEN 7 AND 9 THEN 'Q3' ELSE 'Q4' END • MonthOfManufacture CONVERT(char(4), YEAR(DateOfManufacture)) + RIGHT('0'+CONVERT(varchar(2), MONTH(DateOfManufacture)),2)

  32. T-SQL Expressions for DS View Definition - Inventory • YearOfInventory CONVERT(char(4),YEAR(DateOfInventory)) • QuarterOfInventory CONVERT(char(4), YEAR(DateOfInventory)) + CASE WHEN MONTH (DateOfInventory) BETWEEN 1 AND 3 THEN 'Q1' WHEN MONTH (DateOfInventory) BETWEEN 4 AND 6 THEN 'Q2' WHEN MONTH (DateOfInventory) BETWEEN 7 AND 9 THEN 'Q3' ELSE 'Q4' END • MonthOfInventory CONVERT(char(4), YEAR(DateOfInventory)) + RIGHT('0'+CONVERT(varchar(2), MONTH(DateOfInventory)),2)

  33. Data Source View New columns

  34. Select Measures Page Uncheck Manufacture Fact Count

  35. The finished cube

  36. Cube Structure

  37. Defining a format string

  38. Inventory measures “Number on Backorder” is also set with these two parameters

  39. Calculated measures – made-up facts • The definition of calculated measure is stored in the OLAP cube itself. • The actual values that result from a calculated measure are not calculated, however, until a query containing that calculated measure is executed. The results of that calculation are then cached in the cube. The cached value is then delivered to any subsequent users requesting the same calculation. • The expressions of calculation are created using a language known as Multidimensional Expression Language (MDX) script. MDX is different from T-SQL. It is a special language with features designed to handle the advanced mathematics and formulas required by OLAP analysis. This is not found in T-SQL. 39

  40. DIMENSIONSin SQL Server

  41. Types of Dimensions • Fact dimensions: the Dimensions created from attributes in a fact table • Parent-Childdimensions: Built on a table containing a self-referential relationship, such as a parent attribute. • Role playing dimensions: related to the same measure group multiple times; each relationship represents a different role the dimension play; for example, time dimension plays three different roles: date of sale, data of shipment, and date of payment. • To create a role playing dimension, add the dimension to the Dimension Usage tab multiple times. Then create a relationship between each instance of the dimension and the measure group. • Reference dimensions: Not related directly to the measure group but to another regular dimension which in turn related to the measure group • Data mining dimensions: the information discovered by data mining • Many-to-many dimensions: e.g. multiple ship to addresses • Slowly changing dimensions 43

  42. Slowly changing dimensions • Type 1 SCD – no track • Type 2 SCD – tracking the entire history, adding four attributes: SCD Original ID, SCD Start Date, SCD End Date, SCD Status • Type 3 SCD – Similar to Type 2 SCD but only track current state and the original state; two additional attribute: SCD Start Date, SCD Initial Value

  43. Add a time dimension (a fact dimension)

  44. Rename time dimension

  45. Date Hierarchy

  46. Material Hierarchy & Plant Hierarchy

More Related