1 / 69

SSIS/ETL Project SQL Server Integration Services (BIDS) 2008

SSIS/ETL Project SQL Server Integration Services (BIDS) 2008. SetFocus Business Intelligence Master’s Program Jeff Jacob, MBA P: (312) 772-6142 E1 : Jeffrey.K.Jacob@gmail.com E2 : Jeffrey.Jacob@SetFocus.com. SSIS/ETL Project: AllWorks , Inc. Documentation. Objectives :

hisa
Download Presentation

SSIS/ETL Project SQL Server Integration Services (BIDS) 2008

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SSIS/ETL ProjectSQL Server Integration Services (BIDS) 2008 SetFocus Business Intelligence Master’s Program Jeff Jacob, MBA P: (312) 772-6142 E1: Jeffrey.K.Jacob@gmail.com E2: Jeffrey.Jacob@SetFocus.com

  2. SSIS/ETL Project: AllWorks, Inc.Documentation Objectives: Based on source data, design a 3NF database for AllWorks, Inc. Create the database, tables, and constraints in T-SQL Develop a SSIS Project file with ETL packages to bring data from source files into the AllWorksOLTP database; use a master package to run the ETL packages in proper succession and to effectuate database maintenance tasks Deploy the SSIS Project file to SQL Server and schedule the job to run nightly at midnight

  3. 1. Consider the details of the source data.

  4. 2. Design a relational (3NF) database. * note: LaborMaster name chosen because table contains data about both employees and contractors. NOT USED

  5. 3. Script the creation of the relational (3NF) database AllWorksOLTP and its tables. USE [master] GO IF EXISTS(SELECT name FROM sys.databasesWHERE name= N'AllWorksOLTP') DROP DATABASE [AllWorksOLTP] GO CREATE DATABASE AllWorksOLTP; USE AllWorksOLTP; GO Create Table dbo.CountyMaster( CountyPKint primary key clustered, CountyDescriptionvarchar(100) not null ); Create Table dbo.ClientMaster( ClientPKint primary key clustered, ClientNamevarchar(100) not null, ProjectManagervarchar(100) not null, StreetAddressvarchar(100) not null, Cityvarchar(100) not null, [State] char(2) not null, ZipCodevarchar(50) not null, CountyPKintnot null, ClientImagevarchar(150), Constraint FK_CustMaster_CountyMaster Foreign Key (CountyPK) References dbo.CountyMaster(CountyPK) ); Create Table dbo.JobMaster( JobPKint primary key clustered , JobDescriptionvarchar(50) , ClientPKintnot null, MaterialMarkupPctdecimal (10,2) , AdditionalOverheadPctdecimal (10,2) , JobClosedFlagbit not null , JobClosedDatedatetime, JobStartedDatedatetime, Constraint FK_JobMast_ClientMast Foreign Key (ClientPK) References dbo.ClientMaster (ClientPK) );

  6. Create Table dbo.LaborMaster( WorkerPKint primary key clustered, FirstNamevarchar(50) not null, LastNamevarchar(50) not null, EmployeeFlagbit not null, LastUpdatedatetime default GetDate(), FullNamevarchar(150) not null, CellPhonevarchar(50) null ); Create Table dbo.EmployeeRates( WorkerRatesPKint identity primary key clustered, WorkerPKintnot null, WorkerHourlyRatemoney not null, EffectiveDatedatetime, Constraint FK_EmpRates_EmpMaster Foreign Key (WorkerPK) References dbo.LaborMaster (WorkerPK) ); Create Table dbo.TimeSheets( TimeSheetsPKint identity primary key clustered, WorkerPKintnot null, WorkDatedatetimenot null, JobPKintnot null, RegHoursWorkeddecimal (10,2), Constraint FK_TimeSheets_JobMaster Foreign Key (JobPK) References dbo.JobMaster (JobPK), Constraint FK_TimeSheets_LaborMast Foreign Key (WorkerPK) References dbo.LaborMaster (WorkerPK) ); Create Table dbo.Invoices( InvoicePKint primary key clustered, InvoiceNumbervarchar(50) not null, ClientPKintnot null, InvoiceTermsvarchar(50) , WorkDatesvarchar(50) , InvoiceDatedatetimenot null, Constraint FK_Invoices_JobMaster Foreign Key (ClientPK) References dbo.ClientMaster (ClientPK) ); Create Table dbo.InvoiceReceipts( InvoiceReceiptsPKint identity primary key clustered, InvoicePKintnot null, JobPKintnot null, AmountPaidmoney, ReceiptDatedatetime, Constraint FK_InvoiceReceipts_Invoices Foreign Key (InvoicePK) References dbo.Invoices (InvoicePK), Constraint FK_InvoiceReceipts_JobMaster Foreign Key (JobPK) References dbo.JobMaster (JobPK) ); Create Table dbo.InvoiceXJobMaster( InvoiceXJobMasterPKint identity primary key clustered, InvoicePKintnot null, JobPKintnot null, InvoiceAmountmoney , AdditionalLabormoney , Constraint FK_InvoiceXJobMaster_Invoices Foreign Key (InvoicePK) References dbo.Invoices (InvoicePK) , Constraint FK_InvoiceXJobMaster_JobMaster Foreign Key (JobPK) References dbo.JobMaster (JobPK) ); Create Table dbo.OverheadMaster( OverheadPKint primary key clustered, OverheadDescriptionvarchar(100) not null );

  7. Create Table dbo.OverheadRates( OverheadRatesPKint identity primary key clustered, OverheadPKintnot null, OverheadHourlyRatemoney not null, EffectiveDatedatetimenot null, UseEmployeeFlagbit not null, UseContractorFlagbit not null, Constraint FK_OHRates_OHMaster Foreign Key (OverheadPK) References dbo.OverheadMaster (OverheadPK) ); Create Table dbo.PurchaseTypes( TypeCodeint primary key clustered, CodeDescriptionvarchar(100) ); INSERT INTO dbo.PurchaseTypes (TypeCode, CodeDescription) VALUES (1, 'Regular Materials'),(2,'Petty Cash'),(3,'Fuel'); Create Table dbo.MaterialPurchases( MaterialPurchasesPKint primary key clustered, JobPKintnot null, TypeCodeintnot null, PurchaseAmountmoney not null, PurchaseDatedatetime, PurchaseSalePointvarchar(100) null, Constraint FK_MatPurch_JobMaster Foreign Key (JobPK) References dbo.JobMaster (JobPK), Constraint FK_MatPurch_PurchTypes Foreign Key (TypeCode) References dbo.PurchaseTypes (TypeCode) ); Note: the dbo.PurchaseTypes table did not have any source data; the only three purchase types were known but never incorporated into the AllWorks, Inc. records. Therefore, the dbo.PurchaseTypes table was created and populated in T-SQL script; this is the onlyAllWorksOLTP database table not being populated by the SSIS ETL project file.

  8. SSMS Database Diagram: AllWorksOLTP Database

  9. SSIS Project File (SSISStudentProject.JeffJacob.sln) details of common variables, source files and error log files: • Common Variables for ETL Packages - SSIS project: • Run-time Variables • changedRowCounter (rows updated) • newRowCounter (rows inserted) • errorRowCounter (FK-violating rows written to error log) • unchangedRowCounter (redundant/unused rows) • Configuration Variables • Variable NameDefault (debugging) Value • DBServerlocalhost • smtpServer sfexch0003 • EmailRecipient jeffrey.jacob@setfocus.com • SOURCE FILES: • TimeSheetFiles (Pkg: TimeSheets [uses CSV files]) • EmpTime*.csv • EmployeeExcelSourceFile (Pkgs: EmployeeMaster, EmployeeRates) • Employees.xlsx • OverheadExcelSourceFile (Pkgs: OverheadMaster, OverheadRates) • Overhead.xlsx • ClientGeographiesExcelSourceFile (Pkgs: CountyMaster, ClientMaster) • ClientGeographies.xlsx • ProjectMasterExcelSourceFile (Pkg: JobMaster) • ProjectMaster.xlsx • MaterialPurchasesSourceFile [uses XML file] (Pkg: Material Purchases) • MaterialPurchases.xml / .xsd (no Connection Variable) • InvoicesExcelSourceFile (Pkgs: Invoices, InvoiceXJobMaster, InvoiceReceipts) • Invoices.xlsx • ERROR LOG FILES: • (None - no FKs) (Pkgs: EmployeeMaster, OverheadMaster, CountyMaster) • no FK violations possible • EmployeeRatesErrorLogFile (Pkg: EmployeeRates) • EmployeeRatesErrorLog.txt • OverheadRatesErrorLogFile (Pkg: OverheadRates) • OverheadRatesErrorLog.txt • ClientMasterErrorLogFile (Pkg: ClientMaster) • ClientMasterErrorLog.txt • JobMasterErrorLogFile (Pkg: JobMaster) • JobMasterErrorLog.txt • TimeSheetsErrorLogFile (Pkg: TimeSheets) • TimeSheetsErrorLog.txt • MaterialPurchasesErrorLogFile (Pkg: MaterialPurchases) • MaterialPurchasesErrorLog.txt • InvoiceErrorLogFile (Pkg: Invoices) • InvoicesErrorLog.txt • InvoiceXJobMasterErrorLogFile (Pkg: InvoiceXJobMaster) • InvoicesXJobMasterErrorLog.txt • InvoiceReceiptsErrorLogFile (Pkg: InvoiceReceipts) • InvoiceReceiptsErrorLog.txt

  10. SSIS Project File (SSISStudentProject.JeffJacob.sln) details on project naming convention (ETL packages only): Project Naming Convention– Control Flow and Data Flow Objects: ETL Package Names <Function/Target Table> DataLoad (.dtsx) ETL Control Flow Objects Object Type Name Data Flow Task Read <data description> into (AllWorksOLTP) DB Send Mail Task Email successful (unsuccessful) ETL processing of <DB target table> Script Task Collect count of items processed (TimeSheetsDataLoad ETL package only) File System Task Delete Error File From File System (TimeSheetsDataLoad ETL package only) Foreach Loop Container Loop through TimeSheet data source files (TimeSheetsDataLoad ETL package only) ETL Data Flow Objects Object Type Name Excel Source Get Excel source data Flat File Source Get TimeSheet CSV files XML Source Get Material Purchases data Data Conversion Convert data to DB-compatible data types Lookup Transformation (FKs) Validate incoming FK values against <parent table> Lookup Transformation (PKs) Validate incoming PK values against <parent table> Conditional Split (update check) Split out real update rows from redundant rows Conditional Split (error split) Remove errors from main pipeline Row Count Transformation (new) Count new rows Row Count Transformation (changed) Count changed rows Row Count Transformation (unchanged) Count unchanged rows Row Count Transformation (errors) Count error rows OLE DB Destination Insert new rows Audit Transformation Add Processing-info audit trail fields - package name and time of processing OLE DB Command Transformation Update <Target Table> data Flat File Destination Send errors to error log Derived Column Transformation (No naming convention – named according to usage)

  11. SSIS Project File (SSISStudentProject.JeffJacob.sln) additional details on project naming convention (ETL packages only): Project Naming Convention– Data Flow columns : Data Conversion Field Lookup Column Field Derived Table Field Transformed<source field name> Target<source field name> (purpose-dependent: no naming convention) Samples of Data Conversion, Lookup and Derived Column Transformations in project:

  12. Sample of Email Task customization

  13. Sample of Lookup task to check for Foreign Key violating data rows; such rows are errors and are counted and sent to an error file (below).

  14. Sample of Lookup task to check Primary Key values in incoming data rows; new Primary Key values are new rows to be inserted and pre-existing Primary Key values are update candidates which are checked for redundancy; inserts and updates are split, counted and sent for DB Insert/Update commands respectively.

  15. Package #1: EmployeeMasterDataLoad.dtsx

  16. Sample of source data: Employees.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.LaborMaster

  17. LaborMaster data ETL package: Control Flow and Variables

  18. Overview of LaborMaster ETL package data flow

  19. Package #2: EmployeeRatesDataLoad.dtsx

  20. Sample of source data: Employees.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.EmployeeRates

  21. EmployeeRates data ETL package: Control Flow and Variables

  22. Overview of EmployeeRates ETL package data flow

  23. Package #3: OverheadMasterDataLoad.dtsx

  24. Sample of source data: Overhead.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.OverheadMaster

  25. OverheadRates data ETL package: Control Flow and Variables

  26. Overview of OverheadMaster ETL package data flow

  27. Package #4: OverheadRatesDataLoad.dtsx

  28. Sample of source data: Overhead.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.OverheadRates

  29. OverheadRates data ETL package: Control Flow and Variables

  30. Overview of OverheadRates ETL package data flow

  31. Package #5: CountyMasterDataLoad.dtsx

  32. Sample of source data: ClientGeographies.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.CountyMaster

  33. CountyMaster data ETL package: Control Flow and Variables

  34. Overview of CountyMaster ETL package data flow

  35. Package #6: ClientMasterDataLoad.dtsx

  36. Sample of source data: ClientGeographies.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.ClientMaster

  37. ClientMaster data ETL package: Control Flow and Variables

  38. Overview of ClientMaster ETL package data flow

  39. Package #7: JobMasterDataLoad.dtsx

  40. Sample of source data: ProjectMaster.xlsx Sample (SSMS) of AllWorksOLTP database : dbo.JobMaster

  41. JobMaster data ETL package: Control Flow and Variables

  42. Overview of JobMaster ETL package data flow

  43. Package #8: TimeSheetsDataLoad.dtsx

  44. Sample of source data: EmpTime*.csv Sample (SSMS) of AllWorksOLTP database : dbo.TimeSheets

  45. C# script to enable file and row count totals: MaterialPurchases data ETL package: Control Flow and Variables

  46. Overview of TimeSheets ETL package data flow

  47. Package #9: MaterialPurchasesDataLoad.dtsx

  48. Sample of source data: MaterialPurchases.xml Sample (SSMS) of AllWorksOLTP database : dbo.MaterialPurchases

  49. MaterialPurchases data ETL package: Control Flow and Variables

  50. Overview of MaterialPurchases ETL package data flow

More Related