1 / 47

Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012

DBI322. Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012. Matt Masson Senior Program Manager Microsoft Corporation. Change Data Capture. Your First Data Warehouse. Challenges. More work to do. Less time to do it. More people using the system.

magnar
Download Presentation

Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DBI322 Incremental ETL Using CDC for SQL and Oracle with SQL Server Integration Services (SSIS) 2012 Matt Masson Senior Program Manager Microsoft Corporation

  2. Change Data Capture

  3. Your First Data Warehouse

  4. Challenges More work to do Less time to do it More people using the system

  5. Mo’ Data, Mo’ Problems • Processing time increases with data volume • Scales linearly (assuming no bottlenecks) Do Less Work!

  6. Change Data Capture - How It Works • Enable it on the Source Database • Inserts, Updates, and Deletes are automatically tracked • Consume the changes

  7. Demo CDC in SQL Server 2012

  8. SQL Server CDC – Summary • Enable it on the source system • Automatic capture of changes • Configuration options • Uses transactional replication • SQL Server 2008 and later • -- enable CDC on the database • execsys.sp_cdc_enable_db • -- enable CDC on a table • execsys.sp_cdc_enable_table • @source_schema=N'dbo', • @source_name=N'MyTable', • @role_name=N'cdc_admin', • @supports_net_changes= 1

  9. Using Change Data Capture

  10. CDC Components for SSIS 2012 • Use before and after data flows to handle CDC state information • Read change data from source system • Uses CDC state information • Splits rows based on operation • Smart Conditional Split

  11. Workflow Initial Load Incremental Load

  12. Initial Load • Scenario 1: Active database • Mark Initial Load Start • Bulk load from source to destination • Mark Initial Load End • Scenario 2: Inactive database or using snapshot/flashback • Mark CDC start • Oracle – provide the System Change Number (SCN) • SQL – provide the Log Sequence Number (LSN) or snapshot name • CDC Control Task will use current value if none is provided • Bulk load data from source to destination

  13. Incremental Load • Use CDC Control Task to get the processing range • Read from cdc_statestable • Stored in package variable • Use CDC Source to read changes since last run • Process change rows • Use CDC Control Task to mark processing end

  14. Using the Reprocessing Indicator • Option in the CDC Source to include a reprocessing flag • Adds __$reprocessing column to data flow • Flag is set to True for rows that require special handling • Rows that overlap with initial load • Reprocessing a range after a failure

  15. Demo CDC Components for SSIS

  16. CDC Components - Summary • CDC Control Task • Retrieve and persist state • Use before and after your data flow • CDC Source • Reads change data from source table • CDC Splitter • Separate rows by operation type

  17. Processing Modes • All • All with Old Values • Net • Net with Update Mask • Net with Merge

  18. Which Processing Mode Should I Use? • All vs. Net • All changes • Do I need to record every change? • Do I enjoy complicated ETL work? • Net changes • Do I want to push the work to the source system? • Update Mask / Old Values • Do I need to do something when specific columns change?

  19. Processing Mode - All

  20. Processing Mode - All Control Flow Data Flow

  21. Processing Mode – All with Old Values

  22. Processing Mode – All with Old Values Control Flow Data Flow

  23. Processing Mode – Net

  24. Processing Mode – Net Control Flow Data Flow

  25. Processing Mode – Net with Update Mask

  26. Processing Mode – Net with Update Mask Control Flow Data Flow

  27. Processing Mode – Net with Merge

  28. Processing Mode – Net with Merge Control Flow Data Flow

  29. CDC For Oracle

  30. How it Works CDC Designer Oracle SQL Server CDC Tables LogMiner Database CDC Service Mirror Tables

  31. Setup • Oracle CDC components are not installed by default • MSIs found on SQL media under: \Tools\AttunityCDCOracle • Oracle CDC Service • AttunityOracleCdcService.msi • Oracle CDC Designer • AttunityOracleCdcDesigner.msi

  32. Oracle CDC Service Configuration • Name your service • Provide a service account • SQL instance information • CDC master password • One service per Oracle DB

  33. Oracle CDC Designer • Allows you to create and manage Oracle CDC Instances • You can have multiple instances per CDC Service • An instance contains • Oracle database connection information • Tables and columns being tracked • Mirroring SQL Server instance information

  34. Oracle CDC Designer – Creating an Instance

  35. Oracle CDC Designer – Managing an Instance • List all of your services and CDC capture instances • Control the instance state (start, stop, reset) • Regenerate setup scripts

  36. FAQ

  37. Supported Versions • SQL Server (Enterprise Edition) • 2008, 2008 R2 and 2012 • Oracle • 10g Release 2 • 10.2.0.1—10.2.0.5 (patch set as of April 2010) • 11g Release 1 • 11.1.0.6—11.1.0.7 (patch set as of September 2008) • 11g Release 2 • 11.2.0.1—11.2.0.2 (patch set as of November 2010)

  38. Common Questions • Do I need any special permissions? • Getting current LSN during initial load requires db_owner • Call made to sys.sp_replincrementlsn • Will enabling CDC impact performance? • Similar overhead to transactional replication • Asynchronous processing reduces overall impact • Increases data size • Prevents minimally logged operations • Less than 10% overhead with normal workload (and no IO issues)

  39. Books Online Resources • CDC Flow Components http://msdn.microsoft.com/en-us/library/hh231087.aspx • Tuning the Performance of Change Data Capture http://msdn.microsoft.com/en-us/library/dd266396.aspx • Comparing Change Data Capture and Change Tracking http://msdn.microsoft.com/en-us/library/cc280519.aspx

  40. Required Slide *delete this box when your slide is finalized Speakers, please list the Breakout Sessions, Labs, Demo Stations and Certification Exams that relate to your session. Also indicate when they can find you staffing in the TLC. Related Content • Breakout Sessions • DBI310: EIM: Bringing Together SSIS, MDS and DQS • DBI329: Upgrading SSIS Packages to SQL Server 2012 • DBI207: BI Power Hour Hands-on Labs • DBI24-HOL: Exploring Microsoft SQL Server Integration Services Product Demo Stations Breakthrough Insights: Credible, Consistent Data Related Certification Exam Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 Find Me Later At… Breakthrough Insights: Credible, Consistent Data

  41. Contact mattmasson.com mmasson@microsoft.com @mattmasson Matt Masson

  42. Track Resources Hands-On Labs @sqlserver @ms_teched SQL Server 2012 Eval Copy Get Certified! mva Microsoft Virtual Academy

  43. Resources Learning TechNet • Connect. Share. Discuss. • Microsoft Certification & Training Resources http://northamerica.msteched.com www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers • http://microsoft.com/technet http://microsoft.com/msdn

  44. Required Slide Complete an evaluation on CommNet and enter to win!

  45. MS Tag Scan the Tag to evaluate this session now on myTechEd Mobile

  46. © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

More Related