330 likes | 548 Views
Performance Tuning SSIS. Brian Knight, CEO Pragmatic Works bknight@pragmaticworks.com. About the Ugly Guy Speaking. SQL Server MVP Founder of Pragmatic Works Co-Founder of BIDN.com, SQLServerCentral.com and SQLShare.com Written more than a dozen books on SQL Server. Mobile data.
E N D
Performance Tuning SSIS Brian Knight, CEO Pragmatic Works bknight@pragmaticworks.com
About the Ugly Guy Speaking • SQL Server MVP • Founder of Pragmatic Works • Co-Founder of BIDN.com, SQLServerCentral.com and SQLShare.com • Written more than a dozen books on SQL Server
Mobile data GeoSpatial Components GeoSpatial Data: Semi structured Merges Data-cleansing components Data mining components Standard sources Custom source Warehouse Legacy data: binary files Reports SQL Server Integration Services Application database Integration Services in Action Integration is a seamless, manageable operation Source, prepare, & load data in single, auditable process Scale to handle heavy and complex data requirements Cube
Today’s Problems with Integration • Integration today • Increasing data volumes • Increasingly diverse sources • Requirements reached the Tipping Point • Low-impact source extraction • Efficient transformation • Bulk loading techniques
Tuning Decisions • Choose the right tool for the job • Don’t be afraid to use T-SQL • Will parallelism work?
Source Optimization • Flat files – When available, use Fast Parse • OLE DB sources – Change network packet size • Use T-SQL whenever possible in the OLE DB Source • Joining • NULL handling • Where clauses
Impact of Compression on ETL * Not official Microsoft results.
Tuning the Source Connection manager tuning Flat file tuning OLE DB Source tuning Demo
The Pipeline presents the buffer to each downstream component Transform Components
SSIS Data Flow Architecture Synchronous vs. Non Synchronous
Case Study: Patterns 83 seconds 105 seconds
Source Data Extraction • Extracting data from the source is expensive • Efficient extraction is key to improving ETL performance • Involves bulk loading data into staging areas or warehouse • Time consuming & resource intensive • Triggers (synchronous IO penalty) • Timestamp columns (Schema changes) • Complex queries (delayed IO penalty) • Custom (ISV, mirror, snapshot, …) • Incremental data load is key to efficient extraction • Need to know what changed at source since a point in time • Expensive lookups to determine changed columns • Providing information up front about which columns changedwill improve efficiency
SQL Server 2008: Change Data Capture (CDC) • Information about what changed at the source • Changes captured from the log asynchronously • Enabled per table • CDC APIs provide access to change data OLTP Data Warehouse Change Tables
Change Data Capture Traditional CDC with SSIS Integrating CDC in 2008 Demo
Lookup Component • Three modes of operation • Full Cache: for small lookup datasets • No Cache: for volatile lookup datasets • Partial Cache: for large lookup datasets • Tradeoff memory vs. performance • Use Cascaded Lookups • Merge Join may be alternative
SQL Server 2008: Lookup Transform • Hydrate cache files for large data sets • Can reuse cache • Can load cache during day and use in nightly ETL
Demo Cascading lookup optimizations Cache file lookup
Data Destinations • Use “Fast Load” or SQL Server Destination • Table Lock on insert operations • Trace flags for improvement • Old principles still apply
Destination Tuning Demo
Building a Work Queue System Create a work queue table. Create a loop to shift over the work queue constantly checking out work Spawn x times with a batch file
Parallel Load Demo
Managing Resources • Logging events to watch pipeline internals • PipelineExecutionPlan, PipelineExecutionTree, BufferSizeTuning • System Monitor to track I/O issues • Buffers In Use tracks how many buffers are presently being used • Buffers Spooled tracks how many 10 mb buffers have been spooled to disk
Measuring Performance Perfmon
SQL Server 1 SQL Server 2 Location • Consider the following configuration… Where should SSIS run? • (Licensing issues aside) SSIS Server
WSRM • Windows System Resource Manager (WSRM) can throttle CPU and memory • Creates a soft throttle • Can be scheduled so SSIS gets priority on weekends and nights • Only activates policy if resources begin to become constrained (about 70%) • WSRM is free with Windows Server 2003 Enterprise Edition and included in Windows Server 2008
WSRM Creating a soft schedule cap Demo
Summary • Planning • Don’t underestimate the power of the whiteboard! • Use the right tool for the right job • Leverage the power of the engine • Patterns and Practices • Understand best practices • But don’t be afraid to experiment
The End Already? Questions http://www.bidn.com/people/brianknight @BrianKnight bknight@pragmaticworks.com http://www.youtube.com/pragmaticworks