IBM Information Server Simplifying the Creation of the Data Warehouse
The New Role of the Data Warehouse • The data warehouse is becoming a more active and integrated participant in enterprise architectures • A source of the best information in the business • Active source of analytics • Because of this, the data warehouse has new requirements • Must be more flexible and adaptable to change • Must have trustworthy, auditable information • Must represent the business view • Must be capable of scaling to meet ever-growing information volumes
Critical Success Factors for Data Warehousing
• Collaboration: seamless flow of metadata across roles; shared understanding between business & IT; team development
• Metadata-Driven Design Acceleration & Automation: automate the connection between design and build tasks; provide in-tool metadata visibility; easily connect to any data source
• Auditable Data Quality: ensure quality is embedded in data flows; understand quality changes over time; provide proof of quality and lineage
• Reuse: integrated object search; object reuse optimization; reuse of data flows through shared services
• Scalability: seamless expansion of capacity; resource estimation; accurate performance analysis and balancing
The IBM Solution: IBM Information Server, Delivering Information You Can Trust
• Understand: discover, model, and govern information structure and content
• Cleanse: standardize, merge, and correct information
• Transform: combine and restructure information for new uses
• Deliver: synchronize, virtualize, and move information for in-line delivery
Underpinned by unified deployment, unified metadata management, parallel processing, and rich connectivity to applications, data, and content
Critical Success Factors for Data Warehousing: Collaboration
Collaboration for Data Warehouse Design (Collaboration)
• Roles: subject matter experts and data stewards, architects, data administrators, implementers, analysts
• Tools, linked through the IBM Metadata Server: IBM Business Glossary (business definition & ontology mapped to physical data), Rational Data Architect (metadata- and data-driven data modeling and management), IBM Information Analyzer (data-driven analysis, reporting, monitoring, data rule and integration specification), IBM QualityStage and IBM DataStage (database, application, and transformation development)
• Benefits: facilitate change management & reuse; increase compliance to standards; increase trust and confidence in information; simplify integration
Collaborative Metadata: From Analysis to Build (Collaboration)
• A common metamodel provides a seamless flow of metadata
• Analysis activities populate information into the data flow design
• DataStage users can see the table metadata from Information Analyzer
• Analysis results and notes are visible, providing insight into the quality of the source and guidance on how the flow should be defined
• Notes allow free-form collaboration across roles, ensuring knowledge is completely transferred from analysis to build
Critical Success Factors for Data Warehousing: Auditable Data Quality
Easily Embed Data Quality with Unified Design (Auditable Data Quality)
• One design experience
• Speeds development
• Extended user orientation in a simplified design environment
• Performance oriented
Measure Data Quality Over Time Using Baseline Reporting Auditable Data Quality • Compare quality results to a baseline to understand quality changes over time • Embed profiling tasks into sequencer to take before and after snapshots of data quality and rules adherence
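Purely as a conceptual illustration of baseline comparison (this is not the Information Analyzer API), the Python sketch below assumes profiling snapshots are available as simple metric dictionaries and flags metrics that regressed against a saved baseline; the metric names, file name, and tolerance are hypothetical.

```python
# Conceptual sketch: compare a current data-quality snapshot to a saved baseline.
# The metric names, baseline file, and tolerance are illustrative assumptions,
# not Information Analyzer artifacts.
import json

TOLERANCE = 0.02  # flag metrics that degrade by more than 2 percentage points


def load_baseline(path: str) -> dict:
    """Load a previously saved profiling snapshot, e.g. {'completeness': 0.98, ...}."""
    with open(path) as f:
        return json.load(f)


def compare_to_baseline(current: dict, baseline: dict) -> list[str]:
    """Return human-readable warnings for metrics that regressed past the tolerance."""
    warnings = []
    for metric, base_value in baseline.items():
        cur_value = current.get(metric)
        if cur_value is None:
            warnings.append(f"{metric}: missing from current snapshot")
        elif base_value - cur_value > TOLERANCE:
            warnings.append(f"{metric}: dropped from {base_value:.2%} to {cur_value:.2%}")
    return warnings


if __name__ == "__main__":
    baseline = load_baseline("baseline_snapshot.json")
    current = {"completeness": 0.95, "rule_adherence": 0.99}  # e.g. from an "after" profiling task
    for warning in compare_to_baseline(current, baseline):
        print("QUALITY REGRESSION:", warning)
```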
Critical Success Factors for Data Warehousing: Metadata-Driven Design Acceleration & Automation
In-tool Metadata Visibility (Metadata-Driven Design)
Impact Analysis:
• Find dependencies: what does this item depend on?
• Find where used: where is this item used?
Results are shown in the Advanced Find window
Job Difference: Integrated Report (Metadata-Driven Design)
• Difference report displayed in the Designer; jobs open automatically from report hot links
• Options available to print the report or save it as HTML
Slowly Changing Dimension Design Acceleration (Metadata-Driven Design)
• New engine capabilities: surrogate key management, updatable in-memory lookups
• New & enhanced stages: Surrogate Key Generator, Slowly Changing Dimension
• Single stage per dimension: quick setup and definition, easy single point of maintenance
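The Slowly Changing Dimension stage itself is configured graphically; purely to illustrate the Type 2 logic it automates, here is a minimal Python sketch using an in-memory lookup and a simple surrogate-key counter. The column names and in-memory tables are assumptions, not DataStage structures.

```python
# Minimal sketch of Type 2 slowly-changing-dimension logic: expire the current
# dimension row when a tracked attribute changes and insert a new row with a
# fresh surrogate key. Column names and in-memory "tables" are assumptions.
from datetime import date
from itertools import count

surrogate_keys = count(start=1)   # stands in for a Surrogate Key Generator stage
history = []                      # every dimension row, including expired versions
current_rows = {}                 # business_key -> current dimension row (in-memory lookup)


def apply_scd2(business_key: str, attributes: dict, as_of: date) -> dict:
    """Expire the current row when tracked attributes change and insert a new version."""
    current = current_rows.get(business_key)
    if current and current["attributes"] == attributes:
        return current                  # no change: keep the existing version
    if current:
        current["expiry_date"] = as_of  # close out the previous version
        current["is_current"] = False
    new_row = {
        "surrogate_key": next(surrogate_keys),
        "business_key": business_key,
        "attributes": attributes,
        "effective_date": as_of,
        "expiry_date": None,
        "is_current": True,
    }
    history.append(new_row)
    current_rows[business_key] = new_row
    return new_row


apply_scd2("CUST-001", {"city": "Boston"}, date(2007, 1, 1))
apply_scd2("CUST-001", {"city": "Chicago"}, date(2007, 6, 1))  # creates version 2, expires version 1
```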
Rapid Connectivity: Common Connectors (Metadata-Driven Design)
• Connection objects allow properties to be dropped onto a stage
• Test the connection instantly
• The diagram lets you select the link to edit as though you're on the canvas
• Parameter button on every field
• Graphical ODBC-specific SQL builder
• A warning sign indicates which fields are mandatory
Critical Success Factors for Data Warehousing: Reuse
Reuse: Find It Reuse • Find item in Repository tree • In-place find • Find by Name (Full or Partial) • Wild card support • Find next… • Filter on type
Find – Advanced Search Criteria Reuse • Search on following criteria: • Object type • Job, Table Definition, Stage etc. • Creation • Date/Time • By User • Last Modification • Date/Time • By User • Where Used • What other objects use this object? • Dependencies of • What does this object use? • Options • Case • Match on “name & description” or “name or description”
Reuse: Connection Objects (Reuse)
• Allows saving of a reusable connection path to a specific source or target (username, password, database name, etc.)
• Can be used for:
• Stage connection properties, by loading onto a stage in the stage editor or dragging and dropping from the repository tree
• Metadata import from that source or target
• Dragging a table imported from that source or target onto the canvas to create a pre-configured stage instance
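Conceptually, a connection object is just a named bundle of connection properties that any number of stages can reference. A minimal sketch of that idea follows; the property names and the Stage class are illustrative, not DataStage repository objects.

```python
# Sketch of the idea behind reusable connection objects: define the connection
# details once, then reference them from any number of "stages". Property
# names and the Stage class are illustrative, not DataStage objects.
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionObject:
    name: str
    username: str
    password: str
    database: str
    host: str


@dataclass
class Stage:
    name: str
    connection: ConnectionObject  # dropped onto the stage instead of retyping properties


dwh = ConnectionObject("DWH_PROD", "etl_user", "********", "salesdw", "db01.example.com")
extract = Stage("Extract_Customers", dwh)
load = Stage("Load_Dim_Customer", dwh)   # same connection reused; change it in one place
```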
Reuse: Job Parameter Sets (Reuse)
• A new repository object that contains the names and values of job parameters
• A parameter set can be referenced by one or more jobs, making it easier to deploy jobs across machines and to propagate a changed parameter value to every job that uses the set
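In the same spirit, a parameter set is one named collection of parameter values that several jobs reference, so a change propagates everywhere. A minimal sketch, with the set name, parameters, and run_job helper all assumed for illustration:

```python
# Sketch of a parameter set: one named collection of parameter values that
# several jobs reference, so changing a value propagates to all of them.
# The set name, parameters, and run_job function are illustrative.
parameter_sets = {
    "DWH_Defaults": {"target_db": "salesdw", "batch_size": 5000, "reject_dir": "/tmp/rejects"},
}


def run_job(job_name: str, parameter_set: str, **overrides):
    """Resolve a job's parameters from a named set, applying any per-job overrides."""
    params = {**parameter_sets[parameter_set], **overrides}
    print(f"Running {job_name} with {params}")


run_job("Load_Fact_Sales", "DWH_Defaults")
run_job("Load_Dim_Customer", "DWH_Defaults", batch_size=1000)  # per-job override
```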
Reuse: Simply Deploy Data Flows as Shared Services (Reuse)
• Automates the creation of information integration services, including federation
• Provides fundamental infrastructure services (security, logging, monitoring)
• Publishes services to leading bindings: JMS, EJB, and SOAP over HTTP
• Provides load balancing & fault tolerance for requests across multiple service providers
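As a rough conceptual sketch of "data flow as shared service" (plain Python, not Information Services Director), the example below wraps a trivial standardization flow behind an HTTP endpoint; the port, payload shape, and run_flow function are assumptions.

```python
# Conceptual sketch only: exposing a data flow as a simple HTTP service, to
# illustrate the idea of a data flow published as a shared service. The port,
# payload shape, and run_flow function are illustrative assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_flow(payload: dict) -> dict:
    """Stand-in for an integration data flow invoked as a service."""
    return {"standardized_name": payload.get("name", "").strip().upper()}


class FlowHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        response = json.dumps(run_flow(request)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), FlowHandler).serve_forever()
```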
Critical Success Factors for Data Warehousing: Scalability
Job Performance Analysis Scalability A new visualization tool which: • Provides deeper insight into runtime job behavior. • Offers several categories of visualizations, including: • Record Throughput • CPU Utilization • Job Timing • Job Memory Utilization • Physical Machine Utilization • Hides runtime complexity by emphasizing the stages the customer placed on the designer canvas.
Record Throughput (Scalability)
• Breakdown of records read and records written per second
• Initially filtered to show one line for each link drawn on the canvas
• Dataset names match the actual stage names on the canvas
• Advanced users can turn off the filters and see every runtime dataset, including the inner operators of composites and inserted-operator datasets
• One tab per partition, plus an overlay view showing every partition for smaller jobs
CPU Utilization (Scalability)
• Visualizes the CPU time of each operator (total CPU and system time)
• Shows which operators dominated the CPU at different points during the run
• A percentage view (pie chart) shows what share of the job's CPU load each stage on the canvas was responsible for
• Inserted operators and composite sub-operators are automatically bundled into these results
• Advanced users can view operator combination, which changes the chart to reflect each process and the stages it contains
Physical Machine Utilization (Scalability): charts include Average Process Distribution, Disk Throughput, Free Memory (whisker box), and Percent CPU Utilization
Resource Estimation (Scalability)
• Provides estimates for required disk space and CPU utilization. Helps with:
• Job design: detect bottlenecks and optimize transformation logic to improve performance
• Error protection: run against a range of data of particular interest to better protect against job aborts caused by bad data formats or insufficient null handling
• Resource allocation: determine the scratch space and disk space to allocate so the job does not abort for lack of space
• Two statistical models:
• Static: provides worst-case disk space estimates based on the schema and job design
• Dynamic: runs the job, statistically samples actual resource usage, and then provides calculated estimates per node
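To make the two models concrete, here is a small Python sketch of the arithmetic behind them: a static worst-case estimate from field widths and a maximum row count, and a dynamic estimate extrapolated from a sampled run. The field widths, row counts, and sample figures are illustrative assumptions.

```python
# Sketch of the two estimation models described above: a static worst-case
# estimate from the schema, and a dynamic estimate extrapolated from a sampled
# run. Field widths, row counts, and the sample figures are assumptions.
def static_estimate(field_widths: list[int], max_rows: int) -> int:
    """Worst case: every field at its maximum width for every possible row (bytes)."""
    return sum(field_widths) * max_rows


def dynamic_estimate(sampled_bytes: int, sampled_rows: int, projected_rows: int) -> int:
    """Extrapolate observed bytes-per-row from a sampled run to the projected volume."""
    bytes_per_row = sampled_bytes / sampled_rows
    return int(bytes_per_row * projected_rows)


# Example: 3 fields of up to 50, 200, and 8 bytes, up to 10 million rows.
print(static_estimate([50, 200, 8], 10_000_000))          # worst-case disk space in bytes
print(dynamic_estimate(2_600_000, 20_000, 10_000_000))    # estimate from a 20k-row sample
```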
Resource Estimation Tool Layout Scalability
Migration Path • Seamless upgrade for WebSphere DataStage users into the IBM Information Server • All DataStage jobs along with all other objects will migrate into the IBM Information Server • Upgrade for WebSphere QualityStage into the IBM Information Server • Existing QualityStage projects will migrate into the IBM Information Server • Conversion utilities for Standardize, Match, and Survive stages • All other stages will continue to execute
Summary
• Data warehouses are becoming tier-one operational systems in many companies
• They must adapt to change more quickly, carry authoritative information, and scale
• Platforms for building data warehouses must support metadata-driven design, collaboration, reuse, and auditable data quality, and must scale to support growing data volumes
• IBM Information Server provides all of this in a unified platform