880 likes | 1.05k Views
Oracle9i: The Data Warehouse Database OracleWorld 2002 #31348. Ian Abramson IAS Inc. ian@abramson.ca. Agenda. The database Partitioning External Tables Parallel Execution Materialized Views The Optimizer Bit-map join indexes The SQL CUBE and ROLLUP Rolling Windows
E N D
Oracle9i:The Data Warehouse DatabaseOracleWorld 2002 #31348 Ian Abramson IAS Inc. ian@abramson.ca
Agenda • The database • Partitioning • External Tables • Parallel Execution • Materialized Views • The Optimizer • Bit-map join indexes • The SQL • CUBE and ROLLUP • Rolling Windows • Multi-table Inserts • OLAP Overview • Data Mining Overview
Oracle9i for Data Warehousing Oracle 7.3 • Hash Join • Bitmap Indexes • Parallel-Aware Optimizer • Partition Views • Instance Affinity: Function Shipping • Parallel Union All • Asynchronous Read-Ahead • Histograms • Anti-Join Oracle 8.0 Oracle8i • Partitioned Tables and Indexes • Partition Pruning • Parallel Index Scans • Parallel Insert, Update, Delete • Parallel Bitmap Star Query • Parallel ANALYZE • Parallel Constraint Enabling • Server Managed Backup/Recovery • Point-in-Time Recovery • Hash and Composite Partitioning • Resource Manager • Progress Monitor • Adaptive Parallel Query • Server-based Analytic Functions • Materialized Views • Transportable Tablespaces • Direct Loader API • Functional Indexes • Partition-wise Joins • Security Enhancements Oracle9i • List Partitioning • Bitmap Join Index • Dynamic Aggregation Buffersize • Materialized Intermediate Results • Grouping Sets • Concatenated Grouping Sets • Aggregate Pruning • New Analytic Functions • Self-Tuning Execution Memory • System Managed Undo • Dynamic Resizing of Buffer Pool • ETL Infrastructure • and much more ...
Metadata Relational OLAP Data Mining ETL Oracle9.2i Database
Metadata Portal Query & Reporting BI Components Real-time Personalization Oracle9.2i Application ServerRuns All Your Business Intelligence Applications
Oracle9iDB Oracle9iAS Data Warehousing Metadata Portal Query & Reporting ETL BI Components OLAP Real-Time Personalization Data Mining Oracle9i R2 DW Architecture
OLAPEngine DataIntegrationEngine Data Warehouse Engine MiningEngine The Old Way:Everything is a Different Product DataSources
Oracle9i Data Warehousing ETL OLAP Data Mining The New Way: Oracle9i • All aspects of architecture are integrated
The Oracle9i DW Database Computers are useless. They only give you answers Picasso
Partitioning Advantages • Separates data in separate pieces • Partition key defined at creation • Partition pruning • Partition-wise joins • May partition • Tables • Indexes
Partition table Approach • Single table Approach Traditional vs. Partitions
More Advantages • Partitions - separate physical entities • physical attributes (PCTFREE, PCTUSED, INITRANS,MAXTRANS) may vary for different partitions of the same table or index • Different partitions - different tablespaces • minimizes the impact of data corruption • independent back up and recovery of each partition • balance the I/O load
Partition Options • Range • Hash • Composite • List • Range-List
Range Partitioning • Range partitioning - maps rows to partitions based on ranges of column values CREATE TABLE Sales_by_department (Department NUMBER, SalesId NUMBER, Amount NUMBER ) PARTITION BY RANGE ( Department ) PARTITION single_digits VALUES LESS THAN (10) TABLESPACE sd_low, PARTITION double_digits VALUES LESS THAN (100) TABLESPACE dd_middle, PARTITION multiple_digits VALUES LESS THAN (maxvalue) TABLESPACE md_high);
Hash Partitioning • maps rows to partitions based on a hash value of the partitioning key (oracle internally determines that) CREATE TABLE Sales_by_department (Department NUMBER, SalesId NUMBER, Amount NUMBER ) PARTITION BY HASH ( SalesID ) PARTITION hash_name1 TABLESPACE hash_name1_tbls, PARTITION hash_name2 TABLESPACE hash_name2_tbls, PARTITION hash_name3 TABLESPACE hash_name3_tbls);
Composite Partitioning • Partitions data using the range method, and within each partition sub-partitions it using the hash method CREATE TABLE Sales_by_department (Department NUMBER, SalesId NUMBER, Amount NUMBER ) PARTITION BY RANGE ( Department ) SUBPARTITION BY HASH (SalesID) SUPPARTITIONS 2 STORE IN (sub1_tbls, sub2_tbls) (PARTITION single_digits VALUES LESS THAN (10) TABLESPACE sd_tbls, PARTITION double_digits VALUES LESS THAN (100) TABLESPACE dd_tbls, PARTITION multiple_digits VALUES LESS THAN (maxvalue) TABLESPACE md_tbls) );
List Partitions CREATE TABLE sales_list ( salesman_id NUMBER(5), salesman_name VARCHAR2(30), sales_state VARCHAR2(20), sales_amount NUMBER(10), sales_date DATE) PARTITION BY LIST(sales_state) (PARTITION sales_west VALUES('California', 'Hawaii'), PARTITION sales_east VALUES ('New York', 'Virginia', 'Florida'), PARTITION sales_central VALUES('Texas', 'Illinois') PARTITION sales_other VALUES(DEFAULT) );
Range-List Partitions CREATE TABLE quarterly_regional_sales (deptno NUMBER, item_no VARCHAR2(20), txn_date DATE, txn_amount NUMBER, state VARCHAR2(2)) PARTITION BY RANGE (txn_date) SUBPARTITION BY LIST (state) (PARTITION q1_2002 VALUES LESS THAN(TO_DATE('1-APR-2002','DD-MON-YYYY')) (SUBPARTITION q1_2002_northwest VALUES ('OR', 'WA'), SUBPARTITION q1_2002_southwest VALUES ('AZ', 'UT', 'NM'), SUBPARTITION q1_2002_northeast VALUES ('NY', 'VM', 'NJ'), SUBPARTITION q1_2002_southeast VALUES ('FL', 'GA'), SUBPARTITION q1_2002_northcentral VALUES ('SD', 'WI'), SUBPARTITION q1_2002_southcentral VALUES ('NM', 'TX')), PARTITION q2_2002 VALUES LESS THAN(TO_DATE('1-JUL-2002','DD-MON-YYYY')) …
Partition Maintenance Functions • Add • Drop • Exchange • Move • Split and Merge • Truncating • Coalesce (Hash only)
Partition Exchange • Allows you to create data in a separate table and then replace a partition with the table • Validate or don’t validate is the question • Actually exchanges table and partition • Nice for archiving SQL> run 1 alter table sales_transactions 2 exchange partition sales_feb_2000 3 with table load_sales 4 including indexes 5* without validation Table altered.
Partition Notes • Separate partitions - separate tablespaces • Beware of MAXVALUE in rolling-window • Use naming conventions • range partitions --> table_name_YYYY_MM_DD table_name_tbls_YYYY_MM_DD • No global indexes for hash partitions
External Tables • Data resides on operating system • Table definition resides in database (SYSTEM tablespace) • Need to define: • Directory for files • Table definition • Read-only • No indexes • Problems exhibit themselves at SELECT time
Oracle8i: Multiple staging tables and SQL statements Staging Table Staging Table Step 1: Load into staging table Step 2: Transform data using function Step 3: Insert and update into target table TARGET FLAT FILES UPSERT LOAD TRANS- FORM Oracle9i: Single SQL statement Oracle9i: Parallel pipelining of data Oracle9i: ETL Scenario
External Tables the SQL • Create directory: CREATE DIRECTORY data_dir AS 'd:\wkdir'; CREATE DIRECTORY log_dir AS 'c:\TEMP';
CREATE TABLE products_delta( PROD_ID NUMBER(6), PROD_NAME VARCHAR2(50), PROD_DESC VARCHAR2(4000), PROD_SUBCATEGORY VARCHAR2(50), PROD_SUBCAT_DESCVARCHAR2(2002), PROD_CATEGORYVARCHAR2(50), PROD_CAT_DESCVARCHAR2(2002), PROD_WEIGHT_CLASS NUMBER(2), PROD_UNIT_OF_MEASURE VARCHAR2(20), PROD_PACK_SIZE VARCHAR2(30), SUPPLIER_ID NUMBER(6), PROD_STATUS VARCHAR2(20), PROD_LIST_PRICE NUMBER(8,2), PROD_MIN_PRICE NUMBER(8,2) ) ORGANIZATION external (TYPE oracle_loader DEFAULT DIRECTORY data_dir ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII BADFILE log_dir:'prod_delta.bad_xt‘ LOGFILE log_dir:'prod_delta.log_xt‘ FIELDS TERMINATED BY "|" LDRTRIM) Location ('prodDelta.dat') ) REJECT LIMIT UNLIMITED NOPARALLEL; External Tables the SQL (2)
Parallel Execution • Do more work at the same time • Best for: • Large table scans • Creation of large indexes • Partition table scans • Bulk inserts, updates and deletes • Aggregations and summarizations • System characteristics • SMP, MPP, Clusters • Sufficient I/O bandwidth • Under-utilized CPU • Sufficient memory
Getting Parallel to Work • In the init.ora parallel_automatic_tuning=TRUE parallel_max_servers=n • 2 * DOP * # of concurrent users parallel_min_servers=n • 0 or ??(your choice) large_pool_size & shared_pool_size mem in bytes = (3 x size x users x groups x connections) • SIZE = PARALLEL_EXECUTION_MESSAGE_SIZE • USERS = the number of concurrent parallel execution users that you expect to have running with the optimal DOP • GROUPS = the number of query server process groups used for each query • A simple SQL statement requires only one group. However, if your queries involve subqueries which will be processed in parallel, then Oracle uses an additional group of query server processes. • CONNECTIONS = (DOP2 + 2 x DOP)
Materialized Views • Provide summary tables • A “Physical” view • Performance gains are significant • Similar to snapshots • Refresh may be FAST (need log on master) • COMPLETE • FORCE • Oracle packages provide help and guidance DBMS_MVIEW.EXPLAIN_MVIEW ('SH.CAL_MONTH_SALES_MV');
Materialized Views • Enable it in the init.ora • Query rewrite QUERY_REWRITE_ENABLED = TRUE QUERY_REWRITE_INTEGRITY=TRUSTED • May enable at session level as well
Creating a Materialized View CREATE MATERIALIZED VIEW cust_sales_mv PCTFREE 0 STORAGE (initial 8k next 8k pctincrease 0) BUILD IMMEDIATE REFRESH FORCE ENABLE QUERY REWRITE AS SELECT c.cust_id, SUM(amount_sold) AS dollar_sales FROM sales s, customers c WHERE s.cust_id= c.cust_id GROUP BY c.cust_id ;
Collecting the Details • Use the DBMS_MVIEW supplied package • Has many other packages to help you with your materialized views. • Today you get one! EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW('SH.CAL_MONTH_SALES_MV');
Verifying a Materialized View SELECT capability_name, possible, SUBSTR(related_text,1,8) AS rel_text, SUBSTR(msgtxt,1,60) AS msgtxt FROM MV_CAPABILITIES_TABLE ORDER BY seq; CAPABILITY_NAME P REL_TEXT MSGTXT --------------- - -------- ------ PCT N (Partition Change Tracking) REFRESH_COMPLETE Y REFRESH_FAST N REWRITE Y PCT_TABLE N SALES no partition key or PMARKER in select list PCT_TABLE N TIMES relation is not a partitioned table REFRESH_FAST_AFTER_INSERT N SH.TIMES mv log must have new values REFRESH_FAST_AFTER_INSERT N SH.TIMESmv log must have ROWID REFRESH_FAST_AFTER_INSERT N SH.TIMESmv log does not have all necessary columns REFRESH_FAST_AFTER_INSERT N SH.SALES mv log must have new values REFRESH_FAST_AFTER_INSERT N SH.SALES mv log must have ROWID REFRESH_FAST_AFTER_INSERT N SH.SALES mv log does not have all necessary columns REFRESH_FAST_AFTER_ONETAB_DML N DOLLARS SUM(expr) without COUNT(expr) REFRESH_FAST_AFTER_ONETAB_DML N see the reason why REFRESH_FAST_AFTER_INSERT is disabled REFRESH_FAST_AFTER_ONETAB_DML N COUNT(*) is not present in the select list REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without COUNT(expr) REFRESH_FAST_AFTER_ANY_DML N see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled REFRESH_FAST_AFTER_ANY_DML N SH.TIMES mv log must have sequence REFRESH_FAST_AFTER_ANY_DML N SH.SALES mv log must have sequence REFRESH_PCT N PCT is not possible on any of the detail tables in the materialized view REWRITE_FULL_TEXT_MATCH Y REWRITE_PARTIAL_TEXT_MATCH Y REWRITE_GENERAL Y REWRITE_PCT N PCT is not possible on any detail tables
Dimensions • Categorizes data • Provides hierarchy guidance • Allows roll-up and roll-down • Needed for query rewrite • Needed for materialized views • OEM has Dimension Wizard Category Sub-Category Product
Dimensions the SQL CREATE DIMENSION products_dim LEVEL product IS (products.prod_id) LEVEL subcategory IS (products.prod_subcategory) LEVEL category IS (products.prod_category) HIERARCHY prod_rollup ( product CHILD OF subcategory CHILD OF category ) ATTRIBUTE product DETERMINES (products.prod_name, products.prod_desc, prod_weight_class, prod_unit_of_measure, prod_pack_size,prod_status, prod_list_price, prod_min_price) ATTRIBUTE subcategory DETERMINES (prod_subcategory, prod_subcat_desc) ATTRIBUTE category DETERMINES (prod_category, prod_cat_desc);
The Optimizer is a Star • Star Transform must be enabled STAR_TRANSFORMATION_ENABLED=TRUE • Requires Bit Mapped indexes or bit mapped join index on foreign key columns CREATE BITMAP INDEX sales_c_state_bjix ON sales(customers.cust_state_province) FROM sales, customers WHERE sales.cust_id = customers.cust_id LOCAL NOLOGGING COMPUTE STATISTICS; • Cost-based optimizer must be used • Optimizer looks for small set of dimensions to satisfy query, even if large number of rows in fact
The Star SQL SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount FROM sales s, times t, customers c, channels ch WHERE s.time_id = t.time_id AND s.cust_id = c.cust_id AND s.channel_id = ch.channel_id AND c.cust_state_province = 'CA' AND ch.channel_desc in ('Internet','Catalog') AND t.calendar_quarter_desc IN (‘2001-Q1',‘2001-Q2') GROUP BY ch.channel_class, c.cust_city, t.calendar_quarter_desc;
How the Optimizer Sees It SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount FROM sales WHERE time_id IN (SELECT time_id FROM times WHERE calendar_quarter_desc IN(‘2001-Q1',‘2001-Q2')) AND cust_id IN (SELECT cust_id FROM customers WHERE cust_state_province='CA') AND channel_id IN (SELECT channel_id FROM channels WHERE channel_desc IN('Internet','Catalog'));
Star Transform Restrictions • The star transform use is restricted: • A hint tells optimizer to not use a bitmap index • Query contains bind variables • Not enough bitmap indexes • The fact table is a remote table • The optimizer may not choose the star: • Tables have a single access path • Tables are too small to be useful • Database is in read-only mode
Sales(Customer.state) Sales(Customer.state) Index key is Customer.State Indexed table is Sales Bitmap Join Indexes Sales Customer CREATE BITMAP INDEX cust_sales_bji ON Sales(Customer.state) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id;
Resumable Transactions • Suspend transactions • Ability to resume these transactions • Only errors currently handled: • SQL statements that run out of TEMP space • DML/Export/CREATE TABLE as … • Space limits • Out-of-space transaction • Exceed space quota • Full support of locally managed tablespaces • Great for large DW queries
Resumable the SQL • Enable Resumable transactions ALTER SESSION ENABLE RESUMABLE TIMEOUT 1200; • The Transaction: CREATE TABLE sales_prod_dept ( prod_category, prod_subcategory, cust_id, time_id,channel_id,promo_id, quantity_sold, amount_sold ) NOLOGGING TABLESPACE transfer PARTITION BY LIST (prod_category) ( PARTITION boys_sales values ('Boys'), PARTITION girls_sales values ('Girls'), PARTITION men_sales values ('Men'), PARTITION women_sales values ('Women') ) AS SELECT p.prod_category, p.prod_subcategory, s.cust_id, s.time_id, s.channel_id, s.promo_id, SUM(s.amount_sold) amount_sold, SUM(s.quantity_sold) quantity_sold FROM sales s, products p, times t WHERE p.prod_id=s.prod_id AND s.time_id = t.time_id AND t.fiscal_year=2002 GROUP BY prod_category, prod_subcategory,cust_id, s.time_id, channel_id, promo_id ;
The Proof it Stopped SELECT name, status, error_msg FROM dba_resumable;
The SQL • Oracle provides many analytical functions • Cross tabular reports • Ranking functions • Percentile functions • Regression analysis • Moving windows • Ratios within a report • CASE statements
CUBE, ROLLUP and GROUP functions • Efficient data access • Creates matrix of totals • Crosstabs are computed for you • CUBE provides all totals • GROUPING provides guidance of totals
ROLLUP the SQL SELECTchannel_desc, calendar_month_desc, country_id, TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$ FROM sales, customers, times, channels WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND sales.channel_id= channels.channel_id AND channels.channel_desc IN ('Direct Sales', 'Internet') AND times.calendar_month_desc IN ('2002-09', '2002-10') AND country_id IN ('CA', 'US') GROUP BY ROLLUP(channel_desc,calendar_month_desc,country_id);
ROLLUP Results CHANNEL_DESC CALENDAR CO SALES$ ------------- -------- -- -------------- Direct Sales 2002-09 CA 1,378,126 Direct Sales 2002-09 US 2,835,557 Direct Sales 2002-09 4,213,683 BY Channel and Month Direct Sales 2002-10 CA 1,388,051 Direct Sales 2002-10 US 2,908,706 Direct Sales 2002-10 4,296,757 BY Month Direct Sales 8,510,440 BY Channel Internet 2002-09 CA 911,739 Internet 2002-09 US 1,732,240 Internet 2002-09 2,643,979 BY Channel and Month Internet 2002-10 CA 876,571 Internet 2002-10 US 1,893,753 Internet 2002-10 2,770,324 BY Month Internet 5,414,303 BY Channel 13,924,743For Everything
CUBE the SQL SELECTchannel_desc, calendar_month_desc, country_id, TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$ FROM sales, customers, times, channels WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND sales.channel_id= channels.channel_id AND channels.channel_desc IN ('Direct Sales', 'Internet') AND times.calendar_month_desc IN ('2002-09', '2002-10') AND country_id IN ('CA', 'US') GROUP BY CUBE (channel_desc,calendar_month_desc,country_id);
CUBE the Results CHANNEL_DESC CALENDAR CO SALES$ -------------------- -------- -- ---------- Direct Sales 2002-09 CA 1,378,126 Direct Sales 2002-09 US 2,835,557 Direct Sales 2002-09 4,213,683 BY Channel and Month Direct Sales2002-10CA 1,388,051 Direct Sales 2002-10 US 2,908,706 Direct Sales 2002-10 4,296,757 BY Channel and Month Direct Sales CA 2,766,177 BY Channel and Country Direct Sales US 5,744,263 Direct Sales 8,510,440 BY Channel Internet 2002-09 CA 911,739 Internet 2002-09 US 1,732,240 Internet 2002-09 2,643,979 BY Channel and Month Internet 2002-10 CA 876,571 Internet 2002-10 US 1,893,753 Internet 2002-10 2,770,324 BY Channel and Month Internet CA 1,788,310 BY Channel and Country Internet US 3,625,993 Internet 5,414,303 BY Channel 2002-09 CA 2,289,865 BY Month and Country 2002-09 US 4,567,797 2002-09 6,857,662 BY Month 2002-10 CA 2,264,622 2002-10 US 4,802,459 2002-10 7,067,081 CA 4,554,487 US 9,370,256 13,924,743 Everything