510 likes | 710 Views
Lecture 9 Indexes, Optimizer, Hints. Oracle Database Administration. Indexes in Oracle. Indexes are used to: provide faster access to data help enforce primary key and unique constraints help enforce foreign key constraints Index types: B-Tree indexes (default) Bitmap indexes.
E N D
Lecture 9 Indexes, Optimizer, Hints Oracle Database Administration
Indexes in Oracle • Indexes are used to: • provide faster access to data • help enforce primary key and unique constraints • help enforce foreign key constraints • Index types: • B-Tree indexes (default) • Bitmap indexes
Indexes in Oracle • B-Tree Index can be: • unique – each index value except NULL must be unique • non unique • single column – one column is indexed • multiple column (compound index) – multiple columns are indexed • NULL values in B-Tree index are ignored (are not indexed)
Compound indexes • Column values are combined together in the order they appear in CREATE INDEX statement Statements: CREATE INDEX IND1 ON EMP(ID, SALARY); CREATE INDEX IND2 ON EMP(SALARY,ID); create two different indexes • Compound index value is NULL if all the columns are NULL
Index usage • Index can be used by a SELECT statement to: • limit number of rows to be processed by the WHERE clause • order the results of the SELECT statement • join two tables • Index can be accessed in the following ways: • unique index scan (0 or 1 rows) • non unique index scan (any number of rows) • index range scan (any number of rows)
Oracle optimizer • Many statements can be executed in a different way • Each statement executed is analyzed by the Oracle optimizer • Oracle optimizer creates execution plan for the statement
Oracle optimizer • Execution plan defines: • what tables to access and how to access them (access path) • the order of operations (join order) • join method / methods • the optimizer can rewrite statement to different one, as long as the results of the statement are the same
Access paths • Access path is a way of accessing table data. • Example access paths: • FULL – full table scan (all rows are searched) • ROWID – table access by rowid (the fastest) • INDEX_ASC, INDEX_DES – access table using an index • INDEX_FFS – index fast full scan • INDEX_SS – index skip scan
Full table scan • Full table scan is relatively expensive • It is used when: • entire table needs to be scanned (for example in SELECT * FROM table) • table is small (for example: 100 blocks) • there is no index that can be used • there are indexes, but expected number of rows is large and total cost of execution would be larger when using indexes
Full table scan example • LIKE '%...' operator cannot use index • Table test1 contains 2000 rows • Optimizer estimates (guesses) that 100 rows will be returned from the query
Full table scan example • SELECT * FROM emp WHERE gender = 'F' • approximately 50% of the rows have gender = 'F' • using an index is inefficient when 50% of the rows are to be searched – Oracle performs full table scan • this index has low selectivity – this column is not a good candidate for B-Tree index
Index range scan • Index range scan uses index to search for multiple rows • Index contains: • ROWID of the actual row • indexed values • Index can be used to: • locate row using ROWID • get indexed value directly from the index
Index range scan example 1 • Index is used to get ROWID of the table row • ROWID retrieved from index is used to get data from table data block
Index range scan example 2 • Since SALARY column exists in the index IDX1, value is retrieved directly from the index, without accessing the table
Index fast full scan • Index fast full scan is sometimes used instead of full table scan • Index FFS can be used when data to be retrieved in the query is in the index • For FFS of compound indexes, the order of columns is not important
Join operations • When two tables are joined, the optimiser decides: • join order – which table will be accessed first, which second • join method – how the tables will be joined. Available join methods include: • nested loops • merge join (backward compatibility only) • hash join • anti join, semi join
Nested loops join • For each row in the first table: • find all rows in the second table that match the where clause
Merge join • Sort rows in the first table by the join key • Sort rows in the second table by the join key • Merge sorted rows (in a single pass)
Hash join • Similar to merge join • Instead of sorting – hash table of all rows indexed by the join key is used
Semi join • Statement with EXISTS clause SELECT D.NAME FROM DEPT D WHERE EXISTS (SELECT * FROM EMP E WHERE D.ID = E.DEPT_ID) SELECT D.NAME FROM DEPT D WHERE D.ID IN (SELECT E.DEPT_ID FROM EMP E WHERE E.SALARY > 2000)
Anti join • Statement with NOT EXISTS or NOT IN clause SELECT D.NAME FROM DEPT D WHERE NOT EXISTS (SELECT * FROM EMP E WHERE D.ID = E.DEPT_ID) SELECT D.NAME FROM DEPT D WHERE D.ID NOT IN (SELECT E.DEPT_ID FROM EMP E WHERE E.SALARY > 2000)
Execution plan • sample EXPLAIN PLAN command output: Rows Execution Plan -------- ---------------------------------------------------- 12 SORT AGGREGATE 2 SORT GROUP BY 76563 NESTED LOOPS 76575 NESTED LOOPS 19 TABLE ACCESS FULL CN_PAYRUNS_ALL 76570 TABLE ACCESS BY INDEX ROWID CN_POSTING_DETAILS_ALL 76570 INDEX RANGE SCAN (object id 178321) 76563 TABLE ACCESS BY INDEX ROWID CN_PAYMENT_WORKSHEETS_ALL 11432983 INDEX RANGE SCAN (object id 186024)
Oracle optimizer • initialization parameter OPTIMIZER_MODE can take values: • first_rows • first_rows_1, first_rows_10, first_rows_100, first_rows_1000, • all_rows, • choose
Cost based optimizer • Cost based optimizer tries to estimate real cost of executing statement (cpu, memory, io operations) • CBO uses statistics to estimate execution cost • The following types of statistics exist: • table statistics • column statistics • index statistics • column histograms
Table statistics • Table statistics include: • number of rows in a table • total number of blocks • number of free blocks • average row length • number of chained blocks • Table statistics can be viewed in DBA_TABLES view
Column statistics • Column statistics include: • minimum value • maximum value • number of distinct values • number of NULL values • Table statistics can be viewed in DBA_TAB_COLS view
Index statistics • Index statistics include: • BTree level • number of leaf blocks • number of distinct keys • average leaf blocks per key • average data blocks per key • number of rows • Index statistics can be viewed in DBA_INDEXES view
Column histograms • Column histograms can be computed for columns with non-uniform distribution • Example: select * from emp where salary between 0 and 1000 • Without histogram Oracle estimates number of rows matching the where condition using: • total number of rows (table statistics) • minimum value of salary (column statistics) • maximum value of salary (column statistics)
Column histograms • With histograms Oracle can better estimate number of rows that match WHERE condition • Histograms are not useful if: • column distribution is uniform (normal statistics are enough) • column is accessed using bind variable: select * from emp where salary between ? and ?
Gathering statistics • There are many ways to gather statistics: • manualy for a table: ANALYZE TABLE employees COMPUTE STATISTICS; ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 10 PERCENT ANALYZE TABLE employees COMPUTE STATISTICS FOR ALL INDEXED COLUMNS • manually for a user: EXECUTE dbms_stats.gather_schema_stats( ownname => 'Username', method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO', cascade=>TRUE);
Automating statistics gathering • Oracle 9i and before: • create a job for computing statistics every day, week or month • Oracle 10g and 11g: • Oracle 10g and 11g automatically gather statistics
Optimizer modes • RULE – force rule based optimizer • FIRST_ROWS – force cost based optimizer, optimize statement based on time of returning the first row • FIRST_ROWS_1, FIRST_ROWS_10, etc. – optimize statement based on time of returning first X rows • ALL_ROWS - force cost based optimizer, optimize statement based on execution of entire statement • CHOOSE – if statistics are present, use CBO, otherwise use RBO
Hints • SQL statements can have hints • hints help optimizer choose best access path • optimizer can ignore hint(s) if: • hint does not make sense in the query • example: using index unrelated to sort order or join condition • hints are in conflict with each other • hints have incorrect syntax
Hints • Hints are included in SQL as comments: SELECT /*+ INDEX(employees emp_idx) */ * FROM employees WHERE id > 100; SELECT /*+ INDEX(emp emp_idx) USE_NL(emp, dept)*/ * FROM employees emp, departments dept, WHERE emp.id > 100 AND emp.dept_id = dept.id • Hints start with /*+ • Hints must immediately follow SELECT, UPDATE, etc. • Hints can be included in sub-selects and in views
Optimizer mode hints • Hint can change global optimizer mode: • ALL_ROWS • FIRST_ROWS • FIRST_ROWS_10, FIRST_ROWS_100, ... – minimum time to return first 10, 100 rows • RULE • CHOOSE • Example SELECT /*+ FIRST_ROWS */ * FROM EMPLOYEES ORDER BY ID ASC;
Access path hints • Access path is a way of accessing table data. The following hints can be used: • FULL – full table scan (all rows are searched) • ROWID – table access by rowid (the fastest) • CLUSTER – cluster scan (only for clustered objects) • HASH – hash scan for cluster objects • INDEX – access table using an index • INDEX_FFS – index fast full scan
Access path hints • INDEX_ASC, INDEX_DESC • INDEX_COMBINE – used for bitmap indexes • NO_INDEX – disables specified index for a query • AND_EQUAL – merge several indexes
Other hints • Query transformation hints • USE_CONCAT – rewrite OR query to UNION ALL • MERGE – merge view into a query • NO_MERGE – disable view merge • Join order hints • ORDERED – forces join of tables in the order in which they appear in the FROM clause (very useful hint!)
Join operation hints • USE_NL – Nested loops • USE_MERGE – join two tables using sort-merge join • USE_HASH – join two tables using hash join • LEADING – select table that is a first table in a join order • HASH_AJ, MERGE_AJ, NL_AJ – hash, merge or nested loops for anti join query • HASH_SJ, MERGE_SJ, NL_SJ – hash, merge or nested loops for semi join query
Samples SELECT /*+ use_hash(employees departments)*/ * FROM employees, departments WHERE employees.department_id = departments.department_id; SELECT * FROM departments WHERE exists (SELECT /*+ HASH_SJ*/ * FROM employees WHERE employees.department_id = departments.department_id AND salary > 200000);
Optimizing SQL statements • Check execution plan (EXPLAIN PLAN command). If possible: • update statistics • create indexes • provide optimizer hints • Check AUTOTRACE output • Optimize entire database
SQLPlus • SET TIMING ON: • shows execution time for each SQL statement • SET AUTOTRACE ON: • shows detailed statistics for each statement: Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 1749 consistent gets 0 physical reads 0 redo size 395 bytes sent via SQL*Net to client 512 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
AUTOTRACE output • recursive calls – how many additional SQL statements were executed • db block gets – how many blocks were processed (read) in memory • consistent gets – how many blocks were processed (read) in memory • physical reads – number of blocks read from disk • redo size – how much redo was generated from this statement
AUTOTRACE output • bytes sent and received via SQL*NET – amount of network traffic generated by the statement • sorts in memory and on disk • number of rows processed (returned) from the statement
Resource intensive statements • long running SELECT statements can have: • large number of recursive calls (complex subselect) • large number of consistent gets • large number of physical reads • large amount of data received over the network • many sorts in memory and on disk
Resource intensive statements • long running UPDATE and DELETE statements can have: • large number of db block gets • large number of physical reads • lot of redo log generated
Solving typical problems • large number of physical reads can mean: • database memory cache is too small • index should be created on one or more tables • large amount of data received over the network: • query results are processed on the client instead on the server • many sorts (especially disk): • order by, group by is used on a very large table without an index. Create index on some columns