Query Tree Disassembly

8 Query Tree Disassembly

The Query Tree - 1 • After the Lexical and SyntacticalAnalyzer has decided that the SQL you have written conforms to the rules • Parser then converts the table, column, view etc. names into object codes and hands the parsed query to the ……

The Query Tree – 2 Optimiser • The Optimiser then “plans” the query in such a way as to break down the complex query into a series of simple steps • This series of simple steps is known as the query tree

Plan and Detail - 1 • There are a series of commands that provide levels of detail about the optimiser plans for a query • The primary command to generate the plan output is Set Option Query_Plan = ON

Just Query_Plan • This generates an output similar to this :-

For more information • The query_detail option provides a much richer functionality in the optimiser output • Mainly the information concerns what is to be passed (or pushed) around in the query tree • To generate detail statistics Set Option Query_Plan = ON Set Option Query_Detail = ON

Query Detail • Query_Detail generates output like this:-

Everything about the query • For all the available information about a query from the optimiser, Secure Statistics is what is required • In IQ 11 this was a hidden option • In ASIQ-M12.4.2 it is still sort of hidden

Secure Stats • Secure Stats provides a very high level of detail about the optimiser plans • Including the optimiser estimates as to the selectivity and join criteria • To use Secure Stats : Set Option Query_Plan = ON Set Option Query_Detail = ON Set Option dml_option10 = ON

dml_option10 Output • dml_option10 generates output like this:-

See the SQL • New Engine startup Option for 12.4.2 • -zo <filename> and –zr sql • This option prints out all the SQL passed to the optimizer for the duration of the server execution • Much better than the ODBC trace log! • These options must be in the config file – not on the server start line

New in 12.4.2 • Option Query_Timing • This option when set on reports in the query tree output the timing of • The overall Query • The Sub-Query Timing • The timing of every node in the Query • For this to work you must have Query_Plan_After_Run set to ON (for obvious reasons!)

Query Timing Request from Higher Node to start processing Answer from Tree: Tree complete Time of First Prepare : 11:15:57.797306 Time of First Prepare Return: 11:15:57.919217 Time of First Fetch Call : 11:15:57.919574 Time of First Fetch Return : 11:16:01.290265 Time of Last Fetch : 11:16:01.292390 Time of Last Complete : 11:16:01.326976 Request for first row Ack. Of first row passed up Answer from Tree: All resources Returned Ack. Of last row passed up

New in 12.4.3 • Option Query_Plan_As_HTML • This option generates the Query Plan as an HTML document – stored in the execution directory of the server. • For this to work the option Query_Name must be set • The output name of the file is then: <UserName>_<Query_Name>.htm Dba_Query7.htm

Execution Output • Of course the options just described are only prior to query execution • The Run Time Engine also generates output • This output is the Command Statistics

Command Statistics • Command Statistics are now documented – sort of! You have to set the COMMAND_STATS option, then use sp_iqcommandstats • During command execution statistics are collected. • Note this is not the case for • Set, Dump, Restore etc. • Stats. are kept in memory until the next time COMMAND_STATS is set to ON.

Statistics collected • Original Command String • Total Execution Time • State of all Buffer Managers, before and after • The set of indexes used, and why • Rows returned • Sub-Query re-execution count • Average time to execute sub-query • Maximum time to execute sub-query • etc...

Command Statistics • Command Statistics provided by Set temporary option command_stats = ON (before the query is run) and sp_iqcommandstats n (after the query is run) • 1 - provides summary info - 1 page • 2 - medium detail (suggested) - 4 pages • 3 - Max Detail (really only for Tech. Support) - 12 pages.

Command Statistics Output

So you have a Query • All of the information relating to the Optimiser and Execution engine relate to the TPCD Query 5, which is :

Building the Query Tree • The first task that I perform on checking out a query is to build the query tree (or run with HTML on) • This is a sort-of upside down tree with all the leaves at the base and the root at the top • Result row flow up from the leaves to the root. Hence the correct name of the Query Tree is the dataflow model

Query Plan • Query Plan Output is mapped to a “tree” • Query Tree consists of “nodes” • Each node represents a stage of work • Each node has a Name and DFO Number • DFO = Data Flow Operator • ‘Top’ of the tree is the ROOT node (highest node#) • Lowest node is the “Leaf”

Root Group By Join Table 2 Table 1 Dataflow Model

Nodes in the Query Plan - 1 17 15 14 11 12 13 The Level Number represents the “depth” in the Query Tree

Nodes in the Query Plan - 2 The DFO Node Number is a Unique Number that identifies the Node. The “chaining” down the tree is shown by the Child Node numbers

Dataflow 17 17 Nodes request Data from the node below Data is sent up the tree 15 15 14 14 11 11 12 13 12 13

Sort Merge Pushdown • The section below shows the Push Down section (TPCD Query 9) Condition 3 (Pushed): (lineitem.l_partkey PROBABLY_IN BV(0, part.p_partkey)) Condition 3 Selectivity: 0.15000000 Condition 3 Index: FP ASIQ_IDX_T193_C2_FP Join comparator: non-null identically typed key PD Filter page count: 2 PD Filter EQ classes possible: 32768 PD Filter EQ classes present: 97 PD Filter EQ class bit density: 0.00296021 PD Filter input row count: 60175 PD Filter output row count: 2901 PD Filter selectivity: 0.04820939 PD Filter est. selectivity: 0.15000000

Node Types • All of the nodes in the query plan have a type • There are three main types of nodes • Vertical Cursors and Filters • Aggregation Nodes (Group By Nodes) • Join Nodes • All of the types of nodes is discussed further

Order By Group By Join Filter Join Vertical Cursor Filter Vertical Cursor Vertical Cursor Definitions • Local Predicates • These are conditions in the WHERE clause which access only one table. • There are two “types” • Vertical Cursor Nodes • Filter Nodes

Vertical Cursor • A vertical cursor the part of a query that can be solved directly using IQ indexes • If there are local predicates, then these can be solved through selective filtering using the indexes directly • The multiple indexes per column allows IQ to use this at the base of ALL table searches • After processing the local predicates the list of rows is “projected” up the index tree as elements of the Fast Project Index for the column concerned • After this point IQ processes the remainder of the query from the result set of FP data for the required columns

Vertical Cursor Node Vertical Cursor Node • When index can satisfy WHERE clause predicate • bitmaps are read directly Example: WHERE avail_qty > 100 • There are three complex Vertical Cursors • Vertical Aggregation Cursor • Vertical Grouping Cursor • Vertical Distinct Cursor • (These are generated by some functions or group by or distinct clauses)

Filter Node - 1 • Filter Node • Used when bitmaps alone cannot satisfy query • Becomes an “index scan” to resolve query WHERE avail_qty * 5 = 20 or WHERE datepart(mm, order_date) = 1 • Check the SQL - the clause may be able to be changed • This may be an area where changing the query could cause a substantial performance improvement

Filter Node - 2 • In IQ 12 the optimiser can perform 2 types of predicate processing at the index level • LIKE predicates can be (and are) pushed down into the index for processing • This ability is also true for very large IN lists (> 1,000 values) • In IQ 12.4 the decision has been moved to a very much greater size of IN list for index processing (around 16,000 entries)

IQ Table Scan • If a query plan has a filter above a vertical cursor, this is an IQ table scan • This can, and usually is expensive • If the SQL can be modified to get the optimiser to “push” the filter down into the vertical cursor then the query should speed up

Predicate Diagnostics Filter - requires a “scan” -- 1 #03: Group By Single -- Child Node 1: #02 -- 2 . #02: Filter -- Child Node 1: #01 -- Condition 1: (customer.account_type IN (1,2,3,4,5,6)) -- 3 . . #01: Vertical Cursor -- Table Name: customer -- Table Row Count: 5000 -- Condition 1: (customer.account_balance > 0) -- Condition 2: (customer.account_balance < 5000) Vertical Cursor - satisfied by Index This is an “old” example – the IN list would now be “pushed” down to the vertical cursor for IQ 12

Indexes and Predicates Certain Index types are better for certain predicates

Inferred Predicates - 1 • Consider the case: Select count(*) from TABLEwhere TABLE.a = 10 and TABLE.a = TABLE.b • In the above IQ will apply the rules for transitive closure • [if a=b and b=c then a=c] • And add the clause and TABLE.b = 10 • This allows the optimiser another clause to help solve the query

Inferred Predicates - 2 • This can also help a join • Consider the case: Select count(*) from TABLE1, TABLE2, TABLE3where TABLE1.key = TABLE2.key and TABLE2.key = TABLE3.key • IQ will then add the clause and TABLE1.key = TABLE3.key • This allows the optimiser another clause to help solve the query

IN List vs. OR ? • In IQ-M the following statements are processed in the same way Select * from T where T.a IN(1,2,3,4)Select * from T where T.a = 1 or T.a = 2 or T.a =3 or T.a = 4 • This means that there is no real requirement for the removal of “or” processing in SQL • This favors third party tools – as “or” queries are usually easier to write from query generators

Range Predicates • In IQ 11 the following were the same Where table.date >= ‘1-Jan-1997’and table.date < ‘1-Jan-1998’ And Where table.date between ‘1-Jan-1997’ and 31-Dec-1997’ • New to 12 is the new optimization Where table.date >= ‘1-Jan-1997’and table.date < ‘1-Jan-1998’ is the same as Where table.date between ‘1-Jan-1997’ and ‘1-Jan-1998’ excluding rows on ‘1-Jan-1998’ • Again this provides the optimiser with more potential “routes” to solve the query

Predicate Factoring SELECT * FROM T1, T2 WHERE T1.key = T2.key AND (T1.nation = ‘Spain’ AND T1.food = ‘apple’ AND T2.name = ‘Chen’) OR (T1.nation = ‘India’ AND T1.food = ‘apple’ AND T2.name = ‘Chen’ AND T2.car = ‘Fiat’) With the above the optimiser cannot process the local predicates before the join – so we have to do the join, then filter, negating the index advantages of IQ. So the optimiser now can change the query : SELECT * FROM T1, T2 WHERE T1.key = T2.key AND (T1.nation = ‘Spain’ OR (T1.nation = ‘India’ AND T2.car = ‘Fiat’)) AND T1.food = ‘apple’ AND T2.name = ‘Chen’ Now the optimiser can easily decide that food and name can be processed in the indexes.

PredicateParameters • Force_All_Predicates_To_Postfilters (only in 12.4.2) • This forces IN lists to a post filter, if set, this may not be too useful • dml_options2 (bit field option – beware) • Bit 0 set - ALL IN lists are forced to bitmapped indexes • Bit 1 set - ALL IN lists will be forced to FP indexes • Neither 0 or 1 set - will choose based on the index speed and size of IN list

Selectivity - 1 • As we can see from the queries, sometimes the optimiser gets the selectivity wrong • Because it does not have an enumerated index • 1-byte FP, 2-byte FP, LF or HG • Where col1 < 100 and col1 only has a flat FP and HNG • Because the predicate list is complex • Like, substr(), datepart() • Or where there is a correlation between two columns X > 100 and y > 50

Selectivity – 2 • In any of the above cases we can apply a selectivity force Where (x > 200, 22) • This says in the above case 22% of the rows have the value of x > 200 • For like, substr() and datepart() this is wonderful

Correlated Columns - 1 • This is the case where two columns are related, but the optimiser cannot know X > 100 and y > 50 (for example x will always be twice as big as y • In this case the columns may both have a (true) selectivity of 10%, so the optimiser assumes no correlation so the overall selectivity is determined to be 1% (10% of 10%) • This may not (in this case is not) correct

Correlated Columns - 2 • In this case we can write the forced selectivity as (x > 100, 100) and y > 50 • Not we leave the selectivity on y as 10% (the optimiser determined value • And we change the selectivity on x to 100% • The optimiser then determines the correct selectivity for the table as 10%

Sort Elimination 17 Under certain circumstances the optimiser can remove sorts (if the correct ordering applies) 17 15 15 Order A,B 14 Order A,B 14 11 11 12 13 Order A 12 13 Order A,B

Agg. Node Types - Vertical • Vertical Grouping Cursor • grouping is done in indexes • Vertical distinct Cursor • a Select Distinct that can be satisfied by an index • Vertical Aggregation Cursor • This will be generated if a query has certain functions in the select list which can be processed vertically (MAX, MIN)

Agg. Node Types - Group by 1 • Group By Single • Aggregation Function with no Group By that Must have a Unitary Result Set • E.g. COUNT(*) • For COUNT(distinct), and AVG() and SUM() operations a Hashing function used • For horizontal aggregation • count(distinct, brand) … group by store • Group is performed using sort • Count performed by hash

Agg. Node Types - Group by 2 • Explicit group by clauses that are not processed vertically generate a “group by” node • Group by • groups data with a sort • Group By (hash) • groups data with a hashing function

Query Tree Disassembly

Query Tree Disassembly

Presentation Transcript

PC assembly / disassembly workshop

Understanding Simulators Using Tree-Query Language

Product Disassembly

Remanufacturing, disassembly…..

Mining Tree-Query Associations in a Graph

Active Disassembly

Mining for Tree-Query Associations in a Graph

Product Disassembly

Engine Disassembly

ideapad U350 Hardware Disassembly

Disassembly

Assembly / Disassembly Machines

A Query-Based Routing Tree in Sensor Networks

No Disassembly Required!

Spatial Query Processing using the R-tree

Product Disassembly

Code Disassembly Techniques

Understanding Simulators Using Tree-Query Language