Monday, September 27, 2010

Using EXPLAIN PLAN and TKPROF To Tune Your Applications

Consider the following query and execution plan:
SELECT a.customer_name, COUNT (DISTINCT b.invoice_id) "Open Invoices",
COUNT (c.invoice_id) "Open Invoice Items"
FROM customers a, invoices b, invoice_items c
WHERE b.invoice_status = 'OPEN'
AND a.customer_id = b.customer_id
AND c.invoice_id (+) = b.invoice_id
GROUP BY a.customer_name;

ID PARENT OPERATION OBJECT_NAME
---- ------ ----------------------------------- ------------------------------
0 SELECT STATEMENT
1 0 SORT GROUP BY
2 1 NESTED LOOPS OUTER
3 2 HASH JOIN
4 3 TABLE ACCESS BY INDEX ROWID INVOICES
5 4 INDEX RANGE SCAN INVOICES_STATUS
6 3 TABLE ACCESS FULL CUSTOMERS
7 2 INDEX RANGE SCAN INVOICE_ITEMS_PK
This execution plan is more complex than the previous two, and here you can start to get a feel for the way in which complex operations get broken down into simpler subordinate operations. To execute this query, the database server will do the following: First Oracle will perform a range scan on the invoices_status index to get the ROWIDs of all rows in the invoices table with the desired status. For each ROWID found, the record from the invoices table will be fetched.

This set of invoice records will be set aside for a moment while the focus turns to the customers table. Here, Oracle will fetch all customers records with a full table scan. To perform a hash join between the invoices and customers tables, Oracle will build a hash from the customer records and use the invoice records to probe the customer hash.
Next, a nested loops join will be performed between the results of the hash join and the invoice_items_pk index. For each row resulting from the hash join, Oracle will perform a unique scan of the invoice_items_pk index to find index entries for matching invoice items. Note that Oracle gets everything it needs from the index and doesn’t even need to access the invoice_items table at all. Also note that the nested loops operation is an outer join. A sort operation for the purposes of grouping is performed on the results of the nested loops operation in order to complete the SELECT statement.

It is interesting to note that Oracle choose to use a hash join and a full table scan on the customers table instead of the more traditional nested loops join. In this database there are many invoices and a relatively small number of customers, making a full table scan of the customers table less expensive than repeated index lookups on the customers_pk index. But suppose the customers table was enormous and the relative number of invoices was quite small. In that scenario a nested loops join might be better than a hash join. Examining the execution plan allows you to see which join method Oracle is using. You could then apply optimizer hints to coerce Oracle to use alternate methods and compare the performance.

You may wonder how I got that whole detailed explanation out of the eight line execution plan listing shown above. Did I read anything into the execution plan? No! It’s all there! Understanding the standard inputs and outputs of each type of operation and coupling this with the indenting is key to reading an execution plan.

A nested loops join operation always takes two inputs: For every row coming from the first input, the second input is executed once to find matching rows. A hash join operation also takes two inputs: The second input is read completely once and used to build a hash. For each row coming from the first input, one probe is performed against this hash. Sorting operations, meanwhile, take in one input. When the entire input has been read, the rows are sorted and output in the desired order.

Now let’s look at a query with a more complicated execution plan:
SELECT customer_name
FROM customers a
WHERE EXISTS
(
SELECT 1
FROM invoices_view b
WHERE b.customer_id = a.customer_id
AND number_of_lines > 100
)
ORDER BY customer_name;

ID PARENT OPERATION OBJECT_NAME
---- ------ ----------------------------------- ------------------------------
0 SELECT STATEMENT
1 0 SORT ORDER BY
2 1 FILTER
3 2 TABLE ACCESS FULL CUSTOMERS
4 2 VIEW INVOICES_VIEW
5 4 FILTER
6 5 SORT GROUP BY
7 6 NESTED LOOPS
8 7 TABLE ACCESS BY INDEX ROWID INVOICES
9 8 INDEX RANGE SCAN INVOICES_CUSTOMER_ID
10 7 INDEX RANGE SCAN INVOICE_ITEMS_PK

This execution plan is somewhat complex because the query includes a subquery that the optimizer could not rewrite as a simple join, and a view whose definition could not be merged into the query. The definition of the invoices_view view is as follows:

CREATE OR REPLACE VIEW invoices_view
AS
SELECT a.invoice_id, a.customer_id, a.invoice_date, a.invoice_status,
a.invoice_number, a.invoice_type, a.total_amount,
COUNT(*) number_of_lines
FROM invoices a, invoice_items b
WHERE b.invoice_id = a.invoice_id
GROUP BY a.invoice_id, a.customer_id, a.invoice_date, a.invoice_status,
a.invoice_number, a.invoice_type, a.total_amount;

Here is what this execution plan says: Oracle will execute this query by reading all rows from the customers table with a full table scan. For each customer record, the invoices_view view will be assembled as a filter and the relevant contents of the view will be examined to determine whether the customer should be part of the result set or not.

Oracle will assemble the view by performing an index range scan on the invoices_customer_id index and fetching the rows from the invoices table containing one specific customer_id. For each invoice record found, the invoice_items_pk index will be range scanned to get a nested loops join of invoices to their invoice_items records. The results of the join are sorted for grouping, and then groups with 100 or fewer invoice_items records are filtered out.

What is left at the step with ID 4 is a list of invoices for one specific customer that have more than 100 invoice_items records associated. If at least one such invoice exists, then the customer passes the filter at the step with ID 2. Finally, all customer records passing this filter are sorted for correct ordering and the results are complete.

Note that queries involving simple views will not result in a “view” operation in the execution plan. This is because Oracle can often merge a view definition into the query referencing the view so that the table accesses required to implement the view just become part of the regular execution plan. In this example, the GROUP BY clause embedded in the view foiled Oracle’s ability to merge the view into the query, making a separate “view” operation necessary in order to execute the query.

Also note that the filter operation can take on a few different forms. In general, a filter operation is where Oracle looks at a set of candidate rows and eliminates some based on certain criteria. This criteria could involve a simple test such as number_of_lines > 100 or it could be an elaborate subquery.

In this example, the filter at step ID 5 takes only one input. Here Oracle evaluates each row from the input one at a time and either adds the row to the output or discards it as appropriate. Meanwhile, the filter at step ID 2 takes two inputs. When a filter takes two inputs, Oracle reads the rows from the first input one at a time and executes the second input once for each row. Based on the results of the second input, the row from the first input is either added to the output or discarded.

Oracle is able to perform simple filtering operations while performing a full table scan. Therefore, a separate filter operation will not appear in the execution plan when Oracle performs a full table scan and throws out rows that don’t satisfy a WHERE clause. Filter operations with one input commonly appear in queries with view operations or HAVING clauses, while filter operations with multiple inputs will appear in queries with EXISTS clauses.


An important note about execution plans and subqueries: When a SQL statement involves subqueries, Oracle tries to merge the subquery into the main statement by using a join. If this is not feasible and the subquery does not have any dependencies or references to the main query, then Oracle will treat the subquery as a completely separate statement from the standpoint of developing an execution plan—almost as if two separate SQL statements were sent to the database server. When you generate an execution plan for a statement that includes a fully autonomous subquery, the execution plan may not include the operations for the subquery. In this situation, you need to generate an execution plan for the subquery separately.

No comments:

Post a Comment