In addition to yesterday's post - a bit of old-timer knowledge share: if you see some features for years, you have a tendency to assume how they work. And often enough to your own surprise "the owls are not what they seem" (c) :)
My favorite story here is about hierarchical queries with CONNECT-BY clause. For years I was wondering why every generation of developers was having various performance issues writing such queries... Recently, while working on the presentation about recursive structures in the database I came to realization of the following: people look up code snippets without going any further!
Let's take a basic problem based on HR schema - list all employees working for the specified department in the hierarchical order starting from its manager. A lot of junior developer would write something like that:
select *
from hr.employees
where department_id = 100
start with job_id like '%MGR%'
connect by prior employee_id = manager_id
Result of the query would be functionally correct - so, developers would start using that pattern... And eventually (when they would start working with large data sets!) that code becomes the source of the major performance degradation! Here is why - let's take a look at the explain plan:
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 34 | 4522 |
|* 1 | FILTER | | | |
|* 2 | CONNECT BY NO FILTERING WITH START-WITH| | | |
| 3 | TABLE ACCESS FULL | EMPLOYEES | 107 | 7383 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("EMPLOYEES"."DEPARTMENT_ID"=100)
2 - access("EMPLOYEES"."MANAGER_ID"=PRIOR "EMPLOYEES"."EMPLOYEE_ID")
filter("JOB_ID" LIKE '%MGR%')
Even though we have lots of indexes on the table, we've got FTS - and if we look closer, we will notice that our department filter was applied at the very end. It seems that Oracle first walked the whole table hierarchically - and after that threw out all rows that do not belong to the department #100!
Indeed non-Join predicates are evaluated only afterwards. In a lot of cases it indeed makes sense - depending on the conditions. But in THIS case, where we
know (and that's the key knowledge that Oracle is missing) that hierarchies never cross departments - although we forgot to tell Oracle about it! So, no wonder, the result is les then optimal.
How to fix it? There are two approaches to solve this problem that both significantly improve performance, but provide slightly different executions plans.
Option #1 is to use in-line views (to be fair, that's my favorite approach, because it makes code way more readable):
select *
from (select *
from hr.employees
where department_id = 100)
start with job_id like '%MGR%'
connect by prior employee_id = manager_id
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 266 |
|* 1 | CONNECT BY NO FILTERING WITH START-WITH| | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED | EMPLOYEES | 6 | 414 |
|* 3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 6 | |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMPLOYEES"."MANAGER_ID"=PRIOR "EMPLOYEES"."EMPLOYEE_ID")
filter("EMPLOYEES"."JOB_ID" LIKE '%MGR%')
3 - access("DEPARTMENT_ID"=100)
As you can see here, explicitly filtering the department #100 in the in-line view immediately dropped number of processed rows and bytes - exactly as needed.
Option #2 is to tell Oracle to start only from rows that belong to the department #100:
select *
from hr.employees
start with job_id like '%MGR%' and department_id = 100
connect by prior employee_id = manager_id
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 7 | 931 |
|* 1 | CONNECT BY WITH FILTERING | | | |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED | EMPLOYEES | 1 | 69 |
|* 3 | INDEX RANGE SCAN | EMP_DEPARTMENT_IX | 6 | |
| 4 | NESTED LOOPS | | 6 | 492 |
| 5 | CONNECT BY PUMP | | | |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEES | 6 | 414 |
|* 7 | INDEX RANGE SCAN | EMP_MANAGER_IX | 6 | |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMPLOYEES"."MANAGER_ID"=PRIOR "EMPLOYEES"."EMPLOYEE_ID")
2 - filter("JOB_ID" LIKE '%MGR%')
3 - access("DEPARTMENT_ID"=100)
7 - access("connect$_by$_pump$_002"."prior employee_id "="MANAGER_ID")
It has slightly higher cost an somewhat more complex execution plan - but that way you still have better performance than FTS alternative.
Summary: please, beware - solutions that work may not scale if you don't understand what is going under the hood! Read those manuals, please!