Thursday, May 5, 2011

Small thoughts about hierarchical queries

I am just wondering, how many times the question of how exactly does Oracle execute CONNECT BY has been raised? I think, all known database experts had at least something on this :-) I didn't expect this thing to be so popular! Although, it could say something about either the quality of existing books/docs or about the fact very few newcomer ever read aforementioned books/docs.

Knowing a number of currently published authors my guess would be the latter has higher probability. H-m-m-m-m (if anybody reads this blog) - here is a survey: if you are planning to use a new feature would you rather read docs first and try later - or you would first try and check the doc only is something doesn't work as expected? :-)

P.s. I should not break the trend.  Here are my 2c on CONNECT BY.
I finally realized what exactly happens in the following query

-- average idea
select *
from (select *
      from OrgTreeDetail
      where OrgTree_OID = :1)
connect by OrgTreeDtl_RFK = prior OrgTreeDtl_OID
start with OrgTreeDtl_RFK is null

This kind of query where the task is to access only one tree from the existing set is very popular, but unfortunately very misused. About 99% of beginners would usually write the query like that:
-- bad idea
select *
from OrgTreeDetail
where OrgTree_OID = :1
connect by OrgTreeDtl_RFK = prior OrgTreeDtl_OID
start with OrgTreeDtl_RFK is null

Unfortunately this works fine on the small number of rows - and nobody notices Full-Table-Scan on OrgTreeDetail (caused by the fact that WHERE clause with predicates is applied AFTER the process of building of the tree). Minor problem - if OrgTreeDetail contains 2M rows, this query would take about 2 minutes. That performance was definitely not acceptable, so long time ago I came with the original SELECT-from-SELECT case (dropping processing time to 0.1 sec).

Only now I finally had time to take a closer look. Apparently, Oracle rewrites my query like this:
-- Oracle's idea
select *
from OrgTreeDetail
connect by OrgTreeDtl_RFK = prior OrgTreeDtl_OID 
       and OrgTree_OID = :1
start with OrgTreeDtl_RFK is null 
       and OrgTree_OID = :1

And now it becomes clear what is going on - by using double condition Oracle can quickly find the root node of my requested tree without doing FTS. And since I have indexes on all involved columns - everything after that becomes straightforward. Yeah... Kind of! :-)

No comments: