- I have a table that contains a DATE column (HIREDATE)
- Input parameter is also a date (IN_DT)
- For each department I need to show all rows that have HIREDATE after IN_DT plus the most recent row from the past (only one!).
Obviously, if you split the last condition into two, the solution is reasonably clear:
SELECT empno, ename, deptno, hiredate
FROM scott.emp
WHERE hiredate>:IN_DT
UNION ALL
SELECT empno, ename, deptno, hiredate
FROM (
SELECT e.*,
row_number()
over(PARTITION BY deptno
ORDER BY hiredate desc,
empno desc
) rank_nr
FROM scott.emp e
WHERE hiredate<=:IN_DT
)
WHERE rank_nr = 1
The only problem of this solution - it needs two passed over EMP table. But in my production case that table is huge. Also, just to make life interesting, the result of the specified query has to be joined to 8 other tables to provide the required output. It is understandable, that my first optimization idea was to somehow do a single-pass query. To my own surprise, Oracle analytic functions are flexible enough to help me out:
SELECT empno, ename, deptno, hiredate, rank_nr
FROM
(
SELECT e.*,
dense_rank()
over(PARTITION BY deptno
ORDER BY
CASE WHEN hiredate > :IN_DT THEN '9999'
ELSE to_char(hiredate,'YYYYMMDDHH24MISS')||
lpad(empno,10,'0')
END desc
) rank_nr,
max(hiredate)
over(
PARTITION BY deptno) max_dt
FROM scott.emp e
)
WHERE max_dt > :IN_DT AND rank_nr IN (1,2)
OR
max_dt < = :IN_DT AND rank_nr = 1
In this solution I two analytical functions
1. DENSE_RANK is forced to order in the following way:
- bring all future dates to the top by assigning them exactly the same value '9999'
1. DENSE_RANK is forced to order in the following way:
- bring all future dates to the top by assigning them exactly the same value '9999'
- order all other dates by converting them to 'YYYMMDDHH24MISS' format. In that format DESC would bring the most recent date up anyways.
- to resolve conflicts of multiple entries associated with the same date concatenate past dates with padded EMPNO. This way textual sort would correctly work even if the length of EMPNO may differ.
2. MAX() OVER() brings the highest hire date for each department.
Now the outer SELECT has enough information to filter out correct rows: if the highest date if after the parameter, you need to bring ranks 1 and 2 (all future dates + the last past date). Otherwise you need only rank 1 (last past date).
I understand, that this solution may take a couple of seconds to wrap your head around :-). Still - I was able to do exactly what I needed in a single pass!
Summary: if you are working on reports, analytic function is the MUST. That's the only safe way to write efficient code that would consistently work over large data sets.
Summary: if you are working on reports, analytic function is the MUST. That's the only safe way to write efficient code that would consistently work over large data sets.
No comments:
Post a Comment