Wednesday, April 24, 2013

Performance tuning. Spending time is NOT OK (if you do not know exactly why)

Yet another performance tuning story, similar to one that happened about a month ago. Sad thing - something was was coded years ago we never questioned the time spent... Just to keep it short:
  • once upon a time there was a very time-consuming module.
  • eventually we were forced to take much closer look - WHY is it so time-consuming
  • we found in our own view a function call that was completely unnecessary.
  • this function was very light that initially we ignored it altogether, but later we realized that we've been calling it 80000 times - and in that case even the lightest cost adds up.
  • removing this function (all needed data was already available via joined tables) took the cost of a class from 40 seconds down to 3.
  • Profit! :-)
We knew that the module is extremely complicated and that a lot of database operations was involved - so we assumed that it is absolutely fine! Of course, we've hit a case when assumptions are the worst enemies of a real knowledge... Especially if we are talking about performance tuning - because unless we've proven where exactly we are losing N seconds in the module X, we cannot say that module X is cannot take less than N seconds. Because those N seconds could be related to the different module/outdated coding technique/structural change etc. And only knowing what exactly is going on we can make a decision whether it is OK or not.

Summary: A couple of lessons learned:
  1. If something does not perform well in the existing system - check it (unless it is something known and documented)! You may think that that time loss is inevitable, but who knows? 
  2. Function calls in SELECT statements could cause a lot of issues even if each call is light - you must think not only about the cost of a function call, but about how many times this function will be fired
By the way, at the recent IOUG Collab'13 in the presentation "Top 5 Issues that Cannot be Resolved by DBAs (other than missed bind variables)" I covered the second topic (number of calls) a bit deeper. Below is a snippet from my white-paper:

Number of calls in SELECT clause

There are multiple ways of ensuring that if a function is referenced in the SELECT clause, it is not fired more often than needed. Unfortunately, few developers are even aware of this problem. My recommendation is to include the following set of examples in any PL/SQL class. This explicitly illustrates the difference between real understanding and guessing.

First, set up a basic environment to count total number of calls: a package variable to store the counter, a simple function, and a checker to display/reset the counter:

create package misha_pkg is
    v_nr number:=0;   
end;

create or replace function f_change_tx (i_tx varchar2)
return varchar2 is
begin
    misha_pkg.v_nr:=misha_pkg.v_nr+1;   
    return lower(i_tx);
end;

Create or replace procedure p_check is
begin
     dbms_output.put_line('Fired:'||misha_pkg.v_nr);
     misha_pkg.v_nr:=0;
end;

Second, run a very simple query against table EMP,  where the function above will be applied against EMP.JOB. And let us keep in mind that there are 14 total rows in the table EMP:

SQL> select empno, ename, f_change_tx(job) job_change_tx
  2  from emp;
   ...
14 rows selected.
SQL> exec p_check
Fired:14
PL/SQL procedure successfully completed.

If you just use the function, it will be fired for every row. But we know that there are only 5 distinct JOB values, so we should try to decrease the number of calls. In Oracle 11gR2, there is a very interesting internal operation called “scalar sub-query caching” being used while processing SQL queries. It allows Oracle to internally reuse previously calculated results on SELECT statements if they are called multiple times in the same query. The following example tests to see if using this operation helps:

SQL> select empno, ename, (select f_change_tx(job) from dual)
  2  from emp;
  ...
14 rows selected.
SQL> exec p_check
Fired:5
PL/SQL procedure successfully completed.
SQL>

The result shows that it did help. Now, only five distinct calls are registered, which isexactly as needed. Although, since we are discussing cache, why not use it explicitly? There is another very powerful feature called “PL/SQL function result cache.” The following example enables it on the function while the same query is run two times:

create or replace function f_change_tx (i_tx varchar2)
return varchar2 result_cache is
begin
    misha_pkg.v_nr:=misha_pkg.v_nr+1;   
    return lower(i_tx);
end;

SQL> select empno, ename, f_change_tx(job) from emp;
...
14 rows selected.
SQL> exec p_check
Fired:5
SQL> select empno, ename, f_change_tx(job) from emp;
...
14 rows selected.
SQL> exec p_check
Fired:0

The result is impressive! If the first call matches the sub-query caching, the second call is a fantastic example of great performance tuning – everything works as needed, but nothing is being done (actually, this is not 100% true, since the cache should be retrieved anyway, but for practical purposes it is a very simple PK lookup).

1 comment:

Sayan Malakshinov said...

Would be better to declare such functions as deterministic, because their results will be cached without scalar subqueries. It's easier than to change all queries with these functions.

More about caching: http://orasql.org/category/oracle/deterministic-functions/

Regards,
Sayan Malakshinov