Friday, May 27, 2011

Base64 Encoding in PL/SQL

Just before the holiday weekend it is always nice to tell a good story to cheer everybody up. Here is mine! You know, I've had a couple of good news for you :-) And no, they are not about car insurance - they are about Base64 encoding in PL/SQL:

- with some manual tweaking or parameters it is possible to get a really good performance out of UTL_ENCODE package in both 10g and 11g. In my case - up to 50x performance gain!

And now let's get to details. In the environment I am working right now it just historically happened that we had two different Base64 encoding/decoding procedures (you can google for either of them - there will be  tons of hits on OTN):
- Java-based (classes Base64 and UTLBase64)
     * encodes into 76-char per line Base64 encoding with CHR(10) at the end of the line ~ 77 char/line
- PL/SQL-based (UTL_ENCODE.base64_encode inside).
     * encodes to 64-char per line Base64 encoding with CHR(10)||CHR(13) ~ 66 char/line

Recently we started to use encoded data much more for a number of different reasons - and my developers started to complain about the performance, especially for large data volumes (just FYI - my development server is Dell 6850 4 2-core CPUs, MS Windows 2008, Oracle 10.2.0.5 64 bit) processed by Java-based module.

Eventually I decided to take a step back, and since my area of knowledge is mostly PL/SQL to take a look whether I can make PL/SQL version run faster. Of course, the first suspect in any string processing was the size of buffer - it just seemed strange to me that people (in majority of examples I found online) use single-line size to both operations. It definitely didn't seem optimized. Also a lot of examples use concatenation to generate an output CLOB, which is also a major slow-down. Here are my final version of encode/decode for BLOB:

---------------------------------------------------------------------
function base64EncodeBlob_plsql (i_bl BLOB) 
return clob is
  v_out_cl     CLOB;
  file_len     PLS_INTEGER;
  modulo       PLS_INTEGER;
  pieces       PLS_INTEGER;
  amt          BINARY_INTEGER      := 23808;
  buf          RAW (32767);
  buf_tx       varchar2(32767);
  pos          PLS_INTEGER         := 1;
  filepos      PLS_INTEGER         := 1;
  counter      PLS_INTEGER         := 1;
BEGIN
  DBMS_LOB.createtemporary 
     (v_out_cl, TRUE, dbms_lob.call);
  file_len := DBMS_LOB.getlength (i_bl);
  modulo := MOD (file_len, amt);
  pieces := TRUNC (file_len / amt);

  WHILE (counter <= pieces) LOOP
    DBMS_LOB.READ (i_bl, amt, filepos, buf);
    buf_tx:=utl_raw.cast_to_varchar2
       (UTL_ENCODE.base64_encode (buf));
    dbms_lob.writeappend
       (v_out_cl,length(buf_tx),buf_tx);
    filepos := counter * amt + 1;
    counter := counter + 1;
  END LOOP;

  IF (modulo <> 0) THEN
    DBMS_LOB.READ
      (i_bl, modulo, filepos, buf);
    buf_tx:=utl_raw.cast_to_varchar2
      (UTL_ENCODE.base64_encode (buf));
    dbms_lob.writeappend
      (v_out_cl,length(buf_tx),buf_tx);
  END IF;
  return v_out_cl;
END;

function base64DecodeClobAsBlob_plsql(i_data_cl CLOB) 
return blob is
  v_out_bl blob;
  clob_size number;
  pos number;
  charBuff varchar2(32767);
  dBuffer RAW(32767);
  v_readSize_nr number;
  v_line_nr number;
begin
  dbms_lob.createTemporary
    (v_out_bl, true, dbms_lob.call);
  v_line_nr:=greatest(65, 
                 instr(i_data_cl,chr(10)),
                 instr(i_data_cl,chr(13)));
  v_readSize_nr:=
      floor(32767/v_line_nr)*v_line_nr;
  clob_size := dbms_lob.getLength(i_data_cl);
  pos := 1;

  WHILE (pos < clob_size) LOOP
    dbms_lob.read
      (i_data_cl, v_readSize_nr, pos, charBuff);
    dBuffer := UTL_ENCODE.base64_decode
      (utl_raw.cast_to_raw(charBuff));
    dbms_lob.writeAppend
     (v_out_bl,utl_raw.length(dBuffer),dBuffer);
    pos := pos + v_readSize_nr;
  end loop;
  return v_out_bl;
end;
----------------------------------------------------------------

Performance improvement was just enormous! Let's take the process of encoding of 1 MB BLOB:
- Java version - 5 sec
- unoptimized PL/SQL (48 chars per read) - 11 sec

- optimized PL/SQL (23808 char per read) - 0.1 (50x gain and 110x gain!)

Decoding on 1 MB BLOB:
- Java version - 5 sec
- unoptimized PL/SQL (66 chars per read) - 2 sec
- optimized PL/SQL (32736 char per read) - 0.1  sec (50x and 20x gain).

Now let me explain where I found these magic numbers. For encode the logic is as follows:
- Known:
   * the length of the line is 66 or 64 data chars + 2 end-of-line.
   * 64 Base64 encoded data chars represent 64*(3/4)=48 original data bytes.
   * the result cannot exceed 32K for each processing (stay in the scope of VARCHAR2 datatype).
- Highest data read is calculated in the following way: floor(32767/66)*48 = 23808.

For decode:
- Known:
  * The routine may get different kinds of Base64 encoded strings.
  * for now I decided to support 64-char and 76-char formats (plus end-of-line)
  * these specified formats could generate 65,66,77 or 78 char/line
  * length of the line is identified by the highest of first occurances of CHR(10) / CHR(13) 
  * in when the original volume of data was less then 48 - default 65 char/line is established
  * the result cannot exceed 32K for each processing (stay in the scope of VARCHAR2 datatype). 
  * read should always happen in complete lines
- Highest data read is calculated in the following way: floor(32767/<lineSize>)*<lineSize>

P.s I also run a number of test in 11g (11.2.0.1). Unfortunately, I the hardware on that box is completely different from what I have in my 10g instance, so at the current point I will not post a lot of real numbers (it will be like comparing apples to oranges), but the overall findings are the following:
- Java-version - 0.5 sec
- Optimized PL/SQL - 0.1

So, additional good news - 11g version of JVM is significantly faster than 10g, but my PL/SQL code still outruns it.

P.p.s. I am wondering if anybody from Java-side would like to come up with some optimized Java-code to be loaded into the database. It would be interesting to compare!

P.p.p.s. Yes, I know that JVM is not coming with XE-edition. That's one of the reasons I decided to optimize my PL/SQL version (since Dulcian's BRIM products are compatible with all editions)

Thursday, May 5, 2011

Small thoughts about hierarchical queries

I am just wondering, how many times the question of how exactly does Oracle execute CONNECT BY has been raised? I think, all known database experts had at least something on this :-) I didn't expect this thing to be so popular! Although, it could say something about either the quality of existing books/docs or about the fact very few newcomer ever read aforementioned books/docs.

Knowing a number of currently published authors my guess would be the latter has higher probability. H-m-m-m-m (if anybody reads this blog) - here is a survey: if you are planning to use a new feature would you rather read docs first and try later - or you would first try and check the doc only is something doesn't work as expected? :-)

P.s. I should not break the trend.  Here are my 2c on CONNECT BY.

Tuesday, May 3, 2011

Hierarchical queries

For the last couple of days I am working on the new paper of ODTUG KScope'11 on how to appropriately work with hierarchical data. The reason I decided to open this topic was Oracle's introduction in 11g R2 an extended version of CTE (common table expressions) - or to call it simpler a recursive WITH-clause (here is a set of examples: link, another link). For years it was used in a number of RDBMS implementations (SQL Server, PostgreSQL etc), while Oracle people were using Oracle's own CONNECT-BY clause.

I understand, that compatibility with other platforms is a good thing - but I am still struggling to figure out why anybody in Oracle would care. Me and my co-worker Grigoriy Novikov are running a set of tests and will keep you posted whether there are any performance reasons (doesn't look like - yet), but even architecturally I am not 100% sure that I see the "25-words-or-less" case that would prove the existence of this feature.

To be fair, I know that I am biased - if the problem cannot be solved with simple SQL statement, I immediately build function that would return a result collection as an output and do all my processing in PL/SQL. Maybe this approach is limited - but this code I can give to my junior developers to maintain. Unfortunately, proposed solutions involving CTE (as far as I can see) are above the comprehension point of mere mortals... I appreciate the level of the knowledge of people providing such solutions - but I am still looking for a good "PL/SQL for Dummies"-level example.

To be continued!

Tuesday, April 26, 2011

ASSOCIATE STATISTICS

Oracle RDBMS environment is too too big to know everything - as a result from time to time I just stumble at   some functionality that existed for years but I never heard of it! My today's finding is 'ASSOCIATE STATISTICS' command (more details here).

Briefly, it allows to hard-code some statistics (like cost or cardinalities) to different Oracle objects - columns, functions, packages, indexes etc. And this statistics will directly impact CBO's decisions - as any other statistics! Since in Dulcian we use a lot of PL/SQL, this feature could significantly simplify my tuning efforts in cases when we have functions inside of SQL queries.

Minor issues:
1. It looks like statistics cannot be associated with a separate function in the package - only standalone function. Too bad - will have to have logical wrappers!
2. Statistics can be associated with a package. Hmmm... Not sure that I understand that logic (and I was not able to find meaningful examples).

Will keep you posted on further discoveries - but it seems to be a nice add-on to my list of tricks (available in both 10g and 11g!)

Thursday, April 21, 2011

Danger of explicit conversions

At the last conference (Collab'11) I've had a discussion with one of the speakers about the danger of explicit conversions in Oracle (like TO_CHAR(SYSDATE) without a format mask or direct concatenations). Strange enough, from his experience still too many people are trying to save a couple of extra key-strokes - and eventually get themselves into major troubles.

IMHO, in the environment where the process of code review is reasonably common, any code with such conversions should not pass evaluation at all! There is a number of good reasons (and here I could've typed 2-3 pages)... My favorite one is that explicit conversions open an additional hole for SQL injections. Here is a sample:
SQL> alter session set  NLS_Date_Format = '"'''' or 1=1--';
Session altered.
SQL> declare
  2      v_dt date:=sysdate;
  3      v_count_nr number;
  4
  5      v_sql_tx varchar2(4000);
  6  begin
  7      v_sql_tx:='select count(*) from scott.emp where hiredate>'||v_dt;
  8
  9      dbms_output.put_line('Query:'||v_sql_tx);
 10      execute immediate v_sql_tx into v_count_nr;
 11
 12      dbms_output.put_line('Count:'||v_count_nr);
 13  end;
 14  /
Query:select count(*) from scott.emp where hiredate>'' or 1=1--
Count:14
PL/SQL procedure successfully completed.
SQL>

You may notice a very strange date format - and that's exactly how the SQL injection happens! Concatenation of a DATE attribute (line 7) to a string under the hood fires TO_CHAR(V_DT) with the default format mask that was just overridden by ALTER SESSION. As a result, the query shows the total count of rows in the EMP table (14) when it should return 0! 

Good enough reason? I would think so.

Wednesday, April 20, 2011

Oracle Bug

One of my developers just noticed a strange behavior in all versions of Oracle 10g that we could test (10.2.0.3 on Win32, 10.2.0.5 on Win64).


Here is a simplified case:
REPLACE command is fired against a CLOB variable where we are trying to change 'T_' to 'ZZ'. In the CLOB there is a part of text that looks like 'T('.

To my own surprise, Oracle replaces both 'T_' and 'T(' with 'ZZ': it looks like underscore is being interpreted as a wild-card! By the way, this behavior is completely gone in 11gR2.

Code sample:
declare
    v1_cl CLOB := 'T_T(a number)';
begin
    v1_cl := replace (v1_cl, 'T_', 'ZZ');
    dbms_output.put_line(v1_cl);
end;

Received result: ZZZZa number)
Expected result: ZZT(a number)


Update (thanks to Grigoriy Novikov): It is not a feature - it is a real Oracle bug (Metalink Bug ID:4598943), first detected in 9.2.0.6 and fixed in 11g. So, please, be aware of it!

Tuesday, April 19, 2011

New book!

After a couple of years of staying outside of publishing business I decided to make something new "in paper". Although, getting a whole new book was too much for me to survive right now, but Jonathan Gennick from APress came with an idea of "Expert PL/SQL practices" - one chapter per author.

And looks like the chances are very high that around OpenWorld'11 you could have such book printed! It even got a link on APress web-site and a preliminary cover:
http://www.apress.com/databases/oracle/9781430234852




P.s.  Hmm... My first book ("PL/SQL for Dummies", Wiley, 2006) is still getting enough people interested: today in the morning it was in Top-10 Oracle books on Amazon!
http://www.amazon.com/Oracle-SQL-Dummies-Michael-Rosenblum/dp/0764599577/ref=zg_bs_4092_10