I wonder...: Base64 Encoding in PL/SQL

Just before the holiday weekend it is always nice to tell a good story to cheer everybody up. Here is mine! You know, I've had a couple of good news for you :-) And no, they are not about car insurance - they are about Base64 encoding in PL/SQL:

- with some manual tweaking or parameters it is possible to get a really good performance out of UTL_ENCODE package in both 10g and 11g. In my case - up to 50x performance gain!

And now let's get to details. In the environment I am working right now it just historically happened that we had two different Base64 encoding/decoding procedures (you can google for either of them - there will be tons of hits on OTN):
- Java-based (classes Base64 and UTLBase64)
* encodes into 76-char per line Base64 encoding with CHR(10) at the end of the line ~ 77 char/line
- PL/SQL-based (UTL_ENCODE.base64_encode inside).
* encodes to 64-char per line Base64 encoding with CHR(10)||CHR(13) ~ 66 char/line

Recently we started to use encoded data much more for a number of different reasons - and my developers started to complain about the performance, especially for large data volumes (just FYI - my development server is Dell 6850 4 2-core CPUs, MS Windows 2008, Oracle 10.2.0.5 64 bit) processed by Java-based module.

Eventually I decided to take a step back, and since my area of knowledge is mostly PL/SQL to take a look whether I can make PL/SQL version run faster. Of course, the first suspect in any string processing was the size of buffer - it just seemed strange to me that people (in majority of examples I found online) use single-line size to both operations. It definitely didn't seem optimized. Also a lot of examples use concatenation to generate an output CLOB, which is also a major slow-down. Here are my final version of encode/decode for BLOB:

---------------------------------------------------------------------
function base64EncodeBlob_plsql (i_bl BLOB)
return clob is
v_out_cl CLOB;
file_len PLS_INTEGER;
modulo PLS_INTEGER;
pieces PLS_INTEGER;
amt BINARY_INTEGER := 23808;
buf RAW (32767);
buf_tx varchar2(32767);
pos PLS_INTEGER := 1;
filepos PLS_INTEGER := 1;
counter PLS_INTEGER := 1;
BEGIN
DBMS_LOB.createtemporary
(v_out_cl, TRUE, dbms_lob.call);
file_len := DBMS_LOB.getlength (i_bl);
modulo := MOD (file_len, amt);
pieces := TRUNC (file_len / amt);

WHILE (counter <= pieces) LOOP
DBMS_LOB.READ (i_bl, amt, filepos, buf);
buf_tx:=utl_raw.cast_to_varchar2
(UTL_ENCODE.base64_encode (buf));
dbms_lob.writeappend
(v_out_cl,length(buf_tx),buf_tx);
filepos := counter * amt + 1;
counter := counter + 1;
END LOOP;

IF (modulo <> 0) THEN
DBMS_LOB.READ
(i_bl, modulo, filepos, buf);
buf_tx:=utl_raw.cast_to_varchar2
(UTL_ENCODE.base64_encode (buf));
dbms_lob.writeappend
(v_out_cl,length(buf_tx),buf_tx);
END IF;
return v_out_cl;
END;

function base64DecodeClobAsBlob_plsql(i_data_cl CLOB)
return blob is

v_out_bl blob;

clob_size number;

pos number;

charBuff varchar2(32767);

dBuffer RAW(32767);

v_readSize_nr number;

v_line_nr number;

begin

dbms_lob.createTemporary
(v_out_bl, true, dbms_lob.call);

v_line_nr:=greatest(65,

instr(i_data_cl,chr(10)),

instr(i_data_cl,chr(13)));

v_readSize_nr:=
floor(32767/v_line_nr)*v_line_nr;

clob_size := dbms_lob.getLength(i_data_cl);

pos := 1;

WHILE (pos < clob_size) LOOP

dbms_lob.read
(i_data_cl, v_readSize_nr, pos, charBuff);

dBuffer := UTL_ENCODE.base64_decode

(utl_raw.cast_to_raw(charBuff));

dbms_lob.writeAppend

(v_out_bl,utl_raw.length(dBuffer),dBuffer);

pos := pos + v_readSize_nr;

end loop;

return v_out_bl;

end;

----------------------------------------------------------------

Performance improvement was just enormous! Let's take the process of encoding of 1 MB BLOB:

- Java version - 5 sec

- unoptimized PL/SQL (48 chars per read) - 11 sec

- optimized PL/SQL (23808 char per read) - 0.1 (50x gain and 110x gain!)

Decoding on 1 MB BLOB:

- Java version - 5 sec

- unoptimized PL/SQL (66 chars per read) - 2 sec

- optimized PL/SQL (32736 char per read) - 0.1 sec (50x and 20x gain).

Now let me explain where I found these magic numbers. For encode the logic is as follows:

- Known:

* the length of the line is 66 or 64 data chars + 2 end-of-line.
* 64 Base64 encoded data chars represent 64*(3/4)=48 original data bytes.

* the result cannot exceed 32K for each processing (stay in the scope of VARCHAR2 datatype).
- Highest data read is calculated in the following way: floor(32767/66)*48 = 23808.

For decode:

- Known:

* The routine may get different kinds of Base64 encoded strings.

* for now I decided to support 64-char and 76-char formats (plus end-of-line)

* these specified formats could generate 65,66,77 or 78 char/line

* length of the line is identified by the highest of first occurances of CHR(10) / CHR(13)

* in when the original volume of data was less then 48 - default 65 char/line is established

* the result cannot exceed 32K for each processing (stay in the scope of VARCHAR2 datatype).

* read should always happen in complete lines

- Highest data read is calculated in the following way: floor(32767/<lineSize>)*<lineSize>

P.s I also run a number of test in 11g (11.2.0.1). Unfortunately, I the hardware on that box is completely different from what I have in my 10g instance, so at the current point I will not post a lot of real numbers (it will be like comparing apples to oranges), but the overall findings are the following:
- Java-version - 0.5 sec

- Optimized PL/SQL - 0.1

So, additional good news - 11g version of JVM is significantly faster than 10g, but my PL/SQL code still outruns it.

P.p.s. I am wondering if anybody from Java-side would like to come up with some optimized Java-code to be loaded into the database. It would be interesting to compare!

P.p.p.s. Yes, I know that JVM is not coming with XE-edition. That's one of the reasons I decided to optimize my PL/SQL version (since Dulcian's BRIM products are compatible with all editions)

5 comments:

Nashev said...: Great, thanks!; June 17, 2013 at 8:11 AM
Unknown said...: THANKS A LOT - that's the first solution I found which deals correct with clob larger than 32k; February 15, 2014 at 2:14 PM
Unknown said...: Great job!!! Thanks a lot; January 31, 2015 at 4:37 AM
Rooshen said...: This comment has been removed by the author.; November 16, 2017 at 5:18 PM
Rooshen said...: Thank you very much Michael :).. i was struggling with this, your values really seem to help, and the performance improvement is crazy :D

Thank you again :D; November 16, 2017 at 5:19 PM

Friday, May 27, 2011

Base64 Encoding in PL/SQL

5 comments: