- with some manual tweaking of parameters it is possible to get really good performance out of the UTL_ENCODE package in both 10g and 11g. In my case - up to a 50x performance gain!
And now let's get to the details. In the environment I am working in right now, it just historically happened that we had two different Base64 encoding/decoding procedures (you can google for either of them - there will be tons of hits on OTN):
- Java-based (classes Base64 and UTLBase64)
* encodes into 76-char-per-line Base64 with CHR(10) at the end of each line ~ 77 chars/line
- PL/SQL-based (UTL_ENCODE.base64_encode inside).
* encodes into 64-char-per-line Base64 with CHR(10)||CHR(13) at the end of each line ~ 66 chars/line (see the quick check right below)
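By the way, you can see the PL/SQL format for yourself with a throwaway query (the 96-byte input is just an arbitrary example I picked because it encodes into exactly two full lines):
---------------------------------------------------------------------
-- 96 input bytes encode into 128 Base64 chars, i.e. two full 64-char
-- lines, each followed by a 2-char end-of-line pair -> 132 chars total
SELECT utl_raw.cast_to_varchar2 (
         UTL_ENCODE.base64_encode (
           utl_raw.cast_to_raw (RPAD ('A', 96, 'A')))) encoded
FROM dual;
---------------------------------------------------------------------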
Recently we started to use encoded data much more for a number of different reasons, and my developers started to complain about the performance of the Java-based module, especially for large data volumes (just FYI - my development server is a Dell 6850 with four 2-core CPUs, MS Windows 2008, Oracle 10.2.0.5 64-bit).
Eventually I decided to take a step back and, since my area of knowledge is mostly PL/SQL, to take a look at whether I could make the PL/SQL version run faster. Of course, the first suspect in any string processing was the size of the buffer - it just seemed strange to me that people (in the majority of examples I found online) use a single-line buffer size for both operations. It definitely didn't seem optimized. Also, a lot of examples use concatenation to build the output CLOB, which is another major slow-down. For contrast, the pattern I kept running into looks roughly like this (my reconstruction of the typical online example, not any one author's exact code):
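---------------------------------------------------------------------
-- Typical unoptimized approach: read one Base64 line worth of data
-- (48 bytes) per iteration and grow the output CLOB by concatenation.
function base64EncodeBlob_slow (i_bl BLOB)
return clob is
    v_out_cl CLOB;
    buf      RAW (48);
    amt      BINARY_INTEGER := 48;
    pos      PLS_INTEGER := 1;
    file_len PLS_INTEGER;
begin
    file_len := dbms_lob.getlength (i_bl);
    while pos <= file_len loop
        dbms_lob.read (i_bl, amt, pos, buf);
        -- concatenation creates lots of temporary LOB overhead
        v_out_cl := v_out_cl ||
            utl_raw.cast_to_varchar2 (UTL_ENCODE.base64_encode (buf));
        pos := pos + amt;
    end loop;
    return v_out_cl;
end;
---------------------------------------------------------------------
And here are my final versions of encode/decode for BLOB: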
---------------------------------------------------------------------
function base64EncodeBlob_plsql (i_bl BLOB)
return clob is
    v_out_cl CLOB;
    file_len PLS_INTEGER;
    modulo   PLS_INTEGER;
    pieces   PLS_INTEGER;
    -- 23808 = floor(32767/66)*48: the largest read whose encoded
    -- output still fits into a 32K VARCHAR2 (explained below)
    amt      BINARY_INTEGER := 23808;
    buf      RAW (32767);
    buf_tx   VARCHAR2 (32767);
    filepos  PLS_INTEGER := 1;
    counter  PLS_INTEGER := 1;
BEGIN
    DBMS_LOB.createtemporary (v_out_cl, TRUE, dbms_lob.call);
    file_len := DBMS_LOB.getlength (i_bl);
    modulo := MOD (file_len, amt);
    pieces := TRUNC (file_len / amt);
    -- encode all full-sized chunks
    WHILE counter <= pieces LOOP
        DBMS_LOB.READ (i_bl, amt, filepos, buf);
        buf_tx := utl_raw.cast_to_varchar2 (UTL_ENCODE.base64_encode (buf));
        dbms_lob.writeappend (v_out_cl, length (buf_tx), buf_tx);
        filepos := counter * amt + 1;
        counter := counter + 1;
    END LOOP;
    -- encode the leftover tail (if any)
    IF modulo <> 0 THEN
        DBMS_LOB.READ (i_bl, modulo, filepos, buf);
        buf_tx := utl_raw.cast_to_varchar2 (UTL_ENCODE.base64_encode (buf));
        dbms_lob.writeappend (v_out_cl, length (buf_tx), buf_tx);
    END IF;
    RETURN v_out_cl;
END;
function base64DecodeClobAsBlob_plsql (i_data_cl CLOB)
return blob is
    v_out_bl      BLOB;
    clob_size     NUMBER;
    pos           NUMBER;
    charBuff      VARCHAR2 (32767);
    dBuffer       RAW (32767);
    v_readSize_nr NUMBER;
    v_line_nr     NUMBER;
begin
    dbms_lob.createtemporary (v_out_bl, TRUE, dbms_lob.call);
    -- detect the line size from the first end-of-line markers;
    -- default to 65 when the data is shorter than one full line
    v_line_nr := greatest (65,
                           instr (i_data_cl, chr (10)),
                           instr (i_data_cl, chr (13)));
    -- read the largest number of complete lines that fits into 32K
    v_readSize_nr := floor (32767 / v_line_nr) * v_line_nr;
    clob_size := dbms_lob.getlength (i_data_cl);
    pos := 1;
    WHILE pos < clob_size LOOP
        dbms_lob.read (i_data_cl, v_readSize_nr, pos, charBuff);
        dBuffer := UTL_ENCODE.base64_decode (utl_raw.cast_to_raw (charBuff));
        dbms_lob.writeappend (v_out_bl, utl_raw.length (dBuffer), dBuffer);
        pos := pos + v_readSize_nr;
    END LOOP;
    return v_out_bl;
end;
----------------------------------------------------------------
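For completeness, here is the kind of ad-hoc round-trip test I use (an anonymous block; it assumes the two functions above are compiled in the current schema, e.g. as standalone functions):
---------------------------------------------------------------------
DECLARE
    v_in  BLOB;
    v_enc CLOB;
    v_out BLOB;
    v_raw RAW (1000);
BEGIN
    -- build a small test BLOB
    dbms_lob.createtemporary (v_in, TRUE, dbms_lob.call);
    v_raw := utl_raw.cast_to_raw (RPAD ('test data ', 1000, 'x'));
    dbms_lob.writeappend (v_in, utl_raw.length (v_raw), v_raw);
    -- encode and decode it back
    v_enc := base64EncodeBlob_plsql (v_in);
    v_out := base64DecodeClobAsBlob_plsql (v_enc);
    -- dbms_lob.compare returns 0 when the two LOBs match
    dbms_output.put_line ('compare = ' || dbms_lob.compare (v_in, v_out));
END;
/
---------------------------------------------------------------------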
Performance improvement was just enormous! Let's take the process of encoding a 1 MB BLOB:
- Java version - 5 sec
- unoptimized PL/SQL (48 bytes per read) - 11 sec
- optimized PL/SQL (23808 bytes per read) - 0.1 sec (50x gain over Java, 110x over unoptimized PL/SQL!)
Decoding of the same 1 MB of data:
- Java version - 5 sec
- unoptimized PL/SQL (66 chars per read) - 2 sec
- optimized PL/SQL (32736 chars per read) - 0.1 sec (50x gain over Java, 20x over unoptimized PL/SQL).
Now let me explain where I found these magic numbers. For encode the logic is as follows:
- Known:
* the length of an output line is 66 chars: 64 data chars + 2 end-of-line chars.
* 64 Base64-encoded data chars represent 64*(3/4)=48 original data bytes.
* the result cannot exceed 32K for each processing step (to stay within the VARCHAR2 datatype).
- Highest data read is calculated in the following way: floor(32767/66)*48 = 23808.
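A quick sanity check of that arithmetic (just a throwaway query):
---------------------------------------------------------------------
-- 496 full output lines fit into 32767 chars; 496*48 = 23808 input
-- bytes, and the encoded result is 496*66 = 32736 chars < 32767
SELECT FLOOR (32767 / 66)      full_lines,
       FLOOR (32767 / 66) * 48 max_read_bytes,
       FLOOR (32767 / 66) * 66 encoded_chars
FROM dual;
---------------------------------------------------------------------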
For decode:
- Known:
* the routine may get different kinds of Base64-encoded strings.
* for now I decided to support the 64-char and 76-char formats (plus end-of-line)
* these formats can generate 65, 66, 77, or 78 chars/line
* the length of the line is identified by the greater of the first occurrences of CHR(10) and CHR(13)
* when the original volume of data was less than 48 bytes, the default of 65 chars/line is used
* the result cannot exceed 32K for each processing step (to stay within the VARCHAR2 datatype).
* reads should always happen in complete lines
- Highest data read is calculated in the following way: floor(32767/<lineSize>)*<lineSize>
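The same kind of throwaway query shows the resulting read sizes for all four supported line lengths (note that 66 gives the 32736 used in the timings above):
---------------------------------------------------------------------
-- largest multiple of each line size that still fits into 32767
SELECT column_value                                line_size,
       FLOOR (32767 / column_value) * column_value read_size
FROM TABLE (sys.odcinumberlist (65, 66, 77, 78));
-- 65 -> 32760, 66 -> 32736, 77 -> 32725, 78 -> 32760
---------------------------------------------------------------------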
P.S. I also ran a number of tests in 11g (11.2.0.1). Unfortunately, the hardware on that box is completely different from what I have on my 10g instance, so at the current point I will not post a lot of real numbers (it would be like comparing apples to oranges), but the overall findings are the following:
- Java version - 0.5 sec
- optimized PL/SQL - 0.1 sec
So, additional good news - the 11g version of the JVM is significantly faster than the 10g one, but my PL/SQL code still outruns it.
P.P.S. I am wondering whether anybody from the Java side would like to come up with some optimized Java code to be loaded into the database. It would be interesting to compare!
P.P.P.S. Yes, I know that the JVM does not come with the XE edition. That's one of the reasons I decided to optimize my PL/SQL version (since Dulcian's BRIM products are compatible with all editions).