Monday, October 24, 2011

UTF8 vs AL32UTF8

To be fair, I am one of reasonably few people who like Oracle documentation - you indeed can find there everything needed. But... Sometimes it will be either in the unexpected place, or not explicit enough, or not very clear at all. Here is an illustration of such case. 

The beginning of this story is pretty simple - Friday, around 4 p.m. People are already day-dreaming about their weekend plans. And suddenly our main development database starts to behave really funky. Our senior web-architect eventually called me to show that in one of the newly created modules the whole session just crushes with ORA-600 after the call to XML Parser (we've been building XML document manually as a CLOB and than converting it into XMLDocument). Alternative via XMLType.CreateXML completely froze the whole database... 

After about an hour of pretty painful debugging the problem was found - and not where we've been expecting. The exact match was found in Oracle's Database Companion CD installation Guide:
-----------------------------------------------
AL32UTF8 is the Oracle Database character set that is appropriate for XMLType data. It is equivalent to the IANA registered standard UTF-8 encoding, which supports all valid XML characters.

Do not confuse the Oracle Database database character set UTF8 (no hyphen) with the database character set AL32UTF8 or with character encoding UTF-8. Database character set UTF8 has been superseded by AL32UTF8. Do not use UTF8 for XML data. UTF8 supports only Unicode version 3.1 and earlier; it does not support all valid XML characters. AL32UTF8 has no such limitation.

Using database character set UTF8 for XML data could potentially cause a fatal error or affect security negatively.
-----------------------------------------------

"Fatal errors" indeed! We immediately switched from:
 <?xml version="1.0" encoding="UTF-8"?>
to
 <?xml version="1.0" encoding="AL32UTF8"?>

And our problem was solved! Friday, 5 pm... Perfect timing!

No comments: