Monday, March 25, 2013

Explaining the miracle

Last week I've got a pretty interesting case of why I so much love doing performance tuning: you take the task, spend 15-20 minutes, "do something"(c), present results to the waiting audience - and proudly walk back to your desk while the whole room is staring at your back in disbelief! :-)

Funny part of the whole process - the fix was so obvious that the biggest wonder was: why it wasn't done before... But to be fair, this performance problem was so typical, that I decided to write a special blog post about it from the technical side (since Dr. Paul Dorsey already covered its architectural aspect).

The whole story started in the days of Oracle 9i and JDeveloper 3.x when we've been using BC4J components (now ADF/BC). Since we had a lot of logical transformations between the repository in the database and its visual representation, we introduced a layer of updatable views (with INSTEAD-OF triggers).

BC4J had a very explicit requirement - to be able to do DML against a specified data source, it must have a primary key. Since at any point of time we were working with a limited set of data, I decided that it is safe to introduce composite columns in views serving as primary keys (yes, I tested performance impact - it was negligible). For example, I have a view that displays attributes of the table EMP located in the diagram TEST, but the same table is also located in the diagram DEV. It means that to uniquely identify an attribute I need to concatenate: DiagramID||'_'||AttributeId - only this concatenation is a real primary key.

Years passed, a lot of other people were working with the project, a lot of new functionality added on the top. Unfortunately, at some point developers started to use those original database views as "black boxes" (yes, it is true that they were very complex inside), without understanding (a) why such views were build (b) what was going on inside. As a result they started to join those synthetic views, sometimes even by composite columns.

I don't want to sound too "old-style", but this "black box" approach becomes more and more popular in the current OO-driven IT world: everything is considered a service with some number of ins and outs. But if you have a lot of closely related services - I can give you about 100% warranty that they have internal overlaps and as a result you pay heavy performance price for the comfort of using them. It is true that in some cases this price is acceptable - but in a lot of cases it isn't! And even more importantly - you cannot use "black box" approach if what you applying it to is NOT A BLACK BOX!

Let me explain the very last sentence: unless you very explicitly say so with hint Oracle CBO processes your request as a whole, unnesting all views into a single query. Maybe each of those views is tuned to properly work standalone - but it is very hard to guess how CBO would decide to process them together. And let's not forget about one more thing - because views were designed to work on a limited data set, they contain significant number of PL/SQL functions in the SELECT clause. But we all know that if Oracle rewrite the query, total number of function calls can significantly change.

Overall, not a big surprise: wrong execution plan + thousands of function calls = a lot of time wasted:

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute    142      1.96       1.99          0          0
Fetch      154     42.04      42.14          0     174410
------- ------  -------- ---------- ---------- ----------
total      297     44.00      44.14          0     174410

My solution was very strait-forward: break "black boxes" and write a query that would return exactly what's needed here and now:

call     count       cpu    elapsed       disk      query
------- ------  -------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0
Execute    142      0.01       0.03          0          0
Fetch      154      0.20       0.18          1       3266
------- ------  -------- ---------- ---------- ----------
total      297      0.21       0.22          1       3266

Yeah, from 44 seconds to 0.22... I like 200 times improvement!

Summary:
  • action: took two crazy views and rewrote it into a simple query
  • result: got 200 times improvement and fixed key performance bottleneck
  • key thought: views cannot be considered "black boxes" under any circumstances - Oracle just does not work this way!
  •  

No comments: