Monday, May 20, 2024

The Power of Proactive Monitoring. Part 3

After talking for a while about some abstract ideas - let's get a bit more technical and a bit closer to Oracle. Before we really talk about the "proactive" part - we should be a bit more realistic. In about 90% of cases the need for monitoring is only being recognized after the fact that something is wrong. As a result the problem of finding "who"who's-done-it" is complicated by not having that much forensic data to work with.

Luckily, in Oracle technology space some of the things are being recorded by Oracle background processes no matter what, and ASH / AWR could help at least to narrow down the area to dig (sometimes). But let's for now not get into the DBA weeds and stay in the developer's realm.

Assume, you can replicate the problem (that's often a big "if") and assume that you can have access to the top level of the spaghetti code you inherited from 30+ years of development. From my experience, the entry point is usually detectable - but nobody living knows what exactly happens inside of the "black box".

Luckily #2, Oracle engineers left us instrumentation to access their internal mechanisms - in my case, calls to SQL and PL/SQL as one program units call the other. That access in implemented via one of the most powerful (and, unfortunately, underutilized) features - Oracle Hierarchical Profiler (HProf). That feature is available since Oracle 11 R1 and supersedes Oracle Profiler (notice the omission of the word "hierarchical"!) . In the las couple of years I made a number of presentations about that product, but I feel that sometimes articles are a bit more helpful (yes, I still can't say the podcast - sorry, I just cant adjust to the video style). This and the following post(s?) will cover some basic ideas why YOU should care about HProf.

Let's first define a use-case: the database is doing "something" and that "something" takes too much time. And key caveat: somebody indeed has proven that it is the database that is slow - not data transmission, not the middle tier, not the client. Remember the classical image that my esteemed guru Dr. Paul Dorsey came up years ago?


Users are always complaining (that's their nature) - and it is your responsibility as a system architect to understand that there are soooo many places where you can lose some critical resources, including time (well, and money!) But only when you think in terms of a logical action (rather than technical call) you can really tune your system:
  • it doesn't matter how fast is your database if there are tens of thousands of calls from the middle-tier to accomplish a single activity - the time will be spent closing/opening sessions. And only if you can hierarchically step up to the point where the activity is initiated you can see the "big picture"
  • it doesn't matter how slow is your database if the time is spent on the client side because of various JavaScript performance issues

But let's assume that you found out that you need to dive into the internals of the database. The problem that is that the database calls are rarely something simple - usually you have a combination of PL/SQL units-calling SQL-calling-PL/SQL units-calling other PL/SQL units-calling SQL etc. I.e. you have a hierarchy of database calls. And that's where HProf comes to rescue. 

I will start with the simplest module and simplest logging implementation (writing logs to the file system rather than storing it in the database):

SQL> CREATE DIRECTORY IO AS 'C:\IO';
SQL> exec dbms_hprof.start_profiling 
                   (location=>'IO',filename=>'HProf.txt');

SQL> DECLARE
2      PROCEDURE p_doSomething (pi_empno NUMBER) IS
3      BEGIN
4          dbms_lock.sleep(0.1);
5      END;
6      PROCEDURE p_main IS
7      BEGIN
8          dbms_lock.sleep(0.5);
9          FOR c IN (SELECT * FROM emp) LOOP
10             p_doSomething(c.empno);
11         END LOOP;
12     END;
13 BEGIN
14     p_main();
15 END;
16 /

SQL> exec dbms_hprof.stop_profiling;

As you can see, nothing fancy: (1) anonymous block calls a procedure (2) procedure wastes half-second and walks a loop based on EMP table (3) for every row in the EMP table another procedure is called that wastes 0.1 sec. All of the processing is started/stopped via DBMS_HPROF calls, and results are written into the file, as I said earlier.

If you want to read the raw file - it is pretty simple, but still doesn't provide any insights:

P#V PLSHPROF Internal Version 1.0
P#! PL/SQL Timer Started
P#C PLSQL."".""."__plsql_vm"
P#X 8
P#C PLSQL."".""."__anonymous_block"
P#X 6
P#C PLSQL."".""."__anonymous_block.P_MAIN"#980980e97e42f8ec #6
P#X 63
P#C PLSQL."SYS"."DBMS_LOCK"::9."__pkg_init"
P#X 7
P#R
P#X 119
P#C PLSQL."SYS"."DBMS_LOCK"::11."SLEEP"#e17d780a3c3eae3d #197
P#X 500373
P#R
P#X 586
P#C SQL."".""."__sql_fetch_line9" #9."4ay6mhcbhvbf2"
P#! SELECT * FROM SCOTT.EMP
P#X 3791
P#R
P#X 17
<<… and so on …>>

P#C is a call, P#R - return from the subprogram, P#X - elapsed time etc. But you need an aggregator - PLSHPROF routine:

C:\Utl_File\IO>plshprof -output hprof_intro HProf.txt
PLSHPROF: Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
[8 symbols processed]
[Report written to 'hprof_intro.html']

Now the raw data finally makes sense (the resulting report has a lot of sorting options, I picked up the most explicit one):


All of the steps exactly as I described above - including SQL ID of executed query. So, if needed you have a way to link that report with V$SQLAREA and/or DBMS_XPLAN analysis.

To be continued...







No comments: