Tuesday, September 3, 2013

Unknown DATAPUMP functionality: getting catalog from the file

To be fair, I am not spending all of my days writing the new PL/SQL book :-) I even work sometimes! And while solving real problems I reasonably often find interesting thing.

Today I was asked by one of my colleagues: is it possible to look inside of the existing DATAPUMP file. This file was a FULL import of client's database, and we needed to check for the existence of a very specific public synonym.

The question was perfectly valid and I started the research. First, I found that IMPDP has parameter SQLFILE=<filename>, that generates all DDL from the dump. It could work out, but the perspective of digging through megabytes of textual data was not very promising, so I continued my research.

Eventually I found a comment in of Oracle-related forums mentioning another IMPDP parameter: MASTER_ONLY=YES. We all know that every time you run IMPDP, it creates a special table that contains the catalog of objects to be processed (names like SYS_IMPORT_TABLE_##, SYS_IMPORT_SCHEMA_##, SYS_IMPORT_FULL_##). As it appears to be, this catalog is human-readable (mostly)! The parameter mentioned above just loads this table - and stops at this point:

C:\>impdp system/system@testDB dumpfile=StrangeFile.dmp logfile=imp_master.log master_only=yes full=y

Import: Release 11.2.0.3.0 - Production on Tue Sep 3 13:09:15 2013

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit
Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Master table "SYSTEM"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Job "SYSTEM"."SYS_IMPORT_FULL_01" successfully completed at 13:09:32
C:\>

Now the table is ready to be examined! The following query shows the most important data:

select distinct
  object_schema,
  object_name,
  object_type,
  object_tablespace,
  process_order,
  duplicate,
  processing_status,
  processing_state
from system.SYS_IMPORT_FULL_01
where process_order > 0
and object_name is not null
order by object_schema, object_name

Just to make sure that everybody gets limitations here: this method does not provide you DDLs of objects. It allows to find what objects exist in the dump file. So, there will be no guessing anymore, which is always a good thing!

By the way, getting DDLs not is also trivial now:
* take required object names and put it into INCLUDE clause
* add SQLFILE clause to generate DDLs instead of firing them