Ilmar Kerm

Oracle, databases, Linux and maybe more

I just published for free the work I’ve done for the past 1,5 years 🙂

We used to have a few problems with backups:
* Weekly full backup took 2+ days to run on the larger databases and it always took a lot of resources to run.
* To test the backups we needed to allocate the same amount of storage space and it took too long time to restore the database from backup.
* To make matters worse, backup software provider charged by GB just to run a fancy GUI over RMAN.

But we can do better 🙂 And a script suite to manage RMAN image copy backups and to test them automatically was born.

Here is a separate page for the scripts

Code itself is published in GitHub

We’ve run this system excludively in production for over a year now and it has been working great.

I’ll be presenting “Using image copies for Oracle database backups” at the following upcoming conferences:

DOAG2016

Lately I’ve been upgrading our 11g Standard Edition databases in test environments to 12c Enterprise Edition and also plugging them in to a multitenant container database.
It’s a new technology for Oracle, but I was still quite surprised about the number of issues I faced when trying to plug in an existing non-CDB database. After resolving all these issues it has been quite painless process since then.

In short, upgrading 11g database to 12c pluggable database involves the following steps:
* Upgrade 11g database to 12c using the normal database upgrade procedures. This step will result in 12c non-CDB database.
* In the target CDB database plug in the new upgraded database as a new pluggable database.
* Run $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql in the new PDB. This step will convert the non-CDB data dictionary to a PDB data dictionary. After this step you can open the newly added PDB.

This post is mostly about the issues I encountered when running the last described step – executing $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql. Hopefully it’ll be helpful if you hit similar problems.

Version: 12.1.0.2, 2-node RAC
Patches: April 2016 PSU + OJVM PSU
Platform: Oracle Linux 6 x86-64

noncdb_to_pdb.sql takes a really long time to execute

This was the first problem I encountered. After 1,5 hours I killed my session. That was really weird, because executing it should onbly take about 20 minutes according to Oracle documentation. The step script was stuck on was:

-- mark objects in our PDB as common if they exist as common in ROOT

Looking at the wait events the session was not waiting for a blocker, it was actively executing many parallel sessions.
I found the following blog post that described the same problem and the described solution also helped for me: Link to Bertrand Drouvot blog

But one addition, instead of modifying the noncdb_to_pdb.sql script, I executed ALTER SESSION before running noncdb_to_pdb.sql.

SQL> alter session set container=newpdb;
SQL> alter session set optimizer_adaptive_features=false;
SQL> $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql

noncdb_to_pdb.sql hangs at alter pluggable database close

Next issue I faced – noncdb_to_pdb.sql just hanged mid-execution and the statement it was executing was

SQL> alter pluggable database "&pdbname" close;

The session was waiting for opishd.

Solution: Apply bugfix for Bug 20172151 – NONCDB_TO_PDB.SQL SCRIPT HANGS DURING UPGRADE. This will just update noncdb_to_pdb.sql script itself to execute alter pluggable database “&pdbname” close IMMEDIATE instances = all; instead of normal close.

noncdb_to_pdb.sql fails with ORA-600 [kspgsp2]

That was a fun one 🙂 Not every time, but most executions noncdb_to_pdb.sql failed almost at the end with the following message:

SQL> alter session set "_ORACLE_SCRIPT"=true;

Session altered.

SQL>
SQL> drop view sys.cdb$tables&pdbid;
drop view sys.cdb$tables5
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kspgsp2], [0xBF3C9E3F8], [2174],
[recyclebin], [], [], [], [], [], [], [], []

Every time at the same drop view statement. Search in Oracle support did not give me anything helpful, there were many ORA-600 [kspgsp2] issues, but nothing matched my case. Finally I noticed that one argumnt was [recyclebin] and decided to try turning the recyclebin off for the session. It helped.

Successful non-CDB to PDB conversion

Getting successful execution of noncdb_to_pdb.sql required me to:
* Apply patch 20172151
* Running noncdb_to_pdb.sql using the following sequence of commands:

SQL> alter session set container=newpdb;
SQL> alter session set optimizer_adaptive_features=false;
SQL> alter session set recyclebin=off;
SQL> @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql

Take care of the services!

This problem may be our environment specific, but I’ll describe it anyway.
We use services a lot, all applications that connect to the database get their own dedicated service. So the applications connect using JDBC connection string that looks something like this:

jdbc:oracle:thin:@//clusername.example.com/application.scrum.example.com

Where application is configured as a service using srvctl and scrum.example.com is database domain name depending on the environment. The same application in QA environment will have connection string:

jdbc:oracle:thin:@//clusername.example.com/application.qa.example.com

We decided to use only one CDB for all environments, but db_domain parameter cannot be different for each PDB. In order to not change the application connection strings I had to create the new services in srvctl using the FULL service name, then Oracle will not append the database domain name to the service name:

srvctl add service -database cdb -preferred cdb1,cdb2 -pdb newpdb -service application.scrum.example.com
srvctl start service -database cdb -service application.scrum.example.com

After adding the services for the first database all of them started just fine and applications connected just fine, but when starting the services for the second environment (qa) I got the following error:

srvctl add service -database cdb -preferred cdb1,cdb2 -pdb newpdbqa -service application.qa.example.com
srvctl start service -database cdb -service application.qa.example.com

...
ORA-44311: service application not running
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.DBMS_SERVICE_ERR", line 40
ORA-06512: at "SYS.DBMS_SERVICE", line 421
ORA-06512: at line 1
...

But when I tried to add a new service that did not exist previously it started just fine. I started digging into services on CDB level and found that all imported PDB-s also imprted their old short name services to CDB:

SQL> alter session set container=cdb$root;

Session altered.

SQL> select name from cdb_services order by 1;

NAME
------------------------------------------------
SYS$BACKGROUND
SYS$USERS
...
application
application
application.scrum.example.com
application2
application2
application2.scrum.example.com
...

I just assumed it can be confusing for CDB if different PDB-s have conflicting services running and I manually went into each PDB and removed the old short service names.

SQL> alter session set container=newpdb;
SQL> exec dbms_service.delete_service('application');
SQL> exec dbms_service.delete_service('application2');
SQL> alter session set container=newpdbqa;
SQL> exec dbms_service.delete_service('application');
SQL> exec dbms_service.delete_service('application2');

After that new services started just fine.

SQL> alter session set container=cdb$root;
SQL> select name from cdb_services order by 1;

NAME
------------------------------------------------
SYS$BACKGROUND
SYS$USERS
...
application.scrum.example.com
application2.scrum.example.com
application.qa.example.com
application2.qa.example.com
...

I hit this issue by accident, developers wanted to disable inserts to child table so they could perform some one-time maintenance operation, and this maintenance only affected one rown from the parent table (and all it’s children). I started wondering if there is any smaller level impact solution possible than taking a shared read lock on child table.

Database version: 12.1.0.2, but the same behaviour was also present in 11.2.0.4.

Very simple test schema set up:

create table p (
  id number(10) primary key,
  v varchar2(10) not null
) organization index;

create table c (
  id number(10) primary key,
  p_id number(10) not null references p(id),
  v varchar2(10) not null
);

insert into p values (1, '1');
insert into p values (2, '2');
insert into c values (1, 1, '1');
insert into c values (2, 1, '2');
insert into c values (3, 2, '3');

create index cpid on c (p_id);

Note, the foreign key is indexed.

I just had a thought that what will happen if I lock the parent table row first using SELECT FOR UPDATE in order to take the lock as low level as possible. What would happen then with inserts to the child table? Database needs to somehow protect that the parent row does not change/disappear while the child is being inserted.

SQL> SELECT * FROM p WHERE id=1 FOR UPDATE;

        ID V        
---------- ----------
         1 1         

SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID                                                                     
--------------------
268                                                                             

Now row id=1 is locked in the parent table p by session 268.
Could another session insert into table c when the the parent table row is locked?

SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID                                                                     
---------------------
255                                                                             

SQL> INSERT INTO c (id, p_id, v) VALUES (12, 2, 'not locked');

1 row inserted.

SQL> INSERT INTO c (id, p_id, v) VALUES (11, 1, 'locked');

So I could insert into the child table c a new row where p_id=2 (p.id=2 was not locked), but the second insert where p_id=1 (p.id=1 was locked by session 268 earlier) just hangs. Lets look why session 255 is hanging:

SQL> select status, event, state, blocking_session from v$session where sid=255;

STATUS   EVENT                                                        STATE                                 BLOCKING_SESSION
-------- ------------------------------------------------------------ ------------------- ----------------------------------
ACTIVE   enq: TX - row lock contention                                WAITING                                            268

Session doing the insert is blocked by the session who is holding a lock on the parent table row that the insert is referring to.
Lets look at the locks both sessions are holding/requesting:

SQL> select session_id, lock_type, mode_held, mode_requested, blocking_others, trunc(lock_id1/power(2,16)) rbs, bitand(lock_id1, to_number('ffff','xxxx'))+0 slot, lock_id2 seq
from dba_locks where session_id in (255,268) and lock_type != 'AE'
order by rbs, slot,seq

SESSION_ID LOCK_TYPE       MODE_HELD       MODE_REQUESTED  BLOCKING_OTHERS        RBS       SLOT SEQ    
---------- --------------- --------------- --------------- --------------- ---------- ---------- --------
       255 DML             Row-X (SX)      None            Not Blocking             1      34910 0       
       268 DML             Row-X (SX)      None            Not Blocking             1      34910 0       
       255 DML             Row-X (SX)      None            Not Blocking             1      34912 0       
       255 Transaction     Exclusive       None            Not Blocking             2         23 3016    
       255 Transaction     None            Share           Not Blocking             9          3 2903    
       268 Transaction     Exclusive       None            Blocking                 9          3 2903    

 6 rows selected 

SQL> select o.object_type, o.object_name from v$locked_object lo join dba_objects o on o.object_id = lo.object_id
where lo.xidusn=9 and lo.xidslot=3 and lo.xidsqn=2903;

OBJECT_TYPE     OBJECT_NAME   
--------------- ---------------
TABLE           P              

Here we see that session 268 is holding transaction (TX, row level) lock in Exclusive mode on table P and it is blocking session 255 that is requesting lock on the same row in Share mode.
Here I have to conclude, that when inserting a row to child table, Oracle also tries to get a Shared row lock on the parent table. Looks perfect for my use case and I announced victory in our internal DBA mailing list. But a few minutes later a colleque emailed me back, that it does not work. He recreated the setup (his own way, not using my scripts) and after locking the parent table row he was able to insert to the child table just fine.
It took some time to work out the differences in our setups and in the end I reduced the difference to a simple fact that I create index-organized tables by default, and he creates heap tables by default and that makes all the difference in this case. This only works when the parent table is index-organized.

Lets try the same example, but now creating parent table as HEAP:

create table pheap (
  id number(10) primary key,
  v varchar2(10) not null
);

create table cheap (
  id number(10) primary key,
  p_id number(10) not null references pheap(id),
  v varchar2(10) not null
);

insert into pheap values (1, '1');
insert into pheap values (2, '2');
insert into cheap values (1, 1, '1');
insert into cheap values (2, 1, '2');
insert into cheap values (3, 2, '3');

create index cheappid on cheap (p_id);

Lock the parent row in one session:

SQL> SELECT * FROM pheap WHERE id=1 FOR UPDATE;
        ID V        
---------- ----------
         1 1         

SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID                                                                                                                                                                                             
--------------------
3  

And try to insert into child from another session:

SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID    
---------------
12             

SQL> INSERT INTO cheap (id, p_id, v) VALUES (12, 2, 'not locked');

1 row inserted.

SQL> INSERT INTO cheap (id, p_id, v) VALUES (11, 1, 'locked');

1 row inserted.

No waiting whatsoever. Does anybody know why this difference in behaviour between IOT and HEAP tables?

UPDATE 2016-06-26: After getting Jonathan Lewis involved in Twitter, he solved this mystery very quickly 🙂 Link to twitter conversation

Oracle EE 11.2.0.4 on Linux x86-64.

I got a really surprising error message today when setting up a new data guard standby database.
I created a standby controlfile as usual and placed it on a common NFS share accessible also to the new data guard host:

SQL> alter database create standby controlfile as '/nfs/install/oemdb/cf2.f';

Database altered.

Now, on a new node I tried to restore that controlfile, but got a really surprising RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece. This shouldn’t happen, it is just stored on a common NFS share, file should not be damaged.

RMAN> restore controlfile from '/nfs/install/oemdb/cf2.f';

Starting restore at 20-MAY-16
using channel ORA_DISK_1

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 05/20/2016 12:58:33
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece

Although the error message does not say it, but I remembered that I had mounted the NFS using SOFT mount option and when trying to restore datafiles from soft mounted NFS shared you will usually get ORA-27054: NFS file system not mounted with correct options, unless you have turned on Direct-NFS on the database kernel. So I just wondered, maybe this is the real error message in this case also.
After turning on Direct NFS, restoring the control file worked as expected:

[production|oracle@vdb0005.mlt.unibet.com oemdb]$ cd $ORACLE_HOME/rdbms/lib
[production|oracle@vdb0005.mlt.unibet.com lib]$ make -f ins_rdbms.mk dnfs_on
rm -f /u01/app/oracle/product/11.2.0.4/db/lib/libodm11.so; cp /u01/app/oracle/product/11.2.0.4/db/lib/libnfsodm11.so /u01/app/oracle/product/11.2.0.4/db/lib/libodm11.so
[production|oracle@vdb0005.mlt.unibet.com lib]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Fri May 20 13:01:56 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 9620525056 bytes
Fixed Size                  2261368 bytes
Variable Size            2449477256 bytes
Database Buffers         7147094016 bytes
Redo Buffers               21692416 bytes
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[production|oracle@vdb0005.mlt.unibet.com lib]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Fri May 20 13:02:14 2016

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: OEM (not mounted)

RMAN> restore controlfile from '/nfs/install/oemdb/cf2.f';

Starting restore at 20-MAY-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=474 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=+DATA/oem/controlfile/current.257.912344539
Finished restore at 20-MAY-16

The NFS share was mounted using options:
type nfs (rw,bg,soft,rsize=32768,wsize=32768,tcp,nfsvers=3,timeo=600,addr=10.10.10.10)