I just published for free the work I’ve done for the past 1.5 years 🙂
We used to have a few problems with backups:
* The weekly full backup took 2+ days to run on the larger databases and always consumed a lot of resources.
* To test the backups we needed to allocate the same amount of storage space, and restoring the database from backup took too long.
* To make matters worse, the backup software provider charged by the GB just to run a fancy GUI over RMAN.
But we can do better 🙂 And so a script suite to manage RMAN image copy backups and to test them automatically was born.
We’ve run this system exclusively in production for over a year now and it has been working great.
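The core idea behind image copy backups is RMAN’s incrementally updated backup: keep an image copy of every datafile and roll it forward with incremental backups, so a full backup is only ever taken once. A minimal sketch of the underlying RMAN technique (the tag name is illustrative, and this is not the actual script suite):

```sql
-- Take a level 1 incremental; on the very first run, when no image copy
-- exists yet, RMAN automatically takes a level 0 image copy instead.
RMAN> BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'IMG_COPY' DATABASE;

-- Apply the incremental to the existing image copies, rolling them forward.
RMAN> RECOVER COPY OF DATABASE WITH TAG 'IMG_COPY';
```

Run daily, this keeps a near-current image copy of the database that can be switched to or restored directly, without ever repeating a multi-day full backup.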
I’ll be presenting “Using image copies for Oracle database backups” at the following upcoming conferences:
- HROUG 2016 in Rovinj, Croatia, 18-21 October 2016
- DOAG 2016 in Nuremberg, Germany, 15-18 November 2016
Lately I’ve been upgrading our 11g Standard Edition databases in test environments to 12c Enterprise Edition and also plugging them into a multitenant container database.
It’s a new technology for Oracle, but I was still quite surprised by the number of issues I faced when trying to plug in an existing non-CDB database. After resolving all these issues it has been a quite painless process since then.
In short, upgrading an 11g database to a 12c pluggable database involves the following steps:
* Upgrade the 11g database to 12c using the normal database upgrade procedures. This step results in a 12c non-CDB database.
* Plug the newly upgraded database into the target CDB as a new pluggable database.
* Run $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql in the new PDB. This step converts the non-CDB data dictionary to a PDB data dictionary. After this step you can open the newly added PDB.
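For the plug-in step, a minimal sketch, assuming the non-CDB has already been upgraded to 12c; the manifest path and PDB name here are illustrative:

```sql
-- On the upgraded 12c non-CDB: generate the plug-in manifest
SQL> shutdown immediate
SQL> startup open read only
SQL> exec dbms_pdb.describe(pdb_descr_file => '/tmp/newpdb.xml');
SQL> shutdown immediate

-- In the target CDB: plug it in, reusing the existing datafiles in place
SQL> create pluggable database newpdb using '/tmp/newpdb.xml' nocopy tempfile reuse;
```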
This post is mostly about the issues I encountered when running the last step – executing $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql. Hopefully it’ll be helpful if you hit similar problems.
Version: 220.127.116.11, 2-node RAC
Patches: April 2016 PSU + OJVM PSU
Platform: Oracle Linux 6 x86-64
noncdb_to_pdb.sql takes a really long time to execute
This was the first problem I encountered. After 1.5 hours I killed my session. That was really weird, because according to Oracle documentation executing it should only take about 20 minutes. The step the script was stuck on was:
-- mark objects in our PDB as common if they exist as common in ROOT
Looking at the wait events, the session was not waiting for a blocker; it was actively executing, with many parallel sessions.
I found the following blog post that described the same problem, and the solution described there also worked for me: Link to Bertrand Drouvot blog
But with one addition: instead of modifying the noncdb_to_pdb.sql script, I executed ALTER SESSION before running noncdb_to_pdb.sql.
SQL> alter session set container=newpdb;
SQL> alter session set optimizer_adaptive_features=false;
SQL> @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
noncdb_to_pdb.sql hangs at alter pluggable database close
The next issue I faced – noncdb_to_pdb.sql just hung mid-execution, and the statement it was executing was:
SQL> alter pluggable database "&pdbname" close;
The session was waiting for opishd.
Solution: apply the bugfix for Bug 20172151 – NONCDB_TO_PDB.SQL SCRIPT HANGS DURING UPGRADE. It just updates the noncdb_to_pdb.sql script itself to execute alter pluggable database "&pdbname" close immediate instances = all; instead of a normal close.
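In other words, after the patch the script closes the PDB like this (the statement from the bug description, written out):

```sql
SQL> alter pluggable database "&pdbname" close immediate instances = all;
```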
noncdb_to_pdb.sql fails with ORA-600 [kspgsp2]
That was a fun one 🙂 Not every time, but on most executions noncdb_to_pdb.sql failed almost at the end with the following message:
SQL> alter session set "_ORACLE_SCRIPT"=true;

Session altered.

SQL> drop view sys.cdb$tables&pdbid;
drop view sys.cdb$tables5
              *
ERROR at line 1:
ORA-00600: internal error code, arguments: [kspgsp2], [0xBF3C9E3F8], [],
[recyclebin], [], [], [], [], [], [], []
It failed every time at the same drop view statement. A search on Oracle Support did not turn up anything helpful; there were many ORA-600 [kspgsp2] issues, but nothing matched my case. Finally I noticed that one argument was [recyclebin] and decided to try turning the recyclebin off for the session. It helped.
Successful non-CDB to PDB conversion
Getting a successful execution of noncdb_to_pdb.sql required me to:
* Apply patch 20172151
* Run noncdb_to_pdb.sql using the following sequence of commands:
SQL> alter session set container=newpdb;
SQL> alter session set optimizer_adaptive_features=false;
SQL> alter session set recyclebin=off;
SQL> @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
Take care of the services!
This problem may be specific to our environment, but I’ll describe it anyway.
We use services a lot: every application that connects to the database gets its own dedicated service. So the applications connect using a JDBC connection string that looks something like this (host and port are illustrative):

jdbc:oracle:thin:@//dbhost:1521/application.scrum.example.com

Here application is configured as a service using srvctl and scrum.example.com is the database domain name, which depends on the environment. The same application in the QA environment will have the connection string:

jdbc:oracle:thin:@//dbhost:1521/application.qa.example.com
We decided to use only one CDB for all environments, but the db_domain parameter cannot be different for each PDB. In order not to change the application connection strings, I had to create the new services in srvctl using the FULL service name; then Oracle will not append the database domain name to the service name:
srvctl add service -database cdb -preferred cdb1,cdb2 -pdb newpdb -service application.scrum.example.com
srvctl start service -database cdb -service application.scrum.example.com
After adding the services for the first database, all of them started just fine and applications connected just fine, but when starting the services for the second environment (QA) I got the following error:
srvctl add service -database cdb -preferred cdb1,cdb2 -pdb newpdbqa -service application.qa.example.com
srvctl start service -database cdb -service application.qa.example.com
...
ORA-44311: service application not running
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.DBMS_SERVICE_ERR", line 40
ORA-06512: at "SYS.DBMS_SERVICE", line 421
ORA-06512: at line 1
...
But when I tried to add a new service that did not exist previously, it started just fine. I started digging into services at the CDB level and found that all imported PDBs had also imported their old short-name services into the CDB:
SQL> alter session set container=cdb$root;

Session altered.

SQL> select name from cdb_services order by 1;

NAME
------------------------------------------------
SYS$BACKGROUND
SYS$USERS
...
application
application
application.scrum.example.com
application2
application2
application2.scrum.example.com
...
I just assumed it could confuse the CDB if different PDBs have conflicting services running, so I manually went into each PDB and removed the old short service names.
SQL> alter session set container=newpdb;
SQL> exec dbms_service.delete_service('application');
SQL> exec dbms_service.delete_service('application2');

SQL> alter session set container=newpdbqa;
SQL> exec dbms_service.delete_service('application');
SQL> exec dbms_service.delete_service('application2');
After that, the new services started just fine.
SQL> alter session set container=cdb$root;
SQL> select name from cdb_services order by 1;

NAME
------------------------------------------------
SYS$BACKGROUND
SYS$USERS
...
application.qa.example.com
application.scrum.example.com
application2.qa.example.com
application2.scrum.example.com
...
I hit this issue by accident: developers wanted to disable inserts into a child table so they could perform a one-time maintenance operation, and this maintenance only affected one row from the parent table (and all its children). I started wondering whether there is any lower-impact solution than taking a shared read lock on the whole child table.
Database version: 18.104.22.168, but the same behaviour was also present in 22.214.171.124.
Very simple test schema setup:
create table p (
  id number(10) primary key,
  v varchar2(10) not null
) organization index;

create table c (
  id number(10) primary key,
  p_id number(10) not null references p(id),
  v varchar2(10) not null
);

insert into p values (1, '1');
insert into p values (2, '2');
insert into c values (1, 1, '1');
insert into c values (2, 1, '2');
insert into c values (3, 2, '3');

create index cpid on c (p_id);
Note that the foreign key is indexed.
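To double-check that the foreign key column c.p_id really is covered by an index, a quick sketch query against the data dictionary (assuming the objects live in the current schema):

```sql
select index_name, column_name, column_position
  from user_ind_columns
 where table_name = 'C'
 order by index_name, column_position;
```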
Then I had a thought: what happens if I lock the parent table row first using SELECT FOR UPDATE, taking the lock at as low a level as possible? What would happen then with inserts into the child table? The database needs to somehow guarantee that the parent row does not change or disappear while the child is being inserted.
SQL> SELECT * FROM p WHERE id=1 FOR UPDATE;

        ID V
---------- ----------
         1 1

SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID
--------------------
268
Now row id=1 is locked in the parent table p by session 268.
Could another session insert into table c while the parent table row is locked?
SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID
--------------------
255

SQL> INSERT INTO c (id, p_id, v) VALUES (12, 2, 'not locked');

1 row inserted.

SQL> INSERT INTO c (id, p_id, v) VALUES (11, 1, 'locked');
So I could insert into the child table c a new row with p_id=2 (p.id=2 was not locked), but the second insert with p_id=1 (p.id=1 was locked by session 268 earlier) just hangs. Let's look at why session 255 is hanging:
SQL> select status, event, state, blocking_session from v$session where sid=255;

STATUS   EVENT                          STATE    BLOCKING_SESSION
-------- ------------------------------ -------- ----------------
ACTIVE   enq: TX - row lock contention  WAITING               268
The session doing the insert is blocked by the session holding a lock on the parent table row that the insert refers to.
Let's look at the locks both sessions are holding/requesting:
SQL> select session_id, lock_type, mode_held, mode_requested, blocking_others,
            trunc(lock_id1/power(2,16)) rbs,
            bitand(lock_id1, to_number('ffff','xxxx'))+0 slot,
            lock_id2 seq
       from dba_locks
      where session_id in (255,268) and lock_type != 'AE'
      order by rbs, slot, seq;

SESSION_ID LOCK_TYPE   MODE_HELD  MODE_REQUESTED BLOCKING_OTHERS    RBS  SLOT  SEQ
---------- ----------- ---------- -------------- --------------- ------ ----- ----
       255 DML         Row-X (SX) None           Not Blocking         1 34910    0
       268 DML         Row-X (SX) None           Not Blocking         1 34910    0
       255 DML         Row-X (SX) None           Not Blocking         1 34912    0
       255 Transaction Exclusive  None           Not Blocking         2    23 3016
       255 Transaction None       Share          Not Blocking         9     3 2903
       268 Transaction Exclusive  None           Blocking             9     3 2903

6 rows selected.

SQL> select o.object_type, o.object_name
       from v$locked_object lo
       join dba_objects o on o.object_id = lo.object_id
      where lo.xidusn=9 and lo.xidslot=3 and lo.xidsqn=2903;

OBJECT_TYPE     OBJECT_NAME
--------------- ---------------
TABLE           P
Here we see that session 268 is holding a transaction (TX, row-level) lock in Exclusive mode on table P and is blocking session 255, which is requesting a lock on the same row in Share mode.
From this I have to conclude that when inserting a row into the child table, Oracle also tries to get a shared row lock on the parent table. That looked perfect for my use case and I announced victory on our internal DBA mailing list. But a few minutes later a colleague emailed me back that it does not work. He recreated the setup (his own way, not using my scripts) and after locking the parent table row he was able to insert into the child table just fine.
It took some time to work out the differences between our setups, and in the end I reduced the difference to a simple fact: I create index-organized tables by default and he creates heap tables by default, and that makes all the difference in this case. This behaviour only appears when the parent table is index-organized.
Let's try the same example, but now creating the parent table as a heap table:
create table pheap (
  id number(10) primary key,
  v varchar2(10) not null
);

create table cheap (
  id number(10) primary key,
  p_id number(10) not null references pheap(id),
  v varchar2(10) not null
);

insert into pheap values (1, '1');
insert into pheap values (2, '2');
insert into cheap values (1, 1, '1');
insert into cheap values (2, 1, '2');
insert into cheap values (3, 2, '3');

create index cheappid on cheap (p_id);
Lock the parent row in one session:
SQL> SELECT * FROM pheap WHERE id=1 FOR UPDATE;

        ID V
---------- ----------
         1 1

SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID
--------------------
3
And try to insert into child from another session:
SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID
---------------
12

SQL> INSERT INTO cheap (id, p_id, v) VALUES (12, 2, 'not locked');

1 row inserted.

SQL> INSERT INTO cheap (id, p_id, v) VALUES (11, 1, 'locked');

1 row inserted.
No waiting whatsoever. Does anybody know why IOT and heap parent tables behave differently here?
Oracle EE 126.96.36.199 on Linux x86-64.
I got a really surprising error message today when setting up a new data guard standby database.
I created a standby controlfile as usual and placed it on a common NFS share that is also accessible to the new data guard host:
SQL> alter database create standby controlfile as '/nfs/install/oemdb/cf2.f';

Database altered.
Now, on the new node I tried to restore that controlfile, but got a really surprising RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece. This shouldn’t happen – the file is just stored on a common NFS share and should not be damaged.
RMAN> restore controlfile from '/nfs/install/oemdb/cf2.f';

Starting restore at 20-MAY-16
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 05/20/2016 12:58:33
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece
Although the error message does not say it, I remembered that I had mounted the NFS share using the SOFT mount option, and when trying to restore datafiles from soft-mounted NFS shares you will usually get ORA-27054: NFS file system not mounted with correct options, unless you have turned on Direct NFS in the database kernel. So I wondered whether this was the real error in this case as well.
After turning on Direct NFS, restoring the control file worked as expected:
[oracle@host oemdb]$ cd $ORACLE_HOME/rdbms/lib
[oracle@host lib]$ make -f ins_rdbms.mk dnfs_on
rm -f /u01/app/oracle/product/188.8.131.52/db/lib/libodm11.so; cp /u01/app/oracle/product/184.108.40.206/db/lib/libnfsodm11.so /u01/app/oracle/product/220.127.116.11/db/lib/libodm11.so
[oracle@host lib]$ sqlplus / as sysdba

SQL*Plus: Release 18.104.22.168.0 Production on Fri May 20 13:01:56 2016
Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 9620525056 bytes
Fixed Size                  2261368 bytes
Variable Size            2449477256 bytes
Database Buffers         7147094016 bytes
Redo Buffers               21692416 bytes
SQL>
Disconnected from Oracle Database 11g Enterprise Edition Release 22.214.171.124.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

[oracle@host lib]$ rman target /

Recovery Manager: Release 126.96.36.199.0 - Production on Fri May 20 13:02:14 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: OEM (not mounted)

RMAN> restore controlfile from '/nfs/install/oemdb/cf2.f';

Starting restore at 20-MAY-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=474 device type=DISK
channel ORA_DISK_1: copied control file copy
output file name=+DATA/oem/controlfile/current.257.912344539
Finished restore at 20-MAY-16
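To verify that Direct NFS actually kicked in after relinking, you can query the Direct NFS views once the instance is up; a quick sketch:

```sql
-- NFS servers the instance talks to through Direct NFS
SQL> select svrname, dirname from v$dnfs_servers;

-- Files currently open through Direct NFS
SQL> select filename from v$dnfs_files;
```

If these views return no rows, the instance fell back to kernel NFS and the ODM library relink should be rechecked.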
The NFS share was mounted with the following options:
type nfs (rw,bg,soft,rsize=32768,wsize=32768,tcp,nfsvers=3,timeo=600,addr=10.10.10.10)
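For comparison, Oracle's supported mount options for database files over NFS on Linux (documented in MOS note 359515.1) use a hard mount; something along these lines, with server and paths illustrative:

```shell
# hard mount as recommended for single-instance datafiles on Linux (sketch)
mount -t nfs -o rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0 \
      nfsserver:/export/install /nfs/install
```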