Lately I’ve been upgrading our 11g Standard Edition test databases to 12c Enterprise Edition and plugging them into a multitenant container database.
Multitenant is a new technology for Oracle, but I was still quite surprised by the number of issues I faced when trying to plug in an existing non-CDB database. After resolving all of them, the process has been quite painless since.
In short, upgrading an 11g database to a 12c pluggable database involves the following steps:
* Upgrade the 11g database to 12c using the normal database upgrade procedures. This step results in a 12c non-CDB database.
* Plug the newly upgraded database into the target CDB as a new pluggable database.
* Run $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql in the new PDB. This step converts the non-CDB data dictionary to a PDB data dictionary. After this step you can open the newly added PDB.
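The plug-in step can be sketched roughly as follows. This is a minimal outline, not the exact commands I used; the names newpdb, NONCDB and the manifest path are illustrative only.

```sql
-- In the upgraded 12c non-CDB (opened read only), generate the XML manifest:
BEGIN
  DBMS_PDB.DESCRIBE(pdb_descr_file => '/tmp/noncdb.xml');
END;
/

-- In the target CDB, create the pluggable database from the manifest,
-- reusing the existing datafiles in place:
CREATE PLUGGABLE DATABASE newpdb USING '/tmp/noncdb.xml' NOCOPY TEMPFILE REUSE;

-- Then convert the data dictionary and open the new PDB:
ALTER SESSION SET CONTAINER = newpdb;
@$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
ALTER PLUGGABLE DATABASE newpdb OPEN;
```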
This post is mostly about the issues I encountered when running the last described step – executing $ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql. Hopefully it’ll be helpful if you hit similar problems.
Version: 12.1.0.2, 2-node RAC
Patches: April 2016 PSU + OJVM PSU
Platform: Oracle Linux 6 x86-64
noncdb_to_pdb.sql takes a really long time to execute
This was the first problem I encountered. After 1.5 hours I killed my session. That was really strange, because according to Oracle documentation executing the script should only take about 20 minutes. The step the script was stuck on was:
-- mark objects in our PDB as common if they exist as common in ROOT
Looking at the wait events, the session was not waiting on a blocker – it was actively executing with many parallel sessions.
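To tell whether a session is genuinely working or just blocked, one way (a sketch – the SID 123 is a placeholder for the actual session) is to check its wait state and any parallel query slaves attached to it:

```sql
-- Wait state of the session running the script (replace 123 with its SID):
SELECT sid, status, event, state, blocking_session
FROM   v$session
WHERE  sid = 123;

-- Parallel query slaves working for that session as query coordinator:
SELECT sid, serial#, degree
FROM   v$px_session
WHERE  qcsid = 123;
```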
I found the following blog post describing the same problem, and its solution also worked for me: Link to Bertrand Drouvot blog
One addition though: instead of modifying the noncdb_to_pdb.sql script, I executed ALTER SESSION before running noncdb_to_pdb.sql.
SQL> alter session set container=newpdb;
SQL> alter session set optimizer_adaptive_features=false;
SQL> @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
noncdb_to_pdb.sql hangs at alter pluggable database close
The next issue I faced: noncdb_to_pdb.sql just hung mid-execution, and the statement it was executing was
SQL> alter pluggable database "&pdbname" close;
The session was waiting for opishd.
Solution: apply the fix for Bug 20172151 – NONCDB_TO_PDB.SQL SCRIPT HANGS DURING UPGRADE. The patch just updates the noncdb_to_pdb.sql script itself to execute alter pluggable database "&pdbname" close immediate instances = all; instead of a normal close.
noncdb_to_pdb.sql fails with ORA-600 [kspgsp2]
That was a fun one 🙂 Not every time, but on most executions noncdb_to_pdb.sql failed almost at the end with the following message:
SQL> alter session set "_ORACLE_SCRIPT"=true;
Session altered.
SQL> drop view sys.cdb$tables&pdbid;
drop view sys.cdb$tables5
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kspgsp2], [0xBF3C9E3F8], , [recyclebin], , , , , , , ,
It failed every time at the same drop view statement. A search on Oracle Support did not turn up anything helpful; there were many ORA-600 [kspgsp2] issues, but nothing matched my case. Finally I noticed that one argument was [recyclebin] and decided to try turning the recycle bin off for the session. It helped.
Successful non-CDB to PDB conversion
Getting noncdb_to_pdb.sql to execute successfully required me to:
* Apply the patch for bug 20172151
* Run noncdb_to_pdb.sql using the following sequence of commands:
SQL> alter session set container=newpdb;
SQL> alter session set optimizer_adaptive_features=false;
SQL> alter session set recyclebin=off;
SQL> @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql
Take care of the services!
This problem may be specific to our environment, but I’ll describe it anyway.
We use services a lot; every application that connects to the database gets its own dedicated service. The applications connect using a JDBC connection string that looks something like this:
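The original connection string is not shown here, but a representative EZConnect-style JDBC URL (the SCAN hostname and port are placeholders) would look like:

```
jdbc:oracle:thin:@//db-scan.example.com:1521/application.scrum.example.com
```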
Here application is configured as a service using srvctl, and scrum.example.com is the database domain name, which depends on the environment; the same application in the QA environment connects to the service application.qa.example.com.
We decided to use only one CDB for all environments, but the db_domain parameter cannot differ per PDB. To avoid changing the application connection strings, I had to create the new services in srvctl using the FULL service name; that way Oracle does not append the database domain name to the service name:
srvctl add service -database cdb -preferred cdb1,cdb2 -pdb newpdb -service application.scrum.example.com
srvctl start service -database cdb -service application.scrum.example.com
After adding the services for the first database, all of them started fine and the applications connected fine, but when starting the services for the second environment (QA) I got the following error:
srvctl add service -database cdb -preferred cdb1,cdb2 -pdb newpdbqa -service application.qa.example.com
srvctl start service -database cdb -service application.qa.example.com
...
ORA-44311: service application not running
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.DBMS_SERVICE_ERR", line 40
ORA-06512: at "SYS.DBMS_SERVICE", line 421
ORA-06512: at line 1
...
But when I tried to add a new service that did not exist previously, it started just fine. I started digging into services at the CDB level and found that all imported PDBs had also brought their old short-name services into the CDB:
SQL> alter session set container=cdb$root;
Session altered.
SQL> select name from cdb_services order by 1;

NAME
------------------------------------------------
SYS$BACKGROUND
SYS$USERS
...
application
application
application.scrum.example.com
application2
application2
application2.scrum.example.com
...
My assumption was that the CDB gets confused when different PDBs have conflicting services running, so I went into each PDB and manually removed the old short service names.
SQL> alter session set container=newpdb;
SQL> exec dbms_service.delete_service('application');
SQL> exec dbms_service.delete_service('application2');
SQL> alter session set container=newpdbqa;
SQL> exec dbms_service.delete_service('application');
SQL> exec dbms_service.delete_service('application2');
After that new services started just fine.
SQL> alter session set container=cdb$root;
SQL> select name from cdb_services order by 1;

NAME
------------------------------------------------
SYS$BACKGROUND
SYS$USERS
...
application.scrum.example.com
application2.scrum.example.com
application.qa.example.com
application2.qa.example.com
...
I hit this issue by accident: developers wanted to disable inserts into a child table so they could perform a one-time maintenance operation, and this maintenance only affected one row in the parent table (and all its children). I started wondering whether there is a lower-impact solution than taking a shared read lock on the whole child table.
Database version: 12c, but the same behaviour was also present in 11g.
A very simple test schema setup:
create table p (
  id number(10) primary key,
  v varchar2(10) not null
) organization index;

create table c (
  id number(10) primary key,
  p_id number(10) not null references p(id),
  v varchar2(10) not null
);

insert into p values (1, '1');
insert into p values (2, '2');
insert into c values (1, 1, '1');
insert into c values (2, 1, '2');
insert into c values (3, 2, '3');

create index cpid on c (p_id);
Note, the foreign key is indexed.
Then I had a thought: what happens if I first lock the parent table row using SELECT FOR UPDATE, to take the lock at as low a level as possible? What would then happen to inserts into the child table? The database somehow needs to guarantee that the parent row does not change or disappear while the child is being inserted.
SQL> SELECT * FROM p WHERE id=1 FOR UPDATE;

        ID V
---------- ----------
         1 1

SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID
--------------------
268
Now row id=1 is locked in the parent table p by session 268.
Can another session insert into table c while the parent table row is locked?
SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID
--------------------
255

SQL> INSERT INTO c (id, p_id, v) VALUES (12, 2, 'not locked');
1 row inserted.
SQL> INSERT INTO c (id, p_id, v) VALUES (11, 1, 'locked');
So I could insert a new row with p_id=2 into the child table c (p.id=2 was not locked), but the second insert with p_id=1 (p.id=1 was locked by session 268 earlier) just hangs. Let's see why session 255 is hanging:
SQL> select status, event, state, blocking_session from v$session where sid=255;

STATUS   EVENT                          STATE     BLOCKING_SESSION
-------- ------------------------------ --------- ----------------
ACTIVE   enq: TX - row lock contention  WAITING   268
The session doing the insert is blocked by the session holding the lock on the parent table row that the insert refers to.
Let's look at the locks both sessions are holding or requesting:
SQL> select session_id, lock_type, mode_held, mode_requested, blocking_others,
            trunc(lock_id1/power(2,16)) rbs,
            bitand(lock_id1, to_number('ffff','xxxx'))+0 slot,
            lock_id2 seq
     from dba_locks
     where session_id in (255,268) and lock_type != 'AE'
     order by rbs, slot, seq;

SESSION_ID LOCK_TYPE       MODE_HELD       MODE_REQUESTED  BLOCKING_OTHERS        RBS       SLOT      SEQ
---------- --------------- --------------- --------------- --------------- ---------- ---------- --------
       255 DML             Row-X (SX)      None            Not Blocking             1      34910        0
       268 DML             Row-X (SX)      None            Not Blocking             1      34910        0
       255 DML             Row-X (SX)      None            Not Blocking             1      34912        0
       255 Transaction     Exclusive       None            Not Blocking             2         23     3016
       255 Transaction     None            Share           Not Blocking             9          3     2903
       268 Transaction     Exclusive       None            Blocking                 9          3     2903

6 rows selected

SQL> select o.object_type, o.object_name
     from v$locked_object lo
     join dba_objects o on o.object_id = lo.object_id
     where lo.xidusn=9 and lo.xidslot=3 and lo.xidsqn=2903;

OBJECT_TYPE     OBJECT_NAME
--------------- ---------------
TABLE           P
Here we see that session 268 is holding a transaction (TX, row-level) lock in Exclusive mode for its transaction against table P, and it is blocking session 255, which is requesting the same lock in Share mode.
So I have to conclude that when inserting a row into the child table, Oracle also tries to get a shared row lock on the parent table row. That looked perfect for my use case, and I announced victory on our internal DBA mailing list. But a few minutes later a colleague emailed me back that it does not work. He had recreated the setup (his own way, not using my scripts), and after locking the parent table row he was able to insert into the child table just fine.
It took some time to work out the differences between our setups, and in the end the difference came down to a simple fact: I create index-organized tables by default, he creates heap tables by default, and that makes all the difference in this case. The blocking behaviour only appears when the parent table is index-organized.
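A quick way to check whether a table is index-organized (a small sketch; adjust the table name filter as needed):

```sql
-- IOT_TYPE is 'IOT' for index-organized tables and NULL for heap tables
SELECT table_name, iot_type
FROM   user_tables
WHERE  table_name IN ('P', 'PHEAP');
```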
Let's try the same example, but now creating the parent table as a heap table:
create table pheap (
  id number(10) primary key,
  v varchar2(10) not null
);

create table cheap (
  id number(10) primary key,
  p_id number(10) not null references pheap(id),
  v varchar2(10) not null
);

insert into pheap values (1, '1');
insert into pheap values (2, '2');
insert into cheap values (1, 1, '1');
insert into cheap values (2, 1, '2');
insert into cheap values (3, 2, '3');

create index cheappid on cheap (p_id);
Lock the parent row in one session:
SQL> SELECT * FROM pheap WHERE id=1 FOR UPDATE;

        ID V
---------- ----------
         1 1

SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID
--------------------
3
And try to insert into child from another session:
SQL> SELECT sys_context('userenv','sid') session_id from dual;

SESSION_ID
---------------
12

SQL> INSERT INTO cheap (id, p_id, v) VALUES (12, 2, 'not locked');
1 row inserted.
SQL> INSERT INTO cheap (id, p_id, v) VALUES (11, 1, 'locked');
1 row inserted.
No waiting whatsoever. Does anybody know why this difference in behaviour between IOT and heap tables exists?
Oracle EE 11.2.0.4 on Linux x86-64.
I got a really surprising error message today when setting up a new data guard standby database.
I created a standby controlfile as usual and placed it on a common NFS share that is also accessible to the new data guard host:
SQL> alter database create standby controlfile as '/nfs/install/oemdb/cf2.f';
Database altered.
Now, on the new node, I tried to restore that controlfile but got a really surprising RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece. This shouldn't happen; the file is just stored on a common NFS share and should not be damaged.
RMAN> restore controlfile from '/nfs/install/oemdb/cf2.f';

Starting restore at 20-MAY-16
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 05/20/2016 12:58:33
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece
Although the error message does not say it, I remembered that I had mounted the NFS share using the SOFT mount option. When trying to restore datafiles from a soft-mounted NFS share you usually get ORA-27054: NFS file system not mounted with correct options, unless you have turned on Direct NFS in the database kernel. So I wondered whether that was the real error in this case as well.
After turning on Direct NFS, restoring the control file worked as expected:
[oracle@host oemdb]$ cd $ORACLE_HOME/rdbms/lib
[oracle@host lib]$ make -f ins_rdbms.mk dnfs_on
rm -f /u01/app/oracle/product/11.2.0.4/db/lib/libodm11.so; cp /u01/app/oracle/product/11.2.0.4/db/lib/libnfsodm11.so /u01/app/oracle/product/11.2.0.4/db/lib/libodm11.so
[oracle@host lib]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Fri May 20 13:01:56 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 9620525056 bytes
Fixed Size                  2261368 bytes
Variable Size            2449477256 bytes
Database Buffers         7147094016 bytes
Redo Buffers               21692416 bytes
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@host lib]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Fri May 20 13:02:14 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

connected to target database: OEM (not mounted)

RMAN> restore controlfile from '/nfs/install/oemdb/cf2.f';

Starting restore at 20-MAY-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=474 device type=DISK
channel ORA_DISK_1: copied control file copy
output file name=+DATA/oem/controlfile/current.257.912344539
Finished restore at 20-MAY-16
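Once the instance is running with the Direct NFS ODM library, you can check that Direct NFS is actually in use. A small sketch; these views are only populated after the instance has opened files over dNFS:

```sql
-- NFS servers the instance is talking to via Direct NFS
SELECT svrname, dirname FROM v$dnfs_servers;

-- Files currently open through Direct NFS
SELECT filename FROM v$dnfs_files;
```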
The NFS share was mounted using options:
type nfs (rw,bg,soft,rsize=32768,wsize=32768,tcp,nfsvers=3,timeo=600,addr=10.10.10.10)
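For comparison, the kernel-NFS mount options Oracle generally expects for database files on Linux use hard rather than soft mounting. This is a sketch based on commonly published recommendations (the server name and paths are placeholders; check MOS note 359515.1 for your exact platform):

```
nfsserver:/export/oradata  /u02/oradata  nfs  rw,bg,hard,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0  0 0
```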
I’ll be presenting my brand new presentation “Using image copies for Oracle database backups” at ilOUG Tech Days on May 30 in Israel.
More information about the event can be found here
Abstract of my presentation:
When databases get ever larger, backing them up using traditional RMAN backupsets quickly becomes unfeasible. Completing a backup requires too much time and resources, and more importantly the same applies to restores. RMAN has long provided a solution in the form of incrementally updated image copies, but they are much less manageable than backupsets. This presentation goes into detail on how to successfully implement incrementally updated image copy backups, automate them, and implement features that, together with a capable storage system, can provide almost everything that Oracle ZDLRA promises and beyond.
Looking forward to the event!
Since 11.1, RMAN has had a silent new feature: RMAN Backup Undo Optimization. This feature excludes undo from committed transactions (once the undo_retention time has also passed) from backups, possibly making the undo tablespace backup much smaller. The documentation just says it works for disk backups and Oracle Secure Backup tape backups. Since lately I’ve been playing around a lot with image copy backups, I wanted to find out whether this feature only works with backupsets or also works for incrementally refreshed image copies.
I first thought it couldn't possibly work with image copies, since image copies should be exact datafile copies. On the other hand, when you refresh an image copy, you first create an incremental backupset of the changes and then apply it to the image copy, so maybe the optimization is silently applied there too 🙂 That would be really good. Better to test it out. Fingers crossed.
I’m using 12c on OEL 7.2.
Before starting the test I created an image copy of my undo tablespace (309,338,112 bytes):
RMAN> BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'image_copy_backup' TABLESPACE UNDOTBS1;

-rw-r-----+ 1 oracle oinstall 309338112 Dec 28 05:06 data_D-ORCL_I-1433672784_TS-UNDOTBS1_FNO-3_04qvtmir
Yes I know, the filesystem dates were wrong at that point 🙂 Ignore that, NTP wasn’t running on the storage box.
I also took a level 0 uncompressed backupset of the same tablespace (207,110,144 bytes, so it has already been optimized, but I’m interested in the next incremental backup size):
RMAN> BACKUP INCREMENTAL LEVEL 0 TABLESPACE UNDOTBS1;

-rw-r-----+ 1 oracle oinstall 207110144 Dec 28 05:16 0kqvtpaj_1_1
Next I ran a large UPDATE statement and committed it immediately. I also had Snapper running to catch the amount of undo my update caused. Snapper reported that the update generated 146MB of undo:
STAT, undo change vector size , 146 042 740
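If you don't have Snapper at hand, the same statistic can be read for your own session straight from v$mystat. A sketch: run it before and after the UPDATE and take the difference of the two values:

```sql
-- Cumulative undo change vector size for the current session
SELECT sn.name, ms.value
FROM   v$mystat ms
JOIN   v$statname sn ON sn.statistic# = ms.statistic#
WHERE  sn.name = 'undo change vector size';
```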
Immediately afterwards I ran an incremental backup for both: a backupset and an incremental update of the image copy.
The BACKUP INCREMENTAL LEVEL 1 TABLESPACE UNDOTBS1 command produced a file named 0mqvtpkf_1_1, and BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'image_copy_backup' TABLESPACE UNDOTBS1 produced a file named 0oqvtpm2_1_1. As you can see, both are roughly the same size and close to the reported undo change vector size.
No surprise here, undo optimization did not kick in, since the undo_retention time had not yet passed.
-rw-r-----+ 1 oracle oinstall 151470080 Dec 28 05:21 0mqvtpkf_1_1
-rw-r-----+ 1 oracle oinstall 181190656 Dec 28 05:22 0oqvtpm2_1_1
Then I deleted both files and removed them from the RMAN catalog.
After 30 minutes or so (my undo_retention is 600 seconds = 10 minutes) I ran the backup commands again:
RMAN> backup incremental level 1 tablespace undotbs1;

Starting backup at 07-MAR-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=45 device type=DISK
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/u01/app/oracle/oradata/ORCL/datafile/o1_mf_undotbs1_cfvpb5hx_.dbf
channel ORA_DISK_1: starting piece 1 at 07-MAR-16
channel ORA_DISK_1: finished piece 1 at 07-MAR-16
piece handle=/nfs/backup/orcl/14qvtsgf_1_1 tag=TAG20160307T230238 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 07-MAR-16

RMAN> BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'image_copy_backup' tablespace undotbs1;

Starting backup at 07-MAR-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=61 device type=DISK
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/u01/app/oracle/oradata/ORCL/datafile/o1_mf_undotbs1_cfvpb5hx_.dbf
channel ORA_DISK_1: starting piece 1 at 07-MAR-16
channel ORA_DISK_1: finished piece 1 at 07-MAR-16
piece handle=/nfs/backup/orcl/16qvtsj0_1_1 tag=IMAGE_COPY_BACKUP comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
Finished backup at 07-MAR-16
This can’t be good… the regular backupset took only 1 second, while the incremental backup for the image copy refresh took 7 seconds.
Looking at the file sizes, the difference is clear: 1.7MB for the incremental backupset and 181MB (no change) for the image copy refresh:
-rw-r-----+ 1 oracle oinstall   1794048 Mar  7 23:02 14qvtsgf_1_1
-rw-r-----+ 1 oracle oinstall 181567488 Mar  7 23:04 16qvtsj0_1_1
So backup undo optimization works, but only if you use backupsets.