Ilmar Kerm

Oracle, databases, Linux and maybe more

Oracle EE 11.2.0.4 on Linux x86-64.

I got a really surprising error message today when setting up a new data guard standby database.
I created a standby controlfile as usual and placed it on a common NFS share accessible also to the new data guard host:

SQL> alter database create standby controlfile as '/nfs/install/oemdb/cf2.f';

Database altered.

Now, on a new node I tried to restore that controlfile, but got a really surprising RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece. This shouldn’t happen, it is just stored on a common NFS share, file should not be damaged.

RMAN> restore controlfile from '/nfs/install/oemdb/cf2.f';

Starting restore at 20-MAY-16
using channel ORA_DISK_1

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 05/20/2016 12:58:33
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece

Although the error message does not say it, but I remembered that I had mounted the NFS using SOFT mount option and when trying to restore datafiles from soft mounted NFS shared you will usually get ORA-27054: NFS file system not mounted with correct options, unless you have turned on Direct-NFS on the database kernel. So I just wondered, maybe this is the real error message in this case also.
After turning on Direct NFS, restoring the control file worked as expected:

[production|oracle@vdb0005.mlt.unibet.com oemdb]$ cd $ORACLE_HOME/rdbms/lib
[production|oracle@vdb0005.mlt.unibet.com lib]$ make -f ins_rdbms.mk dnfs_on
rm -f /u01/app/oracle/product/11.2.0.4/db/lib/libodm11.so; cp /u01/app/oracle/product/11.2.0.4/db/lib/libnfsodm11.so /u01/app/oracle/product/11.2.0.4/db/lib/libodm11.so
[production|oracle@vdb0005.mlt.unibet.com lib]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Fri May 20 13:01:56 2016

Copyright (c) 1982, 2013, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 9620525056 bytes
Fixed Size                  2261368 bytes
Variable Size            2449477256 bytes
Database Buffers         7147094016 bytes
Redo Buffers               21692416 bytes
SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[production|oracle@vdb0005.mlt.unibet.com lib]$ rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Fri May 20 13:02:14 2016

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: OEM (not mounted)

RMAN> restore controlfile from '/nfs/install/oemdb/cf2.f';

Starting restore at 20-MAY-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=474 device type=DISK

channel ORA_DISK_1: copied control file copy
output file name=+DATA/oem/controlfile/current.257.912344539
Finished restore at 20-MAY-16

The NFS share was mounted using options:
type nfs (rw,bg,soft,rsize=32768,wsize=32768,tcp,nfsvers=3,timeo=600,addr=10.10.10.10)

I’ll be presenting my brand new presentation “Using image copies for Oracle database backups” at ilOUG Tech Days on 30. May in Israel.

More information about the event can be found here

Abstract of my presentation:
When databases get ever larger and larger, backing them up using traditional RMAN backupsets will quickly get unfeasible. Completing a backup requires too much time and resources, but more importantly the same also applies to restores. RMAN has always provided a solution as incrementally updated image copies, but they are much less manageable than backupsets. This presentation goes into detail on how to successfully implement incrementally updated image copy backups, automate them and implement features that together with a capable storage system can provide almost everything that Oracle ZDLRA promises and beyond.

Looking forward to the event!

Since 11.1 RMAN has had a silent new feature – RMAN Backup Undo Optimization. This feature will exclude undo from committed transactions (after undo_retention time has also passed) from backups, possibly making the undo tablespace backup much smaller. The documentation just says that it will work for disk backups and Oracle Secure Backup tape backups. Since lately I’m been playing around a lot with image copy backups I wanted to find out if this feature only works with backupsets or does it also work for incrementally refreshed image copies.

I first thought that it cannot possibly work with image copies, since image copies should be exact datafile copies, but on the other hand when you refresh and image copy, then you at first also have to create incremental backupset of the changes that you then apply to the image copy, so maybe the optimization is applied silently there also 🙂 Would be really good. Better to test it out. Fingers crossed.

I’m using 12.1.0.2 on OEL 7.2.

Before taking the test I created an image copy from my undo tablespace (309 338 112 bytes):

RMAN> BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'image_copy_backup' TABLESPACE UNDOTBS1;
-rw-r-----+ 1 oracle oinstall 309338112 Dec 28 05:06 data_D-ORCL_I-1433672784_TS-UNDOTBS1_FNO-3_04qvtmir

Yes I know, my filesystem dates were wrong at that point 🙂 Ignore this, NTP wasn’t running on the storage box.

Also a level 0 uncompressed backupset of the same tablespace (207 110 144 bytes, so it has already been optimized, but I’m interested in the next incremental backup size):

RMAN> BACKUP INCREMENTAL LEVEL 0 TABLESPACE UNDOTBS1;
-rw-r-----+ 1 oracle oinstall 207110144 Dec 28 05:16 0kqvtpaj_1_1

Next I ran a large UPDATE statement and committed it immediately. I also had snapper running to catch the amount of undo my update caused. Snapper reported that my update generated 146MB of undo:

STAT, undo change vector size                                   ,     146 042 740

Now immediately I run incremental backup for both, backupset and to incrementally update the image copy.
BACKUP INCREMENTAL LEVEL 1 TABLESPACE UNDOTBS1 command produced file named 0mqvtpkf_1_1 and command BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG ‘image_copy_backup’ tablespace undotbs1 produced file named 0oqvtpm2_1_1. As you can see, both are almost equally as big and close to the reported undo change vector size.
No surprise herem undo optimization did not kick in since undo_retention time has not yet passed.

-rw-r-----+ 1 oracle oinstall 151470080 Dec 28 05:21 0mqvtpkf_1_1
-rw-r-----+ 1 oracle oinstall 181190656 Dec 28 05:22 0oqvtpm2_1_1

Then I deleted both these files and removed them from RMAN catalog.

After 30 minutes or so (my undo_retention time is 600 = 10 minutes) I ran the backup commands again:

RMAN> backup incremental level 1 tablespace undotbs1;

Starting backup at 07-MAR-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=45 device type=DISK
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/u01/app/oracle/oradata/ORCL/datafile/o1_mf_undotbs1_cfvpb5hx_.dbf
channel ORA_DISK_1: starting piece 1 at 07-MAR-16
channel ORA_DISK_1: finished piece 1 at 07-MAR-16
piece handle=/nfs/backup/orcl/14qvtsgf_1_1 tag=TAG20160307T230238 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 07-MAR-16

RMAN> BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'image_copy_backup' tablespace undotbs1;

Starting backup at 07-MAR-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=61 device type=DISK
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00003 name=/u01/app/oracle/oradata/ORCL/datafile/o1_mf_undotbs1_cfvpb5hx_.dbf
channel ORA_DISK_1: starting piece 1 at 07-MAR-16
channel ORA_DISK_1: finished piece 1 at 07-MAR-16
piece handle=/nfs/backup/orcl/16qvtsj0_1_1 tag=IMAGE_COPY_BACKUP comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07
Finished backup at 07-MAR-16

This can’t be good.. regular backupset took only 1 second to execute and taking an incremental backup for image copy refresh took 7 seconds.
Looking at the file sizes the difference is clear – 1,7MB for the incremental backup and 181MB (no change) for the image copy refresh:

-rw-r-----+ 1 oracle oinstall   1794048 Mar  7 23:02 14qvtsgf_1_1
-rw-r-----+ 1 oracle oinstall 181567488 Mar  7 23:04 16qvtsj0_1_1

So the backup undo optimization works, but only if you use backupsets.

I published my 2014 presentation “Making MySQL highly available using Oracle Grid Infrastructure” in Slideshare.
Please also read my page how to set up the mysql scripts for Oracle GI

I’m currently involved in a project where we are replacing one company’s entire hardware platform. They also have Oracle database 10.2.0.5 (that we cannot upgrade right now) and what is really unusual for me, is that this Oracle database runs under Windows (and we cannot migrate to another platform). We also decided to use Oracle Grid Infrastructure (aka Oracle Clusterware) 11.2.0.4 to implement active-passive standby server for this database. Why? Because Windows Cluster was not an option, we didn’t have RAC licenses, Oracle Clusterware is free (if you are protecting Oracle software or running on Oracle OS) and we have really good previous experience with it under Linux.
For more information on how I’ve used Oracle Grid Infrastructure to provide high availability for MySQL (or any other application), check out my page MySQL HA with Oracle Clusterware.

When I started testing this it was quite surprising that I didn’t find any Oracle Clusterware action script examples for Windows in Oracle documentation or even on google 🙂 Oracle documentation just refers that in Windows the action script has to be a batch script.

This is the example action script I came up with to manage stand alone Oracle 10.2.0.5 database. It uses oradim to start and stop the database instance dbgp and a small sqlplus script to check if the database instance is alive. Tested using Oracle Grid Infrastructure 11.2.0.4 under Windows 2008R2.
I named the action script: d:\scripts\dbgp.cmd

@echo off
@setlocal enableextensions enabledelayedexpansion
set action=%~1
set ORACLE_HOME=D:\app\oracle\product\10.2.0.5\db
set ORACLE_SID=dbgp
set TEMPFILE=d:\scripts\script.out
set CHECKSCRIPT=d:\scripts\check.sql
set OUTP=
set EXITCODE=0

if exist %TEMPFILE% (
  del %TEMPFILE%
)

if "%action%" == "start" goto :start
if "%action%" == "stop" goto :stop
if "%action%" == "check" goto :check
if "%action%" == "clean" goto :clean
goto :exit

:start
  %ORACLE_HOME%\bin\oradim -startup -sid %ORACLE_SID% -starttype srvc,inst > %TEMPFILE%
  call :setsize "%TEMPFILE%"
  if %size% gtr 0 call :checkoutput
  if exist %TEMPFILE% type %TEMPFILE%
  goto :exit

:stop
  %ORACLE_HOME%\bin\oradim -shutdown -sid %ORACLE_SID% -shuttype srvc,inst -shutmode immediate > %TEMPFILE%
  call :setsize "%TEMPFILE%"
  if %size% gtr 0 call :checkoutput
  if exist %TEMPFILE% type %TEMPFILE%
  goto :exit

:check
  %ORACLE_HOME%\bin\sqlplus /nolog @%CHECKSCRIPT% > %TEMPFILE%
  if %ERRORLEVEL% GTR 0 set EXITCODE=1
  goto :exit

:clean
  %ORACLE_HOME%\bin\oradim -shutdown -sid %ORACLE_SID% -shuttype srvc,inst -shutmode abort
  goto :exit

:setsize
  set size=%~z1
  goto :eof

:checkoutput
  set /p OUTP=<%TEMPFILE%
  if not "x%OUTP:DIM-=%" == "x%OUTP%" set EXITCODE=1
  if not "x%OUTP:ORA-=%" == "x%OUTP%" set EXITCODE=1
  goto :eof

:exit
  set OUTP=
  if exist sqlnet.log del sqlnet.log
  exit /b %EXITCODE%

Just for completeness, this script refers to d:\scripts\check.sql that is just used to run a quick database healt check, here is its contents:

whenever sqlerror exit failure
conn / as sysdba
select 1 from dual;
exit

WINDOWS SPECIFIC ONE TIME OPERATION: Before clusterware can execute the action script in Windows, you need to create OracleCRSToken_username service for the OS user who is executing the script. In my setup both Oracle Clusterware, managed database and the action script are executed by the local user WINRAC1\oracle and on the second node as WINRAC2\oracle. It is actually easier if you use domain user, please check the referred note.
Reference: Windows: How to Modify OS User Privileges for 11gR2 Grid Infrastructure and RAC Services (Needed for Backup To Network Shares) (Doc ID 1339053.1) steps 2 and 3.

set ORACLE_HOME=d:\app\11.2.0.4\grid
%ORACLE_HOME%\bin\crsuser add winrac1\oracle

.. it will show errors ..
.. but repeat the command on other node also ..

set ORACLE_HOME=d:\app\11.2.0.4\grid
%ORACLE_HOME%\bin\crsuser add winrac2\oracle

After that need to open services.msc and edit service OracleCRSToken_oracle. First set its startup type to Automatic and then on Log On As tab also set the oracle user password. After that start service OracleCRSToken_oracle and repeat these steps on all cluster nodes.

ADDING THE RESOURCE TO CLUSTERWARE: Adding the resource to cluster is the same as under Linux:

set ORACLE_HOME=d:\app\11.2.0.4\grid
%ORACLE_HOME%\bin\crsctl add resource oradb_dbgp -type cluster_resource -attr "ACTION_SCRIPT=d:\scripts\dbgp.cmd, CHECK_INTERVAL=60, RESTART_ATTEMPTS=2, PLACEMENT=favored, HOSTING_MEMBERS=winrac1"