Ilmar Kerm

Oracle, databases, Linux and maybe more

I published my 2014 presentation “Making MySQL highly available using Oracle Grid Infrastructure” in Slideshare.
Please also read my page how to set up the mysql scripts for Oracle GI

I’m currently involved in a project where we are replacing one company’s entire hardware platform. They also have Oracle database 10.2.0.5 (that we cannot upgrade right now) and what is really unusual for me, is that this Oracle database runs under Windows (and we cannot migrate to another platform). We also decided to use Oracle Grid Infrastructure (aka Oracle Clusterware) 11.2.0.4 to implement active-passive standby server for this database. Why? Because Windows Cluster was not an option, we didn’t have RAC licenses, Oracle Clusterware is free (if you are protecting Oracle software or running on Oracle OS) and we have really good previous experience with it under Linux.
For more information on how I’ve used Oracle Grid Infrastructure to provide high availability for MySQL (or any other application), check out my page MySQL HA with Oracle Clusterware.

When I started testing this it was quite surprising that I didn’t find any Oracle Clusterware action script examples for Windows in Oracle documentation or even on google ๐Ÿ™‚ Oracle documentation just refers that in Windows the action script has to be a batch script.

This is the example action script I came up with to manage stand alone Oracle 10.2.0.5 database. It uses oradim to start and stop the database instance dbgp and a small sqlplus script to check if the database instance is alive. Tested using Oracle Grid Infrastructure 11.2.0.4 under Windows 2008R2.
I named the action script: d:\scripts\dbgp.cmd

Just for completeness, this script refers to d:\scripts\check.sql that is just used to run a quick database healt check, here is its contents:

whenever sqlerror exit failure
conn / as sysdba
select 1 from dual;
exit

WINDOWS SPECIFIC ONE TIME OPERATION: Before clusterware can execute the action script in Windows, you need to create OracleCRSToken_username service for the OS user who is executing the script. In my setup both Oracle Clusterware, managed database and the action script are executed by the local user WINRAC1\oracle and on the second node as WINRAC2\oracle. It is actually easier if you use domain user, please check the referred note.
Reference: Windows: How to Modify OS User Privileges for 11gR2 Grid Infrastructure and RAC Services (Needed for Backup To Network Shares) (Doc ID 1339053.1) steps 2 and 3.

set ORACLE_HOME=d:\app\11.2.0.4\grid
%ORACLE_HOME%\bin\crsuser add winrac1\oracle

.. it will show errors ..
.. but repeat the command on other node also ..

set ORACLE_HOME=d:\app\11.2.0.4\grid
%ORACLE_HOME%\bin\crsuser add winrac2\oracle

After that need to open services.msc and edit service OracleCRSToken_oracle. First set its startup type to Automatic and then on Log On As tab also set the oracle user password. After that start service OracleCRSToken_oracle and repeat these steps on all cluster nodes.

ADDING THE RESOURCE TO CLUSTERWARE: Adding the resource to cluster is the same as under Linux:

set ORACLE_HOME=d:\app\11.2.0.4\grid
%ORACLE_HOME%\bin\crsctl add resource oradb_dbgp -type cluster_resource -attr "ACTION_SCRIPT=d:\scripts\dbgp.cmd, CHECK_INTERVAL=60, RESTART_ATTEMPTS=2, PLACEMENT=favored, HOSTING_MEMBERS=winrac1"

Oracle has released it ZFS Storage Appliance software simulator. It is sold as an Tier 2 storage hardware, but using the simulator you can get all its powerful storage management features on your Virtualbox. The simulator is free and it has no time limits nor restrictions (although one restriction – the simulator is not clustered).

Some use cases for the simulator:
* You can try out the new ZFS Storage Applicance software patches before applying them on the real physical storage box
* You want to test some Oracle database features that are only activated when using Oracle storage (HCC)
* You want to provide shared network storage fro your VM-s, with advanced storage capabilities, like snapshots, cloning, compression, deduplication, remote replication, encryption etc
* You want to evaluate ZFS Storage Appliance features before purchasing the real box or before using a specific feature in production

You can download it here

Setting up the simulator under virtualbox is very simple and quick. On the configuration below I make no effort to secure the system, since it is intended to be used only in my Virtualbox environment.

Requirements

Virtualbox 4.2.12 or later.
I’m using Virtualbox 5.

Download the software

Go to the ZFS Storage software page and click Try the simulator. First you have to register the download and then you can download ZIP file containing the software.
Unzip the file you downloaded, then you’ll get OracleZFSStorageVM directory with 18 files under it.

Import VM to Virtualbox

Open Virtualbox Manager, go to File > Import Appliance.
Browse to Oracle_ZFS_Storage.ovf from the unzipped software directory.

Click Continue.
On the next screen you can view the imported VM settings and then press Import.

Initial configuration

After importing you will have a new VM called Oracle_ZFS_Storage.
Open Settings for this VM and go to Network tab. Verify that it is connected to the correct virtual network. Mine is automatically imported under the only Host-only Adapter network I have and I’m going to keep that setting.

Now launch the VM.

It will take a 10+ seconds to boot up and then on the first boot it will ask you a few questions on basic network settings and root user password.
Supply the requested values:

Press Enter when done. After that it will show you a blank screen for 30+ seconds and after it is done the system is ready.

Now open the URL requested https://192.168.56.98:215/ in your browser to finish the initial setup.

Log in as root supplying the password you set previously.
Welcome screen appears. Click Start to go through a small initial setup wizard. Everything can also be changed later.

First screen, networking, just press Commit.

Second screen, DNS, just press Commit.
Next, NTP, just press Commit.
Next, Name Services… here you need to set up LD… just kidding, press Commit.

Now, Configure storage, here you can set up the storage pool.

Press the plus sign before Available pools and supply a pool name – pool1 in my example.

On Verify and allocate devices screen you can just press Commit
The next, Choose storage profile is the most interesting. In production it requires a lot of consideration, because every choice has very different availability, read performance and write performance implications. Here, Iรคll just choose Striped to get the maximum performance and no loss in available storage size. Obviously this would be a very bad choice for production system due to no availability or fault tolerance at all.

Press Commit. You are now back in Configure storage screen, here again press Commit.
On the final Registration & Support page there is no Commit button, but there is a button called Later, press it and then confirm it by pressing OK.

All done ๐Ÿ™‚

Set up iSCSI

If you want to share iSCSI block devices, then first need to create iSCSI Target.
Go to Configuration > SAN and click on ISCSI.

Click the plus sign before Targets.

If you don’t care how the IQN looks, just provide some name for Alias and press OK.

It will then automatically generate IQN for you, so in my example it is named: iqn.1986-03.com.sun:02:455fe302-6504-6eaf-d478-9b3acf9f4afe

Set up SNMP

By default SNMP service is not enabled, but if you want to test Hybrid Columnar Compression it is required. You can get more details about it from this blog post.
Go to Configuration > Services, and click on the aroow button to expand the Services menu.

Click on SNMP from the left-side menu.
First click on the power button symbol under SNMP to enable the service and set the following values:
* Authorized network/mask: 0.0.0.0 / 0
* Appliance contact: your email address
* Trap destination: 127.0.0.1

Click Apply.

REST API

All storage configuration can also be done using REST API. You can create new LUNs, filesystems, change thir properties, snapshot, clone, drop… everything that you can do in GUI you can also do over REST API.
I use it a lot in production in database backup&restore scripts and also in providing production database clones for testing.
I have also blogged about it before: SAMPLE CODE: USING THE ORACLE ZFS STORAGE APPLIANCE REST API FROM PYTHON
I really hope I can soon publish my full Oracle database backup&restore script suite that also relies on ZFSSA sotrage features heavily.

ZFSSA REST API documentation is here

REST API is turned on by default and it is accessible over the same URL as the management GUI: https://192.168.56.98:215/

Create NFS filesystem

To create a new NFS filesystem go to Shares

Click the plus sign next to Filesystems.

Write filesystem name and you can also set the share UID, GID and permissions. For example if you are using it for oracle database and have installed the oracle-rdbms-preinstall rpm package under linux, then you could set User and Group values to 54321 to get the correct permissions on mount.
Click Apply.

The newly created filesystem appears on the list.

You can now mount the NFS filesystem on a target machine, in my example using the settings:

mount -t nfs 192.168.56.98:/export/oradata /mnt

If you hover over the filesystem entry, you notice a pencil icon on the left. If you click it you can change filesystem properties, restrictions and snapshots.

If the filesystem is going to be used for Oracle database data files (not for RMAN, UNDO, REDO, TEMP) , then one thing that you may want to change is Database record size to be equal to the tablespace block size 8K.
Click Apply.

Create iSCSI LUN

Go to Shares and click on LUNs. This way you can present block devices to servers, to be used for example as ASM disks.

Click plus sign next to LUNs.

Fill out the properties, again, if the LUN is to be used for Oracle tablespace data files, then you may want to set the Volume block size to tablespace block size (8K).
Click Apply.
The newly create LUN appears on the list.

Again, the pencil icon opens the LUN detailed properties page.

NB! LUNs can also be compressed and deduplicated!

Let’s try accessing this LUN from Linux. First connect to ZFS:

[root@localhost ~]# iscsiadm -m discovery -t sendtargets -p 192.168.56.98
192.168.56.98:3260,2 iqn.1986-03.com.sun:02:455fe302-6504-6eaf-d478-9b3acf9f4afe
[root@localhost ~]# iscsiadm -m node --login
Logging in to [iface: default, target: iqn.1986-03.com.sun:02:455fe302-6504-6eaf-d478-9b3acf9f4afe, portal: 192.168.56.98,3260] (multiple)
Login to [iface: default, target: iqn.1986-03.com.sun:02:455fe302-6504-6eaf-d478-9b3acf9f4afe, portal: 192.168.56.98,3260] successful.

Now I should have the new LUN mapped to linux:

[root@localhost proc]# cat /proc/scsi/scsi
Attached devices:
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: VBOX     Model: CD-ROM           Rev: 1.0
  Type:   CD-ROM                           ANSI  SCSI revision: 05
Host: scsi2 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: VBOX HARDDISK    Rev: 1.0
  Type:   Direct-Access                    ANSI  SCSI revision: 05
Host: scsi3 Channel: 00 Id: 00 Lun: 00
  Vendor: SUN      Model: Sun Storage 7000 Rev: 1.0
  Type:   Direct-Access                    ANSI  SCSI revision: 05

[root@localhost proc]# cat /proc/partitions
major minor  #blocks  name

  11        0      57620 sr0
   8        0   12582912 sda
   8        1     204800 sda1
   8        2   12377088 sda2
 252        0   11325440 dm-0
 252        1    1048576 dm-1
   8       16   10485760 sdb    

My new device is /dev/sdb. Lets confirm its SCSI ID.

[root@localhost proc]# /usr/lib/udev/scsi_id -g -u /dev/sdb
3600144f09ff1616800005662f2f40001

Matches perfectly with the ID in ZFS management interface GUID column (ignore the first digit).

For our backup strategy we are using incrementally updated image copies on most Oracle databases. This method can save a lot of time during restore operations, since you don’t need to restore full backup and then apply all the incremental backups, you can either restore the full backup directly (or skip restore operation and switch over to the image copy directly). At the same time, taking backups is as easy and fast as taking incremental backups (in Enterprise Edition block change tracking also helps here).

Today I wanted to change the naming scheme of the incrementally updated image copies, in our case the image copies are stored in NFS, not ASM. I expected it to be straign forward… Rename the files, crosscheck, delete expired and then catalog again (like with normal backupsets). After doing that I tried to update the incremental copy and this is what happened.

First this is my current setup, and my goal was to remove the double dbarep1_ from the beginning of the file name I added there myself wth the backup format string.

SQL> select file#, tag, incremental_level, name from v$datafile_copy where deleted='NO' order by 1;

     FILE# TAG                  INCREMENTAL_LEVEL NAME
---------- -------------------- ----------------- -----------------------------------------------------------------------------------
         1 IMAGE_COPY_BACKUP                    0 /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2
         2 IMAGE_COPY_BACKUP                    0 /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc
         3 IMAGE_COPY_BACKUP                    0 /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur
         4 IMAGE_COPY_BACKUP                    0 /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va
         5 IMAGE_COPY_BACKUP                    0 /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd
         7 IMAGE_COPY_BACKUP                    0 /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb

6 rows selected.

Now rename the files to:

data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb
data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc
data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2
data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur
data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va
data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd

After renaming I tried to catalog the files again and all looks good at first.

RMAN> crosscheck datafilecopy all;

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=216 instance=dbarep11 device type=DISK
validation failed for datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2 RECID=4001 STAMP=897265906
validation failed for datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc RECID=4003 STAMP=897265908
validation failed for datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur RECID=4002 STAMP=897265906
validation failed for datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va RECID=4000 STAMP=897265905
validation failed for datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd RECID=3999 STAMP=897265901
validation failed for datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb RECID=4004 STAMP=897265910
Crosschecked 6 objects


RMAN> delete expired datafilecopy all;

released channel: ORA_DISK_1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=216 instance=dbarep11 device type=DISK
List of Datafile Copies
=======================

Key     File S Completion Time Ckp SCN    Ckp Time
------- ---- - --------------- ---------- ---------------
990260  1    X 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2
        Tag: IMAGE_COPY_BACKUP

990262  2    X 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc
        Tag: IMAGE_COPY_BACKUP

990261  3    X 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur
        Tag: IMAGE_COPY_BACKUP

990259  4    X 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va
        Tag: IMAGE_COPY_BACKUP

990258  5    X 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd
        Tag: IMAGE_COPY_BACKUP

990263  7    X 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb
        Tag: IMAGE_COPY_BACKUP

Do you really want to delete the above objects (enter YES or NO)? yes
deleted datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2 RECID=4001 STAMP=897265906
deleted datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc RECID=4003 STAMP=897265908
deleted datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur RECID=4002 STAMP=897265906
deleted datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va RECID=4000 STAMP=897265905
deleted datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd RECID=3999 STAMP=897265901
deleted datafile copy
datafile copy file name=/nfs/backup/dbarep1/dbarep1_data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb RECID=4004 STAMP=897265910
Deleted 6 EXPIRED objects

RMAN> catalog start with '/nfs/backup/dbarep1/data_';

searching for all files that match the pattern /nfs/backup/dbarep1/data_

List of Files Unknown to the Database
=====================================
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2
File Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb

RMAN shows that the datafile copies are nicely registered, with a correct tag:

RMAN> list datafilecopy all;

List of Datafile Copies
=======================

Key     File S Completion Time Ckp SCN    Ckp Time
------- ---- - --------------- ---------- ---------------
991836  1    A 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2
        Tag: IMAGE_COPY_BACKUP

991832  2    A 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc
        Tag: IMAGE_COPY_BACKUP

991834  3    A 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur
        Tag: IMAGE_COPY_BACKUP

991835  4    A 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va
        Tag: IMAGE_COPY_BACKUP

991833  5    A 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd
        Tag: IMAGE_COPY_BACKUP

991837  7    A 01-DEC-15       6146800878535 01-DEC-15
        Name: /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb
        Tag: IMAGE_COPY_BACKUP

And then thinking all is good I tried to refresh that copy, this is what happened:

RMAN> backup incremental level 1 for recover of copy with tag 'image_copy_backup' database;

Starting backup at 01-DEC-15
using channel ORA_DISK_1
no parent backup or copy of datafile 7 found
no parent backup or copy of datafile 2 found
no parent backup or copy of datafile 3 found
no parent backup or copy of datafile 1 found
no parent backup or copy of datafile 4 found
no parent backup or copy of datafile 5 found
channel ORA_DISK_1: starting datafile copy
input datafile file number=00007 name=+DATA/dbarep1/datafile/sash.286.779298095
...

RMAN thinks that there is no image copy to update and tries to create a new full image copy! If you have >20TB databases it is an expensive price to pay ๐Ÿ™‚
Lets query the data dictionary directly to see more information about the datafilecopy that was registered:

SQL> select file#, tag, incremental_level, name from v$datafile_copy where deleted='NO' order by 1;

     FILE# TAG                  INCREMENTAL_LEVEL NAME
---------- -------------------- ----------------- -----------------------------------------------------------------------------------
         1 IMAGE_COPY_BACKUP                      /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2
         2 IMAGE_COPY_BACKUP                      /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc
         3 IMAGE_COPY_BACKUP                      /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur
         4 IMAGE_COPY_BACKUP                      /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va
         5 IMAGE_COPY_BACKUP                      /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd
         7 IMAGE_COPY_BACKUP                      /nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb

6 rows selected.

Incremental level is NULL! Catalog start with command did not register the datafiles as a base of incremental backup. CATALOG START WITH was a wrong command to use. To register datafilecopy properly for incremental updates, there is a separate catalog command: CATALOG DATAFILECOPY ‘filename’ LEVEL 0 TAG ‘tagname’;

First I remove the invalid registrations:

change datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc' uncatalog;
change datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd' uncatalog;
change datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur' uncatalog;
change datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va' uncatalog;
change datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2' uncatalog;
change datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb' uncatalog;

And then register the datafilecopy properly:

catalog datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SASH_FNO-7_r5qjk7sb' level 0 tag 'IMAGE_COPY_BACKUP';
catalog datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSAUX_FNO-2_r6qjk7uc' level 0 tag 'IMAGE_COPY_BACKUP';
catalog datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-SYSTEM_FNO-1_r8qjk7v2' level 0 tag 'IMAGE_COPY_BACKUP';
catalog datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS1_FNO-3_r7qjk7ur' level 0 tag 'IMAGE_COPY_BACKUP';
catalog datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-UNDOTBS2_FNO-4_r9qjk7va' level 0 tag 'IMAGE_COPY_BACKUP';
catalog datafilecopy '/nfs/backup/dbarep1/data_D-DBAREP1_I-1714430310_TS-USERS_FNO-5_raqjk7vd' level 0 tag 'IMAGE_COPY_BACKUP';

After that incremental update worked again.

I recently had an interesting challenge to direct data guard network traffic over a dedicated network interface and not go though the main network routes. The reason was that in order to reach the remote datacenter, the dataguard traffic should be sent via WAN optimizer network, that will automatically compress and dedup all traffic (hard to believe, that there are still some places left in Europe, where internet bandwidth is extremely expensive). I know that Oracle since 11.2 can do redo compression for data guard traffic also, but this requires advanced compression option and also uses database host CPU-s to do the compression, WAN optimizer software licenses are cheaper and usable for other tasks as well.

The following tasks need to be completed:

  1. Configure a new network interface in OS. I’m not going to discuss it here, but it is a prerequisite for any tasks to be done in Oracle.
  2. Configure the new network for Oracle listener. If you are using standalone database and haven’t got some special configuration for you listener, then it should already listen on the new network interface also. RAC needs special attention and it is described in this blog post. Basically you need to configure new set of VIP addresses, create a new network resource in clusterware and configure database instance to register on the new network. SCAN address cannot be created on the additional network.
  3. Configure Data Guard to communicate over the newly created network. The necessary steps are described below and I’m assuming that Data Guard Broker is in use.

My database setup is the following: primary database is RAC (prd_db_rac) and it has two physical standbys, one local (prd) and one remote (prd_wan1).

$ ORACLE_SID=prd dgmgrl /
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected.
DGMGRL> show configuration

Configuration - prd

  Protection Mode: MaxPerformance
  Databases:
    prd_db_rac - Primary database
    prd        - Physical standby database
    prd_wan1   - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Now lets look at the properties of one of the databases.

DGMGRL> show database verbose prd_wan1;

Database - prd_wan1

  Enterprise Manager Name: prd_wan1.prod.example.com
  Role:                    PHYSICAL STANDBY
  Intended State:          APPLY-ON
  Transport Lag:           0 seconds (computed 1 second ago)
  Apply Lag:               0 seconds (computed 0 seconds ago)
  Apply Rate:              3.08 MByte/s
  Real Time Query:         ON
  Instance(s):
    prd

  Properties:
    DGConnectIdentifier             = '//oc1dg2.wan1.example.com/prd_wan1.prod.example.com'
    ObserverConnectIdentifier       = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    MaxConnections                  = '1'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    LogShipping                     = 'ON'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyParallel                   = 'AUTO'
    StandbyFileManagement           = 'AUTO'
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '4'
    LogArchiveMinSucceedDest        = '1'
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    FastStartFailoverTarget         = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    SendQEntries                    = '(monitor)'
    LogXptStatus                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    ApplyLagThreshold               = '0'
    TransportLagThreshold           = '0'
    TransportDisconnectedThreshold  = '30'
    SidName                         = 'prd'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oc1dg2.wan1.example.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=prd_wan1_DGMGRL.prod.example.com)(INSTANCE_NAME=prd)(SERVER=DEDICATED)))'
    StandbyArchiveLocation          = 'USE_DB_RECOVERY_FILE_DEST'
    AlternateLocation               = ''
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = '%t_%s_%r.dbf'
    TopWaitEvents                   = '(monitor)'

Database Status:
SUCCESS

Each database in the broker configuration has a property telling how other databases in the same broker configuration can connect and send redo to it. This parameter is the first on the list – DGConnectIdentifier. All we need to do is change this property to point to an address in the new network. For single instance databases this is easy task, we can use the easy connect syntax, so for my case need to change the physical standby connection string as follows (sb. prefix in host names are located in the new wan optimized network).

DGMGRL> EDIT DATABASE prd SET PROPERTY 'DGConnectIdentifier'='//sb.oc1dg1.prod.example.com/prd.prod.example.com';
Property "DGConnectIdentifier" updated.

DGMGRL> EDIT DATABASE prd_wan1 SET PROPERTY 'DGConnectIdentifier'='//sb.oc1dg2.wan1.example.com/prd_wan1.prod.example.com';
Property "DGConnectIdentifier" updated.

The same property needs to be changed for primary database also, but short SCAN address cannot be created for the secondary network and in broker the property value string cannot be long enough to hold the full RAC connection string for my configuration. So I needed to add the following entry to tnsnames.ora for all Oracle Homes involved in that configuration (both primary and all standbys). Here again, sb part of the DNS name refers to IP on the new network.

PRD_DB_RAC_SB =
  (DESCRIPTION =
    (ADDRESS_LIST=
      (ADDRESS= (PROTOCOL = TCP)(HOST = oc1n01-sb-vip.prod.example.com)(PORT = 1521))
      (ADDRESS= (PROTOCOL = TCP)(HOST = oc1n02-sb-vip.prod.example.com)(PORT = 1521))
      (ADDRESS= (PROTOCOL = TCP)(HOST = oc1n03-sb-vip.prod.example.com)(PORT = 1521))
      (ADDRESS= (PROTOCOL = TCP)(HOST = oc1n04-sb-vip.prod.example.com)(PORT = 1521))
      (FAILOVER = on)
    ) (CONNECT_DATA=
     (SERVICE_NAME = prd_db_rac.prod.example.com)
  ))

Now I can make the same Broker property change for primary RAC database also.

DGMGRL> EDIT DATABASE prd_db_rac SET PROPERTY 'DGConnectIdentifier'='PRD_DB_RAC_SB';
Property "DGConnectIdentifier" updated.

That was all, no need to restart anything, after property change Broker changes automatically the instance parameters itself. We did this change in order to replace the old SAN based replication to remote datacenter with Data Guard and to save on bandwidth we needed to run the Data Guard traffic through separate WAN optimized network. On the average we see that the WAN optimizer compresses Data Guard traffic about 3-4 times and compared to the old SAN replication setup (SAN did compress the change log also internally before sending it to the remote site) we use about 2 times less bandwidth (or actually more, since we were previously limiting SAN replication bandwidth, during the day it always ran on the limit and caught up during the night). Considering how much internet bandwidth costs on some places, the extra Data Guard license needed will pay for itself in surprisingly short time in reduced Internet bandwidth alone (not calculating the extra features it also brings, like Active Data Guard).