Data Guard setup

My end goal is to find out how fast database clients (applications) can resume work (reading and writing) after a Data Guard role change operation – whether it happens as a planned or an unplanned operation. I'm testing different options to find the shortest downtime – the results apply to my environment only, your results will vary.

I’m doing it just to increase my knowledge about different Data Guard scenarios.

My database setup

I'm ONLY interested in the role change operation between instances failtesti1 and failtesti2, which are in the same DC but in different availability zones (0.165 ms average ping). They are equal in every way and are intended to serve application read+write traffic as primary databases. These two instances are in SYNC mode.

During the test I will also throw in a remote standby database failtesti3 that is located a few thousand km away (53ms ping). This remote standby is in ASYNC or ARCH mode. failtesti3 is only a BYSTANDER during all tests.

DGMGRL> show configuration verbose;

Configuration - failtest

  Protection Mode: MaxAvailability
  Members:
  failtesti1 - Primary database
    failtesti2 - (*) Physical standby database 
    failtesti3 - Physical standby database 

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '25'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '1800'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'FAILOVER'
    ConfigurationWideServiceName    = 'QKNPUXMZ_CFG'

Fast-Start Failover: Enabled in Zero Data Loss Mode
  Lag Limit:          30 seconds (not in use)
  Threshold:          25 seconds
  Active Target:      failtesti2
  Potential Targets:  "failtesti2"
    failtesti2 valid
  Observer:           failtest-observer
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: 1800 seconds
  Observer Override:  FALSE

Configuration Status:
SUCCESS

Services

Both the CDB and the PDB have a simple service created, plus a trigger that starts the service when the database opens.

exec dbms_service.create_service('dbauser_failtest','dbauser_failtest');
exec dbms_service.start_service('dbauser_failtest');

CREATE OR REPLACE TRIGGER sys.cdb_start_services AFTER STARTUP OR DB_ROLE_CHANGE ON DATABASE
DECLARE
    v_srv_name varchar2(30):= 'dbauser_failtest';
    cnt NUMBER;
BEGIN
    SELECT count(*) INTO cnt FROM v$active_services WHERE name = v_srv_name;
    IF cnt = 0 THEN
        dbms_service.start_service(v_srv_name);
    END IF;
END;
/

For the PDB, services for application use:

alter session set container=soefail;

-- Let's create a simple service
exec dbms_service.create_service('soe','soe');

-- Now let's create a service with Application Continuity turned on
DECLARE
    new_service_params dbms_service.svc_parameter_array;
    p_srv_name varchar2(20):= 'soe_ac';
BEGIN
    new_service_params('FAILOVER_TYPE'):= 'AUTO';
    new_service_params('REPLAY_INITIATION_TIMEOUT'):= 1800;
    new_service_params('RETENTION_TIMEOUT'):= 86400;
    new_service_params('FAILOVER_DELAY'):= 5;
    new_service_params('FAILOVER_RETRIES'):= 10;
    new_service_params('FAILOVER_RESTORE'):= 'AUTO';
    new_service_params('commit_outcome'):= 'true';
    new_service_params('aq_ha_notifications'):= 'true';
    dbms_service.create_service(p_srv_name, p_srv_name);
    dbms_service.modify_service(p_srv_name, new_service_params);
    dbms_service.start_service(p_srv_name);
END;
/

CREATE OR REPLACE TRIGGER sys.pdb_start_services AFTER STARTUP ON DATABASE 
BEGIN
    dbms_service.start_service('soe');
    dbms_service.start_service('soe_ac');
    dbms_service.start_service('dbauser_soefail');
END;
/

Load swingbench SOE schema

swingbench/bin/oewizard -cl -cs tcp://failtest-1/dbauser_soefail.prod1.dbs -dba dbauser -dbap dbauser -u soe -p soe -scale 1 -create -tc 2 -ts soe

Command to run swingbench test

bin/charbench -cs "(description=(failover=on)(connect_timeout=2)(transport_connect_timeout=1 sec)(address_list=(address=(protocol=tcp)(host=failtest-1)(port=1521))(address=(protocol=tcp)(host=failtest-2)(port=1521)))(connect_data=(service_name=soe.prod1.dbs)))" \
-u soe -p soe -uc 4 -c SOE_Server_Side_V2.xml

During all tests I have swingbench running at around 1600 TPS – which is enough load for my small setup while still keeping the CPUs around 40% idle. I want the databases to be under load, but still with enough free CPU time so that it does not affect the test.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          52.11    0.05    5.96    7.39    0.00   34.49

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vdc             33.00 1506.40      1.68      9.85    66.00     0.00  66.67   0.00    0.95    1.48   2.49    52.07     6.70   0.40  61.38
vdd             38.80 1487.80      1.84      8.83    72.00     0.60  64.98   0.04    0.84    1.42   2.37    48.66     6.08   0.40  61.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          50.33    0.05    4.84    9.33    0.00   35.44

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
vdc             29.80 1388.80      1.41      8.91    23.60     1.20  44.19   0.09    0.84    2.05   3.11    48.32     6.57   0.42  59.42
vdd             34.20 1417.40      1.37      8.55    21.00     0.20  38.04   0.01    0.72    1.82   2.86    41.12     6.18   0.42  60.88

The client

I wrote a simple client in Python that connects to the database with two threads – one that only reads and one that only writes. The program records all errors the client experiences during the test and, if a connection becomes unusable, reconnects to the database. Results are recorded in a CSV file to be analysed later.

Side note: The client is indeed using the new python-oracledb module, but since it is currently not complete, especially in the HA area, I'm using it in thick mode with Instantclient 21.9.

Access the test client code here (github.com)

Example queries for data analytics are also part of the GitHub repository.
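
To give a rough idea of the approach, here is a simplified, hypothetical sketch (not the code in the repository; the DSN, the test_tab table and the CSV layout are made up for this example) of the reader/writer loop:

import csv
import datetime
import threading
import time

import oracledb

# The real client runs python-oracledb in thick mode (see the side note above)
oracledb.init_oracle_client()

# Hypothetical connection details and test table, for illustration only
DSN = "failtest-1/soe.prod1.dbs"
USER = "soe"
PASSWORD = "soe"
RUN_SECONDS = 600

results = []   # rows of (timestamp, thread name, status, error text)
end_time = time.time() + RUN_SECONDS

def worker(name, sql, needs_commit):
    conn = None
    while time.time() < end_time:
        try:
            if conn is None:
                # (Re)connect whenever we have no usable connection
                conn = oracledb.connect(user=USER, password=PASSWORD, dsn=DSN)
            with conn.cursor() as cur:
                cur.execute(sql)
                if needs_commit:
                    conn.commit()
            results.append((datetime.datetime.now().isoformat(), name, "OK", ""))
        except oracledb.Error as err:
            # Record the error and drop the connection so the next loop iteration reconnects
            results.append((datetime.datetime.now().isoformat(), name, "ERROR", str(err)))
            try:
                if conn is not None:
                    conn.close()
            except oracledb.Error:
                pass
            conn = None
        time.sleep(0.1)

threads = [
    threading.Thread(target=worker, args=("reader", "SELECT COUNT(*) FROM test_tab", False)),
    threading.Thread(target=worker, args=("writer", "INSERT INTO test_tab (ts) VALUES (SYSTIMESTAMP)", True)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Write the results out for later analysis
with open("results.csv", "w", newline="") as f:
    csv.writer(f).writerows(results)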

The plan

The next post will be about switchover… then about failover. Maybe more, who knows.

Downgrading the UEKr7 kernel to UEKr6 on Oracle Linux 8

I noticed that the latest Oracle Linux 8 U7 Cloud Image from https://yum.oracle.com/oracle-linux-templates.html comes with the UEKr7 kernel (5.15.0) by default. But this kernel is not yet supported by Oracle Database 19c, and it is also missing, for example, the ceph filesystem kernel module.

So here are instructions on how to downgrade the kernel on OEL8.

First enable the UEKr6 yum repository and disable the UEKr7 repository. I'm not going to write these instructions, since the copy I have uses our own internal repositories and not the public ones.

After that execute:

# Update /etc/sysconfig/kernel and set:
DEFAULTKERNEL=kernel-uek

# Install old UEKr6 kernel
dnf install kernel-uek
# Remove the new UEKr7 kernel
rpm -e kernel-uek-core
# Regenerate GRUB config
grub2-mkconfig -o /boot/grub2/grub.cfg

Relinking Oracle binaries silently disables Direct NFS

If you use Oracle database on NFS, make sure you use Oracle Direct NFS (DNFS), which is an NFS client built into the Oracle database kernel. It makes it possible for the database software to access files on an NFS share without calling the operating system kernel. The performance improvement is significant!

I use NFS a lot for Oracle databases, because it is so much easier to manage than any block-device-based system, especially when using Oracle RAC or when you need to clone the database.

And then one day I noticed that some Data Guard standby databases – rather large and busy ones – started to lag, more and more. Confusion started: why? The same standby in another datacenter, on the same type of storage, was completely fine, and this system had also been fine up until last week. What had changed? The long confusion part is not important… until I realised that on the system that was working fine, the Linux command nfsiostat (which displays kernel-level NFS statistics) showed hardly any traffic at all, while on the problematic system nfsiostat displayed very busy NFS traffic. The aha moment – DNFS! The Linux kernel should not be able to see DNFS traffic at all.

When doing major OS upgrades, you should afterwards also relink your Oracle database binaries:

Relinking Oracle Home FAQ ( Frequently Asked Questions) (Doc ID 1467060.1)

The note above even states that you should relink after every OS patch… so I just included relinking in the playbook that replaces the database VM OS root disk. But since DNFS (and probably other similar features too, like Unified Auditing?) needs to be linked into the Oracle kernel separately, executing "relink all" silently disables DNFS.

Let's test. I'm using Oracle RDBMS 19.15.2, single instance. First I have DNFS enabled and working:

SQL> select id, svrname, dirname, nfsversion from v$dnfs_servers;

        ID SVRNAME    DIRNAME         NFSVERSION
---------- ---------- --------------- ----------------
         1 10.122.9.9 /pt101          NFSv3.0

Then I shut down the database and execute relink all.

SQL> shu immediate
oracle$ lsnrctl stop
oracle$ umask 022 && $ORACLE_HOME/bin/relink all
root$ $ORACLE_HOME/root.sh

Let's start up and see what happened.

SQL> startup mount
Database mounted.

SQL> alter database open;

Database altered.

SQL> select id, svrname, dirname, nfsversion from v$dnfs_servers;

no rows selected

And no DNFS! The alert log is also missing the DNFS message:

Oracle instance running with ODM: Oracle Direct NFS ODM Library Version 6.0

Let's shut it down and link DNFS back in.

SQL> shu immediate

oracle$ cd $ORACLE_HOME/rdbms/lib && make -f ins_rdbms.mk dnfs_on
root$ $ORACLE_HOME/root.sh

SQL> startup mount
SQL> select id, svrname, dirname, nfsversion from v$dnfs_servers;

        ID SVRNAME    DIRNAME         NFSVERSION
---------- ---------- --------------- ----------------
         1 10.122.9.9 /pt101          NFSv3.0

And I have my precious DNFS back. So keep this in mind when you relink Oracle home binaries. It would be good to test whether the same problem also affects Unified Auditing.

Connecting to Oracle Autonomous Database with the python-oracledb thin driver

Oracle recently released a thin driver for Python, meaning it does not require any Oracle client installation (like cx_Oracle does) and it is written natively in Python – one benefit of which is that it works on all platforms where Python is present, including ARM64.

You can read more about it here: https://levelup.gitconnected.com/open-source-python-thin-driver-for-oracle-database-e82aac7ecf5a

Interestingly, it is not yet trivial to find examples of how to use the new thin driver to connect to Oracle Autonomous Database.

By default the Autonomous database requires mTLS connections, so first create and download the Instance wallet for your Autonomous database (a zip file) and remember the password you set. Unzip it and transfer ewallet.pem and tnsnames.ora to your Python host.

# Create directory for wallet files
$ mkdir wallet
$ pwd
/home/ilmar/wallet

# Transport ewallet.pem and tnsnames.ora to that directory
$ ls /home/ilmar/wallet/
ewallet.pem
tnsnames.ora

If you have not yet done so, install the Python thin driver.

$ pip3.9 install --user oracledb
Collecting oracledb
  Downloading oracledb-1.1.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.7 MB)
     |████████████████████████████████| 6.7 MB 24.1 MB/s 
Collecting cryptography>=3.2.1
  Downloading cryptography-38.0.1-cp36-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.manylinux_2_24_aarch64.whl (3.7 MB)
     |████████████████████████████████| 3.7 MB 34.9 MB/s 
Collecting cffi>=1.12
  Downloading cffi-1.15.1-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (448 kB)
     |████████████████████████████████| 448 kB 90.9 MB/s 
Collecting pycparser
  Downloading pycparser-2.21-py2.py3-none-any.whl (118 kB)
     |████████████████████████████████| 118 kB 119.8 MB/s 
Installing collected packages: pycparser, cffi, cryptography, oracledb
Successfully installed cffi-1.15.1 cryptography-38.0.1 oracledb-1.1.1 pycparser-2.21

Connect to the Autonomous database from Python code, connect.py:

import oracledb, os

db = oracledb.connect(
       user=os.environ['ORACLE_USER'],
       password=os.environ['ORACLE_PASSWORD'],
       dsn=os.environ['ORACLE_TNS'],
       config_dir="/home/ilmar/wallet",
       wallet_location="/home/ilmar/wallet",
       wallet_password="wallet_password_here")

with db.cursor() as c:
    for row in c.execute("SELECT owner, object_name FROM all_objects FETCH FIRST 10 ROWS ONLY"):
        print(row)

db.close()

Execute

export ORACLE_USER=my_database_username
export ORACLE_PASSWORD=my_database_password
export ORACLE_TNS=connection_identifier_from_tnsnames.ora

$ python3.9 connect.py 
('SYS', 'ORA$BASE')
('SYS', 'DUAL')
('PUBLIC', 'DUAL')
('PUBLIC', 'MAP_OBJECT')
('SYS', 'SYSTEM_PRIVILEGE_MAP')
('SYS', 'I_SYSTEM_PRIVILEGE_MAP')
('PUBLIC', 'SYSTEM_PRIVILEGE_MAP')
('SYS', 'TABLE_PRIVILEGE_MAP')
('SYS', 'I_TABLE_PRIVILEGE_MAP')
('PUBLIC', 'TABLE_PRIVILEGE_MAP')

Here I used an Oracle Linux 8 server running on Ampere (ARM64) in the Oracle Cloud Always Free Tier. No Oracle database client or Instantclient was installed.

Creating an Oracle wallet from an existing TLS private key and certificate

Let's say your company has an automated system that propagates and renews TLS certificates for each server automatically, for example using NDES, and you want to use the same files also for your Oracle database TCPS protocol connections.

For Oracle database, the database server user certificate needs to be placed inside an Oracle wallet. The default workflow for an Oracle wallet is that you create the private key and the certificate request inside the wallet, use the exported certificate request to request a certificate from your certificate authority, and then import the issued certificate back into the wallet.

But here you already have the issued certificate and the private key as separate files, the way many popular open-source tools like them. It is possible to create an Oracle wallet from these.

We have two files.

  • Private key – /etc/pki/tls/private/servername.key
  • Issued certificate – /etc/pki/tls/private/servername.crt

We also need the certificates of the certificate authorities that issued your server certificate (all of them, the full chain). You can usually download them from your CA's webpage, or there might also be a URL embedded in the certificate itself.

I'm using the orapki executable from a 19c installation.

First create an empty Oracle wallet.

# orapki wallet create -wallet /oracle/wallet/location -pwd oracle_wallet_password

Then add all the certificate authority certificates to the wallet

# You can check whether the certificate has links to its issuer certificates embedded. Repeat this for each certificate in the chain until there is no output (you have reached the root), downloading each file along the way. Usually there are about 2-3 of them.
# openssl x509 -in /etc/pki/tls/private/servername.crt -inform pem -noout -issuer -ext authorityInfoAccess

# orapki wallet add -wallet /oracle/wallet/location -pwd oracle_wallet_password -trusted_cert -cert /tmp/ca_certificate_intermediate.cer
# orapki wallet add -wallet /oracle/wallet/location -pwd oracle_wallet_password -trusted_cert -cert /tmp/ca_certificate_root.cer

A quick sanity check: the server certificate should have the TLS Web Server Authentication extended key usage set. If it is not set, ask your CA to reissue the certificate. Without it your clients might get an error like this: IO error: extended key usage does not permit use for TLS server authentication

# openssl x509 -ext extendedKeyUsage -in /etc/pki/tls/private/servername.crt -noout
X509v3 Extended Key Usage: 
    TLS Web Client Authentication, TLS Web Server Authentication

If the private key is unencrypted, you first need to encrypt it in PKCS#8 format into a separate temporary file, because orapki cannot import an unencrypted private key.

# Check first if the key is unencrypted
# head -1 /etc/pki/tls/private/servername.key
-----BEGIN PRIVATE KEY-----

# export wallet_build_pass="long_key_encryption_password"
# openssl pkcs8 -topk8 -in /etc/pki/tls/private/servername.key -out /tmp/encrypted.key -passout env:wallet_build_pass

# Let's check that the resulting file contains an encrypted key
# head -1 /tmp/encrypted.key
-----BEGIN ENCRYPTED PRIVATE KEY-----

Now you can import the private key and certificate into Oracle wallet.

# orapki wallet import_private_key -wallet /oracle/wallet/location -pwd oracle_wallet_password -pvtkeyfile /tmp/encrypted.key -pvtkeypwd long_key_encryption_password -cert /etc/pki/tls/private/servername.crt

Code to automatically renew the Oracle wallet. NB! Read and modify the code according to your needs.
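
As a rough illustration of the idea only (this is a hypothetical sketch, not the renewal code referred to above; the paths, the rebuild trigger and the password handling are assumptions), the automation essentially just repeats the orapki steps from this post whenever the certificate file is newer than the wallet:

# Hypothetical sketch: rebuild the Oracle wallet when the TLS certificate has been renewed.
# Paths, the CA certificate list and the password handling are assumptions for illustration only.
import os
import shutil
import subprocess
import tempfile

KEY = "/etc/pki/tls/private/servername.key"
CERT = "/etc/pki/tls/private/servername.crt"
CA_CERTS = ["/tmp/ca_certificate_intermediate.cer", "/tmp/ca_certificate_root.cer"]
WALLET_DIR = "/oracle/wallet/location"
WALLET_PWD = os.environ["WALLET_PWD"]   # avoid hardcoding the wallet password

def orapki(*args):
    # Small helper around the orapki command line tool
    subprocess.run(["orapki", *args], check=True)

wallet_file = os.path.join(WALLET_DIR, "ewallet.p12")
if not os.path.exists(wallet_file) or os.path.getmtime(CERT) > os.path.getmtime(wallet_file):
    # The certificate is newer than the wallet: recreate the wallet from scratch
    shutil.rmtree(WALLET_DIR, ignore_errors=True)
    os.makedirs(WALLET_DIR, exist_ok=True)
    orapki("wallet", "create", "-wallet", WALLET_DIR, "-pwd", WALLET_PWD)
    for ca in CA_CERTS:
        orapki("wallet", "add", "-wallet", WALLET_DIR, "-pwd", WALLET_PWD,
               "-trusted_cert", "-cert", ca)
    # orapki cannot import an unencrypted key, so encrypt it into a temporary PKCS#8 file first
    with tempfile.NamedTemporaryFile(suffix=".key") as tmp:
        subprocess.run(["openssl", "pkcs8", "-topk8", "-in", KEY, "-out", tmp.name,
                        "-passout", "env:WALLET_PWD"], check=True)
        orapki("wallet", "import_private_key", "-wallet", WALLET_DIR, "-pwd", WALLET_PWD,
               "-pvtkeyfile", tmp.name, "-pvtkeypwd", WALLET_PWD, "-cert", CERT)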