Ilmar Kerm

Oracle, databases, Linux and maybe more

I published my 2014 presentation “Making MySQL highly available using Oracle Grid Infrastructure” in Slideshare.
Please also read my page how to set up the mysql scripts for Oracle GI

I’m currently involved in a project where we are replacing one company’s entire hardware platform. They also have Oracle database (that we cannot upgrade right now) and what is really unusual for me, is that this Oracle database runs under Windows (and we cannot migrate to another platform). We also decided to use Oracle Grid Infrastructure (aka Oracle Clusterware) to implement active-passive standby server for this database. Why? Because Windows Cluster was not an option, we didn’t have RAC licenses, Oracle Clusterware is free (if you are protecting Oracle software or running on Oracle OS) and we have really good previous experience with it under Linux.
For more information on how I’ve used Oracle Grid Infrastructure to provide high availability for MySQL (or any other application), check out my page MySQL HA with Oracle Clusterware.

When I started testing this it was quite surprising that I didn’t find any Oracle Clusterware action script examples for Windows in Oracle documentation or even on google 🙂 Oracle documentation just refers that in Windows the action script has to be a batch script.

This is the example action script I came up with to manage stand alone Oracle database. It uses oradim to start and stop the database instance dbgp and a small sqlplus script to check if the database instance is alive. Tested using Oracle Grid Infrastructure under Windows 2008R2.
I named the action script: d:\scripts\dbgp.cmd

@echo off
@setlocal enableextensions enabledelayedexpansion
set action=%~1
set ORACLE_HOME=D:\app\oracle\product\\db
set ORACLE_SID=dbgp
set TEMPFILE=d:\scripts\script.out
set CHECKSCRIPT=d:\scripts\check.sql
set OUTP=

if exist %TEMPFILE% (
  del %TEMPFILE%

if "%action%" == "start" goto :start
if "%action%" == "stop" goto :stop
if "%action%" == "check" goto :check
if "%action%" == "clean" goto :clean
goto :exit

  %ORACLE_HOME%\bin\oradim -startup -sid %ORACLE_SID% -starttype srvc,inst > %TEMPFILE%
  call :setsize "%TEMPFILE%"
  if %size% gtr 0 call :checkoutput
  if exist %TEMPFILE% type %TEMPFILE%
  goto :exit

  %ORACLE_HOME%\bin\oradim -shutdown -sid %ORACLE_SID% -shuttype srvc,inst -shutmode immediate > %TEMPFILE%
  call :setsize "%TEMPFILE%"
  if %size% gtr 0 call :checkoutput
  if exist %TEMPFILE% type %TEMPFILE%
  goto :exit

  %ORACLE_HOME%\bin\sqlplus /nolog @%CHECKSCRIPT% > %TEMPFILE%
  goto :exit

  %ORACLE_HOME%\bin\oradim -shutdown -sid %ORACLE_SID% -shuttype srvc,inst -shutmode abort
  goto :exit

  set size=%~z1
  goto :eof

  set /p OUTP=<%TEMPFILE%
  if not "x%OUTP:DIM-=%" == "x%OUTP%" set EXITCODE=1
  if not "x%OUTP:ORA-=%" == "x%OUTP%" set EXITCODE=1
  goto :eof

  set OUTP=
  if exist sqlnet.log del sqlnet.log
  exit /b %EXITCODE%

Just for completeness, this script refers to d:\scripts\check.sql that is just used to run a quick database healt check, here is its contents:

whenever sqlerror exit failure
conn / as sysdba
select 1 from dual;

WINDOWS SPECIFIC ONE TIME OPERATION: Before clusterware can execute the action script in Windows, you need to create OracleCRSToken_username service for the OS user who is executing the script. In my setup both Oracle Clusterware, managed database and the action script are executed by the local user WINRAC1\oracle and on the second node as WINRAC2\oracle. It is actually easier if you use domain user, please check the referred note.
Reference: Windows: How to Modify OS User Privileges for 11gR2 Grid Infrastructure and RAC Services (Needed for Backup To Network Shares) (Doc ID 1339053.1) steps 2 and 3.

set ORACLE_HOME=d:\app\\grid
%ORACLE_HOME%\bin\crsuser add winrac1\oracle

.. it will show errors ..
.. but repeat the command on other node also ..

set ORACLE_HOME=d:\app\\grid
%ORACLE_HOME%\bin\crsuser add winrac2\oracle

After that need to open services.msc and edit service OracleCRSToken_oracle. First set its startup type to Automatic and then on Log On As tab also set the oracle user password. After that start service OracleCRSToken_oracle and repeat these steps on all cluster nodes.

ADDING THE RESOURCE TO CLUSTERWARE: Adding the resource to cluster is the same as under Linux:

set ORACLE_HOME=d:\app\\grid
%ORACLE_HOME%\bin\crsctl add resource oradb_dbgp -type cluster_resource -attr "ACTION_SCRIPT=d:\scripts\dbgp.cmd, CHECK_INTERVAL=60, RESTART_ATTEMPTS=2, PLACEMENT=favored, HOSTING_MEMBERS=winrac1"

When instance fails in Oracle RAC, the services that were using this instance as a preferred instance are automatically relocated to instances marked as available for this service. But after the failed instance recovers and starts up again, the relocated services are not moved back and need manual srvctl relocate service command from administrator to move them back.

Here is a little Bash script to automate this process. Oracle Clusterware (Grid Infrastructure) can execute user callout scripts on FAN events, like INSTANCE up/down. Place this script under $GRID_HOME/racg/usrco/ and set the execute bits on the file. Then clusterware will execute that script for all FAN events, but the script will start processing only for instance up event.

Why is it needed? We just switched over to 4-node RAC consisting of many different applications, almost each of them connecting to its own schema. We created each application its own service restricting it to 1 (or max 2) nodes (1 or 2 nodes as preferred, all other nodes listed as available). After the first rolling patching, I noticed that the connection count and load on each node was very unbalanced, vast majority of the connections were connected to node1 and the last patched node had almost none and it did not get better over a few hours. This was because most of the services ended up on node1 and I had to manually look over each service and relocate it back where it belongs. This script attempts to automate this process.

Please use the download link for downloading, seems that blogger escapes some characters in the code below.

# GI callout script to catch INSTANCE up event from clusterware and relocate services to preferred instance
# Copy or symlink this script to $GRID_HOME/racg/usrco
# Tested on Oracle Linux 5.8 with Oracle Grid Infrastructure and & Oracle Database Enterprise Edition
# 2012 Ilmar Kerm

SCRIPTDIR=`dirname $0`

# Determine grid home
if [[ "${SCRIPTDIR:(-11)}" == "/racg/usrco" ]]; then
  CRS_HOME=""${SCRIPTDIR:0:$(( ${#SCRIPTDIR} - 11 ))}""
  export CRS_HOME

# Only execute script for INSTANCE events
if [ "$1" != "INSTANCE" ]; then
  exit 0


# Parse input arguments
for arg in ${args[@]}; do
  if [[ "$arg" == *=* ]]; then
    case "$KEY" in

# If database, status and instance values are not set, then exit
# status must be up
if [[ -z "$DATABASE" || -z "$INSTANCE" || "$STATUS" != "up" ]]; then
  exit 0

echo "`date`" >> "$LOGFILE"
echo "[$DATABASE][`hostname`] Instance $INSTANCE up" >> "$LOGFILE"

# Read database software home directory from clusterware
DBCONFIG=`$CRS_HOME/bin/crsctl status res ora.$DATABASE.db -f | grep "ORACLE_HOME="`
if [ -z "$DBCONFIG" ]; then
  exit 0
declare -r "$DBCONFIG"

# Array function
in_array() {
    local hay needle=$1
    for hay; do
        [[ $hay == $needle ]] && return 0
    return 1

# Read information about services
for service in `$CRS_HOME/bin/crsctl status res | grep -E "ora.$DATABASE.(.+).svc" | sed -rne "s/NAME=ora.$DATABASE.(.+).svc/1/gip"`; do
  SERVICECONFIG=`$ORACLE_HOME/bin/srvctl config service -d $DATABASE -s $service`
  echo "Service $service" >> "$LOGFILE"
  if [[ `echo "$SERVICECONFIG" | grep "Service is enabled" | wc -l` -eq 1 ]]; then
    echo " enabled" >> "$LOGFILE"
    PREFERRED=( `echo "$SERVICECONFIG" | grep "Preferred instances:" | sed -rne "s/.*: ([a-zA-Z0-9]+)/1/p" | tr "," "n"` )
    # Check if current instance is preferred for this service
    if in_array "$INSTANCE" "${PREFERRED[@]}" ; then
      echo " preferred" >> "$LOGFILE"
      # Check if service is already running on current instance
      SRVSTATUS=`$ORACLE_HOME/bin/srvctl status service -d $DATABASE -s $service`
      if [[ "$SRVSTATUS" == *"is not running"* ]]; then
          # if service is not running, then start it
        echo " service stopped, starting" >> "$LOGFILE"
        $ORACLE_HOME/bin/srvctl start service -d "$DATABASE" -s "$service" >> "$LOGFILE"
        # Service is running, but is it running on preferred instance?
        RUNNING=( `echo "$SRVSTATUS" | sed -rne "s/.* ([a-zA-Z0-9]+)/1/p" | tr "," "n"` )
        echo "${RUNNING[@]} = ${PREFERRED[@]}"
        if ! in_array "$INSTANCE" "${RUNNING[@]}" ; then
          echo " not running on preferred $INSTANCE" >> "$LOGFILE"
          # Find the first non-preferred running instance
          for inst in "${RUNNING[@]}"; do
            if ! in_array "$inst" "${PREFERRED[@]}" ; then
          # Relocate
          if [[ -n "$CURRENT" ]]; then
            echo " relocate $CURRENT -> $INSTANCE" >> "$LOGFILE"
            $ORACLE_HOME/bin/srvctl relocate service -d "$DATABASE" -s "$service" -i "$CURRENT" -t "$INSTANCE" >> "$LOGFILE"
          # Service is already running on preferred instance, no need to do anything
          echo " running on preferred $INSTANCE" >> "$LOGFILE"


Tested on Oracle Linux 5.8 with Oracle Grid Infrastructure and Oracle Database and

I’ve been doing lately quite many database clustering implementations; Oracle RAC and since we have many MySQL instances in production, had to find a good way to make MySQL instances highly available also.

One good solution for this is managing MySQL instances with clusterware and since we are planning to use Oracle RAC on Oracle Enterprise Linux anyway, then Oracle Clusterware is an excellent candidate for this task. Also… Oracle Clusterware is included with Oracle Enterprise Linux at no additional charge.

Requirements I had:

  • Multiple MySQL instances running in the same cluster, in case of node failure affected MySQL instances are moved to any other surviving node (least loaded)
  • Different MySQL instances may run different MySQL RDBMS software versions
  • Each MySQL instance is listening to its own dedicated and clusterware managed IP address (so MySQL can always respond on default 3306 port and can move independently around nodes without any dependencies to other MySQL instances)
  • Clusterware monitors each MySQL instance and in case of trouble automatically moves IP and starts instance on another cluster node

For this task I wrote some Bash scripts to help managing MySQL instances in Oracle Clusterware environment. All these scripts are available here: Google Code project page, also with documentation of the whole setup and how to use scripts: Documentation

All comments welcome!