Ilmar Kerm

Oracle, databases, Linux and maybe more

When an instance fails in Oracle RAC, the services that were using it as a preferred instance are automatically relocated to instances marked as available for those services. But after the failed instance recovers and starts up again, the relocated services are not moved back; an administrator has to move them back manually with the srvctl relocate service command.

Here is a little Bash script to automate this process. Oracle Clusterware (Grid Infrastructure) can execute user callout scripts on FAN events, such as INSTANCE up/down. Place this script under $GRID_HOME/racg/usrco/ and set the execute bits on the file. Clusterware will then execute the script for every FAN event, but the script only acts on instance up events.
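As a rough illustration only (this is not the author's script), a callout can filter FAN events by inspecting its arguments. The sketch below assumes the event type arrives as the first argument followed by key=value pairs, modeled on the invocation example given in the comments; the function and variable names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: split FAN callout arguments into shell variables.
# Assumes the form: EVENT_TYPE key=value key=value ...
parse_fan_event() {
    FAN_EVENT_TYPE="${1:-}"
    [[ $# -gt 0 ]] && shift
    FAN_STATUS="" FAN_INSTANCE="" FAN_DATABASE=""
    local arg
    for arg in "$@"; do
        case "$arg" in
            status=*)   FAN_STATUS="${arg#status=}" ;;
            instance=*) FAN_INSTANCE="${arg#instance=}" ;;
            database=*) FAN_DATABASE="${arg#database=}" ;;
        esac
    done
}

# Succeeds only for an INSTANCE event whose status is "up".
is_instance_up_event() {
    parse_fan_event "$@"
    [[ "$FAN_EVENT_TYPE" == "INSTANCE" && "$FAN_STATUS" == "up" ]]
}

main() {
    # Every other FAN event is ignored silently.
    if is_instance_up_event "$@"; then
        echo "Instance $FAN_INSTANCE of database $FAN_DATABASE is up"
    fi
}

main "$@"
```

Called as `./callout.sh INSTANCE status=up instance=db1 database=db` (the file name is a placeholder), this prints one line; for any other event it prints nothing.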

Why is it needed? We just switched over to a 4-node RAC hosting many different applications, almost each of them connecting to its own schema. We created a separate service for each application and restricted it to 1 (or at most 2) nodes (1 or 2 nodes as preferred, all other nodes listed as available). After the first rolling patching, I noticed that the connection count and load on the nodes were very unbalanced: the vast majority of connections were on node1, the last patched node had almost none, and the situation did not improve over a few hours. This was because most of the services had ended up on node1, and I had to manually go over each service and relocate it back where it belongs. This script attempts to automate this process.
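For reference, the manual step looks roughly like this. The sketch only prints the command instead of running it, and it assumes the 11.2-style short options (-d database, -s service, -i old instance, -t new instance); option names differ in later releases, so check `srvctl relocate service -h` on your system. The function name and all argument values are placeholders.

```bash
#!/bin/bash
# Build (but do not run) the 11.2-style srvctl command that moves a service
# from one instance to another. All argument values are placeholders.
build_relocate_cmd() {
    local db="$1" service="$2" from_inst="$3" to_inst="$4"
    printf 'srvctl relocate service -d %s -s %s -i %s -t %s\n' \
        "$db" "$service" "$from_inst" "$to_inst"
}

# Dry run: print the command for review instead of executing it.
build_relocate_cmd db app1_svc db1 db2
```

Repeating this by hand for every misplaced service after each patching round is the tedious part that the callout is meant to remove.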

Tested on Oracle Linux 5.8 with Oracle Grid Infrastructure 11.2.0.3 and Oracle Database 11.2.0.2 and 11.2.0.3.

16 comments

  1. NandanDas K says:

    Thanks Ilmar, the script worked like a charm !

  2. jayasudha says:

    Hi Ilmar,

    Could you please let me know how I need to run this script to test it? I ran it as a shell script and it doesn't work. I have placed the script in $CRS_HOME/racg/usrco.

    ksh -x service_callout.sh
    + LOGFILE=/tmp/grid_callout.txt
    + + dirname service_callout.sh
    SCRIPTDIR=.
    service_callout.sh[11]: "${SCRIPTDIR:(-11)}": bad substitution

  3. Ilmar Kerm says:

    This is a Bash script, not a ksh script, and you also need to set the executable bits on the file:
    chmod a+x $CRS_HOME/racg/usrco/relocate_services_callout.sh

    Then you can just execute $CRS_HOME/racg/usrco/relocate_services_callout.sh with the parameters you need.

  4. Can you please give a detailed example of which parameters to provide to your script for service relocation?

  5. jayasudha says:

    Please give an example of which parameters to provide to your script for service relocation.

  6. Ilmar Kerm says:

    The parameters are defined by Grid Infrastructure. For example, to simulate an instance up event for instance "db1" in database "db", execute:

    $CRS_HOME/racg/usrco/relocate_services_callout.sh INSTANCE status=up instance=db1 database=db

  7. Hi,
    Thanks a lot.

    Do you have a similar script for 10g?

  8. Ilmar Kerm says:

    I have not tested this script on 10g.

  9. Patrick Santucci says:

    I’m having a couple of issues with this script and hoping you can help.

    I had to change this line:

    if [[ `echo "$SERVICECONFIG" | grep "Service is enabled" | wc -l` -eq 1 ]]; then

    to this:

    if [[ `echo "$SERVICECONFIG" | grep "^Service is enabled$" | wc -l` -eq 1 ]]; then

    because it was returning false every time.

    $ echo "$SERVICECONFIG" | grep "Service is enabled" | wc -l
    2
    $ echo "$SERVICECONFIG" | grep "Service is enabled"
    Service is enabled
    Service is enabled on instances: BDUAT1,BDUAT2
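
    A minimal, self-contained sketch of the difference, using sample text modeled on the two lines above rather than real srvctl output. It uses grep -x (match whole lines only) as an equivalent of anchoring the pattern with ^ and $, and grep -c in place of wc -l:

    ```bash
    # Sample text modeled on the two lines shown above; not real srvctl output.
    SERVICECONFIG="$(printf '%s\n' 'Service is enabled' 'Service is enabled on instances: BDUAT1,BDUAT2')"

    # Unanchored match: both lines contain the phrase, so the count is 2.
    loose_count=$(printf '%s\n' "$SERVICECONFIG" | grep -c 'Service is enabled')

    # -x matches whole lines only, so just the first line counts.
    exact_count=$(printf '%s\n' "$SERVICECONFIG" | grep -c -x 'Service is enabled')

    echo "loose=$loose_count exact=$exact_count"   # prints: loose=2 exact=1
    ```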

    Next, I think there’s a problem with the logic flow. I’m going to use line numbers because it’s easier.
    On line 91 you check if the current instance is in the list of preferred instances.
    If so, you check to see if the service is started (line 97).
    If it’s not started, you start it (line 102). Otherwise, you check to see if it’s not running on a preferred instance (line 109). The problem here is that you’re still inside the if logic from line 91. If line 91 evaluates as FALSE, everything down to line 135 will be skipped, including the check on line 109.

    Here’s an example. I have two services for database bduat_dg. Service bduats is running on its preferred instance BDUAT1; service bduat_maint has preferred instance BDUAT1 but is running on BDUAT2. When I run your script, here is the output I get in the log file:

    Fri Oct 6 14:04:28 EDT 2017
    [bduat_dg][hq-xdb01.intl] Instance BDUAT1 up
    ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_3
    Service bduat_maint
    enabled
    Service bduats
    enabled
    preferred
    running on preferred BDUAT1

    As you can see, it does not take any action on the bduat_maint service, or even list where it is running, just that it is enabled. Not sure what the best way is to fix this.

    1. Patrick Santucci says:

      Sorry, my line numbers were off by 1. Corrected below:

      On line 90 you check if the current instance is in the list of preferred instances.
      If so, you check to see if the service is started (line 96).
      If it’s not started, you start it (line 101). Otherwise, you check to see if it’s not running on a preferred instance (line 108). The problem here is that you’re still inside the if logic from line 90. If line 90 evaluates as FALSE, everything down to line 134 will be skipped, including the check on line 108.

    2. ilmarkerm says:

      Thanks for the comment. I just noticed that the version on this page was out of date and I have already fixed the first issue you mentioned in the script I’m using in production 🙂
      But I don’t really follow the second issue… The check on line 89 (new version) verifies that the instance that was just brought up is listed as a preferred instance for that service; if it is not, then nothing should be done with that service. The callout script should not work cluster-wide; it should work on each node separately, since each clusterware instance calls the script itself.
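
      The per-node check described above can be sketched as follows. This is only an illustration, not the production script: the function name is hypothetical, and it assumes the service configuration text contains a line starting with "Preferred instances:" followed by a comma-separated list. The sample text is made up from the names used earlier in this thread.

      ```bash
      # Hypothetical helper: succeed if the given instance appears in the
      # "Preferred instances:" line of the service configuration text.
      is_preferred_instance() {
          local config="$1" instance="$2" line list
          while IFS= read -r line; do
              if [[ "$line" == "Preferred instances:"* ]]; then
                  list="${line#Preferred instances:}"
                  list="${list//[[:space:]]/}"
                  # Wrap both sides in commas so BDUAT1 does not match BDUAT10.
                  [[ ",$list," == *",$instance,"* ]] && return 0
              fi
          done <<< "$config"
          return 1
      }

      # Made-up sample configuration, not captured srvctl output.
      sample_config="$(printf '%s\n' \
          'Service name: bduat_maint' \
          'Service is enabled' \
          'Preferred instances: BDUAT1' \
          'Available instances: BDUAT2')"

      for inst in BDUAT1 BDUAT2; do
          if is_preferred_instance "$sample_config" "$inst"; then
              echo "$inst: preferred, service would be checked"
          else
              echo "$inst: not preferred, service left alone"
          fi
      done
      ```

      With this logic, a callout fired by BDUAT2 coming up skips a service whose only preferred instance is BDUAT1, and the callout fired on BDUAT1's node is the one that handles it.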

  10. Egeil Sanderson says:

    Dear Ilmar,
    Where can I find the script? Is it not available anymore?

    Regards,
    Egeil

      1. Egeil Sanderson says:

        Hi,
        Thanks. I was wondering where it had gone until I realized that my enterprise would block clear-cut scripts.

        Regards,
        Egeil

  11. sam says:

    Hi ,

    Does this work if the cluster has multiple databases running?

    1. ilmarkerm says:

      yes it does
