Script to automatically move RAC 11gR2 services back to preferred instances
- Written by: ilmarkerm
- Category: Blog entry
- Published: May 10, 2012
When an instance fails in Oracle RAC, the services that were using it as a preferred instance are automatically relocated to instances marked as available for those services. But after the failed instance recovers and starts up again, the relocated services are not moved back; an administrator has to run the srvctl relocate service command manually to move them back.
Here is a little Bash script to automate this process. Oracle Clusterware (Grid Infrastructure) can execute user callout scripts on FAN events, such as INSTANCE up/down. Place this script under $GRID_HOME/racg/usrco/ and set the execute bits on the file. Clusterware will then execute the script for all FAN events, but the script only does any processing for the instance up event.
Why is it needed? We just switched over to a 4-node RAC hosting many different applications, almost each of them connecting to its own schema. We created a separate service for each application and restricted it to 1 (or at most 2) nodes (1 or 2 nodes as preferred, all other nodes listed as available). After the first rolling patching, I noticed that the connection count and load were very unbalanced across the nodes: the vast majority of the connections were on node1, the last patched node had almost none, and it did not get better over a few hours. This was because most of the services had ended up on node1, and I had to manually look over each service and relocate it back where it belongs. This script attempts to automate this process.
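To illustrate the "execute for all events, act only on instance up" idea, here is a minimal sketch of how a callout could filter its arguments. It assumes the argument layout used in the test invocation further down this page (event type first, then key=value pairs); the names parse_fan_event, is_instance_up_event and the FAN_* variables are made up for this example and are not taken from the actual script.

```shell
#!/bin/bash
# Sketch only: function and variable names are hypothetical.

# Split "EVENTTYPE key=value ..." arguments into FAN_* variables.
parse_fan_event() {
    FAN_EVENT_TYPE="${1:-}"      # first argument is the event type, e.g. INSTANCE
    if [ $# -gt 0 ]; then shift; fi
    FAN_STATUS=""
    FAN_INSTANCE=""
    FAN_DATABASE=""
    for arg in "$@"; do
        case "$arg" in
            status=*)   FAN_STATUS="${arg#status=}" ;;
            instance=*) FAN_INSTANCE="${arg#instance=}" ;;
            database=*) FAN_DATABASE="${arg#database=}" ;;
        esac
    done
}

# Succeeds only for an INSTANCE event with status=up.
is_instance_up_event() {
    [ "$FAN_EVENT_TYPE" = "INSTANCE" ] && [ "$FAN_STATUS" = "up" ]
}

# Demo with sample arguments; a real callout would pass "$@" instead.
parse_fan_event INSTANCE status=up instance=db1 database=db
if is_instance_up_event; then
    echo "Instance $FAN_INSTANCE of database $FAN_DATABASE is up"
fi
```

Every other event type or status simply falls through without doing anything, which keeps the callout cheap for the events it does not care about.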
Tested on Oracle Linux 5.8 with Oracle Grid Infrastructure 11.2.0.3 and Oracle Database 11.2.0.2 and 11.2.0.3.
Thanks Ilmar, the script worked like a charm !
Hi Ilmar,
Could you please let me know how I need to run this script to test it? I ran it as a shell script and it doesn't work. I have placed the script in $CRS_HOME/racg/usrco
ksh -x service_callout.sh
+ LOGFILE=/tmp/grid_callout.txt
+ + dirname service_callout.sh
SCRIPTDIR=.
service_callout.sh[11]: "${SCRIPTDIR:(-11)}": bad substitution
This is a bash script, not a ksh script, and you also need to set the executable bits on the file:
chmod a+x $CRS_HOME/racg/usrco/relocate_services_callout.sh
Then you can just execute $CRS_HOME/racg/usrco/relocate_services_callout.sh with the parameters you need.
Can you please give a detailed example of which parameters to provide to your script for service relocation?
The parameters are defined by Grid Infrastructure. For example to simulate instance up event for instance "db1" in database "db" execute:
$CRS_HOME/racg/usrco/relocate_services_callout.sh INSTANCE status=up instance=db1 database=db
Hi,
Thanks a lot .
Do you have a similar script for 10g?
I have not tested this script on 10g.
I’m having a couple of issues with this script and hoping you can help.
I had to change this line:
if [[ `echo "$SERVICECONFIG" | grep "Service is enabled" | wc -l` -eq 1 ]]; then
to this:
if [[ `echo "$SERVICECONFIG" | grep "^Service is enabled$" | wc -l` -eq 1 ]]; then
because it was returning false every time.
$ echo "$SERVICECONFIG" | grep "Service is enabled" | wc -l
2
$ echo "$SERVICECONFIG" | grep "Service is enabled"
Service is enabled
Service is enabled on instances: BDUAT1,BDUAT2
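As a side note, the same check can be written without counting lines. This is only a sketch, not the code from the script: the function name service_is_enabled is made up, and the sample text contains just the two matching lines shown above rather than full srvctl output.

```shell
# Sample input: only the two lines quoted above, not complete srvctl output.
SERVICECONFIG="Service is enabled
Service is enabled on instances: BDUAT1,BDUAT2"

# Succeeds only if some line is exactly "Service is enabled".
#   -x  match whole lines only
#   -F  treat the pattern as a fixed string, not a regular expression
#   -q  print nothing, report the result through the exit status
service_is_enabled() {
    printf '%s\n' "$1" | grep -qxF "Service is enabled"
}

if service_is_enabled "$SERVICECONFIG"; then
    echo "enabled"
fi
```

Because -x requires the whole line to match, the "Service is enabled on instances: ..." line no longer counts, which is the same effect as anchoring the pattern with ^ and $.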
Next, I think there’s a problem with the logic flow. I’m going to use line numbers because it’s easier.
On line 91 you check if the current instance is in the list of preferred instances.
If so, you check to see if the service is started (line 97).
If it’s not started, you start it (line 102). Otherwise, you check to see if it’s not running on a preferred instance (line 109). The problem here is that you’re still inside the if logic from line 91. If line 91 evaluates as FALSE, everything down to line 135 will be skipped, including the check on line 109.
Here’s an example. I have two services for database bduat_dg. Service bduats is running on its preferred instance BDUAT1; service bduat_maint has preferred instance BDUAT1 but is running on BDUAT2. When I run your script, here is the output I get in the log file:
Fri Oct 6 14:04:28 EDT 2017
[bduat_dg][hq-xdb01.intl] Instance BDUAT1 up
ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_3
Service bduat_maint
enabled
Service bduats
enabled
preferred
running on preferred BDUAT1
As you can see, it does not take any action on the bduat_maint service, or even list where it is running, just that it is enabled. Not sure what the best way is to fix this.
Sorry, my line numbers were off by 1. Corrected below:
On line 90 you check if the current instance is in the list of preferred instances.
If so, you check to see if the service is started (line 96).
If it’s not started, you start it (line 101). Otherwise, you check to see if it’s not running on a preferred instance (line 108). The problem here is that you’re still inside the if logic from line 90. If line 90 evaluates as FALSE, everything down to line 134 will be skipped, including the check on line 108.
Thanks for the comment. I just noticed that the version on this page was out of date and I have already fixed the first issue you mentioned in the script I’m using in production 🙂
But I don’t really follow the second issue… The check on line 89 (new version) verifies that the instance that was just brought up is listed as a preferred instance for that service; if it is not, nothing should be done with that service. The callout script is not meant to work cluster-wide. It works on each node separately, since each clusterware instance calls the script itself.
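To make the per-node flow discussed in this thread easier to follow, here is a rough sketch of the decision as described in the comments above. It is one reading of that description, not the actual script: the names in_list and decide_action are hypothetical, the preferred list is assumed to be comma-separated, and the service is assumed to run on at most one instance.

```shell
# in_list NAME LIST: succeeds if NAME is an element of the comma-separated LIST.
in_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# decide_action STARTED_INSTANCE PREFERRED_LIST RUNNING_INSTANCE
# Prints what to do with one service: skip, start, ok or relocate.
# The body runs in a subshell so its variables do not leak to the caller.
decide_action() (
    instance="$1"
    preferred="$2"
    running="$3"
    if ! in_list "$instance" "$preferred"; then
        echo "skip"        # started instance is not preferred for this service
    elif [ -z "$running" ]; then
        echo "start"       # service is not running anywhere
    elif in_list "$running" "$preferred"; then
        echo "ok"          # already running on a preferred instance
    else
        echo "relocate"    # running on a non-preferred instance
    fi
)

decide_action BDUAT1 "BDUAT1" "BDUAT2"
```

The first branch is the point made in the reply above: when the instance that just started is not preferred for a service, that service is left alone on this node.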
Dear Ilmar,
Where can I find the script? Is it not available anymore?
Regards,
Egeil
It is right here in the post, or you can access it through GitHub: https://gist.github.com/ilmarkerm/4845e4288d1cda98b7f88425d199f979/raw/4c80adf9874358e9e1ec41911fc4215026ac6424/relocate-services-callout.sh
Hi,
Thanks. Was wondering where it was gone until I realized that my enterprise would block clear-cut scripts.
Regards,
Egeil
Hi ,
Does this work if the cluster has multiple databases running?
Yes, it does.
Hello Ilmarkerm,
Thanks for creating such a useful script. Appreciate that.
It worked on 11g, but when I tried it on 12c (CDB/PDB), it wouldn’t pick up the CRS_HOME (/u01/app/12.2.0.1/grid/racg/usrco)
This is how I called it.
sh -x relocate_srv.sh INSTANCE status=up instance=CDEV1 database=CDEV
+ LOGFILE=/u01/app/oracle/relocate_db_services_script.log
++ dirname relocate_srv.sh
+ SCRIPTDIR=.
If you look at the code, it all depends on the script being placed under $CRS_HOME/racg/usrco:
SCRIPTDIR=`dirname $0`
# Determine grid home
if [[ "${SCRIPTDIR:(-11)}" == "/racg/usrco" ]]; then
  CRS_HOME="${SCRIPTDIR:0:$(( ${#SCRIPTDIR} - 11 ))}"
  export CRS_HOME
fi
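Note that dirname only strips the last path component, so when the script is started by a relative name (as in the trace above, where SCRIPTDIR is "."), the directory does not end in /racg/usrco and the check fails. A possible workaround, shown here only as a sketch and not as part of the script, is to resolve the directory to an absolute path first. The function name derive_crs_home and the variable grid_home are made up for this example.

```shell
# derive_crs_home SCRIPT_PATH
# Prints the grid home if the script lives under <grid home>/racg/usrco,
# and returns non-zero otherwise. Runs in a subshell, so the cd below
# does not change the caller's working directory.
derive_crs_home() (
    # Resolve to an absolute directory so relative invocations also work.
    dir=$(cd "$(dirname "$1")" >/dev/null 2>&1 && pwd) || exit 1
    case "$dir" in
        */racg/usrco) printf '%s\n' "${dir%/racg/usrco}" ;;
        *)            exit 1 ;;
    esac
)

# Only set CRS_HOME when the location check succeeds.
if grid_home=$(derive_crs_home "$0"); then
    CRS_HOME="$grid_home"
    export CRS_HOME
fi
```

With this, running the script by its bare name from inside the usrco directory resolves to the same grid home as calling it by its full path.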
Thanks ilmarkerm,
I placed the script in the right location.
[oracle@node1 usrco]$ pwd
/u01/app/12.2.0.1/grid/racg/usrco
[oracle@node1 usrco]$ ls
relocate_srv.sh
[oracle@mtldemtsedb11 usrco]$
OS version is also different from our 11g server.
11g OS version is 6.9
12c OS version is 7.4
Could there be some difference in how the dirname command behaves between the two?
Do you have the same OS in your lab to test?
Thank you!
Nadeem