{"id":548,"date":"2023-02-24T16:48:44","date_gmt":"2023-02-24T15:48:44","guid":{"rendered":"https:\/\/ilmarkerm.eu\/blog\/?p=548"},"modified":"2023-02-24T19:13:26","modified_gmt":"2023-02-24T18:13:26","slug":"testing-client-failover-with-data-guard-fast-start-failover-1-setup","status":"publish","type":"post","link":"https:\/\/ilmarkerm.eu\/blog\/2023\/02\/testing-client-failover-with-data-guard-fast-start-failover-1-setup\/","title":{"rendered":"Testing client failover with Data Guard Fast-Start Failover 1\/setup"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">All posts in this series<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/ilmarkerm.eu\/blog\/2023\/02\/testing-client-failover-with-data-guard-fast-start-failover-1-setup\/\" data-type=\"URL\" data-id=\"https:\/\/ilmarkerm.eu\/blog\/2023\/02\/testing-client-failover-with-data-guard-fast-start-failover-1-setup\/\">Part 1: Setup<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/ilmarkerm.eu\/blog\/2023\/02\/testing-client-failover-with-data-guard-2-switchover\/\" data-type=\"URL\" data-id=\"https:\/\/ilmarkerm.eu\/blog\/2023\/02\/testing-client-failover-with-data-guard-2-switchover\/\">Part 2: Switchover<\/a><\/li>\n\n\n\n<li>Part 3: Failover<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Data Guard setup<\/h2>\n\n\n\n<p>My end goal is to find out how fast database clients (applications) can resume work (reading and writing) after Data Guard role change operation &#8211; either it happens as a planned or unplanned operation. Testing different options to find the shortest downtime &#8211; results apply for my environment only, your results will vary.<\/p>\n\n\n\n<p>I&#8217;m doing it just to increase my knowledge about different Data Guard scenarios.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">My database setup<\/h2>\n\n\n\n<p>I&#8217;m ONLY interested in the role change operation between instances failtesti1 and failtesti2. Whitch are in the same DC, different availability zones (ping 0.165 ms avg). They equal in every way and are intended to service application read+write traffic as primary databases. These two instances are in SYNC mode.<\/p>\n\n\n\n<p>During the test I will also throw in a remote standby database failtesti3 that is located a few thousand km away (53ms ping). This remote standby is in ASYNC or ARCH mode. failtesti3 is only a BYSTANDER during all tests.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>DGMGRL&gt; show configuration verbose;\n\nConfiguration - failtest\n\n  Protection Mode: MaxAvailability\n  Members:\n  failtesti1 - Primary database\n    failtesti2 - (*) Physical standby database \n    failtesti3 - Physical standby database \n\n  (*) Fast-Start Failover target\n\n  Properties:\n    FastStartFailoverThreshold      = '25'\n    OperationTimeout                = '30'\n    TraceLevel                      = 'USER'\n    FastStartFailoverLagLimit       = '30'\n    CommunicationTimeout            = '180'\n    ObserverReconnect               = '1800'\n    FastStartFailoverAutoReinstate  = 'TRUE'\n    FastStartFailoverPmyShutdown    = 'TRUE'\n    BystandersFollowRoleChange      = 'ALL'\n    ObserverOverride                = 'FALSE'\n    ExternalDestination1            = ''\n    ExternalDestination2            = ''\n    PrimaryLostWriteAction          = 'FAILOVER'\n    ConfigurationWideServiceName    = 'QKNPUXMZ_CFG'\n\nFast-Start Failover: Enabled in Zero Data Loss Mode\n  Lag Limit:          30 seconds (not in use)\n  Threshold:          25 seconds\n  Active Target:      failtesti2\n  Potential Targets:  \"failtesti2\"\n    failtesti2 valid\n  Observer:           failtest-observer\n  Shutdown Primary:   TRUE\n  Auto-reinstate:     TRUE\n  Observer Reconnect: 1800 seconds\n  Observer Override:  FALSE\n\nConfiguration Status:\nSUCCESS<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Services<\/h2>\n\n\n\n<p>Both CDB and PDB have a simple service and on database open trigger created, to start the service when database opens.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>exec dbms_service.create_service('dbauser_failtest','dbauser_failtest');\nexec dbms_service.start_service('dbauser_failtest');\n\nCREATE OR REPLACE TRIGGER sys.cdb_start_services AFTER STARTUP OR DB_ROLE_CHANGE ON DATABASE\nDECLARE\n    v_srv_name varchar2(30):= 'dbauser_failtest';\n    cnt NUMBER;\nBEGIN\n    SELECT count(*) INTO cnt FROM v$active_services WHERE name = v_srv_name;\n    IF cnt = 0 THEN\n        dbms_service.start_service(v_srv_name);\n    END IF;\nEND;\n\/<\/code><\/pre>\n\n\n\n<p>For PDB application use<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>alter session set container=soefail;\n\n-- Lets create a simple service\nexec dbms_service.create_service('soe','soe');\n\n-- Now lets create one service with Application Continuity turned on\nDECLARE\n    new_service_params dbms_service.svc_parameter_array;\n    p_srv_name varchar2(20):= 'soe_ac';\nBEGIN\n    new_service_params('FAILOVER_TYPE'):= 'AUTO';\n    new_service_params('REPLAY_INITIATION_TIMEOUT'):= 1800;\n    new_service_params('RETENTION_TIMEOUT'):= 86400;\n    new_service_params('FAILOVER_DELAY'):= 5;\n    new_service_params('FAILOVER_RETRIES'):= 10;\n    new_service_params('FAILOVER_RESTORE'):= 'AUTO';\n    new_service_params('commit_outcome'):= 'true';\n    new_service_params('aq_ha_notifications'):= 'true';\n    dbms_service.create_service(p_srv_name, p_srv_name);\n    dbms_service.modify_service(p_srv_name, new_service_params);\n    dbms_service.start_service(p_srv_name);\nEND;\n\/\n\nCREATE OR REPLACE TRIGGER sys.pdb_start_services AFTER STARTUP ON DATABASE \nBEGIN\n    dbms_service.start_service('soe');\n    dbms_service.start_service('soe_ac');\n    dbms_service.start_service('dbauser_soefail');\nEND;\n\/<\/code><\/pre>\n\n\n\n<p>Load swingbench SOE schema<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>swingbench\/bin\/oewizard -cl -cs tcp:\/\/failtest-1\/dbauser_soefail.prod1.dbs -dba dbauser -dbap dbauser -u soe -p soe -scale 1 -create -tc 2 -ts soe<\/code><\/pre>\n\n\n\n<p>Command to run swingbench test<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>bin\/charbench -cs \"(description=(failover=on)(connect_timeout=2)(transport_connect_timeout=1 sec)(address_list=(address=(protocol=tcp)(host=failtest-1)(port=1521))(address=(protocol=tcp)(host=failtest-2)(port=1521)))(connect_data=(service_name=soe.prod1.dbs)))\" \\\n-u soe -p soe -uc 4 -c SOE_Server_Side_V2.xml<\/code><\/pre>\n\n\n\n<p>During all test I have swingbench running at around 1600 TPS &#8211; which is enough for my small setup to keep the CPUs 40% idle. I want databases under load, but still enough free CPU time for it not to affect the test.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">avg-cpu:  %user   %nice %system %iowait  %steal   %idle\n          52.11    0.05    5.96    7.39    0.00   34.49\n\nDevice            r\/s     w\/s     rMB\/s     wMB\/s   rrqm\/s   wrqm\/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util\nvdc             33.00 1506.40      1.68      9.85    66.00     0.00  66.67   0.00    0.95    1.48   2.49    52.07     6.70   0.40  61.38\nvdd             38.80 1487.80      1.84      8.83    72.00     0.60  64.98   0.04    0.84    1.42   2.37    48.66     6.08   0.40  61.00\n\navg-cpu:  %user   %nice %system %iowait  %steal   %idle\n          50.33    0.05    4.84    9.33    0.00   35.44\n\nDevice            r\/s     w\/s     rMB\/s     wMB\/s   rrqm\/s   wrqm\/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util\nvdc             29.80 1388.80      1.41      8.91    23.60     1.20  44.19   0.09    0.84    2.05   3.11    48.32     6.57   0.42  59.42\nvdd             34.20 1417.40      1.37      8.55    21.00     0.20  38.04   0.01    0.72    1.82   2.86    41.12     6.18   0.42  60.88<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">The client<\/h2>\n\n\n\n<p>I wrote a simple client in Python that will connect two threads to the database &#8211; one that only reads and one that only writes, in separate threads. And the program will record all errors the client experiences during the test and in case connection becomes unusable &#8211; reconnect to the database. Results are recorded in CSV file to be analysed later.<\/p>\n\n\n\n<p>Side note: The client is indeed using the new python-oracledb module, but since it is currently not complete, specially in the HA area &#8211; I&#8217;m using it in thick mode with Instantclient 21.9.<\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/ilmarkerm\/oracle_client_failover_test\" data-type=\"URL\" data-id=\"https:\/\/github.com\/ilmarkerm\/oracle_client_failover_test\" target=\"_blank\">Access the test client code here (github.com)<\/a><\/p>\n\n\n\n<p>Examples queries for data analytics are also part of the github repository.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The plan<\/h2>\n\n\n\n<p>The next post will be about switchover&#8230; then about failover. Maybe more, who knows.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>All posts in this series Data Guard setup My end goal is to find out how fast database clients (applications) can resume work (reading and writing) after Data Guard role change operation &#8211; either it happens as a planned or unplanned operation. Testing different options to find the shortest downtime &#8211; results apply for my [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[49,56,4],"class_list":["post-548","post","type-post","status-publish","format-standard","hentry","category-blog-entry","tag-dataguard","tag-high-availability","tag-oracle"],"_links":{"self":[{"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/posts\/548","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/comments?post=548"}],"version-history":[{"count":17,"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/posts\/548\/revisions"}],"predecessor-version":[{"id":594,"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/posts\/548\/revisions\/594"}],"wp:attachment":[{"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/media?parent=548"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/categories?post=548"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ilmarkerm.eu\/blog\/wp-json\/wp\/v2\/tags?post=548"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}