Ilmar Kerm

Oracle, databases, Linux and maybe more

I previously wrote about how after successfully downgrading APEX users still get the error “Application Express is currently unavailable”. I now ran to the same issue again, with newer versions and the procedure how ORDS detects is APEX is currently being patched has changed.

This post is about downgrading to APEX 23.1 and ORDS 24.1.

After completing APEX downgrade to 23.1, ORDS 24.1 still reported that “Application Express is currently unavailable”. I followed my own previous blog and flipped the APEX internal patching status to APPLIED, but no luck this time. PANIC! I turned on tracking for ORDS sessions and this PL/SQL block seems to be the culprit.

DECLARE
    NM         OWA.VC_ARR := :1 ;
    VL         OWA.VC_ARR := :2 ;
    L_PATCH    VARCHAR2(10) := null;
    L_VERSION  VARCHAR2(100) := null;
    SCHEMA     VARCHAR2(30);
    PART1      VARCHAR2(30);
    PART2      VARCHAR2(30);
    DBLINK     VARCHAR2(30);
    PART1_TYPE NUMBER(10);
    OBJECT_NUM NUMBER(10);
BEGIN
    OWA.INIT_CGI_ENV(:3 , NM, VL);
    HTP.INIT;
    HTP.HTBUF_LEN := 63;
 
  --CHECK THE SCHEMA OVERRIDE FOR PL/SQL GATEWAY APPLICATION DETAILS
  DECLARE
    GATEWAY_DETAILS VARCHAR2(2000) := q'[ SELECT    
                                    (SELECT VALUE FROM ORDS_METADATA.USER_ORDS_PROPERTIES WHERE KEY = 'plsql.gateway.patching') as patchingValue, 
                                    (SELECT VALUE FROM ORDS_METADATA.USER_ORDS_PROPERTIES WHERE KEY = 'plsql.gateway.version') as versionValue 
                                    FROM dual]';
  BEGIN
    BEGIN
       EXECUTE IMMEDIATE GATEWAY_DETAILS INTO L_PATCH, L_VERSION;
    EXCEPTION
       WHEN OTHERS THEN
           -- ignore exception and leave l_patch as null so fallback is executed
           NULL;
    END;
 
    IF L_PATCH IS NULL THEN
    -- L_PATCH  IS NULL, EITHER AN OLDER VERSION OF APEX IS PRESENT OR APEX IS NOT INSTALLED
    -- CHECK IF APEX PRESENT
      DECLARE
        L_SQL VARCHAR2(200) := 'select patch_applied from apex_release';
      BEGIN
        DBMS_UTILITY.NAME_RESOLVE('APEX_RELEASE', 0, SCHEMA, PART1, PART2, DBLINK, PART1_TYPE, OBJECT_NUM);
        EXECUTE IMMEDIATE L_SQL INTO L_PATCH;
      EXCEPTION
        WHEN OTHERS THEN
        -- Could not access apex_release. Default l_patch to 'N' so it's still served 
          L_PATCH :='N';
      END;
    END IF;
  END;
    :4  := L_PATCH;
    :5  :=L_VERSION; 
END;

The flag that tells ORDS that APEX is currently being patched has been moved to ORDS_METADATA schema. And in my case the query indeed returned TRUE.

SELECT VALUE FROM ORDS_METADATA.USER_ORDS_PROPERTIES WHERE KEY = 'plsql.gateway.patching'

After flipping it to FALSE, downgraded APEX started working again.

UPDATE ORDS_METADATA.USER_ORDS_PROPERTIES set value='FALSE' WHERE KEY = 'plsql.gateway.patching';

We are refreshing development databases from production using storage thin cloning. Everything works automatically and part of refresh procedure is also running some SQLPlus scripts, using OS authentication, as SYS.

One database also has APEX, that uses APEX Social Login feature to give users single sign-on for both APEX Builder and the end user application. You can read here how to set it up using Azure SSO. But since APEX is fully inside the database, this means that the production SSO credentials get copied over during database refresh.

I would like to have a SQL script that would run as part of the database refresh procedure, that would replace the APEX SSO credentials. And I’d like that script to run as SYS.

We have been using Single-Sign On for APEX applications for a long time, way before there was social login feature available in APEX itself. We implemented it on web server level in front of APEX. But in order to simplify the webserver setup (mainly to disable sticky load balancer setup) and since APEX for quite some time can do some SSO protocols now internally – we finally moved to APEX social sign-in – using OpenID connector. I blogged about it here earlier.

Enabled it last evening, everything worked fine over night (low use), and suddenly, this morning, when the real use begins, application users started getting the following login failures:

The HTTP request to “https://login.microsoftonline.com/<tenancy_id>/v2.0/.well-known/openid-configuration” failed.

Quite un-informative… why did it fail? This URL is the OpenID discovery URL and the message comes from APEX, not Azure SSO. A few quick checks:

  • The URL works, and it returns proper 200 OK status code, even from the database server
  • If there were any TLS errors, then the error message would be different
  • No new messages in APEX_DEBUG_MESSAGES since the issue started
  • Nothing in APEX_WEBSERVICE_LOG
  • Nothing in alert.log 🤔
  • Changing APEX instance proxy did not help
  • TLS wallet was accessible and good
  • The error is returned ONLY to application users, even though builder uses the same OpenID Discovery URL – and APEX builder login works without any issues
  • And every developer swore, that noone has changed anything today

As a side-note – the end-user applications are quite heavily used and there are hundreds of internal users.

Finally I started creating new Authentications scheme using Generic OAuth2 instead and they seemed to connect to Azure services without issues. But then messages started appearing in APEX_DEBUG_MESSAGES (what triggered that DEBUG logging started writing? accessing applications via Builder?) and hidden deep in messages I found the gem:

Exception in "GET https://login.microsoftonline.com/azure_tenancy_id/v2.0/.well-known/openid-configuration":
Error Stack: ORA-20001: You have exceeded the maximum number of web service requests per workspace. Please contact your administrator.

Looks like, at least in APEX 23.1, calls to OpenID discovery URL count towards the APEX workspace web service request limiter, after increasing the limit the authentication started working again.

exec APEX_INSTANCE_ADMIN.SET_PARAMETER('MAX_WEBSERVICE_REQUESTS', 100000);

According to the documentation MAX_WEBSERVICE_REQUESTS limits the number of outgoing web service requests from each workspace within 24 hours, default 1000.

Problem solved, but it was quite confusing to troubleshoot due to lack of more detailed error messages. I really do not expect authentication hitting a rate limiter and then being completely silent about the underlying cause.

Here I’m exploring how to control the basic network level resource security accesses. In AWS there is a concept called Security Groups. In OCI Oracle Cloud the similar concept is called Network Security Groups, also there is a little bit less powerful concept called Security Lists. A good imprevement with Network Security Groups over Security Lists is that in rules you can refer to other NSGs, not only CIDR.

Below I create two NSG – one for databases and one for application servers, and allow unrestricted outgoing traffc from them both.

# security.tf

# Rules for appservers

resource "oci_core_network_security_group" "appserver" {
    compartment_id = oci_identity_compartment.compartment.id
    vcn_id = oci_core_vcn.main.id
    display_name = "Application servers"
}

resource "oci_core_network_security_group_security_rule" "appserver_egress" {
    network_security_group_id = oci_core_network_security_group.appserver.id
    direction = "EGRESS"
    protocol = "all"
    description = "Allow all Egress traffic"
    destination = "0.0.0.0/0"
    destination_type = "CIDR_BLOCK"
}

# Rules for databases

resource "oci_core_network_security_group" "db" {
    compartment_id = oci_identity_compartment.compartment.id
    vcn_id = oci_core_vcn.main.id
    display_name = "Databases"
}

resource "oci_core_network_security_group_security_rule" "db_egress" {
    network_security_group_id = oci_core_network_security_group.db.id
    direction = "EGRESS"
    protocol = "all"
    description = "Allow all Egress traffic"
    destination = "0.0.0.0/0"
    destination_type = "CIDR_BLOCK"
}

Some rule examples to allow traffic from appservers towards databases. Here referring to the appserver NSG as source – not a CIDR.

# This rule allows port 1521/tcp to be accessed from NSG "appserver" created earlier
resource "oci_core_network_security_group_security_rule" "db_appserver_oracle" {
    network_security_group_id = oci_core_network_security_group.db.id
    direction = "INGRESS"
    protocol = "6" # TCP
    description = "Allow ingress from application servers to 1521/tcp"
    source_type = "NETWORK_SECURITY_GROUP"
    source = oci_core_network_security_group.appserver.id
    tcp_options {
        destination_port_range {
            min = 1521
            max = 1521
        }
    }
}

# This rule allows port 5432/tcp to be accessed from NSG "appserver" created earlier
resource "oci_core_network_security_group_security_rule" "db_appserver_postgres" {
    network_security_group_id = oci_core_network_security_group.db.id
    direction = "INGRESS"
    protocol = "6" # TCP
    description = "Allow ingress from application servers to 5432/tcp"
    source_type = "NETWORK_SECURITY_GROUP"
    source = oci_core_network_security_group.appserver.id
    tcp_options {
        destination_port_range {
            min = 5432
            max = 5432
        }
    }
}

And one example rule for appserver group, here I just want to show that the source NSG can refer to itself – so the port is open only to resources placed in the same NSG.

# This rule allows port 80/tcp to be accessed from the NSG itself
# Example use - the application is running unencrypted HTTP and is expected to have a loadbalancer in front, that does the encryption. In this case loadbalancer could be put to the same NSG.
# Or if the different application servers need to have a backbone communication port between each other - like cluster interconnect
resource "oci_core_network_security_group_security_rule" "appserver_http" {
    network_security_group_id = oci_core_network_security_group.appserver.id
    direction = "INGRESS"
    protocol = "6" # TCP
    description = "Allow access port port 80/tcp only from current NSG (self)"
    source_type = "NETWORK_SECURITY_GROUP"
    source = oci_core_network_security_group.appserver.id
    tcp_options {
        destination_port_range {
            min = 80
            max = 80
        }
    }
}

Now, network security groups need to be attached to the resources they are intended to protect. NSG-s are attached to the virtual network adapers VNICs.

To attach NSG to my previously created compute instance, I have to go back and edit the compute instance declaration to attach a NSG to the primary VNIC of that instance.

# compute.tf

resource "oci_core_instance" "arm_instance" {
    compartment_id = oci_identity_compartment.compartment.id
    # oci iam availability-domain list
    availability_domain = "MpAX:EU-STOCKHOLM-1-AD-1"
    # oci compute shape list --compartment-id 
    shape = "VM.Standard.A1.Flex" # ARM based shape
    shape_config {
        # How many CPUs and memory
        ocpus = 2
        memory_in_gbs = 4
    }
    display_name = "test-arm-1"
    source_details {
        # The source operating system image
        # oci compute image list --all --output table --compartment-id 
        source_id = data.oci_core_images.oel.images[0].id
        source_type = "image"
    }
    create_vnic_details {
        # Network details
        subnet_id = oci_core_subnet.subnet.id
        assign_public_ip = true
        # attaching Network Security Groups - NSGs
        nsg_ids = [oci_core_network_security_group.appserver.id]
    }
    # CloudInit metadata - including my public SSH key
    metadata = {
        ssh_authorized_keys = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCZ4bqPK+Mwiy+HLabqJxCMcQ/hY7IPx/oEQZWZq7krJxkLLUI6lkw44XRCutgww1q91yTdsSUNDZ9jFz9LihGTEIu7CGKkzmoGtAWHwq2W38GuA5Fqr0r2vPH1qwkTiuN+VmeKJ+qzOfm9Lh1zjD5e4XndjxiaOrw0wI19zpWlUnEqTTjgs7jz9X7JrHRaimzS3PEF5GGrT6oy6gWoKiWSjrQA2VGWI0yNQpUBFTYWsKSHtR+oJHf2rM3LLyzKcEXnlUUJrjDqNsbbcCN26vIdCGIQTvSjyLj6SY+wYWJEHCgPSbBRUcCEcwp+bATDQNm9L4tI7ZON5ZiJstL/sqIBBXmqruh7nSkWAYQK/H6PUTMQrUU5iK8fSWgS+CB8CiaA8zos9mdMfs1+9UKz0vMDV7PFsb7euunS+DiS5iyz6dAz/uFexDbQXPCbx9Vs7TbBW2iPtYc6SNMqFJD3E7sb1SIHhcpUvdLdctLKfnl6cvTz2o2VfHQLod+mtOq845s= ilmars_public_key"
    }
}

Continusing to build Oracle Cloud Infrastructure with Terraform. Today moving on to compute instances.

But first some networking, the VCN I created earlier did not have access to the internet. Lets fix it now. The code below will add an Internet Gateway and modify the default route table to send out the network traffic via the Internet Gateway.

# network.tf

resource "oci_core_internet_gateway" "internet_gateway" {
    compartment_id = oci_identity_compartment.compartment.id
    vcn_id = oci_core_vcn.main.id
    # Internet Gateway cannot be associated with Route Table here, otherwise adding a route table rule will error with - Rules in the route table must use private IP as a target.
    #route_table_id = oci_core_vcn.main.default_route_table_id
}

resource "oci_core_default_route_table" "default_route_table" {
    manage_default_resource_id = oci_core_vcn.main.default_route_table_id
    compartment_id = oci_identity_compartment.compartment.id
    display_name = "Default Route Table for VCN"
    route_rules {
        network_entity_id = oci_core_internet_gateway.internet_gateway.id
        destination = "0.0.0.0/0"
        destination_type = "CIDR_BLOCK"
    }
}

Moving on to the compute instance itself. First question is – what operating system should it run – what is the source image. There is a data source for this. Here I select the latest Oracle Linux 9 image for ARM.

data "oci_core_images" "oel" {
    compartment_id = oci_identity_compartment.compartment.id
    operating_system = "Oracle Linux"
    operating_system_version = "9"
    shape = "VM.Standard.A1.Flex"
    state = "AVAILABLE"
    sort_by = "TIMECREATED"
    sort_order = "DESC"
}

# Output the list for debugging
output "images" {
    value = data.oci_core_images.oel
}

We are now ready to create the compute instance itself. In the metadata I provide my SSH public key, so I could SSH into the server.

resource "oci_core_instance" "arm_instance" {
    compartment_id = oci_identity_compartment.compartment.id
    # oci iam availability-domain list
    availability_domain = "MpAX:EU-STOCKHOLM-1-AD-1"
    # oci compute shape list --compartment-id 
    shape = "VM.Standard.A1.Flex" # ARM based shape
    shape_config {
        # How many CPUs and memory
        ocpus = 2
        memory_in_gbs = 4
    }
    display_name = "test-arm-1"
    source_details {
        # The source operating system image
        # oci compute image list --all --output table --compartment-id 
        source_id = data.oci_core_images.oel.images[0].id
        source_type = "image"
    }
    create_vnic_details {
        # Network details
        subnet_id = oci_core_subnet.subnet.id
        assign_public_ip = true
    }
    # CloudInit metadata - including my public SSH key
    metadata = {
        ssh_authorized_keys = "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCZ4bqPK+Mwiy+HLabqJxCMcQ/hY7IPx/oEQZWZq7krJxkLLUI6lkw44XRCutgww1q91yTdsSUNDZ9jFz9LihGTEIu7CGKkzmoGtAWHwq2W38GuA5Fqr0r2vPH1qwkTiuN+VmeKJ+qzOfm9Lh1zjD5e4XndjxiaOrw0wI19zpWlUnEqTTjgs7jz9X7JrHRaimzS3PEF5GGrT6oy6gWoKiWSjrQA2VGWI0yNQpUBFTYWsKSHtR+oJHf2rM3LLyzKcEXnlUUJrjDqNsbbcCN26vIdCGIQTvSjyLj6SY+wYWJEHCgPSbBRUcCEcwp+bATDQNm9L4tI7ZON5ZiJstL/sqIBBXmqruh7nSkWAYQK/H6PUTMQrUU5iK8fSWgS+CB8CiaA8zos9mdMfs1+9UKz0vMDV7PFsb7euunS+DiS5iyz6dAz/uFexDbQXPCbx9Vs7TbBW2iPtYc6SNMqFJD3E7sb1SIHhcpUvdLdctLKfnl6cvTz2o2VfHQLod+mtOq845s= ilmars_public_key"
    }
}

And attach the block storage volumes I created in the previous post. Here I create attachments as paravirtualised, meaning the volumes appear on server as sd* devices, but also iSCSI is possible.

resource "oci_core_volume_attachment" "test_volume_attachment" {
    attachment_type = "paravirtualized"
    instance_id = oci_core_instance.arm_instance.id
    volume_id = oci_core_volume.test_volume.id

    # Interesting options, could be useful in some cases
    is_pv_encryption_in_transit_enabled = false
    is_read_only = false
    is_shareable = false
}

resource "oci_core_volume_attachment" "silver_test_volume_attachment" {
    # This is to enforce device attachment ordering
    depends_on = [oci_core_volume_attachment.test_volume_attachment]

    attachment_type = "paravirtualized"
    instance_id = oci_core_instance.arm_instance.id
    volume_id = oci_core_volume.silver_test_volume.id

    # Interesting options, could be useful in some cases
    is_pv_encryption_in_transit_enabled = false
    is_read_only = true
    is_shareable = false
}

Looks like OCI support some interesting options for attaching volumes, like encryption, read only and shareable. I can see them being useful in the future. If I log into the created server, the attached devices are created as sdb and sdc – where sdc was instructed to be read only. And indeed it is.

[root@test-arm-1 ~]# lsblk
NAME               MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda                  8:0    0 46.6G  0 disk
├─sda1               8:1    0  100M  0 part /boot/efi
├─sda2               8:2    0    2G  0 part /boot
└─sda3               8:3    0 44.5G  0 part
  ├─ocivolume-root 252:0    0 29.5G  0 lvm  /
  └─ocivolume-oled 252:1    0   15G  0 lvm  /var/oled
sdb                  8:16   0   50G  0 disk
sdc                  8:32   0   50G  1 disk

[root@test-arm-1 ~]# dd if=/dev/zero of=/dev/sdb bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0453839 s, 231 MB/s

[root@test-arm-1 ~]# dd if=/dev/zero of=/dev/sdc bs=1M count=10
dd: failed to open '/dev/sdc': Read-only file system