Ilmar Kerm

Oracle, databases, Linux and maybe more

In Oracle Autonomous database one major application feature is missing – Flashback Data Archive. Purpose of that feature was archiving all data changes seamlessly to the application, no old-school trigger-based data archival/auditing needed. One major disadvantage of triggers is that end user session needs to wait for the trigger to execute, they become part of the end user transaction. FDA lets the users just declare for how long all changes need to be retained (even years) and it collected the data to be archived in the background, from UNDO data. It also had the possibility to add context from the user session that made the data change. Although FDA has been quite troublesome/buggy component, I’ve always liked its functionality.

FDA is part of all Oracle Database editions – except Oracle Autonomous Database. Reference

I currently do not see any other possibility than to go back to old-trigger based solutions, but how to make it as easy to maintain as possible? And keep the modification context.

Traditional way using triggers is to store :OLD (and :NEW ?) values in a separate table, while having a separate audit table for each source table. But these kinds of triggers need to be recreated (and history tables maintained also) whenever the source table structure changes. Sadly there is no way to serialize :OLD/:NEW or access them in any dynamic way.

There are ready made packages for implementing this style of auditing, for example this one by Connor McDonald.

I decided to create my own to make the maintenance much simpler. Store everything as JSON, and store only the new row version.

  • It is very easy to generate JSON object from a table row
  • I would argue, that there is no need to record :OLD row in the history, if :NEW is always recorded. And since every history record has a context attached, it makes more logical sense to me that you store the session context together with the :NEW row version – that this person changed the row to this new version. Not that this person removed that :OLD row version.
  • Changes to source table do not require any changes to triggers nor the history table.
  • Yes, JSON wastes more diskspace.
  • It is very easy to query JSON as relational data in Oracle.
  • Currently the code expects each table has a primary key named ID. I’m not going to change it to make it more generic, you are welcome to change the code for yourself 🙂 As always, don’t take the code blindly from the internet, understand it before you use it.

I’ve been using InfluxDB for storing metrics, but it has hit its limitations – pretty huge memory consumption, so can’t really store some metrics for longer time and InfluxQL language itself is very limiting for analytics. So looking now into replacing it with Oracle 21c with JSON in-memory store and other goodies.

But InfluxDB has a nice GROUP BY time(<time interval>) clause that lets you group timestamps into arbitary length time groups – 1 minute, 2 minute, 5 minute, 15 minute, 3 hours…

In Oracle you can use built in TRUNC() or ROUND() to round timestamp into 1 MINUTE, 1 HOUR, 1 DAY, but not to 5 minute and so on.

Here are my functions to fix this issue.

Yes you are seeing correct, all TIMESTAMP data types. No DATE. Please stop using this old expired DATE data type. TIMESTAMPS are better as ISO joins are better 🙂

I wanted to share some of the Grafana dashboards I’ve built recently to monitor Oracle database environments. Images for now, for your inspiration, so you could build your own.

First up is Oracle database listener. Listener log is a very valuable source of information, that I think is often overlooked. For example just from monitoring listener logs you can get real time information if clients can connect to the database successfully or they get some error. Or so some specific client get an error only (misconfigured client), or some forgotten app is still running is hammering the listener with service name that does not exist anymore. Maybe some client is misbehaving and is trying to connect hundreds of times per second to the database, you will see it from the listener log, with the client information. Who are the clients still connecting with TCP and have not yet migrated over to TCPS connections like DBA-s are demanding? And so on.

And from grafana you can also generate alerts.

Here is the current state of my listener dashboard (this one is for human operators to drill down into issues, I have another dashboard for automatic alerts).

Data collector was published earlier:

Some InfluxQL queries behind these panels:

I think “Failed connections by service” is a very good metric to create an alert on.

SystemD is standard today in the Linux world, and it is very powerful and good, I really like it.

Wouldn’t it be nice to manage Oracle database components also with SystemD? Not just simple starting and stopping, but also making sure it keeps running and restarts it automatically if needed.

I also want to manage listener separately from the database instances, since one listener can service multiple database instances and also keeping automation and automated out-of-place patching in mind – listener does not need to run from the same home as database instance.

Realistic example: you need to wait for months until Oracle support creates you an essential merge patch for the database, but before that you can already run listener with the latest Release Update Revision, that fixes the latest security bugs.

Here is the code:

oracle-listener.service is the SystemD unit file. I have intended it to be pushed out centrally from Ansible, so the content there is generated automatically, for example the Oracle Home path, the the PID file location. the purpose of this script is to return the listener process PID (tnslsnr) to SystemD, so it could keep an eye on it, and restart if it happens to die. Rather long script for such a simple task, but we are dealing with Oracle here, and instead of just returning a simple process PID, “lsnrctl show pid” has a lot on unnecessary clutter in the output.

For a while now Oracle is using standardised log structure under Automatic Diagnostic Repository (ADR) – this includes database alert.log, listener log, CRS log, ASM log and much more. All this is very valuable information for monitoring.

I’m a fan of using Grafana (using InfluxDB as a data source) for monitoring, since in Grafana it is very easy to create your own customised dashboards and alerts – so lately I’ve ben creating my own monitoring agents to gather send Oracle monitoring information to InfluxDB.

InfluxDB is fast time series database, perfect for storing monitoring data – each record must have timestamp associated and data retention is also natively built in.

Since our Oracle environment is exploding in size, my goal is also to have monitoring tools/agents that figure out what to do themselves, deployed using Ansible. All without any human involvment.

Here is monitoring agent with following high level features:

  • Searches for all recently modified alert/log.xml files under ADR base
  • Parses newly added log messages for any tags that Oracle itself has added (like component and message level)
  • Parses log messages against custom regular expressions to parse out custom information

Code is here:

It is expected to run as SystemD service. I currently left in some Ansible tags, since this is how I distribute the agent to all servers (and to make sure people do no just take the code and run it without reading and understanding it).