Here I’m following up on my previous post, Using linguistic indexes for sorting in Oracle. I don’t much like the Oracle solution, which requires creating a special index just to speed up sorting, but at the same time it’s very powerful: it allows indexing in many languages, and no database-wide changes are needed.

In this post I’ll take a look at two popular open source databases, MySQL and PostgreSQL. I’ll only look at features that are included with the database and can be used without any special application changes.

PostgreSQL 8.4

Starting from 8.4, collation (sorting) rules can be defined per database; there is no way to set them at the session level. All sorting and all indexes are ordered according to the database collation locale. In previous versions only one collation locale was allowed for the entire database cluster.

For my example I create two databases, one with an Estonian locale and one with a German locale.

$ createdb -E utf8 -l 'et_EE.utf8' -O ilmar ilmar_ee
$ createdb -E utf8 -l 'de_DE.utf8' -O ilmar -T template0 ilmar_de

Currently PostgreSQL relies on the underlying OS to do the collation, so the OS must also support the specified locale. Check it with:

$ locale -a | grep et_EE
et_EE
et_EE.iso88591
et_EE.iso885915
et_EE.utf8
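
Once the databases are created, the collation of each one can be checked from the system catalog (pg_database gained a datcollate column in 8.4):

ilmar_ee=> select datname, datcollate from pg_database;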

To change the collation you need to dump the entire database to a text file (pg_dump), create a new database and reload the data, which is a pretty painful procedure for large databases.
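
In outline, the procedure looks something like this (the database names are just placeholders):

$ pg_dump ilmar_ee > ilmar_ee.sql
$ createdb -E utf8 -l 'de_DE.utf8' -T template0 -O ilmar ilmar_new
$ psql -d ilmar_new -f ilmar_ee.sql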

A small test to check that it really works. In ilmar_ee and ilmar_de I create the table test_coll and load it with 4 rows:

CREATE TABLE test_coll (
  t character varying(100) NOT NULL
);

begin;
insert into test_coll values ('a');
insert into test_coll values ('o');
insert into test_coll values ('õ');
insert into test_coll values ('ä');
commit;


ilmar_ee=> select t from test_coll order by t;
a
o
õ
ä

ilmar_de=> select t from test_coll order by t;
a
ä
o
õ

Now, can an index be used for sorting?

CREATE TABLE t (
  t character varying(100) NOT NULL
);

CREATE INDEX idxt
  ON t
  USING btree
  (t);

CREATE OR REPLACE FUNCTION fill_t(p_num_rows bigint) RETURNS bigint AS
$BODY$
declare
  s t.t%type;
  i integer;
begin
  for i in 1..p_num_rows loop
    s:= case mod(i, 4) when 0 then 'a' when 1 then 'ä' when 2 then 'o' when 3 then 'õ' end;
    if mod(i,2) = 0 then
      s:= upper(s);
    end if;
    s:= s ||' wqe wqe wqe wqeqwdsa asd asdasd sasss we qwewq dssas';
    insert into t (t) values (s);
  end loop;
  return 0;
end;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE;

select fill_t(10000);
vacuum analyze t;

After the test data is created, some tests. Oracle users will find the following very strange:

ilmar=> explain select t from t order by t;
                          QUERY PLAN
--------------------------------------------------------------
 Sort  (cost=868.39..893.39 rows=10000 width=55)
   Sort Key: t
   ->  Seq Scan on t  (cost=0.00..204.00 rows=10000 width=55)

ilmar=> explain select t from t where t between 'a' and 'b' order by t;
                               QUERY PLAN
-------------------------------------------------------------------------
 Sort  (cost=395.10..401.35 rows=2500 width=55)
   Sort Key: t
   ->  Seq Scan on t  (cost=0.00..254.00 rows=2500 width=55)
         Filter: (((t)::text >= 'a'::text) AND ((t)::text <= 'b'::text))

ilmar=> explain select t from t where t between 'a' and 'ak' order by t;
                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using idxt on t  (cost=0.00..8.27 rows=1 width=55)
   Index Cond: (((t)::text >= 'a'::text) AND ((t)::text <= 'ak'::text))

It seems that the Postgres optimizer only considers using an index for sorting when the filter selects only a small fraction of the table. The documentation gives a reason for this:

The planner will consider satisfying an ORDER BY specification either by scanning an available index that matches the specification, or by scanning the table in physical order and doing an explicit sort. For a query that requires scanning a large fraction of the table, an explicit sort is likely to be faster than using an index because it requires less disk I/O due to following a sequential access pattern. Indexes are more useful when only a few rows need be fetched.

So it seems that Postgres cannot fast full scan an index. If I fill the table up even more, the optimizer finally costs the sort operation higher than the index scan. But the query is slow: 4125 ms on my system.

ilmar=> select fill_t(90000);
ilmar=> vacuum analyze t;

ilmar=> explain select t from t order by t;
                              QUERY PLAN
-----------------------------------------------------------------------
 Index Scan using idxt on t  (cost=0.00..9771.11 rows=100000 width=55)

A note from documentation:

The drawback of using locales other than C or POSIX in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them.
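
A well-known workaround for the LIKE limitation is to create an additional index using the pattern_ops operator classes, which compare values byte by byte and can therefore support left-anchored LIKE searches. For the test table above it would look like this:

CREATE INDEX idxt_pattern ON t (t varchar_pattern_ops);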

Better collation support seems to be a work in progress:
Todo:Collate
Todo:IDU

It’s also possible to use the third-party Orafce package, which provides an NLSSORT function similar to Oracle’s.
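
I haven’t tried Orafce here, but assuming the package is installed, the usage should look roughly like this:

SELECT t FROM test_coll ORDER BY nlssort(t, 'et_EE.utf8');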

MySQL 5.1

MySQL supports collations at the column level.
All available collations for the given charset can be queried like this:

mysql> show collation where charset = 'utf8';
+--------------------+---------+-----+---------+----------+---------+
| Collation          | Charset | Id  | Default | Compiled | Sortlen |
+--------------------+---------+-----+---------+----------+---------+
| utf8_general_ci    | utf8    |  33 | Yes     | Yes      |       1 |
| utf8_bin           | utf8    |  83 |         | Yes      |       1 |
| utf8_unicode_ci    | utf8    | 192 |         | Yes      |       8 |
| utf8_icelandic_ci  | utf8    | 193 |         | Yes      |       8 |
| utf8_latvian_ci    | utf8    | 194 |         | Yes      |       8 |
| utf8_romanian_ci   | utf8    | 195 |         | Yes      |       8 |
| utf8_slovenian_ci  | utf8    | 196 |         | Yes      |       8 |
| utf8_polish_ci     | utf8    | 197 |         | Yes      |       8 |
| utf8_estonian_ci   | utf8    | 198 |         | Yes      |       8 |
| utf8_spanish_ci    | utf8    | 199 |         | Yes      |       8 |
| utf8_swedish_ci    | utf8    | 200 |         | Yes      |       8 |
| utf8_turkish_ci    | utf8    | 201 |         | Yes      |       8 |
| utf8_czech_ci      | utf8    | 202 |         | Yes      |       8 |
| utf8_danish_ci     | utf8    | 203 |         | Yes      |       8 |
| utf8_lithuanian_ci | utf8    | 204 |         | Yes      |       8 |
| utf8_slovak_ci     | utf8    | 205 |         | Yes      |       8 |
| utf8_spanish2_ci   | utf8    | 206 |         | Yes      |       8 |
| utf8_roman_ci      | utf8    | 207 |         | Yes      |       8 |
| utf8_persian_ci    | utf8    | 208 |         | Yes      |       8 |
| utf8_esperanto_ci  | utf8    | 209 |         | Yes      |       8 |
| utf8_hungarian_ci  | utf8    | 210 |         | Yes      |       8 |
+--------------------+---------+-----+---------+----------+---------+
21 rows in set (0.04 sec)

I'll create a table test, where column e uses Estonian sorting and column h uses Hungarian sorting.

mysql> create table test (
  e varchar(100) not null collate 'utf8_estonian_ci',
  h varchar(100) not null collate 'utf8_hungarian_ci'
) charset=utf8;

mysql> insert into test values ('a','a');
mysql> insert into test values ('ä','ä');
mysql> insert into test values ('o','o');
mysql> insert into test values ('õ','õ');
mysql> select * from test;
+---+---+
| e | h |
+---+---+
| a | a |
| ä | ä |
| o | o |
| õ | õ |
+---+---+
4 rows in set (0.02 sec)

mysql> select * from test order by e;
+---+---+
| e | h |
+---+---+
| a | a |
| o | o |
| õ | õ |
| ä | ä |
+---+---+
4 rows in set (0.04 sec)

mysql> select * from test order by h;
+---+---+
| e | h |
+---+---+
| a | a |
| ä | ä |
| o | o |
| õ | õ |
+---+---+
4 rows in set (0.01 sec)

Perfect, now what about indexes?

mysql> create index idxe on test (e);
Query OK, 4 rows affected (0.71 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> create index idxh on test (h);
Query OK, 4 rows affected (0.18 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> explain select e from test order by e;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | test  | index | NULL          | idxe | 302     | NULL |    4 | Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> explain select h from test order by h;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | test  | index | NULL          | idxh | 302     | NULL |    4 | Using index |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

If I force a different collation on a column, the values are still read from the index, but an extra filesort step is needed:

mysql> explain select h from test order by h collate utf8_estonian_ci;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | test  | index | NULL          | idxh | 302     | NULL |    4 | Using index; Using filesort |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

I'll add more rows to see how much difference there is between reading the order from the index and doing the extra sort.

mysql> delimiter //
mysql> create procedure load_data(p_num_rows INT)
    -> BEGIN
    ->   SET @i = 0;
    ->   REPEAT
    ->     SET @i = @i + 1;
    ->     INSERT INTO test (e, h) VALUES (CONCAT('eeeaad sadsa dasd asd', @i),
    ->       CONCAT('213aad sadsa dasd asd', @i));
    ->   UNTIL @i > p_num_rows END REPEAT;
    -> END
    -> //
Query OK, 0 rows affected (0.44 sec)

mysql> delimiter ;
mysql> call load_data(10000);
mysql> call load_data(20000);
mysql> ANALYZE TABLE test;

mysql> select sql_no_cache count(*) from (select h from test order by h collate utf8_estonian_ci) a;
+----------+
| count(*) |
+----------+
|    30002 |
+----------+
1 row in set (1.22 sec)

mysql> select sql_no_cache count(*) from (select h from test order by h) a;
+----------+
| count(*) |
+----------+
|    30002 |
+----------+
1 row in set (0.09 sec)

It is also possible to set the collation at the connection level, but unlike Oracle's NLS_LANGUAGE, this does not change the sort order of table columns, because the column collation takes precedence:

mysql> truncate table test;
Query OK, 0 rows affected (0.07 sec)

mysql> insert into test values ('a','a');
Query OK, 1 row affected (0.03 sec)

mysql> insert into test values ('ä','ä');
Query OK, 1 row affected (0.08 sec)

mysql> insert into test values ('o','o');
Query OK, 1 row affected (0.03 sec)

mysql> insert into test values ('õ','õ');
Query OK, 1 row affected (0.03 sec)

mysql> set collation_connection=utf8_estonian_ci;
Query OK, 0 rows affected (0.01 sec)

mysql> select h from test order by h;
+---+
| h |
+---+
| a |
| ä |
| o |
| õ |
+---+
4 rows in set (0.00 sec)

mysql> set collation_connection=utf8_hungarian_ci;
Query OK, 0 rows affected (0.00 sec)

mysql> select h from test order by h;
+---+
| h |
+---+
| a |
| ä |
| o |
| õ |
+---+
4 rows in set (0.00 sec)
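
To actually get a different order for a single query, the collation has to be overridden in the ORDER BY clause itself; based on the earlier results, this should return the Estonian order a, o, õ, ä:

mysql> select h from test order by h collate utf8_estonian_ci;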

It’s common to use a copy of the production database for testing. But how to keep it fresh? If the production database is large and located at a remote site (perhaps in another country or on another continent), then copying a full database backup over the network may be unreasonable; and if the test database needs to stay open for a longer time, transporting and storing production archived logs at the test site may be unreasonable as well.
In this situation Oracle has three great technologies that can help: physical standby database, flashback database and RMAN.

Here I’m using Oracle Database 10.2.0.4 EE.

Initial setup

This is just to create the test database as a physical standby database. Do it in any way you like; just make sure you create a guaranteed restore point before activating the standby.

Create a backup and a standby controlfile. Give the backupsets a tag that can later be used for cleaning the backups off the primary database.

RMAN> backup device type disk format '/home/oracle/backup/%U' as compressed backupset
  tag 'testdb_init' database INCLUDE CURRENT CONTROLFILE FOR STANDBY
  plus archivelog;

Starting backup at 24-JUL-09
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=159 devtype=DISK
channel ORA_DISK_1: starting compressed full datafile backupset
... and so on
Finished backup at 24-JUL-09

I also need to take the spfile and the password file.

$ cp $ORACLE_HOME/dbs/spfiletest1.ora /home/oracle/backup/
$ cp $ORACLE_HOME/dbs/orapwtest1 /home/oracle/backup/

Now, copy everything over to the test server. I’m leaving all file locations exactly the same as on the primary database.

$ scp /home/oracle/backup/* oracle@testdb:/home/oracle/backup/

On test server:

$ cd backup/
$ cp spfiletest1.ora $ORACLE_HOME/dbs/
$ cp orapwtest1 $ORACLE_HOME/dbs/

Start up the test database instance and change the service name to be different from the one used in production.

SQL> startup nomount
ORACLE instance started.

Total System Global Area  608174080 bytes
Fixed Size                  1268920 bytes
Variable Size             167773000 bytes
Database Buffers          436207616 bytes
Redo Buffers                2924544 bytes
SQL> show parameter service

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
service_names                        string      test1

SQL> alter system set service_names='test_test1' scope=both;

System altered.

Restore the controlfile and then the datafiles.

RMAN> restore controlfile from '/home/oracle/backup/0tkkuvob_1_1';

Starting restore at 24-JUL-09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=156 devtype=DISK

channel ORA_DISK_1: restoring control file
channel ORA_DISK_1: restore complete, elapsed time: 00:00:08
output filename=/u01/app/oracle/oradata/TEST1/controlfile/o1_mf_56lyof8g_.ctl
output filename=/u01/app/oracle/flash_recovery_area/TEST1/controlfile/o1_mf_56lyofm4_.ctl
Finished restore at 24-JUL-09

RMAN> alter database mount;
RMAN> sql "alter database flashback off";

RMAN> catalog start with '/home/oracle/backup';

searching for all files that match the pattern /home/oracle/backup

List of Files Unknown to the Database
=====================================
File Name: /home/oracle/backup/0ukkuvok_1_1
File Name: /home/oracle/backup/0tkkuvob_1_1

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /home/oracle/backup/0ukkuvok_1_1
File Name: /home/oracle/backup/0tkkuvob_1_1

RMAN> restore database;

Starting restore at 24-JUL-09
...
Finished restore at 24-JUL-09

RMAN> recover database;

Starting recover at 24-JUL-09
using channel ORA_DISK_1

starting media recovery
unable to find archive log
archive log thread=1 sequence=51
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/24/2009 17:09:50
RMAN-06054: media recovery requesting unknown log: thread 1 seq 51 lowscn 360057

Recover database fails at the end, but only because it didn’t find the current log. This doesn’t matter here.

Test that the media recovery process starts.

SQL> alter database recover managed standby database disconnect from session;

Database altered.

SQL> select process, status, sequence# from v$managed_standby where process like 'MRP%';

PROCESS   STATUS        SEQUENCE#
--------- ------------ ----------
MRP0      WAIT_FOR_LOG         51

SQL> alter database recover managed standby database cancel;

Database altered.

If MRP0 is started and waiting for a log (and there are no errors in alert.log), then everything is good.

Set up flashback. Make sure that you have enough disk space in the FRA (flash recovery area) for all the flashback logs generated while the test database is open.
After flashback has been enabled, create a guaranteed restore point.

SQL> alter database flashback on;

Database altered.

SQL> create restore point prod_point guarantee flashback database;

Restore point created.

Now, just activate the standby database and open it.

SQL> alter database activate standby database;
SQL> startup mount force
SQL> alter database set standby database to maximize performance;
SQL> alter database open;

Delete the backups from both databases, just to clean up.

RMAN> delete backup tag 'testdb_init';

Syncing with production

To make it more interesting, I’ll add some new datafiles to the primary database.

SQL> alter tablespace sysaux add datafile size 10m autoextend on next 10m maxsize 1g;

Tablespace altered.

SQL> create tablespace test5891 datafile size 10m autoextend on next 10m maxsize 1g;

Tablespace created.

First, find out the restore point SCN in the test database. This is the point from which to create the incremental backup.

SQL> select to_char(scn) from v$restore_point where name = 'PROD_POINT';

TO_CHAR(SCN)
---------------
390703

Find out the current production database log sequence.

SQL> select sequence# from v$log where status = 'CURRENT';

 SEQUENCE#
----------
        57

SQL> alter system checkpoint;

Create an incremental backup from production and transport it to the test site. Also include all archived logs that were generated during the backup.

RMAN> BACKUP INCREMENTAL FROM SCN=390703 device type disk format '/home/oracle/backup/%U'
  as compressed backupset database INCLUDE CURRENT CONTROLFILE FOR STANDBY;

RMAN> sql "alter system archive log current";

RMAN> backup device type disk format '/home/oracle/backup/%U' as compressed backupset archivelog from sequence 57;

Shut down the test database and flash it back to the restore point.

SQL> shutdown immediate
SQL> startup mount
SQL> flashback database to restore point PROD_POINT;
SQL> drop restore point PROD_POINT;
SQL> alter database flashback off;
SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
SQL> startup mount force

Perform any cleanups, for example for datafiles that were added while the test database was open.
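
One way to spot such datafiles is to compare the datafile creation times, for example:

SQL> select file#, name, creation_time from v$datafile order by creation_time desc;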

Register the backups and then apply the changes.

RMAN> catalog start with '/home/oracle/backup';

searching for all files that match the pattern /home/oracle/backup

List of Files Unknown to the Database
=====================================
File Name: /home/oracle/backup/13kl0v38_1_1
File Name: /home/oracle/backup/12kl0uuv_1_1
File Name: /home/oracle/backup/14kl0v87_1_1

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /home/oracle/backup/13kl0v38_1_1
File Name: /home/oracle/backup/12kl0uuv_1_1
File Name: /home/oracle/backup/14kl0v87_1_1

RMAN> recover database;

Starting recover at 25-JUL-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backupset restore
...

starting media recovery

channel ORA_DISK_1: starting archive log restore to default destination
...
unable to find archive log
archive log thread=1 sequence=59
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 07/25/2009 12:10:43
RMAN-06054: media recovery requesting unknown log: thread 1 seq 59 lowscn 391208

At the end, recover database fails because it can’t find the current log, but that doesn’t matter here. Looking at the datafiles, RMAN has automatically created all the newly added datafiles. That’s nice and easy.

Now I’ll check that the standby database is working, create a new restore point and open the database.

SQL> alter database recover managed standby database disconnect from session;

Database altered.

SQL> select process, status, sequence# from v$managed_standby where process like 'MRP%';

PROCESS   STATUS        SEQUENCE#
--------- ------------ ----------
MRP0      WAIT_FOR_LOG         59

SQL> alter database recover managed standby database cancel;

Database altered.

SQL> alter database flashback on;

Database altered.

SQL> create restore point prod_point guarantee flashback database;

Restore point created.

SQL> alter database activate standby database;

Database altered.

SQL> startup mount force
ORACLE instance started.
...
Database mounted.
SQL> alter database set standby database to maximize performance;

Database altered.

SQL> alter database open;

Database altered.

That’s it. Pretty easy actually 🙂

So if you think this method can be useful to you, try it out. I needed to work this out because the test database is on another continent from the primary and the physical standbys, and the network link is pretty slow. The test database is usually open for about a month, so using incremental backups for refreshing is pretty much the only reasonable approach.

Actually, I use this method with backups taken from a physical standby database; in that case it’s not necessary to take the archived logs as well, just shut down the media recovery process before starting the backup.

In Estonia we have our own language, and the Estonian language has its own alphabet that is a little bit different from English 🙂 It would be good if a database application sorted text fields according to the correct language rules (I have seen many bug reports from clients complaining that “the sort order is wrong”).

In an Oracle database it’s quite easy to sort according to the correct language rules: you just need to set the correct database session language from the client. This can be done using ALTER SESSION SET NLS_LANGUAGE, and quite often the database client picks up the correct language from the OS locale.

A small example to demonstrate this (I use Oracle Database 11.1.0.7 for these examples):

create table t (
  t varchar2(100) not null
);

insert into t values ('a');
insert into t values ('ä');
insert into t values ('o');
insert into t values ('õ');
commit;

SQL> alter session set nls_language='estonian';

Session altered.

SQL> select t from t order by t;

T
---------
a
o
õ
ä

SQL> alter session set nls_language='german';

Session altered.

SQL> select t from t order by t;

T
---------
a
ä
o
õ

But what happens if I create an index to speed up the sorting?

SQL> truncate table t;

SQL> create index idxt on t(t);

SQL> 
declare
  s t.t%type;
begin
  for i in 1..100000 loop
    s:= case mod(i,4) 
          when 0 then 'a'
          when 1 then 'ä'
          when 2 then 'o'
          when 3 then 'õ'
        end;
    if mod(i,3) = 0 then
      s:= upper(s);
    end if;
    s:= s||dbms_random.string('x',50);
    insert into t values (s);
  end loop;
  commit;
end;
/
SQL> exec dbms_stats.gather_table_stats(user,'t',cascade=>true);

PL/SQL procedure successfully completed.

SQL> alter session set nls_language='estonian';

Session altered.

SQL> select t from t order by t;

100000 rows selected.

Elapsed: 00:00:02.89

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|  5175K|       |  1482   (2)| 00:00:18 |
|   1 |  SORT ORDER BY     |      |   100K|  5175K|  5896K|  1482   (2)| 00:00:18 |
|   2 |   TABLE ACCESS FULL| T    |   100K|  5175K|       |   162   (2)| 00:00:02 |
-----------------------------------------------------------------------------------

SQL> alter session set nls_language='american';

Session altered.

SQL> select t from t order by t;

100000 rows selected.

Elapsed: 00:00:01.56

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |   100K|  5175K|   895   (1)| 00:00:11 |
|   1 |  INDEX FULL SCAN | IDXT |   100K|  5175K|   895   (1)| 00:00:11 |
-------------------------------------------------------------------------

First I ran the query with the Estonian language and then, for comparison, with American. The American query works as expected: it reads the correct sort order directly from the index. The Estonian query requires an extra sorting step, which also shows clearly in the elapsed times. Usually this is not a problem at all, but what if more performance is needed and the data sets are really large enough to benefit from sorting help from an index?

This is because normal indexes in Oracle are always built in BINARY sort mode. This is the fastest sorting mode, ordering characters by their numerical value in the character set, but it is really only good for the English language.
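
The difference is easy to see by comparing the sort keys that NLSSORT generates. This query is not from the original test, just an illustration that the binary and Estonian sort keys for the same character differ:

SQL> select rawtohex(nlssort('ä', 'nls_sort=binary')) binary_key,
            rawtohex(nlssort('ä', 'nls_sort=estonian')) estonian_key
       from dual;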

Here come the linguistic indexes.

SQL> create index idx_ling on t (nlssort(t, 'nls_sort=estonian'));

Index created.

SQL> alter session set nls_language='estonian';

Session altered.

SQL> select /*+index(t idx_ling)*/ t from t order by t;

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   100K|    14M|   101K  (1)| 00:20:20 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T        |   100K|    14M|   101K  (1)| 00:20:20 |
|   2 |   INDEX FULL SCAN           | IDX_LING |   100K|       |  1646   (1)| 00:00:20 |
----------------------------------------------------------------------------------------

So it works: there is no extra sorting step anymore. But it’s useless in this case, because the index contains only the sort key, not the actual column value, so the slow TABLE ACCESS BY INDEX ROWID needs to be performed.

This isn’t a problem when I don’t need to return the actual value of column t and only need the sorting:

SQL> select 1 from t order by t;

-----------------------------------------------------------------------------
| Id  | Operation        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |          |   100K|    14M|  1646   (1)| 00:00:20 |
|   1 |  INDEX FULL SCAN | IDX_LING |   100K|    14M|  1646   (1)| 00:00:20 |
-----------------------------------------------------------------------------

Or I could include the actual column value in the index itself:

SQL> create index idx_ling2 on t (nlssort(t, 'nls_sort=estonian'), t);

SQL> select t from t order by t;

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |   100K|    14M|  2388   (1)| 00:00:29 |
|   1 |  INDEX FULL SCAN | IDX_LING2 |   100K|    14M|  2388   (1)| 00:00:29 |
------------------------------------------------------------------------------

If I change the language, the index is not used anymore.

SQL> drop index idxt;

SQL> alter session set nls_language='american';

Session altered.

SQL> select 1 from t order by t;

-----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100K|  5175K|       |  1482   (2)| 00:00:18 |
|   1 |  SORT ORDER BY     |      |   100K|  5175K|  5896K|  1482   (2)| 00:00:18 |
|   2 |   TABLE ACCESS FULL| T    |   100K|  5175K|       |   162   (2)| 00:00:02 |
-----------------------------------------------------------------------------------

The same linguistic index can also be used for searching, but for that NLS_COMP needs to be set to LINGUISTIC:

SQL> alter session set nls_language='estonian';

Session altered.

SQL> select t from t where t = 'x';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    53 |   163   (3)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    53 |   163   (3)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T"='x')

SQL> alter session set nls_comp='linguistic';

Session altered.

SQL> select t from t where t = 'x';

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |     1 |   157 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IDX_LING2 |     1 |   157 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access(NLSSORT("T",'nls_sort=''ESTONIAN''')=HEXTORAW('80000100')
              )

It is also possible to combine search and sorting:

SQL> select t from t where t between 'a' and 'b' order by t;

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |   250 | 39250 |    13   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IDX_LING2 |   250 | 39250 |    13   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access(NLSSORT("T",'nls_sort=''ESTONIAN''')>=HEXTORAW('14000100')
                AND NLSSORT("T",'nls_sort=''ESTONIAN''')<=HEXTORAW('19000100') )

In conclusion, it's possible to use an index for sorting in non-English languages, but the solution is not very straightforward or simple. At the same time it is powerful: you can index the same column in different languages at the same time! But I'm not yet sure I like this FBI (function-based index) kind of solution from Oracle, because when doing SQL tuning I really like being able to read the actual column value from the index and avoid the always painfully slow TABLE ACCESS BY INDEX ROWID whenever possible.
In one of the next posts I will take a look at other databases and how they have solved sorting according to national language rules.

More examples of linguistic indexes can be found on Richard Foote’s blog; he focuses on the case-insensitive search feature:
Introduction To Linguistic Indexes – Part I

Also take a look at Metalink note 227335.1:
Linguistic Sorting - Frequently Asked Questions

And of course... the documentation:
Oracle® Database Globalization Support Guide - Linguistic Sorting and String Searching

I have attended some APEX presentations/seminars, and one question keeps repeating and got my attention: can APEX run on another database?

Well, neither APEX itself nor an APEX application can run in any way on another database besides Oracle. APEX is a PL/SQL application, and no other database supports Oracle’s PL/SQL language.

But it is possible to select and modify data in another database from your APEX application.

First of all you need to set up a database link to the other database using Oracle Heterogeneous Services/Transparent Gateways. I’m not going to get into that right now; please use the documentation/Google for that.
I’m going to name my database link HSDB:

create database link hsdb 
  connect to "lowercaseusername" 
  identified by "randomCasepaSSword"
  using 'MYSQL_DB_SERVICE';

I’m going to use a MySQL database, and I’ll create the table employees in it:

CREATE TABLE `employees` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `first_name` varchar(100) NOT NULL,
  `last_name` varchar(100) NOT NULL,
  `personal_code` varchar(20) NOT NULL,
  `birthday` date DEFAULT NULL,
  `salary` decimal(10,2) NOT NULL,
  `is_active` tinyint(4) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Test that table employees is accessible from Oracle:

select * from "employees"@hsdb;

NB! I created the table in MySQL with a lower case table name, but Oracle uses upper case identifiers by default, unless you force the correct case with double-quotes.

One other thing to look out for: check that ODBC/Oracle do not translate any of the columns to the deprecated LONG datatype. LONG seems to be used for long/unlimited text fields (text in MySQL, text in PostgreSQL). You can check by running the DESC command from SQL*Plus.

desc "employees"@hsdb
 Name                       Null?    Type
 -------------------------- -------- -----------------
 id                                  NUMBER(10)
 first_name                 NOT NULL VARCHAR2(100)
 last_name                  NOT NULL VARCHAR2(100)
 personal_code              NOT NULL VARCHAR2(20)
 birthday                   DATE
 salary                     NOT NULL NUMBER(10,2)
 is_active                  NOT NULL NUMBER(3)

Make sure that none of the data types is LONG. If you have a LONG column, you need to modify the data type in the source database or create an “Oracle friendly” view on top of it.
In the PostgreSQL ODBC driver it’s also possible to raise the maximum VARCHAR length before values are mapped to LONG.
Anyway, this is the most critical part in my opinion. Make sure that you are happy with the data type mappings before continuing!
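
For example, if the MySQL table had an extra text column notes (the example table above does not, so this is purely hypothetical), an “Oracle friendly” view could be created on the MySQL side that casts it to a bounded string before it ever reaches the ODBC layer:

CREATE VIEW employees_for_oracle AS
  SELECT id, first_name, last_name, personal_code,
         birthday, salary, is_active,
         CAST(notes AS CHAR(2000)) AS notes
    FROM employees;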

Now create an Oracle view on top of that MySQL table. This view translates all column names to upper case and simplifies writing the SQL code.

  CREATE VIEW EMPLOYEES_MYSQL AS 
  SELECT   "id" id,
            "first_name" first_name,
            "last_name" last_name,
            "personal_code" personal_code,
            "birthday" birthday,
            "salary" salary,
            "is_active" is_active
     FROM   "employees"@hsdb;

desc employees_mysql
 Name                    Null?    Type
 ----------------------- -------- ----------------
 ID                               NUMBER(10)
 FIRST_NAME              NOT NULL VARCHAR2(300)
 LAST_NAME               NOT NULL VARCHAR2(300)
 PERSONAL_CODE           NOT NULL VARCHAR2(60)
 BIRTHDAY                DATE
 SALARY                  NOT NULL NUMBER(10,2)
 IS_ACTIVE               NOT NULL NUMBER(3)

Simple report

Creating a simple report is very easy: just use the created EMPLOYEES_MYSQL view in APEX like any other normal Oracle table/view.
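
For example, the source of a report region can be a plain query against the view:

select id, first_name, last_name, birthday, salary
  from employees_mysql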

Complex report

Writing vanilla SQL against a non-Oracle table over a database link usually works when the SQL is really simple. But there are pretty serious limitations, and in some cases the performance “may not be very good” (indexes are not used for some data type conversions).
To solve that problem, Oracle has the DBMS_HS_PASSTHROUGH package, which sends an unmodified native SQL query to the linked database.

For this example I’ll use the following native MySQL SQL statement for a report:

select sql_cache id, first_name, last_name from employees limit 10

To be able to use the result in a SQL statement, I’m going to use a PIPELINED function, and for that I first need to create a TYPE that defines the structure of the query output. After that I’ll create the actual function that queries the remote database.

create or replace type mysql_hstest_type as object (
  id number,
  first_name varchar2(250),
  last_name varchar2(250)
);
/

create or replace type tab_mysql_hstest_type is 
  table of mysql_hstest_type;
/

create or replace FUNCTION mysql_hstest_query 
  RETURN tab_mysql_hstest_type PIPELINED
IS
  p_row mysql_hstest_type:= mysql_hstest_type(null, null, null);
  p_c binary_integer;
BEGIN
  p_c:= DBMS_HS_PASSTHROUGH.OPEN_CURSOR@hsdb;
  DBMS_HS_PASSTHROUGH.PARSE@hsdb(p_c, 
    'select sql_cache id, first_name, last_name from employees limit 10');
  WHILE DBMS_HS_PASSTHROUGH.FETCH_ROW@hsdb(p_c) > 0 LOOP
    DBMS_HS_PASSTHROUGH.GET_VALUE@hsdb(p_c, 1, p_row.id);
    DBMS_HS_PASSTHROUGH.GET_VALUE@hsdb(p_c, 2, p_row.first_name);
    DBMS_HS_PASSTHROUGH.GET_VALUE@hsdb(p_c, 3, p_row.last_name);
    PIPE ROW(p_row);
  END LOOP;
  DBMS_HS_PASSTHROUGH.CLOSE_CURSOR@hsdb(p_c);
  RETURN;
END;
/

Now, to use it in an APEX report, just use the following SQL:

select * from table(mysql_hstest_query)

Form

First create a form on the view EMPLOYEES_MYSQL with the wizard, as for any normal table. This just creates all the necessary page elements quickly.

It’s necessary to write the optimistic locking feature yourself, because the “Automatic Row Processing (DML)” process cannot be used, and optimistic locking is a “silent” built-in feature of that process. If for some weird reason you do not want optimistic locking, you can skip the checksum and validation steps.
For checksumming I’ll create a hashing function (this one needs execute privileges on DBMS_CRYPTO):

CREATE OR REPLACE FUNCTION form_md5_checksum(
  p1 varchar2 default null, p2 varchar2 default null,
  p3 varchar2 default null, p4 varchar2 default null,
  p5 varchar2 default null, p6 varchar2 default null,
  p7 varchar2 default null, p8 varchar2 default null,
  p9 varchar2 default null, p10 varchar2 default null)
RETURN varchar2 DETERMINISTIC IS
BEGIN
  RETURN rawtohex(DBMS_crypto.hash(UTL_RAW.CAST_TO_RAW(
    p1||'|'||p2||'|'||p3||'|'||p4||'|'||
    p5||'|'||p6||'|'||p7||'|'||p8||'|'||
    p9||'|'||p10), DBMS_CRYPTO.HASH_MD5));
END;
/

Now create a new hidden and protected item on the APEX page; I’ll call it P3_CHECKSUM.
Then create a PL/SQL anonymous block process:

  Name: Calculate checksum
  Sequence: 20 (just after Automated Row Fetch)
  Process Point: On Load - After Header
  Process:
BEGIN
  :p3_checksum:= form_md5_checksum(
    :p3_ID, :p3_FIRST_NAME, :p3_LAST_NAME, :p3_PERSONAL_CODE, 
    :p3_BIRTHDAY, :p3_SALARY, :p3_IS_ACTIVE);
END;

Then I removed “Database Action” from the form buttons and changed the Button Name (the page submit REQUEST value):

Delete - DELETE
Apply Changes - UPDATE
Create - INSERT

The default “Automatic Row Processing (DML)” process cannot be used for saving the changes back to the database, because the ODBC database/driver lacks support for SELECT FOR UPDATE. Because of this, delete the existing “Automatic Row Processing (DML)” process.

To save the changes, a new procedure is needed:

CREATE OR REPLACE PROCEDURE modify_employees_mysql(
  p_action IN VARCHAR2, p_row IN employees_mysql%rowtype,
  p_md5 varchar2)
IS
  PRAGMA AUTONOMOUS_TRANSACTION;
  p_new_md5 varchar2(50);
  p_new_row employees_mysql%rowtype;
BEGIN
  -- Calculate checksum
  IF p_action IN ('UPDATE','DELETE') AND p_row.id IS NOT NULL THEN
    -- Lock the row
    UPDATE employees_mysql SET first_name = first_name
      WHERE id = p_row.id;
    -- Calculate new checksum
    SELECT * INTO p_new_row FROM employees_mysql WHERE id = p_row.id;
    p_new_md5:= form_md5_checksum(p_new_row.ID, p_new_row.FIRST_NAME,
      p_new_row.LAST_NAME, p_new_row.PERSONAL_CODE, 
      p_new_row.BIRTHDAY, p_new_row.SALARY, p_new_row.IS_ACTIVE);
    -- Check if the checksum has changed
    IF NVL(p_new_md5, '-') <> NVL(p_md5, '-') THEN
      ROLLBACK;
      raise_application_error(-20000, 'Data has changed');
    END IF;
    --
  END IF;
  -- Do the data modifications
  IF p_action = 'INSERT' THEN
    INSERT INTO employees_mysql VALUES p_row;
  ELSIF p_action = 'UPDATE' AND p_row.id IS NOT NULL THEN
    UPDATE employees_mysql SET ROW = p_row WHERE id = p_row.id;
  ELSIF p_action = 'DELETE' AND p_row.id IS NOT NULL THEN
    DELETE FROM employees_mysql WHERE id = p_row.id;
  ELSE
    raise_application_error(-20099, 'Invalid action.');
  END IF;
  commit;
END;
/

Note the “PRAGMA AUTONOMOUS_TRANSACTION” in the above code. I used the default open source MySQL ODBC driver, which lacks support for 2PC (two-phase commit). The symptom of this is “ORA-02047: cannot join the distributed transaction in progress” when running the procedure inside the APEX transaction.
If you are using a commercial ODBC driver with 2PC support, or drivers supplied by Oracle HS or Oracle Transparent Gateways, then you don’t need the autonomous transaction in this procedure, and you should also remove the commit/rollback statements from it.

And finally put this procedure to the APEX page flow.
Create a new PL/SQL anonymous block process:

  Name: Save changes
  Sequence: 30
  Process Point: On Submit - After Computations and Validations
  Process:
DECLARE
  p_row employees_mysql%rowtype;
BEGIN
  p_row.id:= :P3_ID;
  p_row.first_name:= :P3_FIRST_NAME;
  p_row.last_name:= :P3_LAST_NAME;
  p_row.personal_code:= :P3_PERSONAL_CODE;
  p_row.birthday:= :P3_BIRTHDAY;
  p_row.salary:= :P3_SALARY;
  p_row.is_active:= :P3_IS_ACTIVE;
  modify_employees_mysql(:REQUEST, p_row, :p3_checksum);
END;

And now you have it – APEX running on another database 🙂

For every Oracle database session it is always good to set the MODULE, ACTION and CLIENT_ID values for instrumentation. This allows the DBA to see and debug at the database level, in detail, what the session is or was doing. Oracle diagnostic tools are all geared towards using these values, and a lot of diagnostic power is simply lost when the application is not instrumented.

For JDBC one can use the end-to-end metrics support in the JDBC driver.
For PL/SQL one can use the DBMS_APPLICATION_INFO subprograms.
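
In plain PL/SQL the calls look like this (the module, action and client id values here are only examples):

BEGIN
  DBMS_APPLICATION_INFO.SET_MODULE(module_name => 'payroll', action_name => 'monthly run');
  DBMS_SESSION.SET_IDENTIFIER('ILMAR:12345');
END;
/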

All this is just great, but what about APEX? I think it’s the best database application development tool on the market today, but does it have this instrumentation already built in?
Yes it does!

SELECT   module,
         action,
         client_id
  FROM   v$active_session_history
 WHERE   module LIKE 'APEX%';

MODULE                  ACTION          CLIENT_ID
----------------------- --------------- ------------------------
APEX:APPLICATION 109    PAGE 7          ILMAR:2697049844839191
APEX:APPLICATION 109    PAGE 12         ILMAR:2697049844839191
APEX:APPLICATION 109    PAGE 6          ILMAR:2697049844839191

MODULE is set to the application number
ACTION contains the page number
CLIENT_ID contains the username and session ID
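
These values plug straight into the diagnostic APIs. For example, to trace everything a single APEX session does, the CLIENT_ID value from the output above can be passed to DBMS_MONITOR:

SQL> exec DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE(client_id => 'ILMAR:2697049844839191');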

This example is from APEX 3.2 and Oracle 11.1.0.7 database.

You can read more about using Oracle diagnostic tools and session tracing on Doug Burns’ blog:
Session Level ASH Reports
