Regular database restore tests are important
I came to work today morning and there was an alert in my inbox saying that one of the large databases failed the nightly restore test.
Looking into RMAN logs I saw that recovery failed when applying archivelogs and error was something I have never seen before:
ORA-00756: recovery detected a lost write of a data block ORA-10567: Redo is inconsistent with data block (file# 3, block# 37166793, file offset is 3350347776 bytes) ORA-10564: tablespace SYSAUX ORA-01110: data file 3: '/nfs/...' ORA-10560: block type 'FIRST LEVEL BITMAP BLOCK'
Version: 18.104.22.168 with 2017-08 bundle patch
Looking at MOS I see two bugs that could match:
- Bug 22302666 ORA-753 ORA-756 or ORA-600  with KCOX_FUTURE after RMAN Restore / PITR with BCT after Open Resetlogs in 12c
but this bugfix is already included in 2017-08 bundle patch, and:
- Bug 23589471 ORA-600  KCOX_FUTURE or ORA-756 Lost Write detected during Recovery of a RMAN backup that used BCT
Looks like this matches quite well with my situation and the note has a really scary sentence in it: Incremental RMAN backups using BCT may miss to backup some blocks.
Bugs exists in modern software, bugs exist even in rock solid Oracle database backup and recovery procedures. It doesn’t matter if the backup was completed successfully, the state of the backup can only be determened when a restore is attempted. So please start testing your backups. Regularly. Daily.