Last night I received a filesystem free space warning from one of our clients servers.
The offending filesystem was the filesystem used to store the rman backup pieces before they are copied to tape via a filesystem backup, and a quick ls showed that there where more rman backups still on disk then expected.
Normally the old backups are removed from disk through the rman “delete obsolete recovery window of 2 days;” command at the end of the backup script, but there was still a backup of 3 days back on disk.
In the logfile the output of the delete obsolete command showed that there where no obsolete backups found and the list backup proved that the backup on disk was still known to rman.
After a close inspection of the logfiles, I think I have found the reason.
The table below shows the completion time of the last backupset containing datafiles and the time on which the “delete obsolete” command ran:
Date
Completion time
Obsolete
10/07
04:20
04:21
09/07
04:20
04:22
08/07
04:27
04:28
07/07
04:09
04:11
As you can see the script checked today at 04:21 for obsolete backups (with the recovery window of 2 days), but the backup of the datafiles at 08/07 was finished at 04:28, meaning rman could not use this backup to satisfy the earliest necessary restore point of 08/07 04:21 and therefor the backup of 07/07 (and all archivelogs between) where still needed for the recovery window of 2 days.
Normally this should not be a real problem, but you must keep this in mind when calculating how much disk space you need to store your rman backups to disk.
A possible solution is to not use the “recovery window” clause but rather the “redundancy” or the “completed before” clause (and the use a fix starting point in the date calculation). Another option would be to remove the deletion from the backup script and put it in a seperate maintenance script that is run on a fixed time (well after the backup time), so that the deletion is not affected by the backup duration.
For those interested, I have posted below an sql script that can be used to follow up on your backup sizes and durations:
set linesize 200 column bjob format a15 column recid format 99999 column start_time format a20 column handle format a55 column mb format 99G999D99 break on bjob skip 1 on recid on backup_type on start_time compute sum of mb on bjob with backup_set_details as ( select set_count, set_stamp, session_recid, session_stamp from v$backup_set_details group by set_count, set_stamp, set_count, session_recid, session_stamp ) select rbjd.session_key ||',' || rbjd.session_stamp bjob, bs.recid recid, bs.backup_type, to_char(bs.start_time, 'DD/MM/YYYY HH24:MI:SS') start_time, bp.piece#, bp.bytes/1024/1024 mb, bp.compressed, bp.handle from v$backup_set bs, v$backup_piece bp, backup_set_details bsd, v$rman_backup_job_details rbjd where bs.set_stamp = bp.set_stamp and bs.set_count = bp.set_count and bs.set_stamp = bsd.set_stamp and bs.set_count = bsd.set_count and bsd.session_recid = rbjd.session_recid and bsd.session_stamp = rbjd.session_stamp and bp.status != 'D' order by bs.start_time, bs.recid, bp.piece#;