Irrelevant thoughts of an oracle DBA

21 September 2008

ORA-08103 revisisted

Filed under: Uncategorized — dhoogfr @ 0:39

In an earlier post I had already written about sessions receiving an ORA-08103 error after selecting from a table that had been truncated. At that time, I wrote the following conclusion:

“At this point I have no real explanation as for why the error occurs at this number of records fetched.
I suspect oracle of storing the locations of the blocks it has to scan during a full table in a kind of array with a limited length. After this array has been processed, Oracle would use the dataobj# value (and other keys) to get the next batch of block locations. At that moment the “object no longer exists” error would be thrown as the referenced dataobj# value no longer exists (it has been increased by 1).
But at this moment this is just pure speculation.”

Some weeks ago, I read an answer on the Oracle-L list from Tanel Poder to an ORA-08103 question, in which he stated that oracle would read 10 pointers to extents from the segment header at a time.

A quick test does seem to confirm this:

The TEST tablespace is a LMT with an uniform extent size of 16K

drop table test;

Table dropped.

fdh@GUNNAR> create table test (veld1 char(2000)) tablespace test pctfree 0;

Table created.

fdh@GUNNAR> insert into test select 'x' from all_objects where rownum <= 200;

200 rows created.

fdh@GUNNAR> commit;

Commit complete.

fdh@GUNNAR> select bytes, blocks, extents from user_segments where segment_name = 'TEST';

BYTES     BLOCKS    EXTENTS
---------- ---------- ----------
425984         52         26

fdh@GUNNAR> set serveroutput on
fdh@GUNNAR> -- set the arraysize to 1 to prevent oracle from "prefetching" records
fdh@GUNNAR> set arraysize 1
fdh@GUNNAR> var x refcursor
fdh@GUNNAR>
fdh@GUNNAR> BEGIN
2
3      open :x for select veld1 from test;
4
5  END;
6  /

PL/SQL procedure successfully completed.

fdh@GUNNAR>
fdh@GUNNAR> DECLARE
2
3      l_veld1   char(2000);
4
5  BEGIN
6
7      fetch :x into l_veld1;
8      dbms_output.put_line('veld1: ' || trim(l_veld1));
9
10  END;
11  /
veld1: x

PL/SQL procedure successfully completed.

-- AT THIS MOMENT ANOTHER SESSION TRUNCATES THE TABLE

fdh@GUNNAR>
fdh@GUNNAR> DECLARE
2
3      l_veld1   char(2000);
4      l_cnt     number(3,0)   := 0;
5
6  BEGIN
7
8      BEGIN
9
10          loop
11
12              fetch :x into l_veld1;
13              exit when :x%NOTFOUND;
14              l_cnt := l_cnt + 1;
15
16          end loop;
17
18      EXCEPTION
19          when others then
20              dbms_output.put_line(SQLERRM);
21
22      END;
23
24      dbms_output.put_line('fetched rows: ' || l_cnt);
25
26  END;
27  /
ORA-08103: object no longer exists
fetched rows: 75

PL/SQL procedure successfully completed.

fdh@GUNNAR>

My block size is 8K, so I could store 4 records per db block. Before the truncate I read 1 record and after the truncate I could read 75 records before getting the error. 76 records / 4 records per block / 2 blocks per extent = 9.5 extents. Remember that the first block is used for the segment header, so pointer 1 would only point to an extent with 1 data block ( meaning that my 10 pointers point in total to 19 blocks * 4 = 76 records).

This answers the question on when you would receive the ORA-08103 error after a truncate, but also means that your dbfmrc is limited by your extent size.
In my example the 10 pointers would reference 20 blocks (not taking in account the segment header), so a dbfmrc of more then 20 would not be possible for this table.

New question: Why is a multiblock read not crossing extent bounderies?

20 August 2008

Just because its printed, doesn’t mean its true

Filed under: infrastructure, rant — dhoogfr @ 0:55

That statement is often written by Jonathan Lewis and today I was reminded on how true it is.
I had a discussion today with two of my colleagues who wanted to increase the number of arch processes on a dataguard system. As reason they pointed to metalink note 468817.1 – “RFS: possible network disconnect while taking rman backup on primary site”, which makes the following statement:

“In a Data Guard Configuration, during Scheduled RMAN Backup no Redo is transported to the Standby Server as the ARCH Process is blocked (as expected ie. RMAN would utilize 1 ARCn Process and the other ARCn for local Archiving ) which means the Standby stays out of sync (assuming max_arch_processes=2) until the ArchiveLog is manually copied across and registered to the Standby Database after which the Standby Database resumes applying the ArchiveLogs.”

The first thing that drew my attention was the part about rman utilizing an ARCn process.
While the ARCn processes are indeed responsible for archiving the online redo log files, it is the sessions own server process that does this when issuing an “alter system archive log current” command. When rman forces a log file to be archived, the same thing happens. This can easily be verified by looking to the “creator” column in the v$archived_log view. A strace of the rman server process would also prove that it is this process which reads the archived redo logs and streams them to the rman client to be written in a backup piece.

1 – 0 for me

The second thing I noticed was that, according to the note, the standby would remain out of sync until the archivelog was manually copied across and registered to the standby db.
Even if the rman process was using an ARCn process, leaving no processes to copy the archivelog over (with max_arch_processes=2), the standby db would normally be able to pickup the synchronization again after rman would have released the ARCn process again.

2 – 0 for me
and end of discussion.

So, “just because its printed, doesn’t mean its true”, would also apply on metalink notes.
Luckily there is a feedback link at the end of the note, so I hope it will soon be removed or modified.

And yes, I can get a little bit competitive in discussions.
How did you guess?

31 July 2008

Wordle

Filed under: personal — dhoogfr @ 0:39

Karen Morton has written a post about a site which takes for example an rss feed and turns it into a word-cloud.
I just had to try it out, so here is the word cloud generated from the rss feed from my blog:

I like the fact that “backup” has such a prominent place…

21 July 2008

AnalyzeThis

Filed under: Uncategorized — dhoogfr @ 12:22

A while ago I had written a post with the question if I would release a package as open source or just provide a wrapped form of it. Since then, it became clear to me that I simply don’t have the time to provide proper support for the package.
Meaning I have two choises, either I never release the package to the public and only use it for my own purposes, or I publish the unwrapped code.

For me, option 1 is not realy an option, so I’m proud to present to you: AnalyzeThis !

uhm, ok, fantastic, euhm… so what does it do?

Well, I have created a page with a little faq (which will have to act as documentation), but in a nuttshell: It helps you to gather and manage the cbo statistics for your database by making it possible to store the dbms_stats option values you want to use into a table and to specify default options on several levels. I also automates the backup of statistics before overwriting them and logs information about the generation of new statistics (including timing information).

Currently you have two options with dbms_stats: either you let oracle determine which stats options to use for which object or you specify seperate options for each object yourself. There is nothing in between.
With AnalyzeThis you now have a third option: specify the objects (for which you can also use wildcards) for which you want to use a certain parameter set and let the other objects use the default parameters.

I don’t know if other people think there is a need for such a package, but I did. So here it is.
Don’t expect to much support for it though. Remember the main reason why I’m publising the code is because I don’t have the time to provide proper supprt.
I will still continue to further development this package, so irregular updates and bugfixes will be made available for download in the future.

10 July 2008

Why is rman not removing my obsolete backups?

Filed under: backup — dhoogfr @ 13:06

Last night I received a filesystem free space warning from one of our clients servers.
The offending filesystem was the filesystem used to store the rman backup pieces before they are copied to tape via a filesystem backup, and a quick ls showed that there where more rman backups still on disk then expected.

Normally the old backups are removed from disk through the rman “delete obsolete recovery window of 2 days;” command at the end of the backup script, but there was still a backup of 3 days back on disk.
In the logfile the output of the delete obsolete command showed that there where no obsolete backups found and the list backup proved that the backup on disk was still known to rman.

After a close inspection of the logfiles, I think I have found the reason.
The table below shows the completion time of the last backupset containing datafiles and the time on which the “delete obsolete” command ran:

Date

Completion time

Obsolete

10/07

04:20

04:21

09/07

04:20

04:22

08/07

04:27

04:28

07/07

04:09

04:11

As you can see the script checked today at 04:21 for obsolete backups (with the recovery window of 2 days), but the backup of the datafiles at 08/07 was finished at 04:28, meaning rman could not use this backup to satisfy the earliest necessary restore point of 08/07 04:21 and therefor the backup of 07/07 (and all archivelogs between) where still needed for the recovery window of 2 days.

Normally this should not be a real problem, but you must keep this in mind when calculating how much disk space you need to store your rman backups to disk.
A possible solution is to not use the “recovery window” clause but rather the “redundancy” or the “completed before” clause (and the use a fix starting point in the date calculation). Another option would be to remove the deletion from the backup script and put it in a seperate maintenance script that is run on a fixed time (well after the backup time), so that the deletion is not affected by the backup duration.

For those interested, I have posted below an sql script that can be used to follow up on your backup sizes and durations:

set linesize 200

column bjob format a15
column recid format 99999
column start_time format a20
column handle format a55
column mb format 99G999D99

break on bjob skip 1 on recid on backup_type on start_time

compute sum of mb on bjob

with backup_set_details
as
( select set_count, set_stamp, session_recid, session_stamp
from v$backup_set_details
group by set_count, set_stamp, set_count, session_recid, session_stamp
)
select rbjd.session_key ||',' || rbjd.session_stamp bjob, bs.recid recid, bs.backup_type,
to_char(bs.start_time, 'DD/MM/YYYY HH24:MI:SS') start_time,
bp.piece#, bp.bytes/1024/1024 mb, bp.compressed, bp.handle
from v$backup_set bs, v$backup_piece bp, backup_set_details bsd, v$rman_backup_job_details rbjd
where bs.set_stamp = bp.set_stamp
and bs.set_count = bp.set_count
and bs.set_stamp = bsd.set_stamp
and bs.set_count = bsd.set_count
and bsd.session_recid = rbjd.session_recid
and bsd.session_stamp = rbjd.session_stamp
and bp.status != 'D'
order by bs.start_time, bs.recid, bp.piece#;

7 July 2008

Knowing the history of an application

Filed under: rant — dhoogfr @ 1:30

The company I work for has many hospitals as client, and most of them use the same set of applications. One of those applications (which will be kept unnamed) is always giving us performance problems. Through time we discovered many design problems in its database structure, such as not using primary keys or foreign keys, storing dates in separate year, month, day fields,….
Also the developers seemed to be very into using cryptic names for the table and field names (all table names are like on150, on010, pt010, on070, zv054, … with field names as tgcde, innr1, crcmg, …), which is really handy when you have to tune a query. Another nice design decision they made was to include a column in each table storing the table name, and then include this column as leading column in almost every single indexes (including in the unique indexes).

A couple of days ago we were asked again to tune 3 custom queries, which started to take hours (days) to run after the application had been upgraded. Unlike other occasions, this time we did manage to receive a part of the db layout from the application vendor, including short descriptions for the tables and columns.

From the moment I looked at the documenation, things fell into place.
You see, I’m old enough to still had a cobol course at college, and the documentation layout remembered me very much to cobol key-sequenced files.
Apparently, when they switched to using oracle they just took the files and recreated them as tables. Without making any further change in the db design. Neither did their developers made the mentally switch towards “set oriented” programming (read: sql) to retreive data, but instead continued to “read” from the tables as they were still using files (In the email conversation I had with their customer support, they where actually still speaking of files instead of tables). Even the mystery of the column storing the table name was explained, as each “file” layout in the documentation had the sentence: “number of record formats in this file” next to it… .

Of course, knowing why the application has such a dreadful design does not help me much to rewrite the sql statements. But at least it stopped me from willing to kill the developers, as I now know that they just don’t realize that they are no longer working with key-sequenced files, but with a relational database instead.

Ok, I admit… I still want to torture them. But just a little bit

10 June 2008

bottle water

Filed under: personal — dhoogfr @ 1:06

This weekend I saw a documentary on television about the sense or nonsense of bottle water.
In this documentary is was pointed out that in mosts part of the world, tap water is perfectly save to drink (in Belgium it is actually more strictlty regulated then bottle water), that tap water is cheaper and that bottle water has a high impact on the environment (mainly because of transportation and the plastic bottles, even when they are recycled).

They also zoomed in to some of the more extreme forms of bottle water: Appereantly there exists something that is called “cloud juice” which can be ordered in the more fancier restaurants and is nothing more then rain water from King Island, Australia. Another example was water from Fiji, an area where lots of people don’t have access to clean water!

I must admit that I had not given it much thought until today, and in spite of the fact that I always have drunk tap water at my parents, I’m drinking bottled water today.
But after having seen this documentary my wife and I decided to switch to tap water. No more heavy lifting of bottles of water (except for little Gunnar who still requires water with a low level of minerals).

Ok, to be honest I’m drinking bottle water as I’m writing this. A special kind even: fire water aka a 14 year old anConoc single malt :)

24 May 2008

So, you think to know who is blocking you, do you?

Filed under: Uncategorized — dhoogfr @ 23:09

Recently I needed to verify why an update kept hanging on one of our client’s new rac system.
I suspected this to be a locking issue, so I queried v$session with a self join on the blocking_session column to check if the session was indeed blocked and if so who was blocking it.

The result showed that the session was indeed blocked and that the blocking session was issued by the sysman user.
Which was a little bit odd.

After running some queries to find out what the hell sysman was doing (which resulted in zero answers), I ran an older script I have that joins v$lock with v$session (next to v$locked_object and dba_objects) which showed a different session as blocking session.
Which was a little bit odd.

Additional queries confirmed that it was indeed this other session which was holding the lock (comming of course from the same user on a different machine, on which he forgot to issue a commit).

Currious about this, I checked the output from the queries again and I noticed that the two reported sessions where off by 1. In v$session.blocking_session the value was 1 higher then reported from v$lock.
So I did some tests if this result was consistent.

C:\>sqlplus fdh/fdh

SQL*Plus: Release 10.2.0.1.0 - Production on Wed May 14 17:40:25 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters and Data Mining options

fdh@TRIDION> create table test
  2  ( veld1  number)
  3  tablespace sysaux;

Table created.

fdh@TRIDION> insert into test values (1);

1 row created.

fdh@TRIDION> commit;

Commit complete.

fdh@TRIDION> select sid from v$mystat where rownum = 1;

       SID
----------
       502

fdh@TRIDION> update test set veld1 = 2;

1 row updated.

fdh@TRIDION>

I now have an unfinished transaction in session with sid 502.
I will now open a new session (on the same instance) and try to update the same statement.

C:\>sqlplus fdh/fdh

SQL*Plus: Release 10.2.0.1.0 - Production on Wed May 14 17:40:40 2008

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters and Data Mining options

fdh@TRIDION> select sid from v$mystat where rownum = 1;

       SID
----------
       322

fdh@TRIDION> update test set veld1 = 3;

I now have a blocking lock, so lets check v$session to see the value of the blocking_session column:

sys@TRIDION> select sid, serial#, username, blocking_session from gv$session where blocking_session is not null;

SID    SERIAL# USERNAME                       BLOCKING_SESSION
---------- ---------- ------------------------------ ----------------
322      17689 FDH                                         503

I did the test several times, each time the value in blocking_session was 1 higher then the actual blocking session.
However, If the 2 sessions where on different nodes, the reported sid was correct. Same for single instances.
So I rule seems to be that if both the blocking and the blocked session are on the same rac instance, the value in the blocking_session column in v$session is 1 higher then the actual blocking sid.

A search on metalink showed that this is in fact a known bug: Bug 5481650, and that it is fixed in 10.2.0.4.
An interim patch is availabe for 10.2.0.2 and 10.2.0.3 for both solaris sparc as linux 64bit (don’t know if the problem is restricted to these two platforms).

So be warned the next time you need to verify a locking issue on your rac database, oracle might lie to you about who is guilty

5 May 2008

10.2.0.4 available on solaris sparc 64bit

Filed under: Uncategorized — dhoogfr @ 11:27

Just noticed on www.dba-village.com that the 10.2.0.4 patch is finally available for sparc 64bit.

Just to let you know…

1 April 2008

open source or freeware

Filed under: opinion — dhoogfr @ 9:39

Say you have put in a lot of effort in developing a pl/sql package that helps you in your job.
You think it does a reasonable decent job and you are thinking about releasing it to the rest of the oracle community.
However you are affraid to lose control over your application. I mean, you have put so much work in it, can you then just give it out hand or would you still want to control it?

Of course you could wrap it and distribute it as freeware, but who would install a wrapped package from the internet into their production databases?
You could try to work arround that by splitting of part of the package needing a higher level of privileges into separate packages which are left unwrapped and can be installed into a different schema. But would this be sufficient?

You would also be the sole responsible for support and bug fixing as you are the only one with access to the code. That is the only one, excluding those who have access to an unwrapper, with which they can revert your code back into its original state (including variable names).

Another option would be to share the source code, but what would happen then? Probably people would start modifying your code.
Is this a bad thing? Maybe that depends if they afterwards would start bothering you with questions as “why it doesn’t work” or not.
Would you allow people to send in their modifications or would you only maintain your own version? Could these modified “versions” then still use the same name as your original package or not?

Then there is the question of licensing. Would you just publish the code, or would you put a licensing on it? If so, which one?

Many questions and few answers.
So tell me, what would you do and how would you do it?

« Previous PageNext Page »

Blog at WordPress.com.