Irrelevant thoughts of an oracle DBA

8 March 2008

bugs introduced via patches

Filed under: infrastructure,rac — dhoogfr @ 13:58

“Sometimes” you have to apply patches to your Oracle system to fix bugs. This is not without risk, however: patches are just code and can contain bugs themselves. The same is true for the scripts used to apply the patches, as I found out first hand.
Recently I needed to perform a drop node / add node procedure on an Oracle 10gR2 RAC (on Solaris 10).
During this procedure I ran into the following problems:

When using ocrconfig to check the backups of the cluster registry, I got the following error:

ld.so.1: ocrconfig: fatal: libocr10.so: open failed: No such file or directory
Killed

After some searching on metalink I found the following bug: 6342492 – ld.so.1: ocrcheck: fatal: libocr10.so: open failed after patch 6000740.
Patch 6000740 is bundle patch MLR7, and the problem with this patch is that the path to make is hardcoded to a value that is not correct on Solaris 10. The solution is to create a symbolic link from /usr/ccs/bin/make to /usr/bin/make and then perform a relink all.
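A minimal sketch of the symbolic link step, assuming the patch scripts look for /usr/bin/make while Solaris 10 ships make in /usr/ccs/bin. The helper name ensure_link is my own and not part of any Oracle tooling; it refuses to touch a path that already exists.

```shell
#!/bin/sh
# ensure_link TARGET LINK (hypothetical helper):
# create LINK as a symbolic link to TARGET, unless LINK already exists.
ensure_link() {
    target=$1
    link=$2
    if [ ! -x "$target" ]; then
        echo "target $target not found or not executable" >&2
        return 1
    fi
    if [ -e "$link" ] || [ -h "$link" ]; then
        echo "$link already exists, leaving it alone"
        return 0
    fi
    ln -s "$target" "$link"
}

# Assumed usage on Solaris 10, as root:
#   ensure_link /usr/ccs/bin/make /usr/bin/make
# The relink mentioned in the bug note still has to be done afterwards.
```

The function only defines the check-and-link logic; nothing is changed on the system until it is called with real paths.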

Yep, a hardcoded path. It is the same problem that requires you to create symbolic links for scp and ssh in the /usr/local/bin directory on Solaris 10 when installing a RAC system. You would think they would learn from their mistakes…

The second problem I ran into appeared during the addnode procedure. During the copy of the cluster home to the new node, Oracle complained that not all files could be copied.
When I checked the logfile I found the following messages:

WARNING: Error while copying directory /opt/oracle/crs with exclude file list '/tmp/OraInstall2008-02-09_04-20-28PM/installExcludeFile.lst' to nodes 'eocpc-rc01'. [PRKC-1073 : Failed to transfer directory "/opt/oracle/crs" to any of the given nodes "eocpc-rc01 ".
Error on node eocpc-rc01:tar: ./lib/prod/lib: Permission denied
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory]

All problem files were located in the same directory, and this directory was lacking write privileges for the oracle user.

[oracle@eocpc-rc02:~]$ find /opt/oracle/crs/lib/prod -exec ls -ald {} \;
dr-xrwx---   3 oracle   oinstall     512 May  5  2007 /opt/oracle/crs/lib/prod
drwxrwx---   3 oracle   oinstall     512 May  5  2007 /opt/oracle/crs/lib/prod/lib
drwxrwx---   2 oracle   oinstall     512 May  5  2007 /opt/oracle/crs/lib/prod/lib/v9
-rw-rw----   1 oracle   oinstall    2824 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/CCrti.o
-rw-rw----   1 oracle   oinstall    1232 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/CCrtn.o
-rw-rw----   1 oracle   oinstall    3064 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/crt1.o
-rw-rw----   1 oracle   oinstall     776 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/crti.o
-rw-rw----   1 oracle   oinstall     712 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/crtn.o
-rw-rw----   1 oracle   oinstall   14736 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/libCCexcept.so.1
-rw-rw----   1 oracle   oinstall   19056 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/libldstab_ws.so

A quick check on a new, not yet patched installation confirmed that this was not a normal situation:

[oracle@eocsp-rc51:~]$ find /opt/oracle/crs/lib/prod -exec ls -ald {} \;
drwxrwx---   3 oracle   oinstall     512 Feb  7 23:47 /opt/oracle/crs/lib/prod
drwxrwx---   3 oracle   oinstall     512 Feb  7 23:47 /opt/oracle/crs/lib/prod/lib
drwxrwx---   2 oracle   oinstall     512 Feb  7 23:48 /opt/oracle/crs/lib/prod/lib/v9
-rw-rw----   1 oracle   oinstall    2824 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/CCrti.o
-rw-rw----   1 oracle   oinstall    1232 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/CCrtn.o
-rw-rw----   1 oracle   oinstall    3064 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/crt1.o
-rw-rw----   1 oracle   oinstall     776 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/crti.o
-rw-rw----   1 oracle   oinstall     712 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/crtn.o
-rw-rw----   1 oracle   oinstall   14736 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/libCCexcept.so.1
-rw-rw----   1 oracle   oinstall   19056 Mar 13  2003 /opt/oracle/crs/lib/prod/lib/v9/libldstab_ws.so
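The side-by-side comparison of the two listings can be scripted. The sketch below is only an illustration: the function names perm_listing and compare_perms are made up, it assumes mktemp is available, and it assumes file names without spaces. It reduces each tree to "mode path" lines and diffs them, so only entries whose permissions differ show up.

```shell
#!/bin/sh
# perm_listing ROOT (hypothetical helper):
# print "mode relative-path" for every entry under ROOT, sorted by path.
# Takes the first and last field of ls -ld, so names with spaces are not handled.
perm_listing() {
    ( cd "$1" && find . -exec ls -ld {} \; | awk '{print $1, $NF}' | sort -k2 )
}

# compare_perms ROOT_A ROOT_B (hypothetical helper):
# diff the permission listings of two trees; returns diff's exit status.
compare_perms() {
    left=$(mktemp)
    right=$(mktemp)
    perm_listing "$1" > "$left"
    perm_listing "$2" > "$right"
    status=0
    diff "$left" "$right" || status=$?
    rm -f "$left" "$right"
    return $status
}

# Assumed usage, with a copy of the reference home reachable locally:
#   compare_perms /opt/oracle/crs/lib/prod /path/to/reference/crs/lib/prod
```

For homes on different nodes, the output of perm_listing can be saved to a file on each node and the two files compared with diff afterwards.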

Making the lib/prod directory in the crs home writable again for its owner, the oracle user, did indeed solve the problem.
So far I have found two different patches that introduce this problem:

  • 6000740 MLR7
  • 5749953 ons sigbus error after install patchset 10.2.0.3 for crs
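After applying one of these patches, the home can be checked for the permission problem before an addnode is attempted. The following is a sketch under assumptions: the function names are hypothetical, and the test is simply "directory without the owner write bit", which matches the dr-xrwx--- entry in the listing above.

```shell
#!/bin/sh
# list_unwritable_dirs ROOT (hypothetical helper):
# print directories under ROOT whose owner lacks write permission.
# -perm -200 matches when the owner write bit is set, so it is negated here.
list_unwritable_dirs() {
    find "$1" -type d ! -perm -200 -print
}

# fix_unwritable_dirs ROOT (hypothetical helper):
# give the owner write permission back on those directories.
fix_unwritable_dirs() {
    find "$1" -type d ! -perm -200 -exec chmod u+w {} \;
}

# Assumed usage as the oracle user:
#   list_unwritable_dirs /opt/oracle/crs/lib
#   fix_unwritable_dirs /opt/oracle/crs/lib
```

Running the listing function first shows what would be changed; some directories in an Oracle home may be read-only on purpose, so the output deserves a look before the fix is applied.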

A quick search on metalink did not turn up this problem as a known bug. I will file one next week (if I find the time).

It is interesting to note that these two problems are also not likely to be discovered by applying the patch on a test system.
I don’t think there are many DBAs who perform an addnode procedure when testing a new patchset…

3 Comments »

  1. From my experience, addnode and deletenode procedures on 10.2.0.x systems, regardless of which patches have been applied, are a 50-50 bet. Each patch that fixes an issue creates several new ones which you may or may not encounter.

    The RAC-packers from Oracle insist that 11g is much improved in this aspect. We’ll see.

    Comment by Chen Shapira — 8 March 2008 @ 20:56 | Reply

  2. Chen,

    I don’t have to perform an addnode or deletenode often, so I can’t really say if it gives many problems.
    The second problem is easy enough to solve, but the first one requires downtime (and try to explain that you need downtime to fix a bug in a patch :/ )

    On the bright side, it also means that DBAs are not likely to become obsolete in the near future :)

    Comment by dhoogfr — 9 March 2008 @ 10:12 | Reply

  3. good docs. thanks

    Comment by Sefa Şahin — 15 June 2008 @ 11:00 | Reply
