Bugs introduced via patches
“Sometimes” you have to apply patches to your Oracle system to fix bugs. This is not without risk, however: patches are just code and can contain bugs themselves. The same is true for the scripts used to apply the patches, as I found out firsthand.
Recently I needed to perform a drop node / add node procedure on an Oracle 10gR2 RAC (on Solaris 10).
During this procedure I ran into the following problems:
When using ocrconfig to check the backups of the cluster registry, I got the following error:
ld.so.1: ocrconfig: fatal: libocr10.so: open failed: No such file or directory
Killed
After some searching on Metalink I found the following bug: 6342492 - ld.so.1: ocrcheck: fatal: libocr10.so: open failed after patch 6000740.
Patch 6000740 is bundle patch MLR7, and the problem with this patch is that the path to make has been hardcoded to a value that is not correct on Solaris 10. The solution is to create a symbolic link from /usr/ccs/bin/make to /usr/bin/make, and then perform a relink all.
Yep, a hardcoded path. The same problem that requires you to create symbolic links for scp and ssh in the /usr/local/bin directory on Solaris 10 when installing a RAC system. You would think they would learn from their mistakes…
The second problem I ran into appeared during the addnode procedure. During the copy of the cluster home to the new node, Oracle complained that not all files could be copied.
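A minimal sketch of the link step, under assumptions: Solaris 10 ships make in /usr/ccs/bin, so the sketch assumes the patch expects /usr/bin/make and that the link should point from there to the real binary. The helper name link_make is hypothetical, and the relink itself is not shown.

```shell
#!/bin/sh
# Hypothetical helper: make link_path a symlink to real_make,
# but only if real_make exists and link_path is not already taken.
link_make() {
    real_make=$1   # where make actually lives (assumed: /usr/ccs/bin/make)
    link_path=$2   # the path the patch is assumed to expect (/usr/bin/make)

    if [ ! -x "$real_make" ]; then
        echo "no executable found at $real_make" >&2
        return 1
    fi
    # Never overwrite an existing file or link.
    if [ -f "$link_path" ] || [ -h "$link_path" ]; then
        echo "$link_path already exists, leaving it alone" >&2
        return 0
    fi
    ln -s "$real_make" "$link_path"
}
```

Run as root, the call would be link_make /usr/ccs/bin/make /usr/bin/make. Refusing to overwrite an existing path keeps the helper safe to rerun on a node that already has the link.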
When I checked the logfile I found the following messages:
WARNING: Error while copying directory /opt/oracle/crs with exclude file list '/tmp/OraInstall2008-02-09_04-20-28PM/installExcludeFile.lst' to nodes 'eocpc-rc01'. [PRKC-1073 : Failed to transfer directory "/opt/oracle/crs" to any of the given nodes "eocpc-rc01 ". Error on node eocpc-rc01:
tar: ./lib/prod/lib: Permission denied
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory
tar: ./lib/prod/lib: Permission denied
tar: cannot open ./lib/prod/lib/v9 No such file or directory]
All problem files were located in the same directory, and this directory was lacking write privileges for the oracle user.
[oracle@eocpc-rc02:~]$ find /opt/oracle/crs/lib/prod -exec ls -ald {} \;
dr-xrwx--- 3 oracle oinstall 512 May 5 2007 /opt/oracle/crs/lib/prod
drwxrwx--- 3 oracle oinstall 512 May 5 2007 /opt/oracle/crs/lib/prod/lib
drwxrwx--- 2 oracle oinstall 512 May 5 2007 /opt/oracle/crs/lib/prod/lib/v9
-rw-rw---- 1 oracle oinstall 2824 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/CCrti.o
-rw-rw---- 1 oracle oinstall 1232 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/CCrtn.o
-rw-rw---- 1 oracle oinstall 3064 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/crt1.o
-rw-rw---- 1 oracle oinstall 776 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/crti.o
-rw-rw---- 1 oracle oinstall 712 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/crtn.o
-rw-rw---- 1 oracle oinstall 14736 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/libCCexcept.so.1
-rw-rw---- 1 oracle oinstall 19056 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/libldstab_ws.so
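Rather than reading through a full listing, the affected directories can be picked out directly. The following is a sketch only: the helper name dirs_without_owner_write is hypothetical, and it uses the octal form of -perm because that form is the most widely supported across find implementations.

```shell
#!/bin/sh
# Hypothetical helper: print every directory under the given tree
# whose owner write bit (octal 200) is not set.
dirs_without_owner_write() {
    find "$1" -type d ! -perm -200 -print
}
```

Called as dirs_without_owner_write /opt/oracle/crs, it should print nothing on a healthy home; any path it does print is a candidate for the copy failure described above.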
A quick check on a new, not yet patched installation confirmed that this was not a normal situation:
[oracle@eocsp-rc51:~]$ find /opt/oracle/crs/lib/prod -exec ls -ald {} \;
drwxrwx--- 3 oracle oinstall 512 Feb 7 23:47 /opt/oracle/crs/lib/prod
drwxrwx--- 3 oracle oinstall 512 Feb 7 23:47 /opt/oracle/crs/lib/prod/lib
drwxrwx--- 2 oracle oinstall 512 Feb 7 23:48 /opt/oracle/crs/lib/prod/lib/v9
-rw-rw---- 1 oracle oinstall 2824 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/CCrti.o
-rw-rw---- 1 oracle oinstall 1232 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/CCrtn.o
-rw-rw---- 1 oracle oinstall 3064 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/crt1.o
-rw-rw---- 1 oracle oinstall 776 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/crti.o
-rw-rw---- 1 oracle oinstall 712 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/crtn.o
-rw-rw---- 1 oracle oinstall 14736 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/libCCexcept.so.1
-rw-rw---- 1 oracle oinstall 19056 Mar 13 2003 /opt/oracle/crs/lib/prod/lib/v9/libldstab_ws.so
Making the lib/prod directory in the crs home writable again for the oracle user (its owner) did indeed solve the problem.
So far I have found two different patches that introduce this problem:
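The two listings differ only in the owner's write bit on lib/prod, so the fix can be sketched as a single chmod. This is an illustration under that assumption, and the helper name restore_owner_write is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: give the owner write permission on one directory
# and show the resulting mode so the change can be checked by eye.
restore_owner_write() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir is not a directory" >&2
        return 1
    fi
    chmod u+w "$dir" && ls -ld "$dir"
}
```

Run as the oracle user, the call would be restore_owner_write /opt/oracle/crs/lib/prod. Only the one directory is touched; a recursive chmod would also change files whose modes were never wrong.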
- 6000740 MLR7
- 5749953 ons sigbus error after install patchset 10.2.0.3 for crs
A quick search on Metalink did not turn up this problem as a known bug. I will file one next week (if I find the time).
It is interesting to note that these two problems are also unlikely to be discovered by applying the patch on a test system.
I don’t think there are many DBAs who perform an addnode procedure when testing a new patchset…
From my experience, addnode and deletenode procedures on 10.2.0.x systems, regardless of which patches have been applied, are a 50-50 bet. Each patch that fixes an issue creates several new ones which you may or may not encounter.
The RAC-packers from Oracle insist that 11g is much improved in this aspect. We’ll see.
Comment by Chen Shapira — 8 March 2008 @ 20:56
Chen,
I don’t have to perform an addnode or deletenode often, so I can’t really say if it gives many problems.
The second problem is easy enough to solve, but the first one requires downtime (and try to explain you need downtime to fix a bug in a patch :/ )
On the bright side, it also means that DBAs are not likely to become obsolete in the near future :)
Comment by dhoogfr — 9 March 2008 @ 10:12
good docs. thanks
Comment by Sefa Şahin — 15 June 2008 @ 11:00