Irrelevant thoughts of an oracle DBA

3 November 2009

Two Oracle RAC bugs on the wall, two Oracle bugs. Take one down …

Filed under: infrastructure,linux,rac — Freek D'Hooge @ 2:50

OK, not as good as beer, and they can give you a nasty headache, so consider yourself warned ;)

The reason for this post is two bugs I discovered in Oracle RAC, both of which result in a single point of failure.
The platform on which I’m working is Oracle 10gR2 (10.2.0.4) on OEL 4.7.

The first one occurs when you use NFS volumes to host the OCR and OCR mirror (ocrmirror) volumes.
Normally, when the OCR volume becomes corrupted or unavailable, Oracle should fail over to the ocrmirror volume. The exact behavior is documented in the RAC FAQ on Metalink (note 220970.1) and is currently being discussed in a series of blog posts by Geert de Paep.
With NFS, however, you must use both the hard and nointr mount options (the OS is OEL 4.7), and as a result any process that tries to read from or write to an unavailable OCR volume waits indefinitely for a response. This happens not only with commands such as crs_stat or srvctl, but also when an instance or service failover is initiated.
Oracle Support, however, does not quite see it this way: they first blamed the OS, then the storage, and finally stated that no failover between the ocr and ocrmirror volumes is provided for…
It took some escalation and a change of support engineer to get any progress on that SR (mind you, after more than 4 months they still have not acknowledged it as a bug).
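Nothing on the client side makes a hard,nointr read return early, but your own monitoring scripts do not have to hang along with it. Below is a minimal Python sketch, assuming you want to probe the OCR location from a home-grown health check: the read runs in a daemon thread and the caller stops waiting after a timeout. `call_with_timeout`, `ocr_reachable` and the path you pass in are hypothetical, not part of Oracle Clusterware, and this does nothing to make CRS itself fail over to the ocrmirror volume. It reads a byte rather than only calling stat because, on NFS, file attributes may be answered from the client's attribute cache.

```python
import os
import threading


def call_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread and wait at most timeout_s seconds.

    Returns ("ok", value), ("error", exception) or ("timeout", None).
    On timeout the worker thread is abandoned, not killed: if it is stuck
    on a hard,nointr NFS mount it stays blocked, but the caller moves on.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # keep the failure so the caller can see it
            outcome["error"] = exc

    worker_thread = threading.Thread(target=worker, daemon=True)
    worker_thread.start()
    worker_thread.join(timeout_s)
    if worker_thread.is_alive():
        return "timeout", None
    if "error" in outcome:
        return "error", outcome["error"]
    return "ok", outcome.get("value")


def _read_first_byte(path):
    """Force real I/O on the file instead of relying on cached attributes."""
    with open(path, "rb") as handle:
        return handle.read(1)


def ocr_reachable(path, timeout_s=5.0):
    """Hypothetical health check: True only if the read completes in time."""
    status, _ = call_with_timeout(lambda: _read_first_byte(path), timeout_s)
    return status == "ok"
```

For example, `ocr_reachable("/path/to/ocr", timeout_s=5.0)` returns False when the read has not completed after 5 seconds (the path is a placeholder for your own ocr location). Keep in mind that every timed-out probe leaves one thread stuck in the kernel until the NFS server comes back.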

The second problem is that, when you make the public interface redundant with OS bonding, the racgvip script does not detect when all interfaces in the bond are disconnected.
This is because the script, unlike older versions, uses mii-tool to check the availability of the public interface. Only when mii-tool reports that the link is down is a ping test done to the public gateway. If that test fails as well, the VIP fails over and the RAC instances on that node are placed in a blocked state.
The problem with mii-tool, however, is that it does not play well with bonds and always reports the bond as up (in fact, regardless of the link state, mii-tool always reports a network bond as “bond0: 10 Mbit, half duplex, link ok”). So the racgvip script always thinks that the public interface is up.
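To make the consequence concrete, here is a simplified Python model of the check order described above. It is not the actual racgvip code, and the function names are hypothetical; it only shows that as long as mii-tool prints “link ok”, the gateway ping is never consulted, so a bond with every slave unplugged never triggers a VIP failover.

```python
def mii_tool_link_ok(mii_tool_line):
    """Interpret one mii-tool output line, e.g. 'bond0: 10 Mbit, half duplex, link ok'."""
    return mii_tool_line.rstrip().endswith("link ok")


def vip_should_fail_over(mii_tool_line, gateway_ping_ok):
    """Simplified model of the racgvip decision order (not the real script).

    1. If mii-tool reports the link as up, stop: no ping, no failover.
    2. Otherwise ping the public gateway; fail over only if that ping fails.
    """
    if mii_tool_link_ok(mii_tool_line):
        return False
    return not gateway_ping_ok
```

With the bond output quoted above, `vip_should_fail_over("bond0: 10 Mbit, half duplex, link ok", gateway_ping_ok=False)` returns False even though the gateway is unreachable, whereas a plain interface reporting “eth0: no link” with a failed ping returns True.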
As mii-tool is an OS utility, I first opened a case with Oracle Enterprise Linux support to check whether its behavior was normal (I had already confirmed that by googling, but Oracle Support does not seem to accept results from Google :) ). After running multiple tests with different bond options, they finally stated that mii-tool is indeed obsolete and should not be used to verify the status of a bond (yes, I know: its own man page already states that mii-tool is obsolete).
Next, I opened an SR for the clusterware, and Oracle development promptly stated that it was not a clusterware bug but an OS issue, pointing the finger at mii-tool and asking where it was written that mii-tool is obsolete… After I made them aware of the statement from their OEL colleagues and of the mii-tool man page, they seem to have accepted it as a bug.
I have checked the 11gR2 version of the racgvip script, and it seems to suffer from the same problem.
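One alternative to mii-tool is the status file that the Linux bonding driver publishes under /proc/net/bonding/ (for example /proc/net/bonding/bond0), which lists an “MII Status” for the bond itself and for each slave interface. Below is a minimal Python sketch, assuming that layout (“Slave Interface:” lines, each followed by an “MII Status:” line); the sample text is illustrative and trimmed, not captured on the system from this post, and the helper names are hypothetical. The idea is that the public interface should only count as down when no slave reports “up”.

```python
def slave_mii_status(bonding_text):
    """Map each slave interface to its 'MII Status' value.

    The first 'MII Status' line in the file belongs to the bond itself and
    is skipped; each later one follows a 'Slave Interface:' line.
    """
    statuses = {}
    current_slave = None
    for raw in bonding_text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current_slave = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:") and current_slave is not None:
            statuses[current_slave] = line.split(":", 1)[1].strip()
            current_slave = None
    return statuses


def bond_has_link(bonding_text):
    """True if at least one slave of the bond reports 'up'."""
    return any(state == "up" for state in slave_mii_status(bonding_text).values())


# Illustrative sample: an active-backup bond with both cables pulled.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: down
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: down
Link Failure Count: 1

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
"""
```

On a live node you would read the real file instead, e.g. `with open("/proc/net/bonding/bond0") as f: print(bond_has_link(f.read()))`. Checking the slaves rather than the bond line keeps the decision tied to the physical links, which is exactly what mii-tool fails to report for a bond.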

PS: Note 365605.1, “Oracle Bug Status Codes, Descriptions and Usage”, although it seems incomplete, is very useful for understanding the different status codes.


25 January 2008

oracle unbreakable linux with Wim Coekaerts

Filed under: linux,Oracle VM — Freek D'Hooge @ 0:26

Today we had an Oracle partner meeting with Wim Coekaerts.
For those who don’t know him, Wim Coekaerts is vice president of Linux Engineering at Oracle (originally from Belgium, but living in the US for more than 10 years now) and is also known as Oracle’s Mr. Linux.
Needless to say, when I received an invitation to attend a partner meeting with him, I was quick to confirm my attendance.

Wim’s presentation was divided into 2 parts: Oracle Unbreakable Linux and Oracle VM. Both parts were very interesting, and at the end we had more than enough time to ask questions.

Some key points that I have written down:

Enterprise Linux:
  • Oracle did not launch Oracle Enterprise Linux to bully Red Hat or to push Red Hat out of the market.
    They started their own Linux support because they felt that Oracle customers were not sufficiently helped by Red Hat support. As Oracle software can be freely downloaded for testing, and this is not possible with Red Hat Linux, Oracle came up with its own rebuild.
  • Oracle Enterprise Linux is not a separate fork and never will be. It is, and will stay, completely compatible with Red Hat Linux.
    In fact, when Oracle tests its software, it does not differentiate between Oracle Enterprise Linux and Red Hat Linux as the OS platform.
  • Oracle did not add an option to the installer to apply “preset” settings suited to hosting an Oracle database (required rpm’s, kernel parameters, …), because they did not want to create the impression that they were creating a fork.
    Instead, Oracle created the “oracle validated configuration” rpm. Installing this rpm also installs all necessary rpm’s, creates the oracle user and sets the kernel parameters and OS user limits.
  • Linux (32-bit) is the reference platform for all development.
    It is also the platform used for all internal servers.
Oracle VM:
  • Wim claimed (and a paper about this should appear soon) that an Oracle database running in Oracle VM performs at about 90% of what it achieves on a physical server. With VMware this would be only 70%.
  • You no longer have to license your database for all physical processors in the Oracle VM server, but only for the number of CPUs defined in the guest.
    According to Wim there should be a document about this on the Oracle site, but the document I found states that this is only true when using hard partitioning with Oracle VM (http://www.oracle.com/corporate/pricing/partitioning.pdf).
    I will check this further.
  • The license policy for Oracle on VMware is not going to change.
  • Grid Control 11g will have built-in functionality to manage Oracle VM servers (deploying guests, performing live migrations, …), but VM Manager will not disappear.
  • The Oracle-supplied guest images will be certified for production use sometime during the second half of this year.
    This would mean that you could download a database image from the Oracle site and use it as a production database.
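As a back-of-the-envelope illustration of what the guest-only licensing rule would change, here is a small Python sketch. It assumes one processor license per counted CPU, optionally multiplied by a core factor and rounded up; the `core_factor` parameter and the function names are assumptions for illustration only, and whether guest CPUs may be counted at all depends on the hard-partitioning conditions in the partitioning document mentioned in the list.

```python
import math


def processor_licenses(cpus, core_factor=1.0):
    """Processor licenses for a CPU count, rounded up to a whole license.

    core_factor is an assumption: supply the value that applies to your
    hardware, or leave it at 1.0 to count CPUs one-to-one.
    """
    return math.ceil(cpus * core_factor)


def licenses_for_vm_host(host_cpus, guest_vcpus, hard_partitioned, core_factor=1.0):
    """Licenses under the two readings discussed above.

    hard_partitioned=True:  only the CPUs defined in the guest are counted.
    hard_partitioned=False: every physical CPU in the server is counted.
    """
    counted = guest_vcpus if hard_partitioned else host_cpus
    return processor_licenses(counted, core_factor)
```

For an 8-CPU Oracle VM server running a 2-CPU database guest, `licenses_for_vm_host(8, 2, hard_partitioned=True)` gives 2, while `licenses_for_vm_host(8, 2, hard_partitioned=False)` gives 8, which is why the hard-partitioning condition matters so much.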

That was about it.
After the presentation I felt more reassured that OEL is here to stay and that compatibility between Red Hat Linux and OEL will not disappear in the future.
Not too sure about Oracle VM, though; I’m still a little anxious about running production databases in a virtualized environment (according to my shrink I have a problem with losing control).
