Saturday 26 March 2016

IMR (Instance Membership Recovery)


Two or more instances in a cluster communicate with each other through the LMON process over the interconnect.
When a communication failure occurs between instances, or when an instance cannot write its heartbeat information to the control file, the cluster group is at risk of data corruption.
In addition, if no mechanism is in place to detect these failures, the entire cluster will hang.
To address this issue, IMR was introduced in Oracle 9i and improved in Oracle 10g.
IMR removes the failed instance from the cluster group. When a failure splits the cluster group into partitions that can no longer communicate with each other, IMR ensures that the larger partition survives and kills the instances in all smaller partitions.
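The survival rule above can be illustrated with a small Python sketch. It is not Oracle code: `resolve_partitions` is a hypothetical helper that takes disjoint groups of instance numbers and keeps the largest one. The tie-break (favour the group containing the lowest instance number) is purely an assumption added so the example is deterministic.

```python
# Hypothetical sketch of IMR-style partition resolution (not Oracle's code):
# when the interconnect splits the cluster, keep the largest partition.
from typing import FrozenSet, List, Set, Tuple


def resolve_partitions(
    partitions: List[Set[int]],
) -> Tuple[FrozenSet[int], List[FrozenSet[int]]]:
    """Return (survivor, evicted) for disjoint groups of instance numbers.

    Assumption: a tie in size is broken in favour of the group that contains
    the lowest instance number. This tie-break is illustrative only.
    """
    groups = [frozenset(p) for p in partitions if p]
    if not groups:
        raise ValueError("no partitions supplied")
    # Larger group wins; on equal size, the smaller minimum instance id wins.
    survivor = min(groups, key=lambda g: (-len(g), min(g)))
    evicted = [g for g in groups if g != survivor]
    return survivor, evicted


survivor, evicted = resolve_partitions([{1, 2}, {3, 4, 5}])
print(sorted(survivor))               # [3, 4, 5]
print([sorted(g) for g in evicted])   # [[1, 2]]
```

Here instances 1 and 2 would be evicted, because the partition {3, 4, 5} is larger.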

IMR is a part of the service offered by Cluster Group Services (CGS). LMON is the key process that handles many of the CGS functionalities. As you know, cluster software (known as Cluster Manager, or CM) can be a vendor-provided or Oracle-provided infrastructure tool. CM facilitates communication between all nodes of the cluster and provides information on the health of each node—the node state. It detects failures and manages the basic membership of nodes in the cluster. CM works at the cluster level and not at the database or instance level.
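To make "detects failures and provides the node state" concrete, here is a minimal heartbeat-timeout failure detector in Python. It is a generic sketch, not how any particular CM is implemented: the class name `HeartbeatMonitor` and the 30-second timeout are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
# Minimal heartbeat-timeout failure detector (hypothetical; not Oracle's CM).
from typing import Dict, Optional


class HeartbeatMonitor:
    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                 # seconds of silence before "down"
        self.last_seen: Dict[int, float] = {}  # node number -> last heartbeat time

    def heartbeat(self, node: int, now: float) -> None:
        """Record that `node` reported in at time `now`."""
        self.last_seen[node] = now

    def node_state(self, node: int, now: float) -> str:
        """Return 'up' if the node heartbeated within the timeout, else 'down'."""
        seen: Optional[float] = self.last_seen.get(node)
        if seen is None or now - seen > self.timeout:
            return "down"
        return "up"


mon = HeartbeatMonitor(timeout=30.0)
mon.heartbeat(1, now=0.0)
mon.heartbeat(2, now=0.0)
mon.heartbeat(1, now=25.0)
print(mon.node_state(1, now=40.0))  # up   (last heartbeat 15 s ago)
print(mon.node_state(2, now=40.0))  # down (last heartbeat 40 s ago)
```

A node that has never sent a heartbeat is also reported as down, which keeps the state strictly two-valued.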

Inside Oracle RAC, the Node Monitor (NM) provides information about nodes and their health by registering and communicating with the CM. NM services are provided by LMON. 
Node membership is represented as a bitmap in the GRD (Global Resource Directory).
A value of 0 denotes that a node is down, and a value of 1 denotes that the node is up.
There is no value to indicate a “transition” period, such as during bootup or shutdown.
LMON uses the global notification mechanism to let others know of a change in the node membership. Every time a node joins or leaves a cluster, this bitmap in the GRD has to be rebuilt and communicated to all registered members in the cluster.
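The bitmap idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GRD's actual layout: `MembershipBitmap` is a hypothetical class, bit i stands for node i, and the callback list is a stand-in for the global notification mechanism. Each bit has only two values, so there is no way to express a "transition" state.

```python
# Hypothetical sketch of a node-membership bitmap: bit i is 1 if node i is up.
from typing import Callable, List


class MembershipBitmap:
    def __init__(self) -> None:
        self.bits = 0                                      # all nodes down
        self.listeners: List[Callable[[int], None]] = []  # registered members

    def register(self, callback: Callable[[int], None]) -> None:
        self.listeners.append(callback)

    def is_up(self, node: int) -> bool:
        return (self.bits >> node) & 1 == 1

    def _publish(self) -> None:
        # Stand-in for the global notification: every registered member
        # receives the updated bitmap after each membership change.
        for notify in self.listeners:
            notify(self.bits)

    def node_joined(self, node: int) -> None:
        self.bits |= 1 << node       # set the node's bit to 1 (up)
        self._publish()

    def node_left(self, node: int) -> None:
        self.bits &= ~(1 << node)    # clear the node's bit to 0 (down)
        self._publish()


received: List[int] = []
bm = MembershipBitmap()
bm.register(received.append)
bm.node_joined(0)
bm.node_joined(2)
bm.node_left(0)
print([bin(b) for b in received])  # ['0b1', '0b101', '0b100']
```

Every join or leave produces a new bitmap that is pushed to all registered members, mirroring the rebuild-and-communicate step described above.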

Node membership registration and deregistration are performed in a series of synchronized steps, a topic beyond the scope of this chapter. Basically, cluster members register with and deregister from a group.
The important thing to remember is that NM always communicates with the other instances in the cluster about their health and status using the CM.
In contrast, if LMON needs to send a message to LMON on another instance, it can do so directly without the help or involvement of CM. It is important to differentiate between cluster communication and Oracle RAC communication.

