Wednesday 30 March 2016

gc cr block congested

- This event is the same as "gc current block congested", except that the requested block is a consistent read (CR) copy rather than a current-mode block.

gc current grant congested

Definition

- Whenever an instance (HOST-03) requests a data block in any mode, the request is served by the MASTER NODE of that data block.
- Meanwhile the requesting instance (HOST-03) waits for the grant from the master instance so that it can perform physical I/O and read the data block from disk.
- When the grant is delayed because the master's LMS process (or CPU) is overloaded, the requesting instance waits on "gc current grant congested" (a query to spot sessions on these waits is shown below).
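
A quick way to see whether sessions are currently stuck on the congested events is to query gv$session. This is only a generic sketch (run it as a DBA user); it simply counts sessions per instance and event:

select inst_id, event, count(*) as sessions
from   gv$session
where  event like 'gc%congested'
group  by inst_id, event;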

gc current block congested


Definition

- Whenever an instance (HOST-03) requests a data block in any mode, the request is served by the MASTER NODE (HOST-01) of that data block.
- The LMS process running on the master node ships the data block to the requesting instance's LMS process.
- When the LMS processes on both nodes are heavily loaded, the transfer is delayed and the wait event is "gc current block congested".

Reason

- Highly loaded LMS process or CPU.

Troubleshooting

- Increase the number of LMS processes by setting "GCS_SERVER_PROCESSES" (9i RAC onward); see the example below.
- Optionally, you can also set "_LM_LSM" up to OPS.
- Tune the OS.
- Add CPUs.
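
GCS_SERVER_PROCESSES is a static parameter, so the change goes into the spfile and the instances need a restart. A minimal sketch (the value 4 is only an illustration; size it according to your CPU count and Oracle's defaults):

alter system set gcs_server_processes=4 scope=spfile sid='*';
-- restart all instances for the new value to take effect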

gc current buffer busy

This wait event appears mostly within a single instance, when more than one session on that instance is trying to access the same data block.

Troubleshooting

- Application tuning (reduce concurrent access to the same hot blocks); a query to identify the hot segments is sketched below.
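
One hedged way to identify the hot segments is the per-segment statistics view; the statistic name 'gc buffer busy' is the 10g/11g segment-level name and may differ slightly in other versions:

select * from (
  select owner, object_name, statistic_name, value
  from   v$segment_statistics
  where  statistic_name = 'gc buffer busy'
  order  by value desc
) where rownum <= 10;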

gc current block busy

Definition

- An instance requests a data block in current mode by sending a request to the master.
- If the master is the holder of that data block and has already modified it,
- then the master retains a PI (past image) of the block for itself.
- The master also flushes the corresponding redo to the online redo log file before sending the CURRENT block to the requesting instance.
- Because of this, the block transfer is delayed at the requesting instance.
- Meanwhile the requesting instance waits on "gc current block busy".

Reason

- The block was being used by a session on another instance.
- The block transfer was delayed because the holding instance could not write the corresponding redo record to the online redo log file.


Troubleshooting

- Tune LGWR (a wait histogram check is sketched below).
- Appropriate application partitioning.
- Tune the network (interconnect).
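
Because the usual culprit is slow redo flushing on the holding instance, a quick check is the wait-time histogram of LGWR's write event. This is only a sketch; interpret the buckets relative to your own baseline:

select inst_id, event, wait_time_milli, wait_count
from   gv$event_histogram
where  event = 'log file parallel write'
order  by inst_id, wait_time_milli;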

gc cr grant 2-way


Definition

- An instance requests a data block in CR MODE (for a select) from the master.
- If the data block is not cached on any instance, not even on the master, then the master instance sends a message to the requesting instance granting it the SHARED lock.
- Meanwhile the requesting instance waits on "GC CR GRANT 2-WAY".
- The requesting instance then reads the data block from disk, i.e. performs the physical I/O.
- This wait event does not indicate any contention.

gc current grant 3-way

Definition

- An instance requests a data block in current mode.
- The master is not the holder of that data block, so it forwards the request to the instance believed to hold the block.
- The current holding instance no longer has the block because it has aged out of its cache.
- The current holding instance therefore grants the exclusive lock to the requesting instance so it can perform the physical I/O.
- Meanwhile the requesting instance waits on "gc current grant 3-way".
- There is a hidden parameter that controls this: "_cr_grant_local_role" (a query to check its current value is shown below).
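
Hidden (underscore) parameters are not visible in v$parameter. A commonly used sketch for checking their current value, run as SYS, is the x$ksppi/x$ksppcv join below; change such parameters only under Oracle Support's guidance:

select a.ksppinm  parameter_name,
       b.ksppstvl current_value
from   x$ksppi a, x$ksppcv b
where  a.indx = b.indx
and    a.ksppinm = '_cr_grant_local_role';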

gc current grant 2-way

Definition

- An instance requests a data block in CURRENT MODE (for DML) from the master.
- If the data block is not cached on any instance, not even on the master, then the master instance sends a message to the requesting instance granting it the EXCLUSIVE lock.
- Meanwhile the requesting instance waits on "GC CURRENT GRANT 2-WAY".
- The requesting instance then reads the data block from disk, i.e. performs the physical I/O.
- This wait event does not indicate any contention.

Reason

- This wait event appears in the "Top 5 Timed Events" section of the AWR report.
- It indicates that the requesting instance is spending a significant amount of time obtaining the grants/locks.
- The interconnect has network latency (rare).

Troubleshooting

- The best solution is to tune the SQL/application so that it requests fewer data blocks.
- The more data blocks the requesting instance asks for, the more locking and holding-instance information the master has to look up in the GRD before granting the EXCLUSIVE lock, so this overhead grows with the number of requests.
- Tune the interconnect (very rare); a quick interconnect configuration check is shown below.
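
Before blaming network latency, it is worth confirming which interfaces the instances are actually using for the interconnect. A simple check:

select inst_id, name, ip_address, is_public, source
from   gv$cluster_interconnects;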

gc cr block 3-way (read/read or write/read with 3 nodes)

This wait event is exactly the same as "gc cr block 2-way"; the only difference is that three or more instances are involved.

gc cr block 2-way (read/read or write/read with 2 nodes)

Definition

Case-1: WRITE/READ


- The requesting instance requests a data block in CR mode (for a select) from the master.
- If the master is the holder of that data block and has already modified it,
- then the master prepares a CR copy of that data block (using undo).
- Finally, the master instance ships the CR block to the requesting instance.
- Meanwhile the requesting instance waits on "gc cr block 2-way".



Case-2: READ/READ

- The requesting instance requests a data block in CR mode (for a select) from the master.
- If the master is the holder of that data block and has not modified it,
- the master instance ships the CR block to the requesting instance immediately.
- Meanwhile the requesting instance waits on "gc cr block 2-way".

Reason

- In both cases you may encounter this wait event in the "Top 5 Timed Events" section of the AWR report.
- The plan of action is similar to that for "gc current block 2-way".

gc current block 3-way (write/write with 3 nodes)

Definition

- The requesting instance requests a data block in CURRENT MODE (for DML) from the master.
- The master is not the holder of that data block, but the block is cached on another instance.
- The master sends a message to the current holding instance asking it to relinquish ownership (downgrade its lock).
- The holding instance retains a PI (past image) of the data block and then ships the block to the requesting instance.
- The holding instance also flushes the corresponding redo to the online redo log file before sending the CURRENT block.
- Meanwhile the requesting instance waits on "GC CURRENT BLOCK 3-WAY".


Reason

As you know, before sending the current data block to the requesting instance, the holding instance first flushes the corresponding redo to the online redo log file, retains a PI of the block, and then ships the CURRENT block to the requesting instance over the interconnect.

- This wait event appears in the "Top 5 Timed Events" section of the AWR report.
- Analyze the contention using the AWR report.
- In the AWR report, analyze "Current Blocks Received" and identify the top contentious objects (a gv$sysstat query for the same statistics appears after this list).
- Contention at segment level is usually caused by bad database design, DB object layout and space management.
- LGWR is not efficient at writing to the redo log file, which is why the requesting instance waits on "gc current block 3-way".
- The interconnect has network latency.
- Lack of application segregation is also a reason.
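
For a rough, instance-wide view of how many current and CR blocks are being shipped across the interconnect (the per-object breakdown still comes from the AWR segment statistics), something like this can be used:

select inst_id, name, value
from   gv$sysstat
where  name in ('gc current blocks received', 'gc cr blocks received')
order  by inst_id, name;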


Troubleshooting

- The best solution is application segregation: try to direct all SELECT queries to one node and all DML to another node.
- Tune LGWR.
- Tune the interconnect.

gc current block 2-way (write/write with 2 nodes)


Definition

- The requesting instance requests a data block in CURRENT MODE (for DML) from the master.
- If the master is the holder of that data block and has already modified it,
- then the master retains a PI (past image) of the block for itself.
- The master also flushes the corresponding redo to the online redo log file before sending the CURRENT block to the requesting instance.
- Meanwhile the requesting instance waits on "GC CURRENT BLOCK 2-WAY".

Reason

As you know, before sending the current data block to the requesting instance, the master instance first flushes the corresponding redo to the online redo log file, retains a PI of the block, and then ships the CURRENT block to the requesting instance over the interconnect.

- This wait event appears in the "Top 5 Timed Events" section of the AWR report.
- Analyze the contention using the AWR report.
- In the AWR report, analyze "Current Blocks Received" and identify the top contentious objects.
- Contention at segment level is usually caused by bad database design, DB object layout and space management.
- LGWR is not efficient at writing to the redo log file, which is why the requesting instance waits on "gc current block 2-way".
- The interconnect has network latency.
- Lack of application segregation is also a reason.

Troubleshooting

- The best solution is application segregation: try to direct all SELECT queries to one node and all DML to another node (a service-based sketch is shown below).
- Tune LGWR.
- Tune the interconnect.
- Make sure the system has enough CPU power.
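
Application segregation is usually implemented with services. A minimal sketch, assuming a database registered as RACDB with instances RAC1 and RAC2 (all names here are hypothetical); applications then connect through the appropriate service:

srvctl add service -d RACDB -s RPT_SRV  -r RAC1 -a RAC2     # reporting / SELECT workload preferred on RAC1
srvctl add service -d RACDB -s OLTP_SRV -r RAC2 -a RAC1     # DML workload preferred on RAC2
srvctl start service -d RACDB -s RPT_SRV
srvctl start service -d RACDB -s OLTP_SRV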

Saturday 26 March 2016

IMR (Instance Membership Recovery)


Communication between two or more instances takes place through their LMON processes over the interconnect.
When a communication failure occurs between the instances, or when an instance is not able to issue the heartbeat information to the controlfile, the cluster group may be in danger of possible data corruption. 
In addition, when no mechanism is present to detect the failures, the entire cluster will hang. 
To address the issue, IMR was introduced in Oracle 9i and improved in Oracle 10g. 
IMR removes the failed instance from the cluster group. When a subset of a cluster group survives during failures, IMR ensures that the larger partition group survives and kills all other smaller groups.

IMR is a part of the service offered by Cluster Group Services (CGS). LMON is the key process that handles many of the CGS functionalities. As you know, cluster software (known as Cluster Manager, or CM) can be a vendor-provided or Oracle-provided infrastructure tool. CM facilitates communication between all nodes of the cluster and provides information on the health of each node—the node state. It detects failures and manages the basic membership of nodes in the cluster. CM works at the cluster level and not at the database or instance level.

Inside Oracle RAC, the Node Monitor (NM) provides information about nodes and their health by registering and communicating with the CM. NM services are provided by LMON. 
Node membership is represented as a bitmap in the GRD. A value of 0 denotes that a node is down, and a value of 1 denotes that the node is up. There is no value to indicate a “transition” period such as during bootup or shutdown.
LMON uses the global notification mechanism to let others know of a change in the node membership. Every time a node joins or leaves a cluster, this bitmap in the GRD has to be rebuilt and communicated to all registered members in the cluster.

Node membership registration and deregistration is done in a series of synchronized steps—a topic beyond the scope of this chapter. Basically, cluster members register and deregister from a group.
The important thing to remember is that NM always communicates with the other instances in the cluster about their health and status using the CM.
In contrast, if LMON needs to send a message to LMON on another instance, it can do so directly without the help or involvement of CM. It is important to differentiate between cluster communication and Oracle RAC communication.


SPFILE (RAC/ASM)


1. Create pfile from spfile
2. Edit the pfile and adjust the value causing the issue
3. Start the database with the NOMOUNT option
4. Create spfile from pfile
5. Adjust the ASM alias
6. Bounce the database
7. Remove the old spfile from ASM


These steps should be performed on a single node only.



1)      Create pfile from spfile

create pfile='/u02/initracdb_singleNODE.ora' from spfile='+OCR/rac-cluster/asmparameterfile/registry.253.874451309';
or
create pfile='/u02/initracdb_singleNODE.ora' from spfile;


For pfile, you need to specify a location, otherwise it will overwrite your pfile in $ORACLE_HOME/dbs, which will further complicate your recovery.



For the spfile, you need to find where the spfile is on ASM; remember, the database is not even mounted, so Oracle has no idea where the database's spfile is located.



2)      Edit pfile

Now change the value causing the violation.



3)    Start the database up with nomount option

Startup nomount pfile='/u02/backups/dbatst/initdbatst.ora.new'

 This should bring up the database in nomount stage.

 

4)      Create spfile from pfile

create spfile='+DATA01' from pfile='/u02/backups/dbatst/initdbatst.ora';

Notice now that in nomount stage, Oracle recognizes where to put the spfile and you don’t need to specify the full ASM path to spfile.

spfile Location: <+dg>/database name/parameterfile

In my situation it will be ‘+DATA01/dbatst/parameterfile’

Also note that Oracle will create a new spfile without overwriting the old one. You should delete the old spfile as indicated later.



5)      Adjust ASM alias

An "alias" in ASM is like a link created with the "ln" operating system command.



The location of the spfile alias on ASM is recorded in the pfile under $ORACLE_HOME/dbs.

The spfile alias on ASM is pointing to the old spfile file, not the new one.

To fix this, you need to delete the old alias and create a new one pointing to the new spfile.



Here are the commands:

asmcmd

cd to the alias location; it should be '+dg/dbname/spfiledbname.ora'

ls -l     (just to confirm it is an alias)

Delete the old alias:

rmalias spfiledbatst.ora


Recreate the alias:

mkalias +DATA01/DBATST/PARAMETERFILE/spfile.535.831221333 spfiledbatst.ora
 

6)      Bounce the database

Use the srvctl utility to bounce the database (a sketch is shown below).
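
A minimal sketch, assuming the database is registered in the OCR as dbatst (the name used in the examples above):

srvctl stop database -d dbatst
srvctl start database -d dbatst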

After the database comes up successfully, make sure it is using the spfile.

 From a sqlplus session:

      show parameter spfile



7)      Delete old spfile

Just to keep things tidy, remove the old spfile from <+dg>/database name/parameterfile (an asmcmd example is shown below).
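
A hedged asmcmd example; list the PARAMETERFILE directory first and make absolutely sure you remove the OLD spfile, not the one the alias now points to (the placeholder below stands for that old file name):

asmcmd
cd +DATA01/DBATST/PARAMETERFILE
ls -l
rm <old_spfile_file_name>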






---: Identify the SPFILE
SQL> show parameter spfile
asmcmd spget     ---- ASM spfile
gpnptool
kfed read /dev/sdb3 | grep -E 'spf|ausize'

RAC ON and OFF


In some cases, you may want to disable the Oracle RAC options for testing purposes, perhaps to run a benchmark or to convert the Oracle RAC binaries to single-instance binaries.
Converting RAC binaries to non-RAC this way is supported only on UNIX systems; it is not applicable on Windows.

Sometimes this is used when CRS (HA) is not coming up and you get the error "cannot communicate with Cluster Ready Services": temporarily convert RAC to non-RAC, fix the node issue, and then enable RAC again.

-----------------
RAC OFF
-----------------

Use the following steps to disable Oracle RAC (known as RAC OFF):

1. Log in as the Oracle software owner (which is typically the UNIX account oracle) in all nodes.

2. Shut down all the instances from all the nodes using a NORMAL or IMMEDIATE option.

3. Change the working directory to $ORACLE_HOME/lib:

cd $ORACLE_HOME/lib


4. Run the following “make” command to relink the Oracle binaries without the Oracle RAC option:

make -f ins_rdbms.mk rac_off


This normally runs for a few minutes and should not pose any errors.


5. Now relink the Oracle binaries:

make -f ins_rdbms.mk ioracle


NOTE:- Now the Oracle binaries are relinked with the RAC OFF option. You may have to edit the init.ora or SPFILE parameters accordingly.
If errors occur in step 4, you may need to contact Oracle Support and log a service request with the trace and log files.
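
After relinking, a simple way to confirm whether the running binaries include the RAC option is to check v$option once an instance is up; remember that CLUSTER_DATABASE must be FALSE when running non-RAC binaries:

SQL> select parameter, value from v$option where parameter = 'Real Application Clusters';
SQL> show parameter cluster_database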



------------------
RAC ON 
------------------

Use the following steps to enable Oracle RAC (known as RAC ON):


1. Log in as the Oracle software owner (typically the UNIX account oracle) in all nodes.

2. Shut down all the instances from all the nodes using a NORMAL or IMMEDIATE option.

3. Change the working directory to $ORACLE_HOME/lib:

cd $ORACLE_HOME/lib


4. Run the following “make” command to relink the Oracle binaries with the Oracle RAC option:

make -f ins_rdbms.mk rac_on


This normally runs for a few minutes and should not pose any errors.

5. Now relink the Oracle binaries:

make -f ins_rdbms.mk ioracle


NOTE:- Now the Oracle binaries are relinked with the RAC ON option. You may need to edit the init.ora or SPFILE parameters accordingly.
If any errors occur in step 4, you may need to contact Oracle Support and log a service request with the trace and log files.