Possible Causes:
All commit requests are waiting for "log file switch (archiving needed)" or "log file switch (checkpoint incomplete)". Ensure that the archive disk is not full or slow. DBWR may be too slow because of I/O. You may need to add more or larger redo logs, and you may also need to add database writers if DBWR is the problem.
Actions:
1. Optimize the number of redo log switches - As a general rule of thumb, it should not be more than 3-4 per hour during peak load.
2. Optimize the redo log switch time.
Addressing point 1 was straightforward. Our redo logs were 256 MB in size. We increased them to 512 MB, and redo log switches dropped to 3-4 per hour. This cut the number of failures per hour by half, but further tuning was still required.
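The sizing arithmetic behind point 1 can be sketched roughly as follows. This is a minimal illustration, not an Oracle tool: the function name `redo_log_size_mb` is made up for this post, the peak rate of 8 switches per hour is a hypothetical input (we did not record our exact figure here), and the model assumes every log fills completely before it switches.

```python
import math


def redo_log_size_mb(current_size_mb, switches_per_hour, target_switches_per_hour):
    """Estimate the redo log size (MB) needed to reach a target switch rate.

    Assumes each log fills completely before switching, so the redo volume
    per hour is simply current size x switches per hour.
    """
    redo_mb_per_hour = current_size_mb * switches_per_hour
    # Round up so the estimate never falls short of the target.
    return math.ceil(redo_mb_per_hour / target_switches_per_hour)


# Hypothetical peak of 8 switches/hour on 256 MB logs, aiming for 4/hour.
print(redo_log_size_mb(256, 8, 4))  # 512
```

Under that assumption, doubling the log size halves the switch rate, which is in line with what we saw when moving from 256 MB to 512 MB.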
For point 2, after some research, we learned that during a redo log switch, Oracle writes the required information to all control files on the system sequentially, not in parallel. This happens on both the primary and the standby. We reviewed our setup and found that we had 3 control files on mount points with internal hardware-level RAID (1+0).
Since our setup already had enough fault tolerance and our control files are backed up frequently, we decided to keep just one control file instead of 3 on both the primary and the standby.
This reduced our redo log switch time by about 60%.
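To see why dropping control files can help this much, here is a toy model of sequential control file writes. It is only a sketch under stated assumptions: switch time is treated as a fixed cost plus one equal write per control file, and the millisecond figures are hypothetical values chosen for illustration, not measurements from our system.

```python
def switch_time_reduction(files_before, files_after, fixed_ms, per_file_ms):
    """Fractional drop in switch time when control files are written one by one.

    Toy model: switch time = fixed_ms + number_of_control_files * per_file_ms.
    """
    before = fixed_ms + files_before * per_file_ms
    after = fixed_ms + files_after * per_file_ms
    return (before - after) / before


# Hypothetical costs: 10 ms fixed work, 30 ms per control file write.
print(switch_time_reduction(3, 1, fixed_ms=10, per_file_ms=30))  # 0.6

# With no fixed cost at all, going from 3 files to 1 saves at most two thirds.
print(switch_time_reduction(3, 1, fixed_ms=0, per_file_ms=30))
```

In this model, a reduction of about 60% when going from 3 control files to 1 is close to the two-thirds ceiling, which would suggest that control file writes made up most of our switch time.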
One thing to note is that our system did not have very fast I/O. For systems with high-end SAN storage and enough write cache, point 2 might not be required.