Tuesday 26 December 2023

Role of Standby Redo Logs Why and How?

I came across an article about Standby Redo Logs, and I wanted to share it with you.

Purpose of Data Guard:

High Availability
Disaster Recovery
Data Protection
Workload Offloading & Testing


How Data Guard Works:  Dataguard Track the Changes on Primary Database and Replicate the same to Standby Databases through Redo Logs/Archive Logs.


In 19c, Oracle Introduced 2 exciting features.

*.Active DG DML Redirect, through which even the DML Statements can be executed directly on Standby Database which will be applied to Primary Database as well which reduces the workload of Primary Database. 

*.We have automatic Standby recovery through Flashback feature by which we don't need to flashback/rebuild the standby database after Primary Flashback..


Modes of Standby databases:


Maximum Protection Mode which ensure transaction on primary commits only after it is applied on Standby. There are no Dataloss in Max Protection mode.
Here the redo transport type is SYNC & AFFIRM.

Maximum availability Mode. In This also, the transaction on Primary gets commited only after it is applied on standby database. But If Standby not available due to outage/network issue, the transaction on primary will continue without any impact.. Usually there will not be any data loss. Here also the redo transport is sync & Affirm.

Maximum performance Mode in which the primary database transaction will not wait for Standby replication which improve the performance of the primary database. But there are significant chances of Data Loss. 


Why use SRLs?

If you configure your standby for Maximum Protection, then Standby Redo Logs are required. Most implementations are configured for Maximum Performance because they do not want the performance hit Max Protect may impart on their application. 
Even if you are using Max Performance, you still want to implement SRLs. 

To understand why, we first need to start by examining how redo transport works when SRLs do not exist. 



1. A transaction writes redo records into the Log Buffer in the System Global Area (SGA).

2. The Log Writer process (LGWR) writes redo records from the Log Buffer to the Online Redo Logs (ORLs).

3. When the ORL switches to the next log sequence (normally when the ORL fills up), the Archiver process (ARC0) will copy the ORL to the Archived Redo Log.

4. Because a standby database exists, a second Archiver process (ARC1) will read from a completed Archived Redo Log and transmit the redo over the network to the Remote File Server (RFS) process running for the standby instance.

5. RFS sends the redo stream to the local Archiver process (ARCn).

6. ARCn then writes the redo to the archived redo log location on the standby server.

7. Once the archived redo log is completed, the Managed Recovery Process (MRP0) sends the redo to the standby instance for applying the transaction.



With SRLs, not only do we have more resources, we also have different choices, i.e. different paths to get from the primary to the standby. The first choice is to decide if we are configured for Max Protect or Max Performance as I will discuss its impact below.



1. Just like without SRLs, a transaction generates redo in the Log Buffer in the SGA.

2. The LGWR process writes the redo to the ORL.

3. Are we in Max Protect or Max Performance mode?

4. If Max Protect, then we are performing SYNC redo transport. The Network Server SYNC process (NSSn) is a slave process to LGWR. It ships redo to the RFS process on the standby server.

5. If Max Performance mode, then we are performing ASYNC redo transport. The Network Server ASYNC process (NSAn) reads from the ORL and transports the redo to the RFS process on the standby server.

6. RFS on the standby server simply writes the redo stream directly to the SRLs.

7. How the redo gets applied depends if we are using Real Time Apply or not.

8. If we are using Real Time Apply, MRP0 will read directly from the SRLs and apply the redo to the standby database.

9. If we are not using Real Time Apply, MRP0 will wait for the SRL’s contents to be archived and then once archived and once the defined delay has elapsed, MRP0 will apply the redo to the standby database.


 Step 3 above is the entire reason we want to use Standby Redo Logs. If we are in Max Protect (SYNC) mode, then SRLs are required otherwise this process will not work. If we are in Max Performance mode, will still want SRLs. Why? We want SRLs to be configured, even in Max Performance mode because they reduce data loss to seconds, rather than minutes or hours. Max Performance mode with SRLs often achieves a near-zero data loss solution. The last sentence above is why you want to configure SRLs if you are in Max Performance mode. The other big benefit to SRLs is when Real Time Apply is being performed. As soon as the redo is in the SRL, it can be replayed on the standby database. We do not have to wait for a log switch to occur. Real Time Apply, only possible with SRLs, means the recovery time to open the standby database in a failover operation is as low as it can be.

 I often find that people are operating under the misconception that if you configure for ASYNC, configure for Max Performance, then only ARCn can transport redo from the primary to the standby. This used to be true in much older versions, but in 10g (maybe 9i), ARCn is only used to transport redo only if SRLs are not configured. If SRLs are configured, then for ASYNC, NSAn is used to transport redo. Furthermore, NSAn does this in near real time. I only ever configure Max Performance mode in my standby configurations and I often have 1 second or 2 second data loss.

Here comes the 2 new Processes. that is Network Server Async Process (NSA) and Network Server Sync Process (NSS). Prior to 12c, the Log-Write Network Server Process (LNS) Process which is was used instead of NSA and NSS.

 Without SRLs, then I must wait for a log switch to occur on the primary before the redo can be transported. If it takes one hour for the log switch to occur, then I can have one hour’s worth of data loss. If it takes six hours for that log switch to occur, then I can have six hour’s worth of data loss. This behavior was ameliorated by the DBA implementing the ARCHIVE_LAG_TARGET initialization parameter in their primary configuration. If the DBA set this parameter to 3600 seconds, then a log switch would occur at most once per hour. Even with this parameter, one hour of data loss may seem like a lot to most companies, especially when you do better. 

 All you have to do to enjoy data loss measured in a few seconds is to create Standby Redo Logs in your standby database. That’s it. It couldn’t be more simple.
 
 
Best Practice:

*.Make sure your ORL groups all have the same exact size. You want every byte in the ORL to have a place in its corresponding SRL.

*.Create the SRLs with the same exact byte size as the ORL groups. If they can’t be the same exact size, make sure they are bigger than the ORLs.

*.Do not assign the SRLs to any specific thread. That way, the SRLs can be used by any thread, even with Oracle RAC primary databases.

*.When you create SRLs in the standby, create SRLs in the primary. They will normally never be used. But one day you may perform a switchover operation. When you do switchover, you want the old primary, now a standby database, to have SRLs. Create them at the same time.

*.For an Oracle RAC primary database, create the number of SRLs equal to the number of ORLs in all primary instances. For example, if you have a 3-node RAC database with 4 ORLs in each thread, create 12 SRLs (3x4) in your standby. No matter how many instances are in your standby, the standby needs enough SRLs to support all ORLs in the primary, for all instances.



No comments: