Data Guard “CORRUPTION DETECTED: in redo blocks starting at block” issues [Resolved]

One of my customer's cloud-hosted environments (IaaS) has an Oracle Data Guard (physical standby) setup on Windows. This article shows the procedure to fix Data Guard “CORRUPTION DETECTED: in redo blocks starting at block” issues. Recently, the standby database started logging the following errors in its alert log:

Fri June 06 08:51:16 2016
RFS[1085]: Assigned to RFS process 8996
RFS[1085]: Opened log for thread 1 sequence 72899 dbid -2002036753 branch 876434118
CORRUPTION DETECTED: In redo blocks starting at block 135169count 2048 for thread 1 sequence 72899
Deleted Oracle managed file H:\FAST_RECOVERY_AREA\SNAPF\ARCHIVELOG\2016_06_03\O1_MF_1_72899_CMC1VNVP_.ARC
RFS[1085]: Possible network disconnect with primary database

The logs were being transported across from the primary site, but the media recovery process was reporting corrupt blocks when trying to apply the archived redo log files, and so recovery stalled.

Validating the archive logs at the primary site showed us that the files were indeed valid at the source (primary):

rman target /
validate archivelog sequence 72899;
List of Archived Logs
Thrd Seq     Status Blocks Failing Blocks Examined Name
---- ------- ------ -------------- --------------- ---------------
1    72899   OK     0              350165          H:\FAST_RECOVERY_AREA\SNAPF\ARCHIVELOG\2016_06_03\O1_MF_1_72899_CM3533SG_.ARC
Finished validate at 03-JUN-16

Dumping the log file contents at the primary is another way to check whether or not a log file is valid.
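As a minimal sketch of such a dump, assuming a SQL*Plus session connected as SYSDBA on the primary: the path below is the archived log name taken from the RMAN output above, and the dump is written to a trace file in the instance's trace directory rather than to the screen. A full dump of a large log can produce a very big trace file.

```sql
-- Run on the primary as SYSDBA (e.g. sqlplus / as sysdba).
-- Dumps the contents of the archived log to a trace file; a corrupt
-- log typically makes the dump fail with an error instead of completing.
ALTER SYSTEM DUMP LOGFILE 'H:\FAST_RECOVERY_AREA\SNAPF\ARCHIVELOG\2016_06_03\O1_MF_1_72899_CM3533SG_.ARC';
```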


So we know the logs are clean and intact at the primary site, which suggests that something in the log transport process was corrupting them. Further, manually copying the files across and re-registering them at the standby would resolve the problem, but only until the next error occurred (not a sustainable workaround).
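The manual workaround could look something like the following on the standby. This is a sketch only: it assumes the archived log has already been copied over by some out-of-band method (file copy, not redo transport) and that it sits at the same path it has on the primary, which is a hypothetical location used for illustration.

```sql
-- Run on the standby as SYSDBA, after copying the file across manually.
-- The path is an assumption: adjust it to wherever the copy was placed.
ALTER DATABASE REGISTER LOGFILE 'H:\FAST_RECOVERY_AREA\SNAPF\ARCHIVELOG\2016_06_03\O1_MF_1_72899_CM3533SG_.ARC';

-- Only needed if managed recovery is not already running;
-- it raises an error if an MRP session is already active.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
```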


Oracle were quite helpful in suggesting we check the firewall(s) to ensure the following features were disabled:

    SQLNet fixup protocol
    Deep Packet Inspection (DPI)
    SQLNet packet inspection
    SQL Fixup
    SQL ALG (Juniper firewall)
    Oracle DB-control component DOS

After further investigation, it turned out that the Cisco switches between our primary and standby sites had “SQL*Net inspection” (a form of deep packet inspection) enabled by default.  As a result, because we were using the default listener port 1521, packets were being inspected in transit and were reaching the standby site in a malformed/corrupted state.

Disabling this feature wasn’t straightforward, unfortunately, so as a workaround (and to avoid other port 1521 inspection features interfering), I opted to move Data Guard traffic from port 1521 to port 1528 by adding another listener service:

 (SID_NAME = CLRExtProc)
 (ORACLE_HOME = E:\app\oracle\product\
 (PROGRAM = extproc)
 (ENVS = "EXTPROC_DLLS=ONLY:E:\app\oracle\product\\bin\oraclr11.dll")

 (ADDRESS = (PROTOCOL = TCP)(HOST = win02-stby.vbox)(PORT = 1521))

ADR_BASE_LISTENER = E:\app\oracle

# DG listener created to use port 1528, following SQL*Net packet inspection issues
 (ORACLE_HOME = E:\app\oracle\product\
 (GLOBAL_DBNAME = SNAPF) # Data Guard Broker Process
 (ORACLE_HOME = E:\app\oracle\product\

 (ADDRESS = (PROTOCOL = TCP)(HOST = win02-stby.vbox)(PORT = 1528))

ADR_BASE_LISTENER_DG = E:\app\oracle

After starting up the new LISTENER_DG service, the corruption issues disappeared.
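One way to confirm that transport and apply are healthy again is to query the standard Data Guard views. The sketch below assumes an 11g-style setup where `v$managed_standby` is available, and it assumes the standby is archive destination 2 on the primary (a common convention, not something stated above).

```sql
-- On the standby: RFS processes should be receiving and MRP applying,
-- with sequence# advancing over time.
SELECT process, status, thread#, sequence#, block#
FROM   v$managed_standby
WHERE  process LIKE 'RFS%' OR process LIKE 'MRP%';

-- On the primary: the standby destination should show VALID with no error.
-- dest_id = 2 is an assumption; use the destination that points at the standby.
SELECT dest_id, status, error
FROM   v$archive_dest
WHERE  dest_id = 2;
```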

NOTE: Don’t forget to change the port number at your primary site for your Data Guard TNS entries.
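For illustration, a primary-side tnsnames.ora entry pointing at the new port might look like the sketch below. The alias `SNAPF_STBY` and the service name `SNAPF` are hypothetical; only the host name and port 1528 come from the listener configuration above. Keep whatever alias your `LOG_ARCHIVE_DEST_n` setting or broker configuration already references, and change only the port.

```text
# tnsnames.ora on the primary (alias and service name are assumptions)
SNAPF_STBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = win02-stby.vbox)(PORT = 1528))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SNAPF)
    )
  )
```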

