Bug 17824201 – PHYSICAL STANDBY DATABASE CRASHES WITH ORA-4021

接到客户告警，数据库挂了，查看alert日志，如下：

*** 2017-08-28 10:31:18.159
Media Recovery of Online Log [Thread=1, Seq=75574]

*** 2017-08-28 10:31:18.160
Recovery of Online Redo Log: Thread 1 Group 12 Seq 75574 Reading mem 0
Log read is SYNCHRONOUS though disk_asynch_io is enabled!

*** 2017-08-28 10:34:46.694
*** 2017-08-28 10:34:46.694 4338 krsh.c
MRP0: Background Media Recovery terminated with error 449
ORA-00449: background process 'LGWR' unexpectedly terminated with error 4021
*** 2017-08-28 10:34:46.698 4338 krsh.c
Managed Standby Recovery not using Real Time Apply
--------------------------------------------------------------------------
DDE Action 'DB_STRUCTURE_INTEGRITY_CHECK' was flood controlled
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (FLOOD CONTROLLED, -457834491 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/xxxxx/oracle/xxxxx/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/xxxxx/oracle/xxxxxx/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
error 4021 detected in background process

*** 2017-08-28 10:34:46.689
ORA-04021: timeout occurred while waiting to lock object 
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5594<-ksbrdp()+3507<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253 
----- End of Abridged Call Stack Trace -----

*** 2017-08-28 10:34:46.713
LGWR (ospid: 120716): terminating the instance due to error 4021
ksuitm: waiting up to [5] seconds before killing DIAG(120677)

这个问题是11.2 ADG上一个比较知名的bug：
Bug 17824201 - PHYSICAL STANDBY DATABASE CRASHES WITH ORA-4021
是由LGWR跟其他进程之间在instance load lock上发生死锁引起的。虽然11.2.0.4 已经fix了这个BUG，但需要设置以下隐含参数启用：

SQL> alter system set "_adg_parselock_timeout"=500; 

System altered.

这个参数为LGWR设置了锁的超时时间，当尝试获得排它锁超时后，LGWR会sleep一阵，然后重试申请，这样就不会产生死锁导致实例挂掉了。
修改后数据库正常运行。

2025 年 4 月
一	二	三	四	五	六	日
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Bug 17824201 – PHYSICAL STANDBY DATABASE CRASHES WITH ORA-4021

Related Posts

append与nologging对的redo日志大小的影响

oracle rac 网卡多播检查