接到客户告警,数据库挂了,查看alert日志,如下:
*** 2017-08-28 10:31:18.159
Media Recovery of Online Log [Thread=1, Seq=75574]
*** 2017-08-28 10:31:18.160
Recovery of Online Redo Log: Thread 1 Group 12 Seq 75574 Reading mem 0
Log read is SYNCHRONOUS though disk_asynch_io is enabled!
*** 2017-08-28 10:34:46.694
*** 2017-08-28 10:34:46.694 4338 krsh.c
MRP0: Background Media Recovery terminated with error 449
ORA-00449: background process 'LGWR' unexpectedly terminated with error 4021
*** 2017-08-28 10:34:46.698 4338 krsh.c
Managed Standby Recovery not using Real Time Apply
--------------------------------------------------------------------------
DDE Action 'DB_STRUCTURE_INTEGRITY_CHECK' was flood controlled
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (FLOOD CONTROLLED, -457834491 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/xxxxx/oracle/xxxxx/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/xxxxx/oracle/xxxxxx/redo02.log'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
error 4021 detected in background process
*** 2017-08-28 10:34:46.689
ORA-04021: timeout occurred while waiting to lock object
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5594<-ksbrdp()+3507<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
----- End of Abridged Call Stack Trace -----
*** 2017-08-28 10:34:46.713
LGWR (ospid: 120716): terminating the instance due to error 4021
ksuitm: waiting up to [5] seconds before killing DIAG(120677)
这个问题是11.2 ADG上一个比较知名的bug:
Bug 17824201 - PHYSICAL STANDBY DATABASE CRASHES WITH ORA-4021
是由LGWR跟其他进程之间在instance load lock上发生死锁引起的。虽然11.2.0.4 已经fix了这个BUG,但需要设置以下隐含参数启用:
SQL> alter system set "_adg_parselock_timeout"=500;
System altered.
这个参数为LGWR设置了锁的超时时间,当尝试获得排它锁超时后,LGWR会sleep一阵,然后重试申请,这样就不会产生死锁导致实例挂掉了。
修改后数据库正常运行。