咨询微信: dbservice1234 7 x 24 在线支持!

ORA-600 [3020] "Stuck Recovery"

ORA-600 [3020] "Stuck Recovery"

 

If you cannot recover the data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline:  +86 13764045638 E-mail: service@parnassusdata.com

 

Format: ORA-600 [3020] [a] [b] [c] [d] [e]

 

 

VERSIONS:

version 6.0 and above DESCRIPTION:

This is called a 'STUCK RECOVERY'.

 

There is an inconsistency between the information stored in the redo and the information stored in a database block being  recovered.

 

ARGUMENTS:

 

For Oracle 9.2 and earlier: Arg [a] Block DBA

Arg [b] Redo Thread Arg [c] Redo RBA Seq

Arg [d] Redo RBA Block No Arg [e] Redo RBA Offset.

 

For Oracle 10.1

Arg [a] Absolute file number of the datafile. Arg [b] Block number

Arg [c] Block DBA

 

FUNCTIONALITY:

kernel cache recovery parallel

 

IMPACT:

INSTANCE FAILURE during recovery.

 

SUGGESTIONS:

 

There have been cases of receiving this error when RECOVER has been issued, but either some datafiles were not restored to disk, or the restoration has not finished.

 

Therefore, ensure that the entire backup has been restored and that the restore has finished PRIOR to issuing a RECOVER database  command.

If problems continue, consider restoring from a backup and doing a point-in-time recovery to a time PRIOR to the one implied  by

the ORA-600[3020] error.

 

Example:

 

SQL> recover database until time 'YYYY-MON-DD:HH:MI:SS'; This error can also be caused by a lost  update.

During normal operations, block updates/writes are being performed  to a number of files including database datafiles, redo log files, archived redo log files etc.

 

This error can be reported if any of these updates are lost for some reason.

 

Therefore, thoroughly check your operating system and disk  hardware.

 

In the case of a lost update, restore an old copy of the datafile and attempt to recover and roll forward  again.

 

If the Known Issues section below does not help in terms of  identifying a solution, please submit the trace files and alert.log to Oracle Support Services for further  analysis.

 

Known Issues:

 

Related Articles

 

Note:1265884.1      Resolving ORA-752 or ORA-600 [3020] During Standby  Recovery

 

Known Bugs

 

You can restrict the list below to issues likely to affect one of the following versions by clicking the relevant button:

 

NB

Bug

Fixed

Description

 

9847338

 

Session hang after applying the patch for Bug 9587912 which causes ORA-600 [30

+

13467683

11.2.0.2.BP15, 11.2.0.3.3, 11.2.0.3.BP04, 12.1.0.0

Join of temp and permanent tables in RAC might cause corruption of permanent ta Regression by bug 10352368

 

12831782

11.2.0.2.BP11, 11.2.0.3.BP01, 12.1.0.0

ORA-600 [3020] / ORA-333 Recovery of datafile or async transport do not read mi there is a stale block

 

12582839

11.2.0.3, 12.1.0.0

ORA-8103/ORA-600 [3020] on RMAN recovered locally managed tablespace

 

11689702

11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.0

ORA-600 [3020] during recovery after datafile RESIZE (to smaller size)

 

10329146

11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02,

11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.0

Lost write in ASM with multiple DBWs and a disk is offlined and then onlined

 

10218814

11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.0

ORA-600 [3020] during recovery / on standby

+

10209232

11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01,

11.2.0.3, 12.1.0.0

ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM

*

10205230

11.2.0.1.6, 11.2.0.1.BP09, 11.2.0.2.2, 11.2.0.2.BP04, 11.2.0.3, 12.1.0.0

ORA-600 / corruption possible during shutdown in RAC

 

10094823

11.2.0.2.4, 11.2.0.2.BP09, 11.2.0.3, 12.1.0.0

Block change tracking on physical standby can cause data loss

 

10071193

11.2.0.2.BP02, 11.2.0.3, 12.1.0.0

Lost write / ORA-600 [kclchkblk_3] / ORA-600 [3020] in RAC - superceded

 

9587912

11.2.0.2, 12.1.0.0

ORA-600 [3020] in datafile that went offline/online in a RAC instance

 

8774868

11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.0

OERI[3020] reinstating primary

+

8769473

11.2.0.2, 12.1.0.0

ORA-600 [kcbzib_5] on multi block read in RAC. Invalid lock in RAC. ORA-600 [302 Recovery

P

8635179

10.2.0.5, 11.2.0.2, 12.1.0.0

Solaris: directio may be disabled for RAC file access. Corruption / Lost Write

+

8597106

11.2.0.1.BP06, 11.2.0.2, 12.1.0.0

Lost Write in ASM when normal redundancy is used

P

12330911

12.1

EXADATA LSI firmware for lost writes

+

10425010

11.2.0.3, 12.1

Stale data blocks may be returned by Exadata FlashCache

 

8826708

10.2.0.5, 11.2.0.2

ORA-600 [3020] for block type 0x3a (58) during recovery for block restored by RM backup

 

11684626

11.2.0.1

ORA-600 [3020] on standby involving "BRR" redo when db_lost_write_protect is e

 

8230457

10.2.0.4.1, 10.2.0.5, 11.1.0.7.1, 11.2.0.1

Physical standby media recovery gets OERI[krr_media_12]

+

7680907

10.2.0.5, 11.1.0.7.1, 11.2.0.1

ORA-600 [kclexpandlock_2] in LMS / instance crash. Incorrect locks in RAC. ORA-6 [3020] in recovery

 

4637668

10.2.0.3, 11.1.0.6

IMU transactions can produce out-of-order redo (OERI [3020] on recovery)

 

4594917

9.2.0.8, 10.2.0.2, 11.1.0.6

Write IO error can cause incorrect file header checkpoint information

 

4453449

10.2.0.2, 11.1.0.6

OERI:3020 / corruption errors from multiple FLASHBACK DATABASE

 

7197445

10.2.0.4.1, 10.2.0.5

Standby Recovery session cancelled due to ORA-600 [3020] "CHANGE IN FUTURE BLOCK"

 

5610267

10.2.0.5

MRP terminated by ORA-600[krr_media_12] / OERI:3020 after flashback

 

3762714

9.2.0.7, 10.1.0.4, 10.2.0.1

ALTER DATABASE RECOVER MANAGED STANDBY fails with OERI[3020]

 

3560209

10.2.0.1

OERI[3020] stuck recovery under RAC

 

3397181

9.2.0.5, 10.1.0.3, 10.2.0.1

ALTER SYSTEM KILL SESSION of recovery slave causes stuck recovery

*

3381950

10.2.0.1

Backups from RAC DB before Data Guard Failover cannot be used

 

3535712

9.2.0.6, 10.1.0.4

OERI[3020] / ORA-10567 from RAC with standby in max performance mode

 

4594912

9.2.0.8, 10.1.0.2

Incorrect checkpoint possible in datafile headers

 

3635331

9.2.0.6, 10.1.0.4

Stuck recovery (OERI:3020) / ORA-1172 on startup after a crash

 

2322620

9.2.0.1

OERI:3020 possible on recovery of LOB DATA

P+

656370

7.3.3.4, 7.3.4.0, 8.0.3.0

AlphaNT only: Corrupt Redo (zeroed byte) OERI:3020

 

 

Note:190263.1

ORA-1172 OR ORA-600[3020] Quick Support Debugging  Guide

 

Given that this error could be due to a lost update to either the datafile and/or the redo files, one thing to do would be to get dumps of both.

 

Refer to the following notes for information on how to do this  :

 

Note:1031381.6  How to Dump Redo Log File Information

Note:45852.1    Taking BLOCKDUMPS on Oracle8 - The ALTER SYSTEM  DUMP command **INTERNAL ONLY**

 

It is especially useful to focus on the particular datafile block implied by the ORA-600 [3020]. Dump all redo for that block, starting with the log sequence before the restored  datafile,

up to the point of failure.

 

Blockdumps of the datafile should be taken at various stages of the recovery process - for example right after doing the restore; and then again after each redo log file has been applied; and just before the SCN (or point in time) that the ORA-600 was reported; and just after  redo

for the given SCN has been applied; and so  on.

 

The idea being that you may narrow down the point at which something went wrong.

 

ORA-600 [3020] [a] [b] [c] [d] [e]

Versions: 7.0.X  - 8.0.5                                  Source: knl/kcrp.c

===========================================================================

Meaning:

Recovering database and REDO entry has an INC/SEQUENCE number greater than that on the database  block.

In Oracle8 where the block structure is different it still means the same basic thing - the redo record we have has an SCN / SEQ which does not match the database block we are wanting to apply it to.

This is called 'STUCK RECOVERY'.

 

---------------------------------------------------------------------------

Argument Description:

 

a.   Block DBA

b.   Redo Thread

c.   Redo RBA Seq

d.   Redo RBA Block No

e.   Redo RBA Offset.

---------------------------------------------------------------------------

Diagnosis:

There are many possible causes for this most resulting from either invalid sets of commands or media  corruption.

 

-  Has customer restored a backup, open the DB, closed the DB and then tried to recover without re-loading the backup  ??

** If they say no GET THE ALERT LOG and prove it - it's easy to waste a lot of time when this was the real  cause.

 

-  If the problem was a lost update, restore of an OLDER copy of the datafile and a recovery may work.

 

-  The quick option here is to restore and recover UP TO an  SCN

 

just before the problem. Customer will lose some data as this is an incomplete recovery so you need to know the  priority:

a)  TIME   or  b) Minimal Data Loss.

 

-  Check the tracefile for the 3020 report. It is possible to signal OERI(3020) if the datafile block is  corrupt.

Eg: OERI(3020) with Inc=0 Seq=1 reported for the disk block is possibly a zeroed out data-block on the datafile and NOT a redo issue.

 

-  Is parallel server being used ?

If so another thread may have the required changes and they haven't been read for some reason. Check for OS and DLM errors. Try to make sure only ONE instance attempts any recovery by shutting down other instances.

 

-  Are hot backups being used ??

Check that the backups are occuring correctly between BEGIN and END backup commands.

 

-  Up to Oracle 8i you can try to skip the error using the  hidden

parameter:CORRUPT_BLOCKS_ON_STUCK_RECOVERY

Be aware that blocks will be marked corrupt if this is used so make sure the error is not on a dictionary object  !!

 

-  From 9i you can try to skip the error using the 'ALLOW .. CORRUPTION' clause of the RECOVER DATABASE  command.

(Note that in 11g onwards you may need to set DB_LOST_WRITE_PROTECT=NONE for the "ALLOW 1 CORRUPTION" clause to  work)

 

-  For logging a bug you need:

(a)  Where an error is reported, get any trace files produced and relevant redo log dumps if  necessary.

Document completely the circumstances  leading

up to the error including configuration; type of backup (manual, RMAN, incremental, etc.);

the exact commands used to create the backups  and

the exact commands used to do the restore and  recovery.

(b)  Provide a reproducible test case or dial-in information to development.

(c)  Where relevant, determine if generic or port-specific  issue.

 

 

 

Articles:

Parameter:CORRUPT_BLOCKS_ON_STUCK_RECOVERY

 

---------------------------------------------------------------------------

 

Example OERI:3020 dump in Oracle8

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 

*** 1999.07.02.01.02.58.000

RECOVERY OF THREAD 1 STUCK AT BLOCK 14099 OF FILE  6

 

REDO RECORD - Thread:1 RBA: 0x0045ee.00009c8b.0010 LEN: 0x00e8 VLD: 0x01 SCN scn: 0x0000.0951d868 07/02/99  00:57:19

CHANGE #2 TYP:2 CLS: 1 AFN:6 DBA:0x01803713 SCN:0x0000.09519e84 SEQ:  1   OP:10.4

buffer tsn: 5 rdba: 0x01803713 (6/14099) scn:0x0000.0951b4d4 seq:0x01 flg:0x00  tail:0xb4d40601

frmt:0x02 chkval:0x0000 type:0x06=trans  data

 

*** 1999.07.02.01.02.58.000

ksedmp: internal or fatal error

ORA-00600: internal error code,  arguments:

[3020], [25179923], [1], [17902], [40075], [16], [],  []

 

 

Breaking this up shows the following SCN information: Redo SCN:                              0x0000.0951d868

SCN expected on block:                       0x0000.09519e84 SCN on Buffer:                       0x0000.0951b4d4

 

In this case ithe actual SCN marked in the  block

in the buffer cache is _later_ than the expected SCN, but  _before_

the SCN level for the redo change vector. Normally, the SCN in the CHANGE line must match exactly the one on the block (in the buffer  cache);

and redo application brings that block to the (later)  SCN/SEQ

on the redo record.  One possible explanation is that the  system

saw a stale copy of the datafile block when the redo was generated, so that the SCN in the CHANGE line is the wrong one. That would indicate a possible lost update to the  datafile.

 

More commonly, the ORA-600 [3020] error indicates that the SCN on the block is BEHIND the SCN on the redo we want to  apply,

so there is a GAP.  I.e., the REDO is ahead of the  block.

 

However, in this example there is still a problem even though the block initially appears to be AHEAD of the REDO (normally  OK).

Why?  The SCN on the block is BELOW the most recent commit  SCN.

If we applied the current redo record then the SCN on the block would advance to the more recent commit SCN so if this block is  truely

ahead of this redo record it must have an SCN >= the most recent commit SCN. It hasn't, so something is wrong - most likely a lost datafile write which occurred between two items of redo  causing

two redo records using the same block SCN to base their change  on.

 

 

  Known issues caused by 3rd party  provider

 

1.  Lost IO / Corruption caused by EMC. From JET SR: 3-1260172021.  EMC bug ID:   emc230687

 

 

ID: emc230687

Domain: EMC1 HP-UX 11v1

Solution Class: 3.X Compatibility

 

ORA-600 [3020] during recovery caused bu LOST IO due to EMC bug ID:   emc230687.

 

 

No errors raised within the I/O stack at the host level nor from a Timefinder perspective, API ECA debug data void of any anomalies Timefinder w/Oracle best practices process is being adhered to ( recoverable business solution process   )

This also caused some corruption errors  like:

 

ORA-00600: internal error code, arguments: [kddummy_blkchk], [29], [2121334], [6108] kdbchk: xaction header lock count  mismatch

 

No errors raised with the Symm as well. Corruption issue resolved by applying fix 44177, see the following Primus article for more i ETA emc204393

2.  Lost IO by EMC.  EMC solution # is  emc251398

 

Fixed by the latest microcode version 5773.163.113 applied on the Symmetrix DMX (no changes on V-MAX cabins). EMC solution # is emc2

 

 

 

==

 

Ensure that this note comes out on top in Metalink when searched ora-600 ora-600 ora-600 ora-600 ora-600 ora-600  ora-600

ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 3020 3020 3020 3020 3020 3020 3020 3020 3020  3020

3020 3020 3020 3020 3020 3020 3020 3020 3020  3020