Email: service@parnassusdata.com 7 x 24 online support!

Oracle - REQUEST TO FIX ASM METADATA CORRUPT BLOCK

Oracle - REQUEST TO FIX ASM METADATA CORRUPT BLOCK

 

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline:  +86 13764045638 E-mail: service@parnassusdata.com

 

 

DETAILED PROBLEM DESCRIPTION
============================
ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],
[8486], [44], [8486] when trying to mount the diskgroup.
@ EXPLAIN WHY YOU BELIEVE THIS TO BE A PRODUCT DEFECT
@ ===================================================
@ AMDU shows one corrupt block. Requesting to fix asm metadata corrupt block.
@ ---------------------------- SCANNING DISK N0001
@ -----------------------------
@ Disk N0001: '/dev/oracleasm/disks/DISK1'
@ AMDU-00209: Corrupt block found: Disk N0001 AU [33] block [91] type [0]
@ AMDU-00201: Disk N0001: '/dev/oracleasm/disks/DISK1'
@ Allocated AU's: 93584
@ Free AU's: 954991
@ AU's read for dump: 165
@ Block images saved: 30383
@ Map lines written: 165
@ Heartbeats seen: 0
@ Corrupt metadata blocks: 1
@ Corrupt AT blocks: 0
 
 
DIAGNOSTIC ANALYSIS
===================
AMDU, dd , alert.logs, trace files are uploaded.
 
@ Due to flood all system was shutdown from main switch and storage cache was
@ dirty. To online the disks dirty cache on the storage was cleared. Customer
@ does not have os backups so can not restore. The only option is to mount the
@ diskgroup. ASM instance crashes with the error ORA-00600: internal error
@ code, arguments: [kfrValAcd30], [DATA], [1], [43], [8486], [44], [8486].
@ Customer cannot run anything on it now. AMDU shows only one corrupt block.
@ kfed output shows valid asm header.
 
 
 Fri Dec 28 07:19:06 2012
 NOTE: attached to recovery domain 1
 NOTE: starting recovery of thread=1 ckpt=43.8486 group=1 (DATA)
 NOTE: starting recovery of thread=2 ckpt=47.24 group=1 (DATA)
 Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11865.trc
 
 (incident=32228):
 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],
 [8486], [44], [8486], [], [], [], [], []
 Incident details in:
 /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_32228/+ASM1_ora_11865_i322
 28.trc
 Abort recovery for domain 1
 NOTE: crash recovery signalled OER-600
 ERROR: ORA-600 signalled during mount of diskgroup DATA
 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],
 [8486], [44], [8486], [], [], [], [], []
 ERROR: ALTER DISKGROUP ALL MOUNT /* asm agent */
 .
 .
 .
 // From the OS log, there are some multipathd errors before ASM start.
 .
 .
 Dec 28 06:32:34 srv-ural multipathd: asm!.asm_ctl_vbg0: add path (uevent)
 Dec 28 06:32:34 srv-ural multipathd: asm!.asm_ctl_vbg0: failed to store path
 info
 Dec 28 06:32:34 srv-ural multipathd: uevent trigger error
 Dec 28 06:32:34 srv-ural multipathd: asm!.asm_ctl_vbg1: add path (uevent)
 Dec 28 06:32:34 srv-ural multipathd: asm!.asm_ctl_vbg1: failed to store path
 info
 Dec 28 06:32:35 srv-ural multipathd: uevent trigger error
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg2: add path (uevent)
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg2: failed to store path
 info
 Dec 28 06:32:35 srv-ural multipathd: uevent trigger error
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg3: add path (uevent)
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg3: failed to store path
 info
 Dec 28 06:32:35 srv-ural multipathd: uevent trigger error
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg4: add path (uevent)
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg4: failed to store path
 info
 Dec 28 06:32:35 srv-ural multipathd: uevent trigger error
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg5: add path (uevent)
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg5: failed to store path
 info
 Dec 28 06:32:35 srv-ural multipathd: uevent trigger error
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg6: add path (uevent)
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg6: failed to store path
 info
 Dec 28 06:32:35 srv-ural multipathd: uevent trigger error
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg7: add path (uevent)
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg7: failed to store path
 info
 Dec 28 06:32:35 srv-ural multipathd: uevent trigger error
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg8: add path (uevent)
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg8: failed to store path
 info
 Dec 28 06:32:35 srv-ural multipathd: uevent trigger error
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg9: add path (uevent)
 Dec 28 06:32:35 srv-ural multipathd: asm!.asm_ctl_vbg9: failed to store path
 info
 
 
 node 2 - message
 .
 Dec 24 21:02:23 srv-aral CLSD: The clock on host srv-aral has been updated by
 the Cluster Time Synchronization Service to be synchronous with the mean
 cluster time.
 Dec 24 21:51:35 srv-aral CLSD: The clock on host srv-aral has been updated by
 the Cluster Time Synchronization Service to be synchronous with the mean
 cluster time.
 Dec 24 22:21:41 srv-aral kernel: qla2xxx 0000:06:00.0: LOOP DOWN detected (2
 6 0).
 Dec 24 22:21:42 srv-aral kernel: qla2xxx 0000:06:00.1: LOOP DOWN detected (2
 6 0).
 Dec 27 18:56:08 srv-aral syslogd 1.4.1: restart.
 Dec 27 18:56:08 srv-aral kernel: klogd 1.4.1, log source = /proc/kmsg
 started.
 
 
 
*** RVARDHIN 12/31/12 01:57 am ***
 requested ct to set the below values
 .
 From
 .
 blacklist {
 # wwid 26353900f02796769
 devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
 devnode "^hd[a-z][[0-9]*]"
 # devnode "^cciss!c[0-9]d[0-9]*"
 # device {
 # vendor "HP"
 # product "OPEN-*"
 # }
 }
 .
 
 .
 To,
 .
 blacklist {
 # wwid 26353900f02796769
 devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
 devnode "^hd[a-z][[0-9]*]"
 # devnode "^cciss!c[0-9]d[0-9]*"
 # device {
 # vendor "HP"
 # product "OPEN-*"
 # wwid "*"
 devnode "^asm/*" # Here <<<.
 devnode "ofsctl" # Here <<<
 .
 # }
 }
 
 ct changed the multipatch.conf file. issue still exist
 .
 /u01/app/11.2.0/grid/bin/crsctl start cluster -all
 .
 CRS-2672: Attempting to start 'ora.asm' on 'srv-aral'
 CRS-2672: Attempting to start 'ora.cssd' on 'srv-ural'
 CRS-2672: Attempting to start 'ora.diskmon' on 'srv-ural'
 CRS-2676: Start of 'ora.diskmon' on 'srv-ural' succeeded
 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],
 [8486], [44], [8486], [], [], [], [], []
 CRS-2674: Start of 'ora.asm' on 'srv-aral' failed
 CRS-2679: Attempting to clean 'ora.asm' on 'srv-aral'
 CRS-2681: Clean of 'ora.asm' on 'srv-aral' succeeded
 CRS-2676: Start of 'ora.cssd' on 'srv-ural' succeeded
 CRS-2672: Attempting to start 'ora.ctssd' on 'srv-ural'
 CRS-2676: Start of 'ora.ctssd' on 'srv-ural' succeeded
 CRS-2672: Attempting to start 'ora.asm' on 'srv-ural'
 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],
 [8486], [44], [8486], [], [], [], [], []
 CRS-2674: Start of 'ora.asm' on 'srv-ural' failed
 CRS-2679: Attempting to clean 'ora.asm' on 'srv-ural'
 CRS-2681: Clean of 'ora.asm' on 'srv-ural' succeeded
 CRS-2673: Attempting to stop 'ora.ctssd' on 'srv-ural'
 CRS-2677: Stop of 'ora.ctssd' on 'srv-ural' succeeded
 CRS-2673: Attempting to stop 'ora.diskmon' on 'srv-ural'
 CRS-2677: Stop of 'ora.diskmon' on 'srv-ural' succeeded
 CRS-2673: Attempting to stop 'ora.cssd' on 'srv-ural'
 CRS-2677: Stop of 'ora.cssd' on 'srv-ural' succeeded
 
 
 - AMDU report shows only one Block as corrupted
 158 ---------------------------- SCANNING DISK N0001
 -----------------------------
 159 Disk N0001: '/dev/oracleasm/disks/DISK1'
 160 AMDU-00209: Corrupt block found: Disk N0001 AU [33] block [91] type [0]
 161 AMDU-00201: Disk N0001: '/dev/oracleasm/disks/DISK1'
 162 Allocated AU's: 93584
 163 Free AU's: 954991
 164 AU's read for dump: 165
 165 Block images saved: 30383
 166 Map lines written: 165
 167 Heartbeats seen: 0
 168 Corrupt metadata blocks: 1
 169 Corrupt AT blocks: 0
 - ORA 600 kfrValAcd30 is hit since ckpt ABA and current block ABA is not
 
 matching as expected.
 ASM1 alert log shows:
 16747 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1],
 [43], [8486], [44], [8486], [], [], [], [], []
 - It doesn't look like the ABA is wrong here. It is having expected value as
 per ASM alert logs. (Still have to see why the assert is triggered)
 ASM1 alert log shows:
 16546 Thu Dec 20 18:14:17 2012
 16547 NOTE: attached to recovery domain 1
 16548 NOTE: starting recovery of thread=1 ckpt=43.8486 group=1 (DATA) <===
 ABA value reported in ORA 600
 16549 NOTE: advancing ckpt for thread=1 ckpt=43.8486
 16550 NOTE: cache recovered group 1 to fcn 0.814009
 16551 Thu Dec 20 18:14:17 2012
 16552 NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)
 16553 NOTE: LGWR found thread 1 closed at ABA 43.8485
 16554 NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)
 16555 NOTE: LGWR opening thread 1 at fcn 0.814009 ABA 44.8486 <=== ABA value
 reported in ORA 600
 16556 NOTE: cache mounting group 1/0x0C886DF4 (DATA) succeeded
 16557 NOTE: cache ending mount (success) of group DATA number=1
 incarn=0x0c886df4
 - Though ABA is right as per alert logs, ASM throws assert. We need to check
 why so.
 
 .
 SQL> alter diskgroup all mount;
 alter diskgroup all mount
 *
 ERROR at line 1:
 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA], [1], [43],
 [8486], [44], [8486], [], [], [], [], []
 
 performed below activity by customer, getting below errors
 .
 -bash-3.2$ sqlplus / as sysasm
 .
 SQL*Plus: Release 11.2.0.1.0 Production on Wed Jan 2 13:38:10 2013
 .
 Copyright (c) 1982, 2009, Oracle. All rights reserved.
 .
 .
 Connected to:
 Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
 
 SQL> alter diskgroup all mount;
 alter diskgroup all mount
 *
 ERROR at line 1:
 ORA-15032: not all alterations performed
 ORA-15130: diskgroup "DATA" is being dismounted
 ORA-15066: offlining disk "DATA_0000" may result in a data loss
 .
 .
 SQL>
 
 1: Please explain what do you mean by remap here.
 2: Lost write means the data is valid but old. ASM uses set of methods (like
 magic, checksum, expected values of fields, etc) to validate whether it is
 .
 corrupted or not. With lost write, it is valid data but stale data. ASM
 need not identify it as corrupted.
 
 regarding remap please see:
 11gR1 ASM New Feature (Doc ID 429098.1)
 remap - You can remap and recover bad blocks on an ASM disk in normal or high
 redundancy that have been reported by storage management tools such as disk
 scrubbers. ASM reads from the good copy of an ASM mirror and rewrites these
 blocks to an alternate location on disk.
 .
 What is the redundancy on this disk group?
 .
 Customer requirement outlined here is to mount the disk group and then
 recover what data can be recovered at this time. Prabakaran is investigating
 if a diagnostic patch can be made to mount the disk group without hitting the
 assert.
 .
 points to note here:
 The customer needs to review their backup policy
 
 Looks like ct had external,I have conveyed the same to ct
 .
 [gridstgrac1 ~]$ kfed read disk1_dd.out | more
 kfbh.endian: 1 ; 0x000: 0x01
 kfbh.hard: 130 ; 0x001: 0x82
 kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
 kfbh.datfmt: 1 ; 0x003: 0x01
 |
 |
 kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL>>>>
 kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
 kfdhdb.dskname: DATA_0000 ; 0x028: length=9
 kfdhdb.grpname: DATA ; 0x048: length=4
 kfdhdb.fgname: DATA_0000 ; 0x068: length=9
 .
 
 
 - As per ckpt record, ABA is 43.8486
 16 kfbh.fcn.base: 0 ; 0x010: 0x00000000
 17 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
 29 kfracdc.ckpt.seq: 43 ; 0x018: 0x0000002b
 30 kfracdc.ckpt.blk: 8486 ; 0x01c: 0x00002126
 - While mounting a DG, ASM hit below assert:
 805 Errors in file
 /ade/b/1029623278/oracle/log/diag/asm/+asm/two0one1asm/trace/two0one1asm_ora_1
 1194.trc (incident=4939):
 806 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA],
 [1], [43], [8486], [44], [8486], [], [], [], [], []
 7771 KSEORAASSERTNM6(FALSE,
 7772 OERINM("kfrValAcd30"),
 7773 kseidn(*thdCtx->grpName_kfrthdCtx),
 7774 ksenrg(thdCtx->thread_kfrthdCtx),
 7775
 ksenrg(thdCtx->currAba_kfrthdCtx.seq_kfraba),
 7776
 ksenrg(thdCtx->currAba_kfrthdCtx.blk_kfraba),
 7777 ksenrg(acdb->aba_kfracdbamd_kfraba),
 7778 ksenrg(acdb->aba_kfracdb.blk_kfraba));
 
 - In the previous mount, DG ckpt was 43.8486 which matches with thread
 Context
 16546 Thu Dec 20 18:14:17 2012
 16547 NOTE: attached to recovery domain 1
 16548 NOTE: starting recovery of thread=1 ckpt=43.8486 group=1 (DATA)
 16549 NOTE: advancing ckpt for thread=1 ckpt=43.8486
 16550 NOTE: cache recovered group 1 to fcn 0.814009
 .
 - Following the recovery, LGWR opened the thread with ABA 44.8486. Followin
 it, ASM2 was crashed. Thread 2 recovery for DATA is performed.
 
 
 16551 Thu Dec 20 18:14:17 2012
 16552 NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)
 16553 NOTE: LGWR found thread 1 closed at ABA 43.8485
 16554 NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)
 16555 NOTE: LGWR opening thread 1 at fcn 0.814009 ABA 44.8486
 16556 NOTE: cache mounting group 1/0x0C886DF4 (DATA) succeeded
 16557 NOTE: cache ending mount (success) of group DATA number=1
 incarn=0x0c886df4
 16574 Thu Dec 20 18:17:24 2012
 16575 Reconfiguration started (old inc 4, new inc 6)
 16576 List of instances:
 16577 1 (myinst: 1)
 16574 Thu Dec 20 18:17:24 2012
 16575 Reconfiguration started (old inc 4, new inc 6)
 16598 NOTE: SMON starting instance recovery for group DATA domain 1
 (mounted)
 
 16599 NOTE: F1X0 found on disk 0 au 2 fcn 0.0
 16600 NOTE: starting recovery of thread=2 ckpt=45.10601 group=1 (DATA)
 16601 NOTE: SMON waiting for thread 2 recovery enqueue
 16602 NOTE: SMON about to begin recovery lock claims for diskgroup 1 (DATA)
 16603 NOTE: advancing ckpt for thread=2 ckpt=45.10601 <=== recovery is
 done
 16604 NOTE: SMON did instance recovery for group DATA domain 1
 - ASM alert log does not show any activity for next 8 days. In the next mount
 (on Dec 28), kfrValAcd30 was triggered.
 16621 Reconfiguration complete
 16622 Fri Dec 28 07:18:51 2012
 16623 NOTE: No asm libraries found in the system
 16624 Starting ORACLE instance (normal)
 16716 Fri Dec 28 07:18:59 2012
 16717 SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
 16718 NOTE: Diskgroup used for Voting files is:
 16719 DATA
 16740 Fri Dec 28 07:19:06 2012
 16741 * allocate domain 1, invalid = TRUE
 16742 Fri Dec 28 07:19:06 2012
 16743 NOTE: attached to recovery domain 1
 16744 NOTE: starting recovery of thread=1 ckpt=43.8486 group=1 (DATA)
 16745 NOTE: starting recovery of thread=2 ckpt=47.24 group=1 (DATA)
 16746 Errors in file
 /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11865.trc
 (incident=32228):
 16747 ORA-00600: internal error code, arguments: [kfrValAcd30], [DATA],
 [1], [43], [8486], [44], [8486], [], [], [], [], []
 16748 Incident details in:
 /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_32228/+ASM1_ora_11865_i322
 28.trc
 .
 - Excerpt from F3 dump is given below
There is no recovery with ABA of
 43.8486 where ckpt is made
 10 kfbh.hard: 130 ; 0x001: 0x82
 11 kfbh.type: 7 ; 0x002: KFBTYP_ACDC
 12 kfbh.datfmt: 1 ; 0x003: 0x01
 
 29 kfracdc.ckpt.seq: 43 ; 0x018: 0x0000002b
 30 kfracdc.ckpt.blk: 8486 ; 0x01c: 0x00002126
 .
 3542863 kfbh.block.blk: 8486 ; 0x004: T=0 NUMB=0x2126
 3542864 kfbh.block.obj: 3 ; 0x008: TYPE=0x0 NUMB=0x3
 3542870 kfracdb.aba.seq: 43 ; 0x000: 0x0000002b
 3542871 kfracdb.aba.blk: 8485 ; 0x004: 0x00002125
 .
 3542942 kfbh.block.blk: 8487 ; 0x004: T=0 NUMB=0x2127
 3542943 kfbh.block.obj: 3 ; 0x008: TYPE=0x0 NUMB=0x3
 3542949 kfracdb.aba.seq: 44 ; 0x000: 0x0000002c
 3542950 kfracdb.aba.blk: 8486 ; 0x004: 0x00002126
 .
 .