Email: service@parnassusdata.com 7 x 24 online support!

Oracle ASM DISK HEADER CORRUPTION


 

Parnassusdata Software Database Recovery Team

Service Hotline:  +86 13764045638 E-mail: service@parnassusdata.com

 

 
 
10.2.0.4 ASM on a 2-node RAC. In August 2011, some disks were dropped from an ASM diskgroup using the sqlplus alter diskgroup drop disk command. Because the file descriptors were still held, the customer scheduled downtime on 21 Jan 2012 to restart the ASM instance and release the locks. The ASM instance was restarted on Node-1 only. After the restart, two diskgroups (orcl1_DATADG01 and orcl1_ARCHDG01) did not mount, and one disk from each diskgroup was reported as missing. In v$asm_disk these disks showed up as CANDIDATE. A kfed read on them showed no ASM metadata header. Only aunum=0 blknum=0 was affected; the rest of the blocks were fine. The purpose of this bug is to figure out what could have corrupted the header block.
 
DIAGNOSTIC ANALYSIS:
--------------------
 
Both diskgroups were mounted on Node-2, and the issue was fixed with the following steps:

1] Dropped the affected disk from the diskgroup on Node-2, where it was mounted.

2] Rebalance started and completed. After the rebalance completed, the diskgroup was dismounted with:

ERROR: empty ASM disk check aborted, diskgroup (orcl1_DATADG01)
ERROR: ORA-15066 thrown in RBAL for group number 1
ORA-15066: offlining disk "Porcl1_DATADG01_0023" may result in a data loss

3] Diskgroup orcl1_DATADG01 could still be mounted on both nodes after this.

4] Restarted the ASM instances on both nodes. All diskgroups mounted on both nodes, and ASM no longer looked for disk 23.

5] On Node-1, created a dummy diskgroup using '/dev/oracle/orcl1/orcl1_datadg01_62', mounted it on both nodes, then dropped the dummy diskgroup.

6] Added the disk to orcl1_DATADG01 with power 11. Rebalance completed without errors.
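
The SQL behind steps 1, 5 and 6 might look roughly like the output of the sketch below. It only prints statements for review and does not connect to ASM. The diskgroup, disk name and path are the ones quoted in the steps; the dummy diskgroup name DUMMYDG and its EXTERNAL REDUNDANCY clause are assumptions, since the report does not say how the dummy diskgroup was created.

```shell
#!/bin/sh
# Sketch only: prints candidate SQL for review, nothing is executed.
# DUMMYDG and EXTERNAL REDUNDANCY are assumptions, not from the report.

print_fix_sql() {
    dg=$1      # diskgroup name, e.g. orcl1_DATADG01
    disk=$2    # ASM disk name, e.g. orcl1_DATADG01_0023
    path=$3    # device path of the affected disk
    cat <<EOF
-- Step 1: run on the node where the diskgroup is mounted
ALTER DISKGROUP ${dg} DROP DISK ${disk};
-- Step 5: create a dummy diskgroup on the affected disk, then drop it
CREATE DISKGROUP DUMMYDG EXTERNAL REDUNDANCY DISK '${path}';
DROP DISKGROUP DUMMYDG;
-- Step 6: add the disk back and rebalance with power 11
ALTER DISKGROUP ${dg} ADD DISK '${path}' REBALANCE POWER 11;
EOF
}

# Example:
#   print_fix_sql orcl1_DATADG01 orcl1_DATADG01_0023 \
#       /dev/oracle/orcl1/orcl1_datadg01_62
```

Steps 2 to 4 (waiting for the rebalance, remounting, restarting the instances) happen between these statements and are not represented here.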
 
 
The affected disks:

orcl1_DATADG01 : orcl1_DATADG01_0023 : /dev/oracle/orcl1/orcl1_datadg01_62
orcl1_ARCHDG01 : orcl1_ARCHDG01_0032 : /dev/oracle/orcl1/orcl1_archdg01_25
 
We do not have a disk dump of orcl1_archdg01_25 from before the fix, but we do have the dd dump of orcl1_datadg01_62 taken when the issue was seen and the header was lost.
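
A dump like this can be taken before any repair is attempted. The helper below is a minimal sketch that copies the first allocation unit of a disk to a file. It assumes the default 1 MB ASM allocation unit; the function name and the destination path are hypothetical.

```shell
#!/bin/sh
# save_first_au SRC DST [AU_MB]
# Copies the first AU_MB megabytes (default 1, the default ASM allocation
# unit size) from SRC to DST. Only reads from SRC.
save_first_au() {
    src=$1
    dst=$2
    au_mb=${3:-1}
    dd if="$src" of="$dst" bs=1048576 count="$au_mb" 2>/dev/null
}

# Example (destination path is hypothetical):
#   save_first_au /dev/oracle/orcl1/orcl1_archdg01_25 /tmp/orcl1_archdg01_25.dd
```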
 
Two affected diskgroups:
 orcl1_DATADG01
 orcl1_ARCHDG01
 Affected disks:
 orcl1_DATADG01 : orcl1_DATADG01_0023 : /dev/oracle/orcl1/orcl1_datadg01_62
 orcl1_ARCHDG01 : orcl1_ARCHDG01_0032 : /dev/oracle/orcl1/orcl1_archdg01_25
 - the last time the diskgroups mounted successfully:

 asm1:
 Tue Aug 30 20:46:42 2011
 NOTE: cache mounting group 2/0x729B169F (orcl1_DATADG01) succeeded
 SUCCESS: diskgroup orcl1_DATADG01 was mounted
 Tue Aug 30 20:46:42 2011
 NOTE: cache mounting group 1/0x728B169E (orcl1_ARCHDG01) succeeded
 SUCCESS: diskgroup orcl1_ARCHDG01 was mounted

 asm2:
 Tue Aug 30 20:59:40 2011
 NOTE: cache mounting group 1/0x7288AD5B (orcl1_ARCHDG01) succeeded
 SUCCESS: diskgroup orcl1_ARCHDG01 was mounted
 Tue Aug 30 20:59:40 2011
 NOTE: cache mounting group 2/0x7298AD5C (orcl1_DATADG01) succeeded
 SUCCESS: diskgroup orcl1_DATADG01 was mounted
 
 - then ASM1 was restarted and diskgroups orcl1_ARCHDG01 and orcl1_DATADG01 failed to mount:
 Sat Jan 21 09:15:18 2012
 NOTE: PST enabling heartbeating (grp 1)
 Sat Jan 21 09:15:18 2012
 ERROR: diskgroup orcl1_ARCHDG01 was not mounted
 NOTE: cache dismounting group 2/0x72978BD8 (orcl1_DATADG01)
 NOTE: dbwr not being msg'd to dismount
 Sat Jan 21 09:15:18 2012
 NOTE: PST enabling heartbeating (grp 2)
 Sat Jan 21 09:15:18 2012
 ERROR: diskgroup orcl1_DATADG01 was not mounted
 NOTE: cache opening disk 1 of grp 3: orcl1_REDO1DG01_0001
 path:/dev/oracle/orcl1/orcl1_redo1dg01_01
 NOTE: F1X0 found on disk 1 fcn 0.172655
 NOTE: cache mounting (not first) group 3/0x72978BD9 (orcl1_REDO1DG01)
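
Mount outcomes like the ones above can be pulled out of an ASM alert log with a plain grep. This is a minimal sketch; the function name is hypothetical and the log file to scan is whatever alert log the reader has at hand. The two patterns are taken from the SUCCESS and ERROR lines quoted above.

```shell
#!/bin/sh
# mount_events LOGFILE
# Prints every line reporting that a diskgroup was, or was not, mounted.
mount_events() {
    grep -E 'SUCCESS: diskgroup .* was mounted|ERROR: diskgroup .* was not mounted' "$1"
}
```

The timestamp lines sit on separate lines in the alert log, so adding -B1 to the grep is one way to keep them next to each event when that suits the log at hand.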
 
 - the missing disks are:
 orcl1_DATADG01 : orcl1_DATADG01_0023 : /dev/oracle/orcl1/orcl1_datadg01_62
 orcl1_ARCHDG01 : orcl1_ARCHDG01_0032 : /dev/oracle/orcl1/orcl1_archdg01_25

 As per the bug update, the dd output of the missing disk orcl1_DATADG01_0023 shows that at least the first 0x30 bytes are corrupted:

 dd if=./orcl1_datadg01_62.dd bs=4k count=1 | hexdump -C
 1+0 records in
 1+0 records out
 4096 bytes (4.1 kB) copied, 2.1049e-05 seconds, 195 MB/s
 00000000 53 00 00 00 fd ff ff ff 06 ff 00 00 d8 00 00 00 |S...............|
 00000010 00 4a 2a 08 b0 cf ff ff ad 4d 2a 08 30 75 00 00 |.J*......M*.0u..|
 00000020 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
 00000040 00 00 10 0a 17 00 01 03 50 4f 4c 4e 46 58 4d 31 |........orcl1|
 00000050 5f 44 41 54 41 44 47 30 31 5f 30 30 32 33 00 00 |_DATADG01_0023..|
 00000060 00 00 00 00 00 00 00 00 50 4f 4c 4e 46 58 4d 31 |........orcl1|
 00000070 5f 44 41 54 41 44 47 30 31 00 00 00 00 00 00 00 |_DATADG01.......|
 00000080 53 00 00 00 fd ff ff ff 0a ff 00 00 08 00 00 00 |S...............|
 
 
 
There are no errors or activities in the ASM alert.log that suggest anything suspicious. The S, J* and M* characters were not written by ASM. The first byte of the ASM header should be "kfbh.endian", which on Linux should be 0x01, but here it is 0x53 ("S"). It appears that something in the operating system or HBA is overwriting the first 64 bytes of block 0 on some ASM disks. Later versions of ASM can back up and restore the disk header. The corruption was caused by something outside of Oracle.
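
The first-byte check described above can be scripted against a dd dump. The sketch below reads byte 0 and compares it with 0x01, the kfbh.endian value expected on Linux. As an extra check that is not stated in the report, it also looks for the string ORCLDISK at offset 32, on the assumption that the disk header's provision string starts there. The function name is hypothetical.

```shell
#!/bin/sh
# check_asm_header FILE
# Returns 0 if the first byte is 0x01 and "ORCLDISK" is found at offset 32
# (the offset is an assumption about the header layout), 1 otherwise.
check_asm_header() {
    f=$1
    # First byte as two hex digits, e.g. "01" or "53".
    first=$(od -An -tx1 -N1 "$f" | tr -d ' \n')
    # Eight bytes at offset 32, NUL bytes stripped so the shell can hold them.
    prov=$(dd if="$f" bs=1 skip=32 count=8 2>/dev/null | tr -d '\000')
    if [ "$first" = "01" ] && [ "$prov" = "ORCLDISK" ]; then
        echo "header looks intact"
    else
        echo "header suspect: first byte 0x$first"
        return 1
    fi
}

# Example, using the dump file name from the dd command quoted above:
#   check_asm_header ./orcl1_datadg01_62.dd
```

A passing result only means these two fields look sane; it does not validate the rest of the header, which is what kfed read is for.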