
Oracle Database Block Corruption in ASM

We got block corruption in the database and found the following entries in the ASM alert log at that time.
 
We need to find out why the block corruption occurred. What is the culprit: the OS, the storage, or the database?
 
 
 
Mon Aug 25 19:48:37 2014
 
WARNING: cache read  a corrupt block: group=1(DATA) fn=281 indblk=16 disk=8 (ASM_DATA12) incarn=3491799612 au=28481 blk=16 count=6
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc:
 
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
 
NOTE: a corrupted block from group DATA was dumped to /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc
 
WARNING: cache read (retry) a corrupt block: group=1(DATA) fn=281 indblk=16 disk=8 (ASM_DATA12) incarn=3491799612 au=28481 blk=16 count=1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc:
 
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
 
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
 
ERROR: cache failed to read group=1(DATA) fn=281 indblk=16 from disk(s): 8(ASM_DATA12)
 
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
 
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
 
NOTE: cache initiating offline of disk 8 group DATA
 
NOTE: process _user11370_+asm1 (11370) initiating offline of disk 8.3491799612 (ASM_DATA12) with mask 0x7e in group 1
 
NOTE: initiating PST update: grp = 1, dsk = 8/0xd020a23c, mask = 0x6a, op = clear
 
Mon Aug 25 19:48:41 2014
 
GMON updating disk modes for group 1 at 52 for pid 41, osid 11370
 
ERROR: Disk 8 cannot be offlined, since diskgroup has external redundancy.
 
ERROR: too many offline disks in PST (grp 1)
 
Mon Aug 25 19:48:42 2014
 
NOTE: cache dismounting (not clean) group 1/0x4AA052EA (DATA)
 
NOTE: messaging CKPT to quiesce pins Unix process pid: 11956, image: oracle@DB01 (B000)
 
WARNING: Offline for disk ASM_DATA12 in mode 0x7f failed.
 
Mon Aug 25 19:48:42 2014
 
NOTE: halting all I/Os to diskgroup 1 (DATA)
 
Mon Aug 25 19:48:43 2014
 
NOTE: LGWR doing non-clean dismount of group 1 (DATA)
 
NOTE: LGWR sync ABA=182.3456 last written ABA 182.3456
 
Mon Aug 25 19:48:44 2014
 
kjbdomdet send to inst 2
 
detach from dom 1, sending detach message to inst 2
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_11370.trc  (incident=144329):
 
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
 
ORA-15130: diskgroup "DATA" is being dismounted
 
ORA-15066: offlining disk "ASM_DATA12" in group "DATA" may result in a data loss
 
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
 
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [281] [2147483664] [0 != 1]
 
Incident details in: /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_144329/+ASM1_ora_11370_i144329.trc
 
Mon Aug 25 19:48:45 2014
 
List of instances:
 
1 2
 
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 4)
 
Global Resource Directory partially frozen for dirty detach
 
* dirty detach - domain 1 invalid = TRUE
 
Mon Aug 25 19:48:45 2014
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
1416 GCS resources traversed, 0 cancelled
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "DATA" is being dismounted
 
Dirty Detach Reconfiguration complete
 
Mon Aug 25 19:48:46 2014
 
WARNING: dirty detached from domain 1
 
NOTE: cache dismounted group 1/0x4AA052EA (DATA)
 
SQL> alter diskgroup DATA dismount force /* ASM SERVER:1252020970 */
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "DATA" is being dismounted
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "DATA" is being dismounted
 
Mon Aug 25 19:48:52 2014
 
Dumping diagnostic data in directory=[cdmp_20140825194852], requested by (instance=1, osid=11370), summary=[incident=144329].
 
Mon Aug 25 19:48:53 2014
 
System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_144329/+ASM1_ora_11370_i144329.trc
 
Mon Aug 25 19:48:53 2014
 
Sweep [inc][144329]: completed
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "DATA" is being dismounted
 
Mon Aug 25 19:48:58 2014
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "DATA" is being dismounted
 
Mon Aug 25 19:48:58 2014
 
Sweep [inc2][144329]: completed
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "DATA" is being dismounted
 
Mon Aug 25 19:49:01 2014
 
NOTE: ASM client PROD_1:PROD disconnected unexpectedly.
 
NOTE: check client alert log.
 
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_17812.trc
 
NOTE: cache deleting context for group DATA 1/0x4aa052ea
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "" is being dismounted
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "" is being dismounted
 
Mon Aug 25 19:49:07 2014
 
NOTE: AMDU dump of disk group DATA created at /u01/app/oracle/diag/asm/+asm/+ASM1/incident/incdir_144329
 
Mon Aug 25 19:49:10 2014
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "" is being dismounted
 
ERROR: ORA-15130 in COD recovery for diskgroup 1/0x4aa052ea (DATA)
 
ERROR: ORA-15130 thrown in RBAL for group number 1
 
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_15839.trc:
 
ORA-15130: diskgroup "" is being dismounted
 
Mon Aug 25 19:49:15 2014
 
GMON dismounting group 1 at 53 for pid 43, osid 11956
 
Mon Aug 25 19:49:15 2014
 
NOTE: Disk ASM_DATA01 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA02 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA03 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA07 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA08 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA09 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA19 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA11 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA12 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA13 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA14 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA15 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA16 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA17 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA18 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA21 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA22 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA23 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA24 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA10 in mode 0x7f marked for de-assignment
 
NOTE: Disk ASM_DATA20 in mode 0x7f marked for de-assignment
 
SUCCESS: diskgroup DATA was dismounted
 
SUCCESS: alter diskgroup DATA dismount force /* ASM SERVER:1252020970 */
 
SUCCESS: ASM-initiated MANDATORY DISMOUNT of group DATA
 
Mon Aug 25 19:49:16 2014
 
NOTE: diskgroup resource ora.DATA.dg is offline
 
Mon Aug 25 19:51:34 2014
 
SQL> ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:51111:41484} */
 
NOTE: cache registered group DATA number=1 incarn=0x4aa09b74
 
NOTE: cache began mount (not first) of group DATA number=1 incarn=0x4aa09b74
 
NOTE: Assigning number (1,0) to disk (ORCL:ASM_DATA01)
 
NOTE: Assigning number (1,1) to disk (ORCL:ASM_DATA02)
 
NOTE: Assigning number (1,2) to disk (ORCL:ASM_DATA03)
 
NOTE: Assigning number (1,3) to disk (ORCL:ASM_DATA07)
 
NOTE: Assigning number (1,4) to disk (ORCL:ASM_DATA08)
 
NOTE: Assigning number (1,5) to disk (ORCL:ASM_DATA09)
 
NOTE: Assigning number (1,19) to disk (ORCL:ASM_DATA10)
 
NOTE: Assigning number (1,7) to disk (ORCL:ASM_DATA11)
 
NOTE: Assigning number (1,8) to disk (ORCL:ASM_DATA12)
 
NOTE: Assigning number (1,9) to disk (ORCL:ASM_DATA13)
 
NOTE: Assigning number (1,10) to disk (ORCL:ASM_DATA14)
 
NOTE: Assigning number (1,11) to disk (ORCL:ASM_DATA15)
 
NOTE: Assigning number (1,12) to disk (ORCL:ASM_DATA16)
 
NOTE: Assigning number (1,13) to disk (ORCL:ASM_DATA17)
 
NOTE: Assigning number (1,14) to disk (ORCL:ASM_DATA18)
 
NOTE: Assigning number (1,6) to disk (ORCL:ASM_DATA19)
 
NOTE: Assigning number (1,20) to disk (ORCL:ASM_DATA20)
 
NOTE: Assigning number (1,15) to disk (ORCL:ASM_DATA21)
 
NOTE: Assigning number (1,16) to disk (ORCL:ASM_DATA22)
 
NOTE: Assigning number (1,17) to disk (ORCL:ASM_DATA23)
 
NOTE: Assigning number (1,18) to disk (ORCL:ASM_DATA24)
 
Mon Aug 25 19:51:34 2014
 
GMON querying group 1 at 55 for pid 27, osid 8831
 
NOTE: cache opening disk 0 of grp 1: ASM_DATA01 label:ASM_DATA01
 
NOTE: F1X0 found on disk 0 au 2 fcn 0.7050972
 
NOTE: cache opening disk 1 of grp 1: ASM_DATA02 label:ASM_DATA02
 
NOTE: cache opening disk 2 of grp 1: ASM_DATA03 label:ASM_DATA03
 
NOTE: cache opening disk 3 of grp 1: ASM_DATA07 label:ASM_DATA07
 
NOTE: cache opening disk 4 of grp 1: ASM_DATA08 label:ASM_DATA08
 
NOTE: cache opening disk 5 of grp 1: ASM_DATA09 label:ASM_DATA09
 
NOTE: cache opening disk 6 of grp 1: ASM_DATA19 label:ASM_DATA19
 
NOTE: cache opening disk 7 of grp 1: ASM_DATA11 label:ASM_DATA11
 
NOTE: cache opening disk 8 of grp 1: ASM_DATA12 label:ASM_DATA12
 
NOTE: cache opening disk 9 of grp 1: ASM_DATA13 label:ASM_DATA13
 
NOTE: cache opening disk 10 of grp 1: ASM_DATA14 label:ASM_DATA14
 
NOTE: cache opening disk 11 of grp 1: ASM_DATA15 label:ASM_DATA15
 
NOTE: cache opening disk 12 of grp 1: ASM_DATA16 label:ASM_DATA16
 
NOTE: cache opening disk 13 of grp 1: ASM_DATA17 label:ASM_DATA17
 
NOTE: cache opening disk 14 of grp 1: ASM_DATA18 label:ASM_DATA18
 
NOTE: cache opening disk 15 of grp 1: ASM_DATA21 label:ASM_DATA21
 
NOTE: cache opening disk 16 of grp 1: ASM_DATA22 label:ASM_DATA22
 
NOTE: cache opening disk 17 of grp 1: ASM_DATA23 label:ASM_DATA23
 
NOTE: cache opening disk 18 of grp 1: ASM_DATA24 label:ASM_DATA24
 
NOTE: cache opening disk 19 of grp 1: ASM_DATA10 label:ASM_DATA10
 
NOTE: cache opening disk 20 of grp 1: ASM_DATA20 label:ASM_DATA20
 
NOTE: cache mounting (not first) external redundancy group 1/0x4AA09B74 (DATA)
 
Mon Aug 25 19:51:35 2014
 
kjbdomatt send to inst 2
 
Mon Aug 25 19:51:35 2014
 
NOTE: attached to recovery domain 1
 
NOTE: redo buffer size is 256 blocks (1053184 bytes)
 
Mon Aug 25 19:51:35 2014
 
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)
 
NOTE: LGWR found thread 1 closed at ABA 182.3456
 
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)
 
NOTE: LGWR opening thread 1 at fcn 0.11665696 ABA 183.3457
 
NOTE: cache mounting group 1/0x4AA09B74 (DATA) succeeded
 
NOTE: cache ending mount (success) of group DATA number=1 incarn=0x4aa09b74
 
Mon Aug 25 19:51:35 2014
 
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
 
SUCCESS: diskgroup DATA was mounted
 
SUCCESS: ALTER DISKGROUP DATA MOUNT  /* asm agent *//* {1:51111:41484} */
 
Mon Aug 25 19:51:48 2014
 
NOTE: client PROD_1:PROD registered, osid 21529, mbr 0x1
 
Mon Aug 25 19:55:53 2014
 
NOTE: ASM client PROD_1:PROD disconnected unexpectedly.
 
NOTE: check client alert log.
 
NOTE: Trace records dumped in trace file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_21529.trc
 
Mon Aug 25 19:57:42 2014
 
NOTE: client PROD_1:PROD registered, osid 2546, mbr 0x1
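A small shell sketch can pull the location of the corrupt block out of an alert log like the one above. It extracts the diskgroup, file number, disk, AU and block from the "a corrupt block" WARNING lines. It also decodes the ORA-15196 block argument: 2147483664 is 0x80000010, which is the high bit plus 16 and matches indblk=16 in the WARNING. Reading that high bit as an "indirect block" flag is an assumption inferred from this log, not documented behavior. The [endian_kfbh] [0 != 1] part appears to say that the endian field of the block header read 0 where 1 was expected, so the block did not look like a valid ASM metadata block. The function names corrupt_blocks and decode_kfbh_blk are hypothetical.

```shell
#!/bin/sh
# Sketch: summarize corrupt-block warnings from an ASM alert log.
# corrupt_blocks and decode_kfbh_blk are hypothetical helper names.

# Print one line per distinct corrupt block reported in the given alert log
# (the initial read and the retry report the same block, so sort -u dedupes).
corrupt_blocks() {
    sed -n 's/.*a corrupt block: group=\([0-9]*\)(\([^)]*\)) fn=\([0-9]*\) .* disk=\([0-9]*\) (\([^)]*\)) .* au=\([0-9]*\) blk=\([0-9]*\).*/dg=\2 fn=\3 disk=\4(\5) au=\6 blk=\7/p' "$1" | sort -u
}

# Split the ORA-15196 block argument into its high bit and its low 31 bits.
# Assumption: the high bit flags an indirect block (inferred from indblk=16).
decode_kfbh_blk() {
    arg=$1
    printf '0x%08X high_bit=%d block=%d\n' "$arg" $(( (arg >> 31) & 1 )) $(( arg & 0x7FFFFFFF ))
}
```

For the log above, corrupt_blocks run on the text alert log (for example /u01/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log, if that is where it lives) should print dg=DATA fn=281 disk=8(ASM_DATA12) au=28481 blk=16, and decode_kfbh_blk 2147483664 prints 0x80000010 high_bit=1 block=16.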
 
 
 
 
 
 
 
 
 
 
Unfortunately, this kind of analysis and recommendation cannot be provided within a community forum. Determining the root cause of ASM metadata corruption requires collecting a large amount of data for analysis. It therefore needs to be handled through a formal service request with Oracle Support, where we can try to determine what corrupted your metadata.
 
 
 
Please open a service request with Oracle Support for root cause analysis of the ASM metadata corruption. Before you open it, collect all of the following so that everything can be uploaded to the service request for analysis.
 
 
 
1. Please upload the text version of the ASM alert log. If this is RAC, upload it from every ASM instance in the cluster. The file is in the trace directory under the diagnostic destination of your Grid Infrastructure installation. It is named alert_+ASM.log, or, on RAC, the +ASM part has the instance number appended (for example, alert_+ASM1.log).
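Here is a minimal sketch for finding the text alert logs on one node. It assumes the 11.2 ADR layout seen in the trace paths above (<diagnostic_dest>/diag/asm/+asm/<instance>/trace). list_asm_alert_logs is a hypothetical helper. On RAC, run it on every node, because each ASM instance typically writes its alert log to local storage.

```shell
#!/bin/sh
# Sketch: list text ASM alert logs under a diagnostic destination.
# Assumption: 11.2+ ADR layout <diag_dest>/diag/asm/+asm/<instance>/trace/alert_<instance>.log
list_asm_alert_logs() {
    diag_dest=${1:?usage: list_asm_alert_logs <diagnostic_dest>}
    # Match only alert_+ASM*.log files that sit in a trace directory.
    find "$diag_dest/diag/asm" -type f -path '*/trace/alert_+ASM*.log' 2>/dev/null | sort
}
```

For the paths in this thread, list_asm_alert_logs /u01/app/oracle should list /u01/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log on the first node.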
 
 
 
2.  Assuming you are on Oracle 11.2 or later, also collect the following as the root user, even if this is a single-instance configuration:
 
 
 
Please review CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide (Doc ID 330358.1). Following that document, run the diagcollection script as the root user and collect its output. The script produces at least three zip files per Grid Infrastructure home, named crsDATA.zip, ocrDATA.zip and osDATA.zip. There may be other files as well, such as coreDATA.zip.
 
Please compress all of the zip files produced into a single zip file per Grid Infrastructure home.
 
 
 
3.  Assuming your ASM instance is currently up and running, collect the following:
 
 
 
Please collect and upload the output of scripts 1, 2 and 3 from:
 
How To Gather/Backup ASM Metadata In A Formatted Manner version 10.1, 10.2, 11.1, 11.2 and 12.1? (Doc ID 470211.1)
 
 
 
4.  Please use syntax similar to the following to get an AMDU dump of the impacted diskgroup so that we can determine the extent of the metadata corruption.
 
 
 
If you are on a release earlier than 11.1, refer to the following document to download the AMDU executable needed to run the command:
 
Placeholder for AMDU binaries and using with ASM 10g (Doc ID 553639.1)
 
 
Once you have the executable, run the following command. Note the directory the command creates, and zip up the contents of that directory for upload to Oracle via the service request.
 
 
 
NOTE: This command is specific to your environment and the affected diskgroup, so it will not work on other systems as written. To adapt it, change the -diskstring value to the diskstring in use, and change the diskgroup name at the end of the line to your diskgroup name.
 
 
 
$ amdu -diskstring '/dev/oracleasm/disks/*' -dump 'DATA'
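AMDU writes its output into a new directory under the current working directory. The hypothetical helper below assumes that this directory is named amdu_<timestamp> (for example amdu_2014_08_25_19_49_07), so check what the command actually created before relying on it. The helper packages the newest such directory. It uses tar and gzip in place of zip, on the assumption that zip may not be installed. Use zip -r instead if a .zip file is required.

```shell
#!/bin/sh
# Sketch: archive the most recent AMDU output directory for upload.
# Assumption: AMDU output directories are named amdu_<timestamp>, so a
# lexical sort puts the newest one last.
archive_latest_amdu() {
    base=${1:-.}
    latest=$(find "$base" -mindepth 1 -maxdepth 1 -type d -name 'amdu_*' | sort | tail -n 1)
    if [ -z "$latest" ]; then
        echo "no amdu_* directory found in $base" >&2
        return 1
    fi
    out="$latest.tar.gz"
    # -C keeps the paths inside the archive relative to the directory's parent.
    tar -czf "$out" -C "$(dirname "$latest")" "$(basename "$latest")"
    echo "$out"
}
```

Run it from the directory where you ran amdu (archive_latest_amdu .). It prints the name of the archive to upload.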
 
 
 
We will also need the output of the following command. Note that this command, too, is specific to your particular situation. Do not be alarmed: it only reads from the disk. It takes a full binary copy of the first 50 MB of the disk and writes it to a file to be uploaded for our analysis.
 
 
 
dd if=/dev/oracleasm/disks/ASM_DATA12 of=/tmp/DATA12.dd bs=1048576 count=50
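With a 1 MB allocation unit, the 50 MB copy covers only the first 50 AUs of the disk. It therefore does not include the block the alert log complained about: au=28481 blk=16 on ASM_DATA12. The sketch below computes where that block sits and prints a read-only dd command that copies just that block. It makes the following assumptions, which should be confirmed for your diskgroup:

- a 1 MB allocation unit and a 4 KB ASM metadata block (common 11.2 defaults)
- blk is counted from the start of the AU

block_offset and dd_one_block are hypothetical helpers, and the output file name is illustrative.

```shell
#!/bin/sh
# Sketch: locate an ASM block reported as "au=N blk=M" in the alert log.
# Assumptions: 1 MB allocation unit, 4 KB metadata block, and blk counted
# from the start of the AU. Override AU_SIZE/BLK_SIZE if yours differ.
AU_SIZE=${AU_SIZE:-1048576}
BLK_SIZE=${BLK_SIZE:-4096}

# Byte offset of the block from the start of the disk: au*AU_SIZE + blk*BLK_SIZE.
block_offset() {
    echo $(( $1 * AU_SIZE + $2 * BLK_SIZE ))
}

# Print (do not run) a dd command that reads only that one block.
dd_one_block() {
    disk=$1 out=$4
    skip=$(( $(block_offset "$2" "$3") / BLK_SIZE ))
    echo "dd if=$disk of=$out bs=$BLK_SIZE skip=$skip count=1"
}
```

Under these assumptions, block_offset 28481 16 gives 29864558592. The command dd_one_block /dev/oracleasm/disks/ASM_DATA12 28481 16 /tmp/DATA12_au28481_blk16.dd prints dd if=/dev/oracleasm/disks/ASM_DATA12 of=/tmp/DATA12_au28481_blk16.dd bs=4096 skip=7291152 count=1.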
 
 
 
Please also collect and provide the information requested in the following document (which I wrote):
 
 
 
Collecting The Required Information For Support To Validate & Troubleshooting ASM Diskgroup Corruptions. (Doc ID 1675152.1)
 
 
 
 
When ASM corruption occurs, you need to collect an AMDU dump and a backup of the first 50 MB of the disk to find the root cause of the ASM disk corruption.
 
Possible causes of ASM corruption include:
 
 
 
• Disks formatted at the OS level while they were in use by ASM
 
• Disks assigned to a file system while in use by ASM
 
• I/O errors (stale writes)
 
• Use of third-party software
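For the second cause, one quick partial check is whether the OS device behind the ASM disk also appears as a mounted filesystem. This sketch assumes you have already mapped the ASM disk to its device path (the /dev/sdc1 name used below is hypothetical). It looks only for an exact match in /proc/mounts, so a clean result does not rule the cause out. For example, the device may have been mounted at the time and unmounted since, or it may be mounted under another name, such as an LVM volume. device_is_mounted is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: is this block device currently listed as a mounted filesystem?
# Exact match on the device path only; you must determine the path first.
# Prints the matching mount entries and returns 0 if found, 1 otherwise.
device_is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1; print } END { exit !found }' "$mounts"
}
```

For example: device_is_mounted /dev/sdc1 && echo "device is mounted: investigate".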
 
 
 
But to check what happened on the disk, we need the AMDU dump and the disk backup. Without this information, it is not possible to find the cause of the issue.
 
Once you have collected this information, you can recreate the diskgroup and restore the database from backup.
 
In your case, you did not recreate the diskgroup; you restored and recovered the database from backup.
 
 
 
Now you need to run "alter diskgroup <dg_name> check all norepair" to check the diskgroup for corruption. ASM will not raise ORA-15196 again unless it touches the corrupted block. But if the corruption still exists, running check all norepair will make ASM touch that corrupted block, and ASM will fail again when it does.
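Here is a minimal wrapper for running that check from the shell. It assumes it runs as the Grid Infrastructure owner, with ORACLE_HOME and ORACLE_SID (for example +ASM1) set for the local ASM instance. check_diskgroup and the DRY_RUN switch are hypothetical; the SQL statement itself is the one quoted above. The name check accepts only letters, digits and underscores, which is a conservative assumption. The results of the check are written to the ASM alert log, so review that file afterwards.

```shell
#!/bin/sh
# Sketch: run a no-repair consistency check of an ASM diskgroup.
# Assumptions: Grid owner environment with ORACLE_HOME and ORACLE_SID set
# for the local ASM instance. Set DRY_RUN=1 to print the SQL without running it.
check_diskgroup() {
    dg=$1
    # Conservative name check so arbitrary text is never passed to SQL*Plus.
    case $dg in
        ''|*[!A-Za-z0-9_]*) echo "invalid diskgroup name: '$dg'" >&2; return 1 ;;
    esac
    sql="ALTER DISKGROUP $dg CHECK ALL NOREPAIR;"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$sql"
        return 0
    fi
    printf '%s\nexit\n' "$sql" | sqlplus -S / as sysasm
}
```

For the diskgroup in this thread, DRY_RUN=1 check_diskgroup DATA prints ALTER DISKGROUP DATA CHECK ALL NOREPAIR; and check_diskgroup DATA runs it.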