Email: service@parnassusdata.com 7 x 24 online support!

ORA-15196 Oracle ASM CASE STUDY: UNDERSTANDING ERROR ORA-15196

 

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline:  +86 13764045638 E-mail: service@parnassusdata.com

 

This document explains error ORA-15196, including the meaning of each argument and suggestions for diagnosing the error, and closes with a case study based on a real problem reported by a customer.

 

Error Description

 

ORA-15196 is reported after a validation of an ASM metadata block has failed. The error will be reported in the following format:

ORA-15196: invalid ASM block header [1st] [2nd] [3rd] [4th] [5th != 6th]

 

Where the arguments indicate:

Argument    Meaning

  • 1st     Function and line number in the code where the exception is raised
  • 2nd     Field failing the validation
  • 3rd     ASM object number stored in the block
  • 4th     ASM block number stored in the block
  • 5th     Value found for the field referenced by argument 2
  • 6th     Expected value for the field referenced by argument 2

 

 

Example:

 

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

 

Function and line number in the code where the exception is raised = kfc.c:7997

Field failing the validation = endian_kfbh

ASM object number stored in the block = 1

ASM block number stored in the block = 93

Value associated with the field referenced by argument #2 = 211

Expected value for the field referenced by argument #2 = 0

 

Arguments description

 

  • Function and line number in the code, where the exception is raised

 

In general, this argument will be the same in most cases, because the exception is always raised from the same routine.

 

#define kfbValid(data, len, type, bl) \
    kfbValidPriv(data, len, type, bl, __FILE__, __LINE__)

 

  • Field failing the validation

 

The ASM metadata is composed of many different structures, such as the file directory, disk directory, active change directory (ACD), etc., which are organized as files (ASM file numbers between 1 and 255). Each file is made of extents, and each extent is made of ASM blocks (4096 bytes). Each block has a generic block header (kfbh), and any of its fields can be validated.

 

kfbh.endian:                           0 ; 0x000: 0x00
kfbh.hard:                           130 ; 0x001: 0x82
kfbh.type:                             4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                           1 ; 0x003: 0x01
kfbh.block.blk:                       80 ; 0x004: T=0 NUMB=0x50
kfbh.block.obj:                        1 ; 0x008: TYPE=0x0 NUMB=0x1
kfbh.check:                   4268948098 ; 0x00c: 0xfe72fa82
kfbh.fcn.base:                         0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                         0 ; 0x014: 0x00000000
kfbh.spare1:                           0 ; 0x018: 0x00000000
kfbh.spare2:                           0 ; 0x01c: 0x00000000

 

 

A short description of each of the fields referenced above (file kf3.h):

kfbh.endian        endianness of the writer (big or little endian)
kfbh.hard          H.A.R.D. magic number and block size
kfbh.type          metadata block type (type of ASM metadata)
kfbh.datfmt        metadata block data format
kfbh.block.blk     block location of this block (block number)
kfbh.block.obj     block location of this block (object number)
kfbh.check         check value to verify consistency
kfbh.fcn.base      change number of last change (base)
kfbh.fcn.wrap      change number of last change (wrap)
kfbh.spare1        zero pad out to 32 bytes
kfbh.spare2        zero pad out to 32 bytes
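
As an illustration of the layout above, the 32-byte header can be decoded with a few lines of Python. This is only a sketch, not Oracle code: the names KfbhHeader and parse_kfbh are hypothetical, the field widths are inferred from the offsets printed by kfed (0x000 to 0x01c), and the byte order is an assumption that the caller has to supply.

```python
import struct
from collections import namedtuple

# Hypothetical container; field order follows the kfed offsets 0x000..0x01c.
KfbhHeader = namedtuple(
    "KfbhHeader",
    "endian hard type datfmt blk obj check fcn_base fcn_wrap spare1 spare2",
)


def parse_kfbh(block: bytes, big_endian: bool = True) -> KfbhHeader:
    """Decode the first 32 bytes of an ASM metadata block.

    Four 1-byte fields are followed by seven 4-byte fields. The byte
    order of the 4-byte fields is an assumption passed in by the caller.
    """
    if len(block) < 32:
        raise ValueError("an ASM block header needs at least 32 bytes")
    fmt = (">" if big_endian else "<") + "4B7I"
    return KfbhHeader(*struct.unpack(fmt, block[:32]))


# A block filled with 0xd4, like the damaged block in Example 2 of this note.
hdr = parse_kfbh(bytes([0xD4]) * 4096)
print(hdr.endian, hdr.type, hdr.blk, hex(hdr.check))
```

For a block filled with a single byte the result does not depend on the byte order, which makes it a convenient sanity check for the field widths.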

The fields reported with this error across different service requests (SRs) are:

endian_kfbh
obj_kfbl
hard_kfbh
type_kfbh
datfmt_kfbh
check_kfbh

 

  • ASM object number stored in the block

 

Every ASM metadata block belongs to a specific file, and each file is associated with a specific ASM structure. ASM file numbers between 1 and 255 identify the files storing those structures. The value in this field is the ASM file number.

ASM File Number     ASM Metadata

1                   File Directory
2                   Disk Directory
3                   Active Change Directory (ACD)
4                   Continuing Operations Directory (COD)
5                   Template Directory
6                   Alias Directory
9                   Attributes Directory
12                  Staleness Directory

 

For other ASM metadata structures like PST, ATB, DISK HEADER, this field will have a static value 2147483648 (0x80000000)

 

  • ASM block number stored in the block

 

An ASM file allocates extents, which are associated with Allocation Units. Each extent is made of multiple ASM metadata blocks of 4096 bytes; with the default Allocation Unit size of 1MB, there are 256 blocks in each extent/AU.

 

The value stored in this field is the block number relative to a particular file. In this example, the block number is 93, which places the block in the first extent of the file. That extent is allocated on a specific Allocation Unit of one of the disks in the diskgroup.
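
The arithmetic behind this mapping can be sketched in Python. The helper name locate_block is hypothetical, and the sketch assumes what the text states: 4096-byte blocks, one Allocation Unit per extent, and numbering that starts at 0.

```python
ASM_BLOCK_SIZE = 4096  # size of an ASM metadata block, in bytes


def locate_block(block_number: int, au_size: int = 1024 * 1024):
    """Return (extent, block_within_extent) for a file-relative block number.

    Assumes one Allocation Unit per extent and 0-based numbering.
    """
    blocks_per_extent = au_size // ASM_BLOCK_SIZE  # 256 for a 1MB AU
    return divmod(block_number, blocks_per_extent)


print(locate_block(93))   # (0, 93): block 93 lives in the first extent
print(locate_block(256))  # (1, 0): first block of the second extent
```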

 

  • Value associated with field referenced by argument #2

 

This is the value found in the block for the field referenced in argument #2.

 

  • Expected value for field referenced by argument 2

 

This is the expected value for the field referenced in argument #2.

 

 

With the description of all the arguments for error ORA-15196, it should be possible to understand the message better:

 

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

 

In the previous example, the field failing the validation is endian_kfbh. The block belongs to file 1 (File Directory) and is relative block 93. The value found for endian_kfbh was 211, while the expected value was 0.
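
The same reading of the message can be automated. The following Python sketch splits an ORA-15196 line into its six arguments; the names ORA_15196, FILE_NAMES and parse_ora_15196 are hypothetical, and the regular expression only covers the message format shown in this document.

```python
import re

# Matches: ORA-15196: invalid ASM block header [1st] [2nd] [3rd] [4th] [5th != 6th]
ORA_15196 = re.compile(
    r"ORA-15196: invalid ASM block header "
    r"\[(?P<location>[^\]]+)\] \[(?P<field>[^\]]+)\] "
    r"\[(?P<object>\d+)\] \[(?P<block>\d+)\] "
    r"\[(?P<found>\d+) != (?P<expected>\d+)\]"
)

# ASM file numbers listed earlier in this document.
FILE_NAMES = {
    1: "File Directory",
    2: "Disk Directory",
    3: "Active Change Directory (ACD)",
    4: "Continuing Operations Directory (COD)",
    5: "Template Directory",
    6: "Alias Directory",
    9: "Attributes Directory",
    12: "Staleness Directory",
}


def parse_ora_15196(message: str):
    """Return the arguments of an ORA-15196 message as a dict, or None."""
    match = ORA_15196.search(message)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("object", "block", "found", "expected"):
        info[key] = int(info[key])
    info["metadata"] = FILE_NAMES.get(info["object"], "unknown")
    return info


line = ("ORA-15196: invalid ASM block header "
        "[kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]")
print(parse_ora_15196(line))
```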

 

Diagnostics

 

Up to 10gR2, there are some bugs (patch included) related to this error:

 

5554692  Related to indirect extent allocation. Please read the bug description in webiv, because not all cases of ORA-15196 are this particular bug.
6027802  Closed as not a bug, but related to I/O issues caused by EMC PowerPath. The same type of data mismatch has been observed on other PowerPath installations.
6453944  ORA-15196 with ASM disks larger than 2TB using ASMLIB.
   

 

 

Most occurrences of this error are associated with data changed outside of ASM. This includes:

 

  • Disks formatted at the OS level while they were used by ASM
  • Disks assigned to a file system while used by ASM
  • I/O errors (stale writes)
  • Usage of third-party software

 

Once this error is reported, the diskgroup needs to be recreated. There are situations where diskgroup cannot be mounted, or others where any reference to the metadata (recursive or non recursive), will signal the error and dismount the diskgroup.

 

Data Collection

 

In order to understand the extent of the problem and produce a correct diagnosis, it is essential to obtain the following data:

 

  1. Alert.log and trace file associated with the error
  2. First 300MB of the disk affected by the error

 

In the alert.log, review the line before the report of error ORA-15196:

 

WARNING: cache failed to read fn=1 blk=80 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

 

The line before the ORA-15196 report identifies the disk storing the block: from disk(s): 0.

 

To get the first 300MB:

 

$ dd if=<device path> of=/tmp/disk.dd bs=1048576 count=300

 

It may be necessary to provide a partial copy of other disks in the diskgroup.

 

  3. Output from AMDU if available

 

AMDU will be explained in more detail in a different note (TBD).

 

This tool is one of the new features introduced with 11g. It reads the ASM disks and extracts information into different files. Those files contain a mapping of the ASM metadata and an image of the content of the disks; it is also possible to extract files from the diskgroup.

 

AMDU can extract the information even if the diskgroup is dismounted.

 

The mapping file is very important for the diagnostic of error ORA-15196. It has the specific location for each of the extents of each ASM metadata file.

 

Note 553639.1 is the placeholder for the AMDU binaries for some of the platforms.

 

 

Data Review

 

  1. Always review the blocks surrounding the affected block. If more than one block has incorrect data (zeros), and the blocks belong to different ASM structures (file directory, disk directory, etc.), the problem was most likely caused outside of ASM: disk reformatted, assigned to another volume manager, etc.

 

Use kfed to extract the content of the blocks.

 

  2. Review the trace file generated by the error.

 

The trace file always contains a dump of the ASM metadata block in memory, followed by a short call stack. The block dump has the same human-readable format that kfed generates.

 

*** SERVICE NAME:() 2008-01-23 11:57:23.892

*** SESSION ID:(39.74) 2008-01-23 11:57:23.892

OSM metadata block dump:

kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                      80 ; 0x004: T=0 NUMB=0x50
kfbh.block.obj:                       1 ; 0x008: TYPE=0x0 NUMB=0x1
kfbh.check:                  4268948098 ; 0x00c: 0xfe72fa82
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

 

/* data removed on purpose */

 

After the OSM metadata block dump, the short call stack is printed:

 

----- Abridged Call Stack Trace -----

 

kfcReadBlk()+1276 kfcLoad()+2148 kffbScanNext()+252 kffbTableCb()+700 kfgTableCb()+1252 kffilTableCb()+240 qerfxFetch()+896 qersoFetch()+720 qerjotFetch()+184 opifch2()+8092 kpoal8()+4196 opiodr()+1548 ttcpip()+1284 opitsk()+1432 opiino()+1128 opiodr()+1548 opidrv()+896 sou2o()+80 opimai_real()+124 main()+152

 

  3. Compare the data in the trace file with the data extracted from disk using kfed.

 

Comparing the block dumped in the trace file with the block on disk makes it possible to identify the exact cause of the validation failure. Every case is different, but if the data stored on disk is zeros, always validate the adjacent blocks as well. If more blocks contain invalid data (zeros), this indicates that the disk has been formatted outside ASM.
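
When working from a dd image of the disk, the check on adjacent blocks can be scripted. The Python sketch below reports every block of one Allocation Unit that consists of a single repeated byte (zeros or any other fill pattern). The function name uniform_blocks is hypothetical; the byte offset is computed as aunum * ausz, using the same ausz, blksz and aunum values that appear on the kfed command line.

```python
def uniform_blocks(image_path, aunum, ausz=1048576, blksz=4096):
    """Return {blknum: fill_byte} for blocks of one AU made of one repeated byte.

    image_path is a raw copy of the disk (for example the dd image),
    so AU number aunum starts at byte offset aunum * ausz.
    """
    result = {}
    with open(image_path, "rb") as image:
        image.seek(aunum * ausz)
        for blknum in range(ausz // blksz):
            block = image.read(blksz)
            if len(block) < blksz:
                break  # the image ends before the AU does
            if block.count(block[0]) == blksz:
                result[blknum] = block[0]
    return result
```

A call such as uniform_blocks("/tmp/disk.dd", aunum=2) returns an empty dictionary when no block in that AU is uniformly filled. Several hits spread over blocks of different metadata files would point to a cause outside ASM, as described above.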

 

Example 1:

 

This is an example of a block with invalid data. The type of the block is KFBTYP_INVALID, which is reported when an incorrect type is stored.

 

kfbh.endian:                           0 ; 0x000: 0x00

kfbh.hard:                            34 ; 0x001: 0x22

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:              4290772992 ; 0x004: T=1 NUMB=0x7fc00000

kfbh.block.obj:                     0 ; 0x008: TYPE=0x0 NUMB=0x0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                    13879 ; 0x010: 0x00003637

kfbh.fcn.wrap:                      512 ; 0x014: 0x00000200

kfbh.spare1:                      978943 ; 0x018: 0x000eefff

kfbh.spare2:                      2054913149 ; 0x01c: 0x7a7b7c7d

 

 

 

 

 

Example 2:

 

The entire block is filled with the byte 0xd4.

 

disk:0 au:2 block:253 file:1 physical extent:0 block:253
kfed	read	ausz=1048576	blksz=4096	aunum=2	blknum=253 dev=/dev/rdsk/c2t50060E8000C41384d2s6

kfbh.endian:	212 ; 0x000: 0xd4
kfbh.hard:	212 ; 0x001: 0xd4
kfbh.type:	212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:	212 ; 0x003: 0xd4
kfbh.block.blk:	3570717908 ; 0x004: T=1 NUMB=0x54d4d4d4 
kfbh.block.obj:	3570717908 ; 0x008: TYPE=0xd NUMB=0x4d4d4 
kfbh.check:	3570717908 ; 0x00c: 0xd4d4d4d4
kfbh.fcn.base:	3570717908 ; 0x010: 0xd4d4d4d4 
kfbh.fcn.wrap:	3570717908 ; 0x014: 0xd4d4d4d4 
kfbh.spare1:	3570717908 ; 0x018: 0xd4d4d4d4 
kfbh.spare2:	3570717908 ; 0x01c: 0xd4d4d4d4 
kfbtTraverseBlock: Invalid OSM block type 212
0000: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0020: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0040: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0060: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0080: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 00a0: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
 

 

 

CASE STUDY

 

 

The diskgroup, which held a copy of a database, had not been used for some months. For business reasons, that database had to be brought back into use. Mounting the diskgroup was possible, but when the database was mounted and the ASM metadata had to be read, error ORA-15196 was signaled and the diskgroup was dismounted.

 

The diskgroup was configured using external redundancy with a single disk and using the default Allocation Unit size of 1MB.

 

Data Collected

 

  1. The messages in the alert.log:

 

WARNING: cache failed to read fn=1 blk=256 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]

 

 

  2. The ASM block dumped in the trace file.

 

 

*** SESSION ID:(108.5) 2008-02-06 10:05:31.054

OSM metadata block dump:

kfbh.endian:                          0 ; 0x000: 0x00

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                    7 ; 0x002: KFBTYP_ACDC

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                   10752 ; 0x004: T=0 NUMB=0x2a00

kfbh.block.obj:               3 ; 0x008: TYPE=0x0 NUMB=0x3

kfbh.check:                                     1103194877 ; 0x00c: 0x41c16afd

kfbh.fcn.base:                                               0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

 

  3. The AMDU output and the first 300MB of the disk were collected.

 

Data Review

 

  1. The error:

 

WARNING: cache failed to read fn=1 blk=256 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]

 

The error provides the following information:

 

o    The field failing the validation is obj_kfbl

o    The block belongs to file 1 (fn=1).  File 1 is the File Directory.

o    The block is block 256 (blk=256)

o    The value for obj_kfbl found was 3 but the expected value should be 1.

 

 

File extents, allocation units and blocks in ASM are numbered starting at 0, and the block size is 4096 bytes. With the default AU size (1MB), each extent holds 256 blocks (0 to 255), so block 256 is the first block of the second extent (extent 1).

 

 

Although the diskgroup was mounted, any query referencing x$kffxp trying to get the extent mapping for file 1 failed. As a result, it was not possible to identify the AU used by block 256 from file 1 (the affected block).

 

  2. Using AMDU

 

 

One of the files generated by AMDU is the mapping file (*.map). That file contains the location on disk of every extent of the files stored in the diskgroup. The only record for file 1 was:

 

N0001 D0000 R00 A00000002 F00000001 I0 E00000000 U00 C00256 S0001 B0002097152

 

This line indicates that the first extent of File 1 (F00000001) is stored in Allocation Unit 2 (A00000002) of disk 0 (D0000).
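
A record like this can be split mechanically into its fields. The Python sketch below keys each value by its leading letter; parse_map_line is a hypothetical helper, and only the D (disk), A (allocation unit) and F (file) fields are interpreted here, since those are the ones explained above. It also assumes the values are decimal, which holds for the record shown.

```python
def parse_map_line(line: str) -> dict:
    """Split one AMDU map record into {letter: integer value}.

    Each token is assumed to be one letter followed by decimal digits.
    """
    fields = {}
    for token in line.split():
        fields[token[0]] = int(token[1:])
    return fields


record = parse_map_line(
    "N0001 D0000 R00 A00000002 F00000001 I0 E00000000 U00 C00256 S0001 B0002097152"
)
print("disk", record["D"], "AU", record["A"], "file", record["F"])
```

In this record the B value (2097152) equals 2 x 1048576, which is consistent with the byte offset of Allocation Unit 2 for a 1MB AU size, although the meaning of that field is not defined in this note.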

 

There was no other entry for file 1 in the mapping file, and AMDU was generating a core dump. It was discovered that AMDU was trying to read Allocation Unit 50.

 

A useful feature of AMDU is the ability to dump the content of a complete extent of a particular file, redirecting the output into a text file.

 

$ amdu -diskstring '<path of device>' -dump '<diskgroup name>' -print 'DG.F1.X1.B0.C256'

 

The previous command dumps 256 blocks of File 1, Extent 1, starting at block 0.

 

The results of the last command were:

 

************************** PRINTING XYZ.F1.X1.B0.C2 **************************

 

-------------------------------- BLOCK 1 OF 2 --------------------------------

......................................................................

disk:0 au:50 block:0 file:1 physical extent:1 block:0

kfed read ausz=1048576 blksz=4096 aunum=50 blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd

 

At this point the conclusions were:

 

  • The ASM metadata shows that Allocation Unit 50 from disk 0 belongs to File 1.

 

-------------------------------- BLOCK 1 OF 2 --------------------------------

......................................................................

disk:0 au:50 block:0 file:1 physical extent:1 block:0

kfed read ausz=1048576 blksz=4096 aunum=50 blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd

 

  • If the block belongs to file 1, the value for kfbh.block.obj field should have been 1 together with the value for kfbh.type, which should have been KFBTYP_FILEDIR. But that was not the case:

 

The error ORA-15196:

 

WARNING: cache failed to read fn=1 blk=256 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]

 

  • The content dumped into the trace file was the same as the content found on disk. The validation failed because the data stored in the block did not belong to the expected ASM metadata, in this case the file directory.

 

The next step was to validate all the blocks in the same Allocation Unit. Those blocks belong to the same ASM metadata (KFBTYP_FILEDIR). One Allocation Unit is used exclusively by one unique file.

 

Example for block 1 from AU 50:

 

disk:0 au:50 block:1 file:1 physical extent:1 block:1

kfed read ausz=1048576 blksz=4096 aunum=50 blknum=1 dev=/emea/bde/home/users/jfiguer2/disk.dd

 

 

 

 

The solution

 

There was no backup available for the database stored on the diskgroup, so the diskgroup had to be kept mounted. This was achieved by patching the ASM metadata, replacing the content of the first block of Allocation Unit 50 with valid data.

 

It was not possible to rebuild the real data for block 0, so it was replaced with block 1. Additional patching was required in order to adjust other fields in the block. Once the block was successfully patched, the diskgroup was mounted and queries on internal views did not dismount the diskgroup.

 

Opening the database reported errors when trying to identify one data file. The extent mapping for this file was stored in the patched block. Luckily, that file was not relevant for the database. After setting the file offline, the database opened without errors.

 

Because it was not possible to guarantee the integrity of the diskgroup, it was recommended to take a backup of the database and rebuild the diskgroup.