Oracle ASM disk group is not mounted on second node, showing corruption

Hello Experts,
 
 
Environment:
OS: RHEL 5.6
Oracle: 11.2.0.3 + PSU 5
 
 
I have an issue with a disk group. I have a disk group called DATA, and it mounts successfully on the first node of a two-node RAC. When I tried to mount the disk group on the second node, I got:
 
ASMCMD> mount data
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA" (DBD ERROR: OCIStmtExecute)
ASMCMD>
 
I have verified the permissions and compared them with the first node. Everything looks correct.
 
When I ran kfed to read the disk from the second node, I got the following error, which reports corruption. When I run the same command on the first node of the cluster, it succeeds. I am not sure how that can happen if both nodes are reading the same device.
 
 
db3: /opt/app/oracle/diag/asm/+asm/+ASM2/trace/amdu_2013_05_31_18_33_53 # kfed read /dev/raw/raw4
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
2B48D7FCC400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
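The kfed output above can be checked the same way on both nodes. The sketch below is a minimal example under one assumption: that kfed reports a valid ASM disk header with kfbh.type set to KFBTYP_DISKHEAD, whereas an all-zero block like the one above is reported as KFBTYP_INVALID. The function name is_asm_header is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "kfed read <device>" output on stdin and succeed
# only if the block type is KFBTYP_DISKHEAD (a valid ASM disk header).
is_asm_header() {
    # kfed prints e.g. "kfbh.type:   1 ; 0x002: KFBTYP_DISKHEAD"
    grep '^kfbh\.type:' | grep -q 'KFBTYP_DISKHEAD'
}

# Usage, run on each node against the same raw device:
#   kfed read /dev/raw/raw4 | is_asm_header && echo "ASM header" || echo "no ASM header"
```

If the first node reports a header and the second does not, the two nodes are not reading the same data through /dev/raw/raw4.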
 
Please advise on how to find the root cause.
 
 
1) It seems that the raw devices in question are no longer bound to the original block devices: the block device names were relocated/reassigned. This situation usually occurs when new disks are added to the system.
 
2) Please capture the following output on both the affected and the healthy node and provide it to us:
 
Healthy node:
======================================================
script /tmp/node_ok.txt
 
raw -qa
 
cat /proc/partitions
 
exit
======================================================
 
Affected node:
 
======================================================
script /tmp/node_bad.txt
 
raw -qa
 
cat /proc/partitions
 
exit
======================================================
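Once both captures exist, the major/minor numbers reported by raw -qa can be matched against /proc/partitions to see which block device, and which size, each raw device is bound to on each node. This is a minimal sketch assuming the RHEL 5 raw -qa line format "/dev/raw/rawN: bound to major M, minor m"; the function name resolve_raw_bindings is hypothetical. Lines recorded by script may end in carriage returns, so those are stripped first.

```shell
#!/bin/sh
# Hypothetical helper: print "<raw device> -> <block device> (<size> blocks)"
# for every raw binding, resolving major:minor via a /proc/partitions snapshot.
#   $1: file with "raw -qa" output
#   $2: file with "cat /proc/partitions" output
# Both may be the same script capture; each pass only uses the lines it understands.
resolve_raw_bindings() {
    awk '
        # script captures end lines with CR LF; drop the CR
        { sub(/\r$/, "") }
        # First file: /proc/partitions rows "major minor #blocks name"
        FNR == NR {
            if (NF == 4 && $1 ~ /^[0-9]+$/) {
                dev[$1 ":" $2] = $4
                size[$1 ":" $2] = $3
            }
            next
        }
        # Second file: "/dev/raw/rawN:  bound to major M, minor m"
        /bound to major/ {
            raw = $1; sub(/:$/, "", raw)
            maj = $5; sub(/,$/, "", maj)
            key = maj ":" $7
            if (key in dev)
                print raw " -> /dev/" dev[key] " (" size[key] " blocks)"
            else
                print raw " -> unknown (" key ")"
        }
    ' "$2" "$1"
}

# Usage:
#   resolve_raw_bindings /tmp/node_ok.txt  /tmp/node_ok.txt
#   resolve_raw_bindings /tmp/node_bad.txt /tmp/node_bad.txt
```

Block device names can legitimately differ between nodes, so when comparing the /dev/raw/raw4 lines a different size, or an unknown major:minor pair, is the more telling sign.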
 
 
 
 
 
1) The best approach to guarantee device persistence (as our colleague Natalka mentioned previously) is to use ASMLib, as described below:
 
 
Device Persistence with Oracle Linux ASMLib  
 
ASMLib is a support library for the Automatic Storage Management feature of Oracle Database 10g. Oracle provides a Linux-specific implementation of this library. This document describes some advantages ASMLib brings to Linux system administration.
 
 
This document describes some advantages the Linux-specific ASM library provided by Oracle (herein "ASMLib") brings to the administration of a Linux system running Oracle. Linux often presents the challenge of disk name persistence. Change the storage configuration, and a disk that appeared as /dev/sdg yesterday can appear as /dev/sdh after a reboot today. How can these changes be isolated so that they do not affect ASM?
 
Why Not Let ASM Scan All Disks?                                        
 
 
ASM scans all disks it is allowed to discover (via the asm_diskstring parameter). Why not scan all the disks and let ASM determine which ones it cares about, rather than worrying about disk name persistence at all?
 
 
The question is notionally correct. If you pass /dev/sd* to ASM, and ASM can read the devices, ASM can indeed pick out its disks regardless of whether /dev/sdg has changed to /dev/sdh on this particular boot.
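To make the idea concrete: a disk is identified by its on-disk header, not by its device name. The sketch below uses kfed (the tool from the question) purely to illustrate that idea; it is not how ASM discovery is implemented. It assumes a valid header shows KFBTYP_DISKHEAD and that kfed prints the disk group name on the kfdhdb.grpname line. The function find_asm_disks and the KFED override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: report which of the given devices carry an ASM disk
# header, whatever name the kernel gave them on this boot.
# KFED may name another command or function; it defaults to kfed in PATH.
find_asm_disks() {
    for dev in "$@"; do
        # Read block 0 of the device; errors are discarded
        hdr=$(${KFED:-kfed} read "$dev" 2>/dev/null)
        # A valid ASM disk header has block type KFBTYP_DISKHEAD; skip anything else
        if printf '%s\n' "$hdr" | grep -q 'KFBTYP_DISKHEAD'; then
            # kfed prints e.g. "kfdhdb.grpname:  DATA ; 0x048: length=4"
            grp=$(printf '%s\n' "$hdr" | awk '/^kfdhdb\.grpname:/ { print $2 }')
            echo "$dev ${grp:-unknown}"
        fi
    done
}

# Usage (needs read access to every listed device, e.g. as root):
#   find_asm_disks /dev/sd*
```

Note that whoever runs this needs read access to every device it is given, which is exactly the concern raised in the next paragraph.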
 
However, ASM needs permission to read these devices. That means the oracle user must have user or group ownership of all /dev/sd* devices, including any system disks. Most system administrators do not want the oracle user to own system disks just so ASM can ignore them. The potential for mistakes (a DBA writing over the /home volume, etc.) is far too high.
 
 
ASMLib vs. udev or devlabel
 
There are various methods to provide names that do not change, including devlabel and udev. What does ASMLib provide that these solutions do not?
 
The bigger problem is not specifically a persistent name; it is matching that name to a set of permissions. It doesn't matter if /dev/sdg is now /dev/sdh, as long as the new /dev/sdh has oracle:dba ownership and the new /dev/sdg (which used to be /dev/sdf) has the ownership the old /dev/sdf used to have. The easiest way to ensure that permissions are correct is persistent naming. If a disk always appears under the same name, you can always apply the same permissions to it without worrying. In addition, you can then exclude names that match system disks. Even if the permissions are right, a system administrator isn't going to want ASM scanning system disks every time.
 
Now, udev or devlabel can handle keeping sdg as sdg (or /dev/mydisk, whatever). What does ASMLib add? A few things, actually. With ASMLib, there is a simple command to label a disk for ASM. With udev, you'll have to modify the udev configuration file for each disk you add. You'll have to determine a unique id to match the disk and learn the udev configuration syntax.
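For comparison, the per-disk udev work looks roughly like the sketch below, which prints one rule line for a given SCSI id. The rule format follows the RHEL 5-era syntax (BUS=, /sbin/scsi_id -g -u -s /block/$parent) commonly used for ASM disks; treat it as an assumption to verify against the installed udev release. The function name make_udev_rule, the device name asm-disk1, and the rules file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one udev rule that names the first partition of
# the disk with SCSI id $1 as /dev/$2 and gives it Oracle ownership.
# Rule syntax assumed to match RHEL 5-era udev; verify before use.
make_udev_rule() {
    id=$1 name=$2 owner=${3:-oracle} group=${4:-dba}
    # $parent is left literal: udev expands it to the parent disk (e.g. sdb)
    printf 'KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="%s", NAME="%s", OWNER="%s", GROUP="%s", MODE="0660"\n' \
        "$id" "$name" "$owner" "$group"
}

# Usage on every node (file name is hypothetical):
#   id=$(/sbin/scsi_id -g -u -s /block/sdb)
#   make_udev_rule "$id" asm-disk1 >> /etc/udev/rules.d/99-oracle-asmdevices.rules
```

Each new disk means looking up its id, adding a line, and repeating the change on every node.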
 
The name is also human-readable. With an Apple XServe RAID, why have a disk named /dev/sdg when it can be DRAWER1DISK2? ASMLib can also list all disks, whereas with udev you either have to remember that sdg, sdf, and sdj are for ASM, or you have to provide names. With ASMLib, there is no chance of ASM itself scanning system disks. In fact, ASMLib never modifies the system's names for disks. ASMLib never uses the name "/dev/sdg". After querying the disks at boot time, it provides its own access to the devices with permissions for Oracle. /dev/sdg is still owned by root:root, and the oracle user still cannot access the device by that name.
 
The configuration is persistent. Reinstall a system and your udev configuration is gone; ASMLib's labels are not. With udev, you have to copy the configuration over to the other nodes in a RAC. If you have sixteen nodes, you have to copy each configuration change to all sixteen nodes. Whether you use udev or devlabel, you have to set the permissions properly on all sixteen nodes. ASMLib just requires one invocation of "/etc/init.d/oracleasm scandisks" to pick up all changes made on another node.
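The ASMLib side of the same task can be sketched as follows: label a partition once on one node, run scandisks on the other nodes, then list the labels. It assumes the createdisk, scandisks, and listdisks actions of /etc/init.d/oracleasm from ASMLib 2.x and passwordless root ssh between nodes; label_and_propagate, run, and DRY_RUN are hypothetical names, and DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical sketch of the ASMLib steps described above.
# DRY_RUN=1 prints each command instead of executing it.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# $1: ASMLib label, $2: partition on this node, remaining args: other RAC nodes
label_and_propagate() {
    label=$1 device=$2
    shift 2
    # Write the ASMLib label to the disk once, from this node only
    run /etc/init.d/oracleasm createdisk "$label" "$device"
    for node in "$@"; do
        # Assumes passwordless root ssh; each node rescans and finds the label
        run ssh "root@$node" /etc/init.d/oracleasm scandisks
    done
    # Show the labels ASMLib now knows about on this node
    run /etc/init.d/oracleasm listdisks
}
```

For example, DRY_RUN=1 label_and_propagate DATA1 /dev/sdb1 node2 prints the createdisk, ssh ... scandisks, and listdisks commands without touching any disk.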
 
These are just a few of the benefits ASMLib brings to device persistence.                                        
 
                                    
2) An example of how to set up ASMLib is described in the following document (which I wrote):
 
 
Note: 580153.1 How To Setup ASM on Linux Using ASMLIB Disks, Raw Devices, Block Devices or UDEV Devices?