Oracle ASM ORA-15063 / ORA-15042 - TROUBLESHOOTING STEPS BEFORE OPENING a SR to Oracle Support

Posted by PDSERVICE on Apr 09, 2020

APPLIES TO:

Oracle Database - Enterprise Edition

Oracle Database Cloud Schema Service - Version N/A and later

Oracle Database Exadata Cloud Machine - Version N/A and later

Oracle Cloud Infrastructure - Database Service - Version N/A and later

Oracle Database Cloud Exadata Service - Version N/A and later

Information in this document applies to any platform.

PURPOSE

Self-debugging steps when a diskgroup cannot be mounted due to error ORA-15063:

ORA-15063: ASM discovered an insufficient number of disks for diskgroup "%s"

ORA-15040: diskgroup is incomplete

ORA-15042: ASM disk "%s" is missing

TROUBLESHOOTING STEPS

SECTION A - Getting started

Start by referring to NOTE 452770.1 "TROUBLESHOOTING - ASM disk not found/visible/discovered issues"

First, identify all disks that are part of the affected diskgroup by looking at the last successful mount in alert_+ASM*.log.

Search for a section like the one below:

SQL> ALTER DISKGROUP <DGNAME1> MOUNT /* asm agent *//* {0:0:214} */

NOTE: cache registered group DATA number=1 incarn=0x44bef6bb

NOTE: cache began mount (not first) of group DATA number=1 incarn=0x44bef6bb

NOTE: Loaded library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so

NOTE: Assigning number (1,0) to disk (ORCL:DATA01P)

NOTE: Assigning number (1,1) to disk (ORCL:DATA02P)

NOTE: Assigning number (1,2) to disk (ORCL:DATA03P)

NOTE: Assigning number (1,3) to disk (ORCL:DATA04P)

NOTE: Assigning number (1,4) to disk (ORCL:DATA05P)

NOTE: cache opening disk 0 of grp 1: DATA01P label:DATA01P

NOTE: cache opening disk 1 of grp 1: DATA02P label:DATA02P

SUCCESS: DISKGROUP <DGNAME1> was mounted

NOTE: When ASMLIB is not used the path to ASM disk is specified within the mount section:

NOTE: cache opening disk 1 of grp 1: REDO3_0001 path:/dev/mpath/3600601600ba12c00d4b784363e69e211

NOTE: cache opening disk 2 of grp 1: REDO3_0002 path:/dev/mpath/3600601600ba12c00d4b784363e69e212

...
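
The search described above can be scripted. The following is a minimal sketch, assuming the alert log uses the line formats shown in the excerpt above. last_mount_disks is a hypothetical helper name, not an Oracle utility, and it does not handle "ALTER DISKGROUP ALL MOUNT" or interleaved mounts of several diskgroups.

```shell
#!/bin/sh
# Sketch only: print the disks opened during the last SUCCESSFUL mount of a
# diskgroup, based on the alert log line formats shown above.
# last_mount_disks is a hypothetical helper, not an Oracle tool.
# Usage: last_mount_disks <alert_log_file> <DGNAME>
last_mount_disks() {
    awk -v dg="$2" '
        BEGIN { dgu = toupper(dg) }
        { u = toupper($0) }
        # Every mount attempt starts a fresh candidate list.
        index(u, "ALTER DISKGROUP " dgu " MOUNT") { n = 0; inmount = 1; next }
        # Collect "<disk name> <label:... or path:...>" inside a mount section.
        inmount && /NOTE: cache opening disk/ { cur[n++] = $9 " " $10 }
        # Keep the list only when the mount is reported as successful.
        inmount && index(u, "SUCCESS: DISKGROUP " dgu " WAS MOUNTED") {
            m = n
            for (i = 0; i < n; i++) last[i] = cur[i]
            inmount = 0
        }
        END { for (i = 0; i < m; i++) print last[i] }
    ' "$1"
}
```

For example, "last_mount_disks <path_to_alert_log> DATA" prints one line per disk with its name and its label (ASMLIB) or path (non-ASMLIB). Compare that list with the devices currently discovered to isolate the missing one(s).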

Isolate the device(s) reported as "missing" as note 452770.1 suggested.

Finally, start your checks as follows:

A1) If there are any IO/storage/multipathing errors reported in the OS logs, investigate and fix them.

This step is mandatory, as ORA-15063/ORA-15042 are usually caused by underlying IO/storage errors.

A2) If devices used by ASM disks are properly presented and configured at OS level.

If additionally "ORA-15075: disk(s) are not visible cluster-wide" is reported, make sure that all devices are cluster-wide visible.

A3) If all ASM disks have appropriate permissions (e.g. they should be owned by the grid owner)

If ownership of ASM disk(s) has been changed for whatever reason, please correct that.
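
The ownership check in A3 can be sketched as below. This is an illustration only: check_asm_owner is a hypothetical helper name, and the expected owner (for example "grid") and the device paths are values you must supply for your own system.

```shell
#!/bin/sh
# Sketch only: report ASM devices that are missing or not owned by the
# expected user. check_asm_owner is a hypothetical helper name.
# Usage: check_asm_owner <expected_owner> <device> [<device> ...]
check_asm_owner() {
    expected="$1"
    shift
    rc=0
    for dev in "$@"; do
        if [ ! -e "$dev" ]; then
            echo "MISSING $dev"
            rc=1
            continue
        fi
        # -L follows symlinks (multipath aliases are often symlinks);
        # the third field of "ls -l" output is the owner.
        actual=$(ls -ldL "$dev" | awk '{ print $3 }')
        if [ "$actual" != "$expected" ]; then
            echo "WRONG_OWNER $dev owner=$actual expected=$expected"
            rc=1
        fi
    done
    return $rc
}
```

The function prints nothing and returns 0 when every device exists and has the expected owner. Run it on every node, since ownership can differ between nodes.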

A4) If/how the "missing" device(s) is reported when querying v$asm_disk

-----------------------------------------------------------------------------------

If the device(s) is reported with status:

=> "PROVISIONED/CANDIDATE" - this means the header of ASM disk(s) is damaged.

-> investigate the IO problems behind the corruption - see step A1. Oracle never wipes out its own metadata; a checksum is computed for every write before it is accepted.

-> check the header status, in order to confirm the damage:

$> kfed read <path_to_your_missing_devices>

kfbh.endian: 0 ; 0x000: 0x00

kfbh.hard: 0 ; 0x001: 0x00

kfbh.type: 0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt: 0 ; 0x003: 0x00

kfbh.block.blk: 0 ; 0x004: blk=0

kfbh.block.obj: 0 ; 0x008: file=0

....

-> try to repair the header and see if diskgroup can be mounted:

$> kfed repair <path_to_your_missing_devices>

-> check whether any additional corruption is reported by ASM (e.g. ORA-15196) or by your database, as IO/storage problems could affect more than one block.

If any corruption is seen, please open a SR to Oracle Support.

NOTE:

1) When non-default AU size is used AUSZ=<au_size> must be specified with each KFED command.

2) "kfed repair" works for 11g ONLY!

=> "UNKNOWN/IGNORED" - this means the ASM disk(s) is not seen at OS level.

-> review steps A1, A2 and A3.

-----------------------------------------------------------------------------------
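
The statuses discussed in A4 can be listed with a query against v$asm_disk. The sketch below only prints the SQL text so it can be reviewed first. asm_disk_status_sql is a hypothetical helper name, and the sqlplus connection shown in the comment ("/ as sysasm" from the ASM owner's environment) is an assumption that applies to 11g and later.

```shell
#!/bin/sh
# Sketch only: print a SQL*Plus script that lists every discovered disk with
# its header and mount status. asm_disk_status_sql is a hypothetical helper.
asm_disk_status_sql() {
    cat <<'EOF'
set linesize 200 pagesize 100
column path format a50
column name format a20
select group_number, disk_number, name, path,
       header_status, mount_status, mode_status, state
  from v$asm_disk
 order by group_number, disk_number;
exit
EOF
}

# Example (assumption: run as the ASM owner with the ASM environment set):
#   asm_disk_status_sql | sqlplus -S "/ as sysasm"
```

Both HEADER_STATUS and MOUNT_STATUS are selected so that the values named in A4 can be checked side by side for the missing device(s).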

A5) If asm_diskstring is still properly set.

On Windows configurations, you can also refer to NOTE 880061.1 "ASM Is Unable To Detect SCSI Disks On Windows"

SECTION B - ASMLIB is used

When ASMLIB is used, follow the above steps (section A) and also check the errors associated with ORA-15063:

B1) ORA-15183 Unable to initialize the ASMLIB in oracle/ORA-15183: ASMLIB initialization error [driver/agent not installed]

Refer: NOTE 340519.1 Cannot Start ASM Ora-15063/ORA-15183

B2) ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted]

Check your ASMLIB health.

=> correctness of installed rpm's

=> correctness of symlinks - all nodes should show:

# ls -l /etc/sysconfig/oracleasm

lrwxrwxrwx 1 root root 24 Sep 18 22:10 /etc/sysconfig/oracleasm -> oracleasm-_dev_oracleas

=> correctness of ASMLIB configuration (/etc/sysconfig/oracleasm) - when multipathing is used:

# ORACLEASM_SCANORDER: Matching patterns to order disk scanning

ORACLEASM_SCANORDER="dm"

# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan

ORACLEASM_SCANEXCLUDE="sd"
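
The two settings above can be verified on each node with a small check. This is a sketch under the assumption that the configuration file uses plain shell variable assignments, as in the excerpt above. check_oracleasm_scan is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: check the scan settings of an ASMLIB configuration file against
# the multipath values shown above (scan "dm" devices, exclude "sd" devices).
# check_oracleasm_scan is a hypothetical helper name.
# Usage: check_oracleasm_scan [<config_file>]
check_oracleasm_scan() {
    conf="${1:-/etc/sysconfig/oracleasm}"
    [ -r "$conf" ] || { echo "UNREADABLE $conf"; return 2; }
    (
        # The file uses shell variable syntax, so it is sourced in a subshell
        # to avoid changing the variables of the calling shell.
        . "$conf"
        rc=0
        case " ${ORACLEASM_SCANORDER:-}" in
            *" dm"*) echo "OK ORACLEASM_SCANORDER=${ORACLEASM_SCANORDER}" ;;
            *) echo "CHECK ORACLEASM_SCANORDER=${ORACLEASM_SCANORDER:-}"; rc=1 ;;
        esac
        case " ${ORACLEASM_SCANEXCLUDE:-}" in
            *" sd"*) echo "OK ORACLEASM_SCANEXCLUDE=${ORACLEASM_SCANEXCLUDE}" ;;
            *) echo "CHECK ORACLEASM_SCANEXCLUDE=${ORACLEASM_SCANEXCLUDE:-}"; rc=1 ;;
        esac
        exit $rc
    )
}
```

A return code of 1 with a CHECK line means the setting does not contain the expected pattern. This applies only when multipathing is used.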

B3) Check if ASMLIB disks are listed under /dev/oracleasm/disks

=> devices under /dev/oracleasm/disks/* must be reported as dm devices on all nodes (not as single-path sd* devices). If not, please correct that! (see step B2)

$> ls -al /dev/oracleasm/disks

brw-rw---- 1 grid dba 253, 29 Feb 12 11:44 /dev/oracleasm/disks/DATA01P

brw-rw---- 1 grid dba 253, 35 Feb 12 11:44 /dev/oracleasm/disks/DATA02P

brw-rw---- 1 grid dba 253, 27 Feb 15 16:04 /dev/oracleasm/disks/DATA03P

brw-rw---- 1 grid dba 253, 24 Feb 12 11:44 /dev/oracleasm/disks/DATA04P

brw-rw---- 1 grid dba 253, 25 Feb 12 11:44 /dev/oracleasm/disks/DATA05P

=> If one of your ASMLIB disk(s) is missing from the above output, first try to re-scan devices, as root:

# /etc/init.d/oracleasm scandisks

=> If ASMLIB disk(s) is still missing from /dev/oracleasm/disks, engage your sysadmin to investigate this (see steps A1, A2, A3).
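
The dm check in B3 can be automated by comparing the major number of each ASMLIB device with the device-mapper major number. The sketch below assumes Linux, where /proc/devices lists the "device-mapper" block major (253 in the listing above). non_dm_disks is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: read "ls -l" output on stdin and print every block device whose
# major number differs from the device-mapper major given as argument.
# non_dm_disks is a hypothetical helper name.
# Usage: ls -l /dev/oracleasm/disks/* | non_dm_disks <dm_major>
non_dm_disks() {
    awk -v dm="$1" '
        # Block device lines start with "b"; field 5 is "<major>,".
        /^b/ {
            major = $5
            sub(/,$/, "", major)
            if (major + 0 != dm + 0) print $NF
        }
    '
}

# Example (Linux only, not run here):
#   dm_major=$(awk '$2 == "device-mapper" { print $1 }' /proc/devices)
#   ls -l /dev/oracleasm/disks/* | non_dm_disks "$dm_major"
```

Any device printed is not mapped to a dm device on that node and should be reviewed against the scan settings in step B2.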

B4) Check if ASMLIB disk(s) has the correct ASMLIB stamp and status:

$> kfed read <ASMLIB_device> |grep provstr

kfdhdb.driver.provstr: ORCLDISK<diskname> ; 0x000: length=20

$> kfed read <ASMLIB_device> | egrep 'kfbh.type|kfdhdb.dskname|kfdhdb.hdrsts'

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD

kfdhdb.dskname: DATA01P ; 0x028: length=14

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

=> If the output is "kfdhdb.driver.provstr: ORCLCLRD" (but kfdhdb.hdrsts= MEMBER and kfbh.type=KFBTYP_DISKHEAD) then your disk was deleted using "oracleasm deletedisk".

=> If kfbh.type = KFBTYP_INVALID -> see step A4) and check if "kfed repair" could fix the problem.
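
The rules in B4 can be applied mechanically to saved kfed output. The following is a minimal sketch for ASMLIB disks only. classify_kfed_header and the result labels it prints are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: read "kfed read" output on stdin and classify the header of an
# ASMLIB disk using the rules in step B4. classify_kfed_header and the labels
# it prints are hypothetical.
# Usage: kfed read <ASMLIB_device> | classify_kfed_header
classify_kfed_header() {
    awk '
        $1 == "kfbh.type:"             { type = $NF }
        $1 == "kfdhdb.hdrsts:"         { hdrsts = $NF }
        $1 == "kfdhdb.driver.provstr:" { provstr = $2 }
        END {
            member = (type == "KFBTYP_DISKHEAD" && hdrsts == "KFDHDR_MEMBER")
            if (type == "KFBTYP_INVALID")
                print "INVALID_HEADER"          # see step A4
            else if (member && provstr ~ /^ORCLCLRD/)
                print "ASMLIB_STAMP_CLEARED"    # oracleasm deletedisk was used
            else if (member && provstr ~ /^ORCLDISK/)
                print "OK"
            else
                print "REVIEW type=" type " hdrsts=" hdrsts " provstr=" provstr
        }
    '
}
```

Anything reported as REVIEW does not match one of the cases described above and should be examined manually.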

B5) Refer also to the documents below:

NOTE: 398622.1 ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted]

NOTE: 1384504.1 Mount ASM Disk Group Fails : ORA-15186, ORA-15025, ORA-15063

NOTE: 967461.1 "Multipath: error getting device" seen in OS log causes ASM/ASMlib to shutdown by itself

NOTE: 1526920.1 ORA-15186 ORA-15063 on node 2

SECTION C - Additional notes to review

If the above checks are done but the error still persists, please also review the notes below, depending on your configuration/situation:

NOTE: 577526.1 ORA-15063 ASM Discovered An Insufficient Number Of Disks For Diskgroup using NetApp Storage

NOTE: 784776.1 ORA-15063 When Mounting a Diskgroup After Storage Cloning ( BCV / Split Mirror / SRDF / HDS / Flash Copy )

NOTE: 555918.1 ORA-15038 On Diskgroup Mount After Node Eviction

NOTE: 1484723.1 ASM Candidate Raw Device Is Not Presented As A RAC Cluster Wide Shared character Devices On Unix.

NOTE: 1534211.1 ORA-15017 and ORA-15063 errors for unused diskgroups in 11.2

NOTE: 1487443.1 Mounting Diskgroup Fails With ORA-15063 and V$ASM_DISK Shows PROVISIONED

NOTE: 742832.1 AIX:After changing Multipathing drivers from RDAC to MPIO ASM discovered an insufficient number of disks

NOTE: 1276913.1 Unable to discover or use raw devices for ASM in HP-UX Itanium in 11.2.0.2 ( ORA-15063 )

SECTION D - Information to be collected when you are going to open a SR

If you are not able to fix the problem on your own, please collect the information below and raise a SR to Oracle Support.

D1) alert_+ASM*.log (from all nodes if RAC)

D2) script#1 from NOTE 470211.1 How To Gather/Backup ASM Metadata In A Formatted Manner version 10.1, 10.2, 11.1 & 11.2?

D3) KFED reports

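#!/bin/sh
# kfed.sh - dump the disk header and its backup copy for every ASM disk.
# -f keeps rm quiet when the output files do not exist yet.
rm -f /tmp/kfed_DH.out /tmp/kfed_BK.out

for i in `ls <your_path_to_asm_disks>`
do
    # Disk header
    echo "$i" >> /tmp/kfed_DH.out
    kfed read "$i" >> /tmp/kfed_DH.out
    # Backup copy of the disk header
    echo "$i" >> /tmp/kfed_BK.out
    kfed read "$i" aun=1 blkn=254 >> /tmp/kfed_BK.out
done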

Run kfed.sh as the GRID/ASM owner. Upload /tmp/kfed_DH.out and /tmp/kfed_BK.out.

! Pay attention to non-default AU size - if a non-default AU size is used then you must specify it. (see note 1485597.1 "ASM tools used by Support : KFOD, KFED, AMDU")
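
One way to avoid forgetting the AU size is to wrap the kfed call. This is a sketch only: kfed_read is a hypothetical wrapper, and the AUSZ and KFED environment variables are conventions invented for this example (KFED lets you substitute another command, for instance echo, for a dry run).

```shell
#!/bin/sh
# Sketch only: call "kfed read" and add ausz=<au_size> whenever the AUSZ
# environment variable is set. kfed_read, AUSZ and KFED are hypothetical
# names used for this example.
# Usage: kfed_read <device> [<extra kfed arguments> ...]
kfed_read() {
    dev="$1"
    shift
    if [ -n "${AUSZ:-}" ]; then
        "${KFED:-kfed}" read "$dev" "ausz=$AUSZ" "$@"
    else
        "${KFED:-kfed}" read "$dev" "$@"
    fi
}

# Example for a diskgroup created with a 4 MB AU (not run here):
#   AUSZ=4194304
#   kfed_read <path_to_your_device>
```

With AUSZ unset the wrapper behaves exactly like a plain "kfed read" call.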

D4) ASMLIB information

NOTE : 869526.1 Collecting The Required Information For Support To Troubleshot ASM/ASMLIB Issues.

D5) List of your ASM devices

$> ls -al <path_to_ASM_devices>

D6) OS logs (from all nodes if this is a RAC configuration)

SECTION E - Disk is reported as MISSING after a failed disk addition

If you are facing ORA-15063 after a failed disk addition, please collect the below information and raise a SR to Oracle Support

E1) alert_+ASM*.log (from all nodes if RAC)

E2) script#1 from NOTE 470211.1 How To Gather/Backup ASM Metadata In A Formatted Manner version 10.1, 10.2, 11.1 & 11.2?

E3) KFED reports

#!/bin/sh
# kfed.sh - dump the ASM metadata blocks listed below for every ASM disk.
# -f keeps rm quiet when no previous output files exist.
rm -f /tmp/kfed_*.out

for i in `ls <your_path_to_asm_disks>`
do
    # Disk header and its backup copy
    echo "$i" >> /tmp/kfed_DH.out
    kfed read "$i" >> /tmp/kfed_DH.out
    echo "$i" >> /tmp/kfed_BK.out
    kfed read "$i" aun=1 blkn=254 >> /tmp/kfed_BK.out
    # Further metadata blocks (output files PST, FS, FD, DD)
    echo "$i" >> /tmp/kfed_PST.out
    kfed read "$i" aun=1 blkn=2 >> /tmp/kfed_PST.out
    echo "$i" >> /tmp/kfed_FS.out
    kfed read "$i" blkn=1 >> /tmp/kfed_FS.out
    echo "$i" >> /tmp/kfed_FD.out
    kfed read "$i" aun=2 blkn=1 >> /tmp/kfed_FD.out
    echo "$i" >> /tmp/kfed_DD.out
    # More than one block might be needed if there is a large number of
    # disks -> this might be asked for later by Oracle Support.
    kfed read "$i" aun=2 blkn=0 >> /tmp/kfed_DD.out
done

Run kfed.sh as the GRID/ASM owner. Upload /tmp/kfed_*.out

! Pay attention to non-default AU size - if a non-default AU size is used then you must specify it. (see note 1485597.1 "ASM tools used by Support : KFOD, KFED, AMDU")

E4) AMDU output

amdu -diskstring '<ASM_DISKSTRING>' -dump '<DISKGROUP_NAME>' -noimage

amdu -diskstring '<ASM_DISKSTRING>' -print <DISKGROUP_NAME>.F2.V0.C2 > DG.amdu

#### F2.V0.C2 --> This will only extract information for up to 16 disks. If there is a large number of disks, a larger output is needed
