WeChat consulting: dbservice1234, 7 x 24 online support!

ORA-00600 [kfcema02] prevents the diskgroup from being mounted.

If you cannot recover the data by yourself, ask Parnassusdata, the professional ORACLE database recovery team, for help.

Parnassusdata Software Database Recovery Team

Service Hotline:  +86 13764045638 E-mail: service@parnassusdata.com

 
 
 
We encountered a storage failure yesterday.
 
After the storage was recovered, we got this disk error.
 
We found that RAC cannot be brought up because the diskgroup cannot be mounted.
 
SQL> ALTER DISKGROUP ALL MOUNT 
 
...
 
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
 
alert_+ASM3.log
===================
 
SQL> ALTER DISKGROUP ALL MOUNT 
NOTE: cache registered group ASMREDO1 number=1 incarn=0x0a4827c2
NOTE: cache began mount (not first) of group ASMREDO1 number=1 incarn=0x0a4827c2
NOTE: cache registered group DG_ORA number=2 incarn=0x873827c3
NOTE: cache began mount (first) of group DG_ORA number=2 incarn=0x873827c3
WARNING::ASMLIB library not found. See trace file for details.
NOTE: Assigning number (1,0) to disk (/dev/raw/raw10)
NOTE: Assigning number (2,6) to disk (/dev/raw/raw9)
NOTE: Assigning number (2,5) to disk (/dev/raw/raw8)
NOTE: Assigning number (2,4) to disk (/dev/raw/raw7)
NOTE: Assigning number (2,3) to disk (/dev/raw/raw6)
NOTE: Assigning number (2,2) to disk (/dev/raw/raw5)
NOTE: Assigning number (2,1) to disk (/dev/raw/raw4)
NOTE: Assigning number (2,0) to disk (/dev/raw/raw3)
kfdp_query(ASMREDO1): 3 
kfdp_queryBg(): 3 
NOTE: cache opening disk 0 of grp 1: ASMREDO1_0000 path:/dev/raw/raw10
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache mounting (not first) group 1/0x0A4827C2 (ASMREDO1)
kjbdomatt send to node 0
NOTE: attached to recovery domain 1
NOTE: LGWR attempting to mount thread 2 for diskgroup 1
NOTE: LGWR mounted thread 2 for disk group 1
NOTE: opening chunk 2 at fcn 0.146305 ABA 
NOTE: seq=6 blk=5782 
NOTE: cache mounting group 1/0x0A4827C2 (ASMREDO1) succeeded
NOTE: cache ending mount (success) of group ASMREDO1 number=1 incarn=0x0a4827c2
NOTE: start heartbeating (grp 2)
kfdp_query(DG_ORA): 5 
kfdp_queryBg(): 5 
NOTE: cache opening disk 0 of grp 2: DG_ORA_0000 path:/dev/raw/raw3
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DG_ORA_0001 path:/dev/raw/raw4
NOTE: cache opening disk 2 of grp 2: DG_ORA_0002 path:/dev/raw/raw5
NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6
NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7
NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8
NOTE: cache opening disk 6 of grp 2: DG_ORA_0006 path:/dev/raw/raw9
NOTE: cache mounting (first) group 2/0x873827C3 (DG_ORA)
* allocate domain 2, invalid = TRUE 
kjbdomatt send to node 0
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
Errors in file /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc (incident=5754):
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
Trace dumping is performing id=[cdmp_20120917220327]
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DG_ORA
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
ERROR: ALTER DISKGROUP ALL MOUNT
NOTE: cache dismounting group 2/0x873827C3 (DG_ORA) 
NOTE: lgwr not being msg'd to dismount
kjbdomdet send to node 0
detach from dom 2, sending detach message to node 0
 
 
 
Please provide the following:
 
-- AMDU output 
 
Placeholder for AMDU binaries and using with ASM 10g (Doc ID 553639.1)
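For reference, amdu is typically invoked along the following lines (a minimal sketch; the diskstring here is an assumption based on the raw devices used by this cluster, and the exact binary/options for your platform come from the note above):
 
amdu -diskstring '/dev/raw/raw*' -dump 'DG_ORA'
# amdu writes a directory named amdu_<timestamp>/ containing report.txt
# (the "DISK REPORT" sections quoted further below) plus the extracted metadata dumps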
 
-- Kfed read output of all the disks that are part of the diskgroup you are unable to mount.
 
- Let us use kfed to read the devices.
 
Building and using the kfed utility
------------------------------------------------
 
* For releases 10.2.0.X and up execute:
 
1) Change to the rdbms/lib directory:
 
% cd $ORACLE_HOME/rdbms/lib
 
2) Generate the executable:
 
10.2.0.XX:
 
% make -f ins_rdbms.mk ikfed
 
Using kfed:
 
Reading a file:
 
kfed read <device>
 
example:
 
% kfed read /dev/rdsk/emcpower10a
 
- Please run kfed read on each of the disks and provide the output.
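A minimal shell sketch for collecting all the headers in one file (assumptions: the candidate devices are /dev/raw/raw1 through /dev/raw/raw10 as listed below, and kfed built as described above is in the PATH):
 
for DSK in /dev/raw/raw{1,2,3,4,5,6,7,8,9,10}
do
    echo "===== $DSK ====="
    kfed read $DSK
done > /tmp/kfed_all_disks.txt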
 
/dev/raw/raw1: bound to major 8, minor 16
/dev/raw/raw2: bound to major 8, minor 32
/dev/raw/raw3: bound to major 8, minor 48
/dev/raw/raw4: bound to major 8, minor 64
/dev/raw/raw5: bound to major 8, minor 80
/dev/raw/raw6: bound to major 8, minor 96
/dev/raw/raw7: bound to major 8, minor 112
/dev/raw/raw8: bound to major 8, minor 128
/dev/raw/raw9: bound to major 8, minor 144
/dev/raw/raw10: bound to major 8, minor 160
 
<<< from the above disks -- do they all belong to the diskgroup?
 
 
 
kfed read /dev/raw/raw4
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483649 ; 0x008: TYPE=0x8 NUMB=0x1
kfbh.check: 2061250939 ; 0x00c: 0x7adc317b
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0001 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0001 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
 
 
kfed read /dev/raw/raw5
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483650 ; 0x008: TYPE=0x8 NUMB=0x2
kfbh.check: 2061327740 ; 0x00c: 0x7add5d7c
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 2 ; 0x024: 0x0002
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0002 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0002 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
 
kfed read /dev/raw/raw6
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483651 ; 0x008: TYPE=0x8 NUMB=0x3
kfbh.check: 2061320572 ; 0x00c: 0x7add417c
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 3 ; 0x024: 0x0003
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0003 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0003 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
 
kfed read /dev/raw/raw7
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483652 ; 0x008: TYPE=0x8 NUMB=0x4
kfbh.check: 2061327740 ; 0x00c: 0x7add5d7c
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 4 ; 0x024: 0x0004
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0004 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0004 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
 
kfed read /dev/raw/raw8
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483653 ; 0x008: TYPE=0x8 NUMB=0x5
kfbh.check: 2061320572 ; 0x00c: 0x7add417c
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 5 ; 0x024: 0x0005
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0005 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0005 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
 
kfed read /dev/raw/raw9
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483654 ; 0x008: TYPE=0x8 NUMB=0x6
kfbh.check: 2059439481 ; 0x00c: 0x7ac08d79
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 6 ; 0x024: 0x0006
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DG_ORA_0006 ; 0x028: length=11
kfdhdb.grpname: DG_ORA ; 0x048: length=6
kfdhdb.fgname: DG_ORA_0006 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
 
kfed read /dev/raw/raw10
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 4131885754 ; 0x00c: 0xf64792ba
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: ASMREDO1_0000 ; 0x028: length=13
kfdhdb.grpname: ASMREDO1 ; 0x048: length=8
kfdhdb.fgname: ASMREDO1_0000 ; 0x068: length=13
kfdhdb.capname: ; 0x088: length=0
 
 
I have the kfed read outputs only for the following devices:
 
 
/dev/raw/raw4
/dev/raw/raw5
/dev/raw/raw6
/dev/raw/raw7
/dev/raw/raw8
/dev/raw/raw9
/dev/raw/raw10
 
 
 
From the AMDU output, we see:
 
 
----------------------------- DISK REPORT N0009 ------------------------------
Disk Path: /dev/raw/raw2
Unique Disk ID: 
Disk Label: 
Physical Sector Size: 512 bytes
Disk Size: 2048 megabytes
** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb **
 
----------------------------- DISK REPORT N0010 ------------------------------
Disk Path: /dev/raw/raw1
Unique Disk ID: 
Disk Label: 
Physical Sector Size: 512 bytes
Disk Size: 2048 megabytes
** NOT A VALID ASM DISK HEADER. BAD VALUE IN FIELD blksize_kfdhdb **
 
 
Do the above 2 disks belong to the diskgroup that you are trying to mount?
 
 
We encountered a storage failure yesterday.
After we recovered from the disk error, we found that our RAC cannot be brought up because the diskgroup cannot be mounted.
The ORA-00600 error shows up in the alert log file.
 
kjbdomatt send to node 0
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
Errors in file /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc (incident=5754):
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
Trace dumping is performing id=[cdmp_20120917220327]
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DG_ORA
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
 
Could you please share the generated trace and incident files with us:
 
cd /opt/oracle/db/diag/asm/+asm/+ASM3/trace
grep '2012-09-22 22' *trc | awk -F: '{print $1}' | uniq
 
and the incident file named below:
 
/opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
 
Please let us know the exact process you used to recover from the disk error.
 
Research
==========
 
kfdp_query(ASMREDO1): 3 
----- Abridged Call Stack Trace -----
<-ksedsts()+315<-kfdp_query()+337<-kfdPstSyncPriv()+589<-kfgFinalizeMount()+1629<-kfgscFinalize()+1051<-kfgForEachKfgsc()+194<-kfgsoFinalize()+135<-kfgFinalize()+388<-kfxdrvMount()+3712<-kfxdrvEntry()+1707<-opiexe()+21338<-opiosq0()+6520<-kpooprx()+353<-kpoal8()+922
 
*** 2012-09-17 22:03:22.816
<-opiodr()+2554<-ttcpip()+1058<-opitsk()+1449<-opiino()+1026<-opiodr()+2554<-opidrv()+580<-sou2o()+90<-opimai_real()+145<-ssthrdmain()+177<-main()+215<-__libc_start_main()+244<-_start()+41----- End of Abridged Call Stack Trace -----
 
*** 2012-09-17 22:03:26.954
kfdp_query(DG_ORA): 5 
----- Abridged Call Stack Trace -----
<-ksedsts()+315<-kfdp_query()+337<-kfdPstSyncPriv()+589<-kfgFinalizeMount()+1629<-kfgscFinalize()+1051<-kfgForEachKfgsc()+194<-kfgsoFinalize()+135<-kfgFinalize()+388<-kfxdrvMount()+3712<-kfxdrvEntry()+1707<-opiexe()+21338<-opiosq0()+6520<-kpooprx()+353<-kpoal8()+922
<-opiodr()+2554<-ttcpip()+1058<-opitsk()+1449<-opiino()+1026<-opiodr()+2554<-opidrv()+580<-sou2o()+90<-opimai_real()+145<-ssthrdmain()+177<-main()+215<-__libc_start_main()+244<-_start()+41----- End of Abridged Call Stack Trace -----
2012-09-17 22:03:27.250989 : Start recovery for domain=2, valid=0, flags=0x4
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6272
WARNING:Oracle process running out of OS kernel I/O resources 
WARNING:Oracle process running out of OS kernel I/O resources 
Incident 5754 created, dump file: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
 
Abort recovery for domain 2, flags = 0x4
kjb_abort_recovery: abort recovery for domain 2 @ inc 4
kjb_abort_recovery: domain flags=0x0, valid=0
kfdp_dismount(): 6 
----- Abridged Call Stack Trace -----
 
File_name :: +ASM3_ora_13438.trc
 
 
Could you please share the already requested incident file with us:
 
/opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
 
 
I am looking for the below file; please share it:
 
/opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
 
 
Research
==========
 
 
NOTE: start heartbeating (grp 2)
kfdp_query(DG_ORA): 5 
kfdp_queryBg(): 5 
NOTE: cache opening disk 0 of grp 2: DG_ORA_0000 path:/dev/raw/raw3
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DG_ORA_0001 path:/dev/raw/raw4
NOTE: cache opening disk 2 of grp 2: DG_ORA_0002 path:/dev/raw/raw5
NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6
NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7
NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8
NOTE: cache opening disk 6 of grp 2: DG_ORA_0006 path:/dev/raw/raw9
NOTE: cache mounting (first) group 2/0x873827C3 (DG_ORA)
* allocate domain 2, invalid = TRUE 
kjbdomatt send to node 0
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
Errors in file /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc  (incident=5754):
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
Incident details in: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
Trace dumping is performing id=[cdmp_20120917220327]
Abort recovery for domain 2
NOTE: crash recovery signalled OER-600
ERROR: ORA-600 signalled during mount of diskgroup DG_ORA
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
ERROR: ALTER DISKGROUP ALL MOUNT
 
kfdp_query(ASMREDO1): 3 
----- Abridged Call Stack Trace -----
<-ksedsts()+315<-kfdp_query()+337<-kfdPstSyncPriv()+589<-kfgFinalizeMount()+1629<-kfgscFinalize()+1051<-kfgForEachKfgsc()+194<-kfgsoFinalize()+135<-kfgFinalize()+388<-kfxdrvMount()+3712<-kfxdrvEntry()+1707<-opiexe()+21338<-opiosq0()+6520<-kpooprx()+353<-kpoal8()+922
 
*** 2012-09-17 22:03:22.816
<-opiodr()+2554<-ttcpip()+1058<-opitsk()+1449<-opiino()+1026<-opiodr()+2554<-opidrv()+580<-sou2o()+90<-opimai_real()+145<-ssthrdmain()+177<-main()+215<-__libc_start_main()+244<-_start()+41----- End of Abridged Call Stack Trace -----
 
*** 2012-09-17 22:03:26.954
kfdp_query(DG_ORA): 5 
----- Abridged Call Stack Trace -----
<-ksedsts()+315<-kfdp_query()+337<-kfdPstSyncPriv()+589<-kfgFinalizeMount()+1629<-kfgscFinalize()+1051<-kfgForEachKfgsc()+194<-kfgsoFinalize()+135<-kfgFinalize()+388<-kfxdrvMount()+3712<-kfxdrvEntry()+1707<-opiexe()+21338<-opiosq0()+6520<-kpooprx()+353<-kpoal8()+922
<-opiodr()+2554<-ttcpip()+1058<-opitsk()+1449<-opiino()+1026<-opiodr()+2554<-opidrv()+580<-sou2o()+90<-opimai_real()+145<-ssthrdmain()+177<-main()+215<-__libc_start_main()+244<-_start()+41----- End of Abridged Call Stack Trace -----
2012-09-17 22:03:27.250989 : Start recovery for domain=2, valid=0, flags=0x4
NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6272
WARNING:Oracle process running out of OS kernel I/O resources 
WARNING:Oracle process running out of OS kernel I/O resources 
Incident 5754 created, dump file: /opt/oracle/db/diag/asm/+asm/+ASM3/incident/incdir_5754/+ASM3_ora_13438_i5754.trc
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
 
Abort recovery for domain 2, flags = 0x4
kjb_abort_recovery: abort recovery for domain 2  inc 4
kjb_abort_recovery: domain flags=0x0, valid=0
kfdp_dismount(): 6 
----- Abridged Call Stack Trace -----
 
File_name :: +ASM3_ora_13438.trc
 
 
========= Dump for incident 5754 (ORA 600 [kfcema02]) ========
----- Beginning of Customized Incident Dump(s) -----
CE: (0x0x617be648)  group=2 (DG_ORA) obj=4 (disk)  blk=2115
   hashFlags=0x0000  lid=0x0002  lruFlags=0x0000  bastCount=1
   flags_kfcpba=0x49 copies=1 blockIndex=67 AUindex=0 AUcount=1
   copy #0:  disk=4  au=910336
BH: (0x0x6178e798)  bnum=10 type=ALLOCTBL state=rcv chgSt=not modifying
   flags=0x00000000  pinmode=excl  lockmode=null  bf=0x0x61409000
   kfbh_kfcbh.fcn_kfbh = 0.165046340  lowAba=0.0  highAba=0.0
   last kfcbInitSlot return code=null cpkt lnk is null ralFlags=0x00000000
-------------------------------------------------------------------------------
 
 
 
 
----- Invocation Context Dump -----
Address: 0x2b5faa6e8498
Phase: 3
flags: 0x18E0001
Incident ID: 5754
Error Descriptor: ORA-600 [kfcema02] [0] [165057275] [] [] [] [] []
Error class: 0
Problem Key # of args: 1
Number of actions: 8
----- Incident Context Dump -----
Address: 0x7fff6c8a42b8
Incident ID: 5754
Problem Key: ORA 600 [kfcema02]
Error: ORA-600 [kfcema02] [0] [165057275] [] [] [] [] []
[00]: dbgexExplicitEndInc [diag_dde]
[01]: dbgeEndDDEInvocationImpl [diag_dde]
[02]: dbgeEndDDEInvocation [diag_dde]
[03]: kfcema [ASM]<-- Signaling
[04]: kfrPass2 [ASM]
[05]: kfrcrv [ASM]
[06]: kfcMountPriv [ASM]
[07]: kfcMount [ASM]
[08]: kfgInitCache [ASM]
[09]: kfgFinalizeMount [ASM]
[10]: kfgscFinalize [ASM]
[11]: kfgForEachKfgsc [ASM]
[12]: kfgsoFinalize [ASM]
[13]: kfgFinalize [ASM]
[14]: kfxdrvMount [ASM]
[15]: kfxdrvEntry [ASM]
[16]: opiexe []
[17]: opiosq0 []
[18]: kpooprx []
[19]: kpoal8 []
[20]: opiodr []
[21]: ttcpip []
[22]: opitsk []
[23]: opiino []
[24]: opiodr []
[25]: opidrv []
[26]: sou2o []
[27]: opimai_real []
[28]: ssthrdmain []
[29]: main []
[30]: __libc_start_main []
[31]: _start []
MD [00]: 'SID'='115.3' (0x3)
MD [01]: 'ProcId'='19.1' (0x3)
MD [02]: 'PQ'='(50331648, 1347894201)' (0x7)
MD [03]: 'Client ProcId'='oraclemos5200db3 (TNS V1-V3).13438_47689880133216' (0x0)
Impact 0: 
Impact 1: 
Impact 2: 
Impact 3: 
Derived Impact: 
 
File_name :: +ASM3_ora_13438_i5754.trc
 
1. Execute 
kfed read /dev/raw/raw7 aunum=910336 blknum=2115 text=/tmp/kfed_raw7_910336_2115.txt 
kfed read /dev/raw/raw7  text=/tmp/kfed_raw7.txt 
2. Get the 'File 1 Block 1' location for the diskgroup as follows (a combined sketch of steps 2-4 and 6 is given after this list): 
a. for each disk in the diskgroup execute: 
kfed read <DSK> | grep f1b1 
3. You may get a non-zero value for 'kfdhdb.f1b1locn' like: 
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002 
4. For that disk execute (replace <AUNUM> with the value from the previous step): 
kfed read <DSK> aunum=<AUNUM> text=kfed_<DSK>_<AUNUM>.txt 
kfed read <DSK> text=kfed_<DSK>_w_f1b1.txt 
5. Set the below event in the ASM pfile and try to mount the diskgroup DG_ORA manually at the ASM level.
    Reproduce the problem setting on the instance: 
 
event = "15199 trace name context forever, level 0x8007" 
 
Then start the ASM instance using that pfile:
 
startup nomount pfile=<pfile name>;
 
Then try to mount each diskgroup manually one-by-one including DG_ORA,
 
SQL> alter diskgroup <diskgroup_name> mount; 
 
and collect the traces from bdump/udump. The event will dump the redo until 
we get the error. 
6. For each disk in the diskgroup take a backup of the first 50 MB (replace <disk_name>; the bs/count below restrict dd to the first 50 MB): 
dd if=<disk_name> of=/tmp/<disk_name>.dd bs=1048576 count=50 
 
 
later compress those files and upload them to the bug. 
7. At the end please upload: 
a. Complete alert for all the ASM instances 
b. traces produced when event was set 
c. metadata dumps (files /tmp/kfed*) 
d. OS logs (/var/adm/messages*) from each node, covering the latest timestamps of those mount attempts.
e. dd dumps 
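As referenced in step 2, here is a minimal combined sketch of steps 2-4 and 6 (assumptions: the DG_ORA members are /dev/raw/raw3 through /dev/raw/raw9 as shown in the alert log, kfed is in the PATH, and /tmp has enough free space for the 50 MB dumps):
 
for DSK in /dev/raw/raw3 /dev/raw/raw4 /dev/raw/raw5 /dev/raw/raw6 \
           /dev/raw/raw7 /dev/raw/raw8 /dev/raw/raw9
do
    NAME=$(basename $DSK)                                   # e.g. raw3
    # steps 2/3: look for a non-zero File 1 Block 1 location in the disk header
    F1B1=$(kfed read $DSK | grep f1b1locn | awk '{print $2}')
    echo "$DSK kfdhdb.f1b1locn=$F1B1"
    if [ -n "$F1B1" ] && [ "$F1B1" != "0" ]; then
        # step 4: dump the header and the AU holding File 1 Block 1
        kfed read $DSK aunum=$F1B1 text=/tmp/kfed_${NAME}_${F1B1}.txt
        kfed read $DSK             text=/tmp/kfed_${NAME}_w_f1b1.txt
    fi
    # step 6: back up the first 50 MB of every member disk
    dd if=$DSK of=/tmp/${NAME}.dd bs=1048576 count=50
done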
 
 
----------------------------- DISK REPORT N0008 ------------------------------
Disk Path: /dev/raw/raw3
Unique Disk ID: 
Disk Label: 
Physical Sector Size: 512 bytes
Disk Size: 1047552 megabytes
Group Name: DG_ORA
Disk Name: DG_ORA_0000
Failure Group Name: DG_ORA_0000
Disk Number: 0
Header Status: 3
Disk Creation Time: 2012/03/01 15:31:59.955000
Last Mount Time: 2012/04/07 15:40:22.454000
Compatibility Version: 0x0a100000(10010000)
Disk Sector Size: 512 bytes
Disk size in AUs: 1047552 AUs
Group Redundancy: 1
Metadata Block Size: 4096 bytes
AU Size: 1048576 bytes
Stride: 113792 AUs
Group Creation Time: 2012/03/01 15:31:59.829000
File 1 Block 1 location: AU 2 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 
File_name :: report.txt
 
It is available for diskgroup DG_ORA on disk /dev/raw/raw3.
 
Please execute that command on the same device.
 
 
Yes, we need it only for disk raw3, since this mount issue is related to the DG_ORA diskgroup and there is at least one such location per diskgroup; the two values you see come from two different diskgroups.
Hence, for this disk run steps 2-4 of the action plan first, then the rest.
 
2. Get the 'File 1 Block 1' location for the diskgroup as follows: 
a. for each disk in the diskgroup execute: 
kfed read /dev/raw/raw3 | grep f1b1 
3. You may get a non-zero value for 'kfdhdb.f1b1locn' like: 
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002 
4. For that disk execute (replace <AUNUM> with the value from the previous step): 
kfed read /dev/raw/raw3 aunum=2 text=kfed_raw3_2.txt 
kfed read /dev/raw/raw3 text=kfed_raw3_w_f1b1.txt 
 
 
It seems the file asmlog.part01.rar is broken; could you please share it with us again.
 
1. Clarification of current patch level status:
 
=> from uploaded OPatch lsinventory output:
 
 Oracle Home       : /opt/oracle/db/product/11g/db_1
 Installed Top-level Products (2):
 
 Oracle Database 11g                                                  11.1.0.6.0
 Oracle Database 11g Patch Set 1                                      11.1.0.7.0
 
 Interim patches (3) :
 
 Patch  9549042      : applied on Thu Mar 01 09:25:24 WIT 2012
 Patch  7272646      : applied on Thu Mar 01 08:48:05 WIT 2012
 Patch  12419384     : applied on Thu Mar 01 08:42:47 WIT 2012
 => DATABASE PSU 11.1.0.7.8 (INCLUDES CPUJUL2011) 
 
=> Note:
 
  1) The previously mentioned bug 6163771 is already fixed in patchset 11.1.0.7
  @@ PATCHSET REQUEST #70719 CREATED IN BUG 6712856 FOR FIX IN 11.1.0.7.0
  2) I cannot find any information in this SR on whether ASM and DB share the same Oracle Home. This needs to be clarified.
 
 
2. From +ASM3_ora_13438_i5754.trc:
 
 *** ACTION NAME:() 2012-09-17 22:03:27.432
 
 Dump continued from file: /opt/oracle/db/diag/asm/+asm/+ASM3/trace/+ASM3_ora_13438.trc
 ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
 
 ========= Dump for incident 5754 (ORA 600 [kfcema02]) ========
 ----- Beginning of Customized Incident Dump(s) -----
 CE: (0x0x617be648)  group=2 (DG_ORA) obj=4 (disk)  blk=2115
   hashFlags=0x0000  lid=0x0002  lruFlags=0x0000  bastCount=1
   flags_kfcpba=0x49 copies=1 blockIndex=67 AUindex=0 AUcount=1
   copy #0:  disk=4  au=910336
 BH: (0x0x6178e798)  bnum=10 type=ALLOCTBL state=rcv chgSt=not modifying
   flags=0x00000000  pinmode=excl  lockmode=null  bf=0x0x61409000
   kfbh_kfcbh.fcn_kfbh = 0.165046340  lowAba=0.0  highAba=0.0
   last kfcbInitSlot return code=null cpkt lnk is null ralFlags=0x00000000
  ...
 
 *** 2012-09-17 22:03:27.553
 ----- Current SQL Statement for this session (sql_id=2pa6sbf4762ga) -----
 ALTER DISKGROUP ALL MOUNT
 
 ----- Call Stack Trace -----
 
Function List:
    skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
       <- PGOSF52_ksfdmp <- dbgexPhaseII <- dbgexExplicitEndInc <- dbgeEndDDEInvocationImpl
        <- dbgeEndDDEInvocation <- kfcema <- kfrPass2 <- kfrcrv <- kfcMountPriv
         <- kfcMount <- kfgInitCache <- kfgFinalizeMount <- 2241 <- kfgscFinalize
          <- kfgForEachKfgsc <- kfgsoFinalize <- kfgFinalize <- kfxdrvMount <- kfxdrvEntry
           <- opiexe <- opiosq0 <- kpooprx <- kpoal8 <- opiodr
            <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv
             <- sou2o <- opimai_real <- ssthrdmain <- main <- __libc_start_main
 
@@ Bug 13407102: ORA-600 [KFCEMA02] AND ORA-600 [KFCMOUNT15] HAPPENED ON ASM INSTANCE
 
1. The prior engineer sent you an action plan with the goal of patching the diskgroup.
Could we ask you for the results of this action plan? Were you able to mount the diskgroup after following it?
+++ For Bug 13407102: ORA-600 [KFCEMA02] AND ORA-600 [KFCMOUNT15] HAPPENED ON ASM INSTANCE, there is no patch at all; it also happens on 11gR2.
 
2. If the problem still remains, verify that the affected diskgroup is dismounted on all nodes/ASM instances.
After that please try to mount it on ASM instance 1 only (manually in SQLPLUS).
 
What is the result ? Do you still get the same ORA-600 error as before ?
Please re-upload the most current alertfile from ASM instance 1 together with the tracefiles which will be written.
++ no patch can be applied, do you still want us to do this?
 
3. If the internal error still remains and the patching of the diskgroup failed, then we have to rebuild the diskgroup.
Do you have a full backup of the data within the affected diskgroup ? Please clarify.
+++ sorry, no backup
 
 
Unfortunately this is a misunderstanding.
The action plan which was provided to you by the prior engineer was to patch the bad blocks in the affected disks,
not to apply any software patch. Even if we could apply a patch, it would possibly only avoid new
occurrences - but probably it would not repair the current situation (if the diskgroup is corrupted).
 
Please note that in case we cannot repair the diskgroup, you will have to rebuild the diskgroup and then
restore and recover the lost data. Accordingly, you should have at least some kind of worst-case backup?
 
 
Your issue was transferred to me. My name is Pallavi and I will be helping you with your issue. I am currently reviewing/researching the situation and will update the SR / call you as soon as I have additional information. Thank you for your patience.
 
 
We can try to patch the diskgroup. If this doesn't work, you will have to recreate the diskgroup and restore data from a valid backup.
 
!!!!!! VERY IMPORTANT: Be sure you have a valid backup of data pertaining to ora_data diskgroup. !!!!!!
 
-----------------------------------------------------------------
You need to create kfed and amdu for further use.
 
1) kfed is a tool that allows reading and writing of ASM metadata. To create kfed, connect as the owner of the Oracle software and execute:
 
$cd $ORACLE_ASMHOME/rdbms/lib
$make -f ins_rdbms.mk ikfed
 
2) AMDU was released with 11g and is a tool used to get the location of the ASM metadata across the disks.
Like many other tools released with 11g, it can be used in 10g environments. Note 553639.1 is the placeholder for the different platforms; the note also includes configuration instructions.
 
 
 
* Transfer amdu and facp to a working directory and include it in LD_LIBRARY_PATH, PATH and other relevant variables.
 
 
There is no guarantee that the patching would work. It all depends on the status of the disk that we are trying to patch. We will only know what the status is when we try. 
 
 
As the ASM software owner, execute facp:
 
$ ./facp 'diskstring' 'DISKGROUP NAME' ALL
 
eg:
 
$./facp '/dev/vg00/rraw*' 'DATAHP' ALL
 
Run this only ONCE -- and then please update the SR with all the files it has generated.
 
 
Did you execute the facp command as requested? If not, please do so and share the related generated files with us.
 
As the ASM software owner, execute facp:
 
$ ./facp 'diskstring' '<DISKGROUP NAME>' ALL
 
$ ./facp '/dev/raw/raw*' 'DG_ORA' ALL
 
Then share the related files named as below:
 
 
facp_report
facp_dump_1
facp_dump_2
facp_dump_3
facp_restore
facp_patch_1 (one per node that uses the dg)
facp_adjust
facp_check
facp_patch
 
 
Note:: Run this only ONCE 
 
We are waiting for the same.
 
 
Execute the below commands and share the generated logfile with us:
 
script /tmp/facp.log
 
# Run the following to lower all checkpoints by 10 blocks:
 
$ ./facp_adjust -10 
 
# Then run facp_check. 
 
$ ./facp_check 
 
exit
 
Share the file named as /tmp/facp.log
 
 
 
Try to adjust with some value lower than 10 using the below command:
 
./facp_adjust -<integer>
 
Then validate:
 
$ ./facp_check 
 
If facp_check reports "Valid Checkpoint" for all threads, that is the indication
to proceed with the real patching, which means updating the ACD records
on the disks with the records from the facp_patch_* files.
 
 
 
To continue with this step, facp_check should have returned "Valid Checkpoint" for all threads.
 
Then execute the below command to patch the ACD:
 
./facp_patch
 
Then try to mount this diskgroup manually:
 
SQL> alter diskgroup dg_ora mount;
 
If the mount fails again with the same error, go back to the facp_adjust step with a new argument for facp_adjust and continue until the diskgroup is mounted (a sketch of this loop follows).
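A minimal sketch of that adjust/check/patch/mount loop (assumptions: the facp_* scripts generated earlier are in the current directory, the environment already points at the +ASM1 instance, and the list of offsets is purely illustrative):
 
for OFFSET in -10 -11 -12 -13 -14 -15
do
    ./facp_adjust $OFFSET
    if ./facp_check | grep -q "DO NOT PATCH"; then
        echo "offset $OFFSET rejected by facp_check, trying the next one"
        continue
    fi
    ./facp_patch        # write the adjusted checkpoints to the ACD
    # attempt the mount; stop as soon as it succeeds
    if echo "alter diskgroup DG_ORA mount;" | sqlplus -S "/ as sysdba" | grep -qi "altered"
    then
        echo "DG_ORA mounted after facp_adjust $OFFSET"
        break
    fi
done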
 
 
Instruction: Following note: How to fix error ORA-600 [KFCEMA02] (Doc ID 728884.1). As per that note, Ct tried to patch the ACD (facp_adjust -9; facp_check reported "Valid Checkpoint" for all three threads; facp_patch completed), but the diskgroup still fails to mount with ORA-00600 [kfcema02], [0], [165054516]. The full session transcript is reproduced further below. It seems Ct needs to recreate this diskgroup and restore data from backup; if they do not have a backup, we need to log a bug to involve the development team further.
Activity Instruction
Created: 18-Sep-2012 03:28:29 PM GMT+00:00 Instruction Type: Severity 1 : End of Shift Note
Instruction: Currently we are trying to find out if the diskgroup can be patched/repaired. Aritra Kundu has sent out an action plan for this. We are still waiting for the related customer feedback. If the diskgroup cannot be repaired, we will have to rebuild it.
Activity Instruction
Created: 18-Sep-2012 08:08:23 AM GMT+00:00 Instruction Type: Severity 1 : End of Shift Note
Instruction: It seems Ct is on PSU 8 and the related known defect is already RFIed into 11.1.0.7 (BUG 6712856 - RFI BACKPORT OF BUG 6163771 FOR INCLUSION IN 11.1.0.7.0). Waiting for Ct to share the requested information; after that we need to raise a defect with the development team and page BDE immediately to involve them.
 
 
 
2. TECHNICAL & BUSINESS IMPACT
 
Probably diskgroup corruption.
If we try to mount the affected diskgroup then we fail during the dg recovery with an internal error:
ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
 
Currently we are trying to find out if the diskgroup can be patched/repaired.
Aritra Kundu has sent out an action plan for this. We are still waiting for the related customer feedback.
 
As per the above note, Ct tried to patch the ACD, but is still not able to mount the diskgroup:
 
 
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_adjust -9
3 patch files written
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_check
--- Executing amdu to validate checkpoint target blocks ---
Thread 1 (348,1533): Valid Checkpoint
Thread 2 (189,5018): Valid Checkpoint
Thread 3 (182,5371): Valid Checkpoint
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_patch
--- Executing amdu to check for heartbeat ---
Patching Thread 1
kfracdc.ckpt.seq: 348
kfracdc.ckpt.blk: 1533
Patching Thread 2
kfracdc.ckpt.seq: 189
kfracdc.ckpt.blk: 5018
Patching Thread 3
kfracdc.ckpt.seq: 182
kfracdc.ckpt.blk: 5371
Save files ./facp_* to document what was patched
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> export ORACLE_SID=+ASM1
 
 
Refer to the SQL*Plus User's Guide and Reference for more information.
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> sqlplus / as sysdba
 
SQL*Plus: Release 11.1.0.7.0 - Production on Wed Sep 19 15:12:37 2012
 
Copyright (c) 1982, 2008, Oracle. All rights reserved.
 
Connected to an idle instance.
 
SQL> startup nomount;
ASM instance started
 
Total System Global Area 283930624 bytes
Fixed Size 2158992 bytes
Variable Size 256605808 bytes
ASM Cache 25165824 bytes
SQL> alter diskgroup DG_ORA mount;
alter diskgroup DG_ORA mount
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfcema02], [0], [165054516], [],
[], [], [], [], [], [], [], []
 
 
SQL> host
oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> exit
 
 
Seems Ct needs to recreate this diskgroup and restore data from backup.
 
 
If they do not have a backup, then we need to log a bug to involve the development team further.
 
Again, the latest uploaded file 'facplog' is not readable on our side - it cannot be de-compressed.
Please make sure that the uploaded compressed files are readable / can be de-compressed,
before uploading them. That way we can all save time.
 
To get the current, correct status of the patching action, please upload the following information:
 
1. Re-upload the file 'facplog'.
2. Upload the most current ASM alertfile from all instances.
3. Upload the tracefile that was written during the most recent ORA-600 [kfcema02].
 
1. from asm alertfile (inst.1):
 
=> latest occurrence:
 
 Wed Sep 19 15:33:40 2012
 SQL> alter diskgroup DG_ORA mount 
 ...
 NOTE: cache opening disk 0 of grp 2: DG_ORA_0000 path:/dev/raw/raw3
 NOTE: F1X0 found on disk 0 fcn 0.0
 NOTE: cache opening disk 1 of grp 2: DG_ORA_0001 path:/dev/raw/raw4
 NOTE: cache opening disk 2 of grp 2: DG_ORA_0002 path:/dev/raw/raw5
 NOTE: cache opening disk 3 of grp 2: DG_ORA_0003 path:/dev/raw/raw6
 NOTE: cache opening disk 4 of grp 2: DG_ORA_0004 path:/dev/raw/raw7
 NOTE: cache opening disk 5 of grp 2: DG_ORA_0005 path:/dev/raw/raw8
 NOTE: cache opening disk 6 of grp 2: DG_ORA_0006 path:/dev/raw/raw9
 NOTE: cache mounting (first) group 2/0x95BC2DFD (DG_ORA)
 ...
 Wed Sep 19 15:33:45 2012
 NOTE: attached to recovery domain 2
 NOTE: starting recovery of thread=1 ckpt=348.1542 group=2
 NOTE:  starting recovery for thread 1 at 
 NOTE: seq=348 blk=1542 
 NOTE: starting recovery of thread=2 ckpt=189.5027 group=2
 NOTE:  starting recovery for thread 2 at 
 NOTE: seq=189 blk=5027 
 NOTE: starting recovery of thread=3 ckpt=182.5380 group=2
 NOTE:  starting recovery for thread 3 at 
 NOTE: seq=182 blk=5380 
 Errors in file /opt/oracle/db/diag/asm/+asm/+ASM1/trace/+ASM1_ora_2519.trc  (incident=9775):
 ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
 Abort recovery for domain 2
 NOTE: crash recovery signalled OER-600
 ERROR: ORA-600 signalled during mount of diskgroup DG_ORA
 ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [], [], [], [], [], [], [], [], []
 ERROR: alter diskgroup DG_ORA mount
 ...
 
2. from +ASM1_ora_2519.trc:
 
 2012-09-19 15:25:25.156129 : Start recovery for domain=2, valid=0, flags=0x4
 NOTE: starting recovery of thread=1 ckpt=348.1537 group=2
 NOTE: starting recovery of thread=2 ckpt=189.5022 group=2
 NOTE: starting recovery of thread=3 ckpt=182.5375 group=2
 ...
 *** 2012-09-19 15:25:25.172
 kfrHtAdd: obj=0x1 blk=0x6e6 op=133 fcn:0.165051322  -> 0.165051323
 kfrHtAdd:  bcd:  obj=1  blk=1766  from:0.165051322  to:0.165051323
 ...
 => recovery is running...
 ...
 
 *** 2012-09-19 15:25:25.206
 kfrHtAdd: obj=0x6e9 blk=0x80000000 op=161 fcn:0.165057973  -> 0.165057974
 
 *** 2012-09-19 15:25:25.206
 kfrHtAdd: obj=0x1 blk=0x6e9 op=133 fcn:0.165057974  -> 0.165057975
 
 *** 2012-09-19 15:25:25.206
 kfrHtAdd: obj=0x80000006 blk=0x60f op=65 fcn:0.165057967  -> 0.165057975
 
 *** 2012-09-19 15:25:25.206
 kfrHtAdd: obj=0x6e9 blk=0x80000000 op=161 fcn:0.165057974  -> 0.165057975
 WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
 WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6400
 WARNING:Oracle process running out of OS kernel I/O resources 
 WARNING:Oracle process running out of OS kernel I/O resources 
 
 *** 2012-09-19 15:25:25.212
 kfrRcvSetRem: obj=0x1 blk=0x6e7 [set] = 284
 
 block needed no recovery:
 CE: (0x0x617be2b0)  group=2 (DG_ORA) obj=1  blk=1767
     hashFlags=0x0000  lid=0x0002  lruFlags=0x0000  bastCount=1
     flags_kfcpba=0x18 copies=1 blockIndex=231 AUindex=0 AUcount=0
     copy #0:  disk=3  au=762686
 BH: (0x0x6178e360)  bnum=5 type=FILEDIR state=rcv chgSt=not modifying
     flags=0x00000000  pinmode=excl  lockmode=null  bf=0x0x61404000
     kfbh_kfcbh.fcn_kfbh = 0.165054713  lowAba=0.0  highAba=0.0
     last kfcbInitSlot return code=null cpkt lnk is null ralFlags=0x00000000
 ...
 
=> from here it seems that the recovery was interrupted due to an I/O kernel limitation:
 
WARNING:io_submit failed due to kernel limitations MAXAIO for process=128 pending aio=128
 WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=6400
 WARNING:Oracle process running out of OS kernel I/O resources 
 WARNING:Oracle process running out of OS kernel I/O resources
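To see where those limits stand on the node, the kernel counters referenced in the warnings can be inspected like this (a minimal sketch; assumption: Linux, where they are exposed under /proc/sys/fs; this only inspects the values, it changes nothing):
 
cat /proc/sys/fs/aio-max-nr     # system-wide aio limit (65536 in the warnings above)
cat /proc/sys/fs/aio-nr         # aio requests currently allocated
# the system-wide limit could be raised with sysctl -w fs.aio-max-nr=<value>,
# but the warnings above point at the per-process limit (MAXAIO ... process=128),
# which is what the patches discussed later in this follow-up are meant to address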
 
=== Follow up ===
 
3. From the block patching actions:
 
=> regarding to the patched blocks we are here:
 
 SQL> host
 oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_adjust -0
 3 patch files written
 
 oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_check
 --- Executing amdu to validate checkpoint target blocks ---
 Thread 1 (348,1542): WRONG SEQ NUMBER
 Thread 2 (189,5027): WRONG SEQ NUMBER
 Thread 3 (182,5380): Valid Checkpoint
 DO NOT PATCH WITH THE CURRENT PATCH FILES
 oracle@mos5200db1:/opt/oracle/db/product/11g/db_1/bin> ./facp_patch
 --- Executing amdu to check for heartbeat ---
 Patching Thread 1
 kfracdc.ckpt.seq: 348
 kfracdc.ckpt.blk: 1542
 Patching Thread 2
 kfracdc.ckpt.seq: 189
 kfracdc.ckpt.blk: 5027
 Patching Thread 3
 kfracdc.ckpt.seq: 182
 kfracdc.ckpt.blk: 5380
 Save files ./facp_* to document what was patched
 
=> not sure why the './facp_adjust' command was used with zero (-0)?
 
 SQL> alter diskgroup DG_ORA mount;
 alter diskgroup DG_ORA mount
 *
 ERROR at line 1:
 ORA-00600: internal error code, arguments: [kfcema02], [0], [165057275], [],[], [], [], [], [], [], [], []
 
 
4. Patch level status of the instance:
 
Oracle Database 11g Patch Set 1 11.1.0.7.0
There are 2 products installed in this Oracle Home.
 
Interim patches (3) :
 
Patch 9549042 : applied on Thu Mar 01 09:25:24 WIT 2012
Patch 7272646 : applied on Thu Mar 01 08:48:05 WIT 2012
Patch 12419384 : applied on Thu Mar 01 08:42:47 WIT 2012
=> PSU 11.1.0.7.8
 
=> so we are on 11.1.0.7.8 here
 
 
please see our latest analysis below.
 
Currently I can see two problems when we are trying to mount the corrupted diskgroup.
We get the known ORA-600 but also an error about an I/O kernel limitation during the block recovery.
 
I would like to avoid the I/O kernel limitation error during the recovery. Maybe after that the recovery can
complete and resolve the situation, instead of patching blocks manually.
 
We know of the following bugs in connection with I/O kernel limitation errors (from Note 868590.1):
 
"...
For 11gR1
The fix for unpublished Bug 6687381 is included in patch set 11.1.0.7
The fix for Bug 7523755 is available as overlay patch on Patch Set Update 11.1.0.7.10 ,
 
... apply patch set 11.1.0.7 and Patch 13343461 on top of that., then Apply fix for Bug 7523755...
"
 
Accordingly, I would suggest following the next actions now:
Since you are currently on 11.1.0.7.8, you would need to first apply PSU 11.1.0.7.10 in all Oracle Homes (ASM & DB).
Afterwards apply the fix for Bug 7523755.
 
Finally, after the patches are applied, restart the instance and try to mount the diskgroup again.
Verify if the block recovery can be completed now or if we are still failing with the same ORA-600.
At least the I/O errors should not be reported anymore now.
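For reference, a minimal sketch of that patching sequence with OPatch (assumptions: <PSU_11.1.0.7.10_ZIP> and <7523755_ZIP> stand for the downloaded patch archives and <patch_dir> for the directory unzipped from the PSU; the patch READMEs remain the authoritative steps, and all instances using the home must be stopped first):
 
export ORACLE_HOME=/opt/oracle/db/product/11g/db_1
export PATH=$ORACLE_HOME/OPatch:$PATH
 
opatch lsinventory                          # confirm the current level (11.1.0.7.8 here)
 
unzip <PSU_11.1.0.7.10_ZIP> -d /tmp/psu && cd /tmp/psu/<patch_dir>
opatch apply                                # apply PSU 11.1.0.7.10
 
unzip <7523755_ZIP> -d /tmp/p7523755 && cd /tmp/p7523755/7523755
opatch apply                                # apply the overlay fix for Bug 7523755
 
# restart ASM and retry:  SQL> alter diskgroup DG_ORA mount;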
 
 