Wednesday 6 January 2016

ORA-15077 Has a disk HB but no network HB

A few days ago I faced an issue in RAC. Below is the full explanation of the issue, its detection, and the resolution.

---------------
Problem:
---------------

While starting Grid Infrastructure we ran into a problem: only a single node would work at a time. On the second node all the lower-stack services started, but ora.crsd and ora.asm did not, and the same happened in reverse depending on which node came up first.
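As a first pass on each node, the standard 11.2 checks are `crsctl check crs` and the `crsctl stat res -init -t` view used throughout this post. A minimal wrapper sketch, assuming this cluster's grid home (/app/gridHome, taken from the log paths in this post):

```shell
#!/bin/sh
# Grid home as used in this cluster (an assumption from the log paths).
GRID_HOME=${GRID_HOME:-/app/gridHome}

run_crsctl() {
  # Run crsctl when the grid home exists; otherwise just print the
  # command, so the sketch is safe to paste on any host.
  if [ -x "$GRID_HOME/bin/crsctl" ]; then
    "$GRID_HOME/bin/crsctl" "$@"
  else
    echo "would run: crsctl $*"
  fi
}

run_crsctl check crs           # ohasd/cssd/crsd/evmd daemon health
run_crsctl stat res -init -t   # lower-stack resources, node by node
```

Run it on both nodes and compare which resources stop coming up on the second node.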


---------------
 NODE 1:
---------------

[root@rac1 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.crsd
      1        ONLINE  ONLINE       rac1
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     ACTIVE:0
ora.diskmon
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
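The error in the title, "has a disk HB but no network HB", means CSSD can still write its disk heartbeat to the voting file but cannot see the other node over the private interconnect, so basic connectivity between the private addresses is worth checking early. A hedged sketch (192.168.1.151 is rac1's interconnect IP per the ASM alert log below; rac2's address here is an assumption, and a firewall such as iptables on that interface is a classic culprit):

```shell
#!/bin/sh
# Ping each private-interconnect address once; a missing network HB with
# a healthy disk HB very often comes down to a firewall (e.g. iptables)
# or a cabling/VLAN problem on this network.
check_ic() {
  for ip in "$@"; do
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      echo "$ip reachable"
    else
      echo "$ip NOT reachable"
    fi
  done
}

# rac1's interconnect IP is from the ASM alert log below; rac2's
# address (192.168.1.152) is a hypothetical example.
check_ic 192.168.1.151 192.168.1.152
```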



--- /app/gridHome/log/rac1/alertrac1.log

2015-12-28 21:14:01.314
[ohasd(5082)]CRS-2112:The OLR service started on node rac1.
2015-12-28 21:14:02.084
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: ag163608, with time stamp: L-2015-12-25-17:13:18.517
[ohasd(5082)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-28 21:14:02.368
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: ag203044, with time stamp: L-2015-12-26-22:10:09.463
[ohasd(5082)]CRS-8013:reboot advisory message text: Rebooting after limit 760 exceeded; disk timeout 710, network timeout 760, last heartbeat from CSSD at epoch seconds 1451148008.203, 1251 milliseconds ago based on invariant clock value of 5249663
2015-12-28 21:14:02.785
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: mo163608, with time stamp: L-2015-12-25-17:13:18.517
[ohasd(5082)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-28 21:14:02.936
[ohasd(5082)]CRS-8017:location: /etc/oracle/lastgasp has 34 reboot advisory log files, 3 were announced and 0 errors occurred
2015-12-28 21:14:33.180
[ohasd(5082)]CRS-2772:Server 'rac1' has been assigned to pool 'Free'.
2015-12-28 21:14:36.486
[ohasd(5082)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2015-12-28 21:14:44.368
[cssd(5347)]CRS-1713:CSSD daemon is started in clustered mode
2015-12-28 21:15:00.603
[cssd(5347)]CRS-1707:Lease acquisition for node rac1 number 1 completed
2015-12-28 21:15:00.707
[cssd(5347)]CRS-1605:CSSD voting file is online: ORCL:OCR; details in /app/gridHome/log/rac1/cssd/ocssd.log.
2015-12-28 21:15:18.881
[cssd(5347)]CRS-1625:Node rac2, number 2, was manually shut down
2015-12-28 21:15:28.300
[cssd(5347)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
2015-12-28 21:15:31.235
[ctssd(5410)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac1.
2015-12-28 21:15:31.337
[ctssd(5410)]CRS-2401:The Cluster Time Synchronization Service started on host rac1.
2015-12-28 21:16:40.946
[crsd(5644)]CRS-1012:The OCR service started on node rac1.
2015-12-28 21:16:46.469
[crsd(5644)]CRS-1201:CRSD started on node rac1.
2015-12-28 21:16:55.272
[crsd(5644)]CRS-2772:Server 'rac1' has been assigned to pool 'Generic'.
2015-12-28 21:16:55.273
[crsd(5644)]CRS-2772:Server 'rac1' has been assigned to pool 'ora.racdb'.
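The CRS-8013 advisory in the alert log above is the key evidence: the earlier reboot happened because the 760 s network timeout was exceeded, and the message embeds the epoch of the last CSSD heartbeat. That epoch converts straight back to a wall-clock time (GNU date shown):

```shell
#!/bin/sh
# Epoch copied from the CRS-8013 message above
# ("last heartbeat from CSSD at epoch seconds 1451148008.203 ...").
last_hb=1451148008
# Prints 2015-12-26 16:40:08 UTC, which matches the advisory's local
# stamp L-2015-12-26-22:10:09 at UTC+5:30.
date -u -d "@$last_hb" '+%Y-%m-%d %H:%M:%S UTC'
```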



--- /app/gridHome/log/rac1/crsd/crsd.log

2015-12-28 21:17:01.730: [UiServer][2572688272] Sending message to PE. ctx= 0xb58edc0
2015-12-28 21:17:01.730: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.730: [UiServer][2572688272] Sending message to PE. ctx= 0xb82c1a8
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Cmd : 0x9900cc70 : flags: EVENT_TAG | HOST_TAG
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Processing PE command id=115. Description: [Start Resource : 0x9900cc70]
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Attribute overrides for the command: USR_ORA_OPI = true;
2015-12-28 21:17:01.735: [   CRSPE][2576890768] PE Command [ Start Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.735: [   CRSPE][2576890768] UI Command [Start Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.735: [   CRSPE][2576890768] Processing PE command id=116. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.737: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.737: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.738: [UiServer][2572688272] Container [ Name: ORDER
        MESSAGE:
        TextMessage[CRS-2545: Cannot operate on 'instance of ora.LISTENER.lsnr assigned to rac1'. It is locked by 'SYSTEM' for command 'Resource Autostart : rac1']
        MSGTYPE:
        TextMessage[1]
        OBJID:
        TextMessage[ora.LISTENER.lsnr]
        WAIT:
        TextMessage[0]
]
2015-12-28 21:17:01.738: [ COMMCRS][2572688272]clscsendx: (0xb7b25f8) Connection not active

2015-12-28 21:17:01.738: [UiServer][2572688272] S(0xb56cfb8): Error sending msg over socket.6
2015-12-28 21:17:01.738: [UiServer][2572688272] Fatal exception sending reply back to client.FatalCommsException : Failed to send response to client.
(File: clsMessageStream.cpp, line: 275

2015-12-28 21:17:01.739: [UiServer][2572688272] Container [ Name: UI_DATA
        ora.LISTENER.lsnr:
        TextMessage[233]
]
2015-12-28 21:17:01.739: [UiServer][2572688272] S(0xb56cfb8): No connection to client.6
2015-12-28 21:17:01.739: [UiServer][2572688272] Fatal exception sending reply back to client.FatalCommsException : Failed to send response to client.
(File: clsMessageStream.cpp, line: 275

2015-12-28 21:17:01.739: [UiServer][2572688272] Done for ctx=0xb58edc0
2015-12-28 21:17:01.743: [UiServer][2572688272] Done for ctx=0xb82c1a8
2015-12-28 21:17:01.858: [UiServer][2570587024] S(0xb7b2570): set Properties ( grid,0xb7a1f00)
2015-12-28 21:17:01.875: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.876: [UiServer][2572688272] Sending message to PE. ctx= 0xb7a2538
2015-12-28 21:17:01.877: [   CRSPE][2576890768] Processing PE command id=117. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.881: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.882: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.886: [UiServer][2572688272] Done for ctx=0xb7a2538
2015-12-28 21:17:01.922: [UiServer][2570587024] S(0xb6e09b0): set Properties ( grid,0xb7b8fb0)
2015-12-28 21:17:01.934: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.934: [UiServer][2572688272] Sending message to PE. ctx= 0xb7b25f8
2015-12-28 21:17:01.934: [   CRSPE][2576890768] Processing PE command id=118. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.936: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.937: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.938: [UiServer][2572688272] Done for ctx=0xb7b25f8
2015-12-28 21:17:01.947: [UiServer][2570587024] S(0xb79fe88): set Properties ( grid,0xb7a24e8)
2015-12-28 21:17:01.958: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.958: [UiServer][2572688272] Sending message to PE. ctx= 0xb70b938
2015-12-28 21:17:01.959: [   CRSPE][2576890768] Processing PE command id=119. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.960: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.960: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.961: [UiServer][2572688272] Done for ctx=0xb70b938
2015-12-28 21:17:02.224: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:555 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.224: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:550
2015-12-28 21:17:02.225: [   CRSPE][2576890768] Received reply to action [Start] message ID: 550
2015-12-28 21:17:02.371: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:564 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.371: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:563
2015-12-28 21:17:02.371: [   CRSPE][2576890768] Received reply to action [Start] message ID: 563
2015-12-28 21:17:02.374: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:555 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.374: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:550
2015-12-28 21:17:02.375: [   CRSPE][2576890768] Received reply to action [Start] message ID: 550
2015-12-28 21:17:02.375: [   CRSPE][2576890768] RI [ora.LISTENER_SCAN1.lsnr 1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:02.376: [   CRSPE][2576890768] CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded

2015-12-28 21:17:02.377: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.LISTENER_SCAN1.lsnr:rac1 : 0xb7b93a8
2015-12-28 21:17:02.377: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:02.459: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:564 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.459: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:563
2015-12-28 21:17:02.459: [   CRSPE][2576890768] Received reply to action [Start] message ID: 563
2015-12-28 21:17:02.459: [   CRSPE][2576890768] RI [ora.LISTENER.lsnr rac1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:02.460: [   CRSPE][2576890768] CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac1' succeeded

2015-12-28 21:17:02.460: [   CRSPE][2576890768] PE Command [ Resource State Change (ora.net1.network rac1 1) : 0x99004ff8 ] has completed
2015-12-28 21:17:02.462: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.LISTENER.lsnr:rac1 : 0xb7a0208
2015-12-28 21:17:02.462: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:02.462: [    AGFW][2587397008] Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:620
2015-12-28 21:17:02.462: [    AGFW][2587397008] Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:620
2015-12-28 21:17:02.513: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.LISTENER_SCAN1.lsnr 1 1] on [rac1] : 0x99005308]. found it no longer needed:CRS-0223: Resource 'ora.LISTENER_SCAN1.lsnr' has placement error.
. Finishing the op.
2015-12-28 21:17:02.514: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.ons rac1 1] on [rac1] : 0x99005810]. found it no longer needed:CRS-0223: Resource 'ora.ons' has placement error.
. Finishing the op.
2015-12-28 21:17:02.516: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.rac2.vip 1 1] on [rac1] : 0x99007e48]. found it no longer needed:CRS-0223: Resource 'ora.rac2.vip' has placement error.
. Finishing the op.
2015-12-28 21:17:02.517: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.LISTENER.lsnr rac1 1] on [rac1] : 0x99008fd0]. found it no longer needed:CRS-0223: Resource 'ora.LISTENER.lsnr' has placement error.
. Finishing the op.
2015-12-28 21:17:08.729: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.eons rac1 1] ID 4098:508 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:08.729: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.eons rac1 1] ID 4098:507
2015-12-28 21:17:08.729: [   CRSPE][2576890768] Received reply to action [Start] message ID: 507
2015-12-28 21:17:08.730: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.eons rac1 1] ID 4098:508 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:08.730: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.eons rac1 1] ID 4098:507
2015-12-28 21:17:08.730: [   CRSPE][2576890768] Received reply to action [Start] message ID: 507
2015-12-28 21:17:08.731: [   CRSPE][2576890768] RI [ora.eons rac1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:08.731: [   CRSPE][2576890768] CRS-2676: Start of 'ora.eons' on 'rac1' succeeded

2015-12-28 21:17:08.731: [   CRSPE][2576890768] PE Command [ Server Join: Resource Startup {rac1 } : 0x99004188 ] has completed
2015-12-28 21:17:08.732: [   CRSPE][2576890768] Updated state details for server rac1from [AUTOSTARTING RESOURCES] to : []
2015-12-28 21:17:08.732: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.eons:rac1 : 0xb733660
2015-12-28 21:17:08.732: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:08.733: [  CRSOCR][2585295760] Multi Write Batch processing...
2015-12-28 21:17:08.733: [    AGFW][2587397008] Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:640
2015-12-28 21:17:08.733: [    AGFW][2587397008] Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:640
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] Reporter skipping ServerStateChange event to EVM
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] Publishing event: Cluster Server State Change Event for rac1becoming ONLINE : 0xb7b20e0
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:08.775: [   CRSPE][2576890768] Resource Autostart completed for rac1
2015-12-28 21:17:08.775: [  CRSOCR][2585295760] Multi Write Batch done.
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publisher initialized flushing buffered events.
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7db288
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a7aa0
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb644228
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b00d8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7ab9e0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb79d6d8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a9818
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b0178
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb531e08
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7ab2a0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7addb0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb73a9f8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb70a520
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7af480
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7336e8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb6bd5d0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b93a8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a0208
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb733660
2015-12-28 21:17:09.776: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b20e0
2015-12-28 21:17:52.973: [UiServer][2570587024] S(0xb7b80b8): set Properties ( grid,0xb6df8b8)
2015-12-28 21:17:52.985: [UiServer][2572688272] processMessage called
2015-12-28 21:17:52.985: [UiServer][2572688272] Sending message to PE. ctx= 0xb70a5b8
2015-12-28 21:17:52.985: [   CRSPE][2576890768] Processing PE command id=120. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:52.987: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:52.987: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:52.993: [UiServer][2572688272] Done for ctx=0xb70a5b8
2015-12-28 21:17:52.996: [UiServer][2570587024] S(0xb7bae90): set Properties ( grid,0xb70cdf0)
2015-12-28 21:17:53.008: [UiServer][2572688272] processMessage called
2015-12-28 21:17:53.008: [UiServer][2572688272] Sending message to PE. ctx= 0xb7b9ee0
2015-12-28 21:17:53.008: [   CRSPE][2576890768] Processing PE command id=121. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:53.010: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:53.010: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:53.014: [UiServer][2572688272] Done for ctx=0xb7b9ee0
2015-12-28 21:17:53.021: [UiServer][2570587024] S(0xb70d018): set Properties ( grid,0xb6df8b8)
2015-12-28 21:17:53.033: [UiServer][2572688272] processMessage called
2015-12-28 21:17:53.033: [UiServer][2572688272] Sending message to PE. ctx= 0xb70a5b8
2015-12-28 21:17:53.033: [   CRSPE][2576890768] Processing PE command id=122. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:53.035: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:53.035: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:53.064: [UiServer][2572688272] Done for ctx=0xb70a5b8
2015-12-28 21:17:53.065: [    AGFW][2587397008] Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:198
2015-12-28 21:17:53.065: [    AGFW][2587397008] Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:198
2015-12-28 21:20:34.067: [UiServer][2570587024] S(0xb7078c8): set Properties ( root,0xb6df8b8)
2015-12-28 21:20:34.078: [UiServer][2572688272] processMessage called
2015-12-28 21:20:34.079: [UiServer][2572688272] Sending message to PE. ctx= 0xb6e1bd0
2015-12-28 21:20:34.079: [   CRSPE][2576890768] Processing PE command id=123. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:20:34.080: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:20:34.080: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:20:34.083: [UiServer][2572688272] Done for ctx=0xb6e1bd0
2015-12-28 21:25:34.156: [UiServer][2570587024] S(0xb70e9c0): set Properties ( root,0xb70cdf0)
2015-12-28 21:25:34.168: [UiServer][2572688272] processMessage called
2015-12-28 21:25:34.168: [UiServer][2572688272] Sending message to PE. ctx= 0xb6df8c8
2015-12-28 21:25:34.170: [   CRSPE][2576890768] Processing PE command id=124. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:25:34.171: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:25:34.171: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:25:34.172: [UiServer][2572688272] Done for ctx=0xb6df8c8
2015-12-28 21:30:34.209: [UiServer][2570587024] S(0xb7a35f8): set Properties ( root,0xb70cdf0)
2015-12-28 21:30:34.220: [UiServer][2572688272] processMessage called
2015-12-28 21:30:34.221: [UiServer][2572688272] Sending message to PE. ctx= 0xb736340
2015-12-28 21:30:34.222: [   CRSPE][2576890768] Processing PE command id=125. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:30:34.222: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:30:34.223: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:30:34.226: [UiServer][2572688272] Done for ctx=0xb736340
2015-12-28 21:35:34.276: [UiServer][2570587024] S(0xb70d468): set Properties ( root,0xb73ca58)
2015-12-28 21:35:34.288: [UiServer][2572688272] processMessage called
2015-12-28 21:35:34.289: [UiServer][2572688272] Sending message to PE. ctx= 0xb6dfd38
2015-12-28 21:35:34.290: [   CRSPE][2576890768] Processing PE command id=126. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:35:34.291: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:35:34.291: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:35:34.292: [UiServer][2572688272] Done for ctx=0xb6dfd38
2015-12-28 21:40:34.542: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb70e238)
2015-12-28 21:40:34.554: [UiServer][2572688272] processMessage called
2015-12-28 21:40:34.554: [UiServer][2572688272] Sending message to PE. ctx= 0xb739c30
2015-12-28 21:40:34.555: [   CRSPE][2576890768] Processing PE command id=127. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:40:34.555: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:40:34.555: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:40:34.557: [UiServer][2572688272] Done for ctx=0xb739c30
2015-12-28 21:45:34.632: [UiServer][2570587024] S(0xb73cb78): set Properties ( root,0xb7b8000)
2015-12-28 21:45:34.644: [UiServer][2572688272] processMessage called
2015-12-28 21:45:34.644: [UiServer][2572688272] Sending message to PE. ctx= 0xb6b81a8
2015-12-28 21:45:34.645: [   CRSPE][2576890768] Processing PE command id=128. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:45:34.645: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:45:34.646: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:45:34.650: [UiServer][2572688272] Done for ctx=0xb6b81a8
2015-12-28 21:50:34.776: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb7b8000)
2015-12-28 21:50:34.793: [UiServer][2572688272] processMessage called
2015-12-28 21:50:34.794: [UiServer][2572688272] Sending message to PE. ctx= 0xb6e3ee8
2015-12-28 21:50:34.795: [   CRSPE][2576890768] Processing PE command id=129. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:50:34.795: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:50:34.796: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:50:34.807: [UiServer][2572688272] Done for ctx=0xb6e3ee8
2015-12-28 21:55:34.772: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb7b8000)
2015-12-28 21:55:34.785: [UiServer][2572688272] processMessage called
2015-12-28 21:55:34.785: [UiServer][2572688272] Sending message to PE. ctx= 0xb7af430
2015-12-28 21:55:34.786: [   CRSPE][2576890768] Processing PE command id=130. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:55:34.787: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:55:34.787: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:55:34.788: [UiServer][2572688272] Done for ctx=0xb7af430
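Most of the crsd.log above is routine UiServer/CRSPE chatter; the lines that matter are the CRS-2545 autostart locks and the CRS-0223 placement errors. A small grep helper (the function name is my own) pulls just those out:

```shell
#!/bin/sh
# Extract the CRS-2545 (resource locked by autostart) and CRS-0223
# (placement error) lines from a crsd.log.
scan_crsd() {
  grep -E 'CRS-(2545|0223)' "$1"
}

# usage: scan_crsd /app/gridHome/log/rac1/crsd/crsd.log
```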



--- /app/gridHome/log/rac1/cssd/ocssd.log

2015-12-28 21:58:52.166: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:58:55.061: [    CSSD][2998131600]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:OCR sched delay 1840 > margin 1500 cur_ms 1931954 lastalive 1930114
2015-12-28 21:58:55.061: [    CSSD][2998131600]clssnmvSchedDiskThreads: KillBlockThread for voting file ORCL:OCR sched delay 1830 > margin 1500  cur_ms 1931954 lastalive 1930124
2015-12-28 21:58:57.059: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:58:57.059: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:01.068: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:01.068: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:05.079: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:05.079: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:09.091: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:09.091: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:13.099: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:13.099: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:18.110: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:18.110: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:23.120: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:23.120: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:28.131: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:28.131: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:33.140: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:33.140: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:37.150: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:37.150: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:41.158: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:41.158: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:45.168: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:45.168: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:49.176: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:49.177: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:53.186: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:53.186: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:57.195: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:57.195: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 22:00:01.206: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:01.206: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 22:00:06.217: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:06.217: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:11.228: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:11.228: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:16.239: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:16.239: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:21.249: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:21.249: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
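The clssnmvSchedDiskThreads warnings above mean a CSSD heartbeat thread got scheduled later than its 1500 ms margin, which usually points at CPU or VM scheduling pressure rather than the interconnect itself. This awk one-liner (wrapped in a function of my own naming) extracts the offending delays from an ocssd.log stream:

```shell
#!/bin/sh
# Print every "sched delay N > margin M" pair found on stdin.
flag_delays() {
  awk '/sched delay/ {
    for (i = 1; i <= NF; i++)
      if ($i == "delay" && $(i + 2) == ">")
        print "delay " $(i + 1) " ms (margin " $(i + 4) " ms)"
  }'
}

# usage: flag_delays < /app/gridHome/log/rac1/cssd/ocssd.log
```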


--- /app/oracleBase/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +OCR/rac-cluster/asmparameterfile/registry.253.866216815
System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "ORCL:*"
  asm_diskgroups           = "FRA"
  asm_diskgroups           = "DATA"
  asm_power_limit          = 1
  diagnostic_dest          = "/app/oracleBase"
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.151
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Mon Dec 28 21:16:07 2015
PMON started with pid=2, OS id=5521
Mon Dec 28 21:16:08 2015
VKTM started with pid=3, OS id=5525 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Mon Dec 28 21:16:08 2015
GEN0 started with pid=4, OS id=5531
Mon Dec 28 21:16:08 2015
DIAG started with pid=5, OS id=5535
Mon Dec 28 21:16:08 2015
PING started with pid=6, OS id=5539
Mon Dec 28 21:16:08 2015
PSP0 started with pid=7, OS id=5543
Mon Dec 28 21:16:10 2015
DIA0 started with pid=8, OS id=5547
Mon Dec 28 21:16:11 2015
LMON started with pid=9, OS id=5552
Mon Dec 28 21:16:11 2015
LMD0 started with pid=10, OS id=5556
Mon Dec 28 21:16:11 2015
LMS0 started with pid=11, OS id=5560 at elevated priority
Mon Dec 28 21:16:12 2015
LMHB started with pid=12, OS id=5566
Mon Dec 28 21:16:12 2015
MMAN started with pid=13, OS id=5571
Mon Dec 28 21:16:12 2015
LGWR started with pid=15, OS id=5579
Mon Dec 28 21:16:12 2015
DBW0 started with pid=14, OS id=5575
Mon Dec 28 21:16:12 2015
CKPT started with pid=16, OS id=5583
Mon Dec 28 21:16:13 2015
SMON started with pid=17, OS id=5587
Mon Dec 28 21:16:13 2015
GMON started with pid=19, OS id=5597
Mon Dec 28 21:16:13 2015
RBAL started with pid=18, OS id=5591
Mon Dec 28 21:16:13 2015
MMON started with pid=20, OS id=5601
Mon Dec 28 21:16:13 2015
MMNL started with pid=21, OS id=5605
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 1 (myinst: 1)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Mon Dec 28 21:16:16 2015
LCK0 started with pid=22, OS id=5615
Mon Dec 28 21:16:18 2015
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /app/oracleBase
Mon Dec 28 21:16:19 2015
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: Diskgroups listed in ASM_DISKGROUPS are
         FRA
         DATA
NOTE: Diskgroup used for Voting files is:
         OCR
Diskgroup with spfile:OCR
Diskgroup used for OCR is:OCR
NOTE: cache registered group DATA number=1 incarn=0x6188a14e
NOTE: cache began mount (first) of group DATA number=1 incarn=0x6188a14e
NOTE: cache registered group FRA number=2 incarn=0x6198a14f
NOTE: cache began mount (first) of group FRA number=2 incarn=0x6198a14f
NOTE: cache registered group OCR number=3 incarn=0x6198a150
NOTE: cache began mount (first) of group OCR number=3 incarn=0x6198a150
NOTE: Loaded library: /opt/oracle/extapi/32/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (ORCL:DATA)
NOTE: Assigning number (2,0) to disk (ORCL:FRA)
NOTE: Assigning number (3,0) to disk (ORCL:OCR)
Mon Dec 28 21:16:25 2015
NOTE: start heartbeating (grp 1)
kfdp_query(DATA): 5
kfdp_queryBg(): 5
NOTE: cache opening disk 0 of grp 1: DATA label:DATA
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x6188A14E (DATA)
Mon Dec 28 21:16:27 2015
* allocate domain 1, invalid = TRUE
Mon Dec 28 21:16:28 2015
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=6.266 group=1 (DATA)
NOTE: advancing ckpt for thread=1 ckpt=6.266
NOTE: cache recovered group 1 to fcn 0.3427
Mon Dec 28 21:16:28 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)
NOTE: LGWR found thread 1 closed at ABA 6.265
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.3427 ABA 7.266
NOTE: cache mounting group 1/0x6188A14E (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=1 incarn=0x6188a14e
NOTE: start heartbeating (grp 2)
kfdp_query(FRA): 7
kfdp_queryBg(): 7
NOTE: cache opening disk 0 of grp 2: FRA label:FRA
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 2/0x6198A14F (FRA)
* allocate domain 2, invalid = TRUE
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=6.183 group=2 (FRA)
NOTE: advancing ckpt for thread=1 ckpt=6.183
NOTE: cache recovered group 2 to fcn 0.1827
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (FRA)
NOTE: LGWR found thread 1 closed at ABA 6.182
NOTE: LGWR mounted thread 1 for diskgroup 2 (FRA)
NOTE: LGWR opening thread 1 at fcn 0.1827 ABA 7.183
NOTE: cache mounting group 2/0x6198A14F (FRA) succeeded
NOTE: cache ending mount (success) of group FRA number=2 incarn=0x6198a14f
NOTE: start heartbeating (grp 3)
kfdp_query(OCR): 9
kfdp_queryBg(): 9
NOTE: cache opening disk 0 of grp 3: OCR label:OCR
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 3/0x6198A150 (OCR)
* allocate domain 3, invalid = TRUE
NOTE: attached to recovery domain 3
Mon Dec 28 21:16:29 2015
NOTE: starting recovery of thread=1 ckpt=16.68 group=3 (OCR)
NOTE: advancing ckpt for thread=1 ckpt=16.68
NOTE: cache recovered group 3 to fcn 0.790
NOTE: LGWR attempting to mount thread 1 for diskgroup 3 (OCR)
NOTE: LGWR found thread 1 closed at ABA 16.67
NOTE: LGWR mounted thread 1 for diskgroup 3 (OCR)
NOTE: LGWR opening thread 1 at fcn 0.790 ABA 17.68
NOTE: cache mounting group 3/0x6198A150 (OCR) succeeded
NOTE: cache ending mount (success) of group OCR number=3 incarn=0x6198a150
Mon Dec 28 21:16:29 2015
kfdp_query(DATA): 10
kfdp_queryBg(): 10
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA was mounted
kfdp_query(FRA): 11
kfdp_queryBg(): 11
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup FRA was mounted
kfdp_query(OCR): 12
kfdp_queryBg(): 12
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCR was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
Mon Dec 28 21:16:31 2015
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.FRA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Mon Dec 28 21:16:35 2015
Starting background process ASMB
Mon Dec 28 21:16:35 2015
ASMB started with pid=24, OS id=5664
Mon Dec 28 21:17:53 2015
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))))' SCOPE=MEMORY SID='+ASM1';



-------------
NODE: 2
-------------

[root@rac2 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  ONLINE       rac2
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2



---  OCSSD.LOG

2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: begin on node(2), waittime 193000
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: set curtime (4294388080) for my node
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: scanning 32 nodes
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: Node rac1, number 1, is in an existing cluster with disk state 3
2015-12-27 00:09:03.053: [    CSSD][2888805264]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2015-12-27 00:09:03.572: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61997, LATS 4294388600, lastSeqNo 61997, uniqueness 1451148676, timestamp 1451155163/6166804
2015-12-27 00:09:04.052: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:04.581: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61998, LATS 4294389610, lastSeqNo 61998, uniqueness 1451148676, timestamp 1451155164/6167814
2015-12-27 00:09:04.955: [    CSSD][2899295120]clssnmSendingThread: sending join msg to all nodes
2015-12-27 00:09:04.955: [    CSSD][2899295120]clssnmSendingThread: sent 5 join msgs to all nodes
2015-12-27 00:09:05.054: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: [network]  failed connect attempt endp 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x80612, usrFlags 0x0 }, req 0x9f233e8 [00000000000002f5] { gipcConnectRequest : addr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', parentEndp 0x9f21488, ret gipcretConnection
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos op  :  sgipcnTcpConnect
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos dep :  Connection timed out (110)
2015-12-27 00:09:05.515: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos loc :  connect
2015-12-27 00:09:05.515: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos info:  addr '192.168.1.151:12874'
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssscSelect: conn complete ctx 0x99a1c48 endp 0x2ec
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmeventhndlr: node(1), endp(0x2ec) failed, probe((nil)) ninf->endp (0x2ec) CONNCOMPLETE
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmDiscHelper: rac1, node(1) connection failed, endp (0x2ec), probe((nil)), ninf->endp 0x2ec
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmDiscHelper: node 1 clean up, endp (0x2ec), init state 0, cur state 0
2015-12-27 00:09:05.515: [GIPCXCPT][2878315408]gipcInternalDissociate: obj 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2015-12-27 00:09:05.515: [GIPCXCPT][2878315408]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3215]: EXCEPTION[ ret gipcretFail (1) ]  failed to dissociate obj 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
2015-12-27 00:09:05.517: [    CSSD][2878315408]clssnmDiscEndp: gipcDestroy 0x2ec
2015-12-27 00:09:05.588: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61999, LATS 4294390620, lastSeqNo 61999, uniqueness 1451148676, timestamp 1451155165/6168814
2015-12-27 00:09:05.589: [    CSSD][2878315408]clssnmconnect: connecting to addr gipc://rac1:nm_rac-cluster#192.168.1.151#12874
2015-12-27 00:09:05.590: [    CSSD][2878315408]clssscConnect: endp 0x2f8 - cookie 0x99a1c48 - addr gipc://rac1:nm_rac-cluster#192.168.1.151#12874
2015-12-27 00:09:05.590: [    CSSD][2878315408]clssnmconnect: connecting to node(1), endp(0x2f8), flags 0x10002
2015-12-27 00:09:06.056: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:06.597: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62000, LATS 4294391630, lastSeqNo 62000, uniqueness 1451148676, timestamp 1451155166/6169824
2015-12-27 00:09:07.058: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:07.603: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62001, LATS 4294392630, lastSeqNo 62001, uniqueness 1451148676, timestamp 1451155167/6170834
2015-12-27 00:09:08.060: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:08.611: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62002, LATS 4294393640, lastSeqNo 62002, uniqueness 1451148676, timestamp 1451155168/6171844
2015-12-27 00:09:09.062: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
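The repeated "has a disk HB, but no network HB" messages above are the key symptom: node 2 can see node 1's heartbeats on the voting disk, but cannot reach it over the private interconnect. A rough sketch of how to check the interconnect from node 2 (the private IPs 192.168.1.151/152 are taken from the log above; adjust for your cluster):

```shell
# Run as the grid owner on rac2.
ping -c 3 192.168.1.151          # basic reachability of node 1's private IP
traceroute 192.168.1.151         # where the path breaks, if anywhere
$GRID_HOME/bin/oifcfg getif      # which interface GI uses as the cluster interconnect
```

If ping fails while the ASM logs show healthy disk access, the problem is the network path (firewall, cabling, or sshd/host configuration), not storage.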


--- CRSD.LOG

2015-12-26 20:42:40.834: [  OCRASM][3046749904]proprasmcl: asmhandle is NULL
2015-12-26 20:42:40.834: [  OCRRAW][3046749904]proprinit: Could not open raw device
2015-12-26 20:42:40.834: [  OCRASM][3046749904]proprasmcl: asmhandle is NULL
2015-12-26 20:42:40.834: [  OCRAPI][3046749904]a_init:16!: Backend init unsuccessful : [26]
2015-12-26 20:42:40.835: [  CRSOCR][3046749904] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2015-12-26 20:42:40.835: [    CRSD][3046749904][PANIC] CRSD exiting: Could not init OCR, code: 26
2015-12-26 20:42:40.835: [    CRSD][3046749904] Done.



--- GRID ALERT.LOG

(9009)]CRS-5822:Agent '/app/gridHome/bin/cssdagent_root' disconnected from server. Details at (:CRSAGF00117:) in /app/gridHome/log/rac2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2015-12-27 00:06:15.655
[ohasd(4740)]CRS-2112:The OLR service started on node rac2.
2015-12-27 00:06:20.613
[ohasd(4740)]CRS-8011:reboot advisory message from host: rac2, component: mo162335, with time stamp: L-2015-12-25-17:27:14.683
[ohasd(4740)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-27 00:06:20.926
[ohasd(4740)]CRS-8017:location: /etc/oracle/lastgasp has 44 reboot advisory log files, 1 were announced and 0 errors occurred
2015-12-27 00:06:43.252
[ohasd(4740)]CRS-2772:Server 'rac2' has been assigned to pool 'Free'.
2015-12-27 00:06:46.574
[ohasd(4740)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2015-12-27 00:06:51.107
[cssd(5034)]CRS-1713:CSSD daemon is started in clustered mode
2015-12-27 00:07:01.189
[cssd(5034)]CRS-1707:Lease acquisition for node rac2 number 2 completed
2015-12-27 00:07:01.268
[cssd(5034)]CRS-1605:CSSD voting file is online: ORCL:OCR; details in /app/gridHome/log/rac2/cssd/ocssd.log.


--- +ASM2.log

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +OCR/rac-cluster/asmparameterfile/registry.253.866216815
System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskgroups           = ""
  asm_power_limit          = 1
  diagnostic_dest          = "/app/oracleBase"
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.152
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Sat Dec 26 22:11:17 2015
PMON started with pid=2, OS id=6243
Sat Dec 26 22:11:17 2015
VKTM started with pid=3, OS id=6247 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Sat Dec 26 22:11:17 2015
GEN0 started with pid=4, OS id=6253
Sat Dec 26 22:11:17 2015
DIAG started with pid=5, OS id=6257
Sat Dec 26 22:11:18 2015
PING started with pid=6, OS id=6262
Sat Dec 26 22:11:18 2015
PSP0 started with pid=7, OS id=6267
Sat Dec 26 22:11:19 2015
DIA0 started with pid=8, OS id=6271
Sat Dec 26 22:11:19 2015
LMON started with pid=9, OS id=6276
Sat Dec 26 22:11:20 2015
LMD0 started with pid=10, OS id=6280
Sat Dec 26 22:11:20 2015
LMS0 started with pid=11, OS id=6284 at elevated priority
Sat Dec 26 22:11:21 2015
LMHB started with pid=12, OS id=6290
Sat Dec 26 22:11:21 2015
MMAN started with pid=13, OS id=6295
Sat Dec 26 22:11:22 2015
DBW0 started with pid=14, OS id=6299
Sat Dec 26 22:11:22 2015
LGWR started with pid=15, OS id=6303
Sat Dec 26 22:11:22 2015
SMON started with pid=17, OS id=6311
Sat Dec 26 22:11:23 2015
RBAL started with pid=18, OS id=6316
Sat Dec 26 22:11:23 2015
CKPT started with pid=16, OS id=6307
Sat Dec 26 22:11:23 2015
GMON started with pid=19, OS id=6320
Sat Dec 26 22:11:23 2015
MMON started with pid=20, OS id=6324
Sat Dec 26 22:11:23 2015
MMNL started with pid=21, OS id=6328
lmon registered with NM - instance number 2 (internal mem no 1)
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Sat Dec 26 22:11:25 2015
LCK0 started with pid=22, OS id=6337
Sat Dec 26 22:11:28 2015
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /app/oracleBase
Sat Dec 26 22:11:28 2015
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: Diskgroup used for Voting files is:
         OCR
Diskgroup with spfile:OCR
Diskgroup used for OCR is:OCR
NOTE: cache registered group OCR number=1 incarn=0x81c19341
NOTE: cache began mount (first) of group OCR number=1 incarn=0x81c19341
NOTE: Loaded library: /opt/oracle/extapi/32/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (ORCL:OCR)
Sat Dec 26 22:11:35 2015
NOTE: start heartbeating (grp 1)
kfdp_query(OCR): 3
kfdp_queryBg(): 3
NOTE: cache opening disk 0 of grp 1: OCR label:OCR
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x81C19341 (OCR)
Sat Dec 26 22:11:36 2015
* allocate domain 1, invalid = TRUE
Sat Dec 26 22:11:36 2015
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=14.64 group=1 (OCR)
NOTE: advancing ckpt for thread=1 ckpt=14.64
NOTE: cache recovered group 1 to fcn 0.780
Sat Dec 26 22:11:36 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (OCR)
NOTE: LGWR found thread 1 closed at ABA 14.63
NOTE: LGWR mounted thread 1 for diskgroup 1 (OCR)
NOTE: LGWR opening thread 1 at fcn 0.780 ABA 15.64
NOTE: cache mounting group 1/0x81C19341 (OCR) succeeded
NOTE: cache ending mount (success) of group OCR number=1 incarn=0x81c19341
Sat Dec 26 22:11:36 2015
kfdp_query(OCR): 4
kfdp_queryBg(): 4
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup OCR was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
Sat Dec 26 22:11:38 2015
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Sat Dec 26 22:18:24 2015
Detected partial connectivity in the cluster - IMR is being issued. This may cause some instances to be evicted
Evicting instance 1 from cluster
Reconfiguration started (old inc 2, new inc 4)
List of instances:
 2 (myinst: 2)
Sat Dec 26 22:18:25 2015
Shutting down instance (immediate)
Shutting down instance: further logons disabled
Stopping background process MMNL
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Sat Dec 26 22:18:25 2015
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Stopping background process MMON
License high water mark = 1
SQL> ALTER DISKGROUP ALL DISMOUNT /* asm agent */
NOTE: cache dismounting (clean) group 1/0x81C19341 (OCR)
Sat Dec 26 22:18:26 2015
NOTE: LGWR doing clean dismount of group 1 (OCR)
NOTE: LGWR closing thread 1 of diskgroup 1 (OCR) at ABA 15.65
NOTE: LGWR released thread recovery enqueue
Sat Dec 26 22:18:26 2015
freeing rdom 1
Sat Dec 26 22:18:26 2015
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x81C19341 (OCR)
kfdp_dismount(): 5
Sat Dec 26 22:18:26 2015
kfdp_dismountBg(): 5
NOTE: De-assigning number (1,0) from disk (ORCL:OCR)
SUCCESS: diskgroup OCR was dismounted
NOTE: cache deleting context for group OCR 1/-2118020287
SUCCESS: ALTER DISKGROUP ALL DISMOUNT /* asm agent */
Reconfiguration started (old inc 4, new inc 6)
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Sat Dec 26 22:18:30 2015
WARNING: failed to offline diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Sat Dec 26 22:18:31 2015
Stopping background process VKTM:
freeing rdom 0
Sat Dec 26 22:18:38 2015
Instance shutdown complete

--------------------
SOLUTION:
--------------------

In my case the root cause was SSH connectivity. The fix was to coordinate with the OS admin and repair SSH between the grid and oracle users across both nodes, in each direction.
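A minimal way to verify such a fix, assuming the node names rac1/rac2 and the grid and oracle software owners used in this post: each owner must be able to reach the other node without a password prompt, in both directions.

```shell
# Run on rac1, then repeat the mirror image on rac2.
# BatchMode=yes makes ssh fail fast instead of prompting for a password.
su - grid   -c "ssh -o BatchMode=yes rac2 date"
su - oracle -c "ssh -o BatchMode=yes rac2 date"
```

Each command should print the remote date with no prompt; any password prompt or timeout means the equivalence is still broken.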

--------------
Solution 4:
--------------
First, check whether the SSH connection works between the nodes; if it does not, you will see the errors shown above.

In this case the sshd problem was between the grid user on one node and the grid user on the other.


Problem: ssh: connect to host xxx.xxx.x.xx port 22: Connection timed out


--- Check whether the port is listening; also check DNS.
# ps -A | grep sshd
# lanscan
# netstat -in
# netstat -rn
# netstat -nat | grep 22
# netstat -ntl | grep :22
# sudo netstat -alnp | grep :22
# traceroute 10.230.xx.xx
# telnet 10.230.xx.xx 22
If none of these turn up the problem, reinstall ssh as a last resort.


Some other relevant solutions when Grid Infrastructure fails to start:

--------------
Solution 1:
--------------

Oracle 11.2 with ASM: crsd fails to start on the second node after a reboot.

Description:
OS: RHEL 5.8, x86_64. Oracle 11.2 CRS fails to start on the second node after a reboot.
Errors:
crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
crsd.log:
2013-11-05 15:58:19.095: [  OCRASM][1520055008]proprasmo: Error in open/create file in dg [DATACOST]
[  OCRASM][1520055008]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmo: kgfoCheckMount returned [7]
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmo: The ASM instance is down
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprioo: Failed to open [+DATACOST]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprioo: No OCR/OLR devices are usable
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmcl: asmhandle is NULL
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprinit: Could not open raw device
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmcl: asmhandle is NULL
2013-11-05 15:58:19.097: [  OCRAPI][1520055008]a_init:16!: Backend init unsuccessful : [26]
2013-11-05 15:58:19.097: [  CRSOCR][1520055008] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2013-11-05 15:58:19.097: [    CRSD][1520055008][PANIC] CRSD exiting: Could not init OCR, code: 26
2013-11-05 15:58:19.098: [    CRSD][1520055008] Done.

Try to start ASM manually (here a single user, oracle, owns both the grid and database homes; otherwise log in as your grid owner):

su - oracle

export ORACLE_HOME=$GRID_HOME

export ORACLE_SID=+ASM2

asmcmd

ASMCMD> startup
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
These errors relate to kernel semaphore parameters rather than actual disk space, provided the node does have free space (check with df -h).
-- Solution:

1. Edit /etc/sysctl.conf and modify the semaphore values to:
kernel.sem = 250 32000 100 200

2. Reboot the node.
Everything should now work. If crsd still does not come up, log in as the grid owner and try to start the ASM instance manually; you will see a different error, unrelated to free space.
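As a sketch, the semaphore change can also be applied without a reboot by reloading sysctl (the reboot in step 2 remains the conservative choice on a cluster node):

```shell
# As root: persist the limits, reload them, and verify.
echo "kernel.sem = 250 32000 100 200" >> /etc/sysctl.conf
sysctl -p                        # re-read /etc/sysctl.conf and apply the values
ipcs -ls                         # display the semaphore limits now in effect
```

The four values are semmsl, semmns, semopm, and semmni; the ORA-27300 "No space left on device" here refers to the semaphore table being exhausted, not the filesystem.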


---------------
Solution 2:
---------------


The problem was in the ASM parameter file:

+ASM1.asm_diskgroups='ARCH','REDO','OCR','VOTE'#Manual Mount
+ASM2.asm_diskgroups='REDO','VOTE','ARCH'#Manual Mount
I added the missing 'OCR' diskgroup to the +ASM2.asm_diskgroups parameter, then rebooted both machines, and everything was fine.

[root@faa5 tmp]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[root@faa5 tmp]# 
[root@faa5 tmp]# 
[root@faa5 tmp]# srvctl status database -d FAARACDB
Instance FAARACDB1 is running on node faa4
Instance FAARACDB2 is running on node faa5
[root@faa5 tmp]# 
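The same correction can be made from SQL*Plus instead of editing the parameter file by hand; a sketch, using the diskgroup names from this example and connecting through whichever ASM instance is still up:

```shell
# As the grid owner on a node whose ASM instance is running.
export ORACLE_SID=+ASM1
sqlplus / as sysasm <<'EOF'
ALTER SYSTEM SET asm_diskgroups='ARCH','REDO','OCR','VOTE' SID='+ASM2' SCOPE=SPFILE;
EOF
```

SCOPE=SPFILE with SID='+ASM2' updates only the failing instance's entry in the shared spfile; it takes effect at that instance's next startup.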

---------------
Solution 3:
---------------

Execute the root.sh script on the node that fails to start.
In this case, run root.sh on node RAC2.
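A sketch of that, assuming the Grid home path used elsewhere in this post; on 11.2, if root.sh has already been run once on the node, the clusterware configuration usually has to be deconfigured first:

```shell
# As root on rac2.
/app/gridHome/crs/install/rootcrs.pl -deconfig -force   # only if root.sh already ran once
/app/gridHome/root.sh
```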


