Wednesday 6 January 2016

ORA-15077 Has a disk HB but no network HB

A few days ago I faced an issue in RAC. Below is the full explanation of the issue, its detection, and the resolution.

---------------
Problem:
---------------

While starting Grid Infrastructure we ran into a problem: only a single node would work at a time. On the second node all the lower-stack services started, but ora.crsd and ora.asm did not, and the same happened in reverse depending on which node came up first.
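As a first pass on each node, the standard 11.2 checks are `crsctl check crs` and the `crsctl stat res -init -t` view used throughout this post. A minimal wrapper sketch, assuming this cluster's grid home (/app/gridHome, taken from the log paths in this post):

```shell
#!/bin/sh
# Grid home as used in this cluster (an assumption from the log paths).
GRID_HOME=${GRID_HOME:-/app/gridHome}

run_crsctl() {
  # Run crsctl when the grid home exists; otherwise just print the
  # command, so the sketch is safe to paste on any host.
  if [ -x "$GRID_HOME/bin/crsctl" ]; then
    "$GRID_HOME/bin/crsctl" "$@"
  else
    echo "would run: crsctl $*"
  fi
}

run_crsctl check crs           # ohasd/cssd/crsd/evmd daemon health
run_crsctl stat res -init -t   # lower-stack resources, node by node
```

Run it on both nodes and compare which resources stop coming up on the second node.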


---------------
 NODE 1:
---------------

[root@rac1 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.crsd
      1        ONLINE  ONLINE       rac1
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     ACTIVE:0
ora.diskmon
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
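The error in the title, "has a disk HB but no network HB", means CSSD can still write its disk heartbeat to the voting file but cannot see the other node over the private interconnect, so basic connectivity between the private addresses is worth checking early. A hedged sketch (192.168.1.151 is rac1's interconnect IP per the ASM alert log below; rac2's address here is an assumption, and a firewall such as iptables on that interface is a classic culprit):

```shell
#!/bin/sh
# Ping each private-interconnect address once; a missing network HB with
# a healthy disk HB very often comes down to a firewall (e.g. iptables)
# or a cabling/VLAN problem on this network.
check_ic() {
  for ip in "$@"; do
    if ping -c 1 -W 2 "$ip" >/dev/null 2>&1; then
      echo "$ip reachable"
    else
      echo "$ip NOT reachable"
    fi
  done
}

# rac1's interconnect IP is from the ASM alert log below; rac2's
# address (192.168.1.152) is a hypothetical example.
check_ic 192.168.1.151 192.168.1.152
```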



--- /app/gridHome/log/rac1/alertrac1.log

2015-12-28 21:14:01.314
[ohasd(5082)]CRS-2112:The OLR service started on node rac1.
2015-12-28 21:14:02.084
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: ag163608, with time stamp: L-2015-12-25-17:13:18.517
[ohasd(5082)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-28 21:14:02.368
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: ag203044, with time stamp: L-2015-12-26-22:10:09.463
[ohasd(5082)]CRS-8013:reboot advisory message text: Rebooting after limit 760 exceeded; disk timeout 710, network timeout 760, last heartbeat from CSSD at epoch seconds 1451148008.203, 1251 milliseconds ago based on invariant clock value of 5249663
2015-12-28 21:14:02.785
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: mo163608, with time stamp: L-2015-12-25-17:13:18.517
[ohasd(5082)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-28 21:14:02.936
[ohasd(5082)]CRS-8017:location: /etc/oracle/lastgasp has 34 reboot advisory log files, 3 were announced and 0 errors occurred
2015-12-28 21:14:33.180
[ohasd(5082)]CRS-2772:Server 'rac1' has been assigned to pool 'Free'.
2015-12-28 21:14:36.486
[ohasd(5082)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2015-12-28 21:14:44.368
[cssd(5347)]CRS-1713:CSSD daemon is started in clustered mode
2015-12-28 21:15:00.603
[cssd(5347)]CRS-1707:Lease acquisition for node rac1 number 1 completed
2015-12-28 21:15:00.707
[cssd(5347)]CRS-1605:CSSD voting file is online: ORCL:OCR; details in /app/gridHome/log/rac1/cssd/ocssd.log.
2015-12-28 21:15:18.881
[cssd(5347)]CRS-1625:Node rac2, number 2, was manually shut down
2015-12-28 21:15:28.300
[cssd(5347)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
2015-12-28 21:15:31.235
[ctssd(5410)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac1.
2015-12-28 21:15:31.337
[ctssd(5410)]CRS-2401:The Cluster Time Synchronization Service started on host rac1.
2015-12-28 21:16:40.946
[crsd(5644)]CRS-1012:The OCR service started on node rac1.
2015-12-28 21:16:46.469
[crsd(5644)]CRS-1201:CRSD started on node rac1.
2015-12-28 21:16:55.272
[crsd(5644)]CRS-2772:Server 'rac1' has been assigned to pool 'Generic'.
2015-12-28 21:16:55.273
[crsd(5644)]CRS-2772:Server 'rac1' has been assigned to pool 'ora.racdb'.
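The CRS-8013 advisory in the alert log above is the key evidence: the earlier reboot happened because the 760 s network timeout was exceeded, and the message embeds the epoch of the last CSSD heartbeat. That epoch converts straight back to a wall-clock time (GNU date shown):

```shell
#!/bin/sh
# Epoch copied from the CRS-8013 message above
# ("last heartbeat from CSSD at epoch seconds 1451148008.203 ...").
last_hb=1451148008
# Prints 2015-12-26 16:40:08 UTC, which matches the advisory's local
# stamp L-2015-12-26-22:10:09 at UTC+5:30.
date -u -d "@$last_hb" '+%Y-%m-%d %H:%M:%S UTC'
```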



--- /app/gridHome/log/rac1/crsd/crsd.log

2015-12-28 21:17:01.730: [UiServer][2572688272] Sending message to PE. ctx= 0xb58edc0
2015-12-28 21:17:01.730: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.730: [UiServer][2572688272] Sending message to PE. ctx= 0xb82c1a8
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Cmd : 0x9900cc70 : flags: EVENT_TAG | HOST_TAG
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Processing PE command id=115. Description: [Start Resource : 0x9900cc70]
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Attribute overrides for the command: USR_ORA_OPI = true;
2015-12-28 21:17:01.735: [   CRSPE][2576890768] PE Command [ Start Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.735: [   CRSPE][2576890768] UI Command [Start Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.735: [   CRSPE][2576890768] Processing PE command id=116. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.737: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.737: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.738: [UiServer][2572688272] Container [ Name: ORDER
        MESSAGE:
        TextMessage[CRS-2545: Cannot operate on 'instance of ora.LISTENER.lsnr assigned to rac1'. It is locked by 'SYSTEM' for command 'Resource Autostart : rac1']
        MSGTYPE:
        TextMessage[1]
        OBJID:
        TextMessage[ora.LISTENER.lsnr]
        WAIT:
        TextMessage[0]
]
2015-12-28 21:17:01.738: [ COMMCRS][2572688272]clscsendx: (0xb7b25f8) Connection not active

2015-12-28 21:17:01.738: [UiServer][2572688272] S(0xb56cfb8): Error sending msg over socket.6
2015-12-28 21:17:01.738: [UiServer][2572688272] Fatal exception sending reply back to client.FatalCommsException : Failed to send response to client.
(File: clsMessageStream.cpp, line: 275

2015-12-28 21:17:01.739: [UiServer][2572688272] Container [ Name: UI_DATA
        ora.LISTENER.lsnr:
        TextMessage[233]
]
2015-12-28 21:17:01.739: [UiServer][2572688272] S(0xb56cfb8): No connection to client.6
2015-12-28 21:17:01.739: [UiServer][2572688272] Fatal exception sending reply back to client.FatalCommsException : Failed to send response to client.
(File: clsMessageStream.cpp, line: 275

2015-12-28 21:17:01.739: [UiServer][2572688272] Done for ctx=0xb58edc0
2015-12-28 21:17:01.743: [UiServer][2572688272] Done for ctx=0xb82c1a8
2015-12-28 21:17:01.858: [UiServer][2570587024] S(0xb7b2570): set Properties ( grid,0xb7a1f00)
2015-12-28 21:17:01.875: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.876: [UiServer][2572688272] Sending message to PE. ctx= 0xb7a2538
2015-12-28 21:17:01.877: [   CRSPE][2576890768] Processing PE command id=117. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.881: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.882: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.886: [UiServer][2572688272] Done for ctx=0xb7a2538
2015-12-28 21:17:01.922: [UiServer][2570587024] S(0xb6e09b0): set Properties ( grid,0xb7b8fb0)
2015-12-28 21:17:01.934: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.934: [UiServer][2572688272] Sending message to PE. ctx= 0xb7b25f8
2015-12-28 21:17:01.934: [   CRSPE][2576890768] Processing PE command id=118. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.936: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.937: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.938: [UiServer][2572688272] Done for ctx=0xb7b25f8
2015-12-28 21:17:01.947: [UiServer][2570587024] S(0xb79fe88): set Properties ( grid,0xb7a24e8)
2015-12-28 21:17:01.958: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.958: [UiServer][2572688272] Sending message to PE. ctx= 0xb70b938
2015-12-28 21:17:01.959: [   CRSPE][2576890768] Processing PE command id=119. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.960: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.960: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.961: [UiServer][2572688272] Done for ctx=0xb70b938
2015-12-28 21:17:02.224: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:555 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.224: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:550
2015-12-28 21:17:02.225: [   CRSPE][2576890768] Received reply to action [Start] message ID: 550
2015-12-28 21:17:02.371: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:564 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.371: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:563
2015-12-28 21:17:02.371: [   CRSPE][2576890768] Received reply to action [Start] message ID: 563
2015-12-28 21:17:02.374: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:555 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.374: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:550
2015-12-28 21:17:02.375: [   CRSPE][2576890768] Received reply to action [Start] message ID: 550
2015-12-28 21:17:02.375: [   CRSPE][2576890768] RI [ora.LISTENER_SCAN1.lsnr 1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:02.376: [   CRSPE][2576890768] CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded

2015-12-28 21:17:02.377: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.LISTENER_SCAN1.lsnr:rac1 : 0xb7b93a8
2015-12-28 21:17:02.377: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:02.459: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:564 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.459: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:563
2015-12-28 21:17:02.459: [   CRSPE][2576890768] Received reply to action [Start] message ID: 563
2015-12-28 21:17:02.459: [   CRSPE][2576890768] RI [ora.LISTENER.lsnr rac1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:02.460: [   CRSPE][2576890768] CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac1' succeeded

2015-12-28 21:17:02.460: [   CRSPE][2576890768] PE Command [ Resource State Change (ora.net1.network rac1 1) : 0x99004ff8 ] has completed
2015-12-28 21:17:02.462: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.LISTENER.lsnr:rac1 : 0xb7a0208
2015-12-28 21:17:02.462: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:02.462: [    AGFW][2587397008] Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:620
2015-12-28 21:17:02.462: [    AGFW][2587397008] Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:620
2015-12-28 21:17:02.513: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.LISTENER_SCAN1.lsnr 1 1] on [rac1] : 0x99005308]. found it no longer needed:CRS-0223: Resource 'ora.LISTENER_SCAN1.lsnr' has placement error.
. Finishing the op.
2015-12-28 21:17:02.514: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.ons rac1 1] on [rac1] : 0x99005810]. found it no longer needed:CRS-0223: Resource 'ora.ons' has placement error.
. Finishing the op.
2015-12-28 21:17:02.516: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.rac2.vip 1 1] on [rac1] : 0x99007e48]. found it no longer needed:CRS-0223: Resource 'ora.rac2.vip' has placement error.
. Finishing the op.
2015-12-28 21:17:02.517: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.LISTENER.lsnr rac1 1] on [rac1] : 0x99008fd0]. found it no longer needed:CRS-0223: Resource 'ora.LISTENER.lsnr' has placement error.
. Finishing the op.
2015-12-28 21:17:08.729: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.eons rac1 1] ID 4098:508 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:08.729: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.eons rac1 1] ID 4098:507
2015-12-28 21:17:08.729: [   CRSPE][2576890768] Received reply to action [Start] message ID: 507
2015-12-28 21:17:08.730: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.eons rac1 1] ID 4098:508 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:08.730: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.eons rac1 1] ID 4098:507
2015-12-28 21:17:08.730: [   CRSPE][2576890768] Received reply to action [Start] message ID: 507
2015-12-28 21:17:08.731: [   CRSPE][2576890768] RI [ora.eons rac1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:08.731: [   CRSPE][2576890768] CRS-2676: Start of 'ora.eons' on 'rac1' succeeded

2015-12-28 21:17:08.731: [   CRSPE][2576890768] PE Command [ Server Join: Resource Startup {rac1 } : 0x99004188 ] has completed
2015-12-28 21:17:08.732: [   CRSPE][2576890768] Updated state details for server rac1from [AUTOSTARTING RESOURCES] to : []
2015-12-28 21:17:08.732: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.eons:rac1 : 0xb733660
2015-12-28 21:17:08.732: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:08.733: [  CRSOCR][2585295760] Multi Write Batch processing...
2015-12-28 21:17:08.733: [    AGFW][2587397008] Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:640
2015-12-28 21:17:08.733: [    AGFW][2587397008] Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:640
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] Reporter skipping ServerStateChange event to EVM
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] Publishing event: Cluster Server State Change Event for rac1becoming ONLINE : 0xb7b20e0
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:08.775: [   CRSPE][2576890768] Resource Autostart completed for rac1
2015-12-28 21:17:08.775: [  CRSOCR][2585295760] Multi Write Batch done.
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publisher initialized flushing buffered events.
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7db288
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a7aa0
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb644228
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b00d8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7ab9e0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb79d6d8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a9818
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b0178
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb531e08
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7ab2a0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7addb0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb73a9f8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb70a520
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7af480
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7336e8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb6bd5d0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b93a8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a0208
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb733660
2015-12-28 21:17:09.776: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b20e0
2015-12-28 21:17:52.973: [UiServer][2570587024] S(0xb7b80b8): set Properties ( grid,0xb6df8b8)
2015-12-28 21:17:52.985: [UiServer][2572688272] processMessage called
2015-12-28 21:17:52.985: [UiServer][2572688272] Sending message to PE. ctx= 0xb70a5b8
2015-12-28 21:17:52.985: [   CRSPE][2576890768] Processing PE command id=120. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:52.987: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:52.987: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:52.993: [UiServer][2572688272] Done for ctx=0xb70a5b8
2015-12-28 21:17:52.996: [UiServer][2570587024] S(0xb7bae90): set Properties ( grid,0xb70cdf0)
2015-12-28 21:17:53.008: [UiServer][2572688272] processMessage called
2015-12-28 21:17:53.008: [UiServer][2572688272] Sending message to PE. ctx= 0xb7b9ee0
2015-12-28 21:17:53.008: [   CRSPE][2576890768] Processing PE command id=121. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:53.010: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:53.010: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:53.014: [UiServer][2572688272] Done for ctx=0xb7b9ee0
2015-12-28 21:17:53.021: [UiServer][2570587024] S(0xb70d018): set Properties ( grid,0xb6df8b8)
2015-12-28 21:17:53.033: [UiServer][2572688272] processMessage called
2015-12-28 21:17:53.033: [UiServer][2572688272] Sending message to PE. ctx= 0xb70a5b8
2015-12-28 21:17:53.033: [   CRSPE][2576890768] Processing PE command id=122. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:53.035: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:53.035: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:53.064: [UiServer][2572688272] Done for ctx=0xb70a5b8
2015-12-28 21:17:53.065: [    AGFW][2587397008] Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:198
2015-12-28 21:17:53.065: [    AGFW][2587397008] Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:198
2015-12-28 21:20:34.067: [UiServer][2570587024] S(0xb7078c8): set Properties ( root,0xb6df8b8)
2015-12-28 21:20:34.078: [UiServer][2572688272] processMessage called
2015-12-28 21:20:34.079: [UiServer][2572688272] Sending message to PE. ctx= 0xb6e1bd0
2015-12-28 21:20:34.079: [   CRSPE][2576890768] Processing PE command id=123. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:20:34.080: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:20:34.080: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:20:34.083: [UiServer][2572688272] Done for ctx=0xb6e1bd0
2015-12-28 21:25:34.156: [UiServer][2570587024] S(0xb70e9c0): set Properties ( root,0xb70cdf0)
2015-12-28 21:25:34.168: [UiServer][2572688272] processMessage called
2015-12-28 21:25:34.168: [UiServer][2572688272] Sending message to PE. ctx= 0xb6df8c8
2015-12-28 21:25:34.170: [   CRSPE][2576890768] Processing PE command id=124. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:25:34.171: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:25:34.171: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:25:34.172: [UiServer][2572688272] Done for ctx=0xb6df8c8
2015-12-28 21:30:34.209: [UiServer][2570587024] S(0xb7a35f8): set Properties ( root,0xb70cdf0)
2015-12-28 21:30:34.220: [UiServer][2572688272] processMessage called
2015-12-28 21:30:34.221: [UiServer][2572688272] Sending message to PE. ctx= 0xb736340
2015-12-28 21:30:34.222: [   CRSPE][2576890768] Processing PE command id=125. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:30:34.222: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:30:34.223: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:30:34.226: [UiServer][2572688272] Done for ctx=0xb736340
2015-12-28 21:35:34.276: [UiServer][2570587024] S(0xb70d468): set Properties ( root,0xb73ca58)
2015-12-28 21:35:34.288: [UiServer][2572688272] processMessage called
2015-12-28 21:35:34.289: [UiServer][2572688272] Sending message to PE. ctx= 0xb6dfd38
2015-12-28 21:35:34.290: [   CRSPE][2576890768] Processing PE command id=126. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:35:34.291: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:35:34.291: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:35:34.292: [UiServer][2572688272] Done for ctx=0xb6dfd38
2015-12-28 21:40:34.542: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb70e238)
2015-12-28 21:40:34.554: [UiServer][2572688272] processMessage called
2015-12-28 21:40:34.554: [UiServer][2572688272] Sending message to PE. ctx= 0xb739c30
2015-12-28 21:40:34.555: [   CRSPE][2576890768] Processing PE command id=127. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:40:34.555: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:40:34.555: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:40:34.557: [UiServer][2572688272] Done for ctx=0xb739c30
2015-12-28 21:45:34.632: [UiServer][2570587024] S(0xb73cb78): set Properties ( root,0xb7b8000)
2015-12-28 21:45:34.644: [UiServer][2572688272] processMessage called
2015-12-28 21:45:34.644: [UiServer][2572688272] Sending message to PE. ctx= 0xb6b81a8
2015-12-28 21:45:34.645: [   CRSPE][2576890768] Processing PE command id=128. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:45:34.645: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:45:34.646: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:45:34.650: [UiServer][2572688272] Done for ctx=0xb6b81a8
2015-12-28 21:50:34.776: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb7b8000)
2015-12-28 21:50:34.793: [UiServer][2572688272] processMessage called
2015-12-28 21:50:34.794: [UiServer][2572688272] Sending message to PE. ctx= 0xb6e3ee8
2015-12-28 21:50:34.795: [   CRSPE][2576890768] Processing PE command id=129. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:50:34.795: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:50:34.796: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:50:34.807: [UiServer][2572688272] Done for ctx=0xb6e3ee8
2015-12-28 21:55:34.772: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb7b8000)
2015-12-28 21:55:34.785: [UiServer][2572688272] processMessage called
2015-12-28 21:55:34.785: [UiServer][2572688272] Sending message to PE. ctx= 0xb7af430
2015-12-28 21:55:34.786: [   CRSPE][2576890768] Processing PE command id=130. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:55:34.787: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:55:34.787: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:55:34.788: [UiServer][2572688272] Done for ctx=0xb7af430
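Most of the crsd.log above is routine UiServer/CRSPE chatter; the lines that matter are the CRS-2545 autostart locks and the CRS-0223 placement errors. A small grep helper (the function name is my own) pulls just those out:

```shell
#!/bin/sh
# Extract the CRS-2545 (resource locked by autostart) and CRS-0223
# (placement error) lines from a crsd.log.
scan_crsd() {
  grep -E 'CRS-(2545|0223)' "$1"
}

# usage: scan_crsd /app/gridHome/log/rac1/crsd/crsd.log
```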



--- /app/gridHome/log/rac1/cssd/ocssd.log

2015-12-28 21:58:52.166: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:58:55.061: [    CSSD][2998131600]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:OCR sched delay 1840 > margin 1500 cur_ms 1931954 lastalive 1930114
2015-12-28 21:58:55.061: [    CSSD][2998131600]clssnmvSchedDiskThreads: KillBlockThread for voting file ORCL:OCR sched delay 1830 > margin 1500  cur_ms 1931954 lastalive 1930124
2015-12-28 21:58:57.059: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:58:57.059: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:01.068: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:01.068: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:05.079: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:05.079: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:09.091: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:09.091: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:13.099: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:13.099: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:18.110: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:18.110: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:23.120: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:23.120: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:28.131: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:28.131: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:33.140: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:33.140: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:37.150: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:37.150: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:41.158: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:41.158: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:45.168: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:45.168: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:49.176: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:49.177: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:53.186: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:53.186: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:57.195: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:57.195: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 22:00:01.206: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:01.206: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 22:00:06.217: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:06.217: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:11.228: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:11.228: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:16.239: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:16.239: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:21.249: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:21.249: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
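The clssnmvSchedDiskThreads warnings above mean a CSSD heartbeat thread got scheduled later than its 1500 ms margin, which usually points at CPU or VM scheduling pressure rather than the interconnect itself. This awk one-liner (wrapped in a function of my own naming) extracts the offending delays from an ocssd.log stream:

```shell
#!/bin/sh
# Print every "sched delay N > margin M" pair found on stdin.
flag_delays() {
  awk '/sched delay/ {
    for (i = 1; i <= NF; i++)
      if ($i == "delay" && $(i + 2) == ">")
        print "delay " $(i + 1) " ms (margin " $(i + 4) " ms)"
  }'
}

# usage: flag_delays < /app/gridHome/log/rac1/cssd/ocssd.log
```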


--- /app/oracleBase/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +OCR/rac-cluster/asmparameterfile/registry.253.866216815
System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "ORCL:*"
  asm_diskgroups           = "FRA"
  asm_diskgroups           = "DATA"
  asm_power_limit          = 1
  diagnostic_dest          = "/app/oracleBase"
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.151
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Mon Dec 28 21:16:07 2015
PMON started with pid=2, OS id=5521
Mon Dec 28 21:16:08 2015
VKTM started with pid=3, OS id=5525 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Mon Dec 28 21:16:08 2015
GEN0 started with pid=4, OS id=5531
Mon Dec 28 21:16:08 2015
DIAG started with pid=5, OS id=5535
Mon Dec 28 21:16:08 2015
PING started with pid=6, OS id=5539
Mon Dec 28 21:16:08 2015
PSP0 started with pid=7, OS id=5543
Mon Dec 28 21:16:10 2015
DIA0 started with pid=8, OS id=5547
Mon Dec 28 21:16:11 2015
LMON started with pid=9, OS id=5552
Mon Dec 28 21:16:11 2015
LMD0 started with pid=10, OS id=5556
Mon Dec 28 21:16:11 2015
LMS0 started with pid=11, OS id=5560 at elevated priority
Mon Dec 28 21:16:12 2015
LMHB started with pid=12, OS id=5566
Mon Dec 28 21:16:12 2015
MMAN started with pid=13, OS id=5571
Mon Dec 28 21:16:12 2015
LGWR started with pid=15, OS id=5579
Mon Dec 28 21:16:12 2015
DBW0 started with pid=14, OS id=5575
Mon Dec 28 21:16:12 2015
CKPT started with pid=16, OS id=5583
Mon Dec 28 21:16:13 2015
SMON started with pid=17, OS id=5587
Mon Dec 28 21:16:13 2015
GMON started with pid=19, OS id=5597
Mon Dec 28 21:16:13 2015
RBAL started with pid=18, OS id=5591
Mon Dec 28 21:16:13 2015
MMON started with pid=20, OS id=5601
Mon Dec 28 21:16:13 2015
MMNL started with pid=21, OS id=5605
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 1 (myinst: 1)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Mon Dec 28 21:16:16 2015
LCK0 started with pid=22, OS id=5615
Mon Dec 28 21:16:18 2015
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /app/oracleBase
Mon Dec 28 21:16:19 2015
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: Diskgroups listed in ASM_DISKGROUPS are
         FRA
         DATA
NOTE: Diskgroup used for Voting files is:
         OCR
Diskgroup with spfile:OCR
Diskgroup used for OCR is:OCR
NOTE: cache registered group DATA number=1 incarn=0x6188a14e
NOTE: cache began mount (first) of group DATA number=1 incarn=0x6188a14e
NOTE: cache registered group FRA number=2 incarn=0x6198a14f
NOTE: cache began mount (first) of group FRA number=2 incarn=0x6198a14f
NOTE: cache registered group OCR number=3 incarn=0x6198a150
NOTE: cache began mount (first) of group OCR number=3 incarn=0x6198a150
NOTE: Loaded library: /opt/oracle/extapi/32/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (ORCL:DATA)
NOTE: Assigning number (2,0) to disk (ORCL:FRA)
NOTE: Assigning number (3,0) to disk (ORCL:OCR)
Mon Dec 28 21:16:25 2015
NOTE: start heartbeating (grp 1)
kfdp_query(DATA): 5
kfdp_queryBg(): 5
NOTE: cache opening disk 0 of grp 1: DATA label:DATA
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x6188A14E (DATA)
Mon Dec 28 21:16:27 2015
* allocate domain 1, invalid = TRUE
Mon Dec 28 21:16:28 2015
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=6.266 group=1 (DATA)
NOTE: advancing ckpt for thread=1 ckpt=6.266
NOTE: cache recovered group 1 to fcn 0.3427
Mon Dec 28 21:16:28 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)
NOTE: LGWR found thread 1 closed at ABA 6.265
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.3427 ABA 7.266
NOTE: cache mounting group 1/0x6188A14E (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=1 incarn=0x6188a14e
NOTE: start heartbeating (grp 2)
kfdp_query(FRA): 7
kfdp_queryBg(): 7
NOTE: cache opening disk 0 of grp 2: FRA label:FRA
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 2/0x6198A14F (FRA)
* allocate domain 2, invalid = TRUE
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=6.183 group=2 (FRA)
NOTE: advancing ckpt for thread=1 ckpt=6.183
NOTE: cache recovered group 2 to fcn 0.1827
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (FRA)
NOTE: LGWR found thread 1 closed at ABA 6.182
NOTE: LGWR mounted thread 1 for diskgroup 2 (FRA)
NOTE: LGWR opening thread 1 at fcn 0.1827 ABA 7.183
NOTE: cache mounting group 2/0x6198A14F (FRA) succeeded
NOTE: cache ending mount (success) of group FRA number=2 incarn=0x6198a14f
NOTE: start heartbeating (grp 3)
kfdp_query(OCR): 9
kfdp_queryBg(): 9
NOTE: cache opening disk 0 of grp 3: OCR label:OCR
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 3/0x6198A150 (OCR)
* allocate domain 3, invalid = TRUE
NOTE: attached to recovery domain 3
Mon Dec 28 21:16:29 2015
NOTE: starting recovery of thread=1 ckpt=16.68 group=3 (OCR)
NOTE: advancing ckpt for thread=1 ckpt=16.68
NOTE: cache recovered group 3 to fcn 0.790
NOTE: LGWR attempting to mount thread 1 for diskgroup 3 (OCR)
NOTE: LGWR found thread 1 closed at ABA 16.67
NOTE: LGWR mounted thread 1 for diskgroup 3 (OCR)
NOTE: LGWR opening thread 1 at fcn 0.790 ABA 17.68
NOTE: cache mounting group 3/0x6198A150 (OCR) succeeded
NOTE: cache ending mount (success) of group OCR number=3 incarn=0x6198a150
Mon Dec 28 21:16:29 2015
kfdp_query(DATA): 10
kfdp_queryBg(): 10
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA was mounted
kfdp_query(FRA): 11
kfdp_queryBg(): 11
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup FRA was mounted
kfdp_query(OCR): 12
kfdp_queryBg(): 12
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCR was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
Mon Dec 28 21:16:31 2015
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.FRA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Mon Dec 28 21:16:35 2015
Starting background process ASMB
Mon Dec 28 21:16:35 2015
ASMB started with pid=24, OS id=5664
Mon Dec 28 21:17:53 2015
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))))' SCOPE=MEMORY SID='+ASM1';



-------------
NODE: 2
-------------

[root@rac2 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  ONLINE       rac2
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2



---  OCSSD.LOG

2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: begin on node(2), waittime 193000
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: set curtime (4294388080) for my node
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: scanning 32 nodes
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: Node rac1, number 1, is in an existing cluster with disk state 3
2015-12-27 00:09:03.053: [    CSSD][2888805264]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2015-12-27 00:09:03.572: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61997, LATS 4294388600, lastSeqNo 61997, uniqueness 1451148676, timestamp 1451155163/6166804
2015-12-27 00:09:04.052: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:04.581: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61998, LATS 4294389610, lastSeqNo 61998, uniqueness 1451148676, timestamp 1451155164/6167814
2015-12-27 00:09:04.955: [    CSSD][2899295120]clssnmSendingThread: sending join msg to all nodes
2015-12-27 00:09:04.955: [    CSSD][2899295120]clssnmSendingThread: sent 5 join msgs to all nodes
2015-12-27 00:09:05.054: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: [network]  failed connect attempt endp 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x80612, usrFlags 0x0 }, req 0x9f233e8 [00000000000002f5] { gipcConnectRequest : addr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', parentEndp 0x9f21488, ret gipcretConnection
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos op  :  sgipcnTcpConnect
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos dep :  Connection timed out (110)
2015-12-27 00:09:05.515: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos loc :  connect
2015-12-27 00:09:05.515: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos info:  addr '192.168.1.151:12874'
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssscSelect: conn complete ctx 0x99a1c48 endp 0x2ec
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmeventhndlr: node(1), endp(0x2ec) failed, probe((nil)) ninf->endp (0x2ec) CONNCOMPLETE
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmDiscHelper: rac1, node(1) connection failed, endp (0x2ec), probe((nil)), ninf->endp 0x2ec
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmDiscHelper: node 1 clean up, endp (0x2ec), init state 0, cur state 0
2015-12-27 00:09:05.515: [GIPCXCPT][2878315408]gipcInternalDissociate: obj 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2015-12-27 00:09:05.515: [GIPCXCPT][2878315408]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3215]: EXCEPTION[ ret gipcretFail (1) ]  failed to dissociate obj 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
2015-12-27 00:09:05.517: [    CSSD][2878315408]clssnmDiscEndp: gipcDestroy 0x2ec
2015-12-27 00:09:05.588: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61999, LATS 4294390620, lastSeqNo 61999, uniqueness 1451148676, timestamp 1451155165/6168814
2015-12-27 00:09:05.589: [    CSSD][2878315408]clssnmconnect: connecting to addr gipc://rac1:nm_rac-cluster#192.168.1.151#12874
2015-12-27 00:09:05.590: [    CSSD][2878315408]clssscConnect: endp 0x2f8 - cookie 0x99a1c48 - addr gipc://rac1:nm_rac-cluster#192.168.1.151#12874
2015-12-27 00:09:05.590: [    CSSD][2878315408]clssnmconnect: connecting to node(1), endp(0x2f8), flags 0x10002
2015-12-27 00:09:06.056: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:06.597: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62000, LATS 4294391630, lastSeqNo 62000, uniqueness 1451148676, timestamp 1451155166/6169824
2015-12-27 00:09:07.058: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:07.603: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62001, LATS 4294392630, lastSeqNo 62001, uniqueness 1451148676, timestamp 1451155167/6170834
2015-12-27 00:09:08.060: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:08.611: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62002, LATS 4294393640, lastSeqNo 62002, uniqueness 1451148676, timestamp 1451155168/6171844
2015-12-27 00:09:09.062: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
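The repeated "has a disk HB, but no network HB" messages above are the key symptom: node 2 can see node 1's heartbeats on the voting disk, but cannot reach it over the private interconnect. A rough sketch of how to check the interconnect from node 2 (the private IPs 192.168.1.151/152 are taken from the log above; adjust for your cluster):

```shell
# Run as the grid owner on rac2.
ping -c 3 192.168.1.151          # basic reachability of node 1's private IP
traceroute 192.168.1.151         # where the path breaks, if anywhere
$GRID_HOME/bin/oifcfg getif      # which interface GI uses as the cluster interconnect
```

If ping fails while the ASM logs show healthy disk access, the problem is the network path (firewall, cabling, or sshd/host configuration), not storage.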


--- CRSD.LOG

2015-12-26 20:42:40.834: [  OCRASM][3046749904]proprasmcl: asmhandle is NULL
2015-12-26 20:42:40.834: [  OCRRAW][3046749904]proprinit: Could not open raw device
2015-12-26 20:42:40.834: [  OCRASM][3046749904]proprasmcl: asmhandle is NULL
2015-12-26 20:42:40.834: [  OCRAPI][3046749904]a_init:16!: Backend init unsuccessful : [26]
2015-12-26 20:42:40.835: [  CRSOCR][3046749904] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2015-12-26 20:42:40.835: [    CRSD][3046749904][PANIC] CRSD exiting: Could not init OCR, code: 26
2015-12-26 20:42:40.835: [    CRSD][3046749904] Done.



--- GRID ALERT.LOG

(9009)]CRS-5822:Agent '/app/gridHome/bin/cssdagent_root' disconnected from server. Details at (:CRSAGF00117:) in /app/gridHome/log/rac2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2015-12-27 00:06:15.655
[ohasd(4740)]CRS-2112:The OLR service started on node rac2.
2015-12-27 00:06:20.613
[ohasd(4740)]CRS-8011:reboot advisory message from host: rac2, component: mo162335, with time stamp: L-2015-12-25-17:27:14.683
[ohasd(4740)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-27 00:06:20.926
[ohasd(4740)]CRS-8017:location: /etc/oracle/lastgasp has 44 reboot advisory log files, 1 were announced and 0 errors occurred
2015-12-27 00:06:43.252
[ohasd(4740)]CRS-2772:Server 'rac2' has been assigned to pool 'Free'.
2015-12-27 00:06:46.574
[ohasd(4740)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2015-12-27 00:06:51.107
[cssd(5034)]CRS-1713:CSSD daemon is started in clustered mode
2015-12-27 00:07:01.189
[cssd(5034)]CRS-1707:Lease acquisition for node rac2 number 2 completed
2015-12-27 00:07:01.268
[cssd(5034)]CRS-1605:CSSD voting file is online: ORCL:OCR; details in /app/gridHome/log/rac2/cssd/ocssd.log.


--- +ASM2.log

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +OCR/rac-cluster/asmparameterfile/registry.253.866216815
System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskgroups           = ""
  asm_power_limit          = 1
  diagnostic_dest          = "/app/oracleBase"
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.152
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Sat Dec 26 22:11:17 2015
PMON started with pid=2, OS id=6243
Sat Dec 26 22:11:17 2015
VKTM started with pid=3, OS id=6247 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Sat Dec 26 22:11:17 2015
GEN0 started with pid=4, OS id=6253
Sat Dec 26 22:11:17 2015
DIAG started with pid=5, OS id=6257
Sat Dec 26 22:11:18 2015
PING started with pid=6, OS id=6262
Sat Dec 26 22:11:18 2015
PSP0 started with pid=7, OS id=6267
Sat Dec 26 22:11:19 2015
DIA0 started with pid=8, OS id=6271
Sat Dec 26 22:11:19 2015
LMON started with pid=9, OS id=6276
Sat Dec 26 22:11:20 2015
LMD0 started with pid=10, OS id=6280
Sat Dec 26 22:11:20 2015
LMS0 started with pid=11, OS id=6284 at elevated priority
Sat Dec 26 22:11:21 2015
LMHB started with pid=12, OS id=6290
Sat Dec 26 22:11:21 2015
MMAN started with pid=13, OS id=6295
Sat Dec 26 22:11:22 2015
DBW0 started with pid=14, OS id=6299
Sat Dec 26 22:11:22 2015
LGWR started with pid=15, OS id=6303
Sat Dec 26 22:11:22 2015
SMON started with pid=17, OS id=6311
Sat Dec 26 22:11:23 2015
RBAL started with pid=18, OS id=6316
Sat Dec 26 22:11:23 2015
CKPT started with pid=16, OS id=6307
Sat Dec 26 22:11:23 2015
GMON started with pid=19, OS id=6320
Sat Dec 26 22:11:23 2015
MMON started with pid=20, OS id=6324
Sat Dec 26 22:11:23 2015
MMNL started with pid=21, OS id=6328
lmon registered with NM - instance number 2 (internal mem no 1)
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Sat Dec 26 22:11:25 2015
LCK0 started with pid=22, OS id=6337
Sat Dec 26 22:11:28 2015
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /app/oracleBase
Sat Dec 26 22:11:28 2015
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: Diskgroup used for Voting files is:
         OCR
Diskgroup with spfile:OCR
Diskgroup used for OCR is:OCR
NOTE: cache registered group OCR number=1 incarn=0x81c19341
NOTE: cache began mount (first) of group OCR number=1 incarn=0x81c19341
NOTE: Loaded library: /opt/oracle/extapi/32/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (ORCL:OCR)
Sat Dec 26 22:11:35 2015
NOTE: start heartbeating (grp 1)
kfdp_query(OCR): 3
kfdp_queryBg(): 3
NOTE: cache opening disk 0 of grp 1: OCR label:OCR
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x81C19341 (OCR)
Sat Dec 26 22:11:36 2015
* allocate domain 1, invalid = TRUE
Sat Dec 26 22:11:36 2015
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=14.64 group=1 (OCR)
NOTE: advancing ckpt for thread=1 ckpt=14.64
NOTE: cache recovered group 1 to fcn 0.780
Sat Dec 26 22:11:36 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (OCR)
NOTE: LGWR found thread 1 closed at ABA 14.63
NOTE: LGWR mounted thread 1 for diskgroup 1 (OCR)
NOTE: LGWR opening thread 1 at fcn 0.780 ABA 15.64
NOTE: cache mounting group 1/0x81C19341 (OCR) succeeded
NOTE: cache ending mount (success) of group OCR number=1 incarn=0x81c19341
Sat Dec 26 22:11:36 2015
kfdp_query(OCR): 4
kfdp_queryBg(): 4
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup OCR was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
Sat Dec 26 22:11:38 2015
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Sat Dec 26 22:18:24 2015
Detected partial connectivity in the cluster - IMR is being issued. This may cause some instances to be evicted
Evicting instance 1 from cluster
Reconfiguration started (old inc 2, new inc 4)
List of instances:
 2 (myinst: 2)
Sat Dec 26 22:18:25 2015
Shutting down instance (immediate)
Shutting down instance: further logons disabled
Stopping background process MMNL
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Sat Dec 26 22:18:25 2015
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Stopping background process MMON
License high water mark = 1
SQL> ALTER DISKGROUP ALL DISMOUNT /* asm agent */
NOTE: cache dismounting (clean) group 1/0x81C19341 (OCR)
Sat Dec 26 22:18:26 2015
NOTE: LGWR doing clean dismount of group 1 (OCR)
NOTE: LGWR closing thread 1 of diskgroup 1 (OCR) at ABA 15.65
NOTE: LGWR released thread recovery enqueue
Sat Dec 26 22:18:26 2015
freeing rdom 1
Sat Dec 26 22:18:26 2015
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x81C19341 (OCR)
kfdp_dismount(): 5
Sat Dec 26 22:18:26 2015
kfdp_dismountBg(): 5
NOTE: De-assigning number (1,0) from disk (ORCL:OCR)
SUCCESS: diskgroup OCR was dismounted
NOTE: cache deleting context for group OCR 1/-2118020287
SUCCESS: ALTER DISKGROUP ALL DISMOUNT /* asm agent */
Reconfiguration started (old inc 4, new inc 6)
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Sat Dec 26 22:18:30 2015
WARNING: failed to offline diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Sat Dec 26 22:18:31 2015
Stopping background process VKTM:
freeing rdom 0
Sat Dec 26 22:18:38 2015
Instance shutdown complete

--------------------
SOLUTION:
--------------------

In my case the root cause was SSH connectivity. The fix was to coordinate with the OS admin and repair SSH between the grid and oracle users across both nodes, in each direction.
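A minimal way to verify such a fix, assuming the node names rac1/rac2 and the grid and oracle software owners used in this post: each owner must be able to reach the other node without a password prompt, in both directions.

```shell
# Run on rac1, then repeat the mirror image on rac2.
# BatchMode=yes makes ssh fail fast instead of prompting for a password.
su - grid   -c "ssh -o BatchMode=yes rac2 date"
su - oracle -c "ssh -o BatchMode=yes rac2 date"
```

Each command should print the remote date with no prompt; any password prompt or timeout means the equivalence is still broken.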

--------------
Solution 4:
--------------
First, check whether the SSH connection works between the nodes; if it does not, you will see the errors shown above.

In this case the sshd problem was between the grid user on one node and the grid user on the other.


Problem: ssh: connect to host xxx.xxx.x.xx port 22: Connection timed out


--- Check whether the port is listening; also check DNS.
# ps -A | grep sshd
# lanscan
# netstat -in
# netstat -rn
# netstat -nat | grep 22
# netstat -ntl | grep :22
# sudo netstat -alnp | grep :22
# traceroute 10.230.xx.xx
# telnet 10.230.xx.xx 22
If none of these turn up the problem, reinstall ssh as a last resort.


Some other relevant solutions when Grid Infrastructure fails to start:

--------------
Solution 1:
--------------

Oracle 11.2 with ASM: crsd fails to start on the second node after a reboot.

Description:
OS: RHEL 5.8, x86_64. Oracle 11.2 CRS fails to start on the second node after a reboot.
Errors:
crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
crsd.log:
2013-11-05 15:58:19.095: [  OCRASM][1520055008]proprasmo: Error in open/create file in dg [DATACOST]
[  OCRASM][1520055008]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmo: kgfoCheckMount returned [7]
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmo: The ASM instance is down
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprioo: Failed to open [+DATACOST]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprioo: No OCR/OLR devices are usable
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmcl: asmhandle is NULL
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprinit: Could not open raw device
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmcl: asmhandle is NULL
2013-11-05 15:58:19.097: [  OCRAPI][1520055008]a_init:16!: Backend init unsuccessful : [26]
2013-11-05 15:58:19.097: [  CRSOCR][1520055008] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2013-11-05 15:58:19.097: [    CRSD][1520055008][PANIC] CRSD exiting: Could not init OCR, code: 26
2013-11-05 15:58:19.098: [    CRSD][1520055008] Done.

Try to start ASM manually (here a single user, oracle, owns both the grid and database homes; otherwise log in as your grid owner):

su - oracle

export ORACLE_HOME=$GRID_HOME

export ORACLE_SID=+ASM2

asmcmd

ASMCMD> startup
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
These errors relate to kernel semaphore parameters rather than actual disk space, provided the node does have free space (check with df -h).
-- Solution:

1. Edit /etc/sysctl.conf and modify the semaphore values to:
kernel.sem = 250 32000 100 200

2. Reboot the node.
Everything should now work. If crsd still does not come up, log in as the grid owner and try to start the ASM instance manually; you will see a different error, unrelated to free space.
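As a sketch, the semaphore change can also be applied without a reboot by reloading sysctl (the reboot in step 2 remains the conservative choice on a cluster node):

```shell
# As root: persist the limits, reload them, and verify.
echo "kernel.sem = 250 32000 100 200" >> /etc/sysctl.conf
sysctl -p                        # re-read /etc/sysctl.conf and apply the values
ipcs -ls                         # display the semaphore limits now in effect
```

The four values are semmsl, semmns, semopm, and semmni; the ORA-27300 "No space left on device" here refers to the semaphore table being exhausted, not the filesystem.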


---------------
Solution 2:
---------------


The problem was in the ASM parameter file:

+ASM1.asm_diskgroups='ARCH','REDO','OCR','VOTE'#Manual Mount
+ASM2.asm_diskgroups='REDO','VOTE','ARCH'#Manual Mount
I added the missing 'OCR' diskgroup to the +ASM2.asm_diskgroups parameter, then rebooted both machines, and everything was fine.

[root@faa5 tmp]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[root@faa5 tmp]# 
[root@faa5 tmp]# 
[root@faa5 tmp]# srvctl status database -d FAARACDB
Instance FAARACDB1 is running on node faa4
Instance FAARACDB2 is running on node faa5
[root@faa5 tmp]# 
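The same correction can be made from SQL*Plus instead of editing the parameter file by hand; a sketch, using the diskgroup names from this example and connecting through whichever ASM instance is still up:

```shell
# As the grid owner on a node whose ASM instance is running.
export ORACLE_SID=+ASM1
sqlplus / as sysasm <<'EOF'
ALTER SYSTEM SET asm_diskgroups='ARCH','REDO','OCR','VOTE' SID='+ASM2' SCOPE=SPFILE;
EOF
```

SCOPE=SPFILE with SID='+ASM2' updates only the failing instance's entry in the shared spfile; it takes effect at that instance's next startup.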

---------------
Solution 3:
---------------

Execute the root.sh script on the node that fails to start.
In this case, run root.sh on node RAC2.
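A sketch of that, assuming the Grid home path used elsewhere in this post; on 11.2, if root.sh has already been run once on the node, the clusterware configuration usually has to be deconfigured first:

```shell
# As root on rac2.
/app/gridHome/crs/install/rootcrs.pl -deconfig -force   # only if root.sh already ran once
/app/gridHome/root.sh
```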


