Wednesday, 6 January 2016

ORA-15077 Has a disk HB but no network HB

A few days ago I faced an issue in RAC. Below is the full explanation of the issue, how it was detected, and how it was resolved.

---------------
Problem : 
---------------

The issue appeared while starting Grid Infrastructure.
Only a single node worked at a time: on the second node most stack services started, but ora.crsd and ora.asm did not start. The same behaviour occurred in reverse when the other node was started first.


---------------
 NODE 1:
---------------

[root@rac1 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.crsd
      1        ONLINE  ONLINE       rac1
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     ACTIVE:0
ora.diskmon
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1



--- /app/gridHome/log/rac1/alertrac1.log

2015-12-28 21:14:01.314
[ohasd(5082)]CRS-2112:The OLR service started on node rac1.
2015-12-28 21:14:02.084
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: ag163608, with time stamp: L-2015-12-25-17:13:18.517
[ohasd(5082)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-28 21:14:02.368
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: ag203044, with time stamp: L-2015-12-26-22:10:09.463
[ohasd(5082)]CRS-8013:reboot advisory message text: Rebooting after limit 760 exceeded; disk timeout 710, network timeout 760, last heartbeat from CSSD at epoch seconds 1451148008.203, 1251 milliseconds ago based on invariant clock value of 5249663
2015-12-28 21:14:02.785
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: mo163608, with time stamp: L-2015-12-25-17:13:18.517
[ohasd(5082)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-28 21:14:02.936
[ohasd(5082)]CRS-8017:location: /etc/oracle/lastgasp has 34 reboot advisory log files, 3 were announced and 0 errors occurred
2015-12-28 21:14:33.180
[ohasd(5082)]CRS-2772:Server 'rac1' has been assigned to pool 'Free'.
2015-12-28 21:14:36.486
[ohasd(5082)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2015-12-28 21:14:44.368
[cssd(5347)]CRS-1713:CSSD daemon is started in clustered mode
2015-12-28 21:15:00.603
[cssd(5347)]CRS-1707:Lease acquisition for node rac1 number 1 completed
2015-12-28 21:15:00.707
[cssd(5347)]CRS-1605:CSSD voting file is online: ORCL:OCR; details in /app/gridHome/log/rac1/cssd/ocssd.log.
2015-12-28 21:15:18.881
[cssd(5347)]CRS-1625:Node rac2, number 2, was manually shut down
2015-12-28 21:15:28.300
[cssd(5347)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
2015-12-28 21:15:31.235
[ctssd(5410)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac1.
2015-12-28 21:15:31.337
[ctssd(5410)]CRS-2401:The Cluster Time Synchronization Service started on host rac1.
2015-12-28 21:16:40.946
[crsd(5644)]CRS-1012:The OCR service started on node rac1.
2015-12-28 21:16:46.469
[crsd(5644)]CRS-1201:CRSD started on node rac1.
2015-12-28 21:16:55.272
[crsd(5644)]CRS-2772:Server 'rac1' has been assigned to pool 'Generic'.
2015-12-28 21:16:55.273
[crsd(5644)]CRS-2772:Server 'rac1' has been assigned to pool 'ora.racdb'.



--- /app/gridHome/log/rac1/crsd/crsd.log

2015-12-28 21:17:01.730: [UiServer][2572688272] Sending message to PE. ctx= 0xb58edc0
2015-12-28 21:17:01.730: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.730: [UiServer][2572688272] Sending message to PE. ctx= 0xb82c1a8
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Cmd : 0x9900cc70 : flags: EVENT_TAG | HOST_TAG
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Processing PE command id=115. Description: [Start Resource : 0x9900cc70]
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Attribute overrides for the command: USR_ORA_OPI = true;
2015-12-28 21:17:01.735: [   CRSPE][2576890768] PE Command [ Start Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.735: [   CRSPE][2576890768] UI Command [Start Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.735: [   CRSPE][2576890768] Processing PE command id=116. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.737: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.737: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.738: [UiServer][2572688272] Container [ Name: ORDER
        MESSAGE:
        TextMessage[CRS-2545: Cannot operate on 'instance of ora.LISTENER.lsnr assigned to rac1'. It is locked by 'SYSTEM' for command 'Resource Autostart : rac1']
        MSGTYPE:
        TextMessage[1]
        OBJID:
        TextMessage[ora.LISTENER.lsnr]
        WAIT:
        TextMessage[0]
]
2015-12-28 21:17:01.738: [ COMMCRS][2572688272]clscsendx: (0xb7b25f8) Connection not active

2015-12-28 21:17:01.738: [UiServer][2572688272] S(0xb56cfb8): Error sending msg over socket.6
2015-12-28 21:17:01.738: [UiServer][2572688272] Fatal exception sending reply back to client.FatalCommsException : Failed to send response to client.
(File: clsMessageStream.cpp, line: 275

2015-12-28 21:17:01.739: [UiServer][2572688272] Container [ Name: UI_DATA
        ora.LISTENER.lsnr:
        TextMessage[233]
]
2015-12-28 21:17:01.739: [UiServer][2572688272] S(0xb56cfb8): No connection to client.6
2015-12-28 21:17:01.739: [UiServer][2572688272] Fatal exception sending reply back to client.FatalCommsException : Failed to send response to client.
(File: clsMessageStream.cpp, line: 275

2015-12-28 21:17:01.739: [UiServer][2572688272] Done for ctx=0xb58edc0
2015-12-28 21:17:01.743: [UiServer][2572688272] Done for ctx=0xb82c1a8
2015-12-28 21:17:01.858: [UiServer][2570587024] S(0xb7b2570): set Properties ( grid,0xb7a1f00)
2015-12-28 21:17:01.875: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.876: [UiServer][2572688272] Sending message to PE. ctx= 0xb7a2538
2015-12-28 21:17:01.877: [   CRSPE][2576890768] Processing PE command id=117. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.881: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.882: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.886: [UiServer][2572688272] Done for ctx=0xb7a2538
2015-12-28 21:17:01.922: [UiServer][2570587024] S(0xb6e09b0): set Properties ( grid,0xb7b8fb0)
2015-12-28 21:17:01.934: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.934: [UiServer][2572688272] Sending message to PE. ctx= 0xb7b25f8
2015-12-28 21:17:01.934: [   CRSPE][2576890768] Processing PE command id=118. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.936: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.937: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.938: [UiServer][2572688272] Done for ctx=0xb7b25f8
2015-12-28 21:17:01.947: [UiServer][2570587024] S(0xb79fe88): set Properties ( grid,0xb7a24e8)
2015-12-28 21:17:01.958: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.958: [UiServer][2572688272] Sending message to PE. ctx= 0xb70b938
2015-12-28 21:17:01.959: [   CRSPE][2576890768] Processing PE command id=119. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.960: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.960: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.961: [UiServer][2572688272] Done for ctx=0xb70b938
2015-12-28 21:17:02.224: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:555 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.224: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:550
2015-12-28 21:17:02.225: [   CRSPE][2576890768] Received reply to action [Start] message ID: 550
2015-12-28 21:17:02.371: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:564 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.371: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:563
2015-12-28 21:17:02.371: [   CRSPE][2576890768] Received reply to action [Start] message ID: 563
2015-12-28 21:17:02.374: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:555 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.374: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:550
2015-12-28 21:17:02.375: [   CRSPE][2576890768] Received reply to action [Start] message ID: 550
2015-12-28 21:17:02.375: [   CRSPE][2576890768] RI [ora.LISTENER_SCAN1.lsnr 1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:02.376: [   CRSPE][2576890768] CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded

2015-12-28 21:17:02.377: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.LISTENER_SCAN1.lsnr:rac1 : 0xb7b93a8
2015-12-28 21:17:02.377: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:02.459: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:564 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.459: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:563
2015-12-28 21:17:02.459: [   CRSPE][2576890768] Received reply to action [Start] message ID: 563
2015-12-28 21:17:02.459: [   CRSPE][2576890768] RI [ora.LISTENER.lsnr rac1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:02.460: [   CRSPE][2576890768] CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac1' succeeded

2015-12-28 21:17:02.460: [   CRSPE][2576890768] PE Command [ Resource State Change (ora.net1.network rac1 1) : 0x99004ff8 ] has completed
2015-12-28 21:17:02.462: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.LISTENER.lsnr:rac1 : 0xb7a0208
2015-12-28 21:17:02.462: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:02.462: [    AGFW][2587397008] Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:620
2015-12-28 21:17:02.462: [    AGFW][2587397008] Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:620
2015-12-28 21:17:02.513: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.LISTENER_SCAN1.lsnr 1 1] on [rac1] : 0x99005308]. found it no longer needed:CRS-0223: Resource 'ora.LISTENER_SCAN1.lsnr' has placement error.
. Finishing the op.
2015-12-28 21:17:02.514: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.ons rac1 1] on [rac1] : 0x99005810]. found it no longer needed:CRS-0223: Resource 'ora.ons' has placement error.
. Finishing the op.
2015-12-28 21:17:02.516: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.rac2.vip 1 1] on [rac1] : 0x99007e48]. found it no longer needed:CRS-0223: Resource 'ora.rac2.vip' has placement error.
. Finishing the op.
2015-12-28 21:17:02.517: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.LISTENER.lsnr rac1 1] on [rac1] : 0x99008fd0]. found it no longer needed:CRS-0223: Resource 'ora.LISTENER.lsnr' has placement error.
. Finishing the op.
2015-12-28 21:17:08.729: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.eons rac1 1] ID 4098:508 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:08.729: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.eons rac1 1] ID 4098:507
2015-12-28 21:17:08.729: [   CRSPE][2576890768] Received reply to action [Start] message ID: 507
2015-12-28 21:17:08.730: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.eons rac1 1] ID 4098:508 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:08.730: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.eons rac1 1] ID 4098:507
2015-12-28 21:17:08.730: [   CRSPE][2576890768] Received reply to action [Start] message ID: 507
2015-12-28 21:17:08.731: [   CRSPE][2576890768] RI [ora.eons rac1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:08.731: [   CRSPE][2576890768] CRS-2676: Start of 'ora.eons' on 'rac1' succeeded

2015-12-28 21:17:08.731: [   CRSPE][2576890768] PE Command [ Server Join: Resource Startup {rac1 } : 0x99004188 ] has completed
2015-12-28 21:17:08.732: [   CRSPE][2576890768] Updated state details for server rac1from [AUTOSTARTING RESOURCES] to : []
2015-12-28 21:17:08.732: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.eons:rac1 : 0xb733660
2015-12-28 21:17:08.732: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:08.733: [  CRSOCR][2585295760] Multi Write Batch processing...
2015-12-28 21:17:08.733: [    AGFW][2587397008] Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:640
2015-12-28 21:17:08.733: [    AGFW][2587397008] Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:640
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] Reporter skipping ServerStateChange event to EVM
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] Publishing event: Cluster Server State Change Event for rac1becoming ONLINE : 0xb7b20e0
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:08.775: [   CRSPE][2576890768] Resource Autostart completed for rac1
2015-12-28 21:17:08.775: [  CRSOCR][2585295760] Multi Write Batch done.
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publisher initialized flushing buffered events.
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7db288
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a7aa0
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb644228
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b00d8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7ab9e0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb79d6d8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a9818
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b0178
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb531e08
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7ab2a0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7addb0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb73a9f8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb70a520
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7af480
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7336e8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb6bd5d0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b93a8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a0208
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb733660
2015-12-28 21:17:09.776: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b20e0
2015-12-28 21:17:52.973: [UiServer][2570587024] S(0xb7b80b8): set Properties ( grid,0xb6df8b8)
2015-12-28 21:17:52.985: [UiServer][2572688272] processMessage called
2015-12-28 21:17:52.985: [UiServer][2572688272] Sending message to PE. ctx= 0xb70a5b8
2015-12-28 21:17:52.985: [   CRSPE][2576890768] Processing PE command id=120. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:52.987: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:52.987: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:52.993: [UiServer][2572688272] Done for ctx=0xb70a5b8
2015-12-28 21:17:52.996: [UiServer][2570587024] S(0xb7bae90): set Properties ( grid,0xb70cdf0)
2015-12-28 21:17:53.008: [UiServer][2572688272] processMessage called
2015-12-28 21:17:53.008: [UiServer][2572688272] Sending message to PE. ctx= 0xb7b9ee0
2015-12-28 21:17:53.008: [   CRSPE][2576890768] Processing PE command id=121. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:53.010: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:53.010: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:53.014: [UiServer][2572688272] Done for ctx=0xb7b9ee0
2015-12-28 21:17:53.021: [UiServer][2570587024] S(0xb70d018): set Properties ( grid,0xb6df8b8)
2015-12-28 21:17:53.033: [UiServer][2572688272] processMessage called
2015-12-28 21:17:53.033: [UiServer][2572688272] Sending message to PE. ctx= 0xb70a5b8
2015-12-28 21:17:53.033: [   CRSPE][2576890768] Processing PE command id=122. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:53.035: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:53.035: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:53.064: [UiServer][2572688272] Done for ctx=0xb70a5b8
2015-12-28 21:17:53.065: [    AGFW][2587397008] Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:198
2015-12-28 21:17:53.065: [    AGFW][2587397008] Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:198
2015-12-28 21:20:34.067: [UiServer][2570587024] S(0xb7078c8): set Properties ( root,0xb6df8b8)
2015-12-28 21:20:34.078: [UiServer][2572688272] processMessage called
2015-12-28 21:20:34.079: [UiServer][2572688272] Sending message to PE. ctx= 0xb6e1bd0
2015-12-28 21:20:34.079: [   CRSPE][2576890768] Processing PE command id=123. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:20:34.080: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:20:34.080: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:20:34.083: [UiServer][2572688272] Done for ctx=0xb6e1bd0
2015-12-28 21:25:34.156: [UiServer][2570587024] S(0xb70e9c0): set Properties ( root,0xb70cdf0)
2015-12-28 21:25:34.168: [UiServer][2572688272] processMessage called
2015-12-28 21:25:34.168: [UiServer][2572688272] Sending message to PE. ctx= 0xb6df8c8
2015-12-28 21:25:34.170: [   CRSPE][2576890768] Processing PE command id=124. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:25:34.171: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:25:34.171: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:25:34.172: [UiServer][2572688272] Done for ctx=0xb6df8c8
2015-12-28 21:30:34.209: [UiServer][2570587024] S(0xb7a35f8): set Properties ( root,0xb70cdf0)
2015-12-28 21:30:34.220: [UiServer][2572688272] processMessage called
2015-12-28 21:30:34.221: [UiServer][2572688272] Sending message to PE. ctx= 0xb736340
2015-12-28 21:30:34.222: [   CRSPE][2576890768] Processing PE command id=125. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:30:34.222: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:30:34.223: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:30:34.226: [UiServer][2572688272] Done for ctx=0xb736340
2015-12-28 21:35:34.276: [UiServer][2570587024] S(0xb70d468): set Properties ( root,0xb73ca58)
2015-12-28 21:35:34.288: [UiServer][2572688272] processMessage called
2015-12-28 21:35:34.289: [UiServer][2572688272] Sending message to PE. ctx= 0xb6dfd38
2015-12-28 21:35:34.290: [   CRSPE][2576890768] Processing PE command id=126. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:35:34.291: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:35:34.291: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:35:34.292: [UiServer][2572688272] Done for ctx=0xb6dfd38
2015-12-28 21:40:34.542: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb70e238)
2015-12-28 21:40:34.554: [UiServer][2572688272] processMessage called
2015-12-28 21:40:34.554: [UiServer][2572688272] Sending message to PE. ctx= 0xb739c30
2015-12-28 21:40:34.555: [   CRSPE][2576890768] Processing PE command id=127. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:40:34.555: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:40:34.555: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:40:34.557: [UiServer][2572688272] Done for ctx=0xb739c30
2015-12-28 21:45:34.632: [UiServer][2570587024] S(0xb73cb78): set Properties ( root,0xb7b8000)
2015-12-28 21:45:34.644: [UiServer][2572688272] processMessage called
2015-12-28 21:45:34.644: [UiServer][2572688272] Sending message to PE. ctx= 0xb6b81a8
2015-12-28 21:45:34.645: [   CRSPE][2576890768] Processing PE command id=128. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:45:34.645: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:45:34.646: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:45:34.650: [UiServer][2572688272] Done for ctx=0xb6b81a8
2015-12-28 21:50:34.776: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb7b8000)
2015-12-28 21:50:34.793: [UiServer][2572688272] processMessage called
2015-12-28 21:50:34.794: [UiServer][2572688272] Sending message to PE. ctx= 0xb6e3ee8
2015-12-28 21:50:34.795: [   CRSPE][2576890768] Processing PE command id=129. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:50:34.795: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:50:34.796: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:50:34.807: [UiServer][2572688272] Done for ctx=0xb6e3ee8
2015-12-28 21:55:34.772: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb7b8000)
2015-12-28 21:55:34.785: [UiServer][2572688272] processMessage called
2015-12-28 21:55:34.785: [UiServer][2572688272] Sending message to PE. ctx= 0xb7af430
2015-12-28 21:55:34.786: [   CRSPE][2576890768] Processing PE command id=130. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:55:34.787: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:55:34.787: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:55:34.788: [UiServer][2572688272] Done for ctx=0xb7af430



--- /app/gridHome/log/rac1/cssd/ocssd.log

2015-12-28 21:58:52.166: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:58:55.061: [    CSSD][2998131600]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:OCR sched delay 1840 > margin 1500 cur_ms 1931954 lastalive 1930114
2015-12-28 21:58:55.061: [    CSSD][2998131600]clssnmvSchedDiskThreads: KillBlockThread for voting file ORCL:OCR sched delay 1830 > margin 1500  cur_ms 1931954 lastalive 1930124
2015-12-28 21:58:57.059: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:58:57.059: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:01.068: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:01.068: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:05.079: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:05.079: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:09.091: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:09.091: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:13.099: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:13.099: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:18.110: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:18.110: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:23.120: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:23.120: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:28.131: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:28.131: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:33.140: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:33.140: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:37.150: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:37.150: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:41.158: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:41.158: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:45.168: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:45.168: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:49.176: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:49.177: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:53.186: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:53.186: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:57.195: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:57.195: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 22:00:01.206: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:01.206: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 22:00:06.217: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:06.217: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:11.228: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:11.228: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:16.239: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:16.239: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:21.249: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:21.249: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes


--- /app/oracleBase/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +OCR/rac-cluster/asmparameterfile/registry.253.866216815
System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "ORCL:*"
  asm_diskgroups           = "FRA"
  asm_diskgroups           = "DATA"
  asm_power_limit          = 1
  diagnostic_dest          = "/app/oracleBase"
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.151
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Mon Dec 28 21:16:07 2015
PMON started with pid=2, OS id=5521
Mon Dec 28 21:16:08 2015
VKTM started with pid=3, OS id=5525 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Mon Dec 28 21:16:08 2015
GEN0 started with pid=4, OS id=5531
Mon Dec 28 21:16:08 2015
DIAG started with pid=5, OS id=5535
Mon Dec 28 21:16:08 2015
PING started with pid=6, OS id=5539
Mon Dec 28 21:16:08 2015
PSP0 started with pid=7, OS id=5543
Mon Dec 28 21:16:10 2015
DIA0 started with pid=8, OS id=5547
Mon Dec 28 21:16:11 2015
LMON started with pid=9, OS id=5552
Mon Dec 28 21:16:11 2015
LMD0 started with pid=10, OS id=5556
Mon Dec 28 21:16:11 2015
LMS0 started with pid=11, OS id=5560 at elevated priority
Mon Dec 28 21:16:12 2015
LMHB started with pid=12, OS id=5566
Mon Dec 28 21:16:12 2015
MMAN started with pid=13, OS id=5571
Mon Dec 28 21:16:12 2015
LGWR started with pid=15, OS id=5579
Mon Dec 28 21:16:12 2015
DBW0 started with pid=14, OS id=5575
Mon Dec 28 21:16:12 2015
CKPT started with pid=16, OS id=5583
Mon Dec 28 21:16:13 2015
SMON started with pid=17, OS id=5587
Mon Dec 28 21:16:13 2015
GMON started with pid=19, OS id=5597
Mon Dec 28 21:16:13 2015
RBAL started with pid=18, OS id=5591
Mon Dec 28 21:16:13 2015
MMON started with pid=20, OS id=5601
Mon Dec 28 21:16:13 2015
MMNL started with pid=21, OS id=5605
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 1 (myinst: 1)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Mon Dec 28 21:16:16 2015
LCK0 started with pid=22, OS id=5615
Mon Dec 28 21:16:18 2015
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /app/oracleBase
Mon Dec 28 21:16:19 2015
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: Diskgroups listed in ASM_DISKGROUPS are
         FRA
         DATA
NOTE: Diskgroup used for Voting files is:
         OCR
Diskgroup with spfile:OCR
Diskgroup used for OCR is:OCR
NOTE: cache registered group DATA number=1 incarn=0x6188a14e
NOTE: cache began mount (first) of group DATA number=1 incarn=0x6188a14e
NOTE: cache registered group FRA number=2 incarn=0x6198a14f
NOTE: cache began mount (first) of group FRA number=2 incarn=0x6198a14f
NOTE: cache registered group OCR number=3 incarn=0x6198a150
NOTE: cache began mount (first) of group OCR number=3 incarn=0x6198a150
NOTE: Loaded library: /opt/oracle/extapi/32/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (ORCL:DATA)
NOTE: Assigning number (2,0) to disk (ORCL:FRA)
NOTE: Assigning number (3,0) to disk (ORCL:OCR)
Mon Dec 28 21:16:25 2015
NOTE: start heartbeating (grp 1)
kfdp_query(DATA): 5
kfdp_queryBg(): 5
NOTE: cache opening disk 0 of grp 1: DATA label:DATA
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x6188A14E (DATA)
Mon Dec 28 21:16:27 2015
* allocate domain 1, invalid = TRUE
Mon Dec 28 21:16:28 2015
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=6.266 group=1 (DATA)
NOTE: advancing ckpt for thread=1 ckpt=6.266
NOTE: cache recovered group 1 to fcn 0.3427
Mon Dec 28 21:16:28 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)
NOTE: LGWR found thread 1 closed at ABA 6.265
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.3427 ABA 7.266
NOTE: cache mounting group 1/0x6188A14E (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=1 incarn=0x6188a14e
NOTE: start heartbeating (grp 2)
kfdp_query(FRA): 7
kfdp_queryBg(): 7
NOTE: cache opening disk 0 of grp 2: FRA label:FRA
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 2/0x6198A14F (FRA)
* allocate domain 2, invalid = TRUE
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=6.183 group=2 (FRA)
NOTE: advancing ckpt for thread=1 ckpt=6.183
NOTE: cache recovered group 2 to fcn 0.1827
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (FRA)
NOTE: LGWR found thread 1 closed at ABA 6.182
NOTE: LGWR mounted thread 1 for diskgroup 2 (FRA)
NOTE: LGWR opening thread 1 at fcn 0.1827 ABA 7.183
NOTE: cache mounting group 2/0x6198A14F (FRA) succeeded
NOTE: cache ending mount (success) of group FRA number=2 incarn=0x6198a14f
NOTE: start heartbeating (grp 3)
kfdp_query(OCR): 9
kfdp_queryBg(): 9
NOTE: cache opening disk 0 of grp 3: OCR label:OCR
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 3/0x6198A150 (OCR)
* allocate domain 3, invalid = TRUE
NOTE: attached to recovery domain 3
Mon Dec 28 21:16:29 2015
NOTE: starting recovery of thread=1 ckpt=16.68 group=3 (OCR)
NOTE: advancing ckpt for thread=1 ckpt=16.68
NOTE: cache recovered group 3 to fcn 0.790
NOTE: LGWR attempting to mount thread 1 for diskgroup 3 (OCR)
NOTE: LGWR found thread 1 closed at ABA 16.67
NOTE: LGWR mounted thread 1 for diskgroup 3 (OCR)
NOTE: LGWR opening thread 1 at fcn 0.790 ABA 17.68
NOTE: cache mounting group 3/0x6198A150 (OCR) succeeded
NOTE: cache ending mount (success) of group OCR number=3 incarn=0x6198a150
Mon Dec 28 21:16:29 2015
kfdp_query(DATA): 10
kfdp_queryBg(): 10
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA was mounted
kfdp_query(FRA): 11
kfdp_queryBg(): 11
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup FRA was mounted
kfdp_query(OCR): 12
kfdp_queryBg(): 12
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCR was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
Mon Dec 28 21:16:31 2015
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.FRA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Mon Dec 28 21:16:35 2015
Starting background process ASMB
Mon Dec 28 21:16:35 2015
ASMB started with pid=24, OS id=5664
Mon Dec 28 21:17:53 2015
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))))' SCOPE=MEMORY SID='+ASM1';



-------------
NODE: 2
-------------

[root@rac2 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  ONLINE       rac2
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2



---  OCSSD.LOG

2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: begin on node(2), waittime 193000
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: set curtime (4294388080) for my node
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: scanning 32 nodes
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: Node rac1, number 1, is in an existing cluster with disk state 3
2015-12-27 00:09:03.053: [    CSSD][2888805264]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2015-12-27 00:09:03.572: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61997, LATS 4294388600, lastSeqNo 61997, uniqueness 1451148676, timestamp 1451155163/6166804
2015-12-27 00:09:04.052: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:04.581: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61998, LATS 4294389610, lastSeqNo 61998, uniqueness 1451148676, timestamp 1451155164/6167814
2015-12-27 00:09:04.955: [    CSSD][2899295120]clssnmSendingThread: sending join msg to all nodes
2015-12-27 00:09:04.955: [    CSSD][2899295120]clssnmSendingThread: sent 5 join msgs to all nodes
2015-12-27 00:09:05.054: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: [network]  failed connect attempt endp 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x80612, usrFlags 0x0 }, req 0x9f233e8 [00000000000002f5] { gipcConnectRequest : addr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', parentEndp 0x9f21488, ret gipcretConnection
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos op  :  sgipcnTcpConnect
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos dep :  Connection timed out (110)
2015-12-27 00:09:05.515: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos loc :  connect
2015-12-27 00:09:05.515: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos info:  addr '192.168.1.151:12874'
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssscSelect: conn complete ctx 0x99a1c48 endp 0x2ec
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmeventhndlr: node(1), endp(0x2ec) failed, probe((nil)) ninf->endp (0x2ec) CONNCOMPLETE
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmDiscHelper: rac1, node(1) connection failed, endp (0x2ec), probe((nil)), ninf->endp 0x2ec
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmDiscHelper: node 1 clean up, endp (0x2ec), init state 0, cur state 0
2015-12-27 00:09:05.515: [GIPCXCPT][2878315408]gipcInternalDissociate: obj 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2015-12-27 00:09:05.515: [GIPCXCPT][2878315408]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3215]: EXCEPTION[ ret gipcretFail (1) ]  failed to dissociate obj 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
2015-12-27 00:09:05.517: [    CSSD][2878315408]clssnmDiscEndp: gipcDestroy 0x2ec
2015-12-27 00:09:05.588: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61999, LATS 4294390620, lastSeqNo 61999, uniqueness 1451148676, timestamp 1451155165/6168814
2015-12-27 00:09:05.589: [    CSSD][2878315408]clssnmconnect: connecting to addr gipc://rac1:nm_rac-cluster#192.168.1.151#12874
2015-12-27 00:09:05.590: [    CSSD][2878315408]clssscConnect: endp 0x2f8 - cookie 0x99a1c48 - addr gipc://rac1:nm_rac-cluster#192.168.1.151#12874
2015-12-27 00:09:05.590: [    CSSD][2878315408]clssnmconnect: connecting to node(1), endp(0x2f8), flags 0x10002
2015-12-27 00:09:06.056: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:06.597: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62000, LATS 4294391630, lastSeqNo 62000, uniqueness 1451148676, timestamp 1451155166/6169824
2015-12-27 00:09:07.058: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:07.603: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62001, LATS 4294392630, lastSeqNo 62001, uniqueness 1451148676, timestamp 1451155167/6170834
2015-12-27 00:09:08.060: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:08.611: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62002, LATS 4294393640, lastSeqNo 62002, uniqueness 1451148676, timestamp 1451155168/6171844
2015-12-27 00:09:09.062: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0


--- CRSD.LOG

==> crsd.log <==
2015-12-26 20:42:40.834: [  OCRASM][3046749904]proprasmcl: asmhandle is NULL
2015-12-26 20:42:40.834: [  OCRRAW][3046749904]proprinit: Could not open raw device
2015-12-26 20:42:40.834: [  OCRASM][3046749904]proprasmcl: asmhandle is NULL
2015-12-26 20:42:40.834: [  OCRAPI][3046749904]a_init:16!: Backend init unsuccessful : [26]
2015-12-26 20:42:40.835: [  CRSOCR][3046749904] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2015-12-26 20:42:40.835: [    CRSD][3046749904][PANIC] CRSD exiting: Could not init OCR, code: 26
2015-12-26 20:42:40.835: [    CRSD][3046749904] Done.



--- GRID ALERT.LOG

(9009)]CRS-5822:Agent '/app/gridHome/bin/cssdagent_root' disconnected from server. Details at (:CRSAGF00117:) in /app/gridHome/log/rac2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2015-12-27 00:06:15.655
[ohasd(4740)]CRS-2112:The OLR service started on node rac2.
2015-12-27 00:06:20.613
[ohasd(4740)]CRS-8011:reboot advisory message from host: rac2, component: mo162335, with time stamp: L-2015-12-25-17:27:14.683
[ohasd(4740)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-27 00:06:20.926
[ohasd(4740)]CRS-8017:location: /etc/oracle/lastgasp has 44 reboot advisory log files, 1 were announced and 0 errors occurred
2015-12-27 00:06:43.252
[ohasd(4740)]CRS-2772:Server 'rac2' has been assigned to pool 'Free'.
2015-12-27 00:06:46.574
[ohasd(4740)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2015-12-27 00:06:51.107
[cssd(5034)]CRS-1713:CSSD daemon is started in clustered mode
2015-12-27 00:07:01.189
[cssd(5034)]CRS-1707:Lease acquisition for node rac2 number 2 completed
2015-12-27 00:07:01.268
[cssd(5034)]CRS-1605:CSSD voting file is online: ORCL:OCR; details in /app/gridHome/log/rac2/cssd/ocssd.log.


--- +ASM2.log

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +OCR/rac-cluster/asmparameterfile/registry.253.866216815
System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskgroups           = ""
  asm_power_limit          = 1
  diagnostic_dest          = "/app/oracleBase"
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.152
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Sat Dec 26 22:11:17 2015
PMON started with pid=2, OS id=6243
Sat Dec 26 22:11:17 2015
VKTM started with pid=3, OS id=6247 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Sat Dec 26 22:11:17 2015
GEN0 started with pid=4, OS id=6253
Sat Dec 26 22:11:17 2015
DIAG started with pid=5, OS id=6257
Sat Dec 26 22:11:18 2015
PING started with pid=6, OS id=6262
Sat Dec 26 22:11:18 2015
PSP0 started with pid=7, OS id=6267
Sat Dec 26 22:11:19 2015
DIA0 started with pid=8, OS id=6271
Sat Dec 26 22:11:19 2015
LMON started with pid=9, OS id=6276
Sat Dec 26 22:11:20 2015
LMD0 started with pid=10, OS id=6280
Sat Dec 26 22:11:20 2015
LMS0 started with pid=11, OS id=6284 at elevated priority
Sat Dec 26 22:11:21 2015
LMHB started with pid=12, OS id=6290
Sat Dec 26 22:11:21 2015
MMAN started with pid=13, OS id=6295
Sat Dec 26 22:11:22 2015
DBW0 started with pid=14, OS id=6299
Sat Dec 26 22:11:22 2015
LGWR started with pid=15, OS id=6303
Sat Dec 26 22:11:22 2015
SMON started with pid=17, OS id=6311
Sat Dec 26 22:11:23 2015
RBAL started with pid=18, OS id=6316
Sat Dec 26 22:11:23 2015
CKPT started with pid=16, OS id=6307
Sat Dec 26 22:11:23 2015
GMON started with pid=19, OS id=6320
Sat Dec 26 22:11:23 2015
MMON started with pid=20, OS id=6324
Sat Dec 26 22:11:23 2015
MMNL started with pid=21, OS id=6328
lmon registered with NM - instance number 2 (internal mem no 1)
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Sat Dec 26 22:11:25 2015
LCK0 started with pid=22, OS id=6337
Sat Dec 26 22:11:28 2015
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /app/oracleBase
Sat Dec 26 22:11:28 2015
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: Diskgroup used for Voting files is:
         OCR
Diskgroup with spfile:OCR
Diskgroup used for OCR is:OCR
NOTE: cache registered group OCR number=1 incarn=0x81c19341
NOTE: cache began mount (first) of group OCR number=1 incarn=0x81c19341
NOTE: Loaded library: /opt/oracle/extapi/32/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (ORCL:OCR)
Sat Dec 26 22:11:35 2015
NOTE: start heartbeating (grp 1)
kfdp_query(OCR): 3
kfdp_queryBg(): 3
NOTE: cache opening disk 0 of grp 1: OCR label:OCR
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x81C19341 (OCR)
Sat Dec 26 22:11:36 2015
* allocate domain 1, invalid = TRUE
Sat Dec 26 22:11:36 2015
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=14.64 group=1 (OCR)
NOTE: advancing ckpt for thread=1 ckpt=14.64
NOTE: cache recovered group 1 to fcn 0.780
Sat Dec 26 22:11:36 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (OCR)
NOTE: LGWR found thread 1 closed at ABA 14.63
NOTE: LGWR mounted thread 1 for diskgroup 1 (OCR)
NOTE: LGWR opening thread 1 at fcn 0.780 ABA 15.64
NOTE: cache mounting group 1/0x81C19341 (OCR) succeeded
NOTE: cache ending mount (success) of group OCR number=1 incarn=0x81c19341
Sat Dec 26 22:11:36 2015
kfdp_query(OCR): 4
kfdp_queryBg(): 4
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup OCR was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
Sat Dec 26 22:11:38 2015
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Sat Dec 26 22:18:24 2015
Detected partial connectivity in the cluster - IMR is being issued. This may cause some instances to be evicted
Evicting instance 1 from cluster
Reconfiguration started (old inc 2, new inc 4)
List of instances:
 2 (myinst: 2)
Sat Dec 26 22:18:25 2015
Shutting down instance (immediate)
Shutting down instance: further logons disabled
Stopping background process MMNL
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Sat Dec 26 22:18:25 2015
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Stopping background process MMON
License high water mark = 1
SQL> ALTER DISKGROUP ALL DISMOUNT /* asm agent */
NOTE: cache dismounting (clean) group 1/0x81C19341 (OCR)
Sat Dec 26 22:18:26 2015
NOTE: LGWR doing clean dismount of group 1 (OCR)
NOTE: LGWR closing thread 1 of diskgroup 1 (OCR) at ABA 15.65
NOTE: LGWR released thread recovery enqueue
Sat Dec 26 22:18:26 2015
freeing rdom 1
Sat Dec 26 22:18:26 2015
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x81C19341 (OCR)
kfdp_dismount(): 5
Sat Dec 26 22:18:26 2015
kfdp_dismountBg(): 5
NOTE: De-assigning number (1,0) from disk (ORCL:OCR)
SUCCESS: diskgroup OCR was dismounted
NOTE: cache deleting context for group OCR 1/-2118020287
SUCCESS: ALTER DISKGROUP ALL DISMOUNT /* asm agent */
Reconfiguration started (old inc 4, new inc 6)
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Sat Dec 26 22:18:30 2015
WARNING: failed to offline diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Sat Dec 26 22:18:31 2015
Stopping background process VKTM:
freeing rdom 0
Sat Dec 26 22:18:38 2015
Instance shutdown complete

--------------------
SOLUTION: -
--------------------

In my case the problem was SSH. The solution was to coordinate with the OS admin and resolve the SSH connectivity issue, so that the grid and oracle users could connect between the two nodes in both directions.
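
Since node 2 was reporting a disk heartbeat but no network heartbeat, a quick first check is the private interconnect and user equivalence between the nodes. This is only a minimal sketch; the node names rac1/rac2, the private IP 192.168.1.151 and the grid home /app/gridHome are taken from the logs above, so substitute your own values.

# From rac2, check the interconnect interface and reachability of rac1's private IP
# /app/gridHome/bin/oifcfg getif          # shows which interface is flagged cluster_interconnect
# ping -c 2 192.168.1.151
# traceroute 192.168.1.151

# As the grid user, verify passwordless SSH in both directions
$ ssh rac1 date
$ ssh rac1 "ssh rac2 date"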

--------------
Solution 4:
--------------
First check whether the SSH connection works. When it does not, the errors shown above also appear.

sshd issue from the grid user on one node to the grid user on the other


Problem: ssh: connect to host xxx.xxx.x.xx port 22: Connection timed out


--- Check whether the port is listening; also check DNS.
# ps -A | grep sshd            # verify that sshd is running
# lanscan 
# netstat -in
# netstat -rn
# netstat -nat | grep 22
# netstat -ntl |grep :22
# sudo netstat -alnp|grep :22
# traceroute 10.230.xx.xx
# telnet 10.230.xx.xx 22
# reinstall ssh 
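
A compact follow-up on the target node itself (RHEL-style commands; a sketch, adjust for your platform): if sshd is running and listening but the client still times out, a host firewall is a common cause.

# service sshd status            # confirm sshd is running on the target node
# service iptables status        # check whether a host firewall could be blocking port 22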


Some other relevant solutions that may apply when facing issues starting Grid Infrastructure are given below.

--------------
Solution 1:
--------------

Oracle 11.2 with ASM: crsd fails to start on the second node after a reboot.

Description:
OS: RHEL 5.8 x86-64. Oracle 11.2 CRS fails to start on the second node after a reboot.
Errors:
crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
crsd.log:
2013-11-05 15:58:19.095: [  OCRASM][1520055008]proprasmo: Error in open/create file in dg [DATACOST]
[  OCRASM][1520055008]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmo: kgfoCheckMount returned [7]
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmo: The ASM instance is down
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprioo: Failed to open [+DATACOST]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprioo: No OCR/OLR devices are usable
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmcl: asmhandle is NULL
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprinit: Could not open raw device
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmcl: asmhandle is NULL
2013-11-05 15:58:19.097: [  OCRAPI][1520055008]a_init:16!: Backend init unsuccessful : [26]
2013-11-05 15:58:19.097: [  CRSOCR][1520055008] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2013-11-05 15:58:19.097: [    CRSD][1520055008][PANIC] CRSD exiting: Could not init OCR, code: 26
2013-11-05 15:58:19.098: [    CRSD][1520055008] Done.

Try to start ASM manually (here a single user, oracle, owns both grid and the database; otherwise log in as your grid owner).

su - oracle

export ORACLE_HOME=$GRID_HOME

export ORACLE_SID=+ASM2

asmcmd

ASMCMD> startup
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
If you do have free space on the node (check with df -h), this error is related to kernel semaphore parameter settings rather than disk space.
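
To see the current semaphore limits before changing anything (standard Linux commands, not specific to this environment):

# cat /proc/sys/kernel/sem      # semmsl semmns semopm semmni
# ipcs -ls                      # summarised semaphore limits
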
-- Solution:

1. Edit the /etc/sysctl.conf file and change the kernel.sem setting (the fourth value is semmni) to:
kernel.sem = 250 32000 100 200

2. Reboot the node.
Everything should now work, but if crsd still does not start, log in as the grid owner and try to start the ASM instance manually.
You will then see a different error, no longer related to free space.
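
If a reboot is not convenient, the new value can usually be loaded straight away (a sketch using the values above; new semaphore allocations pick up the change immediately):

# sysctl -p                     # reload /etc/sysctl.conf
# sysctl kernel.sem             # verify: kernel.sem = 250 32000 100 200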


---------------
Solution 2:
---------------


The problem was in the ASM parameter file (spfile):

+ASM1.asm_diskgroups='ARCH','REDO','OCR','VOTE'#Manual Mount
+ASM2.asm_diskgroups='REDO','VOTE','ARCH'#Manual Mount
I added the missing 'OCR' diskgroup to the +ASM2.asm_diskgroups parameter, then rebooted both machines, and everything was OK.
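
A minimal sketch of making that change from SQL*Plus on the second ASM instance (the diskgroup names are the ones used in this example):

SQL> ALTER SYSTEM SET asm_diskgroups='REDO','VOTE','ARCH','OCR' SCOPE=SPFILE SID='+ASM2';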

[root@faa5 tmp]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[root@faa5 tmp]# 
[root@faa5 tmp]# 
[root@faa5 tmp]# srvctl status database -d FAARACDB
Instance FAARACDB1 is running on node faa4
Instance FAARACDB2 is running on node faa5
[root@faa5 tmp]# 

---------------
Solution 3:
---------------

Re-run the root.sh script on the node that fails to start.
In this case, root.sh was executed on the RAC2 node.



RAC STARTUP ISSUES

11g R2


1. ocssd.log shows:


   2012-01-27 13:42:58.796: [ CSSD][19]clssnmvDHBValidateNCopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 223132864, wrtcnt, 1112, LATS 783238209,
lastSeqNo 1111, uniqueness 1327692232, timestamp 1327693378/787089065


2. For clusters of 3 or more nodes, 2 nodes form the cluster fine but the 3rd node fails after joining; ocssd.log shows:


   2012-02-09 11:33:53.048: [ CSSD][1120926016](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 2 nodes with leader 2, racnode2, is smaller than
   cohort of 2 nodes led by node 1, racnode1, based on map type 2
   2012-02-09 11:33:53.048: [ CSSD][1120926016]###################################
   2012-02-09 11:33:53.048: [ CSSD][1120926016]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread


3. Ocssd.bin startup times out after 10 minutes


   2012-04-08 12:04:33.153: [    CSSD][1]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1333911873
   …...
   2012-04-08 12:14:31.994: [    CSSD][5]clssgmShutDown: Received abortive shutdown request from client.
   2012-04-08 12:14:31.994: [    CSSD][5]###################################
   2012-04-08 12:14:31.994: [    CSSD][5]clssscExit: CSSD aborting from thread GMClientListener
   2012-04-08 12:14:31.994: [    CSSD][5]###################################
   2012-04-08 12:14:31.994: [    CSSD][5](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally


Possible Causes:

1. Voting disk is missing or inaccessible

2. Multicast is not working (for 11.2.0.2+)

3. Private network is not working, ping or traceroute shows destination unreachable.

4. Private network is pingable with the normal ping command but not with a jumbo frame size (e.g. ping -s 8900) when jumbo frames are enabled (MTU: 9000+), or some cluster nodes have jumbo frames set (MTU: 9000) while the problem node does not (MTU: 1500)

5. Gpnpd does not come up, stuck in dispatch thread, Bug 10105195

6. Too many disks discovered via asm_diskstring or slow scan of disks due to Bug 13454354 on Solaris 11.2.0.3 only


Solutions:


1. Restore the voting disk access by checking storage access, disk permissions etc.
   If the voting disk is missing from the OCR ASM diskgroup, start CRS in exclusive mode and recreate the voting disk:
   # crsctl start crs -excl
   # crsctl replace votedisk <+OCRVOTE diskgroup>

2. Refer to Document 1212703.1 for multicast test and fix

3. Consult with the network administrator to restore private network access

4. Engage network admin to enable jumbo frame from switch layer if it is enabled at Network card

5. Kill the gpnpd.bin process on the surviving node, refer to Document 10105195.8. If ping/traceroute all work for the private network but the problem followed a failed 11.2.0.1 to 11.2.0.2 upgrade, check Bug 13416559 for the workaround.

6. Limit the number of ASM disks scanned by supplying a more specific asm_diskstring, refer to Bug 13583387. For Solaris 11.2.0.3 only, apply Patch 13250497.

Once the above issues are resolved, restart the Grid Infrastructure stack (see the sketch below).
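A minimal sketch of restarting the stack and verifying the voting disks as root (standard crsctl commands; the diskgroup used for the voting files comes from your own configuration):

# crsctl stop crs -f
# crsctl start crs
# crsctl query css votedisk
# crsctl check cluster -all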



Issue #3: CRS-4535: Cannot communicate with Cluster Ready Services, crsd.bin is not running


Symptoms:


1. Command '$GRID_HOME/bin/crsctl check crs' returns errors:
    CRS-4638: Oracle High Availability Services is online
    CRS-4535: Cannot communicate with Cluster Ready Services
    CRS-4529: Cluster Synchronization Services is online
    CRS-4534: Cannot communicate with Event Manager

2. Command 'ps -ef | grep d.bin' does not show a line similar to:
    root 23017 1 1 22:34 ? 00:00:00 /u01/app/11.2.0/grid/bin/crsd.bin reboot

3. Even if the crsd.bin process exists, command 'crsctl stat res -t -init' shows:
    ora.crsd
        1    ONLINE     INTERMEDIATE


Possible Causes:


1. Ocssd.bin is not running or resource ora.cssd is not ONLINE
2. The +ASM instance cannot start up
3. OCR is inaccessible
4. Network configuration has been changed causing gpnp profile.xml mismatch
5. The $GRID_HOME/crs/init/<hostname>.pid file for crsd has been removed or renamed manually; crsd.log shows: 'Error3 -2 writing PID to the file'
6. Ocr.loc content mismatch with other cluster nodes. Crsd.log shows: 'Shutdown CacheLocal. My hash ids don't match'


Solutions:


1. Check the solution for Issue 2, ensure ocssd.bin is running and ora.cssd is ONLINE

2. For 11.2.0.2+, ensure that the resource ora.cluster_interconnect.haip is ONLINE, refer to Document 1383737.1 for ASM startup  issues related to HAIP.

3. Ensure the OCR disk is available and accessible. If the OCR is lost for any reason, refer to Document 1062983.1 on how to restore  the OCR.

4. Restore the network configuration to match the interfaces defined in $GRID_HOME/gpnp/<hostname>/profiles/peer/profile.xml; refer to Document 1073502.1 for private network modifications.

5. Touch the <hostname>.pid file under $GRID_HOME/crs/init.
   For 11.2.0.1, the file is owned by the Grid Infrastructure owner user.
   For 11.2.0.2, the file is owned by the root user.

6. Use the ocrconfig -repair command to fix the ocr.loc content,
   for example, as root user:
# ocrconfig -repair -add +OCR2 (to add an entry)
# ocrconfig -repair -delete +OCR2 (to remove an entry)
ohasd.bin needs to be up and running in order for the above commands to work.
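Before repairing, a minimal sketch of comparing the ocr.loc content across nodes (Linux location shown; run on each node as root):

# cat /etc/oracle/ocr.loc     # should list the same OCR locations on every node
# ocrcheck                    # verifies OCR integrity and shows the configured locations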


Once the above issues are resolved, either restart the GI stack or start crsd.bin via:
   # crsctl start res ora.crsd -init



Issue #4: HAIP is not ONLINE (for 11.2.0.2+)


-Symptoms:

Command 'crsctl stat res -t -init' shows:
     ora.cluster_interconnect.haip
         1   ONLINE    OFFLINE


-Possible Causes:

1. Bug 10370797: START OF ‘ORA.CLUSTER_INTERCONNECT.HAIP’ FAILED DURING UPGRADE TO 11.2.0.2 (AIX only)
2. The private network information stored in the OCR does not match the actual OS setup, eg: oifcfg getif and ifconfig output mismatch (wrong interface name, subnet etc.)


-Solutions:

1. Apply Patch:10370797
2. Correct the OCR configuration so that it matches the OS network configuration (see the sketch below)
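A minimal sketch of comparing the interconnect definition stored by the clusterware with the OS configuration, and correcting it if needed (the interface name and subnet are examples only):

# $GRID_HOME/bin/oifcfg getif        # interfaces as registered with the clusterware
# /sbin/ifconfig -a                  # interfaces as configured in the OS
# $GRID_HOME/bin/oifcfg setif -global eth1/192.168.10.0:cluster_interconnect    # example correction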

More information about HAIP is provided in Document 1210883.1.



Issue #5: ASM instance does not start, ora.asm is OFFLINE


Symptoms:

1. Command 'ps -ef | grep asm' shows no ASM processes
2. Command 'crsctl stat res -t -init' shows:
         ora.asm
               1    ONLINE    OFFLINE


Possible Causes:

1. ASM spfile is corrupted
2. ASM discovery string is incorrect and therefore voting disk/OCR cannot be discovered
3. ASMlib configuration problem

Solutions:

1. Create a temporary pfile to start ASM instance, then recreate spfile, see Document 1095214.1 for more details.
2. Refer to Document 1077094.1 to correct the ASM discovery string.
3. Refer to Document 1050164.1 to fix ASMlib configuration.
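For solution 1, a minimal sketch of starting ASM from a temporary pfile (the path, discovery string and diskgroup name are examples only; adjust them to your environment):

$ export ORACLE_SID=+ASM1          # +ASM2 on the second node
$ cat /tmp/asm_init.ora
*.instance_type='asm'
*.asm_diskstring='ORCL:*'
*.asm_power_limit=1
*.large_pool_size=12M

$ sqlplus / as sysasm
SQL> startup pfile='/tmp/asm_init.ora';
SQL> create spfile='+OCRVOTE' from pfile='/tmp/asm_init.ora';   -- example diskgroup, once it mounts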


For further debugging of GI startup issues, please refer to Document 1050908.1 Troubleshoot Grid Infrastructure Startup Issues.






Saturday, 2 January 2016

12c : Installation On OEL 5.6

Setup – OS - OEL 5.6 x86_64   
Setup Packages at the time of OEL installation
  • GNOME Desktop Environment
  • Editors
  • Graphical Internet
  • Text-based Internet
  • Development Libraries
  • Development Tools
  • Server Configuration Tools
  • Administration Tools
  • Base
  • System Tools
  • X Window System
There is no need to create the user and groups, because they are already created when you set up OEL.
# passwd oracle
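If the oracle user and groups are missing on your installation, a minimal sketch of creating them via the oracle-validated package on OEL 5.x (assumes a configured yum repository):

# yum install oracle-validated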

Set secure Linux to permissive by editing the "/etc/selinux/config" file, making sure the SELINUX flag is set as follows.
SELINUX=permissive
Once the change is complete, restart the server or run the following command.
# setenforce Permissive
If you have the Linux firewall enabled, you will need to disable or configure it. To disable it, do the following.

# service iptables stop
# chkconfig iptables off
Create the directories in which the Oracle software will be installed.
mkdir -p /u01/app/oracle/product/12.1.0.2/db_1
chown -R oracle:oinstall /u01
chmod -R 775 /u01
Add the following lines at the end of the "/home/oracle/.bash_profile" file.
# Oracle Settings
export TMP=/tmp
export TMPDIR=$TMP

export ORACLE_HOSTNAME=el5.localdomain
export ORACLE_UNQNAME=cdb1
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/12.1.0.2/db_1
export ORACLE_SID=cdb1

export PATH=/usr/sbin:$PATH
export PATH=$ORACLE_HOME/bin:$PATH

export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib
export CLASSPATH=$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib

if [ $USER = "oracle" ]; then
  if [ $SHELL = "/bin/ksh" ]; then
    ulimit -p 16384
    ulimit -n 65536
  else
    ulimit -u 16384 -n 65536
  fi
fi



./runInstaller

12c : Manage Tablespace in PDB/CDB

Note: the UNDO tablespace is shared by all PDBs in the CDB; TEMP can be either shared or local (non-shared) per PDB.

--------------------------------------------------------
Add and drop tablespace in CDB and PDB.
--------------------------------------------------------

Pluggable Databases

To create tablespaces in a pluggable database (PDB), ensure that you are in the correct container before creating the tablespace.
PDBs have container IDs greater than 2; showing CON_NAME is also a good way to confirm that you are in the correct container.



1. Verify the container name and ID
select NAME, OPEN_MODE, CON_ID from V$PDBS;



2. Set container database.
alter session set container=PDB1;


3. See tablespace
select * from v$tablespaces;


4. Add/Drop tablespace
create tablespace SLOB datafile size 1M autoextend on next 1m;
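The drop side of step 4, as a minimal sketch (this also removes the datafiles):

drop tablespace SLOB including contents and datafiles;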


5. Verify tablespace.
select * from v$tablespaces;



--------------------------------------------------------------
How to check where the tablespaces are located
--------------------------------------------------------------

conn sys as sysdba
OR
alter session set container=cdb$root;



select tablespace_name,status,con_id from cdb_tablespaces
where tablespace_name='SLOB' order by 3;



select pdb_id,pdb_name,status,con_id from cdb_pdbs order by 1;



------------------------
Default tablespace
------------------------

ALTER PLUGGABLE DATABASE DEFAULT TABLESPACE users;
ALTER PLUGGABLE DATABASE DEFAULT TEMPORARY TABLESPACE temp;



show con_name
show con_id

12c : Clone a Remote Pluggable Database PDB

---------------------------------------------------------------------------
Multitenant : Clone a Remote Pluggable Database (PDB)
---------------------------------------------------------------------------

SQL> select name from v$database;

NAME
----------
CDB3

SQL> show con_name

-- create a user we can point a database link to.


SQL> drop user remote_clone_user CASCADE;


SQL> CREATE USER remote_clone_user IDENTIFIED BY remote_clone_user;
create user remote_clone_user identified by remote_clone_user
            *
ERROR at line 1:
ORA-65096: invalid common user or role name


SQL> alter session set "_ORACLE_SCRIPT"=true;


SQL> CREATE USER remote_clone_user IDENTIFIED BY remote_clone_user;


SQL> grant create session,create pluggable database to remote_clone_user;


-- Switch the remote PDB to read-only mode.
SQL> alter pluggable database pdb5 close;

sql> ALTER PLUGGABLE DATABASE pdb5 open read only;


-- Connect to root container in the local CDB.
SQL> conn sys/oracle@cdb1 AS sysdba


-- Check database name.
SQL> select name from v$database;

NAME
--------
CDB1

-- Create a database link to the remote PDB
SQL> Drop database link clone_link;


SQL> Create Database Link clone_link connect to remote_clone_user identified by remote_clone_user using 'pdb5';
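A quick sanity check of the link before cloning (a minimal sketch using the link and user created above):

SQL> select * from dual@clone_link;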

-- Clone the remote PDB over the database link.
-- If You are not using OMF, Convert the paths if necessary.
SQL> Create Pluggable Database pdb5new from pdb5@clone_link
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/cdb3/pdb5/',
'/u01/app/oracle/oradata/cdb1/pdb5new/');


SQL> alter pluggable database pdb5new open;


-- Check the PDBs.
SQL> select name,open_mode from v$pdbs order by name;

-- Open the remote PDB in read-write mode again.
SQL> conn sys/oracle@pdb5 as sysdba

alter pluggable database pdb5 close;

alter pluggable database pdb5 open;


-- Clean up new clone
conn sys/oracle@cdb1 as sysdba

alter pluggable database pdb5new close;

drop pluggable database pdb5new including datafiles;

-- Check the PDBs.

SQL> select name,open_mode from v$pdbs order by name;


12c : Clone a Remote Non-Container Database (Non-CDB)

---------------------------------------------------------------------------
Multitenant : Clone a Remote Non-CDB as a Pluggable Database (PDB)
---------------------------------------------------------------------------
-- In my case two databases exist on a single machine: one is a CDB and the second is a non-CDB.



-- Connect to remote non-CDB.
sqlplus sys/oracle@db12c as sysdba

SQL> select name,open_mode from v$database;

SQL> show con_name


-- create a user we can point a database link to.

SQL> drop user remote_clone_user CASCADE;


SQL> CREATE USER remote_clone_user IDENTIFIED BY remote_clone_user;
create user remote_clone_user identified by remote_clone_user
            *
ERROR at line 1:
ORA-65096: invalid common user or role name


SQL> alter session set "_ORACLE_SCRIPT"=true;


SQL> CREATE USER remote_clone_user IDENTIFIED BY remote_clone_user;


SQL> grant create session,create pluggable database to remote_clone_user;



-- Switch the remote non-CDB to read-only mode.

shutdown immediate
startup mount
alter database open read only;


-- Connect to root container in the local CDB.
sqlplus sys/oracle@cdb1 AS sysdba


-- Check database name.
SQL> select name from v$database;


-- Create a database link to the remote PDB
SQL> Drop database link clone_link;
SQL> Create Database Link clone_link connect to remote_clone_user identified by remote_clone_user using 'db12c';


-- Clone the remote PDB over the database link.
-- If You are not using OMF, Convert the paths if necessary.
SQL> Create Pluggable Database db12cpdb from NON$CDB@clone_link
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/DB12C/','/u01/app/oracle/oradata/CDB1/db12cpdb/');

-- Else, if the source database uses OMF, convert each datafile name explicitly:

Create Pluggable Database db12cpdb from NON$CDB@clone_link
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/DB12C/datafile/o1_mf_system_c6lq1ynk_.dbf',
'/u01/app/oracle/oradata/CDB1/DB12CPDB/datafile/system01.dbf',
'/u01/app/oracle/oradata/DB12C/datafile/o1_mf_sysaux_c6lpcz6v_.dbf',
'/u01/app/oracle/oradata/CDB1/DB12CPDB/datafile/sysaux01.dbf',
'/u01/app/oracle/oradata/DB12C/datafile/o1_mf_users_c6z4wyn8_.dbf',
'/u01/app/oracle/oradata/CDB1/DB12CPDB/datafile/users01.dbf',
'/u01/app/oracle/oradata/DB12C/datafile/o1_mf_temp_c6z65ljc_.tmp',
'/u01/app/oracle/oradata/CDB1/DB12CPDB/datafile/temp01.dbf');

-- Check the PDBs.
SQL> select name,open_mode from v$pdbs order by name;



-- Clean up extra components not found in a PDB (run noncdb_to_pdb.sql to complete the conversion):
-- ALTER SESSION SET CONTAINER=db12cpdb;
-- @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql


alter pluggable database db12cpdb OPEN;


-- Open the remote non-CDB in read-write mode again.
SQL> conn sys/oracle@db12c as sysdba

shutdown immediate
startup



-- Clean up new clone
conn sys/oracle@cdb1 as sysdba

alter pluggable database db12cpdb close;

drop pluggable database db12cpdb including datafiles;

-- Check the PDBs.

SQL> select name,open_mode from v$pdbs order by name;


12c : Clone a Local Pluggable Database PDB

-----------------------------------------------------------------------
Multitenant: Clone a Local Pluggable Database (PDB)
-----------------------------------------------------------------------

SQL> show con_name

SQL> show con_id

SQL> select name,open_mode from v$pdbs order by name;


-- If PDB1 is currently open read-write, close it first so it can be reopened read-only:
SQL> alter pluggable database PDB1 close;


SQL> alter pluggable database pdb1 open read only;


---------------------------------------
Verify PDB1 Datafile location
---------------------------------------

 select FILE_NAME from cdb_data_files where CON_ID=3;


-------------------------------------------------------------
-- Clone the PDB.
-- If you are not using OMF, convert the paths.
-------------------------------------------------------------

SQL> create pluggable database pdb2 from pdb1
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/cdb1/pdb1/',
  '/u01/app/oracle/oradata/cdb1/pdb2/');

OR

----------------------------------------
-- In My Case,I'm using OMF
----------------------------------------

create pluggable database pdb2 from pdb1
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_system_c62fglyh_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/system01.dbf',
'/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_sysaux_c62fgm20_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/sysaux01.dbf',
'/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_users_c62fm10j_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/users01.dbf',
'/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_slob_c62qjm9g_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/slob01.dbf',
'/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_temp_c62fgm21_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/temp01.dbf');


SQL> alter pluggable database pdb2 open;

SQL> select name ,open_mode from v$pdbs order by name;

SQL> alter pluggable database pdb1 close;

SQL> alter pluggable database pdb1 open;




---------------------------
-- Remove the Clone
----------------------------

SQL> alter pluggable database pdb2 close;

SQL> Drop pluggable database pdb2 including datafiles;

SQL> select name ,open_mode from v$pdbs order by name;

Friday, 1 January 2016

DB File Scattered Read

This event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return. A db file scattered read issues a scattered read to read the data into multiple discontinuous memory locations. A scattered read is usually a multiblock read. It can occur for a fast full scan (of an index) in addition to a full table scan.

The db file scattered read wait event identifies that a full scan is occurring. When performing a full scan into the buffer cache, the blocks read are read into memory locations that are not physically adjacent to each other. Such reads are called scattered read calls, because the blocks are scattered throughout memory. This is why the corresponding wait event is called 'db file scattered read'. Multiblock reads (up to DB_FILE_MULTIBLOCK_READ_COUNT blocks) due to full scans into the buffer cache show up as waits for 'db file scattered read'.

In one line: the Oracle session has requested, and is waiting for, multiple contiguous database blocks to be read from disk into the SGA.

Causes:

·                     Full table scans
·                     Fast full index scans
·                     Missing, unselective, or unusable indexes
·                     Tables not re-analyzed after index creation, or a lack of accurate optimizer statistics

If an application that has been running fine for a while suddenly clocks a lot of time on the db file scattered read event and there has not been a code change, check whether one or more indexes have been dropped or become unusable.

SELECT p1 "file#", p2 "block#", p3 "class#",event
FROM v$session_wait
WHERE event IN ('db file scattered read');

Where P1,P2,P3 are
P1 - The absolute file number
P2 - The block being read
P3 - The number of blocks (should be greater than 1)


SELECT relative_fno, owner, segment_name, segment_type
FROM DBA_EXTENTS
WHERE file_id = &FILE
AND &BLOCK BETWEEN block_id
AND block_id + blocks - 1;


SELECT
      s.p1 file_id, s.p2 block_id,o.object_name obj,
       o.object_type otype,
       s.SQL_ID,
       w.CLASS,event
FROM v$session s,
     ( SELECT ROWNUM CLASS#, CLASS FROM v$waitstat ) w,
      ALL_OBJECTS o
WHERE
 event IN ('db file scattered read')
AND
    w.CLASS#(+)=s.p3
   AND o.object_id (+)= s.row_wait_OBJ#
ORDER BY 1;


Find the related SQL statement  using sql_id

SELECT sql_text FROM v$sqltext WHERE sql_id = '&sql_id' ORDER BY piece;


You can also find the objects using below sql :-

Finding the SQL Statement executed by Sessions Waiting for I/O

SELECT SQL_ADDRESS, SQL_HASH_VALUE
  FROM V$SESSION
 WHERE EVENT = 'db file scattered read';

Finding the Object Requiring I/O

SELECT row_wait_obj#
  FROM V$SESSION
 WHERE EVENT = 'db file scattered read';

To identify the object and object type contended for, query DBA_OBJECTS using the value for ROW_WAIT_OBJ# that is returned from V$SESSION. For example:

SELECT owner, object_name, subobject_name, object_type
  FROM DBA_OBJECTS
 WHERE data_object_id = &row_wait_obj;


Once you identify the hot blocks, the segments they belong to, and the related queries, you can reduce the waits using the following solutions (a short sketch of items 2 and 7 follows the list):

1.   Optimize the SQL statements that initiated most of the waits. The goal is to minimize the number of physical and logical reads.

2.   Optimize multi-block I/O by setting the parameter DB_FILE_MULTIBLOCK_READ_COUNT; if its value is too high, which favors full scans, reduce it.

3.   Use partition pruning to reduce the number of blocks visited.

4.   Consider multiple buffer pools and cache frequently used indexes/tables in the KEEP pool. Make sure the query uses the right driving table, and that the SQL predicates are appropriate for a hash or merge join.

5.   If full scans are appropriate, can parallel query improve the response time?

6.   The objective is to reduce the demand for both logical and physical I/O, and this is best achieved through SQL and application tuning.

7.   Gather statistics on the related objects if they are missing.

8.   Check the LAST_ANALYZED date in the USER_TABLES view.
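A minimal sketch of items 2 and 7; the value 16 and the HR.EMPLOYEES table are illustrative only:

SQL> show parameter db_file_multiblock_read_count
SQL> alter system set db_file_multiblock_read_count=16 scope=both;

SQL> exec dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'EMPLOYEES', cascade => TRUE);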

Database Refresh / Cloning

Database Refresh / Cloning / Migration

---------------------------
Non-ASM 10g/11g 
---------------------------
Reference: 

http://oraclegurukul.blogspot.in/2012/12/rman-cloning-10g-11g.html

----------------------------------------------------
DB Refresh DataPump
----------------------------------------------------
Reference:

https://born4dba.wordpress.com/2012/05/19/schema-refresh-using-data-pump/

-----------------------------------------------------------------------
11g ASM DB Refresh With DataPump
-----------------------------------------------------------------------
Reference:

http://select-star-from.blogspot.in/2013/08/schema-refresh-using-asm-diskgroups.html

--------------------------------------------------------------------------------------------------------------
RAC 10g/11g RMAN Cloning/Refresh RAC ASM to ASM & ASM to NON ASM 
--------------------------------------------------------------------------------------------------------------
Reference: 

https://simplyoracle.blogspot.in/2010/04/rac-database-refreshcloning-on-asm.html
https://dbmentors.blogspot.in/2012/04/rman-duplicate-database-from-rac-asm-to.html
https://gavinsoorma.com/2012/04/rman-duplicate-database-rac-to-rac/
https://nikhilappsdba.blogspot.in/2014/01/cloning-11g-rac-asm-to-non-asm.html
https://www.pythian.com/blog/duplicate-from-active-database-using-rman-step-by-step/
https://tsbreddy.blogspot.in/2012/06/rman-refresh-steps.html
https://dbarock.blogspot.in/2012/08/database-refresh-using-rman.html

---------------------------
Migration :
---------------------------

NON-ASM to ASM in 10g Different host

NON-ASM to ASM in 11g (Different host Duplicate Command)

NON-ASM 10g to ASM 11g (Different host)

Non-ASM 11g Database To ASM (Same Host)


NON-ASM to ASM in 11g Different host

1- Connect to the source (non-ASM) database and take an RMAN full backup.

    #rman target /
    RMAN> backup as backupset database;
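Since step 2 also copies archive log files, a minimal sketch of backing them up as well (assuming the archived logs are still on disk):

    RMAN> backup as backupset archivelog all;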

2- Copy the source database backup files and archive log files to the 11g database server.


Install and configure standalone Grid Infrastructure and mount the ASM diskgroup (+DATA).
Install the 11.2.0.3 database software only.

The steps for 11g database:

3- Create a temporary pfile in $ORACLE_HOME/dbs:

*.audit_file_dest='/oracle/admin/orcl/adump'
*.compatible='11.2.0.0.0'
*.control_files='+DATA/orcl/controlfile/current.1','+DATA/orcl/controlfile/current.2'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_create_online_log_dest_1='+RECO'
*.db_name='ORCL'
*.diagnostic_dest='/oracle'
*.job_queue_processes=0
*.open_cursors=300
*.pga_aggregate_target=1G
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=2G
*.undo_tablespace='UNDOTBS1'

4- Start the instance in NOMOUNT state.

    # rman target /

    RMAN> startup nomount;

5- Restore controlfile from backup.
    RMAN> LIST BACKUP OF CONTROLFILE;

    RMAN> restore controlfile from '<backup piece name>';

6- Mount the database.

    RMAN> alter database mount;

7- Catalog RMAN backup files and archive log files.(Ensure you have restored the backup copy)

    RMAN> catalog start with '/oracle/ora11g/orcl/backup';
    RMAN> catalog start with '/oracle/ora11g/orcl/archive';

8- Restore the source database backup to the +DATA diskgroup and perform incomplete recovery.

    RMAN> run
    {
    allocate channel c1 device type disk;
    SET NEWNAME FOR DATAFILE 1 TO '+DATA';
    SET NEWNAME FOR DATAFILE 2 TO '+DATA';
    SET NEWNAME FOR DATAFILE 3 TO '+DATA';
    SET NEWNAME FOR DATAFILE 4 TO '+DATA';
    restore database until sequence 4;
    switch datafile all;
    recover database until sequence 4;
    }


    ### shutdown the database

    ASMCMD> cd DATA/ControlFile