Wednesday, 6 January 2016

ORA-15077 Has a disk HB but no network HB

A few days ago I faced an issue in RAC. Below is the full explanation of the issue, how it was detected, and how it was resolved.

---------------
Problem : 
---------------

The issue appeared while starting Grid Infrastructure.
Only a single node worked at a time: on the second node most stack services started, but ora.crsd and ora.asm did not start. The same behaviour occurred in reverse when the other node was started first.


---------------
 NODE 1:
---------------

[root@rac1 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     Started
ora.crsd
      1        ONLINE  ONLINE       rac1
ora.cssd
      1        ONLINE  ONLINE       rac1
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  ONLINE       rac1                     ACTIVE:0
ora.diskmon
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  ONLINE       rac1
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1



--- /app/gridHome/log/rac1/alertrac1.log

2015-12-28 21:14:01.314
[ohasd(5082)]CRS-2112:The OLR service started on node rac1.
2015-12-28 21:14:02.084
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: ag163608, with time stamp: L-2015-12-25-17:13:18.517
[ohasd(5082)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-28 21:14:02.368
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: ag203044, with time stamp: L-2015-12-26-22:10:09.463
[ohasd(5082)]CRS-8013:reboot advisory message text: Rebooting after limit 760 exceeded; disk timeout 710, network timeout 760, last heartbeat from CSSD at epoch seconds 1451148008.203, 1251 milliseconds ago based on invariant clock value of 5249663
2015-12-28 21:14:02.785
[ohasd(5082)]CRS-8011:reboot advisory message from host: rac1, component: mo163608, with time stamp: L-2015-12-25-17:13:18.517
[ohasd(5082)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-28 21:14:02.936
[ohasd(5082)]CRS-8017:location: /etc/oracle/lastgasp has 34 reboot advisory log files, 3 were announced and 0 errors occurred
2015-12-28 21:14:33.180
[ohasd(5082)]CRS-2772:Server 'rac1' has been assigned to pool 'Free'.
2015-12-28 21:14:36.486
[ohasd(5082)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2015-12-28 21:14:44.368
[cssd(5347)]CRS-1713:CSSD daemon is started in clustered mode
2015-12-28 21:15:00.603
[cssd(5347)]CRS-1707:Lease acquisition for node rac1 number 1 completed
2015-12-28 21:15:00.707
[cssd(5347)]CRS-1605:CSSD voting file is online: ORCL:OCR; details in /app/gridHome/log/rac1/cssd/ocssd.log.
2015-12-28 21:15:18.881
[cssd(5347)]CRS-1625:Node rac2, number 2, was manually shut down
2015-12-28 21:15:28.300
[cssd(5347)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 .
2015-12-28 21:15:31.235
[ctssd(5410)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac1.
2015-12-28 21:15:31.337
[ctssd(5410)]CRS-2401:The Cluster Time Synchronization Service started on host rac1.
2015-12-28 21:16:40.946
[crsd(5644)]CRS-1012:The OCR service started on node rac1.
2015-12-28 21:16:46.469
[crsd(5644)]CRS-1201:CRSD started on node rac1.
2015-12-28 21:16:55.272
[crsd(5644)]CRS-2772:Server 'rac1' has been assigned to pool 'Generic'.
2015-12-28 21:16:55.273
[crsd(5644)]CRS-2772:Server 'rac1' has been assigned to pool 'ora.racdb'.



--- /app/gridHome/log/rac1/crsd/crsd.log

2015-12-28 21:17:01.730: [UiServer][2572688272] Sending message to PE. ctx= 0xb58edc0
2015-12-28 21:17:01.730: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.730: [UiServer][2572688272] Sending message to PE. ctx= 0xb82c1a8
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Cmd : 0x9900cc70 : flags: EVENT_TAG | HOST_TAG
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Processing PE command id=115. Description: [Start Resource : 0x9900cc70]
2015-12-28 21:17:01.731: [   CRSPE][2576890768] Attribute overrides for the command: USR_ORA_OPI = true;
2015-12-28 21:17:01.735: [   CRSPE][2576890768] PE Command [ Start Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.735: [   CRSPE][2576890768] UI Command [Start Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.735: [   CRSPE][2576890768] Processing PE command id=116. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.737: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.737: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.738: [UiServer][2572688272] Container [ Name: ORDER
        MESSAGE:
        TextMessage[CRS-2545: Cannot operate on 'instance of ora.LISTENER.lsnr assigned to rac1'. It is locked by 'SYSTEM' for command 'Resource Autostart : rac1']
        MSGTYPE:
        TextMessage[1]
        OBJID:
        TextMessage[ora.LISTENER.lsnr]
        WAIT:
        TextMessage[0]
]
2015-12-28 21:17:01.738: [ COMMCRS][2572688272]clscsendx: (0xb7b25f8) Connection not active

2015-12-28 21:17:01.738: [UiServer][2572688272] S(0xb56cfb8): Error sending msg over socket.6
2015-12-28 21:17:01.738: [UiServer][2572688272] Fatal exception sending reply back to client.FatalCommsException : Failed to send response to client.
(File: clsMessageStream.cpp, line: 275

2015-12-28 21:17:01.739: [UiServer][2572688272] Container [ Name: UI_DATA
        ora.LISTENER.lsnr:
        TextMessage[233]
]
2015-12-28 21:17:01.739: [UiServer][2572688272] S(0xb56cfb8): No connection to client.6
2015-12-28 21:17:01.739: [UiServer][2572688272] Fatal exception sending reply back to client.FatalCommsException : Failed to send response to client.
(File: clsMessageStream.cpp, line: 275

2015-12-28 21:17:01.739: [UiServer][2572688272] Done for ctx=0xb58edc0
2015-12-28 21:17:01.743: [UiServer][2572688272] Done for ctx=0xb82c1a8
2015-12-28 21:17:01.858: [UiServer][2570587024] S(0xb7b2570): set Properties ( grid,0xb7a1f00)
2015-12-28 21:17:01.875: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.876: [UiServer][2572688272] Sending message to PE. ctx= 0xb7a2538
2015-12-28 21:17:01.877: [   CRSPE][2576890768] Processing PE command id=117. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.881: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.882: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.886: [UiServer][2572688272] Done for ctx=0xb7a2538
2015-12-28 21:17:01.922: [UiServer][2570587024] S(0xb6e09b0): set Properties ( grid,0xb7b8fb0)
2015-12-28 21:17:01.934: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.934: [UiServer][2572688272] Sending message to PE. ctx= 0xb7b25f8
2015-12-28 21:17:01.934: [   CRSPE][2576890768] Processing PE command id=118. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.936: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.937: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.938: [UiServer][2572688272] Done for ctx=0xb7b25f8
2015-12-28 21:17:01.947: [UiServer][2570587024] S(0xb79fe88): set Properties ( grid,0xb7a24e8)
2015-12-28 21:17:01.958: [UiServer][2572688272] processMessage called
2015-12-28 21:17:01.958: [UiServer][2572688272] Sending message to PE. ctx= 0xb70b938
2015-12-28 21:17:01.959: [   CRSPE][2576890768] Processing PE command id=119. Description: [Stat Resource : 0x9900cc70]
2015-12-28 21:17:01.960: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x9900cc70 ] has completed
2015-12-28 21:17:01.960: [   CRSPE][2576890768] UI Command [Stat Resource : 0x9900cc70] is replying to sender.
2015-12-28 21:17:01.961: [UiServer][2572688272] Done for ctx=0xb70b938
2015-12-28 21:17:02.224: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:555 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.224: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:550
2015-12-28 21:17:02.225: [   CRSPE][2576890768] Received reply to action [Start] message ID: 550
2015-12-28 21:17:02.371: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:564 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.371: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:563
2015-12-28 21:17:02.371: [   CRSPE][2576890768] Received reply to action [Start] message ID: 563
2015-12-28 21:17:02.374: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:555 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.374: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.LISTENER_SCAN1.lsnr 1 1] ID 4098:550
2015-12-28 21:17:02.375: [   CRSPE][2576890768] Received reply to action [Start] message ID: 550
2015-12-28 21:17:02.375: [   CRSPE][2576890768] RI [ora.LISTENER_SCAN1.lsnr 1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:02.376: [   CRSPE][2576890768] CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded

2015-12-28 21:17:02.377: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.LISTENER_SCAN1.lsnr:rac1 : 0xb7b93a8
2015-12-28 21:17:02.377: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:02.459: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:564 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:02.459: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.LISTENER.lsnr rac1 1] ID 4098:563
2015-12-28 21:17:02.459: [   CRSPE][2576890768] Received reply to action [Start] message ID: 563
2015-12-28 21:17:02.459: [   CRSPE][2576890768] RI [ora.LISTENER.lsnr rac1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:02.460: [   CRSPE][2576890768] CRS-2676: Start of 'ora.LISTENER.lsnr' on 'rac1' succeeded

2015-12-28 21:17:02.460: [   CRSPE][2576890768] PE Command [ Resource State Change (ora.net1.network rac1 1) : 0x99004ff8 ] has completed
2015-12-28 21:17:02.462: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.LISTENER.lsnr:rac1 : 0xb7a0208
2015-12-28 21:17:02.462: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:02.462: [    AGFW][2587397008] Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:620
2015-12-28 21:17:02.462: [    AGFW][2587397008] Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:620
2015-12-28 21:17:02.513: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.LISTENER_SCAN1.lsnr 1 1] on [rac1] : 0x99005308]. found it no longer needed:CRS-0223: Resource 'ora.LISTENER_SCAN1.lsnr' has placement error.
. Finishing the op.
2015-12-28 21:17:02.514: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.ons rac1 1] on [rac1] : 0x99005810]. found it no longer needed:CRS-0223: Resource 'ora.ons' has placement error.
. Finishing the op.
2015-12-28 21:17:02.516: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.rac2.vip 1 1] on [rac1] : 0x99007e48]. found it no longer needed:CRS-0223: Resource 'ora.rac2.vip' has placement error.
. Finishing the op.
2015-12-28 21:17:02.517: [   CRSPE][2576890768] Re-evaluation of queued op [START of [ora.LISTENER.lsnr rac1 1] on [rac1] : 0x99008fd0]. found it no longer needed:CRS-0223: Resource 'ora.LISTENER.lsnr' has placement error.
. Finishing the op.
2015-12-28 21:17:08.729: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.eons rac1 1] ID 4098:508 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:08.729: [    AGFW][2587397008] Agfw Proxy Server sending the reply to PE for message:RESOURCE_START[ora.eons rac1 1] ID 4098:507
2015-12-28 21:17:08.729: [   CRSPE][2576890768] Received reply to action [Start] message ID: 507
2015-12-28 21:17:08.730: [    AGFW][2587397008] Received the reply to the message: RESOURCE_START[ora.eons rac1 1] ID 4098:508 from the agent /app/gridHome/bin/oraagent_grid
2015-12-28 21:17:08.730: [    AGFW][2587397008] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_START[ora.eons rac1 1] ID 4098:507
2015-12-28 21:17:08.730: [   CRSPE][2576890768] Received reply to action [Start] message ID: 507
2015-12-28 21:17:08.731: [   CRSPE][2576890768] RI [ora.eons rac1 1] new external state [ONLINE] old value: [OFFLINE] on rac1 label = []
2015-12-28 21:17:08.731: [   CRSPE][2576890768] CRS-2676: Start of 'ora.eons' on 'rac1' succeeded

2015-12-28 21:17:08.731: [   CRSPE][2576890768] PE Command [ Server Join: Resource Startup {rac1 } : 0x99004188 ] has completed
2015-12-28 21:17:08.732: [   CRSPE][2576890768] Updated state details for server rac1from [AUTOSTARTING RESOURCES] to : []
2015-12-28 21:17:08.732: [  CRSRPT][2574789520] Publishing event: Cluster Resource State Change Event for ora.eons:rac1 : 0xb733660
2015-12-28 21:17:08.732: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:08.733: [  CRSOCR][2585295760] Multi Write Batch processing...
2015-12-28 21:17:08.733: [    AGFW][2587397008] Agfw Proxy Server received the message: CMD_COMPLETED[Proxy] ID 20482:640
2015-12-28 21:17:08.733: [    AGFW][2587397008] Agfw Proxy Server replying to the message: CMD_COMPLETED[Proxy] ID 20482:640
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] Reporter skipping ServerStateChange event to EVM
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] Publishing event: Cluster Server State Change Event for rac1becoming ONLINE : 0xb7b20e0
2015-12-28 21:17:08.733: [  CRSRPT][2574789520] FPE: Publisher not ready.
2015-12-28 21:17:08.775: [   CRSPE][2576890768] Resource Autostart completed for rac1
2015-12-28 21:17:08.775: [  CRSOCR][2585295760] Multi Write Batch done.
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publisher initialized flushing buffered events.
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7db288
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a7aa0
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb644228
2015-12-28 21:17:09.774: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b00d8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7ab9e0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb79d6d8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a9818
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b0178
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb531e08
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7ab2a0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7addb0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb73a9f8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb70a520
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7af480
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7336e8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb6bd5d0
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b93a8
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7a0208
2015-12-28 21:17:09.775: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb733660
2015-12-28 21:17:09.776: [  CRSRPT][2574789520] Publish to eons buffered event : 0xb7b20e0
2015-12-28 21:17:52.973: [UiServer][2570587024] S(0xb7b80b8): set Properties ( grid,0xb6df8b8)
2015-12-28 21:17:52.985: [UiServer][2572688272] processMessage called
2015-12-28 21:17:52.985: [UiServer][2572688272] Sending message to PE. ctx= 0xb70a5b8
2015-12-28 21:17:52.985: [   CRSPE][2576890768] Processing PE command id=120. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:52.987: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:52.987: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:52.993: [UiServer][2572688272] Done for ctx=0xb70a5b8
2015-12-28 21:17:52.996: [UiServer][2570587024] S(0xb7bae90): set Properties ( grid,0xb70cdf0)
2015-12-28 21:17:53.008: [UiServer][2572688272] processMessage called
2015-12-28 21:17:53.008: [UiServer][2572688272] Sending message to PE. ctx= 0xb7b9ee0
2015-12-28 21:17:53.008: [   CRSPE][2576890768] Processing PE command id=121. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:53.010: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:53.010: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:53.014: [UiServer][2572688272] Done for ctx=0xb7b9ee0
2015-12-28 21:17:53.021: [UiServer][2570587024] S(0xb70d018): set Properties ( grid,0xb6df8b8)
2015-12-28 21:17:53.033: [UiServer][2572688272] processMessage called
2015-12-28 21:17:53.033: [UiServer][2572688272] Sending message to PE. ctx= 0xb70a5b8
2015-12-28 21:17:53.033: [   CRSPE][2576890768] Processing PE command id=122. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:17:53.035: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:17:53.035: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:17:53.064: [UiServer][2572688272] Done for ctx=0xb70a5b8
2015-12-28 21:17:53.065: [    AGFW][2587397008] Agfw Proxy Server received the message: RESOURCE_STATUS[Proxy] ID 20481:198
2015-12-28 21:17:53.065: [    AGFW][2587397008] Agfw Proxy Server replying to the message: RESOURCE_STATUS[Proxy] ID 20481:198
2015-12-28 21:20:34.067: [UiServer][2570587024] S(0xb7078c8): set Properties ( root,0xb6df8b8)
2015-12-28 21:20:34.078: [UiServer][2572688272] processMessage called
2015-12-28 21:20:34.079: [UiServer][2572688272] Sending message to PE. ctx= 0xb6e1bd0
2015-12-28 21:20:34.079: [   CRSPE][2576890768] Processing PE command id=123. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:20:34.080: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:20:34.080: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:20:34.083: [UiServer][2572688272] Done for ctx=0xb6e1bd0
2015-12-28 21:25:34.156: [UiServer][2570587024] S(0xb70e9c0): set Properties ( root,0xb70cdf0)
2015-12-28 21:25:34.168: [UiServer][2572688272] processMessage called
2015-12-28 21:25:34.168: [UiServer][2572688272] Sending message to PE. ctx= 0xb6df8c8
2015-12-28 21:25:34.170: [   CRSPE][2576890768] Processing PE command id=124. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:25:34.171: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:25:34.171: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:25:34.172: [UiServer][2572688272] Done for ctx=0xb6df8c8
2015-12-28 21:30:34.209: [UiServer][2570587024] S(0xb7a35f8): set Properties ( root,0xb70cdf0)
2015-12-28 21:30:34.220: [UiServer][2572688272] processMessage called
2015-12-28 21:30:34.221: [UiServer][2572688272] Sending message to PE. ctx= 0xb736340
2015-12-28 21:30:34.222: [   CRSPE][2576890768] Processing PE command id=125. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:30:34.222: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:30:34.223: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:30:34.226: [UiServer][2572688272] Done for ctx=0xb736340
2015-12-28 21:35:34.276: [UiServer][2570587024] S(0xb70d468): set Properties ( root,0xb73ca58)
2015-12-28 21:35:34.288: [UiServer][2572688272] processMessage called
2015-12-28 21:35:34.289: [UiServer][2572688272] Sending message to PE. ctx= 0xb6dfd38
2015-12-28 21:35:34.290: [   CRSPE][2576890768] Processing PE command id=126. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:35:34.291: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:35:34.291: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:35:34.292: [UiServer][2572688272] Done for ctx=0xb6dfd38
2015-12-28 21:40:34.542: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb70e238)
2015-12-28 21:40:34.554: [UiServer][2572688272] processMessage called
2015-12-28 21:40:34.554: [UiServer][2572688272] Sending message to PE. ctx= 0xb739c30
2015-12-28 21:40:34.555: [   CRSPE][2576890768] Processing PE command id=127. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:40:34.555: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:40:34.555: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:40:34.557: [UiServer][2572688272] Done for ctx=0xb739c30
2015-12-28 21:45:34.632: [UiServer][2570587024] S(0xb73cb78): set Properties ( root,0xb7b8000)
2015-12-28 21:45:34.644: [UiServer][2572688272] processMessage called
2015-12-28 21:45:34.644: [UiServer][2572688272] Sending message to PE. ctx= 0xb6b81a8
2015-12-28 21:45:34.645: [   CRSPE][2576890768] Processing PE command id=128. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:45:34.645: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:45:34.646: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:45:34.650: [UiServer][2572688272] Done for ctx=0xb6b81a8
2015-12-28 21:50:34.776: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb7b8000)
2015-12-28 21:50:34.793: [UiServer][2572688272] processMessage called
2015-12-28 21:50:34.794: [UiServer][2572688272] Sending message to PE. ctx= 0xb6e3ee8
2015-12-28 21:50:34.795: [   CRSPE][2576890768] Processing PE command id=129. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:50:34.795: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:50:34.796: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:50:34.807: [UiServer][2572688272] Done for ctx=0xb6e3ee8
2015-12-28 21:55:34.772: [UiServer][2570587024] S(0xb73bc60): set Properties ( root,0xb7b8000)
2015-12-28 21:55:34.785: [UiServer][2572688272] processMessage called
2015-12-28 21:55:34.785: [UiServer][2572688272] Sending message to PE. ctx= 0xb7af430
2015-12-28 21:55:34.786: [   CRSPE][2576890768] Processing PE command id=130. Description: [Stat Resource : 0x99000d80]
2015-12-28 21:55:34.787: [   CRSPE][2576890768] PE Command [ Stat Resource : 0x99000d80 ] has completed
2015-12-28 21:55:34.787: [   CRSPE][2576890768] UI Command [Stat Resource : 0x99000d80] is replying to sender.
2015-12-28 21:55:34.788: [UiServer][2572688272] Done for ctx=0xb7af430



--- /app/gridHome/log/rac1/cssd/ocssd.log

2015-12-28 21:58:52.166: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:58:55.061: [    CSSD][2998131600]clssnmvSchedDiskThreads: DiskPingThread for voting file ORCL:OCR sched delay 1840 > margin 1500 cur_ms 1931954 lastalive 1930114
2015-12-28 21:58:55.061: [    CSSD][2998131600]clssnmvSchedDiskThreads: KillBlockThread for voting file ORCL:OCR sched delay 1830 > margin 1500  cur_ms 1931954 lastalive 1930124
2015-12-28 21:58:57.059: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:58:57.059: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:01.068: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:01.068: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:05.079: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:05.079: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:09.091: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:09.091: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:13.099: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:13.099: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:18.110: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:18.110: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:23.120: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:23.120: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:28.131: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:28.131: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:33.140: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:33.140: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 21:59:37.150: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:37.150: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:41.158: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:41.158: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:45.168: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:45.168: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:49.176: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:49.177: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:53.186: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:53.186: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 21:59:57.195: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 21:59:57.195: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 22:00:01.206: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:01.206: [    CSSD][2898258832]clssnmSendingThread: sent 4 status msgs to all nodes
2015-12-28 22:00:06.217: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:06.217: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:11.228: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:11.228: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:16.239: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:16.239: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes
2015-12-28 22:00:21.249: [    CSSD][2898258832]clssnmSendingThread: sending status msg to all nodes
2015-12-28 22:00:21.249: [    CSSD][2898258832]clssnmSendingThread: sent 5 status msgs to all nodes


--- /app/oracleBase/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +OCR/rac-cluster/asmparameterfile/registry.253.866216815
System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "ORCL:*"
  asm_diskgroups           = "FRA"
  asm_diskgroups           = "DATA"
  asm_power_limit          = 1
  diagnostic_dest          = "/app/oracleBase"
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.151
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Mon Dec 28 21:16:07 2015
PMON started with pid=2, OS id=5521
Mon Dec 28 21:16:08 2015
VKTM started with pid=3, OS id=5525 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Mon Dec 28 21:16:08 2015
GEN0 started with pid=4, OS id=5531
Mon Dec 28 21:16:08 2015
DIAG started with pid=5, OS id=5535
Mon Dec 28 21:16:08 2015
PING started with pid=6, OS id=5539
Mon Dec 28 21:16:08 2015
PSP0 started with pid=7, OS id=5543
Mon Dec 28 21:16:10 2015
DIA0 started with pid=8, OS id=5547
Mon Dec 28 21:16:11 2015
LMON started with pid=9, OS id=5552
Mon Dec 28 21:16:11 2015
LMD0 started with pid=10, OS id=5556
Mon Dec 28 21:16:11 2015
LMS0 started with pid=11, OS id=5560 at elevated priority
Mon Dec 28 21:16:12 2015
LMHB started with pid=12, OS id=5566
Mon Dec 28 21:16:12 2015
MMAN started with pid=13, OS id=5571
Mon Dec 28 21:16:12 2015
LGWR started with pid=15, OS id=5579
Mon Dec 28 21:16:12 2015
DBW0 started with pid=14, OS id=5575
Mon Dec 28 21:16:12 2015
CKPT started with pid=16, OS id=5583
Mon Dec 28 21:16:13 2015
SMON started with pid=17, OS id=5587
Mon Dec 28 21:16:13 2015
GMON started with pid=19, OS id=5597
Mon Dec 28 21:16:13 2015
RBAL started with pid=18, OS id=5591
Mon Dec 28 21:16:13 2015
MMON started with pid=20, OS id=5601
Mon Dec 28 21:16:13 2015
MMNL started with pid=21, OS id=5605
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 1 (myinst: 1)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Mon Dec 28 21:16:16 2015
LCK0 started with pid=22, OS id=5615
Mon Dec 28 21:16:18 2015
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /app/oracleBase
Mon Dec 28 21:16:19 2015
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: Diskgroups listed in ASM_DISKGROUPS are
         FRA
         DATA
NOTE: Diskgroup used for Voting files is:
         OCR
Diskgroup with spfile:OCR
Diskgroup used for OCR is:OCR
NOTE: cache registered group DATA number=1 incarn=0x6188a14e
NOTE: cache began mount (first) of group DATA number=1 incarn=0x6188a14e
NOTE: cache registered group FRA number=2 incarn=0x6198a14f
NOTE: cache began mount (first) of group FRA number=2 incarn=0x6198a14f
NOTE: cache registered group OCR number=3 incarn=0x6198a150
NOTE: cache began mount (first) of group OCR number=3 incarn=0x6198a150
NOTE: Loaded library: /opt/oracle/extapi/32/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (ORCL:DATA)
NOTE: Assigning number (2,0) to disk (ORCL:FRA)
NOTE: Assigning number (3,0) to disk (ORCL:OCR)
Mon Dec 28 21:16:25 2015
NOTE: start heartbeating (grp 1)
kfdp_query(DATA): 5
kfdp_queryBg(): 5
NOTE: cache opening disk 0 of grp 1: DATA label:DATA
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x6188A14E (DATA)
Mon Dec 28 21:16:27 2015
* allocate domain 1, invalid = TRUE
Mon Dec 28 21:16:28 2015
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=6.266 group=1 (DATA)
NOTE: advancing ckpt for thread=1 ckpt=6.266
NOTE: cache recovered group 1 to fcn 0.3427
Mon Dec 28 21:16:28 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATA)
NOTE: LGWR found thread 1 closed at ABA 6.265
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.3427 ABA 7.266
NOTE: cache mounting group 1/0x6188A14E (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=1 incarn=0x6188a14e
NOTE: start heartbeating (grp 2)
kfdp_query(FRA): 7
kfdp_queryBg(): 7
NOTE: cache opening disk 0 of grp 2: FRA label:FRA
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 2/0x6198A14F (FRA)
* allocate domain 2, invalid = TRUE
NOTE: attached to recovery domain 2
NOTE: starting recovery of thread=1 ckpt=6.183 group=2 (FRA)
NOTE: advancing ckpt for thread=1 ckpt=6.183
NOTE: cache recovered group 2 to fcn 0.1827
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (FRA)
NOTE: LGWR found thread 1 closed at ABA 6.182
NOTE: LGWR mounted thread 1 for diskgroup 2 (FRA)
NOTE: LGWR opening thread 1 at fcn 0.1827 ABA 7.183
NOTE: cache mounting group 2/0x6198A14F (FRA) succeeded
NOTE: cache ending mount (success) of group FRA number=2 incarn=0x6198a14f
NOTE: start heartbeating (grp 3)
kfdp_query(OCR): 9
kfdp_queryBg(): 9
NOTE: cache opening disk 0 of grp 3: OCR label:OCR
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 3/0x6198A150 (OCR)
* allocate domain 3, invalid = TRUE
NOTE: attached to recovery domain 3
Mon Dec 28 21:16:29 2015
NOTE: starting recovery of thread=1 ckpt=16.68 group=3 (OCR)
NOTE: advancing ckpt for thread=1 ckpt=16.68
NOTE: cache recovered group 3 to fcn 0.790
NOTE: LGWR attempting to mount thread 1 for diskgroup 3 (OCR)
NOTE: LGWR found thread 1 closed at ABA 16.67
NOTE: LGWR mounted thread 1 for diskgroup 3 (OCR)
NOTE: LGWR opening thread 1 at fcn 0.790 ABA 17.68
NOTE: cache mounting group 3/0x6198A150 (OCR) succeeded
NOTE: cache ending mount (success) of group OCR number=3 incarn=0x6198a150
Mon Dec 28 21:16:29 2015
kfdp_query(DATA): 10
kfdp_queryBg(): 10
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATA was mounted
kfdp_query(FRA): 11
kfdp_queryBg(): 11
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup FRA was mounted
kfdp_query(OCR): 12
kfdp_queryBg(): 12
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCR was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
Mon Dec 28 21:16:31 2015
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.FRA.dg (unable to communicate with CRSD/OHASD)
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Mon Dec 28 21:16:35 2015
Starting background process ASMB
Mon Dec 28 21:16:35 2015
ASMB started with pid=24, OS id=5664
Mon Dec 28 21:17:53 2015
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))))' SCOPE=MEMORY SID='+ASM1';



-------------
NODE: 2
-------------

[root@rac2 ~]# crsctl stat res -init -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac2
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  ONLINE       rac2
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac2
ora.gpnpd
      1        ONLINE  ONLINE       rac2
ora.mdnsd
      1        ONLINE  ONLINE       rac2



---  OCSSD.LOG

2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: begin on node(2), waittime 193000
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: set curtime (4294388080) for my node
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: scanning 32 nodes
2015-12-27 00:09:03.052: [    CSSD][2888805264]clssnmLocalJoinEvent: Node rac1, number 1, is in an existing cluster with disk state 3
2015-12-27 00:09:03.053: [    CSSD][2888805264]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2015-12-27 00:09:03.572: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61997, LATS 4294388600, lastSeqNo 61997, uniqueness 1451148676, timestamp 1451155163/6166804
2015-12-27 00:09:04.052: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:04.581: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61998, LATS 4294389610, lastSeqNo 61998, uniqueness 1451148676, timestamp 1451155164/6167814
2015-12-27 00:09:04.955: [    CSSD][2899295120]clssnmSendingThread: sending join msg to all nodes
2015-12-27 00:09:04.955: [    CSSD][2899295120]clssnmSendingThread: sent 5 join msgs to all nodes
2015-12-27 00:09:05.054: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: [network]  failed connect attempt endp 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x80612, usrFlags 0x0 }, req 0x9f233e8 [00000000000002f5] { gipcConnectRequest : addr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', parentEndp 0x9f21488, ret gipcretConnection
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos op  :  sgipcnTcpConnect
2015-12-27 00:09:05.514: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos dep :  Connection timed out (110)
2015-12-27 00:09:05.515: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos loc :  connect
2015-12-27 00:09:05.515: [ GIPCNET][2878315408]gipcmodNetworkProcessConnect: slos info:  addr '192.168.1.151:12874'
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssscSelect: conn complete ctx 0x99a1c48 endp 0x2ec
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmeventhndlr: node(1), endp(0x2ec) failed, probe((nil)) ninf->endp (0x2ec) CONNCOMPLETE
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmDiscHelper: rac1, node(1) connection failed, endp (0x2ec), probe((nil)), ninf->endp 0x2ec
2015-12-27 00:09:05.515: [    CSSD][2878315408]clssnmDiscHelper: node 1 clean up, endp (0x2ec), init state 0, cur state 0
2015-12-27 00:09:05.515: [GIPCXCPT][2878315408]gipcInternalDissociate: obj 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2015-12-27 00:09:05.515: [GIPCXCPT][2878315408]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3215]: EXCEPTION[ ret gipcretFail (1) ]  failed to dissociate obj 0x9f21488 [00000000000002ec] { gipcEndpoint : localAddr 'gipc://rac2:2b90-7ba3-ca8e-6111#192.168.1.152#50593', remoteAddr 'gipc://rac1:nm_rac-cluster#192.168.1.151#12874', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
2015-12-27 00:09:05.517: [    CSSD][2878315408]clssnmDiscEndp: gipcDestroy 0x2ec
2015-12-27 00:09:05.588: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 61999, LATS 4294390620, lastSeqNo 61999, uniqueness 1451148676, timestamp 1451155165/6168814
2015-12-27 00:09:05.589: [    CSSD][2878315408]clssnmconnect: connecting to addr gipc://rac1:nm_rac-cluster#192.168.1.151#12874
2015-12-27 00:09:05.590: [    CSSD][2878315408]clssscConnect: endp 0x2f8 - cookie 0x99a1c48 - addr gipc://rac1:nm_rac-cluster#192.168.1.151#12874
2015-12-27 00:09:05.590: [    CSSD][2878315408]clssnmconnect: connecting to node(1), endp(0x2f8), flags 0x10002
2015-12-27 00:09:06.056: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:06.597: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62000, LATS 4294391630, lastSeqNo 62000, uniqueness 1451148676, timestamp 1451155166/6169824
2015-12-27 00:09:07.058: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:07.603: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62001, LATS 4294392630, lastSeqNo 62001, uniqueness 1451148676, timestamp 1451155167/6170834
2015-12-27 00:09:08.060: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2015-12-27 00:09:08.611: [    CSSD][2974567312]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 313947976, wrtcnt, 62002, LATS 4294393640, lastSeqNo 62002, uniqueness 1451148676, timestamp 1451155168/6171844
2015-12-27 00:09:09.062: [    CSSD][2920274832]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0


--- CRSD.LOG

==> crsd.log <==
2015-12-26 20:42:40.834: [  OCRASM][3046749904]proprasmcl: asmhandle is NULL
2015-12-26 20:42:40.834: [  OCRRAW][3046749904]proprinit: Could not open raw device
2015-12-26 20:42:40.834: [  OCRASM][3046749904]proprasmcl: asmhandle is NULL
2015-12-26 20:42:40.834: [  OCRAPI][3046749904]a_init:16!: Backend init unsuccessful : [26]
2015-12-26 20:42:40.835: [  CRSOCR][3046749904] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2015-12-26 20:42:40.835: [    CRSD][3046749904][PANIC] CRSD exiting: Could not init OCR, code: 26
2015-12-26 20:42:40.835: [    CRSD][3046749904] Done.



--- GRID ALERT.LOG

(9009)]CRS-5822:Agent '/app/gridHome/bin/cssdagent_root' disconnected from server. Details at (:CRSAGF00117:) in /app/gridHome/log/rac2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2015-12-27 00:06:15.655
[ohasd(4740)]CRS-2112:The OLR service started on node rac2.
2015-12-27 00:06:20.613
[ohasd(4740)]CRS-8011:reboot advisory message from host: rac2, component: mo162335, with time stamp: L-2015-12-25-17:27:14.683
[ohasd(4740)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS
2015-12-27 00:06:20.926
[ohasd(4740)]CRS-8017:location: /etc/oracle/lastgasp has 44 reboot advisory log files, 1 were announced and 0 errors occurred
2015-12-27 00:06:43.252
[ohasd(4740)]CRS-2772:Server 'rac2' has been assigned to pool 'Free'.
2015-12-27 00:06:46.574
[ohasd(4740)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
2015-12-27 00:06:51.107
[cssd(5034)]CRS-1713:CSSD daemon is started in clustered mode
2015-12-27 00:07:01.189
[cssd(5034)]CRS-1707:Lease acquisition for node rac2 number 2 completed
2015-12-27 00:07:01.268
[cssd(5034)]CRS-1605:CSSD voting file is online: ORCL:OCR; details in /app/gridHome/log/rac2/cssd/ocssd.log.


--- +ASM2.log

Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Real Application Clusters and Automatic Storage Management options.
Using parameter settings in server-side spfile +OCR/rac-cluster/asmparameterfile/registry.253.866216815
System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskgroups           = ""
  asm_power_limit          = 1
  diagnostic_dest          = "/app/oracleBase"
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.152
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Sat Dec 26 22:11:17 2015
PMON started with pid=2, OS id=6243
Sat Dec 26 22:11:17 2015
VKTM started with pid=3, OS id=6247 at elevated priority
VKTM running at (10)millisec precision with DBRM quantum (100)ms
Sat Dec 26 22:11:17 2015
GEN0 started with pid=4, OS id=6253
Sat Dec 26 22:11:17 2015
DIAG started with pid=5, OS id=6257
Sat Dec 26 22:11:18 2015
PING started with pid=6, OS id=6262
Sat Dec 26 22:11:18 2015
PSP0 started with pid=7, OS id=6267
Sat Dec 26 22:11:19 2015
DIA0 started with pid=8, OS id=6271
Sat Dec 26 22:11:19 2015
LMON started with pid=9, OS id=6276
Sat Dec 26 22:11:20 2015
LMD0 started with pid=10, OS id=6280
Sat Dec 26 22:11:20 2015
LMS0 started with pid=11, OS id=6284 at elevated priority
Sat Dec 26 22:11:21 2015
LMHB started with pid=12, OS id=6290
Sat Dec 26 22:11:21 2015
MMAN started with pid=13, OS id=6295
Sat Dec 26 22:11:22 2015
DBW0 started with pid=14, OS id=6299
Sat Dec 26 22:11:22 2015
LGWR started with pid=15, OS id=6303
Sat Dec 26 22:11:22 2015
SMON started with pid=17, OS id=6311
Sat Dec 26 22:11:23 2015
RBAL started with pid=18, OS id=6316
Sat Dec 26 22:11:23 2015
CKPT started with pid=16, OS id=6307
Sat Dec 26 22:11:23 2015
GMON started with pid=19, OS id=6320
Sat Dec 26 22:11:23 2015
MMON started with pid=20, OS id=6324
Sat Dec 26 22:11:23 2015
MMNL started with pid=21, OS id=6328
lmon registered with NM - instance number 2 (internal mem no 1)
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Sat Dec 26 22:11:25 2015
LCK0 started with pid=22, OS id=6337
Sat Dec 26 22:11:28 2015
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /app/oracleBase
Sat Dec 26 22:11:28 2015
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent */
NOTE: Diskgroup used for Voting files is:
         OCR
Diskgroup with spfile:OCR
Diskgroup used for OCR is:OCR
NOTE: cache registered group OCR number=1 incarn=0x81c19341
NOTE: cache began mount (first) of group OCR number=1 incarn=0x81c19341
NOTE: Loaded library: /opt/oracle/extapi/32/asm/orcl/1/libasm.so
NOTE: Assigning number (1,0) to disk (ORCL:OCR)
Sat Dec 26 22:11:35 2015
NOTE: start heartbeating (grp 1)
kfdp_query(OCR): 3
kfdp_queryBg(): 3
NOTE: cache opening disk 0 of grp 1: OCR label:OCR
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache mounting (first) external redundancy group 1/0x81C19341 (OCR)
Sat Dec 26 22:11:36 2015
* allocate domain 1, invalid = TRUE
Sat Dec 26 22:11:36 2015
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=14.64 group=1 (OCR)
NOTE: advancing ckpt for thread=1 ckpt=14.64
NOTE: cache recovered group 1 to fcn 0.780
Sat Dec 26 22:11:36 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (OCR)
NOTE: LGWR found thread 1 closed at ABA 14.63
NOTE: LGWR mounted thread 1 for diskgroup 1 (OCR)
NOTE: LGWR opening thread 1 at fcn 0.780 ABA 15.64
NOTE: cache mounting group 1/0x81C19341 (OCR) succeeded
NOTE: cache ending mount (success) of group OCR number=1 incarn=0x81c19341
Sat Dec 26 22:11:36 2015
kfdp_query(OCR): 4
kfdp_queryBg(): 4
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup OCR was mounted
SUCCESS: ALTER DISKGROUP ALL MOUNT /* asm agent */
SQL> ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
SUCCESS: ALTER DISKGROUP ALL ENABLE VOLUME ALL /* asm agent */
Sat Dec 26 22:11:38 2015
WARNING: failed to online diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Sat Dec 26 22:18:24 2015
Detected partial connectivity in the cluster - IMR is being issued. This may cause some instances to be evicted
Evicting instance 1 from cluster
Reconfiguration started (old inc 2, new inc 4)
List of instances:
 2 (myinst: 2)
Sat Dec 26 22:18:25 2015
Shutting down instance (immediate)
Shutting down instance: further logons disabled
Stopping background process MMNL
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Sat Dec 26 22:18:25 2015
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Stopping background process MMON
License high water mark = 1
SQL> ALTER DISKGROUP ALL DISMOUNT /* asm agent */
NOTE: cache dismounting (clean) group 1/0x81C19341 (OCR)
Sat Dec 26 22:18:26 2015
NOTE: LGWR doing clean dismount of group 1 (OCR)
NOTE: LGWR closing thread 1 of diskgroup 1 (OCR) at ABA 15.65
NOTE: LGWR released thread recovery enqueue
Sat Dec 26 22:18:26 2015
freeing rdom 1
Sat Dec 26 22:18:26 2015
NOTE: detached from domain 1
NOTE: cache dismounted group 1/0x81C19341 (OCR)
kfdp_dismount(): 5
Sat Dec 26 22:18:26 2015
kfdp_dismountBg(): 5
NOTE: De-assigning number (1,0) from disk (ORCL:OCR)
SUCCESS: diskgroup OCR was dismounted
NOTE: cache deleting context for group OCR 1/-2118020287
SUCCESS: ALTER DISKGROUP ALL DISMOUNT /* asm agent */
Reconfiguration started (old inc 4, new inc 6)
List of instances:
 2 (myinst: 2)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
Sat Dec 26 22:18:30 2015
WARNING: failed to offline diskgroup resource ora.OCR.dg (unable to communicate with CRSD/OHASD)
Sat Dec 26 22:18:31 2015
Stopping background process VKTM:
freeing rdom 0
Sat Dec 26 22:18:38 2015
Instance shutdown complete

--------------------
SOLUTION: -
--------------------

In my case the problem was SSH. The solution was to coordinate with the OS admin and resolve the SSH connectivity issue, so that the grid and oracle users could connect between the two nodes in both directions.
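
Since node 2 was reporting a disk heartbeat but no network heartbeat, a quick first check is the private interconnect and user equivalence between the nodes. This is only a minimal sketch; the node names rac1/rac2, the private IP 192.168.1.151 and the grid home /app/gridHome are taken from the logs above, so substitute your own values.

# From rac2, check the interconnect interface and reachability of rac1's private IP
# /app/gridHome/bin/oifcfg getif          # shows which interface is flagged cluster_interconnect
# ping -c 2 192.168.1.151
# traceroute 192.168.1.151

# As the grid user, verify passwordless SSH in both directions
$ ssh rac1 date
$ ssh rac1 "ssh rac2 date"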

--------------
Solution 4:
--------------
First check whether the SSH connection works. When it does not, the errors shown above also appear.

sshd issue from the grid user on one node to the grid user on the other


Problem: ssh: connect to host xxx.xxx.x.xx port 22: Connection timed out


--- Check whether the port is listening; also check DNS.
# ps -A | grep sshd            # verify that sshd is running
# lanscan 
# netstat -in
# netstat -rn
# netstat -nat | grep 22
# netstat -ntl |grep :22
# sudo netstat -alnp|grep :22
# traceroute 10.230.xx.xx
# telnet 10.230.xx.xx 22
# reinstall ssh 
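
A compact follow-up on the target node itself (RHEL-style commands; a sketch, adjust for your platform): if sshd is running and listening but the client still times out, a host firewall is a common cause.

# service sshd status            # confirm sshd is running on the target node
# service iptables status        # check whether a host firewall could be blocking port 22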


Some other relevant solutions that may apply when facing issues starting Grid Infrastructure are given below.

--------------
Solution 1:
--------------

Oracle 11.2 with ASM: crsd fails to start on the second node after a reboot.

Description:
OS: RHEL 5.8 x86-64. Oracle 11.2 CRS fails to start on the second node after a reboot.
Errors:
crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
crsd.log:
2013-11-05 15:58:19.095: [  OCRASM][1520055008]proprasmo: Error in open/create file in dg [DATACOST]
[  OCRASM][1520055008]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmo: kgfoCheckMount returned [7]
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmo: The ASM instance is down
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprioo: Failed to open [+DATACOST]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprioo: No OCR/OLR devices are usable
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmcl: asmhandle is NULL
2013-11-05 15:58:19.097: [  OCRRAW][1520055008]proprinit: Could not open raw device
2013-11-05 15:58:19.097: [  OCRASM][1520055008]proprasmcl: asmhandle is NULL
2013-11-05 15:58:19.097: [  OCRAPI][1520055008]a_init:16!: Backend init unsuccessful : [26]
2013-11-05 15:58:19.097: [  CRSOCR][1520055008] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2013-11-05 15:58:19.097: [    CRSD][1520055008][PANIC] CRSD exiting: Could not init OCR, code: 26
2013-11-05 15:58:19.098: [    CRSD][1520055008] Done.

Try to start ASM manually (here a single user, oracle, owns both grid and the database; otherwise log in as your grid owner).

su - oracle

export ORACLE_HOME=$GRID_HOME

export ORACLE_SID=+ASM2

asmcmd

ASMCMD> startup
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
If you do have free space on the node (check with df -h), this error is related to kernel semaphore parameter settings rather than disk space.
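
To see the current semaphore limits before changing anything (standard Linux commands, not specific to this environment):

# cat /proc/sys/kernel/sem      # semmsl semmns semopm semmni
# ipcs -ls                      # summarised semaphore limits
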
-- Solution:

1. Edit the /etc/sysctl.conf file and change the kernel.sem setting (the fourth value is semmni) to:
kernel.sem = 250 32000 100 200

2. Reboot the node.
Everything should now work, but if crsd still does not start, log in as the grid owner and try to start the ASM instance manually.
You will then see a different error, no longer related to free space.
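
If a reboot is not convenient, the new value can usually be loaded straight away (a sketch using the values above; new semaphore allocations pick up the change immediately):

# sysctl -p                     # reload /etc/sysctl.conf
# sysctl kernel.sem             # verify: kernel.sem = 250 32000 100 200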


---------------
Solution 2:
---------------


The problem was in the ASM parameter file (spfile):

+ASM1.asm_diskgroups='ARCH','REDO','OCR','VOTE'#Manual Mount
+ASM2.asm_diskgroups='REDO','VOTE','ARCH'#Manual Mount
I added the missing 'OCR' diskgroup to the +ASM2.asm_diskgroups parameter, then rebooted both machines, and everything was OK.
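
A minimal sketch of making that change from SQL*Plus on the second ASM instance (the diskgroup names are the ones used in this example):

SQL> ALTER SYSTEM SET asm_diskgroups='REDO','VOTE','ARCH','OCR' SCOPE=SPFILE SID='+ASM2';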

[root@faa5 tmp]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[root@faa5 tmp]# 
[root@faa5 tmp]# 
[root@faa5 tmp]# srvctl status database -d FAARACDB
Instance FAARACDB1 is running on node faa4
Instance FAARACDB2 is running on node faa5
[root@faa5 tmp]# 

---------------
Solution 3:
---------------

Re-run the root.sh script on the node that fails to start.
In this case, root.sh was executed on the RAC2 node.



RAC STARTUP ISSUES

11g R2


1. ocssd.log shows:


   2012-01-27 13:42:58.796: [ CSSD][19]clssnmvDHBValidateNCopy: node 1, racnode1, has a disk HB, but no network HB, DHB has rcfg 223132864, wrtcnt, 1112, LATS 783238209,
lastSeqNo 1111, uniqueness 1327692232, timestamp 1327693378/787089065


2. For clusters of 3 or more nodes, 2 nodes form the cluster fine but the 3rd node fails after joining; ocssd.log shows:


   2012-02-09 11:33:53.048: [ CSSD][1120926016](:CSSNM00008:)clssnmCheckDskInfo: Aborting local node to avoid splitbrain. Cohort of 2 nodes with leader 2, racnode2, is smaller than
   cohort of 2 nodes led by node 1, racnode1, based on map type 2
   2012-02-09 11:33:53.048: [ CSSD][1120926016]###################################
   2012-02-09 11:33:53.048: [ CSSD][1120926016]clssscExit: CSSD aborting from thread clssnmRcfgMgrThread


3. Ocssd.bin startup times out after 10 minutes


   2012-04-08 12:04:33.153: [    CSSD][1]clssscmain: Starting CSS daemon, version 11.2.0.3.0, in (clustered) mode with uniqueness value 1333911873
   …...
   2012-04-08 12:14:31.994: [    CSSD][5]clssgmShutDown: Received abortive shutdown request from client.
   2012-04-08 12:14:31.994: [    CSSD][5]###################################
   2012-04-08 12:14:31.994: [    CSSD][5]clssscExit: CSSD aborting from thread GMClientListener
   2012-04-08 12:14:31.994: [    CSSD][5]###################################
   2012-04-08 12:14:31.994: [    CSSD][5](:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally


Possible Causes:

1. Voting disk is missing or inaccessible

2. Multicast is not working (for 11.2.0.2+)

3. Private network is not working, ping or traceroute shows destination unreachable.

4. Private network is pingable with the normal ping command but not with a jumbo frame size (e.g. ping -s 8900) when jumbo frames are enabled (MTU: 9000+), or some cluster nodes have jumbo frames set (MTU: 9000) while the problem node does not (MTU: 1500)

5. Gpnpd does not come up, stuck in dispatch thread, Bug 10105195

6. Too many disks discovered via asm_diskstring or slow scan of disks due to Bug 13454354 on Solaris 11.2.0.3 only


Solutions:


1. Restore the voting disk access by checking storage access, disk permissions etc.
   If the voting disk is missing from the OCR ASM diskgroup, start CRS in exclusive mode and recreate the voting disk:
   # crsctl start crs -excl
   # crsctl replace votedisk <+OCRVOTE diskgroup>

2. Refer to Document 1212703.1 for multicast test and fix

3. Consult with the network administrator to restore private network access

4. Engage network admin to enable jumbo frame from switch layer if it is enabled at Network card

5. Kill the gpnpd.bin process on the surviving node, refer to Document 10105195.8. If ping/traceroute all work for the private network but the problem followed a failed 11.2.0.1 to 11.2.0.2 upgrade, check Bug 13416559 for the workaround.

6. Limit the number of ASM disks scanned by supplying a more specific asm_diskstring, refer to Bug 13583387. For Solaris 11.2.0.3 only, apply Patch 13250497.

Once the above issues are resolved, restart the Grid Infrastructure stack (see the sketch below).
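A minimal sketch of restarting the stack and verifying the voting disks as root (standard crsctl commands; the diskgroup used for the voting files comes from your own configuration):

# crsctl stop crs -f
# crsctl start crs
# crsctl query css votedisk
# crsctl check cluster -all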



Issue #3: CRS-4535: Cannot communicate with Cluster Ready Services, crsd.bin is not running


Symptoms:


1. Command '$GRID_HOME/bin/crsctl check crs' returns errors:
    CRS-4638: Oracle High Availability Services is online
    CRS-4535: Cannot communicate with Cluster Ready Services
    CRS-4529: Cluster Synchronization Services is online
    CRS-4534: Cannot communicate with Event Manager

2. Command 'ps -ef | grep d.bin' does not show a line similar to:
    root 23017 1 1 22:34 ? 00:00:00 /u01/app/11.2.0/grid/bin/crsd.bin reboot

3. Even if the crsd.bin process exists, command 'crsctl stat res -t -init' shows:
    ora.crsd
        1    ONLINE     INTERMEDIATE


Possible Causes:


1. Ocssd.bin is not running or resource ora.cssd is not ONLINE
2. The +ASM instance cannot start up
3. OCR is inaccessible
4. Network configuration has been changed causing gpnp profile.xml mismatch
5. The $GRID_HOME/crs/init/<hostname>.pid file for crsd has been removed or renamed manually; crsd.log shows: 'Error3 -2 writing PID to the file'
6. Ocr.loc content mismatch with other cluster nodes. Crsd.log shows: 'Shutdown CacheLocal. My hash ids don't match'


Solutions:


1. Check the solution for Issue 2, ensure ocssd.bin is running and ora.cssd is ONLINE

2. For 11.2.0.2+, ensure that the resource ora.cluster_interconnect.haip is ONLINE, refer to Document 1383737.1 for ASM startup  issues related to HAIP.

3. Ensure the OCR disk is available and accessible. If the OCR is lost for any reason, refer to Document 1062983.1 on how to restore  the OCR.

4. Restore the network configuration to match the interfaces defined in $GRID_HOME/gpnp/<hostname>/profiles/peer/profile.xml; refer to Document 1073502.1 for private network modifications.

5. Touch the <hostname>.pid file under $GRID_HOME/crs/init.
   For 11.2.0.1, the file is owned by the Grid Infrastructure owner user.
   For 11.2.0.2, the file is owned by the root user.

6. Use the ocrconfig -repair command to fix the ocr.loc content,
   for example, as root user:
# ocrconfig -repair -add +OCR2 (to add an entry)
# ocrconfig -repair -delete +OCR2 (to remove an entry)
ohasd.bin needs to be up and running in order for the above commands to work.
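Before repairing, a minimal sketch of comparing the ocr.loc content across nodes (Linux location shown; run on each node as root):

# cat /etc/oracle/ocr.loc     # should list the same OCR locations on every node
# ocrcheck                    # verifies OCR integrity and shows the configured locations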


Once the above issues are resolved, either restart the GI stack or start crsd.bin via:
   # crsctl start res ora.crsd -init



Issue #4: HAIP is not ONLINE (for 11.2.0.2+)


-Symptoms:

Command 'crsctl stat res -t -init' shows:
     ora.cluster_interconnect.haip
         1   ONLINE    OFFLINE


-Possible Causes:

1. Bug 10370797: START OF ‘ORA.CLUSTER_INTERCONNECT.HAIP’ FAILED DURING UPGRADE TO 11.2.0.2 (AIX only)
2. The private network information stored in the OCR does not match the actual OS setup, eg: oifcfg getif and ifconfig output mismatch (wrong interface name, subnet etc.)


-Solutions:

1. Apply Patch:10370797
2. Correct the OCR configuration so that it matches the OS network configuration (see the sketch below)
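A minimal sketch of comparing the interconnect definition stored by the clusterware with the OS configuration, and correcting it if needed (the interface name and subnet are examples only):

# $GRID_HOME/bin/oifcfg getif        # interfaces as registered with the clusterware
# /sbin/ifconfig -a                  # interfaces as configured in the OS
# $GRID_HOME/bin/oifcfg setif -global eth1/192.168.10.0:cluster_interconnect    # example correction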

More information about HAIP is provided in Document 1210883.1.



Issue #5: ASM instance does not start, ora.asm is OFFLINE


Symptoms:

1. Command 'ps -ef | grep asm' shows no ASM processes
2. Command 'crsctl stat res -t -init' shows:
         ora.asm
               1    ONLINE    OFFLINE


Possible Causes:

1. ASM spfile is corrupted
2. ASM discovery string is incorrect and therefore voting disk/OCR cannot be discovered
3. ASMlib configuration problem

Solutions:

1. Create a temporary pfile to start ASM instance, then recreate spfile, see Document 1095214.1 for more details.
2. Refer to Document 1077094.1 to correct the ASM discovery string.
3. Refer to Document 1050164.1 to fix ASMlib configuration.
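For solution 1, a minimal sketch of starting ASM from a temporary pfile (the path, discovery string and diskgroup name are examples only; adjust them to your environment):

$ export ORACLE_SID=+ASM1          # +ASM2 on the second node
$ cat /tmp/asm_init.ora
*.instance_type='asm'
*.asm_diskstring='ORCL:*'
*.asm_power_limit=1
*.large_pool_size=12M

$ sqlplus / as sysasm
SQL> startup pfile='/tmp/asm_init.ora';
SQL> create spfile='+OCRVOTE' from pfile='/tmp/asm_init.ora';   -- example diskgroup, once it mounts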


For further debugging of GI startup issues, please refer to Document 1050908.1 Troubleshoot Grid Infrastructure Startup Issues.






Saturday, 2 January 2016

12c : Installation On OEL 5.6

Setup – OS - OEL 5.6 x86_64   
Setup Packages at the time of OEL installation
  • GNOME Desktop Environment
  • Editors
  • Graphical Internet
  • Text-based Internet
  • Development Libraries
  • Development Tools
  • Server Configuration Tools
  • Administration Tools
  • Base
  • System Tools
  • X Window System
There is no need to create the user and groups, because they are already created when you set up OEL.
# passwd oracle
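If the oracle user and groups are missing on your installation, a minimal sketch of creating them via the oracle-validated package on OEL 5.x (assumes a configured yum repository):

# yum install oracle-validated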

Set secure Linux to permissive by editing the "/etc/selinux/config" file, making sure the SELINUX flag is set as follows.
SELINUX=permissive
Once the change is complete, restart the server or run the following command.
# setenforce Permissive
If you have the Linux firewall enabled, you will need to disable or configure it. To disable it, do the following.

# service iptables stop
# chkconfig iptables off
Create the directories in which the Oracle software will be installed.
mkdir -p /u01/app/oracle/product/12.1.0.2/db_1
chown -R oracle:oinstall /u01
chmod -R 775 /u01
Add the following lines at the end of the "/home/oracle/.bash_profile" file.
# Oracle Settings
export TMP=/tmp
export TMPDIR=$TMP

export ORACLE_HOSTNAME=el5.localdomain
export ORACLE_UNQNAME=cdb1
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/12.1.0.2/db_1
export ORACLE_SID=cdb1

export PATH=/usr/sbin:$PATH
export PATH=$ORACLE_HOME/bin:$PATH

export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/lib:/usr/lib
export CLASSPATH=$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib

if [ $USER = "oracle" ]; then
  if [ $SHELL = "/bin/ksh" ]; then
    ulimit -p 16384
    ulimit -n 65536
  else
    ulimit -u 16384 -n 65536
  fi
fi



./runInstaller

12c : Manage Tablespace in PDB/CDB

Note: the UNDO tablespace is shared by all PDBs in the CDB; TEMP can be either shared or local (non-shared) per PDB.

--------------------------------------------------------
Add and drop tablespace in CDB and PDB.
--------------------------------------------------------

Pluggable Databases

To create tablespaces in a pluggable database (PDB), ensure that you are in the correct container before creating the tablespace.
PDBs have container IDs greater than 2; showing CON_NAME is also a good way to confirm that you are in the correct container.



1. Verify the container name and ID
select NAME, OPEN_MODE, CON_ID from V$PDBS;



2. Set container database.
alter session set container=PDB1;


3. See tablespace
select * from v$tablespaces;


4. Add/Drop tablespace
create tablespace SLOB datafile size 1M autoextend on next 1m;
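The drop side of step 4, as a minimal sketch (this also removes the datafiles):

drop tablespace SLOB including contents and datafiles;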


5. Verify tablespace.
select * from v$tablespaces;



--------------------------------------------------------------
How to check where the tablespaces are located
--------------------------------------------------------------

conn sys as sysdba
OR
alter session set container=cdb$root;



select tablespace_name,status,con_id from cdb_tablespaces
where tablespace_name='SLOB' order by 3;



select pdb_id,pdb_name,status,con_id from cdb_pdbs order by 1;



------------------------
Default tablespace
------------------------

ALTER PLUGGABLE DATABASE DEFAULT TABLESPACE users;
ALTER PLUGGABLE DATABASE DEFAULT TEMPORARY TABLESPACE temp;



show con_name
show con_id

12c : Clone a Remote Pluggable Database PDB

---------------------------------------------------------------------------
Multitenant : Clone a Remote Pluggable Database (PDB)
---------------------------------------------------------------------------

SQL> select name from v$database;

NAME
----------
CDB3

SQL> show con_name

-- create a user we can point a database link to.


SQL> drop user remote_clone_user CASCADE;


SQL> CREATE USER remote_clone_user IDENTIFIED BY remote_clone_user;
create user remote_clone_user identified by remote_clone_user
            *
ERROR at line 1:
ORA-65096: invalid common user or role name


SQL> alter session set "_ORACLE_SCRIPT"=true;


SQL> CREATE USER remote_clone_user IDENTIFIED BY remote_clone_user;


SQL> grant create session,create pluggable database to remote_clone_user;


-- Switch the remote PDB to read-only mode.
SQL> alter pluggable database pdb5 close;

sql> ALTER PLUGGABLE DATABASE pdb5 open read only;


-- Connect to root container in the local CDB.
SQL> conn sys/oracle@cdb1 AS sysdba


-- Check database name.
SQL> select name from v$database;

NAME
--------
CDB1

-- Create a database link to the remote PDB
SQL> Drop database link clone_link;


SQL> Create Database Link clone_link connect to remote_clone_user identified by remote_clone_user using 'pdb5';
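A quick sanity check of the link before cloning (a minimal sketch using the link and user created above):

SQL> select * from dual@clone_link;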

-- Clone the remote PDB over the database link.
-- If You are not using OMF, Convert the paths if necessary.
SQL> Create Pluggable Database pdb5new from pdb5@clone_link
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/cdb3/pdb5/',
'/u01/app/oracle/oradata/cdb1/pdb5new/');


SQL> alter pluggable database pdb5new open;


-- Check the PDBs.
SQL> select name,open_mode from v$pdbs order by name;

-- Open the remote PDB in read-write mode again.
SQL> conn sys/oracle@pdb5 as sysdba

alter pluggable database pdb5 close;

alter pluggable database pdb5 open;


-- Clean up new clone
conn sys/oracle@cdb1 as sysdba

alter pluggable database pdb5new close;

drop pluggable database pdb5new including datafiles;

-- Check the PDBs.

SQL> select name,open_mode from v$pdbs order by name;


12c : Clone a Remote Non-Container Database (Non-CDB)

---------------------------------------------------------------------------
Multitenant : Clone a Remote Non-CDB as a Pluggable Database (PDB)
---------------------------------------------------------------------------
-- In my case two databases exist on a single machine: one is a CDB and the second is a non-CDB.



-- Connect to remote non-CDB.
sqlplus sys/oracle@db12c as sysdba

SQL> select name,open_mode from v$database;

SQL> show con_name


-- create a user we can point a database link to.

SQL> drop user remote_clone_user CASCADE;


SQL> CREATE USER remote_clone_user IDENTIFIED BY remote_clone_user;
create user remote_clone_user identified by remote_clone_user
            *
ERROR at line 1:
ORA-65096: invalid common user or role name


SQL> alter session set "_ORACLE_SCRIPT"=true;


SQL> CREATE USER remote_clone_user IDENTIFIED BY remote_clone_user;


SQL> grant create session,create pluggable database to remote_clone_user;



-- Switch the remote non-CDB to read-only mode.

shutdown immediate
startup mount
alter database open read only;


-- Connect to root container in the local CDB.
sqlplus sys/oracle@cdb1 AS sysdba


-- Check database name.
SQL> select name from v$database;


-- Create a database link to the remote PDB
SQL> Drop database link clone_link;
SQL> Create Database Link clone_link connect to remote_clone_user identified by remote_clone_user using 'db12c';


-- Clone the remote PDB over the database link.
-- If You are not using OMF, Convert the paths if necessary.
SQL> Create Pluggable Database db12cpdb from NON$CDB@clone_link
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/DB12C/','/u01/app/oracle/oradata/CDB1/db12cpdb/');

-- Else, if the source database uses OMF, convert each datafile name explicitly:

Create Pluggable Database db12cpdb from NON$CDB@clone_link
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/DB12C/datafile/o1_mf_system_c6lq1ynk_.dbf',
'/u01/app/oracle/oradata/CDB1/DB12CPDB/datafile/system01.dbf',
'/u01/app/oracle/oradata/DB12C/datafile/o1_mf_sysaux_c6lpcz6v_.dbf',
'/u01/app/oracle/oradata/CDB1/DB12CPDB/datafile/sysaux01.dbf',
'/u01/app/oracle/oradata/DB12C/datafile/o1_mf_users_c6z4wyn8_.dbf',
'/u01/app/oracle/oradata/CDB1/DB12CPDB/datafile/users01.dbf',
'/u01/app/oracle/oradata/DB12C/datafile/o1_mf_temp_c6z65ljc_.tmp',
'/u01/app/oracle/oradata/CDB1/DB12CPDB/datafile/temp01.dbf');

-- Check the PDBs.
SQL> select name,open_mode from v$pdbs order by name;



-- Clean up extra components not found in a PDB (run noncdb_to_pdb.sql to complete the conversion):
-- ALTER SESSION SET CONTAINER=db12cpdb;
-- @$ORACLE_HOME/rdbms/admin/noncdb_to_pdb.sql


alter pluggable database db12cpdb OPEN;


-- Open the remote non-CDB in read-write mode again.
SQL> conn sys/oracle@db12c as sysdba

shutdown immediate
startup



-- Clean up new clone
conn sys/oracle@cdb1 as sysdba

alter pluggable database db12cpdb close;

drop pluggable database db12cpdb including datafiles;

-- Check the PDBs.

SQL> select name,open_mode from v$pdbs order by name;


12c : Clone a Local Pluggable Database PDB

-----------------------------------------------------------------------
Multitenant: Clone a Local Pluggable Database (PDB)
-----------------------------------------------------------------------

SQL> show con_name

SQL> show con_id

SQL> select name,open_mode from v$pdbs order by name;


-- If PDB1 is currently open read-write, close it first so it can be reopened read-only:
SQL> alter pluggable database PDB1 close;


SQL> alter pluggable database pdb1 open read only;


---------------------------------------
Verify PDB1 Datafile location
---------------------------------------

 select FILE_NAME from cdb_data_files where CON_ID=3;


-------------------------------------------------------------
-- Clone the PDB.
-- If you are not using OMF, convert the paths.
-------------------------------------------------------------

SQL> create pluggable database pdb2 from pdb1
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/cdb1/pdb1/',
  '/u01/app/oracle/oradata/cdb1/pdb2/');

OR

----------------------------------------
-- In My Case,I'm using OMF
----------------------------------------

create pluggable database pdb2 from pdb1
FILE_NAME_CONVERT=('/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_system_c62fglyh_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/system01.dbf',
'/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_sysaux_c62fgm20_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/sysaux01.dbf',
'/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_users_c62fm10j_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/users01.dbf',
'/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_slob_c62qjm9g_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/slob01.dbf',
'/u01/app/oracle/oradata/CDB1/260DCEBE98C33DD5E0530100007F3F93/datafile/o1_mf_temp_c62fgm21_.dbf',
'/u01/app/oracle/oradata/CDB1/pdb2/datafile/temp01.dbf');


SQL> alter pluggable database pdb2 open;

SQL> select name ,open_mode from v$pdbs order by name;

SQL> alter pluggable database pdb1 close;

SQL> alter pluggable database pdb1 open;




---------------------------
-- Remove the Clone
----------------------------

SQL> alter pluggable database pdb2 close;

SQL> Drop pluggable database pdb2 including datafiles;

SQL> select name ,open_mode from v$pdbs order by name;

Friday, 1 January 2016

DB File Scattered Read

This event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return. A db file scattered read issues a scattered read to read the data into multiple discontinuous memory locations. A scattered read is usually a multiblock read. It can occur for a fast full scan (of an index) in addition to a full table scan.

The db file scattered read wait event identifies that a full scan is occurring. When performing a full scan into the buffer cache, the blocks read are read into memory locations that are not physically adjacent to each other. Such reads are called scattered read calls, because the blocks are scattered throughout memory. This is why the corresponding wait event is called 'db file scattered read'. Multiblock reads (up to DB_FILE_MULTIBLOCK_READ_COUNT blocks) due to full scans into the buffer cache show up as waits for 'db file scattered read'.

In one line: the Oracle session has requested, and is waiting for, multiple contiguous database blocks to be read from disk into the SGA.

Causes:

·                     Full table scans
·                     Fast full index scans
·                     Missing, unselective, or unusable indexes
·                     Tables not re-analyzed after index creation, or a lack of accurate optimizer statistics

If an application that has been running fine for a while suddenly clocks a lot of time on the db file scattered read event and there has not been a code change, check whether one or more indexes have been dropped or become unusable.

SELECT p1 "file#", p2 "block#", p3 "class#",event
FROM v$session_wait
WHERE event IN ('db file scattered read');

Where P1,P2,P3 are
P1 - The absolute file number
P2 - The block being read
P3 - The number of blocks (should be greater than 1)


SELECT relative_fno, owner, segment_name, segment_type
FROM DBA_EXTENTS
WHERE file_id = &FILE
AND &BLOCK BETWEEN block_id
AND block_id + blocks - 1;


SELECT
      s.p1 file_id, s.p2 block_id,o.object_name obj,
       o.object_type otype,
       s.SQL_ID,
       w.CLASS,event
FROM v$session s,
     ( SELECT ROWNUM CLASS#, CLASS FROM v$waitstat ) w,
      ALL_OBJECTS o
WHERE
 event IN ('db file scattered read')
AND
    w.CLASS#(+)=s.p3
   AND o.object_id (+)= s.row_wait_OBJ#
ORDER BY 1;


Find the related SQL statement  using sql_id

SELECT sql_text FROM v$sqltext WHERE sql_id = '&sql_id' ORDER BY piece;


You can also find the objects using below sql :-

Finding the SQL Statement executed by Sessions Waiting for I/O

SELECT SQL_ADDRESS, SQL_HASH_VALUE
  FROM V$SESSION
 WHERE EVENT = 'db file scattered read';

Finding the Object Requiring I/O

SELECT row_wait_obj#
  FROM V$SESSION
 WHERE EVENT = 'db file scattered read';

To identify the object and object type contended for, query DBA_OBJECTS using the value for ROW_WAIT_OBJ# that is returned from V$SESSION. For example:

SELECT owner, object_name, subobject_name, object_type
  FROM DBA_OBJECTS
 WHERE data_object_id = &row_wait_obj;


Once you identify the hot blocks, the segments they belong to, and the related queries, you can reduce the waits using the following solutions (a short sketch of items 2 and 7 follows the list):

1.   Optimize the SQL statements that initiated most of the waits. The goal is to minimize the number of physical and logical reads.

2.   Optimize multi-block I/O by setting the parameter DB_FILE_MULTIBLOCK_READ_COUNT; if its value is too high, which favors full scans, reduce it.

3.   Use partition pruning to reduce the number of blocks visited.

4.   Consider multiple buffer pools and cache frequently used indexes/tables in the KEEP pool. Make sure the query uses the right driving table, and that the SQL predicates are appropriate for a hash or merge join.

5.   If full scans are appropriate, can parallel query improve the response time?

6.   The objective is to reduce the demand for both logical and physical I/O, and this is best achieved through SQL and application tuning.

7.   Gather statistics on the related objects if they are missing.

8.   Check the LAST_ANALYZED date in the USER_TABLES view.
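A minimal sketch of items 2 and 7; the value 16 and the HR.EMPLOYEES table are illustrative only:

SQL> show parameter db_file_multiblock_read_count
SQL> alter system set db_file_multiblock_read_count=16 scope=both;

SQL> exec dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'EMPLOYEES', cascade => TRUE);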

Database Refresh / Cloning

Database Refresh / Cloning / Migration

---------------------------
Non-ASM 10g/11g 
---------------------------
Reference: 

http://oraclegurukul.blogspot.in/2012/12/rman-cloning-10g-11g.html

----------------------------------------------------
DB Refresh DataPump
----------------------------------------------------
Reference:

https://born4dba.wordpress.com/2012/05/19/schema-refresh-using-data-pump/

-----------------------------------------------------------------------
11g ASM DB Refresh With DataPump
-----------------------------------------------------------------------
Reference:

http://select-star-from.blogspot.in/2013/08/schema-refresh-using-asm-diskgroups.html

--------------------------------------------------------------------------------------------------------------
RAC 10g/11g RMAN Cloning/Refresh RAC ASM to ASM & ASM to NON ASM 
--------------------------------------------------------------------------------------------------------------
Reference: 

https://simplyoracle.blogspot.in/2010/04/rac-database-refreshcloning-on-asm.html
https://dbmentors.blogspot.in/2012/04/rman-duplicate-database-from-rac-asm-to.html
https://gavinsoorma.com/2012/04/rman-duplicate-database-rac-to-rac/
https://nikhilappsdba.blogspot.in/2014/01/cloning-11g-rac-asm-to-non-asm.html
https://www.pythian.com/blog/duplicate-from-active-database-using-rman-step-by-step/
https://tsbreddy.blogspot.in/2012/06/rman-refresh-steps.html
https://dbarock.blogspot.in/2012/08/database-refresh-using-rman.html

---------------------------
Migration :
---------------------------

NON-ASM to ASM in 10g Different host

NON-ASM to ASM in 11g (Different host Duplicate Command)

NON-ASM 10g to ASM 11g (Different host)

Non-ASM 11g Database To ASM (Same Host)


NON-ASM to ASM in 11g Different host

1- Connect to the source (non-ASM) database and take an RMAN full backup.

    #rman target /
    RMAN> backup as backupset database;
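Since step 2 also copies archive log files, a minimal sketch of backing them up as well (assuming the archived logs are still on disk):

    RMAN> backup as backupset archivelog all;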

2- Copy the source database backup files and archive log files to the 11g database server.


Install and configure standalone Grid Infrastructure and mount the ASM diskgroup (+DATA).
Install the 11.2.0.3 database software only.

The steps for 11g database:

3- Create a temporary pfile in $ORACLE_HOME/dbs:

*.audit_file_dest='/oracle/admin/orcl/adump'
*.compatible='11.2.0.0.0'
*.control_files='+DATA/orcl/controlfile/current.1','+DATA/orcl/controlfile/current.2'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_create_online_log_dest_1='+RECO'
*.db_name='ORCL'
*.diagnostic_dest='/oracle'
*.job_queue_processes=0
*.open_cursors=300
*.pga_aggregate_target=1G
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=2G
*.undo_tablespace='UNDOTBS1'

4- Start the instance in NOMOUNT state.

    # rman target /

    RMAN> startup nomount;

5- Restore controlfile from backup.
    RMAN> LIST BACKUP OF CONTROLFILE;

    RMAN> restore controlfile from '<backup piece name>';

6- Mount the database.

    RMAN> alter database mount;

7- Catalog RMAN backup files and archive log files.(Ensure you have restored the backup copy)

    RMAN> catalog start with '/oracle/ora11g/orcl/backup';
    RMAN> catalog start with '/oracle/ora11g/orcl/archive';

8- Restore the source database backup to the +DATA diskgroup and perform incomplete recovery.

    RMAN> run
    {
    allocate channel c1 device type disk;
    SET NEWNAME FOR DATAFILE 1 TO '+DATA';
    SET NEWNAME FOR DATAFILE 2 TO '+DATA';
    SET NEWNAME FOR DATAFILE 3 TO '+DATA';
    SET NEWNAME FOR DATAFILE 4 TO '+DATA';
    restore database until sequence 4;
    switch datafile all;
    recover database until sequence 4;
    }


    ### shutdown the database

    ASMCMD> cd DATA/ControlFile