CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
Environment:
Oracle Grid Infrastructure 11.2.0.1
Oracle database server 11.2.0.1
> crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
> crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        OFFLINE OFFLINE
ora.crsd
      1        ONLINE  INTERMEDIATE raclr41
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       raclr41
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       raclr41
ora.gpnpd
      1        ONLINE  ONLINE       raclr41
ora.mdnsd
      1        ONLINE  ONLINE       raclr41
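In a listing like the one above, the interesting rows are the resources whose TARGET is ONLINE but whose STATE is something else. A minimal sketch of a filter for that, assuming the two-line-per-resource layout that `crsctl stat res -t -init` prints here (the function name `list_stuck_resources` is made up for illustration and is not part of any Oracle tooling):

```shell
# Hypothetical helper: read `crsctl stat res -t -init` style output on stdin
# and print "<resource> <state>" for every resource that should be ONLINE
# (TARGET column) but is not (STATE column).
list_stuck_resources() {
  awk '
    /^ora\./ { name = $1; next }          # resource name line, e.g. ora.cssd
    name != "" && $1 ~ /^[0-9]+$/ {       # status line: instance TARGET STATE [SERVER]
      if ($2 == "ONLINE" && $3 != "ONLINE") print name, $3
      name = ""
    }
  '
}
```

It could be used as `crsctl stat res -t -init | list_stuck_resources`. Header and separator lines are ignored because they neither start with `ora.` nor begin with an instance number.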
Tried to start ora.cssd manually:
raclr41 | CRS | /home/oracle
> crsctl start res ora.cssd -init
The command hung and never returned, so I checked the ocssd log from another session ($GI_HOME/log/<host_name>/cssd):
2012-08-15 14:05:50.103: [ GIPCNET][1120729408]gipcmodNetworkProcessConnect: slos op : sgipcnTcpConnect
2012-08-15 14:05:50.103: [ GIPCNET][1120729408]gipcmodNetworkProcessConnect: slos dep : No route to host (113)
2012-08-15 14:05:50.103: [ GIPCNET][1120729408]gipcmodNetworkProcessConnect: slos loc : connect
2012-08-15 14:05:50.103: [ GIPCNET][1120729408]gipcmodNetworkProcessConnect: slos info: addr '192.168.1.110:29850'
2012-08-15 14:05:50.103: [    CSSD][1120729408]clssscSelect: conn complete ctx 0x2aaaac09bae0 endp 0xa66
2012-08-15 14:05:50.103: [    CSSD][1120729408]clssnmeventhndlr: node(1), endp(0xa66) failed, probe((nil)) ninf->endp (0x100000a66) CONNCOMPLETE
2012-08-15 14:05:50.103: [    CSSD][1120729408]clssnmDiscHelper: raclr40, node(1) connection failed, endp (0xa66), probe(0x100000000), ninf->endp 0xa66
2012-08-15 14:05:50.103: [    CSSD][1120729408]clssnmDiscHelper: node 1 clean up, endp (0xa66), init state 0, cur state 0
2012-08-15 14:05:50.103: [GIPCXCPT][1120729408]gipcInternalDissociate: obj 0x11588660 [0000000000000a66] { gipcEndpoint : localAddr 'gipc://raclr41:68bf-1bc8-a218-974f#192.168.1.111#13372', remoteAddr 'gipc://raclr40:nm_raclr#192.168.1.110#29850', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 } not associated with any container, ret gipcretFail (1)
2012-08-15 14:05:50.103: [GIPCXCPT][1120729408]gipcDissociateF [clssnmDiscHelper : clssnm.c : 3301]: EXCEPTION[ ret gipcretFail (1) ] failed to dissociate obj 0x11588660 [0000000000000a66] { gipcEndpoint : localAddr 'gipc://raclr41:68bf-1bc8-a218-974f#192.168.1.111#13372', remoteAddr 'gipc://raclr40:nm_raclr#192.168.1.110#29850', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x8061a, usrFlags 0x0 }, flags 0x0
2012-08-15 14:05:50.103: [    CSSD][1120729408]clssnmDiscEndp: gipcDestroy 0xa66
2012-08-15 14:05:50.111: [    CSSD][1108113728]clssnmvDHBValidateNCopy: node 1, raclr40, has a disk HB, but no network HB, DHB has rcfg 229086889, wrtcnt, 9907057, LATS 1513031694, lastSeqNo 9907057, uniqueness 1345052387, timestamp 1345053949/1513006814
2012-08-15 14:05:50.111: [    CSSD][1120729408]clssnmconnect: connecting to addr gipc://raclr40:nm_raclr#192.168.1.110#29850
2012-08-15 14:05:50.111: [    CSSD][1120729408]clssscConnect: endp 0xa72 - cookie 0x2aaaac09bae0 - addr gipc://raclr40:nm_raclr#192.168.1.110#29850
2012-08-15 14:05:50.111: [    CSSD][1120729408]clssnmconnect: connecting to node(1), endp(0xa72), flags 0x10002
2012-08-15 14:05:50.343: [    CSSD][1115998528]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2012-08-15 14:05:50.391: [    CSSD][1112844608]clssnmvDHBValidateNCopy: node 1, raclr40, has a disk HB, but no network HB, DHB has rcfg 229086889, wrtcnt, 9907057, LATS 1513031974, lastSeqNo 9907057, uniqueness 1345052387, timestamp 1345053949/1513006814
2012-08-15 14:05:50.391: [    CSSD][1103583552]clssnmvDHBValidateNCopy: node 1, raclr40, has a disk HB, but no network HB, DHB has rcfg 229086889, wrtcnt, 9907057, LATS 1513031974, lastSeqNo 9907057, uniqueness 1345052387, timestamp 1345053949/1513006814
2012-08-15 14:05:51.115: [    CSSD][1108113728]clssnmvDHBValidateNCopy: node 1, raclr40, has a disk HB, but no network HB, DHB has rcfg 229086889, wrtcnt, 9907058, LATS 1513032704, lastSeqNo 9907058, uniqueness 1345052387, timestamp 1345053950/1513007814
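The telling lines in the log are "No route to host" on the connect to 192.168.1.110 and the repeated "has a disk HB, but no network HB" for node 1: raclr40 is still writing to the voting disk, but raclr41 cannot reach it over the network. A rough sketch for pulling those nodes out of a long ocssd log, assuming each log entry sits on a single line as in the excerpt above (the function name is hypothetical):

```shell
# Hypothetical helper: read an ocssd log on stdin and print "<node number> <node name>"
# once for every node reported as having a disk heartbeat but no network heartbeat.
nodes_without_network_hb() {
  sed -n 's/.*clssnmvDHBValidateNCopy: node \([0-9][0-9]*\), \([^,]*\), has a disk HB, but no network HB.*/\1 \2/p' |
    sort -u
}
```

For example: `nodes_without_network_hb < /path/to/ocssd.log`, where the path is whatever file in the cssd log directory is being examined. On the excerpt above it would report node 1, raclr40.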
> cat /etc/hosts | grep 192.168.1.110
192.168.1.110 raclr40ic raclr40ic.imanheim.com
That is the interconnect IP of the other node, raclr40.
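A plain grep on an IP also matches longer addresses that merely contain it (192.168.1.110 would match 192.168.1.1100, and each dot matches any character). A small sketch of an exact-match lookup, assuming the usual hosts-file layout of an address followed by names; the function name `hosts_for_ip` is made up for illustration:

```shell
# Hypothetical helper: print every host name mapped to exactly the given IP.
# Usage: hosts_for_ip IP [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
hosts_for_ip() {
  awk -v ip="$1" '
    { sub(/#.*/, "") }                                  # strip trailing comments
    $1 == ip { for (i = 2; i <= NF; i++) print $i }     # one name per line
  ' "${2:-/etc/hosts}"
}
```

With the hosts entry shown above, `hosts_for_ip 192.168.1.110` would print raclr40ic and raclr40ic.imanheim.com on separate lines.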
Next, check whether the interconnect is reachable:
> ping 192.168.1.110
PING 192.168.1.110 (192.168.1.110) 56(84) bytes of data.
From 192.168.1.111 icmp_seq=2 Destination Host Unreachable
From 192.168.1.111 icmp_seq=3 Destination Host Unreachable
From 192.168.1.111 icmp_seq=4 Destination Host Unreachable
--- 192.168.1.110 ping statistics ---
6 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4999ms
, pipe 3
So the interconnect interface was down. I engaged the system administrators, who brought the interface back online, and that fixed the issue.
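To recheck the interconnect after a fix like this without reading the ping output by eye, the loss figure can be pulled from the summary line. This is only a sketch that assumes the Linux ping summary format shown earlier ("N% packet loss"); the function name is hypothetical:

```shell
# Hypothetical helper: read `ping` output on stdin and print the packet-loss
# percentage from its summary line (prints nothing if no summary is found).
ping_loss_percent() {
  sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}
```

For example, `ping -c 5 192.168.1.110 | ping_loss_percent` would print 100 while the interface is down, and should print 0 once the interconnect is healthy again.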
> crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE    ONLINE    raclr40
ora....N1.lsnr ora....er.type ONLINE    ONLINE    raclr40
ora....N2.lsnr ora....er.type ONLINE    ONLINE    raclr41
ora....N3.lsnr ora....er.type ONLINE    ONLINE    raclr41
ora.asm        ora.asm.type   OFFLINE   OFFLINE
ora....SM1.asm application    OFFLINE   OFFLINE
ora....18.lsnr application    ONLINE    ONLINE    raclr40
ora....418.gsd application    OFFLINE   OFFLINE
ora....418.ons application    ONLINE    ONLINE    raclr40
ora....418.vip ora....t1.type ONLINE    ONLINE    raclr40
ora....SM2.asm application    OFFLINE   OFFLINE
ora....19.lsnr application    ONLINE    ONLINE    raclr41
ora....419.gsd application    OFFLINE   OFFLINE
ora....419.ons application    ONLINE    ONLINE    raclr41
ora....419.vip ora....t1.type ONLINE    ONLINE    raclr41
ora.eons       ora.eons.type  ONLINE    ONLINE    raclr40
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE
ora....network ora....rk.type ONLINE    ONLINE    raclr40
ora.oc4j       ora.oc4j.type  OFFLINE   OFFLINE
ora.ons        ora.ons.type   ONLINE    ONLINE    raclr40
ora....ry.acfs ora....fs.type OFFLINE   OFFLINE
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    raclr40
ora.scan2.vip  ora....ip.type ONLINE    ONLINE    raclr41
ora.scan3.vip  ora....ip.type ONLINE    ONLINE    raclr41
I had a similar problem for
ora.crsd
1 ONLINE OFFLINE
and managed to start ora.crsd manually after reading this blog.
Oracle up again. Thank you! :)
Thanks for your clear documentation.
I'm running 11.2.0.1.0 on OL 5.10.
How do you make it run at boot?
Every time I reboot this machine, I have to issue "crsctl start res ora.cssd -init" again.
Maybe it's related to this error:
[item1@mtp dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Fri Jul 18 16:48:15 2014
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-00099: warning: no parameter file specified for ASM instance
ORA-01031: insufficient privileges
SQL> exit
I found a reference to this issue here: https://community.oracle.com/message/9821617
Great Blog...solved..issue....
Great! Thank You
crsctl start res ora.cssd -init
Solved the issue