rootupgrade.sh failed with the error below while upgrading Grid Infrastructure from 18.3 to 19.7.0.0.
Error:
CRS-2676: Start of 'ora.cssdmonitor' on 'node01' succeeded
CRS-1609: This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity;
details at (:CSSNM00086:) in /app/grid/diag/crs/node01/crs/trace/ocssd.trc.
CRS-2883: Resource 'ora.cssd' failed during Clusterware stack start.
CRS-4406: Oracle High Availability Services synchronous start failed.
CRS-41053: checking Oracle Grid Infrastructure for file permission issues
CRS-4000: Command Start failed, or completed with errors. 2020/09/07 09:08:46
CLSRSC-117: (Bad argc for has:clsrsc-117) Died at /u01/app/19.3.0.0/grid/crs/install/crsupgrade.pm line 1617.
The “unable to communicate with other nodes” message in the alert and trace files can be misleading, so we first checked communication between the nodes:
1. Verified ssh connectivity between the nodes; it is working fine.
2. Verified ping and traceroute; both look good.
From Node 1:
+ ping -s 9000 -c 4 -I <node1-private address> <node1-private address>
+ ping -s 9000 -c 4 -I <node1-private address> <node2-private address>
+ traceroute -s <node1-private address> -r -F <node1-private address> 8972
+ traceroute -s <node1-private address> -r -F <node2-private address> 8972
From Node 2:
+ ping -s 9000 -c 4 -I <node2-private address> <node1-private address>
+ ping -s 9000 -c 4 -I <node2-private address> <node2-private address>
+ traceroute -s <node2-private address> -r -F <node1-private address> 8972
+ traceroute -s <node2-private address> -r -F <node2-private address> 8972
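The checks above are the same four commands per node, so they can be generated from the node's own private address and its peers. The following is only a sketch: the function name print_interconnect_checks and the addresses in the usage line are placeholders, not part of the original procedure. It only prints the commands; nothing is executed until the output is piped to a shell.

```shell
#!/bin/sh
# Print the ping/traceroute checks for one node (dry run).
# $1      = this node's private address (placeholder)
# $2 ...  = private addresses of the other nodes (placeholders)
print_interconnect_checks() {
    local_ip=$1
    shift
    # Same ping form as in the list above: to itself first, then to each peer.
    for target in "$local_ip" "$@"; do
        printf 'ping -s 9000 -c 4 -I %s %s\n' "$local_ip" "$target"
    done
    # Same traceroute form as in the list above.
    for target in "$local_ip" "$@"; do
        printf 'traceroute -s %s -r -F %s 8972\n' "$local_ip" "$target"
    done
}
```

For example, `print_interconnect_checks <node1-private address> <node2-private address>` on node 1 prints the four "From Node 1" commands; append `| sh` to run them.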
While checking gipcd.trc, we found the following handshake failures:
2020-09-07 08:21:27.483 : GIPCTLS:474797824: gipcmodTlsAuthInit: tls context initialized successfully
2020-09-07 08:21:27.524 :GIPCXCPT:474797824: gipcmodTlsLogErr: [NZOS], ssl_Handshake failed to perform operation on handshake with NZERROR [29024]
2020-09-07 08:21:27.524 :GIPCXCPT:474797824: gipcmodTlsAuthStart: ssl_Handshake() failed with nzosErr : 29024, ret gipcretTlsErr (49)
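To confirm the same signature in another trace file, the handshake failures can be counted and the error codes listed. This is a minimal sketch: the two function names are made up for illustration, and the patterns are taken only from the gipcd.trc lines shown above, so they may need adjusting for other versions.

```shell
#!/bin/sh
# Count lines where gipcmodTlsAuthStart reports a failed ssl_Handshake().
# $1 = path to a gipcd.trc file (the location is an assumption; pass your own)
count_tls_handshake_failures() {
    # -F treats the pattern as a fixed string, so "()" needs no escaping.
    grep -cF 'ssl_Handshake() failed with nzosErr' "$1"
}

# List the distinct nzosErr codes that appear in the trace file.
list_nzos_error_codes() {
    sed -n 's/.*nzosErr : \([0-9][0-9]*\).*/\1/p' "$1" | sort -u
}
```

Run both against the gipcd.trc in the node's trace directory. A non-zero count together with code 29024 matches the failure described here.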
As per Doc ID 2667217.1, a similar error was reported for the 19.6 upgrade.
Workaround on 19.7:
1) Run rootupgrade.sh on node 1.
2) When it fails on node 1 with this error, shut down CRS on node 2:
cd <18c_Gridhome>/bin
./crsctl stop crs
3) Rerun rootupgrade.sh on node 1.
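The workaround can be written down as a short runbook. The sketch below only prints the commands for each node and does not execute anything. The function name print_workaround_steps and both home paths are placeholders, and it assumes rootupgrade.sh is run from the top of the new 19c Grid home.

```shell
#!/bin/sh
# Print the workaround commands per node (dry run, nothing is executed).
# $1 = old 18c Grid home on node 2 (placeholder)
# $2 = new 19c Grid home on node 1 (placeholder; assumed to hold rootupgrade.sh)
print_workaround_steps() {
    old_home=$1
    new_home=$2
    # Step 2: stop the old clusterware stack on node 2, as root.
    printf 'node2# cd %s/bin && ./crsctl stop crs\n' "$old_home"
    # Step 3: rerun the upgrade script on node 1, as root.
    printf 'node1# %s/rootupgrade.sh\n' "$new_home"
}
```

Call it as `print_workaround_steps <18c_Gridhome> <19c_Gridhome>` after the first rootupgrade.sh run has failed on node 1.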