Tuesday, 8 September 2020

19.7 Grid upgrade: rootupgrade.sh fails on first node

rootupgrade.sh failed with the error below while upgrading Grid Infrastructure from 18.3 to 19.7.0.0.

Error:

CRS-2676: Start of 'ora.cssdmonitor' on 'node01' succeeded

CRS-1609: This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00086:) in /app/grid/diag/crs/node01/crs/trace/ocssd.trc.

CRS-2883: Resource 'ora.cssd' failed during Clusterware stack start.

CRS-4406: Oracle High Availability Services synchronous start failed.

CRS-41053: checking Oracle Grid Infrastructure for file permission issues

CRS-4000: Command Start failed, or completed with errors.

2020/09/07 09:08:46 CLSRSC-117: (Bad argc for has:clsrsc-117) Died at /u01/app/19.3.0.0/grid/crs/install/crsupgrade.pm line 1617.

The “unable to communicate with other nodes” errors in the alert and trace files can be misleading. I started by checking communication between the nodes:

1.  Verified SSH connectivity between the nodes: it works fine.

2.  Verified ping and traceroute over the private interconnect: both look good.

From Node 1:

+ ping -s 9000 -c 4 -I <node1-private address> <node1-private address>

+ ping -s 9000 -c 4 -I <node1-private address> <node2-private address>

+ traceroute -s <node1-private address> -r -F <node1-private address> 8972

+ traceroute -s <node1-private address> -r -F <node2-private address> 8972

From Node 2:

+ ping -s 9000 -c 4 -I <node2-private address> <node1-private address>

+ ping -s 9000 -c 4 -I <node2-private address> <node2-private address>

+ traceroute -s <node2-private address> -r -F <node1-private address> 8972

+ traceroute -s <node2-private address> -r -F <node2-private address> 8972
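The checks above can be wrapped in a small loop so the same commands run from each node against every private address. This is a sketch, not the author's script: LOCAL_IP and PRIVATE_IPS are hypothetical placeholders for your interconnect addresses, and the ping is a stricter variant of the one above. It adds the Linux iputils option -M do (don't fragment) with an 8972-byte payload (9000-byte MTU minus the 20-byte IPv4 and 8-byte ICMP headers), so an MTU mismatch makes ping fail instead of being masked by fragmentation. By default it only prints the commands (DRY_RUN=1) so you can review them first.

```bash
#!/usr/bin/env bash
# Sketch: repeat the interconnect checks from one node against every private
# address, using a stricter no-fragmentation ping. LOCAL_IP and PRIVATE_IPS
# are hypothetical placeholders; replace them with your interconnect addresses.
# DRY_RUN=1 (the default here) prints the commands; set DRY_RUN=0 to run them.

LOCAL_IP="${LOCAL_IP:-192.168.10.1}"                     # hypothetical
PRIVATE_IPS="${PRIVATE_IPS:-192.168.10.1 192.168.10.2}"  # hypothetical
DRY_RUN="${DRY_RUN:-1}"

# ICMP payload for a 9000-byte MTU: 9000 - 20 (IPv4 header) - 8 (ICMP header).
PAYLOAD=$((9000 - 20 - 8))

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

check_interconnect() {
  local src="$1" dst
  for dst in $PRIVATE_IPS; do
    # -M do sets "don't fragment", so an MTU mismatch on the path makes ping
    # fail instead of being hidden by fragmentation.
    run ping -M do -s "$PAYLOAD" -c 4 -I "$src" "$dst" \
      || echo "ping $src -> $dst FAILED"
    # Same traceroute options and length value as the manual checks above.
    run traceroute -s "$src" -r -F "$dst" "$PAYLOAD" \
      || echo "traceroute $src -> $dst FAILED"
  done
}

check_interconnect "$LOCAL_IP"
```

Run it once on each node with LOCAL_IP set to that node's private address; with DRY_RUN=0 any failing check prints a FAILED line instead of stopping the loop.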

While checking gipcd.trc, I found these TLS handshake failures:

2020-09-07 08:21:27.483 : GIPCTLS:474797824:  gipcmodTlsAuthInit: tls context initialized successfully

2020-09-07 08:21:27.524 :GIPCXCPT:474797824:  gipcmodTlsLogErr: [NZOS], ssl_Handshake failed to perform operation on handshake with NZERROR [29024]

2020-09-07 08:21:27.524 :GIPCXCPT:474797824:  gipcmodTlsAuthStart: ssl_Handshake() failed with nzosErr : 29024, ret gipcretTlsErr (49)
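A quick way to confirm you are hitting the same TLS handshake failure is to count the matching lines in gipcd.trc. This is a minimal sketch; the function name is hypothetical, and TRACE_DIR is an assumption: it reuses the trace directory from the CRS-1609 message and assumes gipcd.trc sits next to ocssd.trc.

```bash
#!/usr/bin/env bash
# Sketch: count the TLS handshake failures (NZERROR / nzosErr 29024) that GIPC
# logs in gipcd.trc. TRACE_DIR is an assumption: it reuses the trace
# directory from the CRS-1609 message and assumes gipcd.trc lives there too.

TRACE_DIR="${TRACE_DIR:-/app/grid/diag/crs/node01/crs/trace}"

# Hypothetical helper: prints the number of matching lines in the given file.
count_tls_handshake_errors() {
  local trc="$1"
  if [ ! -r "$trc" ]; then
    echo "cannot read $trc" >&2
    return 2
  fi
  # Match both the gipcmodTlsLogErr and the gipcmodTlsAuthStart lines.
  # grep -c prints the count even when it is 0 (exit status 1 in that case).
  grep -cE 'ssl_Handshake.*(NZERROR \[29024]|nzosErr : 29024)' "$trc"
}

count_tls_handshake_errors "$TRACE_DIR/gipcd.trc" || true
```

A non-zero count during the failed rootupgrade.sh run points at the TLS handshake problem rather than at the network checks above.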

As per My Oracle Support Doc ID 2667217.1, a similar error was reported during a 19.6 upgrade.

Workaround on 19.7:

1) Run rootupgrade.sh on node1

2) When it fails on node1 with this error, shut down CRS on node2 as root from the 18c Grid home:

  cd <18c_Gridhome/bin>

   ./crsctl stop crs

3) Rerun rootupgrade.sh on node1.
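The three workaround steps can be sketched as a single script run as root on node1. Treat it as an illustration under assumptions, not a tested procedure: GRID_19C_HOME defaults to the home shown in the CLSRSC-117 message, while GRID_18C_HOME and NODE2 are hypothetical placeholders. It also assumes root SSH from node1 to node2 (many sites disallow this; running crsctl by hand on node2, as above, works just as well) and that rootupgrade.sh exits with a non-zero status when it dies as shown earlier. By default it only prints the commands (DRY_RUN=1), and in that mode the first attempt always "succeeds", so the retry path runs only for real.

```bash
#!/usr/bin/env bash
# Sketch of the three-step workaround, run as root on node1.
# GRID_19C_HOME defaults to the home in the CLSRSC-117 message;
# GRID_18C_HOME and NODE2 are hypothetical placeholders.
# Assumes root SSH to node2 and a non-zero exit from a failed rootupgrade.sh.
# DRY_RUN=1 (the default) only prints the commands.

GRID_19C_HOME="${GRID_19C_HOME:-/u01/app/19.3.0.0/grid}"
GRID_18C_HOME="${GRID_18C_HOME:-/u01/app/18.0.0/grid}"   # hypothetical
NODE2="${NODE2:-node02}"                                 # hypothetical
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_first_node() {
  # Step 1: first rootupgrade.sh attempt on node1.
  if run "$GRID_19C_HOME/rootupgrade.sh"; then
    echo "rootupgrade.sh succeeded on the first attempt"
    return 0
  fi
  # Step 2: stop the 18c stack on node2.
  run ssh "root@$NODE2" "$GRID_18C_HOME/bin/crsctl stop crs" || return 1
  # Step 3: rerun rootupgrade.sh on node1.
  run "$GRID_19C_HOME/rootupgrade.sh"
}

upgrade_first_node
```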