Tuesday, 31 July 2018

PRVF-7617 : Node connectivity between nodes failed and TCP connectivity check failed for subnet

runcluvfy.sh failed with the following connectivity issue between the nodes:

Failed status:
Check: TCP connectivity of subnet "1.1.10.0"
  Source                  Destination                     Connected?
  ---------------------  ------------------------------  -----------
  jcraju1:1.1.10.11         jcraju2:1.1.10.12               failed

ERROR:
PRVF-7617 : Node connectivity between "jcraju1 : 1.1.10.11" and "jcraju2 : 1.1.10.12" failed
Result: TCP connectivity check failed for subnet "1.1.10.0"
Check: Node connectivity of subnet "1.1.20.0"
  Source                    Destination                     Connected?
  ----------------------  ------------------------------  ------------
  jcraju2[1.1.20.22]        jcraju1[1.1.20.21]                 yes
Result: Node connectivity passed for subnet "1.1.20.0" with node(s) jcraju2,jcraju1
Check: TCP connectivity of subnet "1.1.20.0"
  Source                    Destination                     Connected?
  -----------------------  ------------------------------  -----------
  jcraju1:1.1.20.21           jcraju2:1.1.20.22               failed

ERROR:
PRVF-7617 : Node connectivity between "jcraju1 : 1.1.20.21" and "jcraju2 : 1.1.20.22" failed
Result: TCP connectivity check failed for subnet "1.1.20.0"

Issue:
iptables or any other firewall must not be configured on the interconnect (MOS note 554781.1). Here, firewalld was running on the nodes and blocking the cluvfy TCP checks.
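Before touching the firewall, you can reproduce cluvfy's TCP check by hand. Below is a minimal sketch using bash's /dev/tcp pseudo-device; the host and port in the final call are placeholders (substitute the peer node's interconnect IP, e.g. 1.1.10.12, and a port with a listener on it):

```shell
#!/usr/bin/env bash
# Minimal TCP reachability probe, similar in spirit to cluvfy's check.
check_tcp() {
  local host=$1 port=$2
  # bash's /dev/tcp opens a TCP connection; timeout guards against a
  # firewall silently dropping packets (no RST, so the connect just hangs).
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} reachable"
  else
    echo "${host}:${port} unreachable"
  fi
}

check_tcp 127.0.0.1 1   # placeholder target; a closed port shows the failure path
```

A firewall that drops interconnect traffic produces the "unreachable" branch after the timeout, which matches the "failed" status cluvfy reports.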

Check the firewall status:
root@jcraju2:# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-07-09 17:38:58 PDT; 3 weeks 0 days ago
     Docs: man:firewalld(1)
 Main PID: 1432 (firewalld)
   CGroup: /system.slice/firewalld.service
           └─1432 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
Jul 09 17:38:57 jcraju2.rac.com systemd[1]: Starting firewalld - dynamic firewall daemon...
Jul 09 17:38:58 jcraju2.rac.com systemd[1]: Started firewalld - dynamic firewall daemon.

Stop the firewall:
root@jcraju2:# systemctl stop firewalld
root@jcraju2:# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Tue 2018-07-31 07:20:43 PDT; 4s ago
     Docs: man:firewalld(1)
  Process: 1432 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 1432 (code=exited, status=0/SUCCESS)
Jul 09 17:38:57 jcraju2.rac.com systemd[1]: Starting firewalld - dynamic firewall daemon...
Jul 09 17:38:58 jcraju2.rac.com systemd[1]: Started firewalld - dynamic firewall daemon.
Jul 31 07:20:42 jcraju2.rac.com systemd[1]: Stopping firewalld - dynamic firewall daemon...
Jul 31 07:20:43 jcraju2.rac.com systemd[1]: Stopped firewalld - dynamic firewall daemon.
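Note that "systemctl stop" is not persistent: the unit is still "enabled" (visible in the Loaded: line of the status output above), so firewalld will start again at the next reboot. A sketch of the commands to keep it off permanently (run as root on each node):

```shell
# Stop firewalld now and prevent it from starting at boot.
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld   # Loaded: line should now show "disabled"
```

These are administrative commands shown for reference; if site policy requires a firewall, the alternative is to open the interconnect ports instead of disabling firewalld entirely.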

runcluvfy.sh completed successfully after stopping the firewall:
Interfaces found on subnet "1.1.20.0" that are likely candidates for a private interconnect are:
jcraju2 eth1:1.1.20.22
jcraju1 eth1:1.1.20.21
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "1.1.10.0".
Subnet mask consistency check passed for subnet "1.1.20.0".
Subnet mask consistency check passed.

Result: Node connectivity check passed
Checking multicast communication...
Checking subnet "1.1.10.0" for multicast communication with multicast group "20.0.1.0"...
Check of subnet "1.1.10.0" for multicast communication with multicast group "20.0.1.0" passed.
Checking subnet "1.1.20.0" for multicast communication with multicast group "20.0.1.0"...
Check of subnet "1.1.20.0" for multicast communication with multicast group "20.0.1.0" passed.

Friday, 20 July 2018

Failover to Standby Database with some log gap

Issue: The production database server crashed, a couple of archived logs were never shipped, and the standby could not be recovered completely.

Failover steps during incomplete recovery:
SQL> select INSTANCE_NAME,host_name,status,logins, to_char(STARTUP_TIME,'DD/MM/YYYY HH24:MI:SS') "STARTUP_TIME" from gv$instance;

INSTANCE_NAME    HOST_NAME          STATUS       LOGINS     STARTUP_TIME
---------------- ------------------ ------------ ---------- -------------------
proddb           jcraju.test.com    OPEN         ALLOWED    20/07/2018 23:45:34

SQL> select database_role,open_mode,name from gv$database;
DATABASE_ROLE    OPEN_MODE            NAME
---------------- -------------------- ---------
PHYSICAL STANDBY READ ONLY            PRODDB

SQL>!ping jcraju.prod.com
connect: Network is unreachable

SQL> recover standby database until cancel;
ORA-00279: change 1105887 generated at 07/17/2018 13:47:57 needed for thread 1
ORA-00289: suggestion : /u01/app/oracle/flash_recovery_area/STANDBY/archivelog/2018_07_20/o1_mf_1_33_%u_.arc
ORA-00280: change 1105887 for thread 1 is in sequence #33


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
AUTO
ORA-00308: cannot open archived log '/u01/app/oracle/flash_recovery_area/STANDBY/archivelog/2018_07_20/o1_mf_1_33_%u_.arc'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 7

SQL> alter database recover managed standby database finish;
Database altered.

SQL> alter database activate standby database;
Database altered.
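For reference, the failover sequence above can be condensed into a single SQL*Plus session. This is a sketch of the same commands, not additional steps; it assumes ORACLE_SID and ORACLE_HOME point at the standby instance, and it should only be run on the standby you intend to activate, since FINISH/ACTIVATE ends redo apply and opens a new incarnation:

```shell
# Sketch only: condensed version of the terminal failover shown above.
sqlplus -S / as sysdba <<'EOF'
-- apply whatever redo is available on disk, then end recovery
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
-- convert the standby into a primary (performs a RESETLOGS internally)
ALTER DATABASE ACTIVATE STANDBY DATABASE;
-- the database is left mounted as primary; open it for use
ALTER DATABASE OPEN;
SELECT database_role, open_mode FROM v$database;
EOF
```

Because this is an activation after incomplete recovery, any transactions in the missing archived logs (sequence 33 onward here) are lost, and the old primary must be re-created as a standby (or flashed back) before it can rejoin the configuration.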

Monitor the alert log:
alter database activate standby database
2018-07-20T23:57:29.481427-04:00
ALTER DATABASE ACTIVATE [PHYSICAL] STANDBY DATABASE [Process Id: 2833] (proddb)
2018-07-20T23:57:29.482278-04:00
Begin: Standby Redo Logfile archival
End: Standby Redo Logfile archival
Standby terminal recovery start SCN: 1105887
RESETLOGS after incomplete recovery UNTIL CHANGE 1105888 time 07/17/2018 13:47:57
Online log /u01/app/oracle/flash_recovery_area/STANDBY/onlinelog/o1_mf_1_fntzyfp8_.log: Thread 1 Group 1 was previously cleared
Online log /u01/app/oracle/flash_recovery_area/STANDBY/onlinelog/o1_mf_2_fntzyg85_.log: Thread 1 Group 2 was previously cleared
Online log /u01/app/oracle/flash_recovery_area/STANDBY/onlinelog/o1_mf_3_fntzygr4_.log: Thread 1 Group 3 was previously cleared
Standby became primary SCN: 1105886
2018-07-20T23:57:29.607784-04:00
Setting recovery target incarnation to 2
2018-07-20T23:57:29.625067-04:00
The Time Management Interface (TMI) is being enabled for role
transition information.  This will result in messages being
output to the alert log file with the prefix 'TMI: '.  This is
being enabled to make the timing of the various stages of the
role transition available for diagnostic purposes.  This
output will end when the role transition is complete.
Network throttle feature is disabled as mount time

AUDIT_TRAIL initialization parameter is changed back to its original value as specified in the parameter file.
ACTIVATE STANDBY: Complete - Database mounted as primary
Completed: alter database activate standby database

The database is now mounted as the new primary:
SQL> select INSTANCE_NAME,host_name,status,logins, to_char(STARTUP_TIME,'DD/MM/YYYY HH24:MI:SS') "STARTUP_TIME" from gv$instance;

INSTANCE_NAME    HOST_NAME          STATUS       LOGINS     STARTUP_TIME
---------------- ------------------ ------------ ---------- -------------------
proddb           jcraju.test.com    MOUNTED      ALLOWED    20/07/2018 23:45:34

SQL> select database_role,open_mode,name from gv$database;
DATABASE_ROLE    OPEN_MODE            NAME
---------------- -------------------- ---------
PRIMARY          MOUNTED              PRODDB

SQL> alter database open;
Database altered.

SQL> select database_role,open_mode,name from gv$database;
DATABASE_ROLE    OPEN_MODE            NAME
---------------- -------------------- ---------
PRIMARY          READ WRITE           PRODDB