Friday, March 3, 2017

WatchDog for MySQL Group Replication Servers

MySQL Group Replication has been released since MySQL 5.7.17.
A quick start guide can be on the following URL
http://mysqlhighavailability.com/mysql-group-replication-a-quick-start-guide/

If there is a network partition between a group of Servers, it may introduce arbitration to find out and determine the next action for the Group of the Servers.
For documentation about Network Partitioning with Group Replication, please refer to the following URL::
https://dev.mysql.com/doc/refman/5.7/en/group-replication-network-partitioning.html

This blog is written to introduce my 'demo' version of arbitrator / watchdog.   From here onwards, I refer this arbitrator as gr_watchdog.
Another public arbitrator from Matt Lord can also be found on :
https://github.com/mattlord/myarbitratord
As an example, an application connecting to a group of servers with 5 MySQL instances on 2 machines.   Machine 1 (S1, S2, S3) and Machine 2 (S4, S5)



The following table is an example of the configuration for the Group Replication.
Group Setting
Basic Server Configuration datadir=…
basedir=…
S1: port=3306
S2: port=3316
S3: port=3326
S4: port=3306
S5: port=3316
socket=<socket file not to be same as other on the same machine>
S1:server-id=101
S2:server-id=102
S3:server-id=103
S4:server-id=104
S5:server-id=105
Replication Configuration log-bin
binlog-format=row
binlog-checksum=NONE
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=true
master-info-repository=TABLE
relay-log-info-repository=TABLE
Group Replication Configuration plugin-load
group_replication_group_name
= 8a94f357-aab4-11df-86ab-c80aa9429562
S1 : group_replication_local_address=primary:4306
S2:  group_replication_local_address=primary:4316
S3: group_replication_local_address=primary:4326
S4: group_replication_local_address=secondary:4306
S5: group_replication_local_address=secondary:4306
group_replication_group_seeds =primary:4306,primary:4316,primary:4326,secondary:4306,secondary:4316
transaction-write-set-extraction=XXHASH64
group_replication_single_primary_mode=false
group_replication_enforce_update_everywhere_checks=true

The 5 servers are all connected as Group Replication thru INTERNAL Network Interface. 
The setting below tells each of the member server to connect to “interface:port”.   “primary|secondary” on the machine can be the interface to INTERNAL Network dedicated for Group Replication.  
S1 : group_replication_local_address=primary:4306
S2:  group_replication_local_address=primary:4316
S3: group_replication_local_address=primary:4326
S4: group_replication_local_address=secondary:4306
S5: group_replication_local_address=secondary:4306


group_replication_group_seeds =primary:4306,primary:4316,primary:4326,secondary:4306,secondary:4316


Application connecting to the MySQL Servers is  going thru another Network Interface.  MySQL Router is used as the routing between MySQL Servers and the Application(s).

When the 5 MySQL Servers are up and running with the Group Replication, the member status can be checked using the following SQL statement on Performance_Schema
mysql> SELECT * FROM performance_schema.replication_group_members;
The following screenshot shows the 5 MySQL Instances running on 2 machines (virtual-23, virtual-24).  Given that the virtual-23 has 3 MySQL Instances and virtual-24 has 2 MySQL Instances.  They are all running with ONLINE state.


If there is any problem to the INTERVAL Network Interface between the 2 machines.   MySQL Instances (S1-S2-S3) is separated from (S4-S5).
It forms 2 group of Servers which they are disjointed.

 On the Majority Group with the Machine 1 (virtual-23), all three Server S1-S2-S3 are connected as ONLINE.   The size of the group becomes 3.
Where as On the Machine 2 (virtual-24), the Group Size is still 5 but only the 2 servers are ONLINE.   The size of 2 is a minority with regards to the Group Size as 5.   

The above diagram shows the Member State of Machine1:virtual-23 on the Left Side.  The Member State of Machine2:virtual-24 on the Right Side.
Machine1:virtual-23 has 3 ONLINE members in the Group. 
Machine2:virtual-24 has 2 ONLINE members out of the 5 member size in the Group.   3 members are marked as UNREACHABLE. 
Application if connected to this minority group (S4-S5) Servers may have uncertainty with data.  Updating data is prohibited by the nature of the Group Replication Architecture.

The gr_watchdog is shared as illustration to detect the member state in the group and take action (SHUTDOWN | READONLY | RESTARTGR) on the server(s).

A script “gr_watchdog.sh” is created to monitor the local MySQL Servers on the machine.

The gr_watchdog.sh monitor each of the MySQL Server Instances on the machine by using the
--login-path  which is created by mysql_config_editor.
For details about the “mysql_config_editor”, refer to the documentation :
https://dev.mysql.com/doc/refman/5.7/en/mysql-config-editor.html

A stored function “gr_is_major_online_member()” as show below :
delimiter //
use sys//
drop function if exists gr_is_major_online_member//
create function gr_is_major_online_member()
        returns boolean NOT DETERMINISTIC
        READS SQL DATA
BEGIN
        DECLARE li_row INT DEFAULT 0;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET li_row=0;
        SELECT IF( MEMBER_STATE='ONLINE' AND   ((SELECT COUNT(*) FROM performance_schema.replication_group_members
                WHERE MEMBER_STATE != 'ONLINE') >=  ((SELECT COUNT(*) FROM performance_schema.replication_group_members)/2) = 0), 1, 0 )
                INTO li_row   FROM performance_schema.replication_group_members JOIN performance_schema.replication_group_member_stats USING(member_id);
        IF li_row = 1 THEN
                RETURN(true);
        END IF;
        RETURN(false);
END;
//

If MySQL servers are in a Group Replication and they are all connected, executing the stored function creation on any of the server instance will be able to install the Stored Function to all server instances.



The gr_comm.sh contains the configuration settings.
export MYSQL_HOME=/usr/local/mysql
Provide the MYSQL HOME installation
export PATH=$MYSQL_HOME/bin:$PATH
Add to the PATH
export ACTION=READONLY
# ACTION if there is any failure
 SHUTDOWN – shutdown using mysqladmin
READONLY – issue set READ_ONLY and SUPER_READ_ONLY on the server
 RESTARTGR – issue “stop group_replication;” and start group_replication;” to restart the Group Replication
export RETRY_COUNT=3
In case of Server Failed in the Group (Not online or not in majority group), the retry count determine the number of retry before ACTION is taken
export QUIET_MODE=0
INFO logging if QUIET_MODE=0
export GRSERVERS=gr3306:gr3316:gr3326
The member servers to be checked.  It is based on the
--login-path from the mysql_config_editor. 
Format :login-path[:login-path]…
export GRINTERVAL=30
Polling interval
export OUTDIR=/home/mysql/demo/GroupReplication/grwatchdog/log
Output directory (privilege to “WRITE and CREATE” file(s)

Starting gr_watchdog.sh
# gr_watchdog.sh start


Stopping gr_watchdog.sh
# gr_watchdog.sh stop
OR
kill the process

To illustrate the gr_watchdog for the ACTION=READONLY; while ALL member servers are running, issue “stop group_replication;” on one of the server.   After retrying up to RETRY_COUNT=3, the server is set to READ ONLY – including READ_ONLY and SUPER_READ_ONLY.  

To enable the GROUP REPLICATION, be sure to disable the READ_ONLY and SUPER_READ_ONLY if the “group_replication_single_primary_mode” is not set to TRUE.




You can download the grwatchdog.tar from the following LINK
https://github.com/ivanxma/grwatchdog/blob/master/grwatchdogV1.tar 
 
Any comment, appreciate your sharing!!!    

Again, the is just a demonstration how we can create external arbitrator / watchdog for MySQL Group Replication Servers.