Thursday, December 4, 2014

Extending the data from MySQL Cluster using Replication to INNODB engine

MySQL Cluster NDB is a transaction engine and allows multi-master for write operations.  One may consider the architecture using other storage engine such as INNODB as slave server to provide the historical database.  This serves the analytic applications with historical data.  The master - slave topology allows scale out for both Master and Slave Servers.

In order to achieve the MySQL Cluster Replication between NDB and other storage engine together with providing historical data,
1. Setup the MySQL Cluster Replication
2. Disable binary logging for data deletion process - Ensure data removal / house keeping process on transaction tables in the MySQL Cluster not to propagate to the slaves.  Otherwise, data from slave servers will also be cleaned up.

Setup the MySQL Cluster Replication
a. With a running MySQL Cluster, management node should be running.   Attaching additional MySQL Server for the replication without changing existing processes.
Assuming there are tables created with engine=ndb in database : test.   The changes with the tables in test will be replicated to slave database.

Database : test
Table : mytable1, mytable2 
mysql > create table test.mytable1 (f1 int not null primary key, f2 varchar(20)) engine=ndb;
 mysql > create table test.mytable2 (f1 int not null primary key, f2 varchar(20)) engine=ndb;
With addition of mysqld for Replication Master Server, make sure the cluster has free node id for the connection.  The below cluster (under MySQL Cluster Manager-mcm)  has 3 mysqld configured.  The last one 'connected' is the free slot created or the Replication Master Server.

b. Create the MySQL Server as Master Server
port=<master port>

The sync-binlog value of 1 is the safest choice because in the event of a crash you lose at most one commit group from the binary log.

The binlog_cache_size defines the cache size for holding changes to binary log during a transaction.

c. Create the MySQL Server as Slave Server
port=<slave port>
The slave-exec-mode is used in replication conflict resolution and error checking.  IDEMPOTENT mode causes suppression of duplicate-key and no-key-found errors.

The IDEMPOTENT mode is one of the way to take care of mysql.ndb_apply_status replication from MySQL Cluster.  There are choice to ensure replication to be successful.
1. with IDEMPOTENT setting on slave (but this may have problem if there is replication issue.)
2. Only replicate the database with the data.  That means the mysql database (including the ndb_apply_status ) will not be replicated.
3. Exclude the table (mysql.ndb_apply_status) from replication.
For details of this issue : Please check the documentation link

d. Configure the Master Server
  • create a user for replication
    mysql-master> grant replication slave on *.* to 'repuser'@'slave-host' identified by 'repuser';

e. Configure the Slave Server
    Because the Master Server is the MySQL Cluster, there are ndb tables in mysql database with the ndb information.   In particularly, the mysql.ndb_apply_status is updated whenever there is data replicated.   We need to take special action to deal with the ndb_apply_status.  Otherwise, the replication will fail.  Please see the documentation on MySQL Cluster Replication to other storage engine section on the the URL
  • Create the ndb_apply_status in mysql DB
    mysql-slave> use mysql;
    mysql-slave> CREATE TABLE ndb_apply_status (
    mysql-slave> server_id int(10) unsigned NOT NULL,
    mysql-slave> epoch bigint(20) unsigned NOT NULL,
    mysql-slave> log_name varchar(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
    mysql-slave> start_pos bigint(20) unsigned NOT NULL,
    mysql-slave> end_pos bigint(20) unsigned NOT NULL,
    mysql-slave> PRIMARY KEY (server_id) USING HASH)
    mysql-slave> ENGINE=INNODB;
  • Set up the replication connection
    mysql-slave> change master to master_host='master-host', master_port=master-port, master_user='repuser', master_password='repuser';
f. Dump master schema and data and change the engine-ndbcluster to engine=innodb
 mysqldump --no-data --routines --triggers -uroot -h<master-host> -P<master-port> test > /tmp/mydump_schema.sql
sed 's/ndbcluster/innodb/g' /tmp/mydump_schema.sql > /tmp/mydump2.sql

mysqldump --no-create-info -uroot -h<master-host> -P<master-port> --master-data=1 test > /tmp/mydump_data.sql

g. Load the schema and data into the Slave Server
#mysql -uroot -h<slave-host> -P<slave-port> --Dtest< /tmp/mydump2.sql
#mysql -uroot -h<slave-host> -P<slave-port> -Dtest < /tmp/mydump_data.sql

#mysql -uroot  -h<slave-host> -P<slave-port> -e "start slave;"

The replication setup is done for your database 'test'.

To test for the replication from ndb to storage engine=innodb, creating data on MySQL Cluster.   The mysql  client program can connect to any of the mysqld nodes within the cluster. Inserting data can be applied to any of the mysqld nodes.   Data will be visible within the cluster including the MySQL Server for Replication as Master.
mysql-node1> insert into test.mytable1 values(10, 'aaa');
mysql-node1> insert into test.mytable1 values(10, 'aaa');
mysql-node1> insert into test.mytable1 values(10, 'aaa');

On Master Server, check on the epoch value.

On Slave Server check on the 'show slave status;' to ensure there is no error and the Slave_IO_Running and Slave_SQL_Running are marked "Yes"

Check on the epoch having applied to the Slave server.  The epoch value should be the same as the value from the master server.   With more data changes to the MySQL Cluster, the epoch value continues to change.  

The database content on (Cluster & Master) is now replicated across to the slave server.  Any insert, delete, update on tables is replicated.   Housekeeping process does removal of data from transaction tables.   The actions are replicated the same way to the slave.

Historical data can either be handled manually by running data copy to another historical table.   This is safe and is recommended in most cases.
Disable binary logging on for those housekeeping process which means the data removal is not replicated across.  This has risk of corrupting data.
a. truncate table reset the AUTO_INCREMENT column.   With this not replicated to Slave, it will produce error.
b. It is difficult to compare the record between Slave and Master because the record count will be different.
c. If by a chance the housekeeping process/program does not have the sql_log_bin=0, the records from slave server will be removed.  
Disable binary logging for data deletion process
Running set sql_log_bin=0; for the session before the execution of the deletion SQL.

Ref :