
Support for Percona XtraDB Cluster in ProxySQL (Part Two)

Marco Tusa
MySQL
01 January 2021

How the scheduler and an external script can support failover (using Percona's and Marco's scripts as examples)

In part one of this series, I illustrated how simple scenarios may fail or have problems when using the native Galera support inside ProxySQL. In this post, I will repeat the same tests, but using the scheduler option and an external script.

The Scheduler

First a brief explanation about the scheduler.

The scheduler inside ProxySQL was created to allow administrators to extend ProxySQL's capabilities. The scheduler gives the option to add any kind of script or application and run it at a specified time interval. The scheduler was also the first way we had to deal with Galera/Percona XtraDB Cluster (PXC) node management in case of issues.

The scheduler table is composed as follows:

CREATE TABLE scheduler (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,
interval_ms INTEGER CHECK (interval_ms>=100 AND interval_ms<=100000000) NOT NULL,
filename VARCHAR NOT NULL,
arg1 VARCHAR,
arg2 VARCHAR,
arg3 VARCHAR,
arg4 VARCHAR,
arg5 VARCHAR,
comment VARCHAR NOT NULL DEFAULT '')

The relevant elements are:

  • Active: defines whether the scheduler should execute the external script or not.
  • Interval_ms: the execution frequency. There is NO check on whether the previous execution has terminated. Because of that, the script must include a guard to prevent launching multiple instances, which would probably create conflicts and resource issues.
  • Filename: the FULL path of the script/app you want to be executed.
  • Arg(s): whatever you want to pass as arguments. When you have a complex script, either use a configuration file or collapse multiple arguments into a single string.
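Since the scheduler fires on its interval regardless of whether the previous run has finished, a minimal overlap guard at the top of the script is a good idea. A sketch in Bash (the lock-file path is illustrative):

```shell
#!/bin/bash
# Take a non-blocking exclusive lock on a lock file; if another
# instance of the check already holds it, exit immediately instead
# of piling up concurrent runs.
exec 9>/tmp/galera_check.lock
flock -n 9 || { echo "previous run still active, exiting"; exit 0; }

# ... the actual node-check logic would run here ...
echo "check running"
```

The lock is released automatically when the process exits, so a crashed run does not block future ones.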

The Scripts

In this blog, I will present two different scripts (as examples). Both cover the scenarios from the previous article and can do more, but I will focus only on that part for now.

One script is written in Bash and is a port of the proxysql_galera_checker Percona was using with proxysql-admin in ProxySQL version 1.4. The script is available here from Percona-Lab (git clone).

The other, written by me in Perl, is probably the first such script, dating back to 2016. I have done some enhancements and bug fixing to it over the years. It is available here (git clone).

Both are offered here as examples, and I am not suggesting you use them as-is in critical production environments.

The Setup

To use the two scripts, some custom setup is required. First of all, check that the files are executable by the user running ProxySQL.
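For instance (the paths are the ones used later in this post; adjust them to your layout):

```shell
#!/bin/bash
# Check that the files the scheduler will launch are executable,
# and report any that are not.
for f in /opt/tools/proxy_sql_tools/galera_check.pl \
         /opt/tools/proxysql-scheduler/proxysql_galera_checker; do
    [ -x "$f" ] || echo "not executable: $f"
done
```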

Let’s start with mine, written in Perl.

To make it work, we need to define a set of host groups that will act as Writer/Reader/Backup-writer/Backup-reader (the backup groups are optional but recommended). The difference from the native support is that instead of declaring them in a specialized table, we will use the mysql_servers table.

  • Writer: 100
  • Readers: 101
  • Backup Writers: 8100
  • Backup Readers: 8101

Given the above, on top of the servers already defined in the previous article, we just need to add the 8000-range HGs.

For example:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.22',8100,3306,1000,2000,'Failover server preferred');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.23',8100,3306,999,2000,'Second preferred');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.233',8100,3306,998,2000,'Third and last in the list');

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.22',8101,3306,100,2000,'');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.23',8101,3306,1000,2000,'');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.233',8101,3306,1000,2000,'');

After that we need to insert the instructions for the scheduler:
INSERT INTO scheduler (id,active,interval_ms,filename,arg1) values (10,0,2000,"/opt/tools/proxy_sql_tools/galera_check.pl","-u=cluster1 -p=clusterpass -h=192.168.4.191 -H=100:W,101:R -P=6032 --retry_down=2 --retry_up=1 --main_segment=2 --debug=0 --log=/var/lib/proxysql/galeraLog --active_failover=1");

The result will be:

    id: 10
active: 0
interval_ms: 2000
filename: /opt/tools/proxy_sql_tools/galera_check.pl
arg1: -u=cluster1 -p=clusterpass -h=192.168.4.191 -H=100:W,101:R -P=6032 --retry_down=2 --retry_up=1 --main_segment=2 --debug=0 --log=/var/lib/proxysql/galeraLog --active_failover=1
arg2: NULL
arg3: NULL
arg4: NULL
arg5: NULL
comment:

Please refer to the instructions on GitHub for the details of the parameters. What we can specify here is:

  • -H=100:W,101:R The host groups dealing with our PXC cluster (Writer and Reader)
  • --active_failover=1 The failover method to apply
  • --retry_down=2 --retry_up=1 Whether action must be taken immediately or only after a number of retries. This avoids a possible yo-yo effect caused by transient node or network delays.

Note that the scheduler row is inserted with active=0. Always keep it at 0 and activate it only when everything is set and you are ready to go. Once the above is done, the galera_check script is ready to be used by ProxySQL.
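When everything is in place, enabling it is just a matter of flipping the flag and loading the scheduler to runtime:

```sql
UPDATE scheduler SET active=1 WHERE id=10;
LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK;
```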

Percona proxysql_galera_checker

One limitation of this script is that you cannot use different IPs for the PXC internal communication and for the ProxySQL node. Given that, we need to modify the setup from the previous blog to match the script's requirements. Here too we need to define which HG will be the writer and which the reader, but we will specify the internal IPs, and, of course, ProxySQL must have access to that network as well.

  • Writer HG : 200
  • Reader HG: 201
  • Network IPs 10.0.0.22 – 23 – 33

Given that, our ProxySQL setup will be:

delete from mysql_users where username='app_test';
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent,comment) values ('app_test','test',1,200,'mysql',1,'application test user DC1');
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;

delete from mysql_query_rules where rule_id in(1040,1042);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(1040,6033,'app_test',200,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(1042,6033,'app_test',201,1,3,'^SELECT.*$',1);
load mysql query rules to run;save mysql query rules to disk;

delete from mysql_servers where hostgroup_id in (200,201);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('10.0.0.22',200,3306,10000,2000,'DC1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('10.0.0.22',201,3306,100,2000,'DC1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('10.0.0.23',201,3306,10000,2000,'DC1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('10.0.0.33',201,3306,10000,2000,'DC1');

load mysql servers to run;save mysql servers to disk;

As you can see, here we also need to redefine the user and the query rules to match the different HGs; if you reuse the same ones (100-101), there is no need to do that.

Now it’s time to add the scheduler entry:

delete from scheduler where id=60;
INSERT INTO scheduler (id,active,interval_ms,filename,arg1) values (60,0,3000,"/opt/tools/proxysql-scheduler/proxysql_galera_checker","--config-file=/opt/tools/proxysql-scheduler/proxysql-admin-sample.cnf --writer-is-reader=always --write-hg=200 --read-hg=201 --writer-count=1 --priority=10.0.0.22:3306,10.0.0.23:3306,10.0.0.33:3306 --mode=singlewrite --debug --log=/tmp/pxc_test_proxysql_galera_check.log");
LOAD SCHEDULER TO RUNTIME;SAVE SCHEDULER TO DISK;

Also in this case, please refer to the specifications of the parameters, but it’s worth mentioning:

  • --write-hg=200 --read-hg=201 Host group definitions
  • --writer-is-reader=always Keep this as always; as we will see, you do not need anything different.
  • --mode=singlewrite The other possible mode, load balancer, is a leftover from the past. Never, ever use Galera/PXC in multi-primary mode, period.
  • --priority=10.0.0.22:3306,10.0.0.23:3306,10.0.0.33:3306 This is where we define the priority for the writers.

Also in this case, when loading the scheduler entry, keep it deactivated and enable it only when ready.
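The activation step is the same as for the Perl script; you can also check what is actually loaded at runtime:

```sql
UPDATE scheduler SET active=1 WHERE id=60;
LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK;
-- verify what the scheduler is actually running
SELECT * FROM runtime_scheduler;
```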

The Tests

Read Test

The first test is the simple read test: while sysbench is running in read-only mode, we remove one reader after the other.
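The host group status reported below comes from the ProxySQL admin interface; a query along these lines (joining mysql_servers just to show the weight) produces it:

```sql
SELECT b.weight, c.hostgroup, c.srv_host, c.srv_port,
       c.status, c.ConnUsed, c.ConnFree
FROM stats_mysql_connection_pool c
LEFT JOIN mysql_servers b
  ON c.hostgroup = b.hostgroup_id
 AND c.srv_host  = b.hostname
 AND c.srv_port  = b.port
ORDER BY c.hostgroup, b.weight DESC;
```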

Marco script:

+---------+-----------+---------------+----------+--------------+----------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree |
+---------+-----------+---------------+----------+--------------+----------+----------+
| 10000 | 100 | 192.168.4.22 | 3306 | ONLINE | 0 | 0 |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE | 38 | 8 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 15 | 49 |
| 100 | 101 | 192.168.4.22 | 3306 | ONLINE | 0 | 64 |

As we can see, just by setting the weight we are able to prevent sending reads to the Writer; some will still arrive there, but the amount is negligible. Once we put all the readers down…

Marco script: 

+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000 | 100 | 192.168.4.22 | 3306 | ONLINE | 0 |
| 10000 | 101 | 192.168.4.233 | 3306 | SHUNNED | 0 |
| 10000 | 101 | 192.168.4.23 | 3306 | SHUNNED | 0 |
| 100 | 101 | 192.168.4.22 | 3306 | ONLINE | 58 |

Being the last node standing, it will serve all the reads despite its low weight.

Percona Script:

+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000 | 200 | 10.0.0.22 | 3306 | ONLINE | 0 |
| 10000 | 201 | 10.0.0.33 | 3306 | ONLINE | 22 |
| 10000 | 201 | 10.0.0.23 | 3306 | ONLINE | 21 |
| 100 | 201 | 10.0.0.22 | 3306 | ONLINE | 1 |
Removing the readers:
+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000 | 200 | 10.0.0.22 | 3306 | ONLINE | 0 |
| 10000 | 201 | 10.0.0.33 | 3306 | OFFLINE_SOFT | 0 |
| 10000 | 201 | 10.0.0.23 | 3306 | OFFLINE_SOFT | 0 |
| 100 | 201 | 10.0.0.22 | 3306 | ONLINE | 62 |

In both cases, no issue at all; the writer takes the load of the reads only when left alone. 

Maintenance Test

In this test, I will simply put the node into maintenance mode using pxc_maint_mode=maintenance, as done in the other article. As a reminder, this was working fine with the native Galera support as well.
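Maintenance mode is set on the PXC node itself:

```sql
-- on the PXC node to drain
SET GLOBAL pxc_maint_mode = 'MAINTENANCE';
-- ... perform the maintenance work ...
-- then put it back
SET GLOBAL pxc_maint_mode = 'DISABLED';
```

Note that the node waits for pxc_maint_transition_period before the switch completes, giving active connections time to drain.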


Marco script:

+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000 | 100 | 192.168.4.22 | 3306 | ONLINE | 50 |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE | 8 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 3 |
| 100 | 101 | 192.168.4.22 | 3306 | ONLINE | 0 |
| 1000000 | 200 | 10.0.0.23 | 3306 | OFFLINE_SOFT | 0 |
After:
+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 999 | 100 | 192.168.4.23 | 3306 | ONLINE | 50 |
| 10000 | 100 | 192.168.4.22 | 3306 | OFFLINE_SOFT | 0 |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE | 5 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 6 |
| 100 | 101 | 192.168.4.22 | 3306 | OFFLINE_SOFT | 0 |
A new Writer was elected, and thanks to OFFLINE_SOFT the connections on the old Writer were allowed to finish.
Putting the node back, removing it from maintenance:
+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 999 | 100 | 192.168.4.23 | 3306 | ONLINE | 50 |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE | 5 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 5 |
| 100 | 101 | 192.168.4.22 | 3306 | ONLINE | 0 |

The node WILL NOT fail back by default (this is by design). This allows you to warm up caches, or do anything else that may be meaningful, before moving the node back to the Primary role.

The Percona script will behave a bit differently:

+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000000 | 200 | 10.0.0.23 | 3306 | OFFLINE_SOFT | 0 |
| 10000 | 200 | 10.0.0.22 | 3306 | ONLINE | 50 |
| 10000 | 201 | 10.0.0.33 | 3306 | ONLINE | 4 |
| 10000 | 201 | 10.0.0.23 | 3306 | ONLINE | 10 |
| 100 | 201 | 10.0.0.22 | 3306 | ONLINE | 0 |
Then I put the node under maintenance:
+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000000 | 200 | 10.0.0.23 | 3306 | ONLINE | 26 |
| 10000 | 200 | 10.0.0.22 | 3306 | OFFLINE_SOFT | 22 |
| 10000 | 201 | 10.0.0.33 | 3306 | ONLINE | 8 |
| 10000 | 201 | 10.0.0.23 | 3306 | ONLINE | 12 |
| 100 | 201 | 10.0.0.22 | 3306 | OFFLINE_SOFT | 0 |

Connections are moved over to the new Writer gradually, at a pace that depends on how the application handles them.

But when I put the node back from maintenance:

+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000000 | 200 | 10.0.0.23 | 3306 | OFFLINE_SOFT | 0 |
| 10000 | 200 | 10.0.0.22 | 3306 | ONLINE | 49 |
| 10000 | 201 | 10.0.0.33 | 3306 | ONLINE | 5 |
| 10000 | 201 | 10.0.0.23 | 3306 | ONLINE | 14 |
| 100 | 201 | 10.0.0.22 | 3306 | ONLINE | 0 |

The old Writer is put back as Primary. As indicated above, I consider this wrong, given we risk promoting a node that is still cold, which can affect production performance. It is true that coming back from maintenance is a controlled action, but the more checks, the better.

Testing Node Crash

Marco script:

To emulate a crash I will kill the mysqld process with kill -9 <pid>.

+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000 | 100 | 192.168.4.22 | 3306 | ONLINE | 50 |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE | 12 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 4 |
| 100 | 101 | 192.168.4.22 | 3306 | ONLINE | 0 |

Kill the process:

59,50,53.99,6603.16,6205.21,218.97,178.98,1561.52,0.00,2.00
60,50,54.11,5674.25,5295.50,215.43,163.32,1648.20,0.00,1.00
61,50,3.99,3382.12,3327.22,30.95,23.96,2159.29,0.00,48.91 <--- start
62,50,0.00,820.35,820.35,0.00,0.00,0.00,0.00,0.00
63,50,0.00,2848.86,2550.67,195.13,103.07,0.00,0.00,0.00
64,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
65,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
66,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
67,50,50.00,4268.99,4066.99,52.00,150.00,7615.89,0.00,1.00 <--- failover end
68,50,72.00,6522.40,6096.37,268.02,158.01,1109.09,0.00,1.00

Failover consistently takes about five seconds, two of which come from the scheduler interval (set to two seconds), plus the time for a retry.
And the new Primary is serving while the failed node is removed:

+---------+-----------+---------------+----------+--------------+----------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed | ConnFree |
+---------+-----------+---------------+----------+--------------+----------+----------+
| 999 | 100 | 192.168.4.23 | 3306 | ONLINE | 0 | 50 |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE | 0 | 34 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 0 | 35 |
| 100 | 101 | 192.168.4.22 | 3306 | SHUNNED | 0 | 0 |

Percona script:

Also, in this case, the Percona script behaves a bit differently.

Before the crash: 

+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 10000 | 200 | 10.0.0.22 | 3306 | ONLINE | 49 |
| 10000 | 201 | 10.0.0.33 | 3306 | ONLINE | 5 |
| 10000 | 201 | 10.0.0.23 | 3306 | ONLINE | 14 |
| 100 | 201 | 10.0.0.22 | 3306 | ONLINE | 0 |
Then kill the process:
29,50,41.05,4099.74,3838.44,155.18,106.12,2009.23,0.00,0.00
30,50,8.01,1617.92,1547.79,37.07,33.06,1803.47,0.00,50.09
31,50,0.00,2696.60,2696.60,0.00,0.00,0.00,0.00,0.00 <--- start
32,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
33,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
34,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
35,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
36,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
37,50,12.96,2385.82,2172.46,91.72,121.63,8795.93,0.00,0.00 <--- failback ends 6"
38,50,39.95,4360.00,4083.38,148.80,127.82,9284.15,0.00,0.00
The time to recover varies, but it is around 6-12 seconds.
+---------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+---------+----------+
| 1000000 | 200 | 10.0.0.23 | 3306 | ONLINE | 50 | ← new
| 10000 | 201 | 10.0.0.33 | 3306 | ONLINE | 11 |
| 10000 | 201 | 10.0.0.23 | 3306 | ONLINE | 5 |
A new Primary is elected. But on node recovery:
+---------+-----------+---------------+----------+--------------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+---------+-----------+---------------+----------+--------------+----------+
| 1000000 | 200 | 10.0.0.23 | 3306 | OFFLINE_SOFT | 50 |
| 10000 | 200 | 10.0.0.22 | 3306 | ONLINE | 0 |<--old is back
| 10000 | 201 | 10.0.0.33 | 3306 | ONLINE | 10 |
| 10000 | 201 | 10.0.0.23 | 3306 | ONLINE | 6 |
| 1000 | 201 | 10.0.0.22 | 3306 | ONLINE | 0 |
As with maintenance, when the node comes back it is by default moved back to the Primary role. As already explained, I consider this wrong and dangerous, but it is one way of seeing what a script should do.

Conclusions

PXC is a complex product, the ways it can be deployed are many, and it is not easy, or even possible, to identify all of the possible variants.

Having the opportunity to use the native support may be the easier solution, but as illustrated in part one of this series, misbehavior is just around the corner and may seriously impact your production environment.

Using the scheduler with a properly developed script/application that handles Galera support can guarantee better consistency and behavior that matches your specific expectations.

There are solutions out there that may fit your needs; if not, you can develop your own, and be sure that you keep consistency when changing versions of ProxySQL and/or PXC/Galera. In the end, once the main work is done, maintaining a script will be much easier than having to patch a product or wait for a feature request to be implemented.

I know it may look like a step back, moving out from native support and using a scheduler again. But it is not, it’s just the acknowledgment that sometimes it is better to keep it simple and do some specific tuning/work, rather than trying to put the universe in a bottle which overcomplicates the problem.

Support for Percona XtraDB Cluster in ProxySQL (Part One)

Details
Marco Tusa
MySQL
01 January 2021

In recent times I have been designing several solutions focused on High Availability and Disaster Recovery. Some of them using Percona Server for MySQL with group replication, some using Percona XtraDB Cluster (PXC). What many of them had in common was the use of ProxySQL for the connection layer. This is because I consider the use of a layer 7 Proxy preferable, given the possible advantages provided in ReadWrite split and SQL filtering. 

The other positive aspect provided by ProxySQL, at least for Group Replication, is the native support which allows us to have a very quick resolution of possible node failures.

ProxySQL has Galera support as well, but in the past it had proven to be pretty unstable, and the old scheduler-based method was still the best way to go.

After Percona Live Online 2020 I decided to try it again and to see if at least the basics were now working fine. 

What I Have Tested

I was not looking for complicated tests that would have included different levels of transaction isolation. I was instead interested in the more simple and basic ones. My scenario was:

1 ProxySQL node v2.0.15  (192.168.4.191)
1 ProxySQL node v2.1.0  (192.168.4.108)
3 PXC 8.0.20 nodes (192.168.4.22/23/233) with internal network (10.0.0.22/23/33) 

ProxySQL was freshly installed. 

All the commands used to modify the configuration are here. Tests were done first using ProxySQL v2.0.15, then v2.1.0. I will report version-specific results only where they diverge. 

PXC- Failover Scenario

As mentioned above I am going to focus on the fail-over needs, period. I will have two different scenarios:

  • Maintenance
  • Node crash 

From the ProxySQL point of view I will have three scenarios always with a single Primary:

  • Writer is NOT a reader (options 0 and 2)
  • Writer is also a reader

The configuration of the native support will be:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.22',100,3306,10000,2000,'DC1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.22',101,3306,100,2000,'DC1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.23',101,3306,10000,2000,'DC1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.233',101,3306,10000,2000,'DC1');

Galera host groups:

  • Writer: 100
  • Reader: 101
  • Backup_writer: 102
  • Offline_hostgroup: 9101
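One note on applying the configuration above: in ProxySQL, changes to the admin tables only take effect once loaded to runtime, and must be saved to disk to survive a restart. The usual sequence after the INSERTs is:

```sql
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```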

Before going ahead, let us analyze the mysql_servers settings. As you can notice, I am using the weight attribute to indicate to ProxySQL which is my preferred writer. But I also use weight in the READ host group to indicate which servers should be used, and how.

Given that, we have:

  • Write
    • 192.168.4.22  is the preferred Primary
    • 192.168.4.23  is the first failover 
    • 192.168.4.233 is the last chance 
  • Read
    • 192.168.4.233/23 have the same weight, and load should be balanced between the two of them
    • 192.168.4.22, given it is the preferred writer, should NOT receive the same read load, so it has a lower weight value.  
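Under the simplifying assumption that ProxySQL distributes new connections among the ONLINE servers of a hostgroup proportionally to weight, the intended read split can be sketched as:

```python
def read_shares(weights):
    """Expected fraction of reads per server, proportional to weight."""
    total = sum(weights.values())
    return {host: w / total for host, w in weights.items()}

# Reader hostgroup 101 as configured above
shares = read_shares({"192.168.4.233": 10000, "192.168.4.23": 10000, "192.168.4.22": 100})
```

With these weights, x.233 and x.23 each take roughly half of the reads, and x.22 well under 1%.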

The Tests

First Test

The first test is to see how the cluster behaves with 1 Writer and 2 Readers, using the option writer_is_also_reader = 0.
To achieve this, the settings for ProxySQL will be:

insert into mysql_galera_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup, offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) 
values (100,102,101,9101,1,1,0,10);

As soon as I load this to runtime, ProxySQL should move the nodes to the relevant host groups. But this is not happening; instead, it keeps the readers in the writer HG and SHUNs them.

+--------+-----------+---------------+----------+---------+
| weight | hostgroup | srv_host | srv_port | status |
+--------+-----------+---------------+----------+---------+
| 10000 | 100 | 192.168.4.233 | 3306 | ONLINE |
| 10000 | 100 | 192.168.4.23 | 3306 | SHUNNED |
| 10000 | 100 | 192.168.4.22 | 3306 | SHUNNED |
| 10000 | 102 | 192.168.4.23 | 3306 | ONLINE |
| 10000 | 102 | 192.168.4.22 | 3306 | ONLINE |
+--------+-----------+---------------+----------+---------+

This is, of course, wrong. But why does it happen?

The reason is simple. ProxySQL is expecting to see all nodes in the reader group with READ_ONLY flag set to 1. 

In ProxySQL documentation we can read:

writer_is_also_reader=0: nodes with read_only=0 will be placed either in the writer_hostgroup and in the backup_writer_hostgroup after a topology change, these will be excluded from the reader_hostgroup.

This is conceptually wrong. 

A PXC cluster is a tightly coupled replication cluster, with virtually synchronous replication. One of its benefits is to have the node “virtually” aligned with respect to the data state. 

In this kind of model, the cluster is data-centric, and each node shares the same data view.

(figure: a tightly coupled, data-centric cluster)

What this also means is that, if correctly set up, the nodes will be fully consistent in data reads.

The other characteristic of the cluster is that ANY node can become a writer at any time.  While best practices indicate that it is better to use one Writer at a time as Primary to prevent certification conflicts, this does not mean that the nodes not currently elected as Primary should be prevented from becoming a writer.

Which is exactly what the READ_ONLY flag does if activated: it prevents the node from accepting writes.

Not only that: the need to have READ_ONLY set means that we must change it BEFORE the node is able to become a writer in case of fail-over. 

This, in short, means we need either a topology manager or a script that will do that, with all the relative checks and logic to be safe. At fail-over time this adds latency and complexity exactly when they are not needed, and it goes against the concept of the tightly-coupled cluster itself.
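To make that concrete: with READ_ONLY in place, a failover script or topology manager would have to run something like the following on the node being promoted, before ProxySQL can route writes to it, adding a round trip and a failure point at the worst possible moment (a sketch; checks and error handling omitted):

```sql
-- On the node being promoted to writer
SET GLOBAL super_read_only = 0;
SET GLOBAL read_only = 0;
```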

Given the above, we can say that this ProxySQL method related to writer_is_also_reader = 0, as it is implemented today for Galera, is, at best, useless. 

Why is it working for Group Replication? That is easy: because Group Replication internally uses a mechanism to lock/unlock the nodes when they are non-primary, when using the cluster in single-Primary mode. That internal mechanism was implemented as a safeguard to prevent random writes on multiple nodes, and it also manages the READ_ONLY flag. 

Second Test

Let us move on and test with writer_is_also_reader = 2. Again from the documentation:

writer_is_also_reader=2 : Only the nodes with read_only=0 which are placed in the backup_writer_hostgroup are also placed in the reader_hostgroup after a topology change i.e. the nodes with read_only=0 exceeding the defined max_writers.

Given the settings as indicated above, my layout before using Galera support is:

+---------+-----------+---------------+----------+--------------+
| weight | hostgroup | srv_host | srv_port | status |
+---------+-----------+---------------+----------+--------------+
| 10000 | 100 | 192.168.4.22 | 3306 | ONLINE |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE |
| 100 | 101 | 192.168.4.22 | 3306 | ONLINE |
+---------+-----------+---------------+----------+--------------+

After enabling Galera support:
+--------+-----------+---------------+----------+---------+
| weight | hostgroup | srv_host | srv_port | status |
+--------+-----------+---------------+----------+---------+
| 10000 | 100 | 192.168.4.233 | 3306 | ONLINE |
| 10000 | 100 | 192.168.4.23 | 3306 | SHUNNED |
| 10000 | 100 | 192.168.4.22 | 3306 | SHUNNED |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE |
| 10000 | 101 | 192.168.4.22 | 3306 | ONLINE |
| 10000 | 102 | 192.168.4.23 | 3306 | ONLINE |
| 10000 | 102 | 192.168.4.22 | 3306 | ONLINE |
+--------+-----------+---------------+----------+---------+

So the node ending in .22 (the elected Primary) is not in the reader pool. Which can be OK, I assume. 

But what is not OK at all is that the READERS now have completely different weights. Nodes x.23 and x.233 are no longer balancing the load, because the weight is not the one I defined; it is instead copied over from the WRITER settings. 

Well of course this is wrong and not what I want. Anyhow, let’s test the READ failover.

I will use sysbench read-only:

sysbench ./src/lua/windmills/oltp_read.lua  --mysql-host=192.168.4.191 --mysql-port=6033 --mysql-user=app_test --mysql-password=test --mysql-db=windmills_s --db-driver=mysql --tables=10 --table_size=10000  --rand-type=zipfian --rand-zipfian-exp=0.5 --skip_trx=true  --report-interval=1  --mysql_storage_engine=innodb --auto_inc=off --histogram --table_name=windmills  --stats_format=csv --db-ps-mode=disable --point-selects=50 --range-selects=true --threads=50 --time=2000   run
mysql> select * from  runtime_mysql_galera_hostgroups \G
*************************** 1. row ***************************
writer_hostgroup: 100
backup_writer_hostgroup: 102
reader_hostgroup: 101
offline_hostgroup: 9101
active: 1
max_writers: 1
writer_is_also_reader: 2
max_transactions_behind: 10
comment: NULL
1 row in set (0.01 sec)

Test running:
+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 100 | 100 | 192.168.4.233 | 3306 | SHUNNED | 0 |
| 1000 | 100 | 192.168.4.23 | 3306 | SHUNNED | 0 |
| 10000 | 100 | 192.168.4.22 | 3306 | ONLINE | 0 |
| 100 | 101 | 192.168.4.233 | 3306 | ONLINE | 1 |
| 1000 | 101 | 192.168.4.23 | 3306 | ONLINE | 51 |
| 100 | 102 | 192.168.4.233 | 3306 | ONLINE | 0 |
| 1000 | 102 | 192.168.4.23 | 3306 | ONLINE | 0 |
+--------+-----------+---------------+----------+---------+----------+
As indicated above, the reads are not balanced.

Removing node x.23 using wsrep_reject_queries=all:
+---------+--------------+---------------+--------------+----------+
| weight | hostgroup_id | srv_host | status | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100 | 100 | 192.168.4.233 | SHUNNED | 0 |
| 10000 | 100 | 192.168.4.22 | ONLINE | 0 |
| 100 | 101 | 192.168.4.233 | ONLINE | 48 |
| 100 | 102 | 192.168.4.233 | ONLINE | 0 |
+---------+--------------+---------------+--------------+----------+

The remaining node x.233 is taking all the reads, good. If I set wsrep_reject_queries=all also on x.233:
+---------+--------------+---------------+--------------+
| weight | hostgroup_id | srv_host | status |
+---------+--------------+---------------+--------------+
| 10000 | 100 | 192.168.4.22 | ONLINE |
| 100 | 9101 | 192.168.4.233 | SHUNNED |
| 10000 | 9101 | 192.168.4.23 | ONLINE |
+---------+--------------+---------------+--------------+

And application failed:

FATAL: mysql_drv_query() returned error 9001 (Max connect timeout reached while reaching hostgroup 101 after 10000ms) for query ‘SELECT id, millid, date,active,kwatts_s FROM windmills2 WHERE id=9364’

Now, this may be by design, but I have serious difficulty understanding the reasoning here, given that we allow the platform to stop serving while we still have a healthy server. 

Last but not least, I am not allowed to decide WHICH nodes are the backup_writers; ProxySQL will choose them from my writer list of servers. So why not also include the one I declared as Primary, at least in case of need?  ¯\_(ツ)_/¯

Third Test

OK, last try, with writer_is_also_reader = 1.

mysql> select * from  runtime_mysql_galera_hostgroups \G
*************************** 1. row ***************************
writer_hostgroup: 100
backup_writer_hostgroup: 102
reader_hostgroup: 101
offline_hostgroup: 9101
active: 1
max_writers: 1
writer_is_also_reader: 1
max_transactions_behind: 10
comment: NULL
1 row in set (0.01 sec)

And now I have:

+---------+--------------+---------------+--------------+----------+
| weight | hostgroup_id | srv_host | status | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100 | 100 | 192.168.4.233 | SHUNNED | 0 |
| 1000 | 100 | 192.168.4.23 | SHUNNED | 0 |
| 10000 | 100 | 192.168.4.22 | ONLINE | 0 |
| 100 | 101 | 192.168.4.233 | ONLINE | 0 |
| 1000 | 101 | 192.168.4.23 | ONLINE | 0 |
| 10000 | 101 | 192.168.4.22 | ONLINE | 35 | <-- :(
| 100 | 102 | 192.168.4.233 | ONLINE | 0 |
| 1000 | 102 | 192.168.4.23 | ONLINE | 0 |
+---------+--------------+---------------+--------------+----------+

Then I remove one Reader at a time, as before:
+---------+--------------+---------------+--------------+----------+
| weight | hostgroup_id | srv_host | status | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100 | 100 | 192.168.4.233 | SHUNNED | 0 |
| 10000 | 100 | 192.168.4.22 | ONLINE | 0 |
| 100 | 101 | 192.168.4.233 | ONLINE | 0 |
| 10000 | 101 | 192.168.4.22 | ONLINE | 52 | <-- :(
| 100 | 102 | 192.168.4.233 | ONLINE | 0 |
| 10000 | 9101 | 192.168.4.23 | ONLINE | 0 |
+---------+--------------+---------------+--------------+----------+

+---------+--------------+---------------+--------------+----------+
| weight | hostgroup_id | srv_host | status | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 10000 | 100 | 192.168.4.22 | ONLINE | 0 |
| 100 | 101 | 192.168.4.22 | ONLINE | 39 | <-- :(
| 100 | 9101 | 192.168.4.233 | SHUNNED | 0 |
| 10000 | 9101 | 192.168.4.23 | ONLINE | 0 |
+---------+--------------+---------------+--------------+----------+

Now, as you may have already realized, the point here is that, YES, my node x.22 (Primary) is able to get the READs as well, but that node was taking the whole read load from the beginning. This is because of the shift ProxySQL had done with regard to the weight. 

This happens because internally ProxySQL initially populates the internal table mysql_servers_incoming with the data from mysql_servers, but after several steps that information is overwritten, using the data coming from the writer for the readers as well, messing up the desired results.
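A toy model of that overwrite (the structures here are hypothetical, just to make the observed effect concrete):

```python
# Hypothetical reproduction of the observed behavior: when the writer is
# re-added to the reader hostgroup, its configured writer weight comes with it.
configured = {
    ("101", "192.168.4.22"): 100,     # the reader weight I defined
    ("100", "192.168.4.22"): 10000,   # the writer weight
}

def writer_is_also_reader_effect(cfg):
    """Return the effective weights after the topology rebuild (as observed)."""
    effective = dict(cfg)
    # The reader-HG row for the writer is clobbered with the writer-HG weight.
    effective[("101", "192.168.4.22")] = cfg[("100", "192.168.4.22")]
    return effective

eff = writer_is_also_reader_effect(configured)
```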

Fourth Test

Failover due to maintenance. In this case, I will set pxc_maint_mode = MAINTENANCE on the writer to fail over to another writer.
The sysbench command used:

 
sysbench ./src/lua/windmills/oltp_read_write.lua  --mysql-host=192.168.4.191 --mysql-port=6033 --mysql-user=app_test --mysql-password=test --mysql-db=windmills_s --db-driver=mysql --tables=10 --table_size=10000  --rand-type=zipfian --rand-zipfian-exp=0.5 --skip_trx=false  --report-interval=1  --mysql_storage_engine=innodb --auto_inc=off --histogram --table_name=windmills  --stats_format=csv --db-ps-mode=disable --point-selects=50 --range-selects=true --threads=50 --time=2000   run

After starting sysbench, I set the writer in maintenance mode:
+-----------------------------+-------------+
| Variable_name | Value |
+-----------------------------+-------------+
| pxc_encrypt_cluster_traffic | OFF |
| pxc_maint_mode | MAINTENANCE |
| pxc_maint_transition_period | 10 |
| pxc_strict_mode | ENFORCING |
+-----------------------------+-------------+

ProxySQL sets the node as SHUNNED, but it is not able to move the existing connections over, given that sysbench uses sticky connections.
+---------+--------------+---------------+--------------+----------+
| weight | hostgroup_id | srv_host | status | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100 | 100 | 192.168.4.233 | SHUNNED | 0 |
| 1000 | 100 | 192.168.4.23 | ONLINE | 0 |
| 10000 | 100 | 192.168.4.22 | SHUNNED | 50 |
| 100 | 101 | 192.168.4.233 | ONLINE | 2 |
| 1000 | 101 | 192.168.4.23 | ONLINE | 13 |
| 100 | 102 | 192.168.4.233 | ONLINE | 0 |
| 10000 | 9101 | 192.168.4.22 | ONLINE | 0 |
+---------+--------------+---------------+--------------+----------+

If your application uses sticky connections and never refreshes them, you must restart the application. Adding --reconnect=50 to the sysbench command, I can see that the connections shift to the new Primary as expected:
+---------+--------------+---------------+--------------+----------+
| weight | hostgroup_id | srv_host | status | ConnUsed |
+---------+--------------+---------------+--------------+----------+
| 100 | 100 | 192.168.4.233 | SHUNNED | 0 |
| 1000 | 100 | 192.168.4.23 | ONLINE | 26 | <-- New Primary
| 10000 | 100 | 192.168.4.22 | SHUNNED | 19 | <-- shift
| 100 | 101 | 192.168.4.233 | ONLINE | 0 |
| 10000 | 101 | 192.168.4.23 | ONLINE | 21 |
| 100 | 102 | 192.168.4.233 | ONLINE | 0 |
| 10000 | 9101 | 192.168.4.23 | ONLINE | 0 | <-- ??
| 10000 | 9101 | 192.168.4.22 | ONLINE | 0 |
+---------+--------------+---------------+--------------+----------+

As we can see, ProxySQL does the failover to node x.23 as expected. But it also adds the node to HG 9101, which is supposed to host the offline servers.

So why move the Primary there? 

Once maintenance is over, disabling pxc_maint_mode will restore the Primary. In short, ProxySQL will fail back. 

The whole process will not be impactful if the application is NOT using sticky connections; otherwise, the application will have to deal with:

  • Error with the connection
  • Retry cycle to re-run the dropped DML
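On the application side, that retry cycle can be as small as the following sketch (a hypothetical helper; real code would use the driver's reconnect and error-code handling):

```python
import time

def run_with_retry(execute, stmt, retries=3, backoff=0.5):
    """Re-run a statement after a dropped connection.

    `execute` is any callable that raises ConnectionError when the
    backend goes away; reconnection is assumed to happen on the next call.
    """
    for attempt in range(retries):
        try:
            return execute(stmt)
        except ConnectionError:
            if attempt == retries - 1:
                raise                      # give up after the last attempt
            time.sleep(backoff * (attempt + 1))  # linear backoff before retrying
```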

Failover Because of a Crash

To check the next case, I will add --mysql-ignore-errors=all to sysbench, to be able to see how many errors I get and for how long, when a failover is needed. To simulate a crash, I will kill -9 the mysqld process on the writer.

After Kill:

98,50,53.00,6472.71,6070.73,221.99,179.99,1327.91,0.00,1.00 <--
99,50,0.00,2719.17,2719.17,0.00,0.00,0.00,0.00,50.00 <--start
100,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
101,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
102,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
103,50,0.00,2849.89,2549.90,193.99,106.00,0.00,0.00,0.00
104,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
105,50,49.85,2663.99,2556.31,23.93,83.75,7615.89,0.00,6.98 <-- done
In this case, it takes 6 seconds for a failover.
+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 100 | 100 | 192.168.4.233 | 3306 | SHUNNED | 0 |
| 1000 | 100 | 192.168.4.23 | 3306 | ONLINE | 48 |
| 100 | 101 | 192.168.4.233 | 3306 | ONLINE | 1 |
| 1000 | 101 | 192.168.4.23 | 3306 | ONLINE | 18 |
| 100 | 102 | 192.168.4.233 | 3306 | ONLINE | 0 |
| 10000 | 9101 | 192.168.4.22 | 3306 | SHUNNED | 0 |
+--------+-----------+---------------+----------+---------+----------+
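The six seconds can be read directly from the per-second csv output above (assuming, as in these runs, that the third column is transactions per second):

```python
# Per-second sysbench output from the crash test, copied verbatim
csv = """\
98,50,53.00,6472.71,6070.73,221.99,179.99,1327.91,0.00,1.00
99,50,0.00,2719.17,2719.17,0.00,0.00,0.00,0.00,50.00
100,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
101,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
102,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
103,50,0.00,2849.89,2549.90,193.99,106.00,0.00,0.00,0.00
104,50,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
105,50,49.85,2663.99,2556.31,23.93,83.75,7615.89,0.00,6.98
"""

# Count the seconds where no transaction completed (tps column == 0)
stalled = [row.split(",")[0] for row in csv.strip().splitlines()
           if float(row.split(",")[2]) == 0.0]
```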

So all good here. But during one of my tests (ONLY on v2.0.15, and when using the same weight for all nodes) I had the following weird behavior: once the failover was done, ProxySQL was sending connections to BOTH remaining nodes.

Check the data below, taken one snapshot after the other, as the nodes start to take over; keep in mind that the PRIMARY here was node 192.168.4.233:
+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 10000 | 100 | 192.168.4.233 | 3306 | SHUNNED | 0 |
| 10000 | 100 | 192.168.4.23 | 3306 | ONLINE | 10 |<--
| 10000 | 100 | 192.168.4.22 | 3306 | SHUNNED | 40 |<--
| 10000 | 101 | 192.168.4.233 | 3306 | SHUNNED | 0 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 3 |
| 10000 | 101 | 192.168.4.22 | 3306 | ONLINE | 12 |
| 10000 | 102 | 192.168.4.22 | 3306 | ONLINE | 0 |
+--------+-----------+---------------+----------+---------+----------+
...
+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 10000 | 100 | 192.168.4.233 | 3306 | SHUNNED | 0 |
| 10000 | 100 | 192.168.4.23 | 3306 | ONLINE | 37 |<--
| 10000 | 100 | 192.168.4.22 | 3306 | SHUNNED | 13 |<--
| 10000 | 101 | 192.168.4.233 | 3306 | SHUNNED | 0 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 7 |
| 10000 | 101 | 192.168.4.22 | 3306 | ONLINE | 12 |
| 10000 | 102 | 192.168.4.22 | 3306 | ONLINE | 0 |
+--------+-----------+---------------+----------+---------+----------+
...
+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 10000 | 100 | 192.168.4.233 | 3306 | SHUNNED | 0 |
| 10000 | 100 | 192.168.4.23 | 3306 | ONLINE | 49 |<--
| 10000 | 100 | 192.168.4.22 | 3306 | SHUNNED | 0 |<--
| 10000 | 101 | 192.168.4.233 | 3306 | SHUNNED | 0 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 10 |
| 10000 | 101 | 192.168.4.22 | 3306 | ONLINE | 10 |
| 10000 | 102 | 192.168.4.22 | 3306 | ONLINE | 0 |
+--------+-----------+---------------+----------+---------+----------+

In the end, only one node remained Primary, but for a period of time both were serving traffic, even though only ONE node was declared ONLINE.

A Problem Along the Road… (only with v2.0.15)

While I was trying to “fix” the issue with the weight for READERS…

Let’s say we have this:

+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 10000 | 100 | 192.168.4.23 | 3306 | ONLINE | 686 |
| 10000 | 100 | 192.168.4.22 | 3306 | SHUNNED | 0 |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE | 62 |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE | 43 |
| 10000 | 101 | 192.168.4.22 | 3306 | ONLINE | 19 |
| 10000 | 102 | 192.168.4.22 | 3306 | ONLINE | 0 |
+--------+-----------+---------------+----------+---------+----------+

And I want to release some of the READ load from WRITER (currently 192.168.4.23).

If I simply do:

update mysql_servers set weight=100 where hostgroup_id=101 and hostname='192.168.4.23';

+--------------+---------------+------+-----------+--------+--------+
| hostgroup_id | hostname | port | gtid_port | status | weight |
+--------------+---------------+------+-----------+--------+--------+
| 100 | 192.168.4.23 | 3306 | 0 | ONLINE | 10000 |
| 101 | 192.168.4.22 | 3306 | 0 | ONLINE | 10000 |
| 101 | 192.168.4.23 | 3306 | 0 | ONLINE | 100 |
| 101 | 192.168.4.233 | 3306 | 0 | ONLINE | 10000 |
+--------------+---------------+------+-----------+--------+--------+
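For reference, a weight change in `mysql_servers` only takes effect once it is pushed to runtime with the standard ProxySQL admin commands (a sketch; persisting to disk is optional):

```sql
LOAD MYSQL SERVERS TO RUNTIME;   -- apply the new weight
SAVE MYSQL SERVERS TO DISK;      -- optionally persist it across restarts
```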
Now I load it into runtime, and… if I am lucky:
+--------+-----------+---------------+----------+---------+
| weight | hostgroup | srv_host | srv_port | status |
+--------+-----------+---------------+----------+---------+
| 10000 | 100 | 192.168.4.23 | 3306 | ONLINE |
| 10000 | 100 | 192.168.4.22 | 3306 | SHUNNED |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE |
| 100 | 101 | 192.168.4.23 | 3306 | ONLINE |
| 10000 | 101 | 192.168.4.22 | 3306 | ONLINE |
| 10000 | 102 | 192.168.4.22 | 3306 | ONLINE |
+--------+-----------+---------------+----------+---------+
And then it is changed to:
+--------+-----------+---------------+----------+---------+
| weight | hostgroup | srv_host | srv_port | status |
+--------+-----------+---------------+----------+---------+
| 10000 | 100 | 192.168.4.23 | 3306 | ONLINE |
| 10000 | 100 | 192.168.4.22 | 3306 | SHUNNED |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE |
| 10000 | 101 | 192.168.4.23 | 3306 | ONLINE |
| 10000 | 101 | 192.168.4.22 | 3306 | ONLINE |
| 10000 | 102 | 192.168.4.22 | 3306 | ONLINE |
+--------+-----------+---------------+----------+---------+
As you can see, ProxySQL initially set the weight to the value I chose; afterward, it reverted it to the value set for HG 100.
Worse, if I am not lucky:
+--------+-----------+---------------+----------+---------+----------+
| weight | hostgroup | srv_host | srv_port | status | ConnUsed |
+--------+-----------+---------------+----------+---------+----------+
| 100 | 100 | 192.168.4.23 | 3306 | SHUNNED | 0 |
| 10000 | 100 | 192.168.4.22 | 3306 | ONLINE | 0 |
| 10000 | 101 | 192.168.4.233 | 3306 | ONLINE | 718 |
| 100 | 101 | 192.168.4.23 | 3306 | ONLINE | 0 |
| 10000 | 101 | 192.168.4.22 | 3306 | SHUNNED | 0 |
| 100 | 102 | 192.168.4.23 | 3306 | ONLINE | 0 |
+--------+-----------+---------------+----------+---------+----------+

ProxySQL changes the value (randomly) for HG 102 as well, which also impacts the WRITER HG, causing a failover. At this point, I stopped testing: too many things were making the failover scenario unpredictable.

Conclusions

ProxySQL has a great concept behind it and certainly fills a real gap in the MySQL ecosystem, optimizing and strengthening the connection layer between the application layer and the data layer.

But, with regard to Galera support, we are not there yet. The support provided is not only limited, it is flawed, and could lead to serious and/or unexpected problems. Even using the option writer_is_also_reader=1, the only one worth using, we still see too many issues in how the nodes are managed in case of serious events such as failover.

ProxySQL v2.1.0 seems to have fixed some instabilities, but we still have too many open issues to trust the Galera native support. My advice is to stay away from it and use the scheduler to deal with the Galera cluster. Write a robust script that covers your specific needs if you must customize the actions; a scheduler will serve you well.
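As a reminder of the mechanics, registering such a script boils down to one row in the scheduler table shown at the beginning of this post. The script path and argument below are placeholders, not a recommendation:

```sql
-- Sketch only: filename and arg1 must point to your own check script.
INSERT INTO scheduler (id, active, interval_ms, filename, arg1)
VALUES (10, 1, 2000, '/var/lib/proxysql/galera_check.pl', '--debug=0');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
```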

If you prefer not to write your own, there is a sample in Percona-Lab. This is the old script used with ProxySQL 1.4.x, modified to work with ProxySQL 2.x. I have also written one a long time ago that can help as well. Both come without any guarantee, and I advise you to use them as examples for your own; see Part 2 of this post for details.

Finally, let me say that ProxySQL is a great tool, but no tool can cover all. People like me that have been around for long enough have seen this happening many times, and it is of no surprise. 

Great MySQL to all.

References

https://www.percona.com/doc/percona-xtradb-cluster/LATEST/install/index.html

https://galeracluster.com/

https://proxysql.com/blog/proxysql-native-galera-support/

https://www.percona.com/blog/2019/02/20/proxysql-native-support-for-percona-xtradb-cluster-pxc/

https://proxysql.com/documentation/galera-configuration/

Aurora multi-Primary first impression

Details
Marco Tusa
MySQL
28 October 2020

For what reason should I use a real multi-Primary setup?
To be clear, I do not mean a multi-writer solution where any node can become the active writer in case of need, as with PXC or PS-Group_replication.
No, we are talking about a multi-Primary setup where I can write at the same time on multiple nodes.
I want to insist on this “why?”.

After excluding the possible solutions mentioned above, both covering the famous 99.995% availability, which is 26.30 minutes of downtime in a year, what is left?

Disaster Recovery? Well that is something I would love to have, but to be a real DR solution we need to put several kilometers (miles for imperial) in the middle. 

And we know (see here and here) that, aside from some misleading advertising, we cannot have a tightly coupled cluster solution across geographical regions.

So, what is left? I may need more HA; OK, that is a valid reason. Or I may need to scale the number of writes; OK, that is a valid reason as well.
This means, in the end, that I am looking at a multi-Primary setup because of the following:

  • Scale writes (more nodes, more writes)
    • Consistent reads (what I write on A must be visible on B)
  • Give me 0 (zero) downtime, or close to that (5 nines is a maximum downtime of 864 milliseconds per day!!)
  • Allow me to shift the writer pointer at any time from A to B and vice versa, consistently.
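The availability figures quoted above are just arithmetic; a quick sketch reproducing the numbers used in this post (five nines per day, 99.995% per year):

```python
def allowed_downtime_seconds(availability_pct, period_seconds):
    """Maximum downtime (in seconds) allowed over a period at a given availability."""
    return period_seconds * (1.0 - availability_pct / 100.0)

DAY = 86_400            # seconds in a day
YEAR = 365.25 * DAY     # seconds in a Julian year

# 5 nines: ~0.864 s of downtime per day (the 864 ms quoted above)
print(round(allowed_downtime_seconds(99.999, DAY), 3))        # 0.864
# 99.995%: ~26.3 minutes of downtime per year
print(round(allowed_downtime_seconds(99.995, YEAR) / 60, 1))  # 26.3
```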

Now, keeping myself bound to the MySQL ecosystem, my natural choice would be MySQL NDB cluster.

But my (virtual) boss was at AWS re-invent and someone mentioned to him that Aurora Multi-Primary does what I was looking for.

This (long) article is my voyage in discovering if that is true or … not.

Given that I am focused on the behaviour first, and NOT interested in absolute numbers to shock the audience with millions of QPS, I will use low-level Aurora instances, and will perform the tests from two EC2 instances in the same VPC/region as the nodes.

instances

You can find the details about the tests on GitHub here. Finally, I will test:

  • Connection speed
  • Stale read
  • Write single node for baseline
  • Write on both nodes:
    • Scaling by splitting the load across schemas
    • Scaling on the same schema

Tests results

Let us start to have some real fun. The first test is … 

Connection Speed

The purpose of this test is to evaluate the time taken to open a new connection and the time taken to close it. Opening and closing connections can be a very expensive operation, especially if applications do not use a connection pool mechanism.
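The measurement can be sketched as a simple timing harness. This is not the script used for the post; the connect/close callables below are no-op stubs you would replace with real driver calls (for example, a pymysql.connect against the cluster or ProxySQL endpoint):

```python
import statistics
import time

def time_connections(connect_fn, close_fn, samples=100):
    """Time `samples` open/close cycles; returns two lists of durations in seconds."""
    open_times, close_times = [], []
    for _ in range(samples):
        t0 = time.perf_counter()
        conn = connect_fn()                        # open a new connection
        open_times.append(time.perf_counter() - t0)
        t0 = time.perf_counter()
        close_fn(conn)                             # close it again
        close_times.append(time.perf_counter() - t0)
    return open_times, close_times

# Stubs only, to show the harness shape; replace with real driver calls.
opens, closes = time_connections(lambda: object(), lambda c: None, samples=10)
print(len(opens), f"avg open: {statistics.mean(opens):.6f}s")
```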

a1

a3

As we can see, ProxySQL turns out to be the most efficient way to deal with opening connections, which was expected given that it is designed to reuse open connections towards the backend.

a2

a4

The close-connection operation is different: here ProxySQL seems to take a little bit longer. As a global observation, we can say that with ProxySQL we get more consistent behaviour. Of course, this test is a simplistic one, and we are not checking scalability (from 1 to N connections), but it is good enough to give us an initial feeling. Specific connection tests will be the focus of the next blog on Aurora MM.

Stale Reads

Aurora Multi-Primary uses the same mechanism as default Aurora to update the buffer pool:

aurora multi master sharing BP

It uses the page cache update, just in both directions: the buffer pool of Node2 is updated with the modifications performed on Node1, and vice versa.

To verify whether an application would really be able to have consistent reads, I ran this test, which measures whether, and how many, stale reads we get when writing on one node and reading from the other.
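A stale-read probe can be sketched as: write a marker on one node, then poll the other node until the marker appears, recording the lag. The write/read callables below are stubs simulating delayed visibility; in the real test they would be queries against the two node endpoints:

```python
import time

def measure_staleness(write_fn, read_fn, timeout=5.0, poll=0.01):
    """Write a marker, then poll the reader until the marker is visible.
    Returns the observed lag in seconds, or None if still stale at timeout."""
    marker = str(time.time())
    write_fn(marker)
    start = time.perf_counter()
    deadline = start + timeout
    while time.perf_counter() < deadline:
        if read_fn() == marker:
            return time.perf_counter() - start
        time.sleep(poll)
    return None  # a 100% stale read within the observation window

# Stubbed "nodes": the write becomes visible on the reader only after ~50 ms.
state = {"value": None, "visible_at": 0.0}

def fake_write(value):
    state["value"] = value
    state["visible_at"] = time.perf_counter() + 0.05

def fake_read():
    return state["value"] if time.perf_counter() >= state["visible_at"] else None

lag = measure_staleness(fake_write, fake_read)
print(lag is not None)  # True
```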

Amazon Aurora Multi-Primary has two consistency models:

Consistency model

Interestingly, with the default consistency model (INSTANCE_RAW), we got 100% stale reads.

Given that, I focused on identifying the cost of using the other consistency model (REGIONAL_RAW), which allows an application to have consistent reads.

The results indicate an increase of 44% in total execution time, and of 95% (22 times slower) in write execution.

a5

a6

a7

It is interesting to note that the time taken is in some way predictable and consistent between the two consistency models. 

The graph below shows, in yellow, how long the application must wait to see the correct data on the reader node, and in blue, how long the application waits to get back the same consistent read because it must wait for the commit on the writer.

   a8

As you can see, the two are more or less aligned. Given the performance cost imposed by using REGIONAL_RAW, all the other tests were done with the default INSTANCE_RAW, unless explicitly stated.

Writing tests

All tests run in this section were done using sysbench-tpcc with the following settings:

sysbench ./tpcc.lua --mysql-host=<> --mysql-port=3306 --mysql-user=<> --mysql-password=<> --mysql-db=tpcc --time=300 --threads=32 --report-interval=1 --tables=10 --scale=15  --mysql_table_options=" CHARSET=utf8 COLLATE=utf8_bin"  --db-driver=mysql prepare

sysbench /opt/tools/sysbench-tpcc/tpcc.lua --mysql-host=$mysqlhost --mysql-port=$port --mysql-user=<> --mysql-password=<> --mysql-db=tpcc --db-driver=mysql --tables=10 --scale=15 --time=$time  --rand-type=zipfian --rand-zipfian-exp=0 --report-interval=1 --mysql-ignore-errors=all --histogram  --report_csv=yes --stats_format=csv --db-ps-mode=disable --threads=$threads run

Write Single node (Baseline)

Before starting the comparative analysis, I wanted to define the “limit” of traffic/load for this platform.

Picture 1

t1 t2

From the graph above, we can see that this setup scales up to 128 threads; after that, the performance remains more or less steady.

Amazon claims that we can roughly double the performance when using both nodes in write mode, using a different schema on each to avoid conflicts.

scalability

 

Once more, remember that I am not interested in the absolute numbers here, but in the behaviour. Given that, our expectation is to see:

Picture 2

Write on both nodes different schemas

So AWS recommends this as the scaling solution:

split traffic by db table partition to avoid conflicts

And I diligently followed the advice.

I used two EC2 nodes in the same subnet as the Aurora nodes, writing to different schemas (tpcc & tpcc2).

Overview

Let us make it short and go straight to the point. Did we get the expected scalability?

Well no:

Picture 3

We got just a 26% increase, quite far from the expected 100%. Let us see what happened in detail (if not interested, just skip to the next test).

Node 1

Picture 5

Node 2

Picture 6

As you can see, Node1 was (more or less) keeping up with expectations, staying close to the expected performance.
But Node2 was just not keeping up; its performance was simply terrible.

The graphs below show what happened.

While Node1 was (again, more or less) scaling up to the baseline expectations (128 threads), Node2 collapsed at 16 threads and was never able to scale up.

Reads

Node 1

t4

Node1 is scaling reads as expected, even if here and there we can see some performance deterioration.

Node 2

t7

Node2 is not scaling Reads at all. 

Writes

Node 1

t5

Same as Read

Node 2

t8

Same as read

Now someone may think I was making a mistake and I was writing on the same schema. I assure you I was not.

Check the next test to see what happened if using the same schema.  

Write on both nodes same schema

Overview

Now, now Marco, this is unfair. You know this will cause contention.

Yes, I do! But nonetheless, I was curious to see what would happen and how the platform would deal with that level of contention.
My expectations were a lot of performance degradation and an increased number of locks. About the conflicts I was not wrong; after the test, node2 reported:

+-------------+---------+-------------------------+
| table       | index   | PHYSICAL_CONFLICTS_HIST |
+-------------+---------+-------------------------+
| district9   | PRIMARY |                    3450 |
| district6   | PRIMARY |                    3361 |
| district2   | PRIMARY |                    3356 |
| district8   | PRIMARY |                    3271 |
| district4   | PRIMARY |                    3237 |
| district10  | PRIMARY |                    3237 |
| district7   | PRIMARY |                    3237 |
| district3   | PRIMARY |                    3217 |
| district5   | PRIMARY |                    3156 |
| district1   | PRIMARY |                    3072 |
| warehouse2  | PRIMARY |                    1867 |
| warehouse10 | PRIMARY |                    1850 |
| warehouse6  | PRIMARY |                    1808 |
| warehouse5  | PRIMARY |                    1781 |
| warehouse3  | PRIMARY |                    1773 |
| warehouse9  | PRIMARY |                    1769 |
| warehouse4  | PRIMARY |                    1745 |
| warehouse7  | PRIMARY |                    1736 |
| warehouse1  | PRIMARY |                    1735 |
| warehouse8  | PRIMARY |                    1635 |
+-------------+---------+-------------------------+

This is obviously a strong indication that something was not working right. In terms of performance gain, if we compare ONLY the results at 128 threads:

Picture 4

Even with the high level of conflict, we still have a 12% performance gain.

The problem is that, in general, the two nodes behave quite badly.
If you check the graphs below, you can see that the level of conflict is such that it prevents the nodes not only from scaling, but also from acting consistently.

Node 1

Picture 7

Node 2

Picture 8

Reads

In the following graphs, we can see how Node1 had issues and actually crashed three times, during the tests with 32/64/512 threads.
Node2 was always up, but its performance was very low.

Node 1

t10

Node 2

t13

Writes

Node 1

t11

Node 2

t14

Recovery from crashed Node

Regarding recovery time, reading the AWS documentation and listening to presentations, I often heard that Aurora Multi-Primary is a zero-downtime solution, or other statements like: “in applications where you can't afford even brief downtime for database write operations, a multi-master cluster can help to avoid an outage when a writer instance becomes unavailable. The multi-master cluster doesn't use the failover mechanism, because it doesn't need to promote another DB instance to have read/write capability”.

To achieve this, the suggestion I found was to have applications point directly to the node endpoints and not use the cluster endpoint.
In this context, the solution pointing to the nodes should be able to fail over within a second or so, while the cluster endpoint:

fail over times using mariadb driver

Personally, I think that designing an architecture where the application is responsible for the connection to the database and for the failover is some kind of leftover from 2001. But if you feel this is the way, well, go for it.

What I did for testing was to use ProxySQL, as plain as possible, with nothing other than the basic checks coming from the native monitor.

I then compared the results with the tests using the cluster endpoint.
In this way I adopted the advice of pointing directly at the nodes, but handled the failover on the ProxySQL side.
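For reference, a minimal sketch of what “pointing directly at the nodes” looks like in ProxySQL. The endpoints and hostgroup below are hypothetical, and a real setup would also define users and monitoring:

```sql
-- Hypothetical Aurora node endpoints registered directly (no cluster endpoint).
INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
VALUES (70, 'aurora-node-1.xxxx.us-east-1.rds.amazonaws.com', 3306, 10000),
       (70, 'aurora-node-2.xxxx.us-east-1.rds.amazonaws.com', 3306, 1000);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```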

The results are below, and they confirm (more or less) the data coming from Amazon.

a10

A downtime of 7 seconds is quite a long time nowadays, especially if I am targeting a five-nines solution, which, remember, allows 864 ms of downtime per day.

Using ProxySQL gets closer to that, but it is still too long to be called 0 (zero) downtime.

I also had fail-back issues when using the AWS cluster endpoint, given that it was not able to move the connections to the joining node seamlessly.

Last but not least, when using the consistency level INSTANCE_RAW, I had some data issues as well as PK conflicts:
FATAL: mysql_drv_query() returned error 1062 (Duplicate entry '18828082' for key 'PRIMARY')

Conclusions

As stated at the beginning of this long blog, the expectations for going with a multi-Primary solution were:

  • Scale writes (more nodes, more writes)
  • Get 0 (zero) downtime, or close to that (5 nines is a maximum downtime of 864 milliseconds per day!!)
  • Be able to shift the writer pointer at any time from A to B and vice versa, consistently.

Honestly, I feel we have completely failed on the scaling point.

Facepalm Jesus

Probably, if I used the largest Aurora instances, I would get much better absolute numbers, and it would take me longer to encounter the same issues, but I would encounter them.

In any case, if the multi-Primary solution is designed to provide that scalability, it should do so at any instance size.

I did not have zero downtime, but I was able to failover pretty quickly with ProxySQL.

Finally, unless the consistency model is REGIONAL_RAW, shifting from one node to the other is prone to possible negative effects like stale reads.

Because of that, I consider this requirement not fully satisfied.

Given all the above, I think this solution could be valid only for High Availability (close to 5 nines), but given that it comes with some limitations, I do not feel comfortable preferring it over others just for HA; in the end, default Aurora is already good enough as a high-availability solution.

References

https://www.youtube.com/watch?v=p0C0jakzYuc

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-multi-master.html

https://www.slideshare.net/marcotusa/improving-enterprises-ha-and-disaster-recovery-solutions-reviewed

https://www.slideshare.net/marcotusa/robust-ha-solutions-with-proxysql

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-multi-master.html#aurora-multi-master-limitations  
