Who is drop-in replacement of

From MySQL to Percona or MariaDB ...and back

As we all know, MariaDB started as a fork of MySQL and then slowly diverged until it became a different product.

Still, at conferences I often hear that MariaDB is a drop-in replacement for MySQL (https://en.wikipedia.org/wiki/Drop-in_replacement). For me this is a bold statement, given the definition of drop-in replacement: “It refers to the ability to replace one hardware (or software) component with another one without any other code or configuration changes being required and resulting in no negative impacts”. By implication, this also means I can go backwards.

In short if MariaDB is a real drop-in replacement, we should be able to replace the MySQL binaries with the ones coming from MariaDB and then roll back without any issue.

This short article is the result of my notorious disbelief in any kind of bold marketing statement.

To be clear I am NOT going to compare the functionalities of the different products, I just want to see if I can replace one with the other.

The tests

What and how

For the tests I will use the latest version of:

MySQL 5.7
Percona Server 5.7
MariaDB 10.3

And for the newest:

MySQL 8.0.22 (8.0.23 is out but PS is not yet, so we cannot compare)
Percona Server 8.0.22
MariaDB 10.5

I will have the instance use a symbolic link named /opt/mysql_templates/magic as the source of its binaries; to switch products I only change the target of the link so that it points to the different binaries.
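A minimal sketch of the link swap, assuming each product is unpacked in its own directory. The function name point_magic_to and the directory name in the usage comment are hypothetical; only the link path comes from the text above.

```shell
#!/usr/bin/env bash
# Repoint a symbolic link (for example /opt/mysql_templates/magic) to another
# binary distribution. Function name and directory layout are assumptions.
point_magic_to() {
    local target="$1"   # directory holding the binaries to activate
    local link="$2"     # symbolic link the instance uses to find its binaries

    if [ ! -d "$target" ]; then
        echo "target directory not found: $target" >&2
        return 1
    fi
    # -s symbolic, -f replace an existing link, -n do not follow the old link
    ln -sfn "$target" "$link"
}

# Example (hypothetical directory name):
#   point_magic_to /opt/mysql_templates/percona-server-5.7 /opt/mysql_templates/magic
```

Because only the link changes, the data directory and my.cnf stay exactly the same between runs, which is the point of the test.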

What I will do is simple:

1. Point to MySQL
2. Create a new instance
3. Create the world schema and load data (using InnoDB)
4. Select count(*) from world.City;
5. SET GLOBAL innodb_fast_shutdown=0;
6. Stop instance
7. Point link to Percona
8. Start instance
9. Select count(*) from world.City;
10. Drop world
11. Repeat all steps from #3 and point to MariaDB
12. Once the MariaDB test is done, point to MySQL and repeat.
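One round of the procedure above (slow shutdown, swap, restart, count) could be scripted roughly as follows. This is only a sketch: the location of my.cnf, the use of mysqld_safe and mysqladmin to start and stop the instance, the DRY_RUN switch and all function names are my assumptions, not the exact commands used for the tests.

```shell
#!/usr/bin/env bash
# One "swap binaries and verify" round. Everything except the link path and
# the two SQL statements is an assumption made for illustration.
LINK=/opt/mysql_templates/magic
CNF=/opt/mysql_instances/magic/my.cnf   # assumed location of the config file

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_instance() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $LINK/bin/mysqld_safe --defaults-file=$CNF &"
        return 0
    fi
    "$LINK/bin/mysqld_safe" --defaults-file="$CNF" >/dev/null 2>&1 &
    local tries=0
    # Poll until the server answers, giving up after about 60 seconds.
    until "$LINK/bin/mysqladmin" --defaults-file="$CNF" -uroot ping >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            echo "instance did not start, check the error log" >&2
            return 1
        fi
        sleep 1
    done
}

swap_and_check() {
    local new_basedir="$1"
    # Slow shutdown: full purge and change buffer merge before stopping.
    run "$LINK/bin/mysql" --defaults-file="$CNF" -uroot \
        -e "SET GLOBAL innodb_fast_shutdown=0"
    run "$LINK/bin/mysqladmin" --defaults-file="$CNF" -uroot shutdown
    # Repoint the link to the other product's binaries.
    run ln -sfn "$new_basedir" "$LINK"
    # Start the same data directory with the new binaries.
    start_instance || return 1
    run "$LINK/bin/mysql" --defaults-file="$CNF" -uroot \
        -e "SELECT COUNT(*) FROM world.City"
}

# Example (hypothetical directory name), printing the commands only:
#   DRY_RUN=1 swap_and_check /opt/mysql_templates/percona-server-5.7
```

With a real drop-in replacement, swap_and_check would succeed in both directions without touching my.cnf.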

I will run the same tests for the 5.7/10.3 series and for the 8.0.22/10.5 series.

To qualify as a drop-in replacement, I expect to be able to move from MySQL to Percona Server to MariaDB and back to MySQL. Anything diverging from that will prove we are NOT dealing with a drop-in replacement.

Version 5.7/10.3

MySQL

/opt/mysql_templates/magic/bin/mysql  Ver 14.14 Distrib 5.7.33, for linux-glibc2.12 (x86_64) using  EditLine wrapper
Connection id:		4
Server version:		5.7.33-log MySQL Community Server (GPL)
UNIX socket:		/opt/mysql_instances/magic/mysql.sock
Uptime:			4 sec

+----------+
| count(*) |
+----------+
|     4079 |
+----------+
1 row in set (0.01 sec)

Shift to Percona:

/opt/mysql_templates/magic/bin/mysql  Ver 14.14 Distrib 5.7.33-36, for Linux (x86_64) using  6.0

Connection id:		8
Server version:		5.7.33-36-log Percona Server (GPL), Release 36, Revision 7e403c5
UNIX socket:		/opt/mysql_instances/magic/mysql.sock
Uptime:			5 min 38 sec

+----------+
| count(*) |
+----------+
|     4079 |
+----------+
1 row in set (0.00 sec)

No errors at all in the log
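A statement like this is easy to verify after every restart by counting the [ERROR] lines in the error log. A small sketch, where the function name is hypothetical and the log path depends on your my.cnf:

```shell
#!/usr/bin/env bash
# Count [ERROR] lines in a MySQL/MariaDB error log.
count_log_errors() {
    local logfile="$1"
    # -F matches the literal string, -c prints the number of matching lines.
    # grep exits with 1 when nothing matches, so "|| true" keeps the function
    # usable in scripts running under "set -e".
    grep -c -F '[ERROR]' "$logfile" || true
}

# Example (hypothetical log path):
#   count_log_errors /opt/mysql_instances/magic/mysql.err
```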

Let us go backwards to MySQL

(root@localhost) [world]>\s
--------------
/opt/mysql_templates/magic/bin/mysql  Ver 14.14 Distrib 5.7.33, for linux-glibc2.12 (x86_64) using  EditLine wrapper

Connection id:		7
Server version:		5.7.33-log MySQL Community Server (GPL)
UNIX socket:		/opt/mysql_instances/magic/mysql.sock
Uptime:			5 min 33 sec

Database changed
+----------+
| count(*) |
+----------+
|     4079 |
+----------+
1 row in set (0.00 sec)

No issue at all

Let us try to move to MariaDB 10.3:

The first attempt failed, and I had to modify my.cnf, removing the GTID configuration as well as the Performance Schema settings (see the reference section about that).
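The kind of change involved can be sketched as a helper that comments out MySQL-only options before MariaDB is started. The function name is hypothetical, and gtid_mode and enforce_gtid_consistency in the usage comment are only examples of MySQL GTID options that MariaDB does not know; the exact list depends on what your my.cnf contains.

```shell
#!/usr/bin/env bash
# Comment out the given option names in a my.cnf file, keeping a backup.
comment_out_options() {
    local cnf="$1"; shift
    local opt pattern
    cp "$cnf" "$cnf.bak"    # keep the original for the way back
    for opt in "$@"; do
        # Accept both spellings of an option: gtid_mode and gtid-mode.
        pattern=$(printf '%s' "$opt" | sed 's/[-_]/[-_]/g')
        # Prefix the whole line with "# " when it sets exactly this option.
        sed -E "s/^([[:space:]]*${pattern}[[:space:]]*(=.*)?)$/# \1/" \
            "$cnf" > "$cnf.tmp" && mv "$cnf.tmp" "$cnf"
    done
}

# Example:
#   comment_out_options ./my.cnf gtid_mode enforce_gtid_consistency
```

Needing a step like this at all is already a configuration change, which the definition of drop-in replacement excludes.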

Once done:

2021-03-07 10:59:22 0 [ERROR] Missing system table mysql.roles_mapping; please run mysql_upgrade to create it
2021-03-07 10:59:22 0 [ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position 14 to have type set(...), found type set(...)
2021-03-07 10:59:22 0 [ERROR] mysqld: Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler.

It works, but I needed to run mysql_upgrade. Once that was done, I had a lot of errors related to the Performance Schema, and still some InnoDB issues:

/opt/mysql_templates/magic/bin/mysql  Ver 15.1 Distrib 10.3.28-MariaDB, for Linux (x86_64) using readline 5.1

Connection id:		76
Server:			MariaDB
Server version:		10.3.28-MariaDB-log MariaDB Server
UNIX socket:		/opt/mysql_instances/magic/mysql.sock
Uptime:			1 min 17 sec

(root@localhost) [(none)]>select count(*) from world.City;
+----------+
| count(*) |
+----------+
|     4079 |
+----------+
1 row in set (0.003 sec)

2021-03-07 11:03:37 229 [ERROR] InnoDB: Column last_update in table `mysql`.`innodb_table_stats` is BINARY(4) NOT NULL but should be INT UNSIGNED NOT NULL (flags mismatch).
2021-03-07 11:03:37 229 [ERROR] InnoDB: Fetch of persistent statistics requested for table `mysql`.`gtid_executed` but the required system tables mysql.innodb_table_stats and mysql.innodb_index_stats are not present or have unexpected structure. Using transient stats instead.
2021-03-07 11:03:38 232 [ERROR] Column count of mysql.proc is wrong. Expected 21, found 20. Created with MariaDB 50733, now running 100328. Please use mysql_upgrade to fix this error
2021-03-07 11:03:38 232 [Note] View `sys`.`x$statements_with_errors_or_warnings`: the version is set to 100328, algorithm restored to be MERGE
2021-03-07 11:03:38 233 [ERROR] InnoDB: Column last_update in table `mysql`.`innodb_table_stats` is BINARY(4) NOT NULL but should be INT UNSIGNED NOT NULL (flags mismatch).
2021-03-07 11:03:38 233 [ERROR] InnoDB: Column last_update in table `mysql`.`innodb_table_stats` is BINARY(4) NOT NULL but should be INT UNSIGNED NOT NULL (flags mismatch).
2021-03-07 11:03:38 233 [ERROR] InnoDB: Column last_update in table `mysql`.`innodb_table_stats` is BINARY(4) NOT NULL but should be INT UNSIGNED NOT NULL (flags mismatch).

Let us go back to MySQL now:

2021-03-07T16:27:01.517996Z 0 [ERROR] Native table 'performance_schema'.'session_variables' has the wrong structure
2021-03-07T16:27:01.518307Z 0 [ERROR] Incorrect definition of table mysql.db: expected column 'User' at position 2 to have type char(32), found type char(80).
2021-03-07T16:27:01.518428Z 0 [ERROR] mysql.user has no `Event_priv` column at position 28
2021-03-07T16:27:01.518697Z 0 [ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position 14 to have type set(...), found type set(...)
2021-03-07T16:27:01.518829Z 0 [ERROR] Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler.
2021-03-07T16:27:01.520065Z 0 [Note] /opt/mysql_templates/magic/bin/mysqld: ready for connections.
Version: '5.7.33-log'  socket: '/opt/mysql_instances/magic/mysql.sock'  port: 3306  MySQL Community Server (GPL)

Try to run mysql_upgrade

[mysql@gr1 magic]$ /opt/mysql_templates/magic/bin/mysql_upgrade --defaults-file=./my.cnf -uroot --force
mysql_upgrade: Got error: 1524: Plugin '0' is not loaded while connecting to the MySQL server
Upgrade process encountered error and will not continue.

The only way to access the data at this point is to use skip-grant-tables. But that is unsustainable, and in any case I continue to get errors in the log.

Summarizing

Once I have an instance built with MySQL 5.7, I can easily shift to Percona Server 5.7 and, if needed, go back. I can migrate to MariaDB, but this implies configuration changes and I must run mysql_upgrade to modify the core system tables. Finally, I cannot roll back to MySQL: the modifications done by MariaDB do not allow it.

Version 8/10.5

MySQL

/opt/mysql_templates/magic8/bin/mysql  Ver 8.0.22 for Linux on x86_64 (MySQL Community Server - GPL)

Connection id:		19
Server version:		8.0.22 MySQL Community Server - GPL
UNIX socket:		/opt/mysql_instances/magic8/mysql.sock
Binary data as:		Hexadecimal
Uptime:			22 sec

(root@localhost) [world]>select count(*) from world.City;
+----------+
| count(*) |
+----------+
|     4079 |
+----------+
1 row in set (0.01 sec)

Move to Percona Server

/opt/mysql_templates/magic8/bin/mysql  Ver 8.0.22-13 for Linux on x86_64 (Percona Server (GPL), Release 13, Revision 6f7822f)

Connection id:		23
Server version:		8.0.22-13 Percona Server (GPL), Release 13, Revision 6f7822f
UNIX socket:		/opt/mysql_instances/magic8/mysql.sock
Binary data as:		Hexadecimal
Uptime:			10 min 45 sec

+----------+
| count(*) |
+----------+
|     4079 |
+----------+
1 row in set (0.00 sec)

No errors of any type and no changes required, as for 5.7.

Let us roll back to MySQL

/opt/mysql_templates/magic8/bin/mysql  Ver 8.0.22 for Linux on x86_64 (MySQL Community Server - GPL)

Connection id:		18
Server version:		8.0.22 MySQL Community Server - GPL
UNIX socket:		/opt/mysql_instances/magic8/mysql.sock
Binary data as:		Hexadecimal
Uptime:			19 sec

+----------+
| count(*) |
+----------+
|     4079 |
+----------+
1 row in set (0.00 sec)

Perfect and no issue.

Let us now try MariaDB

210308 09:13:34 mysqld_safe Starting mariadbd daemon with databases from /opt/mysql_instances/magic8/data
2021-03-08  9:13:35 0 [ERROR] /opt/mysql_templates/magic8/bin/mariadbd: Error while setting value 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on' to 'optimizer_switch'
210308 09:13:35 mysqld_safe mysqld from pid file /opt/mysql_instances/magic8/mysqld.pid ended

OK, let us see what happens if I remove the optimizer settings:

2021-03-08  9:14:53 0 [ERROR] InnoDB: Invalid flags 0x4800 in /opt/mysql_instances/magic8/data/ibdata1
2021-03-08  9:14:53 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
2021-03-08  9:14:53 0 [Note] InnoDB: Starting shutdown...
2021-03-08  9:14:53 0 [ERROR] Plugin 'InnoDB' init function returned error.
2021-03-08  9:14:53 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2021-03-08  9:14:53 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-03-08  9:14:53 0 [ERROR] Could not open mysql.plugin table: "Table 'mysql.plugin' doesn't exist". Some plugins may be not loaded
2021-03-08  9:14:53 0 [ERROR] /opt/mysql_templates/magic8/bin/mariadbd: unknown variable 'auto-generate-certs=TRUE'
2021-03-08  9:14:53 0 [ERROR] Aborting
210308 09:14:53 mysqld_safe mysqld from pid file /opt/mysql_instances/magic8/mysqld.pid ended

There is no way to get it working short of a logical dump.

Summarizing

As with MySQL 5.7, once I have the instance built with MySQL 8.0.22 I can move to Percona Server 8.0.22 and roll back with no issues. Moving to MariaDB is not possible at all without a full LOGICAL dump, and I would really like to see how long that takes when you have terabytes of data.

Another weirdo

While I was working with MariaDB I also reviewed the documentation, and my eyes were caught by this (https://mariadb.com/docs/reference/es/system-variables/innodb_purge_threads/)

“WoW”, I thought, “DYNAMIC settings for innodb_purge_threads this is cool”. Let me try it.

/opt/mysql_templates/mariadb-10.5.9-linux-systemd-x86_64/bin/mysql  Ver 15.1 Distrib 10.5.9-MariaDB, for Linux (x86_64) using readline 5.1

Connection id:		7
Server:			MariaDB
Server version:		10.5.9-MariaDB MariaDB Server
UNIX socket:		/opt/mysql_instances/maria105/mysql.sock
Uptime:			8 min 14 sec


(root@localhost:pm) [(none)]>set global innodb_purge_threads=10;
ERROR 1238 (HY000): Variable 'innodb_purge_threads' is a read only variable
(root@localhost:pm) [(none)]>

Whaaat??

Then I realized I was on the “Enterprise Documentation” page. Checking the community version (https://mariadb.com/kb/en/innodb-system-variables/#innodb_purge_threads), innodb_purge_threads is still NOT dynamic there, and after checking further I found the same story for innodb_log_file_size.

maria slide

This left a very bad impression on me, for several reasons, but I want to mention the two most important:

MariaDB declares over and over that it is the one truly open source and community-oriented product. But where is the support for the community here?
Is InnoDB not owned by Oracle? Isn't MariaDB using it because Oracle released it under the GPLv2? So why are they modifying the code and not returning it to the open source community? I am not an expert in legal things, but that sounds to me like an infringement of the license.

Conclusions

Drop-in replacement has a very specific meaning, and it must be used with caution. It also comes with several strings attached, one of which is that replacing binaries must not be a one-way-only solution. Attaching the term “limited” to drop-in, as the MariaDB documentation does, is not correct: it still evokes an inapplicable concept and can be seen as misleading advertising.

As we can see, the only real drop-in replacement for MySQL is Percona Server. MariaDB is not even close: too many changes are needed in the configuration file and, of course, having to go for a logical dump is not even remotely the right way to go.

In short MariaDB is obviously a different product, as already stated by many in different articles, that is diverging more and more.

For me, the only reason MariaDB continues to play the drop-in game with MySQL is that they keep using the traction MySQL has in the community and the market to attract customers. They also draw on the work done by Oracle and Percona but, as shown with innodb_purge_threads/innodb_log_file_size, they do not hesitate to keep significant features for the Enterprise version only, without sharing them with the community.

As said many times, MariaDB has great minds in development, from Monty down to every level of developer/DBA. We must recognize and respect that, just as we must recognize the great work they do at the technical level.

I would love to see more collaboration, but I also understand the need to be something different to survive as a company.

What I cannot accept is claims that are not real, like the drop-in replacement (even if limited) or, even worse, the case of innodb_purge_threads/innodb_log_file_size.

That is wrong and as a strong advocate of open source and a lover of the MySQL/MariaDB community I feel I need to voice my concern.

Great MySQL to all

Reference

https://www.percona.com/resources/webinars/differences-between-mariadb%C2%AE-and-mysql%C2%AE

https://mariadb.com/docs/reference/es/system-variables/innodb_purge_threads/

https://mariadb.com/kb/en/innodb-system-variables/#innodb_purge_threads

https://mariadb.com/kb/en/mariadb-vs-mysql-compatibility/

Full read consistency within Percona Operator for MySQL

Overview

Percona Operator for MySQL (POM) (https://www.percona.com/doc/kubernetes-operator-for-pxc/index.html) is a special type of controller introduced to simplify complex deployments. The Operator extends the Kubernetes API with custom resources.

The Percona Operator for MySQL solution uses Percona XtraDB Cluster under the hood to provide a highly available, resilient and scalable MySQL service in the Kubernetes space.

This solution comes with all the advantages/disadvantages provided by Kubernetes, plus some advantages of its own, like the capacity to scale reads on the nodes that are not Primary.

Of course there are some limitations, like the way PXC handles DDLs, which may impact the service. But there is always a cost to pay to get something; expecting to have everything for free is unreasonable.

In this context we need to cover what full read consistency is in this solution and why it is important to understand the role it plays.

Stale Reads

When using Kubernetes we should talk about the service and not about the technology/product used to deliver such service.

In our case the Percona Operator is there to deliver a MySQL service. We should then see it as a whole, as a single object. To be clear, what we must consider is NOT the fact that we have a cluster behind the service, but that we have a service which uses a cluster in order to be resilient and highly available.

We should not care if a node/pod goes down unless the service is discontinued.

What we have as a plus in the Percona Operator for MySQL solution is a certain level of READ scalability. This is achieved by optimising the use of the non-PRIMARY nodes: instead of having them sit there only applying replicated data, the Percona Operator provides access to them to scale the reads.

But… there is always a BUT

Let us start with an image:

(Image from Fred Descamps, https://www.slideshare.net/lefred.descamps/galera-replication-demystified-how-does-it-work)

By design, the apply and commit-finalize steps in Galera (PXC) may have (and do have) a delay between nodes.

This means that, if using defaults, applications may have inconsistent reads when accessing the data from nodes other than the Primary.

POM provides access using two different solutions:

Using HAProxy (default)
Using ProxySQL

When using HAProxy you will have 2 entry points:

cluster1-haproxy, which will point to the Primary ONLY, for reads and writes. This is the default entry point for the applications to the MySQL database.
cluster1-haproxy-replicas, which will point to all the 3 nodes and is supposed to be used for READS only. This is the PLUS you can use if your application has READ/WRITE separation.
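For an application that does its own READ/WRITE separation, the choice between the two entry points can be sketched as below. The two service names are the ones listed above; the function name and the very naive rule (route by the first keyword only) are my assumptions for illustration.

```shell
#!/usr/bin/env bash
# Pick the HAProxy entry point for a statement, based on its first keyword.
# Deliberately naive: reads that belong to a transaction, or
# SELECT ... FOR UPDATE, should still be sent to the primary entry point.
endpoint_for() {
    local sql="$1"
    local verb
    # First word of the statement, lower-cased.
    verb=$(printf '%s' "$sql" | awk 'NR==1 {print tolower($1); exit}')
    case "$verb" in
        select|show) echo "cluster1-haproxy-replicas" ;;
        *)           echo "cluster1-haproxy" ;;
    esac
}

# Example:
#   mysql -h "$(endpoint_for "select count(*) from world.City")" \
#         -e "select count(*) from world.City"
```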

Please note that at the moment nothing prevents an application from using cluster1-haproxy-replicas for writes as well, but that is dangerous and wrong: it distributes writes all over the cluster, which generates a lot of certification conflicts and BF aborts, impacts performance, and does not give you any write scaling:

[marcotusa@instance-1 ~]$ for i in `seq 1 100`; do mysql -h cluster1-haproxy-replicas -e "insert into test.iamwritingto values(null,@@hostname)";done
+----------------+-------------+
| host           | count(host) |
+----------------+-------------+
| cluster1-pxc-1 |          34 |
| cluster1-pxc-2 |          33 |
| cluster1-pxc-0 |          33 |
+----------------+-------------+

When using ProxySQL the entry point is a single one, but you may define query rules to automatically split the R/W requests coming from the application.

This is the preferred method when the application has no way to separate READS from writes.

I have done a comparison of the two methods in POM here

Now, as mentioned above, by default PXC (like any Galera-based solution) comes with some relaxed settings, for performance purposes. This is normally fine in many standard cases, but if you use POM and take advantage of the PLUS of scaling reads, using the second access point with HAProxy or query rules with ProxySQL, you should NOT have stale reads, given that the service must provide consistent data, as if you were acting on a single node.

To achieve that, you can change the default of the PXC parameter wsrep_sync_wait.

When wsrep_sync_wait is changed, as explained in the documentation, the node initiates a causality check, blocking incoming queries while it catches up with the cluster.

Once all data on the node receiving the READ request is commit-finalized, the node performs the read.

But this has a performance impact as said before.
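wsrep_sync_wait is a bitmask, so it helps to see which statement classes a given value covers. The sketch below decodes a value using the bits described in the Galera documentation listed in the references (1 for reads, 2 for UPDATE and DELETE, 4 for INSERT and REPLACE, 8 for SHOW); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Describe which statement classes trigger the causality check
# for a given wsrep_sync_wait value.
describe_wsrep_sync_wait() {
    local value="$1"
    local out=""
    if [ $(( value & 1 )) -ne 0 ]; then out="$out READ"; fi
    if [ $(( value & 2 )) -ne 0 ]; then out="$out UPDATE/DELETE"; fi
    if [ $(( value & 4 )) -ne 0 ]; then out="$out INSERT/REPLACE"; fi
    if [ $(( value & 8 )) -ne 0 ]; then out="$out SHOW"; fi
    if [ -z "$out" ]; then out=" none"; fi
    echo "${out# }"     # drop the leading space
}

# The variable can be set per session, for example:
#   mysql -h cluster1-haproxy-replicas \
#         -e "SET SESSION wsrep_sync_wait=3; SELECT COUNT(*) FROM world.City"
```

A value of 3 therefore enables the check for reads plus UPDATE and DELETE statements, while the default of 0 disables it.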

What is the impact?

To test the performance impact I used a cluster deployed in GKE, with these characteristics:

3 Main nodes n2-standard-8 (8 vCPUs, 32 GB memory)
1 App node n2-standard-8 (8 vCPUs, 32 GB memory)
PXC pods using:
- 25GB of the 32 available
- 6 CPU of the 8 available
HAProxy:
- 600m CPU
- 1GB RAM
PMM agent:
- 500m CPU
- 500 MB RAM

On the application node I used sysbench running two instances, one in r/w mode and the other doing only reads. Finally, to test stale reads I used the stale read test from my test suite (https://github.com/Tusamarco/testsuite).

Given I was looking for results with moderate load I just used 68/96/128 threads per sysbench instance.

Results

Marco, did we have or not stale reads? Yes we did:

stale reads moderate load

I had from 0% (with very light load) up to 37% stale reads with MODERATE load, where moderate means sysbench running with 128 threads.

After setting wsrep_sync_wait=3 I of course had full consistency.
But I had a performance loss:

performance loss reads

As you can see, I had an average loss of 11% in the case of READS.

performance loss writes

While for writes the average loss was 16%.

Conclusions

At this point we need to stop and think about what is worth doing. If my application is READ-heavy and needs READ scaling, it is probably worth enabling full synchronicity, given that scaling on the additional nodes allows me to have 2x or more READs.

If instead my application is write-critical, losing ~16% performance is probably not acceptable.

Finally, if my application is tolerant of stale reads, I will just go with the defaults and get all the benefits without penalties.
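The trade-off can be put into rough numbers. The sketch below assumes that reads scale linearly with the number of nodes serving them, which is a simplification and not something measured in these tests; the function name is hypothetical, and the loss percentages are the averages reported above.

```shell
#!/usr/bin/env bash
# Capacity relative to a single node without the causality check:
#   nodes * (1 - loss_pct / 100)
capacity_after_loss() {
    local nodes="$1" loss_pct="$2"
    awk -v n="$nodes" -v l="$loss_pct" \
        'BEGIN { printf "%.2f\n", n * (1 - l / 100) }'
}

# Reads spread over 3 nodes with the 11% average loss:
#   capacity_after_loss 3 11
# Writes, which always go to one node, with the 16% average loss:
#   capacity_after_loss 1 16
```

Under that assumption, three nodes serving consistent reads still offer well over twice the read capacity of a single node, while writes simply pay the penalty.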

Also keep in mind that POM is designed to offer a MySQL service, so the state of a single node is not as critical as it is in a default PXC installation: pods are by nature ephemeral objects, while the service is resilient.

References

https://www.percona.com/doc/kubernetes-operator-for-pxc/index.html

https://github.com/Tusamarco/testsuite

https://en.wikipedia.org/wiki/Isolation_(database_systems)#Dirty_reads

https://galeracluster.com/library/documentation/mysql-wsrep-options.html#wsrep-sync-wait

https://www.slideshare.net/lefred.descamps/galera-replication-demystified-how-does-it-work

Percona Operator for MySQL (HAProxy or ProxySQL?)

Overview

Percona Operator for MySQL (POM) comes with two different proxies, HAProxy and ProxySQL. While the initial version was based on ProxySQL, over time Percona opted to set HAProxy as the default proxy for the operator, without removing ProxySQL.

One of the main reasons was to guarantee users 1:1 compatibility with vanilla MySQL in the way the operator allows connections, but there are also other factors involved in the decision to have two proxies. In this article I will scratch the surface of this why.

Operator assumptions

When working with the Percona Operator for MySQL, there are a few things to keep in mind:

Each deployment has to be seen as a single MySQL service as if a single MySQL instance
The technology used to provide the service may change in time
Pod resiliency is not guaranteed, Service resiliency is.
Resources to be allocated are not automatically calculated and must be identified at the moment of the deployment
In Production you cannot set more than 5 or less than 3 nodes when using PXC

There are two very important points in the list above.

The first one is that what you get IS NOT a PXC cluster, but a MySQL service. The fact that Percona at the moment uses PXC to cover the service is purely accidental and we may decide to change it anytime.

The other point is that the service is resilient; the pod is not. In short, you should expect to see pods stop working and be re-created. What should NOT happen is that the service goes down. Trying to debug each minor issue per node/pod is not what is expected when you use Kubernetes.

Given the above, review your expectations… and let us go ahead.

The plus in the game (read scaling)

As said, what is offered with POM is a MySQL service. Percona has added a proxy on top of the nodes/pods that helps the service meet the resiliency expectations. There are two possible deployments:

HAProxy
ProxySQL

Both allow optimising one aspect of POM, which is read scaling.
In fact, what we were thinking was: given we must use a (virtually synchronous) cluster, why not take advantage of that and allow reads to scale on the other nodes when available?

This approach gives everyone using POM the standard MySQL service, but with a plus.

But with it also come some possible issues, like READ/WRITE splitting and stale reads. For stale reads, see this article on how to deal with them (https://docs.google.com/document/d/1NyqcEKxfhgD1tDRPbiY0bENdp953DzkBfAoMi22u3J4/edit)

For R/W splitting, instead, the approach is totally different depending on which proxy is used.

If using HAProxy, we offer a second entry point that can be used for READ operations. That entry point will balance the load on all the nodes available.

If instead ProxySQL is in use it is possible to implement automatic R/W splitting.
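
To give an idea of what automatic R/W splitting means at level 7, here is a small sketch of rule-based routing. It is only an illustration of the concept, not ProxySQL's implementation or configuration: the hostgroup ids, the two regular expressions and the `route` function are all assumptions made for the example.

```python
import re

# Hypothetical hostgroup ids, chosen only for this example.
WRITER_HG = 10
READER_HG = 11

# Ordered rules: the first matching pattern wins.
RULES = [
    # Locking reads must go to the writer.
    (re.compile(r"^\s*SELECT\b.*\bFOR\s+UPDATE\b", re.IGNORECASE | re.DOTALL), WRITER_HG),
    # Any other SELECT can be served by a reader.
    (re.compile(r"^\s*SELECT\b", re.IGNORECASE), READER_HG),
]


def route(sql, in_transaction=False):
    """Return the hostgroup a statement would be sent to."""
    # Statements inside an open transaction stay on the writer, so the
    # transaction is not split across nodes.
    if in_transaction:
        return WRITER_HG
    for pattern, hostgroup in RULES:
        if pattern.search(sql):
            return hostgroup
    # Anything that matched no rule goes to the writer.
    return WRITER_HG
```

The point is that the proxy has to read and interpret each statement to take the decision, which is exactly the work a level 4 proxy never does.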

Global difference and comparison

At this point it is useful to better understand the functional differences between the two proxies and the performance difference, if any.

As we know, HAProxy acts as a level 4 proxy when operating in TCP mode. It is also a forward-proxy, which means each TCP connection is established from the client to the final target and there is no interpretation of the data flow.

ProxySQL, on the other hand, is a level 7 proxy and a reverse-proxy: the client establishes a connection to the proxy, which presents itself as the final backend. Data can be altered on the fly while it is in transit.

To be honest, it is more complicated than that, but allow me the simplification.

On top of that there are additional functionalities that are present in one (ProxySQL) and not in the other. The point is whether they are relevant for use in this context or not. For a short list see below (the source is the ProxySQL blog, but data was removed)

proxySQL HAProxy feature comparison

As you may have noticed, HAProxy lacks some of those functionalities, like R/W split, firewalling and caching, which are proper to the level 7 implementation in ProxySQL.

The test environment

To test the performance impact I used a cluster deployed in GKE, with these characteristics:

3 Main nodes n2-standard-8 (8 vCPUs, 32 GB memory)
1 App node n2-standard-8 (8 vCPUs, 32 GB memory)
PXC pods using:
- 25GB of the 32 available
- 6 CPU of the 8 available
HAProxy:
- 600m CPU
- 1GB RAM
PMM agent:
- 500m CPU
- 500MB RAM
Tests using sysbench as per (https://github.com/Tusamarco/sysbench)

What I have done is to run several tests running two Sysbench instances. One only executing reads, while the other reads and writes.

In the case of ProxySQL I had R/W splitting thanks to the Query rules, so both sysbench instances were pointing to the same address. While testing HAProxy I was using two entry points:

Cluster1-haproxy – for read and write
Cluster1-haproxy-replicas – for read only
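
With HAProxy the split has to happen in the application, which must know which entry point to use for which kind of work. A minimal sketch of that, using the two entry points above (written in lowercase, as Kubernetes service names are). Port 3306, the `SplitConnections` class and the `connect` callable are assumptions for the example, and `connect` stands in for your driver's connect function:

```python
# The two HAProxy entry points; the port is the MySQL default (assumed).
ENDPOINTS = {
    "rw": ("cluster1-haproxy", 3306),
    "ro": ("cluster1-haproxy-replicas", 3306),
}


class SplitConnections:
    """Open one connection per entry point lazily and reuse it."""

    def __init__(self, connect):
        # connect is any callable(host, port) returning a connection.
        self._connect = connect
        self._cache = {}

    def get(self, read_only=False):
        # The caller declares its intent; no SQL parsing is done here.
        key = "ro" if read_only else "rw"
        if key not in self._cache:
            host, port = ENDPOINTS[key]
            self._cache[key] = self._connect(host, port)
        return self._cache[key]
```

The design choice here is that the caller states its intent explicitly (`read_only=True`), so a write can never end up on the read-only entry point because of a misparsed statement.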

Then I also compare what happens if all requests hit one node only. For that I execute one Sysbench in R/W mode against one entry point, and NO R/W split for ProxySQL.

Finally sysbench tests were executed with the --reconnect option to force the tests to establish new connections.

As usual tests were executed multiple times, on different days of the week and at different times of the day. Data reported is a consolidation of that, and images from PMM are samples coming from the execution that was closest to the average values.

Comparing performance when scaling Reads

These tests imply that one node is mainly serving writes while the others are serving reads. To not affect performance, and given I was not interested in maintaining full read consistency, the parameter wsrep_sync_wait was kept at its default (0).
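
For completeness, if some reads do need to be consistent, the check can be enabled only for those, at session level, leaving the default for everything else. A minimal sketch, assuming a standard Python DB-API cursor. The `consistent_read` helper is hypothetical and was not used in these tests:

```python
def consistent_read(cursor, sql, params=None, sync_wait=1):
    """Run one SELECT with wsrep_sync_wait enabled, then restore 0."""
    # Value 1 asks the node to apply pending write-sets before reads.
    cursor.execute("SET SESSION wsrep_sync_wait = %d" % int(sync_wait))
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        # Go back to the default used for all the other queries.
        cursor.execute("SET SESSION wsrep_sync_wait = 0")
```

Every read executed this way may wait for the node to catch up, so it trades latency for consistency.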

events 3node

operation 3node

A first observation shows that ProxySQL seems to keep a more stable level of requests served. The increasing load penalises HAProxy, reducing by ⅓ the number of operations at 1024 threads.

writes 3node

reads 3node

Digging a bit more we can see that HAProxy is performing much better than ProxySQL for the WRITE operation. The number of writes remains almost steady with minimal fluctuations. ProxySQL on the other hand is performing great when the write load is low, then performance drops by 50%.

For reads we have the opposite. ProxySQL is able to scale in a very efficient way, distributing the load across the nodes and able to maintain the level of service despite the load increase.

If we start to take a look at the latency distribution statistics (sysbench histogram information), we can see that:

latency68 3node HAproxy w

latency68 3node proxy w

In case of low load and writes, both proxies stay on the left side of the graph with low values in ms. HAProxy is a bit more consistent, grouped around the 55ms value, while ProxySQL is a bit more sparse and spans between 190-293ms.

latency68 3node HAproxy r

latency68 3node proxy r

For reads we have a similar behaviour, with the large majority for both between 28-70ms.
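
When comparing histograms like these it helps to reduce each one to a few percentiles. A minimal sketch of how to do that from a list of (latency, count) buckets. The `percentile_from_histogram` function is an assumption for illustration, and the sample values are toy numbers, not measurements from these tests:

```python
def percentile_from_histogram(buckets, pct):
    """Return the smallest bucket latency whose cumulative count
    covers pct percent of all events.

    buckets: iterable of (latency_ms, count) pairs, in any order.
    """
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(buckets)
    total = sum(count for _, count in ordered)
    if total == 0:
        raise ValueError("histogram is empty")
    threshold = total * pct / 100.0
    running = 0
    for latency_ms, count in ordered:
        running += count
        if running >= threshold:
            return latency_ms
    return ordered[-1][0]


# Toy values for illustration only, not measurements from these tests.
toy = [(10.0, 5), (20.0, 80), (40.0, 10), (80.0, 5)]
print(percentile_from_histogram(toy, 95))
```

A histogram that is tightly grouped will show percentiles close to each other, while a sparse one will show a large gap between the median and the 95th or 99th percentile.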

We have a different picture when load increases:

latency1024 3node HAproxy w

latency1024 3node proxy w

ProxySQL has some occurrences where it performs better, but it spans a very large range, from ~2k ms to ~29k ms, while HAProxy is substantially grouped around 10-11k ms.
As a result, in this context, HAProxy is able to better serve writes under heavy load than ProxySQL.

Again, a different picture in the case of reads.

latency1024 3node HAproxy r

latency1024 3node proxy r

Here ProxySQL is still spanning a wide range, from ~76ms to 1500ms, while HAProxy is more consistent but less efficient, with the majority of the service grouped around 1200ms. This is consistent with the performance loss we have seen in READ when using high load and HAProxy.

Comparing when using only one node

But let us now discover what happens when using only one node, so using the service as it should be, without the possible plus of read scaling.

writes 1node

reads 1node

The first thing I want to mention is a strange behaviour that was consistently happening (no matter what proxy was used) at 128 threads. I am investigating it, but I do not have a good answer yet on why the Operator solution with PXC had that significant drop in performance ONLY with 128 threads.

Aside from that, the results consistently showed HAProxy performing better in serving reads/writes. Keep in mind that HAProxy just establishes the connection point-to-point and does nothing else, while ProxySQL is designed to possibly act on the incoming stream of data.

This becomes even more evident when reviewing the latency distribution.
In this case no matter what load we have, HAProxy performs better:

latency68 1node HAproxy rw

latency68 1node proxy rw

latency2048 1node HAproxy rw

latency2048 1node proxy rw

As you can notice, HAProxy is less grouped than when we have 2 entry points, but it is still able to serve more efficiently than ProxySQL.

Conclusions

As usual my advice is to use the right tool for the job, and not to force yourself into something stupid.
As clearly stated at the beginning, POM is designed to provide a MySQL SERVICE, not a PXC cluster, and all the configuration and utilisation should converge on that.
ProxySQL can help you IF you want to scale a bit more on READS using the possible plus. But this is not guaranteed to work as it does when using standard PXC.
Not only that, you need to have a very good understanding of Kubernetes and ProxySQL if you want to avoid issues.
With HAProxy you can scale reads as well, but you need to be sure you have R/W separation at the application level.

In any case utilising HAProxy for the service is the easier way to go.
This is one of the reasons why Percona decided to shift to HAProxy.
HAProxy is the solution that offers the proxy service more in line with the aim of the Kubernetes service concept.
It is also the solution that remains closer to how a simple MySQL service should behave.

You need to set your expectations correctly to avoid being in trouble later.

References

https://www.percona.com/doc/kubernetes-operator-for-pxc/index.html
