My MySQL tips



MySQL Search facility is gone …


Well, it happens: after a long stretch of time without a single second of rest, the MySQL Dolphin Search, nicknamed Fetch by his friends, decided to go on vacation.



He was already not working as expected, and I remember times when, together with other colleagues, we pushed him to be more precise, less restrictive, more "concept" oriented than "word" oriented. He never really accepted our comments and went his own way, but in any case we loved him.

I don't know what happened recently; since I left I have not had much chance to talk with him. But just recently I needed his support, and he was not there... instead there was a message, saying that a search for "InnoDB" returns:


Oracle Secure Enterprise Search Error

A problem has been encountered while processing your request.

Based on the information we have, we suspect the following:


The exception did not contain any error message. Report this problem to your administrator, or contact your Oracle representative.


Of course I can use his friend G, but you know, I miss him. And even if I was not always nice to him, I have just realized the great work he was doing, and how much added value he provided to the documentation I use every day.


So please, whoever knows where he went on vacation, tell him to come back.




The Binary Log


The binary log in MySQL has two main declared purposes, as stated in the MySQL manual: replication and PITR (point-in-time recovery). The binary log stores all the statements that cause a change in the database: in short, DDL statements such as ALTER, CREATE, and so on, and DML statements like INSERT, UPDATE and DELETE.

As said, the binary log transforms multi-threaded concurrent data/structure modifications into a serialized sequence of activity. To handle this, a lock for the binary log (the LOCK_log mutex) is acquired just before the event is written to the binary log and released just after the event has been written. Because all session threads for the server log statements to the binary log, it is quite common for several session threads to block on this lock.
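The serialization described above can be sketched with a minimal model (not MySQL code): many session threads all funnel their events through one global lock, so only one write happens at a time.

```python
import threading

# Minimal model of the LOCK_log behaviour: every session thread must take
# one global lock to append its event, so concurrent writers serialize.
binlog = []
LOCK_log = threading.Lock()

def write_event(event):
    with LOCK_log:            # acquired just before the write...
        binlog.append(event)  # ...and released just after

threads = [threading.Thread(target=write_event, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(binlog))  # 8: every event written exactly once, one at a time
```

Whatever the level of concurrency on the clients, the log itself grows one event at a time.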

The other side effect of this shows up on the SLAVE side. There the binary log becomes the relay log, and it is processed SINGLE threaded.

In the MASTER it looks like:


While on SLAVE:




It is quite clear that the SLAVE cannot work efficiently when we have a high level of data modification on the MASTER.


And now the story

The other day I was working at a client site, performing some simple benchmarks to identify the correct thresholds for monitoring tools and so on.

While doing that I started to stress his server a little: 24 CPUs, 48GB RAM, SAN-attached storage, running Linux kernel 2.6.

The client had decided, before I (or we) could say anything, to use MMM for HA, given that having up-to-date replication is a crucial factor for their HA architecture.

The test I was running executes inserts and selects of 650 bytes per insert, using a scalable number of threads and a configurable complexity for the SELECTs; it can do single or batch inserts, and finally it can also do BLOB/TEXT inserts (in which case the size per insert jumps up).

I was executing my standard low-traffic exercises when I realized that, with only 6 threads doing 318 inserts per second (204KB/s), the slaves were lagging behind the master by an indecent number of seconds.

I was expecting some lag, but not so much. I stopped all activity on the Master and waited for the slaves to recover: it took the slaves 25 minutes to recover/apply 20 seconds of inserts on the master.
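Some back-of-the-envelope arithmetic on those numbers shows how slow the single SQL thread really was. This is a very rough sketch: it ignores, among other things, that the slave was also applying events during the 20-second burst.

```python
# Numbers taken from the test run above.
master_rate = 318                 # inserts per second on the master (6 threads)
burst_s = 20                      # seconds of inserts produced on the master
recovery_s = 25 * 60              # time the slaves needed to catch up

backlog = master_rate * burst_s   # 6360 events to apply
slave_rate = backlog / recovery_s # effective single-threaded apply rate

print(round(slave_rate, 1))                # ~4.2 inserts/s on the slave
print(round(master_rate / slave_rate))     # the master was ~75x faster
```

A master producing changes roughly 75 times faster than the slave can apply them is exactly the pathology described in the rest of this article.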

I was shocked. That was too much, considering that I normally reach 36 threads running at the same time on 5.1, and 144 on 5.5, before starting to see scaling issues and performance degradation (publishing those results is not in the scope of this article).

Anyhow, the issue was in replication, not on the Master. I needed to find a solution for the client, or at least to try before suggesting an architecture review.

Given that re-inventing the wheel is not my "motto", I started to look around for possible solutions, and parallelization seemed the most interesting one.


My first thought went to Tungsten from Continuent, also thanks to the very nice blogs from Giuseppe Maxia:


Giuseppe provides a superb introduction to one of the most interesting (actually the only really usable) solutions currently available, the "Tungsten Replicator".


I was reading the articles and getting more and more excited.
That was exactly what I was looking for, until I read "A Shard, in this context, is the criteria used to split the transactions across channels. By default, it happens by database".


Oops, it will not work for my customer and me, given that his instance is mainly single schema, or at least the majority of the traffic is directed to only one schema.

I pinged Giuseppe and we had a very interesting talk; he confirmed the approach by object/schema, suggesting alternatives for my issue. Unfortunately, given the special case, no sharding or table distribution could be done, so for the moment, and for this specific case, the solution will not fit.

Looking around I saw that almost everyone uses the same approach: parallel replication by OBJECT, where the object is, again and unfortunately, mainly the SCHEMA.

Sharding replication by schema is an approach that could work in some cases but not all.

If you have many schemas, with a limited number of well-distributed Questions per second, I agree this solution is a very good one.

If you have only one schema with a very high level of Questions per second, this will not work anymore. Sure, we can distribute the tables across different schemas, but what about transactions doing inserts/modifications on multiple tables?

We would need to be sure that each action runs in an isolated state, with no relation to the others. In short, it will not work.

Having the possibility to dream, I would like to have parallel replication by MySQL User/Connections.

I know it could look like a crazy idea, but trying to replicate the behavior of the MASTER on the SLAVES starts from the behavior of the user(s), not from the objects inside the server instance.

Trying to move objects from A to B (i.e. from MASTER to SLAVE) is really complicated, because it needs to take into account how people->applications behave on a specific OBJECT or set of them. The status of the OBJECTS is subject to constant changes, most of them at the same time, so concurrency and versioning need to be preserved.

In short, what I was thinking of was a conceptual shift from a REPLICATION approach to a MIMIC approach.

I started my dream by looking at what we already have and how it could be used.

We have a flexible binary log format (version 4), we have a known set of SCHEMAS, USERS, and connections going from users to schemas. We have (at least for InnoDB) good concurrency and versioning mechanisms. MAX_USER_CONNECTIONS can be used to tune the number of threads a specific user can use, and to assign permissions as well.

Setting MAX_USER_CONNECTIONS=30 for USER1@ will help us correctly dimension the effort that replication by user will have to spend checking for consistency.

I am aware of the issues given the interactions and relations between users/transactions, and I see the need for a coordinator that will eventually manage the correct flush of the transactions by epoch.

What I was thinking is that the coordinator on the slave could open up to MAX_USER_CONNECTIONS threads against the master, and use them as flushing threads.

As said before, I was thinking to trust the internal versioning, so engines that do not support transactions are excluded.

The epoch and/or transaction ID would then be used to flush them in the correct order, where the flush is done to the pages in memory, which then use the internal mechanisms for flushing to disk(s) (talking of InnoDB).

It would not be real parallelization, but not a serialization either, given that we trust the internal versioning: we respect the EPOCH for the launch order, without waiting for completion.

On the MASTER, the replicator agent will create X replication channels for each authorized/configured user per SCHEMA; users and permissions must be the same on the SLAVE as well.

The binary log writes could look like:


On the SLAVE, the logs will be copied and then processed by opening 1 to MAX_USER_CONNECTIONS thread(s).
Coordination between inserts will be handled by a replication MUTEX per user, based on the EPOCH.


I know it is quite difficult to realize, given the fact that we will have multiple users writing on different schemas. To have it working correctly, a MUTEX should check the correct process execution at the Relay Log Coordinator, also ensuring a correct execution order based on the EPOCH.

It sounds like another serialization, even if we will not wait for the transactions to complete, but only to start.

It seems that a mixed approach would be a better solution: replication by SHARD used for the DB, and then by USER inside each SHARD. This means archiving actions per schema and per user transaction, then replicating the work on the SLAVE while ensuring the same start order of the transactions for the modifications. This is not as crazy as most may think, and not so far from sharding by schema, except for the multi-threading during the writes.
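The per-user, epoch-ordered launch described above can be sketched as a tiny planner. This is only an illustration of the idea, with all names hypothetical: transactions arrive tagged with (epoch, user), the coordinator launches them in global EPOCH order, and each user's stream is fanned out over up to MAX_USER_CONNECTIONS channels without waiting for completion.

```python
from collections import defaultdict

# Hypothetical sketch: plan the launch order of replicated transactions.
# Launches follow the global EPOCH; each user's statements are spread
# round-robin over that user's allowed channels.
def plan_launch(events, max_user_connections=30):
    channels = defaultdict(list)   # user -> list of (channel_slot, statement)
    launch_order = []
    for epoch, user, stmt in sorted(events, key=lambda e: e[0]):
        slot = len(channels[user]) % max_user_connections
        channels[user].append((slot, stmt))
        launch_order.append((epoch, user))
    return launch_order, dict(channels)

events = [(3, "user1", "UPDATE t1 ..."),
          (1, "user2", "INSERT t2 ..."),
          (2, "user1", "INSERT t1 ...")]
order, chans = plan_launch(events, max_user_connections=2)
print(order)  # [(1, 'user2'), (2, 'user1'), (3, 'user1')]
```

The key property is that starts are ordered by epoch while completions are left to the engine's own versioning, as argued above.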

The main point will be archiving the information correctly in the binary log:



And executing them in the right order as well, also taking the User/Transaction order into account:


Relay logs will contain information divided by SCHEMA, processed using multiple threads per User/Connection, and organized with reference to the epoch of the starting execution order on the MASTER. Different grades of parallel or serialized execution could be set, depending on how the single-schema relay coordinator works.

I know it will not be full parallelization, but it is not serialization either, given the possible parallel execution of N transactions per user.


I think there could be some risk here, so in that case we should be able to:

-       Configure the number of executing threads per user

-       Execute only one thread per user

-       Choose between EPOCH-ordered execution and execution on lock release


A fast/short comment on the new binary log.

From binary log V4 we have the possibility to extend, in a flexible way, what the binary log stores.

The extra headers (from byte 19 to X) are there but not used, while to me that sounds like the most interesting place to store additional information about what happened on the MASTER, and how.

Not only that: a HASH value representing the data on the master after the operation would be of incredible help in checking whether the replicated data on the SLAVE is consistent with the MASTER or not, avoiding having to do that with silly methods like the ones we are forced to use today.
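A hypothetical sketch of that consistency hash: after an operation, hash the affected rows in a deterministic order; the master could store the digest in the unused extra headers, and the slave recompute and compare it after applying the same event. Everything here (function name, row shapes) is invented for illustration.

```python
import hashlib

# Order-independent digest of a set of rows: sort before hashing so master
# and slave get the same value regardless of scan order.
def rows_hash(rows):
    h = hashlib.sha1()
    for row in sorted(rows):
        h.update(repr(row).encode())
    return h.hexdigest()

master_rows = [(1, "a"), (2, "b")]
slave_rows = [(2, "b"), (1, "a")]  # same data, different scan order
print(rows_hash(master_rows) == rows_hash(slave_rows))  # True
```

A mismatch between the stored digest and the recomputed one would flag the exact event after which master and slave diverged.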

Below the binary log V4 format description:

v4 format description event (size >= 91 bytes; the size is 76 + the number of event types):

| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    | = FORMAT_DESCRIPTION_EVENT = 15
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    | >= 91
|        +----------------------------+
|        | next_position    13 : 4    |
|        +----------------------------+
|        | flags            17 : 2    |
| event  | binlog_version   19 : 2    | = 4
| data   +----------------------------+
|        | server_version   21 : 50   |
|        +----------------------------+
|        | create_timestamp 71 : 4    |
|        +----------------------------+
|        | header_length    75 : 1    |
|        +----------------------------+
|        | post-header      76 : n    | = array of n bytes, one byte per event
|        | lengths for all            |   type that the server knows about
|        | event types                |
v4 event header:
| timestamp         0 : 4    |
| type_code         4 : 1    |
| server_id         5 : 4    |
| event_length      9 : 4    |
| next_position    13 : 4    |
| flags            17 : 2    |
| extra_headers    19 : x-19 |
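The common header laid out above (bytes 0 to 18, all integers little-endian) can be decoded with a few lines of Python; a small sketch, using only the offsets and sizes from the table:

```python
import struct

# Parse the 19-byte common header of a v4 binlog event.
# Layout: timestamp(4) type_code(1) server_id(4) event_length(4)
#         next_position(4) flags(2), all little-endian.
def parse_v4_header(buf):
    timestamp, type_code, server_id, event_length, next_position, flags = \
        struct.unpack('<IBIIIH', buf[:19])
    return {'timestamp': timestamp, 'type_code': type_code,
            'server_id': server_id, 'event_length': event_length,
            'next_position': next_position, 'flags': flags}

# Round-trip a synthetic FORMAT_DESCRIPTION_EVENT header (type_code = 15)
hdr = struct.pack('<IBIIIH', 1285752107, 15, 1, 103, 107, 0)
print(parse_v4_header(hdr)['type_code'])  # 15
```

Anything a replicator agent wanted to add (per-user channel, epoch, a consistency hash) would live in the extra_headers region starting at byte 19.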


My favorite mottos are:
 "only the ones who do nothing make no mistakes"


"don't shoot the pianist".

As I said from the beginning, mine is a dream, and I am just proposing an idea, especially because I see that almost everyone was following the same mental path focused on OBJECTS, probably for a good reason; nevertheless I am following my dream.

It could happen that someone would like to see it or try to work on it, or will use part of the idea, revisiting and modifying it to make it possible to implement.

For the moment it is too complex and takes too much time for me to do alone, but it could be a good idea for a plug-in in my next life.



In addition to Giuseppe's blog, which for now seems to me the only one proposing something that could be implemented, I did some research, but in the end the actors are always the same:


MySQL forge




I already received a comment from Giuseppe about how to handle the locks and prevent deadlocks... working on that (I have some ideas and will provide details soon).



Overview of what a binlog is

The binary log contains “events” that describe database changes such as table creation operations or changes to table data.
It also contains events for statements that potentially could have made changes (for example, a DELETE which matched no rows),
unless row-based logging is used.
The binary log also contains information about how long each statement took that updated data.

The binary log has two important purposes:
  • For replication, the binary log is used on master replication servers as a record of the statements to be sent to slave servers.
    The master server sends the events contained in its binary log to its slaves, which execute those events to make the same data
    changes that were made on the master. See Section 16.2, “Replication Implementation”.
  • Certain data recovery operations require use of the binary log. After a backup has been restored, the events in the binary log
    that were recorded after the backup was made are re-executed. These events bring databases up to date from the
    point of the backup. See Section 6.5, “Point-in-Time (Incremental) Recovery Using the Binary Log”.

Running a server with binary logging enabled makes performance slightly slower.
However, the benefits of the binary log in enabling you to set up replication and for
restore operations generally outweigh this minor performance decrement.


Binlog main settings

The binlog is controlled by a few settings in the my.cnf file; let us quickly see them (credits: MySQL manual):

log_bin

Enables binary logging. The option value, if given, is the basename for the log sequence.
The server creates binary log files in sequence by adding a numeric suffix to the basename.
It is recommended that you specify a basename; otherwise, MySQL uses host_name-bin as the basename.

binlog-do-db / binlog-ignore-db

Parameters to include or exclude databases from binary logging.

sync_binlog

If the value of this variable is greater than 0, the MySQL server synchronizes its binary log to disk (using fdatasync()) after every sync_binlog writes to the binary log.
There is one write to the binary log per statement if autocommit is enabled, and one write per transaction otherwise.
The default value of sync_binlog is 0, which does no synchronizing to disk; in this case, the server relies
on the operating system to flush the binary log's contents from time to time as for any other file.
A value of 1 is the safest choice, because in the event of a crash you lose at most one statement or transaction from the binary log.
However, it is also the slowest choice (unless the disk has a battery-backed cache, which makes synchronization very fast).

binlog_cache_size

The size of the cache to hold the SQL statements for the binary log during a transaction.
A binary log cache is allocated for each client if the server supports any transactional storage engines and if the server has the binary log enabled (--log-bin option).
If you often use large, multiple-statement transactions, you can increase this cache size to get better performance.

max_binlog_cache_size

If a multiple-statement transaction requires more than this many bytes of memory, the server generates a "Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage" error.
The minimum value is 4096.

max_binlog_size

If a write to the binary log causes the current log file size to exceed the value of this variable, the server rotates the binary logs (closes the current file and opens the next one).
The minimum value is 4096 bytes. The maximum and default value is 1GB.

expire_logs_days

The number of days for automatic binary log file removal. The default is 0, which means "no automatic removal."
Possible removals happen at startup and when the binary log is flushed.

An example of usage of all of them:

log_bin = /var/mysql/log/binlog51/mysql-bin
relay-log = /var/mysql/log/relaylog/relay

expire_logs_days = 30
max_binlog_size = 500M
log-slave-updates = 1
max_binlog_cache_size = 4294967295
sync_binlog = 1


In this case what we have defined is an automatic purge of the binary logs after 30 days.
30 days is a safe window for the replication to happen, so there should not be any issue here in removing the binary logs.
We are not always so lucky, though, and sometimes we have to clean up our disks to save space.

In that case it is a good idea to set up a procedure that performs checks against the slaves before executing a purge; you can do it yourself
or trust one of the many scripts on the web. A good starting point is Maatkit.

Now let us assume that, for whatever reason, you have purged a binary log from the master, and that the slave was not there yet.

What will happen is that you will receive a very nice message from MySQL replication:
"Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted
(you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log),
a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to
know their names by issuing 'SHOW SLAVE STATUS' on this slave."

Nothing unexpected... you deleted it with the PURGE, so the log is not there anymore.
Assuming you have a backup of the purged binlog (as you ALWAYS should), copy the file(s) back and edit the <binlog>.index file, adding the missing entries previously PURGED.
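The index fix itself is a one-liner; here it is as a small sketch, with hypothetical paths, written as a pure function on the index file's content. Binlog names sort lexicographically, so sorting restores the sequence order.

```python
# Re-add a restored binlog file to the <binlog>.index content, keeping the
# entries in sequence order (binlog names sort lexicographically).
def readd_purged_entry(index_text, restored_path):
    entries = [e for e in index_text.splitlines() if e]
    entries.append(restored_path)
    return "\n".join(sorted(entries)) + "\n"

index = "/var/mysql/log/binlog51/mysql-bin.002672\n"
fixed = readd_purged_entry(index, "/var/mysql/log/binlog51/mysql-bin.002671")
print(fixed.splitlines()[0])  # the restored, older file comes first
```

As the rest of this section shows, editing the file is not enough: the server must also be told to refresh its descriptors.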

Then check if it sounds fine:

mysql> show binary logs;

It will show the added entry.

Well Nice ALL done !!!

Assuming from that we could restart the slave... so let us try.
What we get is something frustrating:

100929 11:21:47 [Note] Slave SQL thread initialized, starting replication
in log 'mysql-bin.002671' at position 1073741674, relay log './xxxxxxxxxxxx-relay-bin.000001' position: 4
100929 11:21:47 [Note] Slave I/O thread: connected to master 'xxxx@xxxx:3306',
replication started in log 'mysql-bin.002671' at position 1073741674
100929 11:21:47 [ERROR] Error reading packet from server: Could not find first log
file name in binary log index file ( server_errno=1236)
100929 11:21:47 [ERROR] Got fatal error 1236: 'Could not find first log file name
in binary log index file' from master when reading data from binary log
100929 11:21:47 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.002671', position 1073741674
100929 11:22:08 [Note] Error reading relay log event: slave SQL thread was killed



Why this happens?

The situation seems a little bit confusing, right?
The MySQL server says that it CAN see the files, but they are not valid for the replication process; so what is going on?

Well, first of all, HOW I FIXED it will help us understand.
Here a comment: I got comments from some MySQL DBAs saying that the way is to RESTART MySQL. True, if you do that it will fix it,
but you DO NOT NEED TO restart MySQL in this case; instead, do the right thing:

mysql> FLUSH LOGS;


Then restart the slave process on the SLAVE:


mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_User: xxxx
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.xxxx
Read_Master_Log_Pos: xxxxx
Relay_Log_File: xxxxxxxxxxxx-relay-bin.xxxx
Relay_Log_Pos: xxxxxx
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Skip_Counter: 0
Exec_Master_Log_Pos: 131915378
Relay_Log_Space: 563985741
Until_Condition: None
Until_Log_Pos: 0
Master_SSL_Allowed: No
Seconds_Behind_Master: 946273
1 row in set (0.00 sec)


WOW, it works... well, this is expected, but I want to spend one second on WHY it works.

What we have to do in these cases is read the source code and, once more, see what is happening there.

Reading the code we see that FLUSH LOGS also calls this function, which instantiates a new MYSQL_LOG object.

The point is all here.

if (log_type == LOG_BIN)
{
  if (!no_auto_events)
  {
    /*
      We log the whole file name for log file as the user may decide
      to change base names at some point.
    */
    THD *thd = current_thd; /* may be 0 if we are reacting to SIGHUP */
    Rotate_log_event r(thd, new_name+dirname_length(new_name), ...);
    ...
    bytes_written += r.data_written;
  }
  /*
    Update needs to be signalled even if there is no rotate event:
    log rotation should give the waiting thread a signal to
    discover EOF and move on to the next log.
  */
  ...
}
...
name=0;   // Don't free name

The function closes the current BINLOG and also the associated <binlog>.index, then ROTATES the binary log,
creating a new one with a new name, and finally reopens a new <binlog>.index.

/* reopen index binlog file, BUG#34582 */
if (!open_index_file(index_file_name, 0))
  open(old_name, save_log_type, new_name_ptr,
       io_cache_type, no_auto_events, max_size, 1);


Lesson Learned

Even if MySQL allows us to make nice changes on the fly, editing and modifying text files like binlog.index,
internally these changes are not considered valid unless the descriptors for these objects are refreshed.
That refresh can be done in different ways, one of them being FLUSH LOGS.

Final Consideration

So far so good, and nothing really new, except that we have done another dig into the MySQL code to understand what it does and how it works.
There would be much more to say about it, and I admit that I am making it simpler than it is, but it was not my intention to write an article
on code digging, only to show how even simple and well-known actions like the one reported here can be misleading if we do not
have a deeper understanding.
This also puts in evidence that it would have been much better to:

  1. refresh the binlog.index automatically when it is modified
  2. not accept/show the modified binlog.index IF there has not been an internal refresh; the current behavior is confusing, misleading and not consistent.

Hope this (simple) issue will be fixed by Oracle in one of the many code reviews to come.



void MYSQL_LOG::new_file(bool need_lock)
{
  char new_name[FN_REFLEN], *new_name_ptr, *old_name;
  enum_log_type save_log_type;

  if (!is_open())
  {
    DBUG_PRINT("info",("log is closed"));
    ...
  }

  if (need_lock)
    ...                              /* acquire LOCK_log / LOCK_index */

  /*
    If binlog is used as tc log, be sure all xids are "unlogged", so that on
    recover we only need to scan one - latest - binlog file for prepared
    xids. As this is expected to be a rare event, a simple wait strategy is
    enough. We're locking LOCK_log to be sure no new Xid_log_event's are
    added to the log (and prepared_xids is not increased), and waiting on
    COND_prep_xids for late threads to catch up.
  */
  if (prepared_xids)
  {
    ...
    while (prepared_xids)
      pthread_cond_wait(&COND_prep_xids, &LOCK_prep_xids);
    ...
  }

  /* Reuse old name if not binlog and not update log */
  new_name_ptr= name;

  /*
    If user hasn't specified an extension, generate a new log name.
    We have to do this here and not in open as we want to store the
    new file name in the current binary log file.
  */
  if (generate_new_name(new_name, name))
    goto end;
  ...

  if (log_type == LOG_BIN)
  {
    if (!no_auto_events)
    {
      /*
        We log the whole file name for log file as the user may decide
        to change base names at some point.
      */
      THD *thd = current_thd; /* may be 0 if we are reacting to SIGHUP */
      Rotate_log_event r(thd, new_name+dirname_length(new_name), ...);
      ...
      bytes_written += r.data_written;
    }
    /*
      Update needs to be signalled even if there is no rotate event:
      log rotation should give the waiting thread a signal to
      discover EOF and move on to the next log.
    */
    ...
  }
  ...
  name=0;   // Don't free name

  /*
    Note that at this point, log_type != LOG_CLOSED (important for is_open()).

    new_file() is only used for rotation (in FLUSH LOGS or because size >
    max_binlog_size or max_relay_log_size).
    If this is a binary log, the Format_description_log_event at the beginning
    of the new file should have created=0 (to distinguish it from the
    Format_description_log_event written at server startup, which should
    trigger temp tables deletion on slaves).
  */

  /* reopen index binlog file, BUG#34582 */
  if (!open_index_file(index_file_name, 0))
    open(old_name, save_log_type, new_name_ptr,
         io_cache_type, no_auto_events, max_size, 1);

  if (need_lock)
    ...                              /* release LOCK_index / LOCK_log */
}





The Binary Log


It is not in the scope of this discussion to provide a detailed description of the binary log, which can be found in the MySQL manual.

What is relevant is that MySQL writes "a" log and reads "a" log, serializing what is happening on the MASTER in a multi-thread scenario. Also relevant is that MySQL has to deal with different behavior in relation to TRANSACTIONAL and NON-TRANSACTIONAL storage engines, while trying to behave consistently.

In order to do that, MySQL has changed its behavior while evolving, so it has changed the way it was, and is, using the TRANSACTION CACHE, also creating no small amount of confusion.


Transaction Cache


What is the Transaction Cache, or TC? It is a "per thread" cache used by MySQL to store the statements that will need to be written to the binary log.

This cache is active ONLY when the binary log is active, otherwise it is not used. And, as its name declares, it is a "Transaction" cache, used to store statements executed in a transaction, which will be consistently flushed to the binary log, serializing the multi-thread execution.

But what if we also have a non-transactional storage engine involved in our operations? Well, here is where the fun begins, and also where the differences in how MySQL acts make our life quite complicated.



Actors involved (who control what)


The TC (Transaction Cache) is currently controlled by a few parameters, and it is important to keep them in mind:


      log-bin

o   this variable enables the binary log; if it is not present, none of the related structures are active;

      binlog_cache_size

o   the size of the cache that holds changes to the binary log during a transaction;

      binlog_stmt_cache_size

o   this variable determines the size of the cache for the binary log that holds non-transactional statements issued during a transaction;

      binlog_direct_non_transactional_updates

o   this variable, when enabled, causes updates to non-transactional tables to be written directly to the binary log, rather than to the transaction cache;

      Binlog_cache_use vs Binlog_cache_disk_use

o   status variables to use for checking the efficiency of the transaction cache;

      Binlog_stmt_cache_use vs Binlog_stmt_cache_disk_use

o   status variables to use for checking the efficiency of the statement cache.



MySQL Transaction cache and MySQL versions


OK, let us start by trying to define some milestones: what is what, in which version, and what works and what doesn't.

binlog_cache_size is present from the start, but behaves differently:

5.1: controls the whole transaction cache.

5.5.3 up to 5.5.8: controls the size of both the statement (non-transactional) cache and the transaction cache.

5.5.9 and newer: controls only the transaction cache dedicated to transactions.


binlog_stmt_cache_size:

5.1: not present.

5.5.9: it controls the size of the transaction cache dedicated to storing the NON-transactional statements present in a transaction.


binlog_direct_non_transactional_updates:

5.1.44: introduced. It is DISABLED by default. If enabled, it allows all statements executed, also AFTER a transaction (we will see later what this means), to be flushed directly to the binary log without going through the TC. It has effect only if binlog_format is set to STATEMENT or MIXED.

5.5.2: introduced.

5.5.5: it was modified to be active ONLY if binlog_format is STATEMENT; in any other case it is ignored.

This feature, regardless of the MySQL version, creates more problems than solutions, and honestly I think it should be removed; please see the bug ( to understand what I mean. In any case it now has a very limited scope of action, and it is better to consider it as not present.



How the whole works


The first point to keep in mind is that the TC is used only during a transaction; the new statement cache, too, is supposed to store only statements issued during a transaction, and it is flushed at the moment of the COMMIT, when it actually goes into the queue for acquiring the LOCK_log mutex.


All other statements go directly to the LOCK_log mutex queue.

That said, let us see how the TC works, what it does and how.


In general terms, when you mix transactional and NON-transactional statements, these rules apply, unless the infamous
binlog_direct_non_transactional_updates is enabled (5.1 only):

1. If the statement is marked as transactional, it is written to the transaction cache.

2. If the statement is not marked as transactional and there are no statements in the transaction cache, the statement is written directly to the binary log.

3. If the statement is not marked as transactional, but there are statements in the transaction cache, the statement is written to the transaction cache.
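The three rules above fit in a tiny decision function; a model for illustration, not MySQL code:

```python
# Decide where a statement is written, given whether it is transactional
# and whether the transaction cache already holds statements.
def destination(is_transactional, cache_has_statements):
    if is_transactional:
        return "transaction cache"   # rule 1
    if not cache_has_statements:
        return "binary log"          # rule 2
    return "transaction cache"       # rule 3

print(destination(True, False))   # transaction cache
print(destination(False, False))  # binary log
print(destination(False, True))   # transaction cache
```

Note how rule 3 means a non-transactional statement can be dragged into the transaction cache just because a transaction is already in flight.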


An interesting point here is what is defined as NON-TRANSACTIONAL or TRANSACTIONAL; from reading the manual and the many references, we have that:


After MySQL 5.1.31

a statement is considered non-transactional if it changes only non-transactional tables;

a statement is transactional if it changes only transactional tables.

Finally a statement that changes both non-transactional and transactional tables is considered “mixed”. Mixed statements, like transactional statements, are cached and logged when the transaction commits.


Before MySQL 5.1.31

a statement was considered non-transactional if the first changes it made were to non-transactional tables;

a statement was considered transactional if the first changes it made were to transactional tables.


In terms of code:

Before MySQL 5.1.31


Transactional (the first command, also if inside a declared transaction):


INSERT INTO innodb_tbl VALUES (a),(b);

INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;


NON-Transactional (the first command, also if inside a declared transaction):


INSERT INTO myisam_tbl VALUES (a),(b);

INSERT INTO innodb_tbl SELECT * FROM innodb_tbl;


After MySQL 5.1.31


Transactional (only transactional tables are changed):

INSERT INTO innodb_tbl VALUES (a),(b);

INSERT INTO innodb_tbl SELECT * FROM innodb_tbl;


NON-Transactional (only non-transactional tables are changed):

INSERT INTO myisam_tbl VALUES (a),(b);

INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;


Mixed (transactional and non-transactional tables involved):

INSERT INTO innodb_tbl VALUES (a),(b);

INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;



A graph could also help to understand what happens between the threads and the flushes of the TC to the binary log.

In the graphs we have three treads running in parallel, remember that inside mysql the main entity is the THD "The THD class defines a thread descriptor. It contains the information pertinent to the thread that is handling the given request. Each client connection is handled by a thread. Each thread has a descriptor object."




The first thread does all its operations using a transactional storage engine (InnoDB);


the second thread uses a non-transactional engine (MyISAM) for the first insert, then a transactional one (InnoDB);


the third thread first does a transactional insert and then uses it for a non-transactional one.


In the first graph I have assumed the 5.1.31-and-later behavior.



The three statements start almost at the same time, and none of them has data in any cache or buffer. That said, it is probable that statement 1 in THD2 will complete first, given it carries less overhead; and since it is the first statement in the THD2 transaction, it is treated as non-transactional and sent directly to the binary log's queue.


THD1 is fully transactional: it performs its internal actions and then flushes to the binary log.


Then THD2 completes the transactional part of its operation and flushes it as well.


Last comes THD3, which first inserted a record into a transactional table and then used it to populate the non-transactional one.




It looks more or less fine, doesn't it? Well, now assume that all three THDs share the InnoDB table and that, as per default, the isolation level is REPEATABLE READ... Do you get it??


Actions on the MASTER will read and set values that are potentially different from the ones on the SLAVE, given the different order of execution.


Imagine having updates instead of inserts, and/or WHERE conditions on values. Results could differ depending on the position; and remember, we are still talking about STATEMENT replication, given that it is still the default ALSO in 5.5.


Finally, remember that the issue is NOT that THD1/2/3 could potentially change each other's values, but the fact that they do something on the MASTER which could potentially end up different on the SLAVE.


That's it: the changes introduced in MySQL were done to reduce this wrong behavior, or in some way to try to keep consistency.


Let's see what and how in the other graphs.



What about using binlog_direct_non_transactional_updates: will it help?


I admit that I have exaggerated a little bit in this example, but my intention was to show how dangerous this option can be.



THD2's statement 1, as usual, will be the first one; then THD1 consistently applies its whole set. But then, given that binlog_direct_non_transactional_updates is set, it can happen that THD3 is faster in flushing its non-transactional statement, writing to the binary log before THD2, adding fragmentation and breaking the order in the binary log.


As said, I have exaggerated a little, but this can realistically happen in a standard context.

So, in addition to the already reported bug(s) (see the reference section), this option for me is not really an additional help, but an issue generator.
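For reference, this is how the option would be enabled in a session; this is a sketch, the table names are illustrative, and the comments reflect my reading of the behavior described above:

```sql
-- Sketch: enabling the (risky) option discussed above.
SET SESSION binlog_direct_non_transactional_updates = 1;

BEGIN;
INSERT INTO innodb_tbl VALUES (1);   -- transactional: goes to the TC
INSERT INTO myisam_tbl VALUES (1);   -- with the option ON: written directly
                                     -- to the binary log, ahead of the
                                     -- still-uncommitted InnoDB change
COMMIT;                              -- the TC is flushed only now
```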



Transaction Cache in MySQL 5.5


From MySQL 5.5.4 the TC is divided in two parts: one controls/stores the transactional statements, the other stores ONLY the non-transactional statements.


What does it mean? In theory it means that any statement which is supposed to be part of a transaction but relates to a non-transactional table (like a statement at the start of a transaction), instead of going directly to the binary log, should be moved to the statement TC, and then flushed when the TC is flushed.


The involvement of binlog_direct_non_transactional_updates complicates the whole scenario a little and, in addition, it also creates some major issues in the correct flush, so I am ignoring it here.
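A minimal sketch of that behavior, assuming the 5.5.4+ dual-cache mechanism described above (table names are illustrative):

```sql
BEGIN;
INSERT INTO myisam_tbl VALUES (1);   -- non-transactional but inside a
                                     -- transaction: goes to the statement TC
                                     -- instead of straight to the binary log
INSERT INTO innodb_tbl VALUES (1);   -- transactional: goes to the regular TC
COMMIT;                              -- both caches are flushed together, so
                                     -- master and slave see the same order
```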


Let us see what happens using the new TC mechanism:


THD1 is, as usual, consistent in the way it flushes to the binary log. Arbitrarily, I have decided that this time THD2 takes a little bit more time, say a millisecond, and that is enough for THD1 to acquire the LOCK_log before it.


The scope, as you have probably guessed, was to make the graph more ORDERED, given that this time having one or the other flush before or after will not really make any difference.


This is because the statement TC is flushed (at least, this is what I have seen and understood) when the TC is flushed.

The result is that the flush takes place in the same order on the master and on the slave.


Finally, THD3, even if it has the possibility to use the statement TC, will not, given the fact that its statement using a non-transactional storage engine is NOT the first statement of the transaction; so the mixed transaction is placed in the TC as expected.





There was not so much to say from the beginning, to be honest: the TC and the statement TC are useful only in the cases where we do mixed transactions. The only significant changes are the before/after 5.1.31 behavior and the introduction of the statement TC. All the other attempts to add flexibility to this area have, in my view, increased bad behaviors.


It is easy for master and slave to store or modify data in an unexpected way, an issue that any DBA probably faces too often in MySQL.

I am not saying that all those cases trace back to this specific issue, but for sure it is one of the less known and more difficult to track, because many things influence how MySQL flushes to the binary log.


The golden rule is: do not mix non-transactional and transactional engines. But if you really have to do it, be sure to do it using 5.5.9 and above, which at least gives you more assurance that the commands are flushed to the binary log consistently.


Last notes


Remember that any non-transactional statement in the TC will be flushed on COMMIT or ROLLBACK, and that in the latter case a WARNING message should be written in the MySQL error log.
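A minimal illustration of the ROLLBACK case (a sketch; table names are illustrative, and the exact warning text may vary by version):

```sql
BEGIN;
INSERT INTO innodb_tbl VALUES (1);   -- cached in the TC
INSERT INTO myisam_tbl VALUES (1);   -- cached too (rule 3), but MyISAM
                                     -- changes cannot be undone
ROLLBACK;                            -- the InnoDB change is undone; the
                                     -- MyISAM row stays, and a warning like
                                     -- "Some non-transactional changed tables
                                     -- couldn't be rolled back" is raised
SHOW WARNINGS;
```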


Note that I will post later (when I have the time) some code and related raw data for documentation purposes.




I have a problem. It's been months since I used MySQL and (I believe) I had set it up with a root password. Now I can't log on to MySQL as the root MySQL user to create a new user or manage an existing one (I can log onto the server)...

Familiar situation. :)

Do this:

 service mysql stop

Wait until MySQL shuts down, then run:

mysqld_safe --skip-grant-tables &

Then you will be able to log in as root with no password:

mysql -uroot mysql 

At the MySQL command-line prompt, issue the following commands:

UPDATE user SET password=PASSWORD("abcd") WHERE user="root";
FLUSH PRIVILEGES;


At this point your root password is reset to "abcd". Restart MySQL normally (without --skip-grant-tables) so that the privilege tables are enforced again, and you will be able to log in with your new password:

mysql -uroot -pabcd mysql
