MySQL Cloning: more thoughts

I posted a few days ago some initial thoughts on the MySQL native cloning functionality. Overall this looks good and I need to spend time testing further.

I’m here in San Francisco ahead of Oracle Open World which starts today. As is usual with trips like this, jet lag wakes you up rather early, so I thought I would add some further thoughts on the topic which may also be of interest.

Version compatibility: The current clone process requires the source and destination to run identical versions. That is quite a restriction, especially as since early 8.0 MySQL has been able to auto-upgrade from 5.7 on startup. This new upgrade process works very well and makes instance management easier. Thanks, Oracle! 8.0.17 is the current GA release as I write and MySQL 8.0.18 will come out soon. When 8.0.18 is released I would really like to be able to install the 8.0.18 binaries and clone from 8.0.17, letting 8.0.18 perform any upgrade tasks that are needed. Clearly there may be some extra restrictions, but if the source knows how to clone, why shouldn’t I be able to do this? The reason: it simplifies server management. There would be no need to upgrade the source box first, and I could then safely test the new version much more easily and quickly.

Support for other engines: Unless I’m mistaken, the current process only works for a system running InnoDB. Consider making it possible to clone when other engines are involved. Again, I expect there may be some restrictions (a transactional engine? a requirement to take snapshots? etc.), but my idea here is that this should be a generic mechanism for cloning, not one tied to InnoDB only.

Cloning of multiple instances at once: Is this possible? The documentation is not clear. I suspect that right now only a single instance can be copied at a time; if that is the case it should be stated explicitly. Why does this matter? Initial testing of the cloning process makes it look quite fast. I need to do further testing but I have seen a 3x speed improvement versus the traditional methods I’ve been using until now. This seems to be related to doing parallel copies of data between the servers, and the end result is faster copying. That leads me to imagine a clone service: something that looks and behaves like a MySQL server and is able to service many clone requests in parallel. Clearly it is much too early to consider this too seriously, but in the systems I manage multiple copies are often being cloned at once, to set up a new cluster or to increase cluster capacity. Doing a single clone is good. Having a way to do multiple clones would be interesting, or I might want to build something which provides such a service.

Configuration: 8.0 provides persistent configuration. I have not used this much as most of the systems I manage still run 5.7, or at least the master is on 5.7 even if the slaves run 8.0. There is some confusion about the best way to configure MySQL in 8.0: persistent settings are encouraged, and in many ways I see us moving to having most settings stored there, with a few initial settings staying in the my.cnf file. One thing I considered is that if you use persistent settings they may also be copied to the clone target server[*]. Some settings are “environment specific”, such as the buffer pool size, max_connections, or the number of I/O threads to use, and often depend on the specific location of the server. If these were stored as persistent settings, copying them to a new server would lead the new server to pick them up, which might prevent instance startup as they are “wrong” for the new destination host. It might make sense to have a filter list of settings you do not want to copy over to prevent this. This is something I have not actually checked, but it could potentially be an issue, especially when cloning a larger instance to a smaller test server with less RAM. We will see.
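As a reminder of how the persistent settings mechanism works (the variable names and values here are purely illustrative):

```sql
-- Apply now and also persist to mysqld-auto.cnf so it survives a restart
SET PERSIST max_connections = 500;

-- Persist only (useful for read-only variables picked up at next startup)
SET PERSIST_ONLY innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;

-- Remove a persisted setting again
RESET PERSIST max_connections;

-- See what has been persisted
SELECT * FROM performance_schema.persisted_variables;
```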

Update 08/Oct/2019: Oracle confirms that persistent settings are not copied to the target when cloning, meaning they need to be configured again on the new instance. I think that an option to copy them over, an option to filter out certain settings, and an option to provide replacement values should all be part of the clone process to make cloning smoother.

clone_buffer_size: this new setting seems fine, though the default (4MB) is quite small. We also have max_allowed_packet, and I suspect that a clone_buffer_size higher than max_allowed_packet might not work. If that’s the case it should be documented. Indeed, a quick test shows I get the following error: ERROR 1153 (08S01) at line 1: Got a packet bigger than ‘max_allowed_packet’ bytes. So this needs fixing: in the documentation, but also by adding a check to prevent me from setting clone_buffer_size to a value close to or greater than max_allowed_packet.
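The quick test amounted to something like this (the exact figure is illustrative; in 8.0 the defaults are clone_buffer_size = 4MB and max_allowed_packet = 64MB):

```sql
-- Push the clone buffer well above max_allowed_packet
SET GLOBAL clone_buffer_size = 128 * 1024 * 1024;

-- A subsequent CLONE INSTANCE then fails with:
--   ERROR 1153 (08S01): Got a packet bigger than 'max_allowed_packet' bytes
```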

Clone protocol specification: This new process is triggered by a simple CLONE INSTANCE FROM command, which initiates an interaction between the source server and the server which needs to be cloned. There is logging on both sides of the connection, but what exactly happens? To some extent this does not matter too much, but the exact behaviour between the servers is not documented in any way that ensures compatibility with later changes or potential enhancements from others.

I would like to see a clone instance protocol specification that describes the interaction over the wire between both servers:

  • It would state the steps that are needed to perform the action.
  • It would state any specific restrictions that need to be taken into account to ensure the process works properly.
  • It would provide a way to enhance the process, should that be necessary in the future, in a backwards-compatible way.

Why is this needed? Enhancements to the process may make incompatible changes in behaviour, and in the past I have seen that this can make it very hard to figure out what those changes are, as the information is implicit in the code rather than explicit. From a coder’s point of view, having a specification to follow is much better than trying to reverse engineer existing code. So while I failed to convince Oracle to document the MySQL X protocol more formally, perhaps something which is new and has a much tighter scope might be possible? Who knows.

I look forward to getting a chance to talk with the engineers about these and several other topics, and to seeing the presentations in the MySQL track here at Open World. Maybe I’ll see you there.

MySQL 8.0.17+: Cloning is now much easier

If you use replication with MySQL, need a backup, or need a spare copy of a system for testing, then for these and many other reasons you need a way to make a copy of your MySQL system. In the past you could do this in various ways:

  • using a cold file system copy with MySQL shut down first
  • using MySQL Enterprise Backup, Percona’s Xtrabackup, or MariaBackup (an Xtrabackup fork). These work against an online, running server, but they require special binaries on the source in addition to the traditional mysqld.
  • copying the LUN to a new server

There are many ways to do this, they all work, and we have been doing this for so long that it feels normal; but clearly a mysqld-to-mysqld copy using the normal mysql port and protocol simplifies things considerably. That is what is now available in 8.0.17 and later with the new clone plugin.

So I wanted to see how easy it was to set this up and how it worked. One easy way to test on your laptop is to use dbdeployer, as it allows you to spin up one or more instances quickly and is great for all sorts of testing. I’d almost like to see it used for normal MySQL provisioning and management, but that’s a topic for a different blog post.

So when I found out about the new cloning method I wrote a quick script to see what effort was involved in setting up and cloning between servers. Take a look at the script to see what’s needed to set things up, and at the sample output when this happens. Thanks go to Giuseppe Maxia for showing how to easily achieve such a setup with dbdeployer.

Summary: once configured you can clone a server with a single SQL command. That is certainly much simpler than the methods that have been available until now.
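For illustration, the configured flow looks roughly like this (hostnames and credentials are placeholders; see the manual for the exact privileges needed):

```sql
-- On the donor (source) server:
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
CREATE USER 'clone_user'@'%' IDENTIFIED BY 'clone_password';
GRANT BACKUP_ADMIN ON *.* TO 'clone_user'@'%';

-- On the recipient (destination) server:
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
SET GLOBAL clone_valid_donor_list = 'source.example.com:3306';
CLONE INSTANCE FROM 'clone_user'@'source.example.com':3306
  IDENTIFIED BY 'clone_password';
```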

So are there any gotchas? I found a few things:

  • bug 96281 – Can’t initialize function ‘clone’; Plugin initialization function failed (broken if sql_require_primary_key = 1), appears to be resolved in 8.0.18
  • bug 96302 – support CIDR notation for clone-valid-donor-list (though I’m not convinced this setting is actually useful)
  • bug 96305 – make the clone port optional and default to 3306 (default mysql port)
  • bug 96837 – documentation is a bit confusing

None of these are critical. The main concern I see is the backup_lock that gets taken on the source, which prevents certain operations from continuing normally. If cloning is to be used to copy from a primary master I see that as problematic.

As for how fast this works compared to other methods I’ll leave that for a later post.

Reminder: Madrid MySQL User Group Meetup presenting dbdeployer on Thursday in Madrid

This is a quick reminder that on Thursday (tomorrow) there is a Madrid MySQL Users Group Meeting where Giuseppe Maxia will be presenting dbdeployer to us.

More information can be found here.

Do not forget to sign up if you are interested. We look forward to seeing you there.

Reminder: MySQL User Group NL and Madrid MySQL User Group Meetups presenting Scaling MySQL this week (Monday/Thursday)

This is a quick reminder that this week there are 2 MySQL User Group Meetings (Amsterdam and Madrid) where we will be talking about Scaling MySQL.

Guest speaker will be Morgan Tocker from PingCap who will talk about TiDB.

More information:

Do not forget to sign up if you are interested. We look forward to seeing you there.

2018-11-15: Announcing: Scaling MySQL with TiDB, Vitess and MySQL Cluster at Madrid MySQL Users Group

[ English ] – Spanish text below

We’re pleased to announce the next Madrid MySQL Users Group meetup which will take place on the 15th of November at 19:00.  Sign up details can be found here.  There’ll also be a similar meetup in Amsterdam on the 12th November hosted by a colleague. Details here.

Want to know how to scale MySQL when your data won’t fit in one server?


Morgan Tocker: “TiDB: Distributed, horizontally scalable, MySQL compatible”

TiDB is an open source distributed scalable hybrid transactional and analytical processing (HTAP) database built by PingCAP. It also speaks the MySQL protocol and uses RocksDB as the underlying storage engine!

This talk provides an introduction and overview of TiDB, tailored for those with a strong background in MySQL. I will use MySQL as an example to explain various implementation details of TiDB, and translate terminology to MySQL/InnoDB terms.

Morgan Tocker is a Senior Product and Community Manager at PingCAP, the company behind TiDB. He was previously the Product Manager for the MySQL Server at Oracle. His presentation will be in English.


Simon Mudd: “Vitess and MySQL Cluster”

Simon will be talking about Vitess and MySQL Cluster, two alternative approaches to scaling MySQL, and the perspective of these products as seen by

* Vitess:
* MySQL cluster:

Note: security requirements require us to ask you to provide your full name and ID/passport number prior to attending the meeting. This will be requested when you register your attendance.

[ Spanish ] – English text above

We’re pleased to announce the next Madrid MySQL Users Group meetup, which will take place on the 15th of November at 19:00. Sign-up details can be found here. There will be a meetup in Amsterdam on the preceding Monday, 2018-11-12. Details here.

Want to know how to scale MySQL when your data no longer fits in one server?

Morgan Tocker: “TiDB: Distributed, horizontally scalable, MySQL compatible”

TiDB is an open-source distributed database built by PingCAP. It uses the MySQL protocol to talk to clients and uses RocksDB as its storage engine.

This talk gives an introduction and overview of TiDB, aimed at those with a good grounding in MySQL. I will use MySQL as an example to explain various implementation details of TiDB and translate terminology to the MySQL/InnoDB equivalents.

Morgan’s presentation will be in English.


Simon Mudd: “Vitess and MySQL Cluster”

Simon will talk about Vitess and MySQL Cluster, two alternative approaches to scaling MySQL, and the perspective of these products as seen by

* Vitess:
* MySQL cluster:

Note: for security reasons each attendee’s full name and ID/passport number must be confirmed before the start of the meeting, via the question sent to you when you sign up.

MMUG19: Multi-instances with Galera Cluster and MySQL 8.0 on 25th April in Madrid

The Madrid MySQL Users Group has been a bit quiet recently, so we thought it would be good to have a meeting again. The next one will take place on Wednesday, 25th April, and will cover Galera Cluster (presentation by Emanuele Mottola). We will also talk about MySQL 8.0 and any latest updates that may have been announced. Meetup details and location can be found here. If you’re not attending Percona Live 2018 in sunny Santa Clara and are in Madrid, come and join us.

Barcelona MySQL Users Group Meetup on 5th July

If you’re in Barcelona next week you may be interested in the MySQL Meetup being held there by the Barcelona MySQL Meetup group on Wednesday 7th July at 7pm. I’ll be giving a talk on MySQL Failover and Orchestration, and there will be an opportunity to talk about MySQL and related topics afterwards.

More information can be found on their web page.  I look forward to seeing you there.

Thoughts on setting up Orchestrator in a Production environment

orchestrator logo


There have been several posts on setting up and using orchestrator. Most of these are quite simple and do not discuss in detail some of the choices you may want to consider when setting up orchestrator in a real production environment. As I have been using orchestrator for some time, I thought it would be good to discuss how a real production orchestrator setup might be achieved. So here are some thoughts which I would like to share.


The basics of setting up orchestrator are to install the orchestrator app and configure it to be able to write to a backend MySQL database.

Configuration requires telling orchestrator how to find the MySQL instances you want to monitor and perhaps to forget old servers that are no longer being used. For a small setup you may be happy to do this by hand but adding automation hooks to the provisioning or decommissioning process of MySQL hosts can come in handy. You have the choice of using the command line:

or using the http interface and URLs like

according to which method is easiest to setup. Note: discovery of new servers in an existing replication chain should not be necessary as orchestrator will normally be able to figure this out on its own.
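As a sketch of both approaches (hostnames and the base URL are placeholders, and the exact flags may vary between orchestrator versions):

```
# Command line: tell orchestrator about an instance, or forget one
orchestrator -c discover -i db1.example.com:3306
orchestrator -c forget -i old-db.example.com:3306

# HTTP equivalents
curl http://orchestrator.example.com:3000/api/discover/db1.example.com/3306
curl http://orchestrator.example.com:3000/api/forget/old-db.example.com/3306
```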

Handling Master or Intermediate master failover

Failover behaviour also needs to be configured. While orchestrator is able to adjust the replication topology when a master or intermediate master fails, sometimes (and more so for a primary master) additional external tasks may be needed to ensure the completion of the failover process. These hooks are also useful for notifying the appropriate people or systems before and after the failover is handled.

This is handled in orchestrator.conf.json with the following settings:

which will run scripts on the active orchestrator node to achieve the desired configuration changes. You can use these hooks to do tasks such as:

  • notify people or systems of the issue that’s been seen
  • change the configuration of external systems which need to be aware of a master or intermediate master failure
  • tell the applications about the failure and where to find the new master

All of these tasks will be specific to your environment but there’s plenty of freedom here to hook orchestrator in even if it is not directly aware of “the outside world”.
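To make the earlier point concrete, the hook settings in orchestrator.conf.json might look roughly like this (the script paths are placeholders of mine; the {curly} tokens are orchestrator’s own substitution variables):

```json
{
  "OnFailureDetectionProcesses": [
    "/usr/local/bin/notify-failure-detected '{failureType}' '{failedHost}:{failedPort}'"
  ],
  "PreFailoverProcesses": [
    "/usr/local/bin/pre-failover-checks '{failureClusterAlias}'"
  ],
  "PostMasterFailoverProcesses": [
    "/usr/local/bin/point-apps-at-new-master '{successorHost}:{successorPort}'"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "/usr/local/bin/notify-im-failover '{failedHost}:{failedPort}'"
  ]
}
```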

Selection of servers eligible to be [intermediate] masters

You may have special servers, such as those used for testing, or located in a different part of your network, which you do not want promoted to master or intermediate master. Orchestrator allows you to indicate this with settings such as
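As an example of the kind of setting involved (the hostname patterns are placeholders):

```json
{
  "PromotionIgnoreHostnameFilters": ["test", "dr-site"]
}
```

Any server whose hostname matches one of these patterns will not be considered for promotion.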

This works pretty well and covers almost all cases where you need to handle special cases for one or more reasons.

Failover PromotionRules

For larger setups, where there are more servers in the cluster, you may prefer orchestrator to fail over to one or more specific servers, and there are some promotion rules you can apply to adjust the priority of which servers are preferred as candidates when a failure occurs.

Currently this is configured on a per-MySQL-instance basis, giving each instance one of the rules Prefer, Neutral (the default) or Must Not. (The code has two other options, Must and Prefer Not, but these are not implemented.)

Configuration can be done via the command line via:

though here the configured default promotion rule (Prefer) is used. You can also use the http interface, where you can explicitly state the required promotion rule using:

It is also possible to pull out the promotion rules as a bulk operation using the url:

This is convenient if you want to configure this centrally rather than individually on each MySQL instance.
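Sketching the three approaches just described (host, port and base URL are placeholders; the endpoint names are as I remember them and may differ between versions):

```
# CLI: register an instance as a promotion candidate
# (the configured default promotion rule, Prefer, is used)
orchestrator -c register-candidate -i db1.example.com:3306

# HTTP: state the required promotion rule explicitly
curl http://orchestrator.example.com:3000/api/register-candidate/db1.example.com/3306/prefer

# HTTP: pull out all promotion rules as a bulk operation
curl http://orchestrator.example.com:3000/api/bulk-promotion-rules
```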

High Availability Setup

If you really care about your MySQL servers not failing you probably also care about orchestrator itself not failing, so what can be done to make this service more reliable?

Orchestrator itself comprises two parts: the orchestrator application and the MySQL backend it writes to.

As far as the orchestrator app is concerned it is easy to configure more than one server. All apps use the same configuration and talk to the same MySQL backend database. They co-operate by writing to a common table in the backend database and electing a leader (or active node) which actively polls all known MySQL instances. The other nodes are running but doing nothing. Should the elected leader stop working, another app will be chosen and will take over the job of checking all MySQL instances. So setting up more than one app is very straightforward, and it is usually good to set up orchestrator app servers in the same locations or datacentres where your MySQL servers are running.

Once you have more than one orchestrator app running it is convenient to use some sort of load balancing technology to make orchestrator visible via a single URL. This works quite nicely, as normal usage of the GUI can go to any of the orchestrator nodes, even if the active monitoring only takes place on one of them. This is where it may be convenient to add an authentication and https layer, neither of which is handled directly by orchestrator but both of which can easily be added using something like nginx.
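A minimal sketch of such a front end using nginx (hostnames, ports, certificate paths and the htpasswd file are placeholders):

```nginx
upstream orchestrator {
    server orc1.example.com:3000;
    server orc2.example.com:3000;
    server orc3.example.com:3000;
}

server {
    listen 443 ssl;
    server_name orchestrator.example.com;
    ssl_certificate     /etc/nginx/ssl/orchestrator.crt;
    ssl_certificate_key /etc/nginx/ssl/orchestrator.key;

    location / {
        auth_basic "orchestrator";
        auth_basic_user_file /etc/nginx/htpasswd;
        proxy_pass http://orchestrator;
    }
}
```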


The web status page is very convenient as it shows you the apps which are running, their versions, and which node is the active node. You can see an example below from some testing servers I use:

orchestrator web status

Orchestrator’s handling of the backend MySQL server going away perhaps deserves a comment. Orchestrator has a backend database and expects it to be there, so configuring a single MySQL server as orchestrator’s backend is probably not ideal. Standard MySQL replication will give you a spare, and I think that in practice this is good enough for most cases.

If the “orchestrator db” master fails it is unlikely that orchestrator will be able to fix this. The paranoid may like to consider using something like Galera, MySQL Cluster or even the new MySQL Group Replication (and InnoDB Cluster when it is released), but all that orchestrator really cares about is being able to write to a backend database so it can store state and use that state later. Additional auditing, logging and history information is kept, but none of this is critical, and write rates on the backend are generally low unless the number of instances you monitor is very high. So you can either point the orchestrator configuration at a different MySQL host, or have it use a virtual IP or DNS CNAME, which gives you the flexibility to make quick changes without needing to touch the orchestrator configuration itself.

While I use standard MySQL replication to provide a spare backend, I also keep a record of the monitored MySQL instances (host:port), so even after some completely strange broken situation I can feed this information into orchestrator via the discovery interface against an empty backend and have orchestrator working again in a few seconds. A convenient URL is designed to simplify this task.

So all in all the HA setup is quite easy to get going, and the good thing is that it then becomes easy to upgrade any of the nodes just by stopping it, adjusting the binaries and restarting, without having to worry about the “MySQL failover service” not being available.

People may wonder why this matters so much. If your setup is small then the chances of a master or intermediate master failing are also quite low. As your environment grows, so does the chance of a failure occurring. I see failures, sometimes more than once a day, and prefer orchestrator to be running so that I do not have to deal with these failures manually.

Monitoring Orchestrator

What’s required to monitor orchestrator? Basically you want to check that the orchestrator process is working and that the http web interface responds (especially if you run several app servers) on each of the boxes individually.

Orchestrator itself also supports graphite and can provide information on internal activity such as the number of successful or failed discovery processes (polling of MySQL servers) and the read and write activity to the backend MySQL store. However, if you’re not using graphite this is trickier.

I have made some code changes to provide further, more detailed metrics on the time taken to poll and check each of the monitored MySQL servers, as I had experienced some load issues due to the number of servers being monitored, and these timing metrics helped identify where to focus to fix this. The metrics are available via a raw http API call and, for simplicity, aggregate values can be retrieved for the last few seconds. This makes tying them into any external monitoring system much easier.

Some of these patches have been contributed back upstream on github and further patches should arrive shortly. Adding these metrics allowed me to identify bottlenecks in orchestrator when monitoring a large number of systems, and together with colleagues I have fed performance enhancements for this sort of situation back upstream.


I hope that this article helps provide a bit more insight into what might be worth thinking about when setting up orchestrator for the first time in a production environment. Feel free to contact me if more detail is needed or something is not clear enough.

Where I work and why…

I rarely talk explicitly about where I work. However, I do enjoy it. We are lucky to have just completed our wonderful annual event where we all come together and share some time, not only with direct colleagues, but also with colleagues from other offices we see much less frequently. This video in many ways represents what brings us together and what makes this all so special. I hope you enjoy it.

MMUG16: MySQL document store: SQL and NoSQL united

The Madrid MySQL Users Group has its next meeting on Tuesday, 22nd November 2016.  Giuseppe Maxia will be giving a presentation MySQL document store: SQL and NoSQL united and I’ll be providing a brief summary of the new MySQL 8.0 and MariaDB 10.2 beta versions which were announced recently. There will also be an opportunity to discuss topics related to MySQL. Hope to see you there.

More information can be found at:

The Madrid MySQL Users Group will have its next meeting on Tuesday, 22nd November 2016. Giuseppe Maxia will give a presentation, MySQL document store: SQL and NoSQL united, and I will give a brief summary of the new MySQL 8.0 and MariaDB 10.2 beta versions which were announced recently. There will be an opportunity to discuss other topics related to MySQL or MariaDB. I hope to see you there.

More information can be found at:

OOW16 talk – MySQL X protocol – Talking to MySQL directly over the Wire

Oracle Open World 2016 has just finished in San Francisco and we are now about to embark on Percona Live Europe in Amsterdam.

I gave a presentation in San Francisco on the MySQL X protocol, the new protocol that Oracle is using to make the DocumentStore work. This new protocol also allows you to send normal SQL queries, and it looks like Oracle has plans to use it in more scenarios.

If you’re interested in protocols and how things work you can find my presentation here:


If I didn’t see you in San Francisco, hopefully we’ll get a chance to catch up in Amsterdam.

2016-10-02 I’ll also be doing the same talk in Percona Live Amsterdam on Wednesday 5th October. You can see the schedule here. Other colleagues also have presentations which may be of interest.

The differences between IDEMPOTENT and AUTO-REPAIR mode

I posted recently in Lossless RBR for MySQL 8.0 about a concern I have regarding moving to minimal RBR in MySQL 8.0. This seems to be the direction Oracle is considering, but I am not sure it is a good idea as a default setting.

I talked about a hypothetical new replication mode lossless RBR and also about recovery after a crash where perhaps the data on the slave may get out of sync with the master. Under normal circumstances this should not happen but in the real world sometimes it does.

Note: I’m talking about an environment that does not use GTID.  GTID is good but may have its own issues and it’s probably best to leave those discussions to another post.

So let us talk about the difference between IDEMPOTENT mode (slave_exec_mode=IDEMPOTENT) and what I’ll call AUTO-REPAIR mode, mentioned in feature request bug#54250 filed with Oracle in 2010. By default the DBA wants to avoid any data corruption, so that should be the default behaviour: I’d prefer auto-repair mode to be off by default, stopping replication if any inconsistencies are found. I could then enable it when I see such an issue, as it should help me recover the state of the database without adding further “corruption” to the slave.

If I’m confident that this procedure works fine and I’m monitoring the counters mentioned below, then it may be fine to leave it enabled all the time.

Consider the scenario: a slave fails, perhaps crashing, and then recovers. It’s likely that the replication position it “remembers” is behind the actual state of the database.

If we use full RBR (the default setting) in these circumstances then we may receive a set of changes which the SQL thread tries to apply.

They’ll be in the form of:

before row image / after row image

before row image / after row image

where each row image is the set of column values prior to and after the row changes. Traditionally we use the abbreviations BI (before image) and AI (after image) for these.

Currently the SQL thread will start up, look for the first row to change and, once it has found it, change it. If the exact matching conditions it needs cannot be found then an error is generated and replication stops.

IDEMPOTENT mode attempts to address this and tries to “continue whatever the cost”. To be honest I’m not exactly sure what it does, but it’s clear that it will either do nothing or perhaps try to find the row by primary key and update that row. I’d expect it probably does nothing.

See the comment later on. I did go and check, and the documentation for slave_exec_mode says that it suppresses duplicate-key and key-not-found errors. There is no mention of updates where the full AI is unavailable (e.g. when using minimal RBR).

It also looks like it does not “repair” the issue; it simply ignores it. The documentation is not 100% clear to me.

I made a comment about different options for AUTO-REPAIR mode and when it can work and when it cannot. In FULL RBR mode it should always be able to do something; in MINIMAL RBR mode there will be cases where it cannot. Let’s look at the FULL RBR case:

  1. For an UPDATE when the requested row can not be found:
    • auto-repair mode would INSERT the row. You have a full AI so you can do this safely.
    • A counter should be updated to record this action.
  2. For a DELETE row operation when the row can not be found:
    • auto-repair mode would ignore the error; given the row does not exist anyway, the effect of the DELETE has already been achieved.
    • A counter should be updated to record this action.
  3. For an INSERT row operation when the row already exists (a duplicate-key insert, which is what generally breaks replication):
    • auto-repair mode would treat this as an UPDATE operation (based on the table’s primary key) and ensure the row is changed to have the same primary key and the columns of the AI.
    • Again, a counter should be updated to record this action.
In FULL RBR mode these 3 actions should allow replication to continue. The database is no more corrupt than it was before; in fact it’s in a somewhat better state.
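The three rules can be sketched as code. This is purely my illustration of the proposed behaviour, with invented names and a dict standing in for a table; it is not how MySQL implements anything today:

```python
# Hypothetical sketch of AUTO-REPAIR for full RBR: each event carries a
# full after image (AI), "table" is a dict keyed by primary key, and each
# repair action increments a dedicated counter.

repair_counters = {"update_as_insert": 0, "delete_ignored": 0, "insert_as_update": 0}

def apply_event(table, op, pk, after=None):
    if op == "UPDATE":
        if pk not in table:
            repair_counters["update_as_insert"] += 1  # full AI: safe to INSERT
        table[pk] = after
    elif op == "DELETE":
        if pk not in table:
            repair_counters["delete_ignored"] += 1    # effect already achieved
        else:
            del table[pk]
    elif op == "INSERT":
        if pk in table:
            repair_counters["insert_as_update"] += 1  # duplicate key: treat as UPDATE
        table[pk] = after

table = {1: {"id": 1, "v": "a"}}
apply_event(table, "UPDATE", 2, after={"id": 2, "v": "b"})  # row missing: insert it
apply_event(table, "DELETE", 3)                             # row missing: ignore
apply_event(table, "INSERT", 1, after={"id": 1, "v": "c"})  # duplicate: update
```

In all three cases replication continues and the counters record how often each repair was needed.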

In many cases other row events will proceed as expected without issue: INSERTs will happen, UPDATEs and DELETEs to existing rows will work as the row is found, and things will proceed as normal.

So should we get into a situation like this, we can check the 3 counters, which gives us a clue as to the number of “repair actions” MySQL has had to execute. It also gives us an idea of how inconsistent the slave was, though those inconsistencies should now have been removed.

As I said, I can’t remember exactly what IDEMPOTENT mode does in these 3 circumstances. It may do something similar to my AUTO-REPAIR mode or it may just skip the errors.

Why don’t I know? Well, I’m currently on a plane, the mysql documentation is not provided with my mysql server software, and I’m not online so I can’t check. I used to find the info file or a pdf of the manual quite helpful in such situations and would love to see it brought back so I don’t need to speculate about what the documentation says.

Yes, I could update this text when I’m back online, but I think I’ll make the point and leave this paragraph here.

So with FULL RBR the situation seems clear to me. IDEMPOTENT mode may not do the same thing as AUTO-REPAIR mode, and whether it does or not, there are no counters to show the effect it has on my server. So I’m blind, and I do not like that.

Let’s change the topic slightly and now switch to MINIMAL RBR. In theory IDEMPOTENT mode and AUTO-REPAIR mode may now seem to be the same (assuming IDEMPOTENT mode changes what it can), but that’s not entirely true either.

With minimal RBR we get the primary key plus the changed columns for each row that changes. For INSERTs we get the full row, and for DELETEs we only need the primary key. That should be enough.

What changes here are the UPDATEs: since we don’t get the full row image we cannot know what was in the table before. We only have information on the new data, so columns which are not mentioned are unknown to us. If we are updating a row and cannot find it, an INSERT is not possible as we do not have enough information to fill in the unknown columns. So replication MUST stop if we want to avoid corruption.

Additionally, with minimal RBR UPDATEs, even if you find the row to update you cannot be sure you are doing the right thing, as you have no reference to the content or state of the before image. My thought here was that the ideal thing would be to send with each row a checksum of the row content on the master. This would be small (so efficient) and could be checked against the row content on the slave prior to making the update. If the values match we know the RBR UPDATE is working on the expected data. That makes a DBA feel more comfortable.
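A sketch of the checksum idea, with invented encoding details; the point is only that a small per-row hash lets the slave verify its copy of the row before applying a minimal update:

```python
# Illustrative before-image checksum: hash the row's column name/value
# pairs in a canonical order so master and slave compute comparable values.
import hashlib

def row_checksum(row):
    # Canonical, column-order-independent encoding of the row.
    encoded = "\x00".join(f"{k}\x01{v}" for k, v in sorted(row.items()))
    return hashlib.sha256(encoded.encode()).hexdigest()[:16]  # keep it "small"

master_row = {"id": "42", "name": "alice", "email": "a@example.com"}
slave_row  = {"id": "42", "name": "alice", "email": "a@example.com"}
drifted    = {"id": "42", "name": "alice", "email": "old@example.com"}

safe_to_apply = row_checksum(master_row) == row_checksum(slave_row)  # rows match
drift_found   = row_checksum(master_row) != row_checksum(drifted)    # drift detected
```

If the checksums differ, the slave knows its row has drifted and can stop (or count the event) instead of silently applying the update.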

Table definitions on a master and its slaves are not always identical.  There are several reasons for this: different (major) versions of MySQL may be in use, or, because taking downtime on the server is impossible, some sort of out-of-band ALTER TABLE may have been run on the slave while that change is still pending on the master. The typical cases are adding new columns, or changing the type, width, character set or collation of an existing column. In these circumstances the binary row image on the master and slave may well not be the same, so the before-image “checksum” from the master would not be usable.  To detect such a situation it may be necessary to also send a table definition checksum along with the row before-image checksum, though this could be sent once per set of events on a table rather than per row. The combination of the two values should be enough to ensure that minimal RBR changes can be validated even if we do not push a full before image into the binlog stream. Again, if the definitions do not match it would seem sensible to update a counter to indicate the situation.  We probably do not want to stop replication here: those who do not expect any differences between master and slave may be paranoid enough to stop, but for my usage I’d like to monitor changes to the counter and probably just continue.
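To make the idea concrete, here is a minimal sketch of the two checksums. This is not how MySQL encodes rows; hashing `repr()` strings with CRC32 is purely illustrative, and the event layout is invented:

```python
import zlib

def row_checksum(row):
    # Checksum over the column values, in column order. A real
    # implementation would hash the binary column encodings; hashing
    # repr() strings here is only for illustration.
    payload = "\x00".join(repr(v) for v in row).encode("utf-8")
    return zlib.crc32(payload)

def table_def_checksum(columns):
    # Checksum over (column name, column type) pairs, so an out-of-band
    # ALTER TABLE on the slave becomes detectable.
    payload = ";".join(name + ":" + ctype for name, ctype in columns).encode("utf-8")
    return zlib.crc32(payload)

# The master ships both checksums alongside the minimal row event...
event = {
    "pk": 42,
    "changed": {"name": "alice"},
    "row_crc": row_checksum((42, "bob", "2016-01-01")),
    "def_crc": table_def_checksum([("id", "int"), ("name", "varchar(20)"),
                                   ("created", "date")]),
}

# ...and the slave verifies its local row before applying the UPDATE.
def safe_to_apply(local_row, local_columns, event):
    return (table_def_checksum(local_columns) == event["def_crc"]
            and row_checksum(local_row) == event["row_crc"])
```

If `safe_to_apply` returns false the slave would bump the relevant counter rather than silently applying the change.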

Even my proposed LOSSLESS RBR would need this checksum to be safe as it would not contain the full before image but only the PK + all columns for an UPDATE operation, so potentially “slave drift” might happen and go undetected.

I can see therefore that optionally being able to add such checksums to minimal and lossless RBR would be a good way to ensure that replication works safely, pushes out the expected changes to the slaves, and catches unexpected inconsistencies.

The additional counters mentioned would help “catch” the number of inconsistencies that take place, and they would be useful even with the current replication setup when IDEMPOTENT mode is used. The current lack of visibility of these errors should make most DBAs rather sleepless, but I suspect some are not aware and some just have to live without that knowledge. Having these extra counters would help us see when things are not the same and allow us to take any necessary action based on that information.

I hope with this post I have clarified why IDEMPOTENT mode is not the same as my suggested AUTO-REPAIR mode, and when it is and is not safe to continue replicating under a variety of different conditions which would normally make RBR stop.

It also seems clear to me that minimal RBR would benefit from some additional checksums to allow the DBA to be more confident that the changes being made on the slave match those made on the master.

As always comments and feedback on this post is most welcome.

Lossless RBR for MySQL 8.0?

Lossless RBR

TL;DR: There’s been talk of moving the next release of MySQL to minimal RBR by default. I’d like to suggest an alternative: lossless RBR.

For MySQL 5.8 there were talks and suggestions about moving to minimal RBR as the default configuration. I’m not comfortable with this because it means that by default you do not have the full data in the binlog stream of the changes to a table.

The use of minimal RBR is an optimisation: it is used deliberately in busy environments where the volume of written data is large and it is not convenient or possible to keep all of it.  Additionally, the performance of full RBR can in some cases be an issue, especially for “fat”/wide tables.  It is true that minimal RBR helps considerably here.  There are several issues it addresses:

  • reduces network bandwidth between master and slaves
  • reduces disk i/o reading / writing the binlog files
  • reduces disk storage occupied by said binlogs

There was also a comment about enabling IDEMPOTENT mode by default on a slave.

This is a mode which basically ignores most errors. That does not seem wise.  As a DBA by default you want the server to not lose or munge data. There are times when you may decide to forgo that requirement, but the DBA should decide and the default behaviour should be safe.

Thus the idea of lossless RBR came to mind. What would this involve compared to the current modes of FULL or MINIMAL RBR?

  1. INSERTs are unchanged (as now): you get the full row
  2. DELETEs are as per minimal RBR: the primary key is sent and the matching row is removed.  If on a slave the PKs differ and more than one row would be deleted, this should be treated as an error.
  3. UPDATEs: send the PK + the full new image, thus ensuring that all new data is sent. This reduces the event size by roughly half, so it would be especially good for fat tables and tables where large updates go through. If the PK columns do not change then it should be sufficient to send the new row image and the PK column names.
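As a rough illustration of the size argument for UPDATEs, here is a sketch with Python dicts standing in for binlog events (real events are binary, so the exact numbers differ, but the proportions are the point):

```python
# Three flavours of the same UPDATE row event, with Python dicts
# standing in for the binary binlog encoding.
def full_update(before, after):
    return {"before": before, "after": after}   # full before + after image

def lossless_update(pk, after):
    return {"pk": pk, "after": after}           # PK + full new image

def minimal_update(pk, changed):
    return {"pk": pk, "changed": changed}       # PK + changed columns only

# A "fat" row: one small change to a wide row.
before = {"id": 1, "name": "old", "bio": "x" * 1000}
after = dict(before, name="new")

full_size = len(repr(full_update(before, after)))
lossless_size = len(repr(lossless_update(1, after)))
minimal_size = len(repr(minimal_update(1, {"name": "new"})))
# minimal < lossless, and lossless is roughly half of full
```

Minimal stays the smallest, but lossless halves the full event while still carrying every column of the new row.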

Related to this behaviour it would be most convenient to implement an existing FR (bug#69223) to require that tables defined via CREATE/ALTER TABLE MUST have a PK.  I’ve seen several issues where a developer did not think a primary key was important (they often forget replication): inserts would work fine but any updates that happened afterwards would trigger a problem, not on the master but on all slaves.  I think that by default this behaviour should be enabled.  There may be situations where it needs to be disabled but they are likely to be rather limited.

This new mode LOSSLESS RBR is clearly a mix between full and minimal, and it ensures that data pushed into a slave will always be complete.  I think that is a better target to aim for with MySQL 8.0 than the suggested MINIMAL RBR.

You may know that I do not like IDEMPOTENT mode much. I have created several FRs to add counters of “lost/ignored events” so we can see the impact of using this mode (usually it is used after an outage to keep replication going, even if this may mean some data is not updated correctly; usually this is better than having a slave with 100% stale data).

I would also really like to see a “safe recovery mode” added, where only statements which won’t damage the slave further are accepted.

The examples are in bug#54250 but basically include:

  1. INSERTs with duplicate key: convert to UPDATE
  2. DELETEs with row not found: ignore as the data has gone anyway
  3. UPDATEs with non-matching PK: convert to INSERT
  4. UPDATEs with non-matching columns: update what you can.  (This is likely to happen with full RBR as minimal RBR should never generate this type of error.)

[ For each of these 4 states: add counters to indicate how many times this has happened, so we can see if we’re “correcting” or “fixing” errors or not. ]
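A sketch of what such a safe recovery mode with counters might look like. The table here is just a dict keyed by primary key; none of this is actual server code:

```python
from collections import Counter

counters = Counter()

def apply_event(table, kind, pk, row=None):
    # Apply one row event in "safe recovery" mode: repair what is safe
    # to repair, and count every repair so the DBA can see the damage.
    if kind == "INSERT":
        if pk in table:
            counters["insert_duplicate_converted_to_update"] += 1  # case 1
        table[pk] = dict(row)
    elif kind == "DELETE":
        if pk in table:
            del table[pk]
        else:
            counters["delete_row_not_found_ignored"] += 1          # case 2
    elif kind == "UPDATE":
        if pk not in table:
            counters["update_converted_to_insert"] += 1            # case 3
            table[pk] = dict(row)
        else:
            if set(row) - set(table[pk]):
                counters["update_column_mismatch"] += 1            # case 4
            table[pk].update(row)  # update what we can

t = {}
apply_event(t, "INSERT", 1, {"id": 1, "v": "a"})
apply_event(t, "INSERT", 1, {"id": 1, "v": "b"})   # duplicate key
apply_event(t, "DELETE", 2)                        # row not found
apply_event(t, "UPDATE", 3, {"id": 3, "v": "c"})   # converted to INSERT
```

The point is not the repairs themselves but that every one of them is visible in `counters` afterwards, so you can tell one corrected row apart from a million.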

You’ll notice that lossless RBR would work perfectly with this even after a crash as you’ll have all the data you need, so you’ll never make the state of the database any worse than it was before.

I would like to see the FRs I’ve made regarding improving RBR implemented: whether or not lossless RBR becomes a new replication mode, they would help DBAs both diagnose and fix problems more easily than now.

It is probably also worth noting that FULL RBR is actually useful in a variety of scenarios, for example for exporting changes to other, non-MySQL systems.  What is missing for this is the definition of the tables; current systems need to extract that out of band, which is a major nuisance.  Exporting to external systems may not have happened that frequently in the past, but as larger companies use MySQL it becomes more and more important. For this type of system FULL RBR is probably needed even though it may not be used on the upstream master. I would expect that in most cases LOSSLESS RBR would also serve this purpose pretty well and reduce the replication footprint. The only environment that may need traditional FULL RBR is where auditing of ALL changes to a table is needed and thus both the before and after images are required.

Is it worth adding yet another replication mode to MySQL?  That is a good question and it may not be worth the effort.  However, the differences between FULL and LOSSLESS RBR should be minimal: the only difference is the amount of data that’s pushed into the binlog, so the scope of the changes should be limited.  Improving replication performance is a good goal: we all need that, but over-optimising should be considered more carefully.  I think we are still missing the monitoring metrics which help us diagnose and be aware of issues in RBR, and the “tools” or improvements which would make recovery easier.

Unless you live in the real world of systems which break, it is hard to understand why these “obscure” edge cases matter so much.  Responses like “just restart mysqld” may make sense in some environments, but are not realistic in systems that run 24x7x365.  With replication it is similar: stopped replication is worse than replication that is working but where data may not be complete.  Depending on the situation you may tolerate that “incomplete data” (temporarily) while gaining the changes which your apps need to see.  However, it is vitally important to be able to measure the “damage”, and that is why counters like the ones indicated above are so vital: they let you distinguish between 1 broken row and 1,000,000, and decide how to prioritise and deal with the situation as appropriate.

While I guess the MySQL replication developers are busy I would certainly be interested in hearing their thoughts on this possible new replication mode and would definitely prefer it over the suggested minimal RBR as a default for 8.0.  Both FULL and MINIMAL RBR have their place, but perhaps LOSSLESS would be a better default?  What do you think?

MMUG15: MySQL 5.7 & JSON

English: The Madrid MySQL Users Group is pleased to announce its next meeting on February 10th 2016 at 7pm at the offices of Tuenti in Gran Via, Madrid.  Morgan Tocker of Oracle will be visiting to give a talk on MySQL 5.7 and JSON as part of a European tour.  This will give you an excellent opportunity to learn about the new MySQL version 5.7 and about the new JSON functionality it provides. Further information on the event and registration can be found on the MMUG web page here:  We look forward to seeing you then.

Español: El grupo Madrid MySQL Users Group tiene el placer de anunciar su próxima reunión el 10 de febrero de 2016 a las 19:00 en las oficinas de Tuenti en Gran Vía, Madrid.   Morgan Tocker de Oracle nos visitará, como parte de una gira Europea, para darnos una charla (en inglés) sobre MySQL 5.7 y JSON.  La charla os dará una oportunidad excelente de aprender sobre la nueva versión 5.7 de MySQL y de aprender sobre la funcionalidad JSON que ofrece. Se puede encontrar más información sobre el evento y como registrar en la página web del MMUG aquí:  La presentación será en inglés pero habrá oportunidad de hablar o hacer preguntas en español.  Esperamos veros allí.

Is MySQL X faster than MySQL Y? – Ask queryprofiler

When trying out new software there are many questions you may ask, and one of them is the one above. Answering it requires you to have built your software to capture and record low-level database metrics, and the focus of application developers is often slightly different: they focus on how fast the application runs, but do not pay direct attention to the speed of each MySQL query they generate, at least under normal circumstances. So they are often not able to answer the question.

I have been evaluating MySQL 5.7 for some time, but only since its change to GA status has the focus switched to checking for any remaining issues and to determining whether, on the systems I use, performance is better or worse than MySQL 5.6.  The answers here are very application- and load-specific, and I wanted a tool to help me answer the question more easily.

Since MySQL 5.6, the performance_schema database has had a table performance_schema.events_statements_summary_by_digest which shows collected metrics on normalised versions of queries. This allows you to see which queries are busiest and gives you some metrics on those queries such as minimum, maximum and average query times.

I used this information and built queryprofiler to allow me to collect these metrics in parallel from one or more servers and thus allow me to compare the behaviour of these servers against each other. This allows me to answer the question that had been nagging me for some time in a completely generic way.  It should also work on MariaDB 10.0 and later though I have not had time to try that out yet.

queryprofiler works slightly differently from just querying P_S once. It takes several collections of the data and computes deltas between each collection, allowing you to know things like the number of queries per second, which events_statements_summary_by_digest on its own does not tell you. (There is no information in performance_schema telling you when the collections start. That is something I miss and would like to see fixed in MySQL 5.8 if possible.)
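The delta calculation is essentially the following sketch. The real table has many more columns; here each snapshot is reduced to `(COUNT_STAR, SUM_TIMER_WAIT)` per digest, with timers in picoseconds as P_S reports them:

```python
def digest_deltas(prev, curr, interval_s):
    # prev/curr: digest text -> (COUNT_STAR, SUM_TIMER_WAIT in picoseconds),
    # i.e. two snapshots of events_statements_summary_by_digest taken
    # interval_s seconds apart.
    out = {}
    for digest, (count, timer) in curr.items():
        prev_count, prev_timer = prev.get(digest, (0, 0))
        dcount = count - prev_count
        if dcount > 0:
            qps = dcount / interval_s
            avg_latency_s = (timer - prev_timer) / dcount / 1e12  # ps -> s
            out[digest] = (qps, avg_latency_s)
    return out

snap1 = {"SELECT `c` FROM `t` WHERE `id` = ?": (100, 5 * 10**12)}
snap2 = {"SELECT `c` FROM `t` WHERE `id` = ?": (200, 15 * 10**12)}
deltas = digest_deltas(snap1, snap2, interval_s=10)
# 100 extra executions over 10 seconds -> 10 qps at 0.1 s average latency
```

Running the same calculation in parallel against several servers is what lets queryprofiler compare them side by side.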

The other difference of course is that P_S gives you information on one server. If you collect the information at the same time from more than one server with a similar load then the numbers you get out should be very similar and that is what queryprofiler does.

How do you use queryprofiler?  Provide it with one or more Go-style MySQL DSNs to connect to the servers, and optionally tell it how many times to collect data (default: 10) and at what interval (default: every second). It will then run and give you the results, showing the top queries seen (by elapsed query time) and the metrics for each server (queries per second, average query latency and how much these values vary).

A couple of examples of the output can be found here:

Hopefully you will find this tool useful.  Feedback and patches to improve it are most welcome.

A Couple of MySQL 5.7 gotchas to be aware of

MySQL 5.7 GA was released a couple of months ago with 5.7.9, and 5.7.10 was published a few days ago.  So far initial testing of these versions looks pretty good and both have proved to be stable.

I have, however, been bitten by a couple of gotchas which if you are not aware of them may be a bit of a surprise. This post is to bring them to your attention.

New MySQL accounts expire by default after 360 days

This is as per the documentation, so there is no bug here. MySQL 5.7 provides a new, more secure environment, and one of the changes is to add password expiry: the default behaviour is for passwords to expire after 360 days.  This seems good, but you, perhaps like me, may not be accustomed to managing your passwords, checking for expiration and adjusting the MySQL user settings accordingly.  The default setting of default_password_lifetime is 360 days, so after upgrading a server from MySQL 5.6 to 5.7 this setting suddenly comes to life. Nothing happens immediately, so you do not see the time bomb ticking away. I had been testing the DMR versions of MySQL 5.7 prior to the GA release and consequently had been using it for longer than 2 months.  Recently a couple of 5.7.9 servers which had been upgraded from 5.6 a year ago decided to block access to all applications at the same time.  The quick fix is simple: change the setting to 0 (no expiry) and we have a configuration that behaves like MySQL 5.6, even if it is less secure than the default MySQL 5.7 setup. We can then look at how to manage the MySQL accounts, taking the new setting into account, in a more secure manner.  If you are starting with MySQL 5.7 and are not migrating from 5.6 then perhaps you will put the right checks in place from the start, but those of us migrating from 5.6 cannot push down grants with the new ALTER USER syntax until the 5.6 masters are upgraded, so we need to pay more attention to this while the migration is in progress.
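The quick fix described above might look like this (the account name is illustrative):

```sql
-- Restore the 5.6 behaviour globally (also set it in my.cnf so it
-- survives a restart):
SET GLOBAL default_password_lifetime = 0;

-- Later, per account and once the 5.6 masters are upgraded, the new
-- 5.7 syntax allows finer control:
ALTER USER 'app'@'%' PASSWORD EXPIRE NEVER;
```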

New range optimizer setting might cause unexpected table scans if not set properly

MySQL 5.7.9 GA added a new configuration variable, range_optimizer_max_mem_size, set by default to 1536000. The documentation does not say much about this new setting and it seems quite harmless: “if … the optimizer estimates that the amount of memory needed for this method would exceed the limit, it abandons the plan and considers other plans.”  The range optimiser is used for point selects, primary key lookups and other similar queries.  What this setting does is, after parsing a query, look at the number of items referenced in the WHERE clause, and if the estimated memory usage is too high, fall back to a slower method.

Let’s put this into context. A query like SELECT some_columns FROM some_table WHERE id IN ( 1, 2, 3, ... big list of ids ... 99998, 99999 ) will trigger this limit being reached for a large enough range of ids. DELETE FROM some_table WHERE (pk1 = 1 AND pk2 = 11) OR (pk1 = 2 AND pk2 = 12) .. OR .. (pk1 = 111 AND pk2 = 121) /* pk1 and pk2 form a [primary] key */ would also potentially trigger this.

The questions that come out of this are (a) “How to figure out the point at which this change happens?”, and (b) “What happens at this point?”

The answer to (b) is simple: MySQL falls back to doing a table scan (per item). The answer to (a) is not so clear. Bug#78752 is a feature request to make this clearer, and further investigation pointed to MySQL 5.6’s previous behaviour, where the limit was defined as a fixed, hard-coded number of “items” (16,000), whereas 5.7’s new limit is defined in terms of memory usage.  The relationship between the two settings is not very clear, and initial guesstimates on the systems where I saw issues suggest that MySQL 5.7 currently uses maybe 4 kB per item. The point is that what worked quickly as point selects on 5.6 may fall back to a table scan per item on 5.7 if the number of entries is too high, requiring a change to the (dynamic) configuration setting mentioned. The bad behaviour may also only appear for sufficiently large queries.
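Since the setting is dynamic, no restart is needed if you hit the limit; a sketch of the reconfiguration:

```sql
-- Check the current value, then raise or remove the limit
-- (0 means "no limit"); new sessions pick up the global value.
SHOW GLOBAL VARIABLES LIKE 'range_optimizer_max_mem_size';
SET GLOBAL range_optimizer_max_mem_size = 0;
```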

Many people may wonder why anyone would be mad enough to use a SELECT or DELETE statement with several thousand entries in an IN () clause, but this comes from having split the data from a single server over two: the application finds a list of ids on one server using some criteria and then uses those ids on the other. I see this pattern used frequently and it is probably common on any system where the data no longer fits in a single server.
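One way for the application to defend against the threshold is to split the id list into fixed-size chunks; a sketch (the table and column names are invented, and the chunk size would need tuning against the server's actual limit):

```python
def chunked_in_queries(ids, chunk_size=1000):
    # Split a huge id list into several smaller IN () queries so each
    # one stays comfortably under the range optimizer's limit.
    for i in range(0, len(ids), chunk_size):
        chunk = ids[i:i + chunk_size]
        placeholders = ", ".join(["%s"] * len(chunk))
        sql = "SELECT some_columns FROM some_table WHERE id IN ({})".format(placeholders)
        yield sql, chunk

queries = list(chunked_in_queries(list(range(2500)), chunk_size=1000))
# 2500 ids -> 3 queries covering 1000, 1000 and 500 ids
```

Each `(sql, chunk)` pair is ready to hand to a DB-API cursor's `execute()`.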

The problem with this particular change in behaviour is that point selects are very fast and efficient in MySQL. People use them a lot. Table scans are of course really slow, so depending on the query in question performance can change from milliseconds to minutes just because your query is a tiny bit bigger than the new threshold. In practice the old hard-coded limit and the new dynamic limit appear to be at least an order of magnitude apart, so it is quite easy for queries that were fine on 5.6 to fail miserably on 5.7 without a configuration change. Again, while migrating from MySQL 5.6 to 5.7 you may see this change bite you.

You may get caught by either of these issues. I got caught by both of them while testing 5.7, and while they are quite simple to fix, they do require a configuration change. I hope this post at least helps you recognise them and know where to poke so you can make your new 5.7 servers behave properly again.

MMUG14: MySQL Automation at Facebook

English: Madrid MySQL Users Group will be holding their next meeting on Tuesday, 10th November at 19:30h at the offices of Tuenti in Madrid. David Fernández will be offering a presentation “MySQL Automation @ FB”.  If you’re in Madrid and are interested please come along. We have not been able to give much advance notice so if you know of others who may be interested please forward on this information.  Full details of the MeetUp can be found here at the Madrid MySQL Users Group page.

Español: El día 10 de noviembre a las 19:30 Madrid MySQL Users Group tendrá su próxima reunión en las oficinas de Tuenti en Madrid.  David Fernández nos ofrecerá una presentación (en inglés) “MySQL Automation @ FB”.  Si estás en Madrid y interesado nos gustaría verte.  No hemos podido avisar con mucha antelación, así que si conoces a otros que podrían estar interesados agradeceríamos les hagas llegar esta información. Se puede encontrar los detalles completos de la reunión aquí en la página de Madrid MySQL Users Group.

MMUG13: Practical MySQL Optimisation y Galera Cluster presentations

English: Madrid MySQL Users Group will be holding their next meeting on 17th June at 18:00h at EIE Spain in Madrid. Dimitri Vanoverbeke and Stéphane Combaudon from Percona will be offering two presentations for us:

  • Practical MySQL optimisations
  • Galera Cluster – introduction and where it fits in the MySQL eco-system

I think this is an excellent moment to learn new things and meet new people. If you’re in Madrid and are interested please come along.  More information can be found here at the Madrid MySQL Users Group page.

Español: El día 17 de junio a las 18:00 Madrid MySQL Users Group tendrá su próxima reunión en las oficinas de EIE Spain.  Dimitri Vanoverbeke y Stéphane Combaudon de Percona nos ofrecerá dos presentaciones (en inglés):

  • Practical MySQL optimisations
  • Galera Cluster – introduction and where it fits in the MySQL eco-system

Creo que será una oportunidad excelente para aprender algo nuevo y para conocer gente nueva. Si estás en Madrid y interesado nos gustaría verte.  Se puede encontrar más información aquí en la página de Madrid MySQL Users Group.


MMUG12: Talk about Percona Toolkit and the new features of MySQL 5.7

Madrid MySQL Users Group is having a Meetup this afternoon, Wednesday, 13th May at 19:00.

  • I will be presenting (in Spanish) a quick summary of Percona Toolkit and also offering a summary of the new features in MySQL 5.7 as the release candidate has been announced and we don’t expect new functionality.
  • This is also an opportunity to discuss other MySQL related topics in a less formal manner.
  • You can find information about the Meetup here.

So if you are in Madrid and are interested please come along.

El Madrid MySQL Users Group tiene una reunión esta tarde, miércoles 13 de mayo, a las 19:00.

  • Ofreceré una presentación sobre Percona Toolkit y un resumen de las características nuevas de MySQL 5.7 que recientemente se anunció como Release Candidate. Ya no esperamos cambios en su funcionalidad.
  • También habrá una oportunidad de hablar de otros temas relacionados con MySQL de una manera menos formal.
  • Se puede encontrar información de la reunión aquí.

Si estás en Madrid y te interesa estarás bienvenido.


new to pstop – vmstat style stdout interface

In November last year I announced a program I wrote called pstop. I hope that some of you have tried it and found it useful. Certainly I know that colleagues and friends use it and it has proved helpful when trying to look inside MySQL to see what it is doing.

A recent suggestion prompted me to provide a slightly different interface to pstop: rather than showing the output in a terminal-like top format, provide a line-based summary similar to vmstat(8), pt-diskstats(1p) and other command line tools.  I have now incorporated changes which allow this. So if you want to see, every few seconds, which tables are generating the most load, or which files have the most I/O, then this tool may be useful. Example output is shown below:

Hopefully this gives you an idea.  The --help option gives you more details. I have not yet paid much attention to the output format, and it is not currently well suited for parsing by other tools, so I will likely need to provide a more machine-readable --raw format option at a later stage.  That said, feedback on what you want to see, or patches, are most welcome.