The Madrid MySQL Users Group has been a bit quiet recently so we thought it would be good to have a meeting again. The next meeting will take place next on Wednesday, 25th April and will cover Galera Cluster (Presentation by Emanuele Mottola). We will also talk about MySQL 8.0 and any latest updates that may have been announced. Meetup details and location can be found here. If you’re not attending Percona Live 2018 in sunny Santa Clara and are in Madrid come and join us.
If you’re in Barcelona next week you may be interested in the MySQL Meetup being held there by the Barcelona MySQL Meetup group on Wednesday 7th July at 7pm. I’ll be doing a talk on MySQL Failover and Orchestration and there will be opportunity to talk about MySQL and related topics afterwards.
More information can be found on their web page. I look forward to seeing you there.
There have been several posts on setting up and using orchestrator. Most of these are quite simple and do not discuss in detail some of the different choices you may want to consider when setting up orchestrator in a real production environment. As I have been using orchestrator for some time I thought it would be good to discuss how a real production orchestrator setup might be achieved. So here are some thoughts which I would like to share.
The basics for setting up orchestrator are to setup the orchestrator app and configure it to be able to write to a backend MySQL database.
Configuration requires telling orchestrator how to find the MySQL instances you want to monitor and perhaps to forget old servers that are no longer being used. For a small setup you may be happy to do this by hand but adding automation hooks to the provisioning or decommissioning process of MySQL hosts can come in handy. You have the choice of using the command line:
$ orchestrator -c discover [ -i <host>:<port> ] or
$ orchestrator -c forget [ -i <host>:<port> ]
or using the http interface and URLs like
according to which method is easiest to setup. Note: discovery of new servers in an existing replication chain should not be necessary as orchestrator will normally be able to figure this out on its own.
Handling Master or Intermediate master failover
Failover behaviour also needs to be configured. While orchestrator is able to adjust the replication topology if a master or intermediate master fails sometimes, and this is more so for a primary master, additional external tasks may be needed to ensure the completion of the failover process. This may also good for notifying appropriate people or systems prior to and after dealing with the failover.
This is handled in orchestrator.conf.json with the following settings:
which will run scripts on the active orchestrator node to achieve the desired configuration changes. You can use these hooks to do tasks such as:
- notify people or systems of the issue that’s been seen
- change the configuration of external systems which need to be aware of a master or intermediate master failure
- tell the applications of the failure and where to find the new master
All of these tasks will be specific to your environment but there’s plenty of freedom here to hook orchestrator in even if it is not directly aware of “the outside world”.
Selection of servers to be eligible [intermediate] masters
You may have special servers, such as those used for testing, or located in a different part of your network, which you do not want to promote to be a master or intermediate master. Orchestrator is able to allow you to indicate this with settings such as
This works pretty well and covers almost all cases where you need to handle special cases for one or more reasons.
For larger setups where there are more servers in the cluster you may prefer orchestrator to failover to one or more specific servers and there there are some promotion rules you can apply to adjust the priority of which servers are preferred as a candidate when a failure occurs.
Currently this is configured on a per MySQL instance basis giving it one of the types Prefer, Neutral (default value) or Must Not. (The code does have two other options Must and Prefer Not but these are not implemented.)
Configuration can be done via the command line via:
$ orchestrator -c register-candidate [ -i <host>:<port> ]
though here the configured default promotion rule is used (Prefer), but you can also use the http interface where you can explicitly state the required promotion rule using:
It is also possible to pull out the promotion rules as a bulk operation using the url:
This is convenient if you want to configure this centrally rather than individually on each MySQL instance.
High Availability Setup
If you really care about your MySQL servers not failing you probably also care about orchestrator itself not failing, so what can be done to make this service more reliable?
Orchestrator itself comprises two parts: the orchestrator application and the MySQL backend it writes to.
As far as the orchestrator app is concerned it is easy to configure more than one server. All apps use the same configuration and talk to the same MySQL backend database. They co-operate by writing to a common table in the backend database and electing a leader (or active node) which actively polls all known MySQL instances. The other nodes are working but doing nothing. Should the elected leader stop working another app will be chosen and takeover the process
of checking all MySQL instances. So setting up more than one app is very straightforward and usually it is good to setup orchestrator app servers in the same locations or datacentres where your MySQL servers are running.
Once you have more than one orchestrator app running it is convenient to use some sort of load balancing technology to make orchestrator visible via a single URL. This process works quite nicely as normal usage of the GUI can work on any of the orchestrator nodes, even if the active monitoring only takes place on one of them. This is where it may be convenient to add an authentication and https layer, neither of which is handled directly by orchestrator but which can easily be added using something like nginx.
is very convenient as it shows you the apps which are running, their version which node is the active node. You can see an example below on some testing servers I use:
As far as Orchestrator’s handling of the backend MySQL server going away this is something which perhaps deserves a comment. Orchestrator has a backend database and expects it to be there. So configuring a single MySQL server as orchestrator’s backend is probably not ideal. Standard MySQL replication will give you a spare and I think that for most cases this is in practice good enough.
If the “orchestrator db” master fails it is unlikely that orchestrator will be able to fix this. The paranoid may like to consider using something like Galera, MySQL Cluster or even the new MySQL Group Replication (and InnoDB Cluster when it is released), but all that orchestrator really cares about is being able to write to a backend database so it can store state and use that state later. Additional auditing, logging, and history information is kept but none of this is critical and write rates on the backend are generally low unless the number of instances you monitor is very high. So adjusting the orchestrator configuration to talk to a different MySQL host, or alternatively to make the configuration use a virtual IP or DNS CNAME gives you the flexibility to be able to make quick changes without needing to adjust the orchestrator configuration itself.
While I use standard MySQL replication to provide a spare backend I also keep a record of the MySQL instances ( host:port ) so even under some completely strange broken setup I can feed this information into orchestrator via the discovery interface into an empty configuration and have orchestrator working again in a few seconds. A convenient URL http://orchestrator.mycompany.com/api/bulk-instances is designed to simplify this task.
So all in all the HA setup is quite easy to get going and the good thing about that is then it is easy to upgrade any of the nodes just by stopping it, adjusting binaries and restarting, without having to worry about the “MySQL Failover Service” not being available.
People may wonder why this matters so much. If you setup is small then the chances of the master or intermediate master failing are also quite low. As your environment grows so does the chance of a failure occurring. I see failures, sometimes more than once a day, and prefer orchestrator to be running so I do not need to have to deal with these failures manually.
What’s required to monitor orchestrator? Basically you want to monitor the orchestrator process is working and the http web interface (especially if you run several app servers) on each of the boxes individually.
Orchestrator itself also supports graphite and can provide you information on internal activity such as the number of successful or failed discovery processes (polling MySQL servers) and also read and write activity to the backend MySQL store. However if you’re not using graphite this is more tricky.
I have made some code changes to provide further more detailed metrics on the time taken to poll and check each of the monitored MySQL servers as I had experienced some load issues due to the number of servers being monitored and these timing metrics helped identify where to focus to fix this. These metrics are available via a raw http api call and for simplicity aggregate values can be retrieved for the last few seconds. This makes tying into any external monitoring system much easier.
Some of these patches have been passed back upstream to github and further patches should arrive shortly. However, adding these metrics allowed me to identify bottlenecks in orchestrator when monitoring a large number of systems and together with colleagues performance enhancements for this sort of situation have been fed back upstream.
I hope that this article helps provide a bit more insight into what might be worth thinking about when setting up orchestrator for the first time in a production environment. Feel free to contact me if more detail is needed or something is not clear enough.
I rarely talk explicitly about where I work (booking.com). However, I do enjoy it. We are lucky to have just completed our wonderful annual event where we all come together and share some time, not only with direct colleagues, but also with colleagues from other offices we see much less frequently. This video in many ways represents what brings us together and what makes this all so special. I hope you enjoy it.
The Madrid MySQL Users Group has its next meeting on Tuesday, 22nd November 2016. Giuseppe Maxia will be giving a presentation MySQL document store: SQL and NoSQL united and I’ll be providing a brief summary of the new MySQL 8.0 and MariaDB 10.2 beta versions which were announced recently. There will also be an opportunity to discuss topics related to MySQL. Hope to see you there.
More information can be found at: https://www.meetup.com/Madrid-MySQL-users-group/events/234892086/
El grupo Madrid MySQL Users Group tendrá su próxima reunión el martes 22 de noviembre 2016. Giuseppe Maxia nos ofrecerá una presentación MySQL document store: SQL and NoSQL united y yo ofreceré una breve resumen de las nuevas versiones beta de MySQL 8.0 y MariaDB 10.2 que se anunciaron recientemente. Habrá una oportunidad para comentar otros temas relacionados con MySQL or MariaDB. Espero veros allí.
Se puede ver más información en: https://www.meetup.com/Madrid-MySQL-users-group/events/234892086/
Oracle Open World 2016 has just finished in San Francisco and we are now about to embark on Percona Live Europe in Amsterdam.
I offered a presentation in San Francisco on the MySQL X protocol, the new protocol that Oracle is using to make the DocumentStore work. This new protocol also allow you to send normal SQL queries to it, and it looks like Oracle has plans to use it in more scenarios.
If you’re interested in protocols and how things work you can find my presentation here: http://slideshare.net/sjmudd/mysql-x-protocol-talking-to-mysql-directly-over-the-wire.
Also I didn’t see you in San Francisco hopefully we’ll get a chance to catch up in Amsterdam.
2016-10-02 I’ll also be doing the same talk in Percona Live Amsterdam on Wednesday 5th October. You can see the schedule here. Other colleagues also have presentations which may be of interest.
I posted recently Lossless RBR for MySQL 8.0 about a concern I have about moving to minimal RBR in MySQL 8.0. This seems to be the direction that Oracle is considering, but I am not sure it is a good idea as a default setting.
I talked about a hypothetical new replication mode lossless RBR and also about recovery after a crash where perhaps the data on the slave may get out of sync with the master. Under normal circumstances this should not happen but in the real world sometimes it does.
Note: I’m talking about an environment that does not use GTID. GTID is good but may have its own issues and it’s probably best to leave those discussions to another post.
So let us talk about the difference between IDEMPOTENT mode (slave_exec_mode=IDEMPOTENT) and what I’ll call AUTO-REPAIR mode, mentioned in feature request bug#54250 to Oracle in 2010. By default the DBA wants to avoid any data corruption, so this should be the default behaviour. Thus I’d prefer auto-repair mode to be off by default, stopping replication if any inconsistencies are found. I could enable it if I see such an issue as it should help me recover the state of the database without adding further “corruption” to the slave.
If I’m confident that this procedure works fine and I’m monitoring the counters mentioned below then it may be fine to leave enabled all the time.
A slave fails, it may crash and it recovers. It’s likely that the replication position it “remembers” is behind the actual state in the database.
If we use full RBR (default setting) in these circumstances then we may get in a set of changes which the SQL thread tries to apply.
They’ll be in the form of:
before row image / after row image
before row image / after row image
where each row image is the set of column values prior to and after the row changes. Traditionally we use the abbreviations BI and AI for this.
Currently the SQL thread will start up and look for the first row to change and once it has found it change it. If the exact matching conditions it needs can not be found then an error will be generated and replication stops.
IDEMPOTENT mode attempts to address this and tries to “continue whatever the cost”. To be honest I’m not exactly sure what it does, but it’s clear that it will either do nothing or perhaps it might try to find the row by primary key and update that row. I’d expect it probably does nothing.
See a comment later on. So I did go and check and the comments in slave_exec_mode say that it suppresses duplicate-key and no-key-found errors. There is no mention of updates where the full AI is unavailable. (e.g. when using minimal RBR)
It also looks like it does not “repair” the issue, but it simply ignores it. The documentation is not 100% clear to me.
I made a comment about different options for AUTO-REPAIR mode and when it can work and when it can not. In FULL RBR mode it should always be able to do something. In MINIMAL RBR mode there will be cases when it can not. Let’s see the case of FULL RBR mode:
- For an UPDATE when the requested row can not be found:
- auto-repair mode would INSERT the row. You have a full AI so you can do this safely.
- A counter should be updated to record this action.
- For a DELETE row operation when the row can not be found:
- auto-repair mode would ignore the error and given the row does not exist anyway the effect of the DELETE has already been accomplished.
- A counter should be updated to record this action
- For an INSERT row operation when the ROW already exists.
- Duplicate key insert) This is what generally breaks replication.
- auto-repair mode would treat this as an UPDATE operation (based on the primary key in the table) and ensure the row is changed to have the same primary key and the columns of the AI.
- Again a counter should be updated to record this action.
In FULL RBR mode these 3 actions should allow replication to continue. The database is no more corrupt than it was before. In fact it’s in a state that’s somewhat better.
In many cases other row events will proceed as expected without issue: INSERTS will happen, UPDATES and DELETEs to existing rows will work as the row is found, and things will proceed as normal.
So should we get in a situation like this we can check the 3 counters and this gives us a clue as to the number of “repair actions” which MySQL has had to execute. It also gives us an idea of how inconsistent the slave seems to be, though those inconsistencies should now have been removed.
As I said I can’t remember exactly what IDEMPOTENT mode does in these 3 circumstances. It may do something similar to my AUTO-REPAIR mode or it may just skip the errors.
Why don’t I know? Well I’m currently in a plane and the mysql documentation is not provided with my mysql server software and I’m not online so I can’t check. I used to find the info file or a pdf of the manual quite helpful in such situations and would love to see it put back again so I don’t need to speculate about what the documentation says.
Yes, I could update this text when I’m back online, but I think I’ll make the point and leave this paragraph here.
So with FULL RBR the situation seems to me to be clear. IDEMPOTENT mode may not do the same thing as the AUTO-REPAIR mode, and whether it does or not there are no counters to see the effect it produces on my server. So I’m blind. I do not like that.
Let’s change the topic slightly and now switch to MINIMAL RBR and do the same thing. In theory now IDEMPOTENT mode and AUTO-REPAIR mode may seem to be the same (assuming IDEMPOTENT mode changes what it can) but that’s also not entirely true.
With minimal RBR mode we get a set of primary key plus changed columns for each row that changes. For INSERTS we get the full ROW and for DELETES we only need the primary key. That should be enough.
What changes here are the UPDATES: as if we don’t get the full row image we can not know what was in the table before. We only have information on the new data. So other columns which are not mentioned are unknown to us. If we are UPDATING a row and we can not find it, an INSERT is not possible as we do not have enough information to complete the columns that are unknown to us. So replication MUST stop if we want to avoid corruption.
Additionally, with minimal RBR UPDATES even if you find the ROW to UPDATE you can not be sure you are doing the right thing as you have no reference to the content or state of the before image. My thought here was that the ideal thing would be to send with each row a checksum of the row content on the master. This would be “small” (so efficient) and could be checked against the row content on the slave prior to making the update. If the values match we know the RBR UPDATE is working on expected data. This makes a DBA feel more comfortable.
Table definitions on a master and its slaves are not always identical. There are several reasons for this such as the fact that different (major) versions of MySQL are being used, or simply due to it being impossible to take downtime on the server some sort of out of band ALTER TABLE may have been run on the slave and that change is still pending on the master. The typical case here is adding new columns, or changing the type, width, character set or collation of an existing column. In these circumstances the binary image on the master and slave may well not be the same so the before row image “checksum” on the master would not be usable. To detect such a situation it may be necessary to also send a table definition checksum with the row before image checksum, though this could be sent for each set of events on a table not each row. The combination of the two values should be enough to allow us to be ensure that minimal RBR changes can be validated even if we do not push down a full before image into the binlog stream. Again, if the definitions do not match it would seem sensible to update a counter to indicate such a situation. We probably do not want to stop replication in this situation. Those who do not expect any sort of differences between master and slave may be paranoid enough to want to not continue, but I know for my usage I’d like to monitor changes to the counter but probably just continue.
Even my proposed LOSSLESS RBR would need this checksum to be safe as it would not contain the full before image but only the PK + all columns for an UPDATE operation, so potentially “slave drift” might happen and go undetected.
I can see therefore that optionally being able to add to minimal- and lossless-RBR such checksums would be a good way to ensure that replication works safely and pushes out changes to the slaves which are expected, and catches unexpected inconsistencies.
The additional counters mentioned would help “catch” the number of inconsistencies that take place and they would be good even with the current replication setup when IDEMPOTENT mode is used. This lack of visibility of errors should make most DBAs rather sleepless, but I suspect there are those that are not aware and those that just have to live without that knowledge. Having these extra counters would help us see when things are not the same and allow us to take any necessary action based on that information should it be necessary.
I hope with this post I have clarify why IDEMPOTENT mode is not the same as my suggested AUTO-REPAIR mode and when it’s safe to continue replicating and when it is not under a variety of different conditions which would normally make RBR stop.
It also seems clear to me that MINIMAL RBR would benefit from some additional checksums to allow the DBA to be more confident that the changes being made on the slave match those made on the master. This is especially so if using minimal RBR.
As always comments and feedback on this post is most welcome.
TL/DR: There’s been talk of moving the next release of MySQL to minimal RBR: I’d like to suggest an alternative: lossless RBR
For MySQL 5.8 there was talk / suggestions about moving to minimal RBR as the default configuration (http://mysqlserverteam.com/planning-the-defaults-for-mysql-5-8/). I’m not comfortable with this because it means that by default you do not have the full data in the binlog stream of the changes to a table.
The use of minimal RBR is an optimisation, it is done deliberately in busy environments where the size of written data is large and it is not convenient/possible to keep all data. Additionally the performance of full RBR can in some cases be an issue especially for “fat/wide” tables etc. It is true that minimal RBR helps considerably here. There are several issues it resolves:
- reduces network bandwidth between master and slaves
- reduces disk i/o reading / writing the binlog files
- reduces disk storage occupied by said binlogs
There was also a comment about enabling IDEMPOTENT mode by default on a slave.
This is a mode which basically ignores most errors. That does not seem wise. As a DBA by default you want the server to not lose or munge data. There are times when you may decide to forgo that requirement, but the DBA should decide and the default behaviour should be safe.
Thus the idea of lossless RBR came to mind. What would this involve compared to the current modes of FULL or MINIMAL RBR?
- INSERTs are unchanged (as now): you get the full row
- DELETEs are as per minimal RBR: The primary key is sent and the matching row is removed. IFF on a slave the pks differed and more than one row would be deleted this should be treated as an error.
- UPDATEs: Send the pk + full new image, thus ensuring that all new data is sent. This reduces the event size by ~ 1/2 so would be especially good for fat tables and tables where large updates go through. If the PK columns do not change then it should be sufficient to send the new row image and pk column names etc
Related to this behaviour it would be most convenient to implement an existing FR (bug#69223) to require that table definitions via CREATE/ALTER TABLE MUST HAVE a PK. I’ve seen several issues where a developer has not thought a primary key was important (they often forget replication) and this would trigger problems. Inserts would work fine but any updates that happened afterwards would trigger a problem, not on the master but on all slaves. I think that by default this behaviour should be enabled. There may be situations where it needs to be disabled but they are likely to be rather limited.
This new mode LOSSLESS RBR is clearly a mix between full and minimal and it ensures that data pushed into a slave will always be complete. I think that is a better target to aim for with MySQL 8.0 than the suggested MINIMAL RBR.
You may know that I do not like IDEMPOTENT mode much. I have created several FRs to add counters to “lost/ignored events” so we can see the impact of using this mode (usually it is used after an outage to keep replication going even if this may mean some data is not being updated correctly. Usually this is better than having a slave with 100% stale data.)
I would really also like to see you adding a “safe recovery mode” where statements which won’t damage the slave more are accepted.
The examples are in bug#54250 but basically include:
- INSERTs with duplicate key: convert to UPDATE
- DELETEs with row not found: ignore as the data has gone anyway
- UPDATEs with non-matching PK: convert to INSERT
- UPDATEs with non-matching columns: update what you can. (This is likely to happen with full RBR as minimal RBR should never generate this type of error.)
[ For each of these 4 states: add counters to indicate how many times this has happened, so we can see if we’re “correcting” or “fixing” errors or not. ]
You’ll notice that lossless RBR would work perfectly with this even after a crash as you’ll have all the data you need, so you’ll never make the state of the database any worse than it was before.
I would like to see the FRs I’ve made regarding improving RBR being implemented as whether lossless RBR becomes a new replication mode or not they would help DBAs both diagnose and fix problems more easily than now.
It is probably also worth noting that FULL RBR is actually useful for a variety of scenarios, for example for exporting changes to other non-MySQL systems. We miss for this the definition of tables, and current systems need to extract that out of band which is a major nuisance. Exporting to external systems may not have happened that frequently in the past, but as larger companies use MySQL this becomes more and more important. For this type of system FULL RBR is probably needed even though it may not be used on the upstream master. I would expect that in most cases LOSSLESS RBR would also serve this purpose pretty well and reduce the replication footprint. The only environment that may need traditional FULL RBR is where auditing of ALL changes in a table is needed and thus both the before and after images are required.
Is it worth adding yet another replication mode to MySQL? That is a good question and it may not be worth the effort. However the differences between FULL and LOSSLESS RBR should be minimal: the only difference is the amount of data that’s pushed into the binlog so the scope of changes etc should be more limited. Improving replication performance seems to be a good goal: we all need that, but over-optimising should be considered more carefully. I think we are still missing the monitoring metrics which help us diagnose and be better aware of issues in RBR and the “tools” or improvements which would make recovery easier. Unless you live in the real world of systems which break it is hard to understand why these “obscure” edge cases matter that much. The responses like: “just restart mysqld” may make sense in some environments, but really are not realistic in systems that run 24x7x365. With replication it is similar: stopped replication is worse than replication that is working, but where data may not be complete. Depending on the situation you may tolerate that “incomplete data” (temporarily) while gaining the changes which your apps need to see. However, it is vitally important to be able to measure the “damage” and that is why counters like the ones indicated above are so vital. It allows you to distinguish 1 broken row, or 1,000,000 and decide on how to prioritise and deal with that as appropriate.
While I guess the MySQL replication developers are busy I would certainly be interested in hearing their thoughts on this possible new replication mode and would definitely prefer it over the suggested minimal RBR as a default for 8.0. Both FULL and MINIMAL RBR have their place, but perhaps LOSSLESS would be a better default? What do you think?
English: The Madrid MySQL Users Group is pleased to announce its next meeting on February 10th 2016 at 7pm at the offices of Tuenti in Gran Via, Madrid. Morgan Tocker of Oracle will be visiting to give a talk on MySQL 5.7 and JSON as part of a European tour. This will give you an an excellent opportunity to learn about the new MySQL version 5.7 and to learn about the new JSON functionality it provides. Further information on the event and registration can be found on the MMUG web page here: http://www.meetup.com/Madrid-MySQL-users-group/events/227495184/. We look forward to seeing you then.
Español: El grupo Madrid MySQL Users Group tiene el placer de anunciar su próxima reunión el 10 de febrero de 2016 a las 19:00 en las oficinas de Tuenti en Gran Vía, Madrid. Morgan Tocker de Oracle nos visitará, como parte de una gira Europea, para darnos una charla (en inglés) sobre MySQL 5.7 y JSON. La charla os dará una oportunidad excelente de aprender sobre la nueva versión 5.7 de MySQL y de aprender sobre la funcionalidad JSON que ofrece. Se puede encontrar más información sobre el evento y como registrar en la página web del MMUG aquí: http://www.meetup.com/Madrid-MySQL-users-group/events/227495184/. La presentación será en inglés pero habrá oportunidad de hablar o hacer preguntas en español. Esperamos veros allí.
When trying out new software there are many other questions you may ask and one of those is going to be the one above. The answer requires you to have built your software to capture and record low level database metrics and often the focus of application developers is slightly different: they focus on how fast the application runs, but do not pay direct attention to the speed of each MySQL query they generate, at least under normal circumstances. So often they are not necessarily able to answer the question.
I have been evaluating MySQL 5.7 for some time, but only since its change to GA status has the focus has switched to check for any remaining issues and also to determine if in the systems I use performance is better or worse than MySQL 5.6. The answers here are very application and load specific and I wanted a tool to help me answer that question more easily.
Since MySQL 5.6, the
performance_schema database has had a table
performance_schema.events_statements_summary_by_digest which shows collected metrics on normalised versions of queries. This allows you to see which queries are busiest and gives you some metrics on those queries such as minimum, maximum and average query times.
I used this information and built queryprofiler to allow me to collect these metrics in parallel from one or more servers and thus allow me to compare the behaviour of these servers against each other. This allows me to answer the question that had been nagging me for some time in a completely generic way. It should also work on MariaDB 10.0 and later though I have not had time to try that out yet.
queryprofiler works slightly differently to just querying
P_S once. It takes several collections of the data, computes deltas between each collection thus allowing you to know things like the number of queries per second which
events_statements_summary_by_digest does not tell you. (There is no information in performance_schema telling you when the collections start. That is something I miss and would like to see fixed in MySQL 5.8 if possible.)
The other difference of course is that P_S gives you information on one server. If you collect the information at the same time from more than one server with a similar load then the numbers you get out should be very similar and that is what
How do you use
queryprofiler? Provide it with one or more Go-style MySQL DSNs to connect to the servers and optionally tell it how many times to collect data from the servers (default: 10) and at what interval (default: every second) and it will run and give you the results, telling you the top queries seen (by elapsed time of the query) and the metrics for each server (queries per second, average query latency and how much these values vary).
A couple of examples of the output can be found here:
Hopefully you will find this tool useful. Feedback and patches to improve it are most welcome.
MySQL 5.7 GA was released a couple of months ago now with 5.7.9 and 5.7.10 has been published a few days ago. So far initial testing of these versions looks pretty good and both versions have proved to be stable.
I have, however, been bitten by a couple of gotchas which if you are not aware of them may be a bit of a surprise. This post is to bring them to your attention.
New MySQL accounts expire by default after 360 days
This is as per documentation, so there is no bug here. MySQL 5.7 provides a new more secure environment. One of the changes is to add password expiry and the default behaviour is for passwords expire after 360 days. This seems good, but you, perhaps like me, may not be accustomed to managing your passwords, checking for expiration and adjusting the MySQL user settings accordingly. The default setting of default_password_lifetime is 360 days, so after upgrading a server to MySQL 5.7 from MySQL 5.6 this setting suddenly comes to life. The good thing is nothing happens immediately so you do not see the time bomb ticking away. I had have been testing the DMR versions of MySQL 5.7 earlier to the GA release and consequently using it for longer than 2 months. Recently a couple of 5.7.9 servers which had been upgraded from 5.6 a year ago decided to block access to all applications at the same time. The quick fix is simple: change the default setting to 0 (no expiry) and we have a configuration that behaves like MySQL 5.6 even if it less secure than the default MySQL 5.7 setup. We can then look at how to manage the MySQL accounts and take this new setting into account in a more secure manner. If you are starting to use MySQL 5.7 and are not migrating from 5.6 then perhaps you’ll put in the right checks in place when you start, but those of us migrating from 5.6 can not push down grants with the new ALTER USER syntax until the 5.6 masters are upgraded so we need to pay more attention to this while in the progress of migration.
New range optimizer setting might cause unexpected table scans if not set properly
MySQL 5.7.9 GA added a new configuration variable: range_optimizer_max_mem_size, set by default to 1536000. The documentation does not say much about this new setting and seems quite harmless. “if … the optimizer estimates that the amount of memory needed for this method would exceed the limit, it abandons the plan and considers other plans.” The range optimiser is used for point selects, primary key lookups and other similar queries. What this setting does is after parsing a query look at the number of items which may be referenced in a WHERE clause and if the memory usage is too high fall back to a slower method.
Let’s put this into context. A query like SELECT some_columns FROM some_table WHERE id IN ( 1, 2, 3, ... big list of ids ... 99998, 99999 ) will trigger this limit being reached for a large enough range of ids. DELETE FROM some_table WHERE (pk1 = 1 AND pk2 = 11) OR (pk1 = 2 AND pk2 = 12) .. OR .. (pk1 = 111 AND pk2 = 121) /* pk1 and pk2 form a [primary] key */ would also potentially trigger this.
The questions that come out of this are (a) “How to figure out the point at which this change happens?”, and (b) “What happens at this point?”
The answer to (b) is simple: MySQL falls back to doing a table scan (per item). The answer to (a) is not so clear. Bug#78752 is a feature request to make this clearer, and further investigation pointed to MySQL 5.6’s previous behaviour where the limit was defined in terms of a fixed number of hard-coded “items” (16,000), whereas 5.7’s new behaviour is in terms of memory usage. The relationship between the two settings is not very clear and initial guestimates on systems I saw issues with seems to indicate that maybe 4kB per item is used by MySQL 5.7 at the moment. The point is that what worked quickly as point selects on 5.6 may fall back to table scans per item in 5.7 if the number of entries is too high, and this would require a reconfiguration (it is dynamic) of the configuration setting mentioned. The bad behaviour may also only happen depending on the size of the query.
Many people may wonder why anyone would be mad enough to use a SELECT or DELETE statement with several thousand entries in an IN () clause, but this comes from having split data in a single server into two and making the application find a list of ids from one server using some criteria and then using the ids obtained in a different one. I see that pattern used frequently and it is probably a common pattern on any system where data will no longer fit in a single server.
The problem with this particular change in behaviour is that point selects are very fast and efficient in MySQL. People use them a lot. Table scans are of course really slow, so depending on the query in question performance can change from ms to minutes just because your query is a tiny bit bigger than the new threshold. In practice it looks like the old hard-coded limit and the new dynamic limit are at least an order of magnitude different in size so it is quite easy to trip up on good queries in 5.6 failing miserably in 5.7 without a configuration change. Again while migrating from MySQL 5.6 to 5.7 you may see this change bite you.
You may get caught by either of these issues. I got caught by both of them while testing 5.7 and while the solutions to resolve them are quite simple to fix they do require a configuration change to resolve the issue. I hope this post at least makes you recognise them and know where to poke so you can make your new 5.7 servers behave properly again.
English: Madrid MySQL Users Group will be holding their next meeting on Tuesday, 10th November at 19:30h at the offices of Tuenti in Madrid. David Fernández will be offering a presentation “MySQL Automation @ FB”. If you’re in Madrid and are interested please come along. We have not been able to give much advance notice so if you know of others who may be interested please forward on this information. Full details of the MeetUp can be found here at the Madrid MySQL Users Group page.
Español: El día 10 de noviembre a las 19:30 Madrid MySQL Users Group tendrá su próxima reunión en las oficinas de Tuenti en Madrid. David Fernández nos ofrecerá una presentación (en inglés) “MySQL Automation @ FB”. Si estás en Madrid y interesado nos gustaría verte. No hemos podido avisar con mucha antelación, así que si conoces a otros que podrían estar interesados agradeceríamos les hagas llegar esta información. Se puede encontrar los detalles completos de la reunión aquí en la página de Madrid MySQL Users Group.
English: Madrid MySQL Users Group will be holding their next meeting on 17th June at 18:00h at EIE Spain in Madrid. Dimitri Vanoverbeke and Stéphane Combaudon from Percona will be offering two presentations for us:
- Practical MySQL optimisations
- Galera Cluster – introduction and where it fits in the MySQL eco-system
I think this is an excellent moment to learn new things and meet new people. If you’re in Madrid and are interested please come along. More information can be found here at the Madrid MySQL Users Group page.
Español: El día 17 de junio a las 18:00 Madrid MySQL Users Group tendrá su próxima reunión en las oficinas de EIE Spain. Dimitri Vanoverbeke y Stéphane Combaudon de Percona nos ofrecerá dos presentaciones (en inglés):
- Practical MySQL optimisations
- Galera Cluster – introduction and where it fits in the MySQL eco-system
Creo que será una oportunidad excelente para aprender algo nuevo y para conocer gente nueva. Si estás en Madrid y interesado nos gustaría verte. Se puede encontrar más información aquí en la página de Madrid MySQL Users Group.
Madrid MySQL Users Group is having a Meetup this afternoon, Wednesday, 13th May at 19:00.
- I will be presenting (in Spanish) a quick summary of Percona Toolkit and also offering a summary of the new features in MySQL 5.7 as the release candidate has been announced and we don’t expect new functionality.
- This is also an opportunity to discuss other MySQL related topics in a less formal manner.
- You can find information about the Meetup here.
So if you are in Madrid and are interested please come along.
El Madrid MySQL Users Group tiene una reunión esta tarde, miércoles 13 de mayo, a las 19:00.
- Ofreceré una presentación sobre Percona Toolkit y un resumen de las características nuevas de MySQL 5.7 que recientemente se anunció como Release Candidate. Ya no esperamos cambios en su funcionalidad.
- También habrá una oportunidad de hablar de otros temas relacionados con MySQL de una manera menos formal.
- Se puede encontrar información de la reunión aquí.
Si estás en Madrid y te interesa estarás bienvenido.
In November last year I announced a program I wrote called
pstop. I hope that some of you have tried it and found it useful. Certainly I know that colleagues and friends use it and it has proved helpful when trying to look inside MySQL to see what it is doing.
A recent suggestion provoked me to provide a slightly different interface to
pstop, that is rather than show the output in a terminal-like
top format, provide a line-based summary in a similar way to
pt-diskstats(1p) and other similar command line tools. I have now incorporated some changes which allow this to be done. So if you want to see every few seconds which tables are generating most load, or which files have most I/O then this tool may be useful. Example output is shown below:
$ pstop --stdout --count=3 --interval=10 --limit=4 --view=file_io_latency
pstop 0.4.0 - 18:29:38 myserver-0001 / 5.6.xx-log, up 99d 22h 42m 42s [REL] 0 seconds
File I/O Latency (file_summary_by_instance) 6 row(s)
Latency %| Read Write Misc|Rd bytes Wr bytes| Ops R Ops W Ops M Ops|Table Name
27.88 ms 43.1%| 1.6% 98.4%| 28.50 k| 40 50.0% 50.0%|<redo_log>
27.05 ms 41.8%| 23.1% 1.7% 75.3%|356.98 k 9.79 k| 1.91 k 98.0% 1.0% 1.0%|<binlog>
9.54 ms 14.7%|100.0% | 32.00 k | 2 100.0% |db.mytable1
70.32 us 0.1%|100.0% | 16.00 k | 1 100.0% |db.mytable2
64.66 ms 100.0%| 24.7% 1.4% 73.9%|436.98 k 38.29 k| 1.95 k 96.0% 2.0% 2.0%|Totals
pstop 0.4.0 - 18:29:48 myserver-0001 / 5.6.xx-log, up 99d 22h 42m 52s [REL] 10 seconds
File I/O Latency (file_summary_by_instance) 38 row(s)
Latency %| Read Write Misc|Rd bytes Wr bytes| Ops R Ops W Ops M Ops|Table Name
3.71 s 33.2%| 16.1% 3.8% 80.1%| 35.01 M 968.91 k|203.50 k 97.9% 1.0% 1.0%|<binlog>
3.17 s 28.3%| 1.5% 98.5%| 3.35 M| 4.34 k 50.0% 50.0%|<redo_log>
893.29 ms 8.0%| 94.9% 1.2% 3.9%| 1.89 M 4.41 M| 571 21.2% 49.4% 29.4%|db.mytable1
668.57 ms 6.0%| 77.8% 22.2%| 30.28 M| 683 50.1% 49.9%|<ibdata>
11.18 s 100.0%| 35.7% 7.0% 57.4%| 42.52 M 64.83 M|212.06 k 94.3% 3.1% 2.6%|Totals
pstop 0.4.0 - 18:29:58 myserver-0001 / 5.6.xx-log, up 99d 22h 43m 2s [REL] 10 seconds
File I/O Latency (file_summary_by_instance) 34 row(s)
Latency %| Read Write Misc|Rd bytes Wr bytes| Ops R Ops W Ops M Ops|Table Name
3.48 s 32.2%| 17.0% 3.8% 79.2%| 33.80 M 935.31 k|202.72 k 98.0% 1.0% 1.0%|<binlog>
2.92 s 27.1%| 1.6% 98.4%| 3.37 M| 4.34 k 50.0% 50.0%|<redo_log>
916.01 ms 8.5%| 92.7% 1.1% 6.2%| 1.98 M 4.58 M| 590 21.5% 49.7% 28.8%|db.mytable1
898.80 ms 8.3%| 98.0% 0.4% 1.7%| 2.09 M 1.52 M| 303 44.2% 32.0% 23.8%|db.mytable3
10.79 s 100.0%| 38.6% 6.3% 55.1%| 42.03 M 57.94 M|210.93 k 94.5% 3.0% 2.6%|Totals
Hopefully this gives you an idea. The
--help option gives you more details. I have not yet paid much attention to the output and the output is not currently well suited for a tool to parse, so I think it’s likely I will need to provide a more machine readable
--raw format option at a later stage. That said feedback on what you want to see or patches are most welcome.
Madrid MySQL Users Group will have its next meeting on Thursday, the 29th of January.
I will be giving a presentation on the MySQL binlog server and how it can be used to help scale reads and be used for other purposes. If you have (more than) a few slaves this talk might be interesting for you. The meeting will be in Spanish. I hope to see you there.
Details can be found on the group’s Meetup page here: http://www.meetup.com/Madrid-MySQL-users-group/events/219810484/
La próxima reunión de Madrid MySQL Users Group tendrá lugar el jueves 29 de enero. Ofreceré una presentación sobre el MySQL binlog server y como se puede utilizar para ayudar con la escalada de lecturas a la base de datos y para otros usos. La reunión será en español. Espero veros allí.
Se puede encontrar más detalles en la página del grupo: http://www.meetup.com/Madrid-MySQL-users-group/events/219810484/.
Madrid MySQL Users Group will have its next meeting on Tuesday, the 18th of December. Details can be found on the group’s Meetup page here: http://www.meetup.com/Madrid-MySQL-users-group/events/219081693/. This will be meeting number 10 of MMUG and the last meeting of the year. We plan to talk about MySQL, MariaDB and related things. An excuse to talk about our favourite subject. Come along and meet us. The meeting will be in Spanish. I hope to see you there.
La próxima reunión de Madrid MySQL Users Group tendrá lugar el jueves 18 de diciembre. Se puede encontrar más detalles en la página del grupo:http://www.meetup.com/Madrid-MySQL-users-group/events/219081693/. Esta será nuestra reunión número 10 y la última del año. Hablaremos sobre MySQL, MariaDB y cosas relacionadas. Será una excusa para hablar de nuestro tema favorito. Ven a vernos. La reunión será en español. Espero veros allí.
Madrid MySQL Users Group will have its next meeting on the 20th of November. Details can be found on the group’s Meetup page.
We plan to talk about pstop, which I’ve announced earlier and also the latest changes in MariaDB and MySQL since our last meeting. The meeting will be in Spanish. I hope to see you there.
La próxima reunión de Madrid MySQL Users Group tendrá lugar el jueves 20 de noviembre. Se puede encontrar más detalles en la página del grupo. Hablaremos sobre pstop y los últimos cambios en MariaDB y MySQL desde nuestra última reunión. La reunión será en español. Espero veros allí.
I have been working with MySQL for some time and it has changed significantly from what I was using in 5.0 to what we have now in 5.6. One of the biggest handicap we’ve had in the past is to not be able to see what MySQL is doing or why.
MySQL 5.5 introduced us to performance_schema. It was a good start but quite crude. MySQL 5.6 gave us a significant increase in stuff that allows you to see what is going on inside MySQL. That’s great, except it’s hard to read, the documentation is good, but not oriented at the DBA but more at the MySQL developer (that’s what it seems like at least). So most of us have ignored it. Others complained about the overhead and said it’s not good to use it.
Mark Leith developed mysql-sys as a way to see this great information in a more usable way. It’s only a set of views so doesn’t really have much overhead. However, one thing I missed was getting the information of what was happening inside performance_schema in real-time, top-like, so I could see where a server was busy, and what it was doing. So inspired by mysql-sys and also as a way for me to start playing with go I have built P_S top, or pstop.
You can find it on github here: https://github.com/sjmudd/pstop.
What does pstop show you? It takes some counters from performance_schema and subtracts the values from when it started up. The output is in four different screens which you toggle between using the <tab> key. The idea is to look at the total latency (wait time) and order by table or file that causes it in heaviest first. Table waits are also then split between read, insert, update and delete and there’s a screen which shows some locking information.
Access to the db server is currently via a ~/.my.cnf defaults file. I probably need to make this more sophisticated, and allow the credentials to be provided directly but have not done that yet. I have used this on a couple of systems which I monitor for work and it has been most informative in showing where the load is, which table or file generates it and how that varies over time. This information was already in performance_schema but there have not been any tools to get this out.
Here are a couple of examples:
Latency by table name
pstop 0.0.10 - 21:40:51 xxxxxxxxxxxxx / 5.6.16-log, up 13m 7s [ABS]
Latency by Table Name (table_io_waits_summary_by_table) 10 rows
Table Name Latency %| Fetch Insert Update Delete
wordpress.wp_comments 1.34 s 75.0%| 99.6% 0.4%
wordpress.wp_options 192.48 ms 10.8%| 99.1% 0.9%
wordpress.wp_posts 109.03 ms 6.1%|100.0%
wordpress.wp_commentmeta 54.86 ms 3.1%| 18.2% 81.8%
wordpress.wp_term_relationshi 27.85 ms 1.6%|100.0%
wordpress.wp_usermeta 18.35 ms 1.0%|100.0%
wordpress.wp_postmeta 16.90 ms 0.9%|100.0%
wordpress.wp_term_taxonomy 15.60 ms 0.9%|100.0%
wordpress.wp_terms 8.06 ms 0.5%|100.0%
wordpress.wp_users 2.06 ms 0.1%|100.0%
TOTALS 1.78 s 100.0%| 97.0% 2.9% 0.1%
Operations by table name
pstop 0.0.10 - 21:42:01 xxxxxxxxxxxxx / 5.6.16-log, up 14m 17s [ABS]
Operations by Table Name (table_io_waits_summary_by_table) 10 rows
Table Name Ops %| Fetch Insert Update Delete
wordpress.wp_comments 85.91 k 57.3%|100.0% 0.0%
wordpress.wp_options 25.79 k 17.2%| 99.8% 0.2%
wordpress.wp_term_relationshi 19.07 k 12.7%|100.0%
wordpress.wp_posts 8.37 k 5.6%|100.0%
wordpress.wp_postmeta 2.89 k 1.9%|100.0%
wordpress.wp_term_taxonomy 2.59 k 1.7%|100.0%
wordpress.wp_terms 2.42 k 1.6%|100.0%
wordpress.wp_usermeta 2.10 k 1.4%|100.0%
wordpress.wp_commentmeta 666 0.4%| 88.1% 11.9%
wordpress.wp_users 58 0.0%|100.0%
TOTALS 149.85 k 100.0%| 99.9% 0.1% 0.0%
Latency by filename
pstop 0.0.10 - 21:43:10 xxxxxxxxxxxxx / 5.6.16-log, up 15m 26s [ABS]
File I/O by filename (file_summary_by_instance)
Table Name Latency %| Read Write Misc|Rd bytes Wr bytes| Ops R Ops W Ops M Ops
<redo_log> 7.22 s 47.3%| 0.1% 99.9%| 68.00 k 82.50 k| 240 2.5% 47.5% 50.0%
<ibdata> 6.76 s 44.3%| 0.4% 0.4% 99.2%| 8.53 M 8.05 M| 763 55.0% 36.4% 8.5%
wordpress.wp_comments 1.09 s 7.2%| 83.8% 0.2% 16.0%|104.81 M 28.83 k|393.50 k 66.6% 0.0% 33.3%
wordpress.wp_options 71.20 ms 0.5%| 3.0% 2.4% 94.6%|608.92 k 352.00 k| 100 45.0% 22.0% 33.0%
wordpress.wp_posts 66.68 ms 0.4%| 82.9% 17.1%| 9.52 M | 21.47 k 65.6% 34.4%
wordpress.wp_commentmeta 12.38 ms 0.1%| 58.6% 24.0% 17.4%|207.73 k 57.48 k| 3.82 k 57.8% 8.8% 33.4%
wordpress.wp_usermeta 8.21 ms 0.1%| 14.3% 1.0% 84.7%|369.44 k 16.00 k| 43 69.8% 2.3% 27.9%
<binlog> 5.56 ms 0.0%| 0.9% 76.2% 22.9%| 24.12 k 121.44 k| 178 2.8% 85.4% 11.8%
wordpress.wp_term_relationshi 715.04 us | 72.8% 27.2%|144.64 k | 27 59.3% 40.7%
wordpress.wp_users 621.54 us | 76.0% 24.0%|130.99 k | 26 57.7% 42.3%
TOTALS 15.26 s 100.0%| 6.6% 0.3% 93.1%|125.37 M 8.69 M|421.77 k 66.3% 0.2% 33.5%
You can also see these screen samples here. I think that if you compile and build this and point it to a server of your own you’ll find the output much more interesting.
So please let me know what you think. I hope you find it interesting and useful.
A very old post of mine in 2009, MySQL’s stored procedure language could be so much more Useful suggested that it would be nice if MySQL could be adapted to use compound statements directly from the command line in a similar way to the language used for stored procedures. I’ve just seen that this seems to be possible now in MariaDB 10.1. See the release notes.
I now need to look at this. So thanks, it looks like this feature request is now available.