The differences between IDEMPOTENT and AUTO-REPAIR mode

I recently posted Lossless RBR for MySQL 8.0? about a concern I have with moving to minimal RBR in MySQL 8.0.  This seems to be the direction that Oracle is considering, but I am not sure it is a good idea as a default setting.

I talked about a hypothetical new replication mode lossless RBR and also about recovery after a crash where perhaps the data on the slave may get out of sync with the master. Under normal circumstances this should not happen but in the real world sometimes it does.

Note: I’m talking about an environment that does not use GTID.  GTID is good but may have its own issues and it’s probably best to leave those discussions to another post.

So let us talk about the difference between IDEMPOTENT mode (slave_exec_mode=IDEMPOTENT) and what I’ll call AUTO-REPAIR mode, mentioned in feature request bug#54250 filed with Oracle in 2010.  A DBA wants above all to avoid any data corruption, so the safest behaviour should be the default. Thus I’d prefer auto-repair mode to be off by default, stopping replication if any inconsistencies are found. I could then enable it when I see such an issue, as it should help me recover the state of the database without adding further “corruption” to the slave.

If I’m confident that this procedure works fine and I’m monitoring the counters mentioned below then it may be fine to leave it enabled all the time.
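
For reference, this is roughly how IDEMPOTENT mode is switched on today. The AUTO-REPAIR value and the counter names shown alongside it are purely hypothetical; they only illustrate how I imagine such a mode being enabled and monitored:

    -- Existing behaviour: suppress duplicate-key / key-not-found errors on the slave
    SET GLOBAL slave_exec_mode = 'IDEMPOTENT';   -- default is 'STRICT'

    -- Hypothetical AUTO-REPAIR mode (does not exist today): off by default,
    -- enabled only while recovering a slave known to be inconsistent
    -- SET GLOBAL slave_exec_mode = 'AUTO_REPAIR';

    -- The repair counters proposed below would ideally be visible like this
    -- (again, hypothetical status variable names):
    -- SHOW GLOBAL STATUS LIKE 'Slave_auto_repair_%';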

Consider a slave that fails: it may crash and then recover. It’s likely that the replication position it “remembers” is behind the actual state of the database.

If we use full RBR (the default setting) in these circumstances then we may receive a set of changes which the SQL thread tries to apply.

They’ll be in the form of:

before row image / after row image

before row image / after row image

where each row image is the set of column values prior to and after the row changes.  Traditionally we use the abbreviations BI and AI for this.

Currently the SQL thread will start up, look for the first row to change and, once it has found it, change it.  If the exact matching conditions it needs can not be found then an error is generated and replication stops.

IDEMPOTENT mode attempts to address this and tries to “continue whatever the cost”. To be honest I’m not exactly sure what it does, but it’s clear that it will either do nothing or perhaps it might try to find the row by primary key and update that row. I’d expect it probably does nothing.

See a comment later on.  I did go and check, and the documentation for slave_exec_mode says that it suppresses duplicate-key and no-key-found errors. There is no mention of updates where the full AI is unavailable (e.g. when using minimal RBR).

It also looks like it does not “repair” the issue, but it simply ignores it. The documentation is not 100% clear to me.

I made a comment about different options for AUTO-REPAIR mode and when it can work and when it can not. In FULL RBR mode it should always be able to do something. In MINIMAL RBR mode there will be cases when it can not. Let’s look at the FULL RBR case (a rough SQL sketch of these repair actions follows the list):

  1. For an UPDATE when the requested row can not be found:
    • auto-repair mode would INSERT the row. You have a full AI so you can do this safely.
    • A counter should be updated to record this action.
  2. For a DELETE row operation when the row can not be found:
    • auto-repair mode would ignore the error: since the row does not exist anyway, the effect of the DELETE has already been achieved.
    • A counter should be updated to record this action.
  3. For an INSERT row operation when the row already exists:
    • (Duplicate key on insert.) This is what generally breaks replication.
    • auto-repair mode would treat this as an UPDATE operation (based on the primary key of the table), ensuring the row keeps the same primary key and takes the column values of the AI.
    • Again a counter should be updated to record this action.
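
To make the three repair actions more concrete, here is a rough SQL equivalent of what the SQL thread could do, using a hypothetical table t whose primary key is id; the values come from the after image of the failing event:

    -- 1. UPDATE but the row is missing: we have the full AI, so insert it.
    INSERT INTO t (id, col1, col2) VALUES (42, 'ai_value1', 'ai_value2');

    -- 2. DELETE but the row is missing: nothing to do; the desired end state
    --    (row absent) is already true, so just bump a counter and move on.

    -- 3. INSERT but the row already exists (duplicate key): apply the AI as an update.
    INSERT INTO t (id, col1, col2) VALUES (42, 'ai_value1', 'ai_value2')
        ON DUPLICATE KEY UPDATE col1 = VALUES(col1), col2 = VALUES(col2);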

In FULL RBR mode these 3 actions should allow replication to continue. The database is no more corrupt than it was before. In fact it’s in a state that’s somewhat better.

In many cases other row events will proceed as expected without issue:  INSERTS will happen, UPDATES and DELETEs to existing rows will work as the row is found, and things will proceed as normal.

So should we get in a situation like this we can check the 3 counters and this gives us a clue as to the number of “repair actions” which MySQL has had to execute.  It also gives us an idea of how inconsistent the slave seems to be, though those inconsistencies should now have been removed.

As I said I can’t remember exactly what IDEMPOTENT mode does in these 3 circumstances.  It may do something similar to my AUTO-REPAIR mode or it may just skip the errors.

Why don’t I know?  Well, I’m currently on a plane, the MySQL documentation is not provided with my MySQL server software, and I’m not online, so I can’t check.  I used to find the info file or a PDF of the manual quite helpful in such situations and would love to see it put back again so I don’t need to speculate about what the documentation says.

Yes, I could update this text when I’m back online, but I think I’ll make the point and leave this paragraph here.

So with FULL RBR the situation seems to me to be clear. IDEMPOTENT mode may not do the same thing as the AUTO-REPAIR mode, and whether it does or not there are no counters to see the effect it produces on my server. So I’m blind. I do not like that.

Let’s change the topic slightly and now switch to MINIMAL RBR and do the same thing. In theory now IDEMPOTENT mode and AUTO-REPAIR mode may seem to be the same (assuming IDEMPOTENT mode changes what it can) but that’s also not entirely true.

With minimal RBR we get the primary key plus the changed columns for each row that changes. For INSERTs we get the full row and for DELETEs we only need the primary key. That should be enough.

What changes here are the UPDATEs: since we don’t get the full row image we can not know what was in the table before. We only have information on the new data, so columns which are not mentioned are unknown to us. If we are UPDATING a row and we can not find it, an INSERT is not possible as we do not have enough information to fill in the columns that are unknown to us. So replication MUST stop if we want to avoid corruption.

Additionally, with minimal RBR UPDATEs, even if you find the row to UPDATE you can not be sure you are doing the right thing as you have no reference to the content or state of the before image. My thought here was that the ideal thing would be to send with each row a checksum of the row content on the master.  This would be “small” (so efficient) and could be checked against the row content on the slave prior to making the update.  If the values match we know the RBR UPDATE is working on the expected data.  This makes a DBA feel more comfortable.
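
Purely as an illustration of the idea (the server would compute this internally, per row, when writing the binlog event; table and column names here are made up, and NULL handling would need more care than this), such a row checksum could be as simple as a CRC over the column values in a fixed order:

    -- Hypothetical: the master computes a small checksum of the full row it is
    -- about to update and ships it with the minimal-RBR event...
    SELECT CRC32(CONCAT_WS('|', id, col1, col2, col3)) AS row_checksum
    FROM t
    WHERE id = 42;

    -- ...and the slave runs the same calculation on its copy of the row before
    -- applying the change; a mismatch bumps a counter or stops replication.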

Table definitions on a master and its slaves are not always identical.  There are several reasons for this, such as different (major) versions of MySQL being used, or simply that, because it is impossible to take downtime on the server, some sort of out-of-band ALTER TABLE has been run on the slave and that change is still pending on the master. The typical case here is adding new columns, or changing the type, width, character set or collation of an existing column. In these circumstances the binary image of the row on the master and slave may well not be the same, so the before-row-image “checksum” from the master would not be usable.  To detect such a situation it may be necessary to also send a table definition checksum along with the row before-image checksum, though this could be sent once per set of events on a table rather than per row. The combination of the two values should be enough to ensure that minimal RBR changes can be validated even if we do not push a full before image into the binlog stream.

Again, if the definitions do not match it would seem sensible to update a counter to indicate the situation.  We probably do not want to stop replication in this case. Those who do not expect any sort of difference between master and slave may be paranoid enough to want to stop, but I know for my usage I’d like to monitor changes to the counter and probably just continue.
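
Again only as a sketch of the idea, a table-definition checksum could be derived from the column metadata the server already exposes; in practice the server would compute it from its own dictionary rather than via information_schema, and the schema/table names below are just examples:

    -- Hypothetical table-definition checksum, sent once per set of events on a table
    SELECT MD5(GROUP_CONCAT(column_name, ':', column_type, ':',
                            IFNULL(character_set_name, '')
                            ORDER BY ordinal_position SEPARATOR ',')) AS table_def_checksum
    FROM information_schema.columns
    WHERE table_schema = 'mydb' AND table_name = 't';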

Even my proposed LOSSLESS RBR would need this checksum to be safe as it would not contain the full before image but only the PK + all columns for an UPDATE operation, so potentially “slave drift” might happen and go undetected.

I can see, therefore, that optionally being able to add such checksums to minimal and lossless RBR would be a good way to ensure that replication works safely, pushes out the expected changes to the slaves, and catches unexpected inconsistencies.

The additional counters mentioned would help “catch” the number of inconsistencies that take place and they would be good even with the current replication setup when IDEMPOTENT mode is used. This lack of visibility of errors should make most DBAs rather sleepless, but I suspect there are those that are not aware and those that just have to live without that knowledge. Having these extra counters would help us see when things are not the same and allow us to take any necessary action based on that information should it be necessary.

I hope with this post I have clarified why IDEMPOTENT mode is not the same as my suggested AUTO-REPAIR mode, and when it is safe to continue replicating and when it is not, under a variety of different conditions which would normally make RBR stop.

It also seems clear to me that RBR would benefit from some additional checksums to allow the DBA to be more confident that the changes being made on the slave match those made on the master.  This is especially so when using minimal RBR.

As always, comments and feedback on this post are most welcome.

Lossless RBR for MySQL 8.0?

Lossless RBR

TL;DR: There’s been talk of moving the next release of MySQL to minimal RBR. I’d like to suggest an alternative: lossless RBR.

For MySQL 5.8 there has been talk of, and suggestions about, moving to minimal RBR as the default configuration (http://mysqlserverteam.com/planning-the-defaults-for-mysql-5-8/).  I’m not comfortable with this because it means that by default you do not have the full data of the changes to a table in the binlog stream.

The use of minimal RBR is an optimisation: it is done deliberately in busy environments where the amount of written data is large and it is not convenient or possible to keep all of it.  Additionally, the performance of full RBR can in some cases be an issue, especially for “fat/wide” tables.  It is true that minimal RBR helps considerably here.  There are several issues it resolves:

  • reduces network bandwidth between master and slaves
  • reduces disk i/o reading / writing the binlog files
  • reduces disk storage occupied by said binlogs

There was also a comment about enabling IDEMPOTENT mode by default on a slave.

This is a mode which basically ignores most errors. That does not seem wise.  As a DBA by default you want the server to not lose or munge data. There are times when you may decide to forgo that requirement, but the DBA should decide and the default behaviour should be safe.

Thus the idea of lossless RBR came to mind. What would this involve compared to the current modes of FULL or MINIMAL RBR? (A configuration sketch follows the list below.)

  1. INSERTs are unchanged (as now): you get the full row
  2. DELETEs are as per minimal RBR: the primary key is sent and the matching row is removed.  If on a slave the PKs differed and more than one row would be deleted, this should be treated as an error.
  3. UPDATEs: send the PK + the full new image, thus ensuring that all new data is sent. This reduces the event size by roughly half, so it would be especially good for fat tables and tables where large updates go through. If the PK columns do not change then it should be sufficient to send the new row image and the PK column names.
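
Configuration-wise, FULL and MINIMAL already exist as values of binlog_row_image (MySQL 5.6+); the LOSSLESS value sketched below is the hypothetical addition this post proposes:

    -- Existing options:
    SET GLOBAL binlog_row_image = 'FULL';     -- default: full before and after images
    SET GLOBAL binlog_row_image = 'MINIMAL';  -- before image = PK columns, after image = changed columns

    -- Proposed, does not exist: PK + full after image, no full before image
    -- SET GLOBAL binlog_row_image = 'LOSSLESS';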

Related to this behaviour it would be most convenient to implement an existing FR (bug#69223) to require that tables defined via CREATE/ALTER TABLE MUST HAVE a PK.  I’ve seen several issues where a developer did not think a primary key was important (they often forget about replication). Inserts would work fine, but any updates that happened afterwards would trigger a problem, not on the master but on all slaves.  I think that by default this behaviour should be enabled.  There may be situations where it needs to be disabled but they are likely to be rather limited.

This new mode, LOSSLESS RBR, is clearly a mix between full and minimal and it ensures that data pushed into a slave will always be complete.  I think that is a better target to aim for with MySQL 8.0 than the suggested MINIMAL RBR.

You may know that I do not like IDEMPOTENT mode much. I have created several FRs to add counters to “lost/ignored events” so we can see the impact of using this mode (usually it is used after an outage to keep replication going even if this may mean some data is not being updated correctly.  Usually this is better than having a slave with 100% stale data.)

I would also really like to see a “safe recovery mode” added, where statements which won’t damage the slave further are accepted.

The examples are in bug#54250 but basically include:

  1. INSERTs with duplicate key: convert to UPDATE
  2. DELETEs with row not found: ignore as the data has gone anyway
  3. UPDATEs with non-matching PK: convert to INSERT
  4. UPDATEs with non-matching columns: update what you can.  (This is likely to happen with full RBR as minimal RBR should never generate this type of error.)

[ For each of these 4 states: add counters to indicate how many times this has happened, so we can see if we’re “correcting” or “fixing” errors or not. ]

You’ll notice that lossless RBR would work perfectly with this even after a crash as you’ll have all the data you need, so you’ll never make the state of the database any worse than it was before.

I would like to see the FRs I’ve made regarding improving RBR implemented: whether or not lossless RBR becomes a new replication mode, they would help DBAs both diagnose and fix problems more easily than they can now.

It is probably also worth noting that FULL RBR is actually useful for a variety of scenarios, for example for exporting changes to other, non-MySQL systems.  What is missing for this is the definition of the tables; current systems need to extract that out of band, which is a major nuisance.  Exporting to external systems may not have happened that frequently in the past, but as larger companies use MySQL this becomes more and more important. For this type of system FULL RBR is probably needed even though it may not be used on the upstream master. I would expect that in most cases LOSSLESS RBR would also serve this purpose pretty well and reduce the replication footprint. The only environment that may need traditional FULL RBR is where auditing of ALL changes in a table is needed and thus both the before and after images are required.

Is it worth adding yet another replication mode to MySQL?  That is a good question and it may not be worth the effort.  However, the differences between FULL and LOSSLESS RBR should be minimal: the only difference is the amount of data that’s pushed into the binlog, so the scope of the changes should be more limited.  Improving replication performance seems to be a good goal: we all need that, but over-optimising should be considered more carefully.  I think we are still missing the monitoring metrics which help us diagnose and be better aware of issues in RBR, and the “tools” or improvements which would make recovery easier. Unless you live in the real world of systems which break, it is hard to understand why these “obscure” edge cases matter so much.  Responses like “just restart mysqld” may make sense in some environments, but really are not realistic in systems that run 24x7x365.  With replication it is similar: stopped replication is worse than replication that is working but where data may not be complete.  Depending on the situation you may tolerate that “incomplete data” (temporarily) while gaining the changes which your apps need to see.  However, it is vitally important to be able to measure the “damage”, and that is why counters like the ones indicated above are so vital. They allow you to distinguish between 1 broken row and 1,000,000, and to decide how to prioritise and deal with that as appropriate.

While I guess the MySQL replication developers are busy I would certainly be interested in hearing their thoughts on this possible new replication mode and would definitely prefer it over the suggested minimal RBR as a default for 8.0.  Both FULL and MINIMAL RBR have their place, but perhaps LOSSLESS would be a better default?  What do you think?

MMUG15: MySQL 5.7 & JSON

English: The Madrid MySQL Users Group is pleased to announce its next meeting on February 10th 2016 at 7pm at the offices of Tuenti in Gran Via, Madrid.  Morgan Tocker of Oracle will be visiting to give a talk on MySQL 5.7 and JSON as part of a European tour.  This will give you an excellent opportunity to learn about the new MySQL version 5.7 and the new JSON functionality it provides. Further information on the event and registration can be found on the MMUG web page here: http://www.meetup.com/Madrid-MySQL-users-group/events/227495184/.  We look forward to seeing you then.

Spanish: The Madrid MySQL Users Group is pleased to announce its next meeting on 10 February 2016 at 19:00 at the offices of Tuenti in Gran Vía, Madrid.  Morgan Tocker of Oracle will visit us, as part of a European tour, to give a talk (in English) on MySQL 5.7 and JSON.  The talk will give you an excellent opportunity to learn about the new MySQL version 5.7 and the JSON functionality it offers. More information about the event and how to register can be found on the MMUG web page here: http://www.meetup.com/Madrid-MySQL-users-group/events/227495184/.  The presentation will be in English but there will be the chance to talk and ask questions in Spanish.  We look forward to seeing you there.

Is MySQL X faster than MySQL Y? – Ask queryprofiler

When trying out new software there are many questions you may ask, and one of them is going to be the one above. The answer requires you to have built your software to capture and record low-level database metrics, and often the focus of application developers is slightly different: they focus on how fast the application runs but do not pay direct attention to the speed of each MySQL query they generate, at least under normal circumstances. So often they are not able to answer the question.

I have been evaluating MySQL 5.7 for some time, but only since its change to GA status has the focus switched to checking for any remaining issues and also to determining whether, in the systems I use, performance is better or worse than with MySQL 5.6.  The answers here are very application and load specific and I wanted a tool to help me answer that question more easily.

Since MySQL 5.6, the performance_schema database has had a table performance_schema.events_statements_summary_by_digest which shows collected metrics on normalised versions of queries. This allows you to see which queries are busiest and gives you some metrics on those queries such as minimum, maximum and average query times.
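
For anyone who has not looked at this table before, a minimal example of pulling out the busiest normalised statements looks something like the query below; timer values are in picoseconds, so they are converted to seconds here:

    SELECT schema_name,
           digest_text,
           count_star                    AS executions,
           ROUND(sum_timer_wait/1e12, 3) AS total_latency_s,
           ROUND(avg_timer_wait/1e12, 6) AS avg_latency_s
    FROM performance_schema.events_statements_summary_by_digest
    ORDER BY sum_timer_wait DESC
    LIMIT 10;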

I used this information and built queryprofiler to allow me to collect these metrics in parallel from one or more servers and thus allow me to compare the behaviour of these servers against each other. This allows me to answer the question that had been nagging me for some time in a completely generic way.  It should also work on MariaDB 10.0 and later though I have not had time to try that out yet.

queryprofiler works slightly differently to just querying P_S once. It takes several collections of the data and computes the deltas between each collection, thus allowing you to know things like the number of queries per second, which events_statements_summary_by_digest alone does not tell you. (There is no information in performance_schema telling you when the collections start. That is something I miss and would like to see fixed in MySQL 5.8 if possible.)
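
The delta idea itself is easy to reproduce by hand: snapshot the table twice and diff the counters. This is roughly what queryprofiler automates; the snapshot table names below are throwaway names for illustration only:

    -- First snapshot
    CREATE TEMPORARY TABLE snap1 AS
        SELECT digest, digest_text, count_star, sum_timer_wait
        FROM performance_schema.events_statements_summary_by_digest;

    -- ... wait a while, then take a second snapshot the same way ...
    CREATE TEMPORARY TABLE snap2 AS
        SELECT digest, digest_text, count_star, sum_timer_wait
        FROM performance_schema.events_statements_summary_by_digest;

    -- Queries per second and average latency over the interval
    SELECT s2.digest_text,
           (s2.count_star - s1.count_star) / 10 AS qps,   -- assuming a 10 second gap
           (s2.sum_timer_wait - s1.sum_timer_wait)
               / NULLIF(s2.count_star - s1.count_star, 0) / 1e12 AS avg_latency_s
    FROM snap2 s2
    JOIN snap1 s1 USING (digest)
    ORDER BY (s2.sum_timer_wait - s1.sum_timer_wait) DESC
    LIMIT 5;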

The other difference of course is that P_S gives you information on one server. If you collect the information at the same time from more than one server with a similar load then the numbers you get out should be very similar and that is what queryprofiler does.

How do you use queryprofiler?  Provide it with one or more Go-style MySQL DSNs to connect to the servers and optionally tell it how many times to collect data from the servers (default: 10) and at what interval (default: every second) and it will run and give you the results, telling you the top queries seen (by elapsed time of the query) and the metrics for each server (queries per second, average query latency and how much these values vary).

A couple of examples of the output can be found here:

Hopefully you will find this tool useful.  Feedback and patches to improve it are most welcome.

A Couple of MySQL 5.7 gotchas to be aware of

MySQL 5.7 went GA a couple of months ago with 5.7.9, and 5.7.10 was published a few days ago.  So far initial testing of these versions looks pretty good and both have proved to be stable.

I have, however, been bitten by a couple of gotchas which if you are not aware of them may be a bit of a surprise. This post is to bring them to your attention.

New MySQL accounts expire by default after 360 days

This is as per the documentation, so there is no bug here. MySQL 5.7 provides a new, more secure environment, and one of the changes is to add password expiry: the default behaviour is for passwords to expire after 360 days.  This seems good, but you, perhaps like me, may not be accustomed to managing your passwords, checking for expiration and adjusting the MySQL user settings accordingly.  The default setting of default_password_lifetime is 360 days, so after upgrading a server from MySQL 5.6 to MySQL 5.7 this setting suddenly comes to life. Nothing happens immediately, so you do not see the time bomb ticking away. I had been testing the DMR versions of MySQL 5.7 prior to the GA release and have consequently been using it for longer than 2 months.  Recently a couple of 5.7.9 servers which had been upgraded from 5.6 a year ago decided to block access to all applications at the same time.  The quick fix is simple: change the default setting to 0 (no expiry) and we have a configuration that behaves like MySQL 5.6, even if it is less secure than the default MySQL 5.7 setup. We can then look at how to manage the MySQL accounts and take this new setting into account in a more secure manner.  If you are starting with MySQL 5.7 and are not migrating from 5.6 then perhaps you will put the right checks in place from the start, but those of us migrating from 5.6 can not push down grants with the new ALTER USER syntax until the 5.6 masters are upgraded, so we need to pay more attention to this while the migration is in progress.
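
If you hit this, the workarounds are straightforward; the account name below is just an example:

    -- Server-wide: behave like 5.6 again (no automatic expiry); also settable in my.cnf
    SET GLOBAL default_password_lifetime = 0;

    -- Per account, on a 5.7 server:
    ALTER USER 'app'@'10.0.0.%' PASSWORD EXPIRE NEVER;

    -- Check which accounts may be affected:
    SELECT user, host, password_lifetime, password_last_changed
    FROM mysql.user;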

New range optimizer setting might cause unexpected table scans if not set properly

MySQL 5.7.9 GA added a new configuration variable, range_optimizer_max_mem_size, set by default to 1536000. The documentation does not say much about this new setting and it seems quite harmless: “if … the optimizer estimates that the amount of memory needed for this method would exceed the limit, it abandons the plan and considers other plans.”  The range optimiser is used for point selects, primary key lookups and other similar queries.  What this setting does is, after parsing a query, look at the number of items referenced in the WHERE clause and, if the estimated memory usage is too high, fall back to a slower method.

Let’s put this into context. A query like

    SELECT some_columns FROM some_table
    WHERE id IN (1, 2, 3, ... big list of ids ..., 99998, 99999)

will trigger this limit being reached for a large enough list of ids. A query like

    DELETE FROM some_table
    WHERE (pk1 = 1 AND pk2 = 11)
       OR (pk1 = 2 AND pk2 = 12)
       ...
       OR (pk1 = 111 AND pk2 = 121)  /* pk1 and pk2 form a [primary] key */

would also potentially trigger this.

The questions that come out of this are (a) “How to figure out the point at which this change happens?”, and (b) “What happens at this point?”

The answer to (b) is simple: MySQL falls back to doing a table scan (per item). The answer to (a) is not so clear. Bug#78752 is a feature request to make this clearer, and further investigation pointed to MySQL 5.6’s previous behaviour, where the limit was defined as a fixed number of hard-coded “items” (16,000), whereas 5.7’s new behaviour is defined in terms of memory usage.  The relationship between the two settings is not very clear, and initial guesstimates on the systems where I saw issues seem to indicate that MySQL 5.7 currently uses maybe 4 kB per item. The point is that what worked quickly as point selects on 5.6 may fall back to table scans per item in 5.7 if the number of entries is too high, and this requires a reconfiguration (it is dynamic) of the setting mentioned. The bad behaviour may also only show up depending on the size of the query.
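
The immediate workaround is to raise, or remove, the limit; the variable is dynamic so no restart is needed. A value of 0 means “no limit” according to the documentation, though you may prefer a finite value sized to your biggest IN() lists (table and column names below reuse the earlier made-up example):

    -- Check the current value
    SHOW GLOBAL VARIABLES LIKE 'range_optimizer_max_mem_size';

    -- Raise the limit, or use 0 for no limit
    SET GLOBAL range_optimizer_max_mem_size = 0;
    -- (new connections pick this up; an existing session can use SET SESSION)

    -- Verify the plan no longer degrades to a scan for the big IN() list
    EXPLAIN SELECT some_columns FROM some_table
    WHERE id IN (1, 2, 3 /* ... */, 99998, 99999);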

Many people may wonder why anyone would be mad enough to use a SELECT or DELETE statement with several thousand entries in an IN () clause, but this comes from having split the data from a single server into two, making the application find a list of ids on one server using some criteria and then use the ids obtained on the other. I see that pattern used frequently and it is probably a common pattern on any system where the data no longer fits in a single server.

The problem with this particular change in behaviour is that point selects are very fast and efficient in MySQL. People use them a lot. Table scans are of course really slow, so depending on the query in question performance can change from ms to minutes just because your query is a tiny bit bigger than the new threshold. In practice it looks like the old hard-coded limit and the new dynamic limit are at least an order of magnitude different in size so it is quite easy to trip up on good queries in 5.6 failing miserably in 5.7 without a configuration change. Again while migrating from MySQL 5.6 to 5.7 you may see this change bite you.

You may get caught by either of these issues. I got caught by both of them while testing 5.7, and while the solutions are quite simple they do require a configuration change. I hope this post at least makes you recognise them and know where to poke so you can make your new 5.7 servers behave properly again.

MMUG14: MySQL Automation at Facebook

English: Madrid MySQL Users Group will be holding their next meeting on Tuesday, 10th November at 19:30h at the offices of Tuenti in Madrid. David Fernández will be offering a presentation “MySQL Automation @ FB”.  If you’re in Madrid and are interested please come along. We have not been able to give much advance notice so if you know of others who may be interested please forward on this information.  Full details of the MeetUp can be found here at the Madrid MySQL Users Group page.

Spanish: On 10 November at 19:30 the Madrid MySQL Users Group will hold its next meeting at the offices of Tuenti in Madrid.  David Fernández will give us a presentation (in English), “MySQL Automation @ FB”.  If you are in Madrid and interested we would love to see you.  We have not been able to give much advance notice, so if you know others who might be interested we would appreciate you passing this information on. Full details of the meeting can be found here on the Madrid MySQL Users Group page.

MMUG13: Practical MySQL Optimisation and Galera Cluster presentations

English: Madrid MySQL Users Group will be holding their next meeting on 17th June at 18:00h at EIE Spain in Madrid. Dimitri Vanoverbeke and Stéphane Combaudon from Percona will be offering two presentations for us:

  • Practical MySQL optimisations
  • Galera Cluster – introduction and where it fits in the MySQL eco-system

I think this is an excellent moment to learn new things and meet new people. If you’re in Madrid and are interested please come along.  More information can be found here at the Madrid MySQL Users Group page.

Spanish: On 17 June at 18:00 the Madrid MySQL Users Group will hold its next meeting at the offices of EIE Spain.  Dimitri Vanoverbeke and Stéphane Combaudon from Percona will offer two presentations (in English):

  • Practical MySQL optimisations
  • Galera Cluster – introduction and where it fits in the MySQL eco-system

I think it will be an excellent opportunity to learn something new and to meet new people. If you are in Madrid and interested we would love to see you.  More information can be found here on the Madrid MySQL Users Group page.

 

MMUG12: Talk about Percona Toolkit and the new features of MySQL 5.7

Madrid MySQL Users Group is having a Meetup this afternoon, Wednesday, 13th May at 19:00.

  • I will be presenting (in Spanish) a quick summary of Percona Toolkit and also offering a summary of the new features in MySQL 5.7 as the release candidate has been announced and we don’t expect new functionality.
  • This is also an opportunity to discuss other MySQL related topics in a less formal manner.
  • You can find information about the Meetup here.

So if you are in Madrid and are interested please come along.

The Madrid MySQL Users Group is having a meeting this afternoon, Wednesday 13 May, at 19:00.

  • I will give a presentation on Percona Toolkit and a summary of the new features of MySQL 5.7, which was recently announced as a Release Candidate, so we no longer expect changes in its functionality.
  • There will also be an opportunity to discuss other MySQL-related topics in a less formal manner.
  • Information about the meeting can be found here.

If you are in Madrid and interested, you will be most welcome.

 

new to pstop – vmstat style stdout interface

In November last year I announced a program I wrote called pstop. I hope that some of you have tried it and found it useful. Certainly I know that colleagues and friends use it and it has proved helpful when trying to look inside MySQL to see what it is doing.

A recent suggestion prompted me to provide a slightly different interface to pstop: rather than showing the output in a terminal-like top format, it can provide a line-based summary in a similar way to vmstat(8), pt-diskstats(1p) and other similar command-line tools.  I have now incorporated some changes which allow this to be done. So if you want to see, every few seconds, which tables are generating the most load, or which files have the most I/O, then this tool may be useful. Example output is shown below:

Hopefully this gives you an idea.  The --help option gives you more details. I have not yet paid much attention to the output format and it is not currently well suited for parsing by other tools, so I think it’s likely I will need to provide a more machine-readable --raw format option at a later stage.  That said, feedback on what you want to see, or patches, are most welcome.

MMUG11: Talk about binlog servers at Madrid MySQL Users Group meeting on 29th January 2015

Madrid MySQL Users Group will have its next meeting on Thursday, the 29th of January.

I will be giving a presentation on the MySQL binlog server and how it can be used to help scale reads and be used for other purposes.  If you have (more than) a few slaves this talk might be interesting for you.  The meeting will be in Spanish. I hope to see you there.

Details can be found on the group’s Meetup page here: http://www.meetup.com/Madrid-MySQL-users-group/events/219810484/

The next Madrid MySQL Users Group meeting will take place on Thursday 29 January. I will give a presentation on the MySQL binlog server and how it can be used to help scale reads to the database and for other purposes. The meeting will be in Spanish. I hope to see you there.

More details can be found on the group’s page: http://www.meetup.com/Madrid-MySQL-users-group/events/219810484/.

MMUG10: Madrid MySQL Users Group meeting to take place on 18th December 2014

Madrid MySQL Users Group will have its next meeting on Thursday, the 18th of December. Details can be found on the group’s Meetup page here: http://www.meetup.com/Madrid-MySQL-users-group/events/219081693/.  This will be meeting number 10 of MMUG and the last meeting of the year. We plan to talk about MySQL, MariaDB and related things. An excuse to talk about our favourite subject. Come along and meet us.  The meeting will be in Spanish. I hope to see you there.

The next Madrid MySQL Users Group meeting will take place on Thursday 18 December. More details can be found on the group’s page: http://www.meetup.com/Madrid-MySQL-users-group/events/219081693/.  This will be our meeting number 10 and the last of the year. We will talk about MySQL, MariaDB and related things.  An excuse to talk about our favourite subject.  Come and see us. The meeting will be in Spanish.  I hope to see you there.

MMUG9: Madrid MySQL Users Group meeting to take place on 20th November 2014

Madrid MySQL Users Group will have its next meeting on the 20th of November. Details can be found on the group’s Meetup page.

We plan to talk about pstop, which I announced earlier, and also the latest changes in MariaDB and MySQL since our last meeting.  The meeting will be in Spanish. I hope to see you there.

The next Madrid MySQL Users Group meeting will take place on Thursday 20 November. More details can be found on the group’s page.  We will talk about pstop and the latest changes in MariaDB and MySQL since our last meeting.  The meeting will be in Spanish.  I hope to see you there.

pstop – a top-like program for MySQL (based on performance_schema)

I have been working with MySQL for some time and it has changed significantly from what I was using in 5.0 to what we have now in 5.6. One of the biggest handicaps we have had in the past is not being able to see what MySQL is doing, or why.

MySQL 5.5 introduced us to performance_schema. It was a good start but quite crude. MySQL 5.6 gave us a significant increase in instrumentation that allows you to see what is going on inside MySQL. That’s great, except it’s hard to read: the documentation is good, but oriented more at the MySQL developer than at the DBA (that’s what it seems like at least). So most of us have ignored it. Others complained about the overhead and said it’s not good to use it.

Mark Leith developed mysql-sys as a way to see this great information in a more usable way. It’s only a set of views so it doesn’t really have much overhead. However, one thing I missed was getting information on what was happening inside performance_schema in real time, top-like, so I could see where a server was busy and what it was doing. So, inspired by mysql-sys and also as a way for me to start playing with Go, I have built P_S top, or pstop.

You can find it on github here: https://github.com/sjmudd/pstop.

What does pstop show you?  It takes some counters from performance_schema and subtracts the values from when it started up. The output is in four different screens which you toggle between using the <tab> key.  The idea is to look at the total latency (wait time) and order by the table or file that causes it, heaviest first.  Table waits are also split between read, insert, update and delete, and there’s a screen which shows some locking information.
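
The raw data behind the “latency by table” view presumably comes from the familiar table I/O summary in performance_schema; a quick manual equivalent of that screen would be something like:

    SELECT object_schema,
           object_name,
           ROUND(sum_timer_wait/1e12, 3) AS total_latency_s,
           count_read, count_insert, count_update, count_delete
    FROM performance_schema.table_io_waits_summary_by_table
    ORDER BY sum_timer_wait DESC
    LIMIT 10;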

Access to the db server is currently via a ~/.my.cnf defaults file. I probably need to make this more sophisticated, and allow the credentials to be provided directly but have not done that yet.  I have used this on a couple of systems which I monitor for work and it has been most informative in showing where the load is, which table or file generates it and how that varies over time.  This information was already in performance_schema but there have not been any tools to get this out.

Here are a couple of examples:

Latency by table name

Operations by table name

Latency by filename

You can also see these screen samples here. I think that if you compile and build this and point it to a server of your own you’ll find the output much more interesting.

So please let me know what you think. I hope you find it interesting and useful.

MariaDB 10.1 Brings Compound Statements

A very old post of mine from 2009, MySQL’s stored procedure language could be so much more useful, suggested that it would be nice if MySQL could be adapted to use compound statements directly from the command line, in a similar way to the language used for stored procedures. I’ve just seen that this seems to be possible now in MariaDB 10.1. See the release notes.

I now need to look at this. So thanks, it looks like this feature request is now available.

Making MySQL Better More Quickly

With the upcoming release of MySQL 5.7 I begin to see a problem which I think needs attention at least for 5.8 or whatever comes next.

  • The GA release cycle is too long, being about 2 years and that means 3 years between upgrades in a production environment
  • More people use MySQL and the data it holds becomes more important. So playing with development versions while possible becomes harder.  This is bad for Oracle as they do not get the feedback they need to adjust the development of new features and have to best guess the right choices.
  • Production DBAs do want new features and crave them if it makes our life easier, if performance improves, but we also have to live in an environment which is sufficiently stable.  This is a hard mixture of requirements to work with.
  • In larger environments the transition from one major version to another, even when automated, can take time. If any gotcha comes along then it may interrupt that process and leave us with a mixed environment of old and new, or simply unable to upgrade at all.  Usually that pause may not be long, but even new minor versions of MySQL are not released that frequently, so the path from getting an issue fixed to seeing it released and then upgrading all servers to the new version means yet another round of upgrades.

I would like to see Oracle provide new features and make MySQL better. They are doing that and it is clear that since I have been using 5.0 professionally up to the current 5.7 a huge amount has changed. The product is much more stable and performs much better, but my workload has also increased so I am still looking for more features and an easier life. I am an optimist that is for sure.

One issue that I believe holds back earlier experimentation is that MySQL is not modular. Even the engines that you can use in it, if built as plugins, do not seem to be switchable from one minor version to another.

This leads to 2 issues:

  • any breakage or bug (and all software has bugs, that is inevitable) requires you when it is fixed to upgrade to a new version. That new version has changes in many different components. Sometimes that is fine but sometimes that may bring in new bugs which cause their own problems
  • potentially the developers of MySQL could replace a “GA module” with a more experimental version of that module which maybe has more features, could perform better but maybe breaks. Changing a single module is hopefully much safer than changing a full binary for a development version, and that should be much easier to do on spare machines. A module such as this would be something I could much more easily test than installing 5.7.4 on lots of machines.

However, the problem is that MySQL is not modular and that is where several people have explained to me my madness and how hard it is to achieve things like this. My current employer likes to push out changes in small chunks, look at the result of those small changes and then if they seem good, go ahead and do more. If something goes wrong, back it out and look elsewhere to do things. Doing the same on a database server not designed that way may well be hard, but making small changes along these lines would I think longer term help improve things and give the people that use a GA MySQL the opportunity to try out new ideas, give feedback quickly and allow things to evolve.

Inevitably when you start to build interfaces like this, some interfaces need to change to allow for a larger redesign of the innards of the system. That is fine: when it happens we’ll move over to that, and a DEV version will have these new, much improved features, and we may have to wait longer for that.

What modules might I be talking about when I talk about modularising MySQL?  I’ll agree I do not know the code other than having glanced at it on several occasions but there are some quite clear functional parts to MySQL:

  • the engines have often been plugins, though now InnoDB is a bit of an exception. I still wonder if that is necessary whatever MySQL’s design.  However these plugins do not seem to have a completely clear interface with MySQL, as I have seen plugins, for example for something like Spider or TokuDB, which work only for a specific MySQL or MariaDB version. That just shows that whatever this interface is, it is not designed to be stable and swappable between different MySQL minor versions.  Doing something to make that better would mean that people who build a new engine can build it once for a major version and know that, on binaries built the same way, the files they produce should just plug in without issue, unchanged. Am I dreaming? Perhaps, but no-one worries when I upgrade my db4 rpm from 4.7.25 to 4.7.29 that all the applications that use it will break: the expectation is clear: it should not make any difference at all. Why does something like this not work with MySQL engine code?
  • logging has been rather inconsistent for a long time. I think it may improve in 5.7, but however it’s built, build it as a module. If I want to replace that module with something new that stores all my log data in a Sybase or DB2 database MySQL should not care, assuming the module does the right thing and there are settings to configure this appropriately.  The point being also that if there is a bug in the logging, the bug can be fixed and the module replaced with a bug-free version, without necessarily requiring me to upgrade the whole server.
  • Replication is generally split into 2 parts: the writing of the binlogs, and the reading of those binlogs from the master, storing them locally as relay logs and then reading and processing them.
    • I have seen bugs in replication, mainly in the more complex SQL thread component where the same change could potentially apply. Swap out the module for a fixed one.
    • MySQL 5.6 was supposed to make life great with replication and we would not get stuck in a situation where a crashed server would come up, out of sync with its master, and because of that we would need to reclone the server again. Even when moving over to using the master_info_repository and relay_log_info_repository settings to TABLE you can have issues. The quick fix implemented by Oracle of relay_log_recovery = 1 sounds great. It is a quick, cheap and cheerful solution which works assuming you never have delayed slaves.  Different environments I maintain do not follow this pattern and I have servers with a deliberate multi-hour delay, which can be useful for recovering from issues. Also copying large databases between datacentres may take several days, triggering after starting the system a need to pull logs and process them for several days. A mistaken restart would lose all that data and require it to be downloaded again which is costly. So I have discussed with colleagues a theoretical improved behaviour of the I/O thread should MySQL crash but there is no way to test it on boxes I currently use. Making the I/O thread into a module would make it much easier to try out different ideas on GA boxes to show whether these ideas are really workable or not.
  • The query parser and optimiser in MySQL is supposed to be a horrendous beast that everyone must keep clear of.  Improvements are happening and posts like this are an indication of progress. My understanding is that this beast is spread all over the server code and thus hard to untangle but certainly from a theoretical point of view doing so would allow alternative optimisers to be usable/pluggable, and for example different optimisers might be better at handling workloads such as batch based workloads with sub queries and such which MySQL is known not to handle well, but which for certain workloads could potentially make a great deal of difference to us all.  The MySQL of 5.0 is quite different from the MySQL of today and sharding is the norm, but that requires help from the app to do all the dirty work. Other options are to use something like Vitess, ScaleBase, or Spider, or some built-in new module which knows about this type of thing better and can do this sort of stuff transparently to the application. MySQL Fabric tries to do this at the application level and that’s fine, but it adds much more complexity for the application developers who probably should not really have to worry (too much) about this type of detail.  So solving the problem is not the issue here, it’s providing hooks to let others try, or simply to swap out version 1 with version 10, and see if version 10 is better and faster, with everything else unchanged.
  • The handling of memory in MySQL has always been interesting to us all. Each engine has traditionally managed the memory it needs itself and there is no concept of sharing, or memory pressure, all of which can lead to sudden memory explosions due to a changing workload which may kill mysqld (Linux OOM) or trigger swapping (database servers should never swap…). I have seen in 5.7 that there is now some memory instrumentation and this at least allows looking to see where memory is used. The next step would be to use the same memory management routines, and finally perhaps to add this concept of memory pressure allowing a large query if needed to page out or reduce the size of the innodb buffer pool while it is running, or the heavy use of some MyISAM or Aria tables could do the same.  Doing that is hard, but we are no longer using a MySQL “toy” database. Many large billion $ companies depend on MySQL so this sort of functionality would be most welcome there I am sure.  Changes in this area would certainly need to be done cautiously but I can envisage swapping out the default 5.8 memory manager for a “new feature” 5.9 version with all the “if it breaks you keep the bits” warnings attached, allowing us to see if indeed problematic memory behaviour is resolved by this new module.
  • The event scheduler is in theory a tiny component which does its thing.  An early version of 5.5 had some bugs and I had to wait a long time for a server upgrade just to fix this pesky event_scheduler module, when all it does is send out heartbeat changes used for measuring replication delay.  Had this been a module I could have installed a fixed version and not had to use a workaround for several months.

I am sure there are lots of other components of MySQL which could receive the same treatment.

Making these sort of changes is of course a huge project and most managers do not see the gain of this, certainly not short term.  However, if care is taken and as different subsystems are modified there is an opportunity for making progress and allowing the sort of experimentation I describe.  Also, and while Oracle may not see it this way, having a clearer interface and more modular framework would allow others to perhaps try different things, and replace a module with their own.  Oracle do seem to be putting a lot of resources into MySQL and that is good, but they do not have infinite resources and they can not solve specialised or every need that we might see. Making it easier, for those who can, to use this hypothetical modular framework, provides an opportunity for some things to be done which can not be done now.  Add a bounty feature and let people pay for that and where something is modularised it will be much easier for them to try to solve problems that may come up. In any case, later testing will be easier if these interfaces exist.

This is the way I would like to see MySQL improve, notice I do not actually talk about functional improvements, but how to make it potentially easier to experiment and test these new features. This sort of design change would allow those of us that need new features now to test and perhaps include them in our GA versions. Maybe then the definition of GA will become rather vague if I am using 5.7.10 + innodb 5.8.1 + io_thread_5.8.3 + sql_thread_5.8.6 + event_scheduler….. Support will probably hate the suggestion I have just made as it would potentially make their life more challenging, but then again I do not see most people playing this game. It is meant for those of us who need it, and if not needed at all bug fixing specific issues should be much easier than now, where you need to do a full new test on a new version to make sure you do not catch another set of new bugs.

If you have got to the end of this thanks for reading. I need to learn to write less but I do believe that the reasoning I make above makes a lot of sense. This can only be done with small changes and with people seeing the idea and trying it out, and at least initially doing it on parts of the system which are easy to do. If they work further progress can be made.

Oracle and MariaDB both want feedback and ideas of where we want MySQL / MariaDB to go.  Independently of some of the technical aspects of new features and improvements this is my 2 cents of one thing I would like to see and why.

Does it make sense?

MariaDB 10.0 upgrade goes smoothly

I have been meaning to update some systems to MariaDB 10.0 and finally had a bit of time to get around to that.  The documentation of specifics of what’s needed to go from MariaDB 5.5 to 10.0 can be found here and while it’s not very long it seems there’s little to actually do.

Having already upgraded some servers from MySQL 5.5 to 5.6, the process and appropriate configuration changes were very similar, so all in all it was a rather non-event.

One thing which is always a concern if systems can not be down for long is the time to do the upgrade. While you see many blog posts talking about taking a backup via mysqldump and then loading it all back this is not really an option on many systems I manage and a replacement of binaries, adjustment of /etc/my.cnf  and restart of the server with the new binaries followed by running mysql_upgrade is what I usually do.  That usually works fine.

One server I had to upgrade had quite a few files (15,000) and the database occupied 2 TB. The run time of mysql_upgrade on such a system takes a couple of hours of checking tables after which the actual system changes take almost no time at all. So that is something to be aware of if your dataset is similar.

It seems I was confused by Colin’s post on performance_schema being disabled in MariaDB 10.0.12 and later. It turns out that it is just disabled at startup, so it can easily be enabled if desired by setting performance_schema = 1 in /etc/my.cnf prior to starting mysqld. I had thought that it was not compiled into the binaries at all, which is what’s been done in WebScaleSQL. That is not the case.

There’s a lot of talk about performance_schema overhead, even recently from colleagues, so this is a subject which needs looking at in more detail and, if there is indeed an unacceptable overhead, then that needs addressing. I’m sure Oracle or MariaDB would appreciate reports of specific issues as otherwise there’s just too much FUD out there.

Anyway, the MariaDB 10.0 servers I upgraded did have P_S and my configuration enabled it. That’s nice, as now I can see where some of the load and performance points are, and I had thought that would not be possible. I also tried Mark Leith’s mysql-sys and was not sure if it would work in MariaDB 10.0, but a quick look seems to indicate it does, which is helpful.

The views in this sys schema are very useful but some care is needed when using them: several performance_schema tables are joined and, if the number of rows involved is high, and given that performance_schema has no indexes, queries can be very costly, taking minutes to run. That’s not been mentioned, so take a little care. Under normal usage this does not seem to be an issue; it really depends on the use case. Longer term I think that P_S will need indexes, even if these are only “memory tables” …

The upgrade went smoothly, so now it’s time for me to check the new MariaDB 10.0 features and see how they fare.

 

Time to get some 128-bit types into MySQL?

I think that getting 128-bit types into MySQL would be good. There are a few use cases for this and right now we have to work around them. That should not be necessary.  While not essential they would make things easier.

The headline is easy to understand, but is this really needed?

First we need to look to see where this might be used. I can think of three different 128-bit types which are missing at the moment:

  • IPv6 addresses
  • uuid values
  • a bigger value than (signed) bigint [64-bit numbers]

IPv6 Addresses

IPv6 addresses are 128-bit numbers, and having a native way to store them would be really helpful. Given that IPv6 also includes an IPv4 representation, for those people who store IP addresses (client connections and other things) such a native type would be much better than the typical unsigned int or binary(4) which you might be using now. Is that column an IPv4 address? It might be, but it also might not be.  The same applies to IPv6, and having a real IPv6 type makes this knowledge explicit.

MySQL already provides support routines for IPv4 (even if the type does not exist) such as INET_ATON(), INET_NTOA() so a similar set of routines would be needed to support this type, converting between their text and numeric representation and also for converting between IPv4 and IPv6.
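
For completeness, this is what is already available and the kind of workaround people use today. INET6_ATON()/INET6_NTOA() exist from MySQL 5.6.3, but they hand you back a VARBINARY(16), not a real IPv6 type; the client_ips table below is just an example:

    -- IPv4 today: store as INT UNSIGNED
    SELECT INET_ATON('192.0.2.10');    -- 3221225994
    SELECT INET_NTOA(3221225994);      -- '192.0.2.10'

    -- IPv6 (and IPv4-mapped addresses) today: store as VARBINARY(16)
    CREATE TABLE client_ips (ip VARBINARY(16) NOT NULL);
    INSERT INTO client_ips VALUES (INET6_ATON('2001:db8::1'));
    SELECT INET6_NTOA(ip) FROM client_ips;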

UUID Values

MySQL itself uses UUID values in 5.6 and above as the server_uuid, but it’s stored, or seems to be, as a string. Other software (MEM is a good example) also uses UUID values in various places.

Have a look on search engines for MySQL and UUID and you see lots of questions on how to best store these values in MySQL. So there is already a demand for this, and no good answers as far as I can see.

One common concern I have when storing such values as binary(16) is that the values are hard to visualise, especially if used as a primary key. Also, from the point of view of a DBA who may want to “manually” access or modify data, it is not possible to do something like SELECT name FROM servers WHERE uuid = ‘cd2180ae-9b94-11e2-b407-e83935c12500’, as this just does not work. Casting could make this work magically, but right now it’s much harder than it should be.  There is not a single UUID format, but the basics are the same, and if we had a uuid type any supporting routines (which would be needed) would be able to convert as needed.
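
Today’s workaround for the readability problem is manual conversion at each end, which is exactly the friction a native type would remove; the servers table here is just the imaginary one from the example above:

    -- Storing a text uuid into a BINARY(16) column
    INSERT INTO servers (uuid, name)
    VALUES (UNHEX(REPLACE('cd2180ae-9b94-11e2-b407-e83935c12500', '-', '')), 'db1');

    -- Querying it back in a readable form
    SELECT name, LOWER(HEX(uuid)) AS uuid_hex
    FROM servers
    WHERE uuid = UNHEX(REPLACE('cd2180ae-9b94-11e2-b407-e83935c12500', '-', ''));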

Signed or unsigned integers

Yes, the (signed or unsigned) bigint type gives us 64-bits and that allows for huge numbers but one size bigger matches the use cases above, so it’s good to be able to convert between them depending on the usage.  That is if we’re going to have IPv6 and UUID type values, it makes sense to allow an integer equivalent representation and sometimes this might be needed when stripping out parts of a uuid, or parts of an IPv6 address.  The name of this type should be something a little better than we’ve seen before so hugeint (unsigned) would not be what I would suggest. Something as simple as int128 (unsigned) would be much easier to understand.

Conversion routines

Each of the three types above needs routines to support its “native” usage and probably for converting from / to numeric or text representations of the value.  Given the three types have the same size, it may also be useful to convert from one format to another: the actual content would not change, just its representation. Included with this would be BINARY(16), so that people who have had to use other MySQL types to represent these values have an easy way to convert more explicitly to the new types, and if for any reason a conversion back is needed this is also possible.

ALTER TABLE should be aware of these equivalents too so if I have a table defined with a BINARY(16) I can convert it to an IPv6 address/type as a no-op operation (definition only change), in a similar way as can be done with some other conversions (ENUM being a common type that changes but if you add a new value there’s no need to check the table for existing values as the old definition was a subset of the new one).

No incompatible changes in minor versions please

A change such as this can not reasonably be added in a minor version, as it would break many things.  Minor versions should really, really only include bug fixes or performance improvements, and if a new feature really has to be added it must by default be disabled (for compatibility) and enabled with some sort of special option. Given there’s no agreed way to do this and it is likely to cause all sorts of issues, just do not do it.

That means that a feature such as this can only be added in a new version such as MySQL 5.7 or MariaDB 10.1 both of which are DEV versions, and so allowed to change in any way their authors deem reasonable. I have seen no indication of 5.7 including this functionality and given the time that 5.7 has been about I am inclined to think that an extra change such as this is unlikely to make it there. So MySQL 5.8 then? MariaDB 10.1 development has not been ongoing for that long so maybe such a feature might be considered there.

In the end we do need these new features and long lead times to make them available is a considerable source of frustration for those of us who have a number of systems to upgrade.  One thing is a new version going GA, but it’s something else to have all systems upgraded to use that version and thus make it available to developers.

Whatever happens it would be really helpful if the different “MySQL vendors” talk to each other, if they agree that this is a sensible path to take. Having various different interpretations of how these new types should be stored, converted and which associated functions etc are needed would be a user or developer’s nightmare. I understand there is competition, but for something like this it is really important to get it right.  The first implementor of such a feature would potentially have an advantage over the others but I would expect usage of this type of data types to be quite popular so agreeing generally on what to do should not be that hard and avoids the different forks from drifting off further apart, something which I think is bad for everyone concerned.

Conclusion

Some people I have spoken to share the opinion that having such a set of 128-bit types would be good. It is something else of course to implement it.  For those looking for new features to develop in MySQL this is one which in theory is not absolutely necessary but which I think would not only be popular but would actually be used.  In the end MySQL is there to store data and make it easy to retrieve, and it seems clear to me that this type of data, while it can be handled differently, would really welcome “native” support. I hope that this will happen sometime soon.

Update 2014-07-03

MariaDB seems to have some support on its way for this. Referenced on maria-developers on 1st July, details can be found here:

If a plugin type is available for IPv4 that might be good as well.

This looks like work in progress and there’s no mention of a 128-bit (unsigned) int, or of how to convert between the different values, but it looks like a good start. In fact, if it’s possible to make these types available via a plugin interface, that seems to open up the possibility of adding new special types to a MariaDB server that is already in use, which makes it easier to expand functionality later.

In terms of routines that probably should be available in MySQL to support some of these types the following stand out:

  • INET_PTON() and INET_NTOP() to supplement the existing INET_ATON() and INET_NTOA() functions.
  • GETADDRINFO() and GETNAMEINFO() to convert between IPv4 or IPv6 addresses and names.
  • Something like UUID_LONG() to generate a 128-bit numeric equivalent of UUID(), and functions to convert a text-based uuid into a number and something to convert back again, STRING_TO_UUID() and UUID_TO_STRING(), unless there already exist standard function names for these tasks.

I think all of these look like useful routines to go with the types above. I’ll add more as I think of them.

Update 2014-07-11

I also see this very old bug referenced in the mysql bug list: http://bugs.mysql.com/bug.php?id=15940

webscalesql-5.6.17.69 RPMs available for CentOS 6

A new commit, b955fd46ee60b134c6935badb43eb838872cfbbf, was pushed out to webscalesql-5.6, so I’ve built some updated RPMs using my webscalesql-rpm scripts.  The new binaries, if you want to try them, can be found at http://ftp.wl0.org/webscalesql/.

The rpms are:

Again these packages are work in progress, but feedback is welcome.

MMUG7: Madrid MySQL Users Group meeting to take place on 24th April 2014

Madrid MySQL Users Group will have its next meeting on the 24th of April. Details can be found on the group’s Meetup page.

We plan to talk about WebScaleSQL and I will give a short presentation on how to build WebScaleSQL RPMs on CentOS 6.  The meeting will be in Spanish.

We’ve changed the place that we’ll be holding the meeting. See the Meetup URL for details. Looking forward to seeing you there.

The next Madrid MySQL Users Group meeting will take place on Thursday 24 April. More details can be found on the group’s page.  We will talk about WebScaleSQL and I will give a short presentation on how to build WebScaleSQL RPMs for CentOS 6.  The meeting will be in Spanish.

We have changed the venue for the meeting. See the Meetup URL for details. We look forward to seeing you there.