MMUG15: MySQL 5.7 & JSON

English: The Madrid MySQL Users Group is pleased to announce its next meeting on February 10th 2016 at 7pm at the offices of Tuenti in Gran Via, Madrid.  Morgan Tocker of Oracle will be visiting to give a talk on MySQL 5.7 and JSON as part of a European tour.  This will give you an an excellent opportunity to learn about the new MySQL version 5.7 and to learn about the new JSON functionality it provides. Further information on the event and registration can be found on the MMUG web page here:  We look forward to seeing you then.

Español: El grupo Madrid MySQL Users Group tiene el placer de anunciar su próxima reunión el 10 de febrero de 2016 a las 19:00 en las oficinas de Tuenti en Gran Vía, Madrid.   Morgan Tocker de Oracle nos visitará, como parte de una gira Europea, para darnos una charla (en inglés) sobre MySQL 5.7 y JSON.  La charla os dará una oportunidad excelente de aprender sobre la nueva versión 5.7 de MySQL y de aprender sobre la funcionalidad JSON que ofrece. Se puede encontrar más información sobre el evento y como registrar en la página web del MMUG aquí:  La presentación será en inglés pero habrá oportunidad de hablar o hacer preguntas en español.  Esperamos veros allí.

Is MySQL X faster than MySQL Y? – Ask queryprofiler

When trying out new software there are many other questions you may ask and one of those is going to be the one above. The answer requires you to have built your software to capture and record low level database metrics and often the focus of application developers is slightly different: they focus on how fast the application runs, but do not pay direct attention to the speed of each MySQL query they generate, at least under normal circumstances. So often they are not necessarily able to answer the question.

I have been evaluating MySQL 5.7 for some time, but only since its change to GA status has the focus has switched to check for any remaining issues and also to determine if in the systems I use performance is better or worse than MySQL 5.6.  The answers here are very application and load specific and I wanted a tool to help me answer that question more easily.

Since MySQL 5.6, the performance_schema database has had a table performance_schema.events_statements_summary_by_digest which shows collected metrics on normalised versions of queries. This allows you to see which queries are busiest and gives you some metrics on those queries such as minimum, maximum and average query times.

I used this information and built queryprofiler to allow me to collect these metrics in parallel from one or more servers and thus allow me to compare the behaviour of these servers against each other. This allows me to answer the question that had been nagging me for some time in a completely generic way.  It should also work on MariaDB 10.0 and later though I have not had time to try that out yet.

queryprofiler works slightly differently to just querying P_S once. It takes several collections of the data, computes deltas between each collection thus allowing you to know things like the number of queries per second which events_statements_summary_by_digest does not tell you. (There is no information in performance_schema telling you when the collections start. That is something I miss and would like to see fixed in MySQL 5.8 if possible.)

The other difference of course is that P_S gives you information on one server. If you collect the information at the same time from more than one server with a similar load then the numbers you get out should be very similar and that is what queryprofiler does.

How do you use queryprofiler?  Provide it with one or more Go-style MySQL DSNs to connect to the servers and optionally tell it how many times to collect data from the servers (default: 10) and at what interval (default: every second) and it will run and give you the results, telling you the top queries seen (by elapsed time of the query) and the metrics for each server (queries per second, average query latency and how much these values vary).

A couple of examples of the output can be found here:

Hopefully you will find this tool useful.  Feedback and patches to improve it are most welcome.

A Couple of MySQL 5.7 gotchas to be aware of

MySQL 5.7 GA was released a couple of months ago now with 5.7.9 and 5.7.10 has been published a few days ago.  So far initial testing of these versions looks pretty good and both versions have proved to be stable.

I have, however, been bitten by a couple of gotchas which if you are not aware of them may be a bit of a surprise. This post is to bring them to your attention.

New MySQL accounts expire by default after 360 days

This is as per documentation, so there is no bug here. MySQL 5.7 provides a new more secure environment. One of the changes is to add password expiry and the default behaviour is for passwords expire after 360 days.  This seems good, but you, perhaps like me, may not be accustomed to managing your passwords, checking for expiration and adjusting the MySQL user settings accordingly.  The default setting of default_password_lifetime is 360 days, so after upgrading a server to MySQL 5.7 from MySQL 5.6 this setting suddenly comes to life. The good thing is nothing happens immediately so you do not see the time bomb ticking away. I had have been testing the DMR versions of MySQL 5.7 earlier to the GA release and consequently using it for longer than 2 months.  Recently a couple of 5.7.9 servers which had been upgraded from 5.6 a year ago decided to block access to all applications at the same time.  The quick fix is simple: change the default setting to 0 (no expiry) and we have a configuration that behaves like MySQL 5.6 even if it less secure than the default MySQL 5.7 setup. We can then look at how to manage the MySQL accounts and take this new setting into account in a more secure manner.  If you are starting to use MySQL 5.7 and are not migrating from 5.6 then perhaps you’ll put in the right checks in place when you start, but those of us migrating from 5.6 can not push down grants with the new ALTER USER syntax until the 5.6 masters are upgraded so we need to pay more attention to this while in the progress of migration.

New range optimizer setting might cause unexpected table scans if not set properly

MySQL 5.7.9 GA added a new configuration variable: range_optimizer_max_mem_size, set by default to 1536000. The documentation does not say much about this new setting and seems quite harmless. “if … the optimizer estimates that the amount of memory needed for this method would exceed the limit, it abandons the plan and considers other plans.”  The range optimiser is used for point selects, primary key lookups and other similar queries.  What this setting does is after parsing a query look at the number of items which may be referenced in a WHERE clause and if the memory usage is too high fall back to a slower method.

Let’s put this into context. A query like SELECT some_columns FROM some_table WHERE id IN ( 1, 2, 3, ... big list of ids ... 99998, 99999 ) will trigger this limit being reached for a large enough range of ids. DELETE FROM some_table WHERE (pk1 = 1 AND pk2 = 11) OR (pk1 = 2 AND pk2 = 12) .. OR .. (pk1 = 111 AND pk2 = 121) /* pk1 and pk2 form a [primary] key */ would also potentially trigger this.

The questions that come out of this are (a) “How to figure out the point at which this change happens?”, and (b) “What happens at this point?”

The answer to (b) is simple: MySQL falls back to doing a table scan (per item). The answer to (a) is not so clear. Bug#78752 is a feature request to make this clearer, and further investigation pointed to MySQL 5.6’s previous behaviour where the limit was defined in terms of a fixed number of hard-coded “items” (16,000), whereas 5.7’s new behaviour is in terms of memory usage.  The relationship between the two settings is not very clear and initial guestimates on systems I saw issues with seems to indicate that maybe 4kB per item is used by MySQL 5.7 at the moment. The point is that what worked quickly as point selects on 5.6 may fall back to table scans per item in 5.7 if the number of entries is too high, and this would require a reconfiguration (it is dynamic) of the configuration setting mentioned. The bad behaviour may also only happen depending on the size of the query.

Many people may wonder why anyone would be mad enough to use a SELECT or DELETE statement with several thousand entries in an IN () clause, but this comes from having split data in a single server into two and making the application find a list of ids from one server using some criteria and then using the ids obtained in a different one. I see that pattern used frequently and it is probably a common pattern on any system where data will no longer fit in a single server.

The problem with this particular change in behaviour is that point selects are very fast and efficient in MySQL. People use them a lot. Table scans are of course really slow, so depending on the query in question performance can change from ms to minutes just because your query is a tiny bit bigger than the new threshold. In practice it looks like the old hard-coded limit and the new dynamic limit are at least an order of magnitude different in size so it is quite easy to trip up on good queries in 5.6 failing miserably in 5.7 without a configuration change. Again while migrating from MySQL 5.6 to 5.7 you may see this change bite you.

You may get caught by either of these issues. I got caught by both of them while testing 5.7 and while the solutions to resolve them are quite simple to fix they do require a configuration change to resolve the issue. I hope this post at least makes you recognise them and know where to poke so you can make your new 5.7 servers behave properly again.

MMUG14: MySQL Automation at Facebook

English: Madrid MySQL Users Group will be holding their next meeting on Tuesday, 10th November at 19:30h at the offices of Tuenti in Madrid. David Fernández will be offering a presentation “MySQL Automation @ FB”.  If you’re in Madrid and are interested please come along. We have not been able to give much advance notice so if you know of others who may be interested please forward on this information.  Full details of the MeetUp can be found here at the Madrid MySQL Users Group page.

Español: El día 10 de noviembre a las 19:30 Madrid MySQL Users Group tendrá su próxima reunión en las oficinas de Tuenti en Madrid.  David Fernández nos ofrecerá una presentación (en inglés) “MySQL Automation @ FB”.  Si estás en Madrid y interesado nos gustaría verte.  No hemos podido avisar con mucha antelación, así que si conoces a otros que podrían estar interesados agradeceríamos les hagas llegar esta información. Se puede encontrar los detalles completos de la reunión aquí en la página de Madrid MySQL Users Group.

MMUG13: Practical MySQL Optimisation y Galera Cluster presentations

English: Madrid MySQL Users Group will be holding their next meeting on 17th June at 18:00h at EIE Spain in Madrid. Dimitri Vanoverbeke and Stéphane Combaudon from Percona will be offering two presentations for us:

  • Practical MySQL optimisations
  • Galera Cluster – introduction and where it fits in the MySQL eco-system

I think this is an excellent moment to learn new things and meet new people. If you’re in Madrid and are interested please come along.  More information can be found here at the Madrid MySQL Users Group page.

Español: El día 17 de junio a las 18:00 Madrid MySQL Users Group tendrá su próxima reunión en las oficinas de EIE Spain.  Dimitri Vanoverbeke y Stéphane Combaudon de Percona nos ofrecerá dos presentaciones (en inglés):

  • Practical MySQL optimisations
  • Galera Cluster – introduction and where it fits in the MySQL eco-system

Creo que será una oportunidad excelente para aprender algo nuevo y para conocer gente nueva. Si estás en Madrid y interesado nos gustaría verte.  Se puede encontrar más información aquí en la página de Madrid MySQL Users Group.


MMUG12: Talk about Percona Toolkit and the new features of MySQL 5.7

Madrid MySQL Users Group is having a Meetup this afternoon, Wednesday, 13th May at 19:00.

  • I will be presenting (in Spanish) a quick summary of Percona Toolkit and also offering a summary of the new features in MySQL 5.7 as the release candidate has been announced and we don’t expect new functionality.
  • This is also an opportunity to discuss other MySQL related topics in a less formal manner.
  • You can find information about the Meetup here.

So if you are in Madrid and are interested please come along.

El Madrid MySQL Users Group tiene una reunión esta tarde, miércoles 13 de mayo, a las 19:00.

  • Ofreceré una presentación sobre Percona Toolkit y un resumen de las características nuevas de MySQL 5.7 que recientemente se anunció como Release Candidate. Ya no esperamos cambios en su funcionalidad.
  • También habrá una oportunidad de hablar de otros temas relacionados con MySQL de una manera menos formal.
  • Se puede encontrar información de la reunión aquí.

Si estás en Madrid y te interesa estarás bienvenido.


new to pstop – vmstat style stdout interface

In November last year I announced a program I wrote called pstop. I hope that some of you have tried it and found it useful. Certainly I know that colleagues and friends use it and it has proved helpful when trying to look inside MySQL to see what it is doing.

A recent suggestion provoked me to provide a slightly different interface to pstop, that is rather than show the output in a terminal-like top format, provide a line-based summary in a similar way to vmstat(8), pt-diskstats(1p) and other similar command line tools.  I have now incorporated some changes which allow this to be done. So if you want to see every few seconds which tables are generating most load, or which files have most I/O then this tool may be useful. Example output is shown below:

Hopefully this gives you an idea.  The --help option gives you more details. I have not yet paid much attention to the output and the output is not currently well suited for a tool to parse, so I think it’s likely I will need to provide a more machine readable --raw format option at a later stage.  That said feedback on what you want to see or patches are most welcome.

MMUG11: Talk about binlog servers at Madrid MySQL Users Group meeting on 29th January 2015

Madrid MySQL Users Group will have its next meeting on Thursday, the 29th of January.

I will be giving a presentation on the MySQL binlog server and how it can be used to help scale reads and be used for other purposes.  If you have (more than) a few slaves this talk might be interesting for you.  The meeting will be in Spanish. I hope to see you there.

Details can be found on the group’s Meetup page here:

La próxima reunión de Madrid MySQL Users Group tendrá lugar el jueves 29 de enero. Ofreceré una presentación sobre el MySQL binlog server y como se puede utilizar para ayudar con la escalada de lecturas a la base de datos y para otros usos. La reunión será en español. Espero veros allí.

Se puede encontrar más detalles en la página del grupo:

MMUG10: Madrid MySQL Users Group meeting to take place on 18th December 2014

Madrid MySQL Users Group will have its next meeting on Tuesday, the 18th of December. Details can be found on the group’s Meetup page here:  This will be meeting number 10 of MMUG and the last meeting of the year. We plan to talk about MySQL, MariaDB and related things. An excuse to talk about our favourite subject. Come along and meet us.  The meeting will be in Spanish. I hope to see you there.

La próxima reunión de Madrid MySQL Users Group tendrá lugar el jueves 18 de diciembre. Se puede encontrar más detalles en la página del grupo:  Esta será nuestra reunión número 10 y la última del año. Hablaremos sobre MySQL, MariaDB y cosas relacionadas.  Será una excusa para hablar de nuestro tema favorito.  Ven a vernos. La reunión será en español.  Espero veros allí.

MMUG9: Madrid MySQL Users Group meeting to take place on 20th Noevember 2014

Madrid MySQL Users Group will have its next meeting on the 20th of November. Details can be found on the group’s Meetup page.

We plan to talk about pstop, which I’ve announced earlier and also the latest changes in MariaDB and MySQL since our last meeting.  The meeting will be in Spanish. I hope to see you there.

La próxima reunión de Madrid MySQL Users Group tendrá lugar el jueves 20 de noviembre. Se puede encontrar más detalles en la página del grupo.  Hablaremos sobre pstop y los últimos cambios en MariaDB y MySQL desde nuestra última reunión.  La reunión será en español.  Espero veros allí.

pstop – a top-like program for MySQL (based on performance_schema)

I have been working with MySQL for some time and it has changed significantly from what I was using in 5.0 to what we have now in 5.6. One of the biggest handicap we’ve had in the past is to not be able to see what MySQL is doing or why.

MySQL 5.5 introduced us to performance_schema. It was a good start but quite crude. MySQL 5.6 gave us a significant increase in stuff that allows you to see what is going on inside MySQL. That’s great, except it’s hard to read, the documentation is good, but not oriented at the DBA but more at the MySQL developer (that’s what it seems like at least). So most of us have ignored it. Others complained about the overhead and said it’s not good to use it.

Mark Leith developed mysql-sys as a way to see this great information in a more usable way. It’s only a set of views so doesn’t really have much overhead. However, one thing I missed was getting the information of what was happening inside performance_schema in real-time, top-like, so I could see where a server was busy, and what it was doing. So inspired by mysql-sys and also as a way for me to start playing with go I have built P_S top, or pstop.

You can find it on github here:

What does pstop show you?  It takes some counters from performance_schema and subtracts the values from when it started up. The output is in four different screens which you toggle between using the <tab> key.  The idea is to look at the total latency (wait time) and order by table or file that causes it in heaviest first.  Table waits are also then split between read, insert, update and delete and there’s a screen which shows some locking information.

Access to the db server is currently via a ~/.my.cnf defaults file. I probably need to make this more sophisticated, and allow the credentials to be provided directly but have not done that yet.  I have used this on a couple of systems which I monitor for work and it has been most informative in showing where the load is, which table or file generates it and how that varies over time.  This information was already in performance_schema but there have not been any tools to get this out.

Here are a couple of examples:

Latency by table name

Operations by table name

Latency by filename

You can also see these screen samples here. I think that if you compile and build this and point it to a server of your own you’ll find the output much more interesting.

So please let me know what you think. I hope you find it interesting and useful.

MariaDB 10.1 Brings Compound Statements

A very old post of mine in 2009, MySQL’s stored procedure language could be so much more Useful suggested that it would be nice if MySQL could be adapted to use compound statements directly from the command line in a similar way to the language used for stored procedures. I’ve just seen that this seems to be possible now in MariaDB 10.1. See the release notes.

I now need to look at this. So thanks, it looks like this feature request is now available.

Making MySQL Better More Quickly

With the upcoming release of MySQL 5.7 I begin to see a problem which I think needs attention at least for 5.8 or whatever comes next.

  • The GA release cycle is too long, being about 2 years and that means 3 years between upgrades in a production environment
  • More people use MySQL and the data it holds becomes more important. So playing with development versions while possible becomes harder.  This is bad for Oracle as they do not get the feedback they need to adjust the development of new features and have to best guess the right choices.
  • Production DBAs do want new features and crave them if it makes our life easier, if performance improves, but we also have to live in an environment which is sufficiently stable.  This is a hard mixture of requirements to work with.
  • In larger environments the transition from one major version to another, even when automated can take time. If any gotcha comes along then it may interrupt that process and leave us with a mixed environment of old and new, or simply in the state of not being able to upgrade at all.  Usually that pause may not be long but even new minor versions of MySQL are not released that frequently so from getting an issue fixed to seeing it released and then upgrading all servers to this new version is again another round of upgrades.

I would like to see Oracle provide new features and make MySQL better. They are doing that and it is clear that since I have been using 5.0 professionally up to the current 5.7 a huge amount has changed. The product is much more stable and performs much better, but my workload has also increased so I am still looking for more features and an easier life. I am an optimist that is for sure.

One issue that I believe holds back earlier experimentation is that MySQL is not modular. Even the engines that you can use in it, if built as plugins, do not seem to be switchable from one minor version to another.

This leads to 2 issues:

  • any breakage or bug (and all software has bugs, that is inevitable) requires you when it is fixed to upgrade to a new version. That new version has changes in many different components. Sometimes that is fine but sometimes that may bring in new bugs which cause their own problems
  • potentially the developers of MySQL could replace a “GA module” with a more experimental version of that module which maybe has more features, could perform better but maybe breaks. Changing a single module is hopefully much safer than changing a full binary for a development version, and that should be much easier to do on spare machines. A module such as this would be something I could much more easily test than installing 5.7.4 on lots of machines.

However, the problem is that MySQL is not modular and that is where several people have explained to me my madness and how hard it is to achieve things like this. My current employer likes to push out changes in small chunks, look at the result of those small changes and then if they seem good, go ahead and do more. If something goes wrong, back it out and look elsewhere to do things. Doing the same on a database server not designed that way may well be hard, but making small changes along these lines would I think longer term help improve things and give the people that use a GA MySQL the opportunity to try out new ideas, give feedback quickly and allow things to evolve.

Inevitably when you start to build interfaces like this some interfaces need to change to allow to allow for a larger redesign of the innards of a system. That is fine, when it happens we’ll move over to that and a DEV version will have these new much improved features and we may have to wait longer for that.

What modules might I be talking about when I talk about modularising MySQL?  I’ll agree I do not know the code other than having glanced at it on several occasions but there are some quite clear functional parts to MySQL:

  • the engines have often been plugins, though now InnoDB is a bit of an exception. I still wonder if that is necessary whatever MySQL’s design.  However these plugins do not seem to have a completely clear interface with MySQL as I have seen plugins for example for something like Spider or TokuDB which work for a specific MySQL or MariaDB version. That just shows that whatever this interface is it is not designed to be stable and swappable between different MySQL minor versions.  Doing something to make that better would mean that people who build a new engine can build it once for a a major version and know that on binaries built the same way the files they produce should just plug in without issue unchanged. Me dreaming? Perhaps but no-one worries if I upgrade my db4 rpm from 4.7.25 to 4.7.29 that all the applications that use it will break: the expectation is clear: it should not make any difference at all. Why does something like this not work with MySQL engine code?
  • logging has been rather inconsistent for a long time. I think it may improve in 5.7, but however it’s built, build it as a module. If I want to replace that module with something new that stores all my log data in a Sybase or DB2 database MySQL should not care, assuming the module does the right thing and there are settings to configure this appropriately.  The point being also that if there is a bug in the logging, the bug can be fixed and the module replaced with a bug-free version, without necessarily requiring me to upgrade the whole server.
  • Replication is generally split into 2 parts: the writing to binlogs and the reading of those binlogs from a master, storing them locally and reloading the relay logs and processing them.
    • I have seen bugs in replication, mainly in the more complex SQL thread component where the same change could potentially apply. Swap out the module for a fixed one.
    • MySQL 5.6 was supposed to make life great with replication and we would not get stuck in a situation where a crashed server would come up, out of sync with its master, and because of that we would need to reclone the server again. Even when moving over to using the master_info_repository and relay_log_info_repository settings to TABLE you can have issues. The quick fix implemented by Oracle of relay_log_recovery = 1 sounds great. It is a quick, cheap and cheerful solution which works assuming you never have delayed slaves.  Different environments I maintain do not follow this pattern and I have servers with a deliberate multi-hour delay, which can be useful for recovering from issues. Also copying large databases between datacentres may take several days, triggering after starting the system a need to pull logs and process them for several days. A mistaken restart would lose all that data and require it to be downloaded again which is costly. So I have discussed with colleagues a theoretical improved behaviour of the I/O thread should MySQL crash but there is no way to test it on boxes I currently use. Making the I/O thread into a module would make it much easier to try out different ideas on GA boxes to show whether these ideas are really workable or not.
  • The query parser and optimiser in MySQL is supposed to be a horrendous beast that everyone must keep clear of.  Improvements are happening and posts like this are an indication of progress. My understanding is that this beast is spread all over the server code and thus hard to untangle but certainly from a theoretical point of view doing so would allow alternative optimisers to be usable/pluggable, and for example different optimisers might be better at handling workloads such as batch based workloads with sub queries and such which MySQL is known not to handle well, but which for certain workloads could potentially make a great deal of difference to us all.  The MySQL of 5.0 is quite different from the MySQL of today and sharding is the norm, but that requires help from the app to do all the dirty work. Other options are to use something like Vitess, ScaleBase, or Spider, or some built-in new module which knows about this type of thing better and can do this sort of stuff transparently to the application. MySQL Fabric tries to do this at the application level and that’s fine, but it adds much more complexity for the application developers who probably should not really have to worry (too much) about this type of detail.  So solving the problem is not the issue here, it’s providing hooks to let others try, or simply to swap out version 1 with version 10, and see if version 10 is better and faster, with everything else unchanged.
  • The handling of memory in MySQL has always been interesting to us all. Each engine has traditionally managed the memory it needs itself and there is no concept of sharing, or memory pressure, all of which can lead to sudden memory explosions due to a changing workload which may kill mysqld (Linux OOM) or trigger swapping (database servers should never swap…). I have seen in 5.7 that there is now some memory instrumentation and this at least allows looking to see where memory is used. The next step would be to use the same memory management routines, and finally perhaps to add this concept of memory pressure allowing a large query if needed to page out or reduce the size of the innodb buffer pool while it is running, or the heavy use of some MyISAM or Aria tables could do the same.  Doing that is hard, but we are no longer using a MySQL “toy” database. Many large billion $ companies depend on MySQL so this sort of functionality would be most welcome there I am sure.  Changes in this area would certainly need to be done cautiously but I can envisage swapping out the default 5.8 memory manager for a “new feature” 5.9 version with all the “if it breaks you keep the bits” warnings attached, allowing us to see if indeed problematic memory behaviour is resolved by this new module.
  • The event scheduler is in theory a small and tiny component which does it’s thing.  An early version of 5.5 had some bugs and I had to wait a long time to upgrade the server just to fix this pesky event_scheduler module which all it does is send out heartbeat changes used for measuring replication delay.  Had this been a module I could have installed a fixed version and not had to use a work around for several months.

I am sure there are lots of other components of MySQL which could receive the same treatment.

Making these sort of changes is of course a huge project and most managers do not see the gain of this, certainly not short term.  However, if care is taken and as different subsystems are modified there is an opportunity for making progress and allowing the sort of experimentation I describe.  Also, and while Oracle may not see it this way, having a clearer interface and more modular framework would allow others to perhaps try different things, and replace a module with their own.  Oracle do seem to be putting a lot of resources into MySQL and that is good, but they do not have infinite resources and they can not solve specialised or every need that we might see. Making it easier, for those who can, to use this hypothetical modular framework, provides an opportunity for some things to be done which can not be done now.  Add a bounty feature and let people pay for that and where something is modularised it will be much easier for them to try to solve problems that may come up. In any case, later testing will be easier if these interfaces exist.

This is the way I would like to see MySQL improve, notice I do not actually talk about functional improvements, but how to make it potentially easier to experiment and test these new features. This sort of design change would allow those of us that need new features now to test and perhaps include them in our GA versions. Maybe then the definition of GA will become rather vague if I am using 5.7.10 + innodb 5.8.1 + io_thread_5.8.3 + sql_thread_5.8.6 + event_scheduler….. Support will probably hate the suggestion I have just made as it would potentially make their life more challenging, but then again I do not see most people playing this game. It is meant for those of us who need it, and if not needed at all bug fixing specific issues should be much easier than now, where you need to do a full new test on a new version to make sure you do not catch another set of new bugs.

If you have got to the end of this thanks for reading. I need to learn to write less but I do believe that the reasoning I make above makes a lot of sense. This can only be done with small changes and with people seeing the idea and trying it out, and at least initially doing it on parts of the system which are easy to do. If they work further progress can be made.

Oracle and MariaDB both want feedback and ideas of where we want MySQL / MariaDB to go.  Independently of some of the technical aspects of new features and improvements this is my 2 cents of one thing I would like to see and why.

Does it make sense?

MariaDB 10.0 upgrade goes smoothly

I have been meaning to update some systems to MariaDB 10.0 and finally had a bit of time to get around to that.  The documentation of specifics of what’s needed to go from MariaDB 5.5 to 10.0 can be found here and while it’s not very long it seems there’s little to actually do.

Having already upgraded some servers from MySQL 5.5 to 5.6 the process and appropriate configuration changes were very similar so all in all a rather non event.

One thing which is always a concern if systems can not be down for long is the time to do the upgrade. While you see many blog posts talking about taking a backup via mysqldump and then loading it all back this is not really an option on many systems I manage and a replacement of binaries, adjustment of /etc/my.cnf  and restart of the server with the new binaries followed by running mysql_upgrade is what I usually do.  That usually works fine.

One server I had to upgrade had quite a few files (15,000) and the database occupied 2 TB. The run time of mysql_upgrade on such a system takes a couple of hours of checking tables after which the actual system changes take almost no time at all. So that is something to be aware of if your dataset is similar.

It seems I was confused by Colin’s post on performance_schema being disabled in MariaDB 10.0.12 and later. It seems that actually it’s disabled on startup, so can be easily configured if desired by setting performance_schema = 1 in /etc/my.cnf prior to starting mysqld. I had thought that it was not compiled into the binaries at all, which is what’s been done in WebScaleSQL. That is not the case.

There’s a lot of talk about performance_schema overhead, even recently from colleagues, so this is a subject which needs looking at in more detail and if there is indeed an unacceptable overhead then that needs looking at. I’m sure Oracle or MariaDB would appreciate reports of specific issues as otherwise there’s just too much fud out there.

Anyway the MariaDB 10.0 servers I upgraded did have p_s and my configuration enabled it. That’s nice as now I can see where some of the load and performance points are, and had thought that would not be possible. I also tried Mark Leith’s mysql_sys and was not sure if it would work in MariaDB 10.0 but a quick look seems to indicate it does which is helpful.

The views in this sys schema are very useful but some care is needed when using them as several tables are joined in performance_schema and if the number of rows involved is high, and given performance_schema has no indexes, this can make queries be very costly, taking minutes to run. That’s not been mentioned, so take a little care. Under normal usage this does not seem to be an issue, it really depends on different use cases. Longer term I think that P_S will need indexes, even if these are only “memory tables” …

The upgrade went smoothly, so now it’s time for me to check the new MariaDB 10.0 features and see how they fare.


Time to get some 128-bit types into MySQL?

I think that getting 128-bit types into MySQL would be good. There are a few use cases for this and right now we have to work around them. That should not be necessary.  While not essential they would make things easier.

The headline is easy to understand, but is this really needed?

First we need to look to see where this might be used. I can think of three different 128-bit types which are missing at the moment:

  • IPv6 addresses
  • uuid values
  • a bigger value than (signed) bigint [64-bit numbers]

IPv6 Addresses

IPv6 addresses are 128-bit numbers, and having a native way to store them would be really helpful. Given this also includes an IPv4 representation then for those people who store IP addresses (client connections and other things) such a native type would be much better than the typical unsigned int or binary(4) which you might be using now. Is this an IPv4 address? Well it might be, but it also might not be.  The same applies to IPv6, and having a real IPv6 type makes this knowledge more explicit.

MySQL already provides support routines for IPv4 (even if the type does not exist) such as INET_ATON(), INET_NTOA() so a similar set of routines would be needed to support this type, converting between their text and numeric representation and also for converting between IPv4 and IPv6.

UUID Values

MySQL itself uses UUID values in 5.6 and above as the server_uuid, but it’s stored or seems to be as a string. Other software (MEM is a good example) also uses UUID values in various places.

Have a look on search engines for MySQL and UUID and you see lots of questions on how to best store these values in MySQL. So there is already a demand for this, and no good answers as far as I can see.

One common concern I have currently when storing such values as binary(16) is that the values are hard to visualise, especially if used as a primary key, and also from the DBA’s point of view who may want to “manually” access or modify data  it is not possible to do something similar to SELECT name FROM servers WHERE uuid = ‘cd2180ae-9b94-11e2-b407-e83935c12500’, as this just does not work. Casting could make this work magically but right now it’s much harder than it should be.  There is not a single UUID format but the basics are the same and if we had a uuid format any supporting routines (which would be needed) would be able to convert as needed.

Signed or unsigned integers

Yes, the (signed or unsigned) bigint type gives us 64-bits and that allows for huge numbers but one size bigger matches the use cases above, so it’s good to be able to convert between them depending on the usage.  That is if we’re going to have IPv6 and UUID type values, it makes sense to allow an integer equivalent representation and sometimes this might be needed when stripping out parts of a uuid, or parts of an IPv6 address.  The name of this type should be something a little better than we’ve seen before so hugeint (unsigned) would not be what I would suggest. Something as simple as int128 (unsigned) would be much easier to understand.

Conversion routines

Each of the three types above need routines to support their “native” usage and probably converting from / to numeric or text representations of the value.  Given the three types have the same size then it may also be useful to convert from one format to another. The actual content would not change, just it’s representation. Included with this would be a BINARY(16) so that people who might have had to use other MySQL times to represent these values have an easy way to convert more explicitly to them and if for any reason a conversion back is needed this is also possible.

ALTER TABLE should be aware of these equivalents too so if I have a table defined with a BINARY(16) I can convert it to an IPv6 address/type as a no-op operation (definition only change), in a similar way as can be done with some other conversions (ENUM being a common type that changes but if you add a new value there’s no need to check the table for existing values as the old definition was a subset of the new one).

No incompatible changes in minor versions please

A change such as this can not reasonably be added as a minor version change as if we would break many things.  Minor versions should really, really only included bug fixes, or performance improvements, and if a new feature really has to be added by default it must be disabled (for compatibility) and enabled with some sort of special option. Given there’s no agreed way to do this and it is likely to cause all sorts of issues, just do not do it.

That means that a feature such as this can only be added in a new version such as MySQL 5.7 or MariaDB 10.1 both of which are DEV versions, and so allowed to change in any way their authors deem reasonable. I have seen no indication of 5.7 including this functionality and given the time that 5.7 has been about I am inclined to think that an extra change such as this is unlikely to make it there. So MySQL 5.8 then? MariaDB 10.1 development has not been ongoing for that long so maybe such a feature might be considered there.

In the end we do need these new features and long lead times to make them available is a considerable source of frustration for those of us who have a number of systems to upgrade.  One thing is a new version going GA, but it’s something else to have all systems upgraded to use that version and thus make it available to developers.

Whatever happens it would be really helpful if the different “MySQL vendors” talk to each other, if they agree that this is a sensible path to take. Having various different interpretations of how these new types should be stored, converted and which associated functions etc are needed would be a user or developer’s nightmare. I understand there is competition, but for something like this it is really important to get it right.  The first implementor of such a feature would potentially have an advantage over the others but I would expect usage of this type of data types to be quite popular so agreeing generally on what to do should not be that hard and avoids the different forks from drifting off further apart, something which I think is bad for everyone concerned.


Some people I have spoken share the opinion that having such a set of 128-bit types would be good. It is something else of course to implement that.  For those looking for new features to develop in MySQL this is one which in theory is not absolutely necessary but which I think would not only be popular but would be used.  In the end MySQL is there to store data, and make it easy to retrieve and it seems clear to me that this type of data is one such usage which while it can be handled differently would really welcome “native” support. I hope that this will happen sometime soon.

Update 2014-07-03

MariaDB seems to have some support on its way for this. Referenced on maria-developers on 1st July, details can be found here:

If a plugin type is available for IPv4 that might be good as well.

This looks like work in progress and there’s no mention of a 128bit (unsigned) int, or how to convert between different values, but this looks like a good start. In fact if it’s possible to make these types available via a plugin interface this does seem to add the possibility of adding new special types even once MariaDB is working, so it makes it easier to expand functionality later.

In terms of routines that probably should be available in MySQL to support some of these types the following stand out:

  • INET_PTON() and INET_NTOP() to supplement the existing INET_ATON() and INET_NTOA() functions.
  • GETADDRINFO() and GETNAMEINFO() to convert between IPv4 or IPv6 addresses and names. existing INET_ATON() and INET_NTOA() functions.
  • Something like  UUID_LONG() to generate a 128-bit numeric equivalent of  UUID(), and functions to convert a text-based uuid into a number and a something to convert back again, STRING_TO_UUID() and UUID_TO_STRING() unless there already exists some standard function name for these tasks.

I think all of these look like useful routines to go with the types above. I’ll add more as I think of them.

Update 2014-07-11

I also see this very old bug referenced in the mysql bug list:

webscalesql- RPMs available for CentOS 6

A new commit b955fd46ee60b134c6935badb43eb838872cfbbf was pushed out to the webscalesql-5.6 so I’ve built some updated RPMs using my webscalesql-rpm scripts.  The new binaries if you want to try them can be found at

The rpms are:

Again these packages are work in progress, but feedback is welcome.

MMUG7: Madrid MySQL Users Group meeting to take place on 24th April 2014

Madrid MySQL Users Group will have its next meeting on the 24th of April. Details can be found on the group’s Meetup page.

We plan to talk about WebScaleSQL and I will give a short presentation on how to build WebScaleSQL RPMs on CentOS 6.  The meeting will be in Spanish.

We’ve changed the place that we’ll be holding the meeting. See the Meetup URL for details. Looking forward to seeing you there.

La próxima reunión de Madrid MySQL Users Group tendrá lugar el jueves 24 de abril. Se puede encontrar más detalles en la página del grupo.  Hablaremos sobre WebScaleSQL y ofreceré una breve presentación sobre como construir RPMS de WebScaleSQL para CentOS 6.  La reunión será en español.

Hemos cambiado el lugar donde se ubicará la reunión. Mirar la URL del Meetup para más detalles. Esperamos veros allí.

WebScaleSQL RPMs for CentOS 6

Looks like this post was rather unclear. See the bottom for how to build the rpms quickly.

WebScaleSQL was announced last week. This looks like a good thing for MySQL as it provides a buildable version of MySQL which includes multiple patches from Facebook, Google, LinkedIn, and Twitter needed by large users of MySQL, patches which have not been incorporated into the upstream source tree.  Making this more visible will possibly encourage more of these patches to be brought into the code sooner.

The source is provided as a git repo at and as detailed at the documentation says there is currently no intention to provide binaries.

Instructions on building the binaries and the build requirements for WebScaleSql can be found at and do not look too hard. However, I prefer to install my software as rpms as this makes upgrading or removing it later much easier.

With that in mind I thought I’d try and build some webscalesql rpms.

As I’m currently using MySQL-5.6 rpms, downloaded from, I wanted to build WebScaleSQL rpms which were compatible with these.  I’m aware of the Oracle-built “community” rpms which are downloadable directory from their yum repo ( and will probably use these when upgrading to MySQL 5.7 but moving over to that now requires changing internal infrastructure and is currently not worth the effort.

In order to build the WebScaleSQL rpms I did the following:

  • download the latest source rpm, MySQL-5.6-17-1.el6.src.rpm from
  • extract the spec file: rpm -ivh MySQL-5.6.17-1.el6.src.rpm and look in the directory specified by rpm –eval ‘%{_specdir}’ for the spec file (mysql.spec)
  • install the devtools package in order to use GCC 4.7:

  •  clone the webscalesql.git repo to any directory, let’s call it  $WSS_HOME
  • create a webscale-5.6.tar.gz tar ball and put it in the RPM SRC_DIR:
  • create a minimally changed webscalesql.spec file, based on mysql.spec, to include the new path for the compiler toolchain
  • build the rpm by cd’ing into the directory with the webscalesql.spec file and running:

Warning: no explict check is made for the devtools chain to be installed and compilation will break if you try to use the native gcc compiler.

  •  this built the following rpms:

These rpms should work for CentOS 6, RHEL 6 and other equivalent distributions.  I have not actually tried to use any of the packages except the webscalesql-server.

I have had very little time so far to play with this, but did replace the MySQL-server package with webscalesql-server on a development server and let it run for a few hours.

One thing I did notice is that the performance_schema* settings I had in /etc/my.cnf were not recognised by webscalesql-server and had to be commented out. That said performance_schema still seemed to be there.

I need to check further but guess that this may be due to differences between MySQL and webscalesql or potentially something I have not done correctly when building.

Other than that the server replicated fine and I saw no issues.

This has given me some basical rpms for testing.  I have not tested the package on anything other than CentOS 6 and it is likely that other changes are needed. I probably need to do a few other things like:

  • Clean up the package further maybe adjusting copyrights or other messages about the packages.
  • Obsolete the installed MySQL-server, … rpms so I can just do rpm -Uvh webscalesql-server …. rather than remove the MySQL-server package first.
  • Add a bit of scripting to incorporate the date of the latest webscalesql commit into the version/release settings in the spec file. This avoids having to manually change the different values and as updates happen a rerun of the build script should just build a new package transparently.

I have not yet had time to look at the patches that have been applied to WebScaleSQL. It would certainly be nice to have some sort of list of functional changes (such as the performance_schema difference I noted earlier, assuming this is not a build error) of WebScaleSQL compared to the upstream source and any new configuration settings. Perhaps that will happen later?

At least for those of you who want to run a quick test of the binaries, or look at my spec file, you can find them on my website:  No guarantees of any kind as you can imagine but feedback and improvements to the current spec file or build procedure would be most welcome.

2014-04-03 Update

See: which I have created as a quick helper script to do the build. It still probably needs quite a bit of work but avoids copying instructions and doing stuff by hand.

2014-04-22 Update 2

Basically all you need to do is: