Tuesday, August 25, 2015

New MySQL Sandbox 3.1 - GitHub, and usability

I have three pieces of information to share about MySQL::Sandbox:
  • Version 3.1.0 has migrated from Launchpad to GitHub
  • This version is released under the Apache license. Both changes are meant to promote broader cooperation on the project.
  • There is an important change related to usability. When using replication with MySQL::Sandbox and MySQL 5.6+, the server UUIDs become more readable (see below).
First, some words on the location change. About two years ago, I started planning a rewrite of MySQL::Sandbox. Then came some unexpected life changes, which involved moving home to a different continent twice within twelve months. The project was neglected, but I haven't given up on it. While I wait for the rewrite to start, I wanted to remove the obstacles to rapid development, and I decided to transfer the current codebase to GitHub. This will allow me to use only one revision control system instead of three (my team has abandoned svn too 1).
Apart from the changes described in this post, there is little difference in the code published on GitHub.

Now, on to usability. In my recent series on advanced replication features, I often complained that GTIDs are hard to tell apart. Here is an example from MySQL replication in action - Part 3: all-masters P2P topology. Look at the server identifiers, and see if you can tell at first glance where the largest transaction set comes from:
$ for N in 1 2 3 4 ; do ./n$N -e 'select @@server_id, @@server_uuid; select @@global.gtid_executed\G'; done
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 18fd3be0-4119-11e5-97cd-24acf2bbd1e4 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3,
1e629814-4119-11e5-85cf-aac6e218d3d8:1-119,
226e3350-4119-11e5-8242-de985f123dfc:1-3,
270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         102 | 1e629814-4119-11e5-85cf-aac6e218d3d8 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3,
1e629814-4119-11e5-85cf-aac6e218d3d8:1-119,
226e3350-4119-11e5-8242-de985f123dfc:1-3,
270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         103 | 226e3350-4119-11e5-8242-de985f123dfc |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3,
1e629814-4119-11e5-85cf-aac6e218d3d8:1-119,
226e3350-4119-11e5-8242-de985f123dfc:1-3,
270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         104 | 270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 18fd3be0-4119-11e5-97cd-24acf2bbd1e4:1-3,
1e629814-4119-11e5-85cf-aac6e218d3d8:1-119,
226e3350-4119-11e5-8242-de985f123dfc:1-3,
270c0ebe-4119-11e5-a1c9-b7fbc4e42c2c:1-3
I decided I could do something about it. I use MySQL::Sandbox for my testing, and in these conditions there is little risk of servers clashing. The architecture of MySQL::Sandbox ensures that servers within a group of sandboxes keep to themselves. I modified the software so that when a server is created with a server-id, its server-uuid is also modified (with an unsupported-and-do-not-try-this-in-production hack). I know: it defeats the purpose of having unique identifiers, but frankly, I care more about being able to understand what is going on than about my server-uuid being the same in a different cluster.
The way it works is simple: when a server is created with a server-id (i.e. we can safely assume that its purpose is to be used in replication), its server-uuid is changed to a new string made of the port number and the server-id repeated many times. If the server-id is between 101 and 109 (the range MySQL::Sandbox usually assigns to groups of sandboxes), it is simplified by subtracting 100, which produces UUIDs that are still hard to pronounce, but can be told apart at a glance, such as 00008480-1111-1111-1111-111111111111.
If the server-id is something more complex, MySQL::Sandbox uses the port number for the last parts as well, for example 00005708-5708-5708-5708-000000005708.
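To make the construction more concrete, here is a rough shell sketch of the string manipulation for the simple case (the real transformation happens inside the MySQL::Sandbox Perl code; port 8480 and server-id 101 are just illustrative values):
$ port=8480
$ server_id=101
$ n=$(( server_id - 100 ))                    # IDs 101..109 become the digits 1..9
$ rep() { printf "${n}%.0s" $(seq 1 $1); }    # repeat that digit $1 times
$ printf '%08d-%s-%s-%s-%s\n' $port $(rep 4) $(rep 4) $(rep 4) $(rep 12)
00008480-1111-1111-1111-111111111111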
Here is the same scenario shown in the article, but using the newest MySQL Sandbox version. Now server #2 is more easily identified as the source of the largest transaction group.
$ for N in 1 2 3 4 ; do ./n$N -e 'select @@server_id, @@server_uuid; select @@global.gtid_executed\G'; done
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 00008480-1111-1111-1111-111111111111 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-119,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         102 | 00008481-2222-2222-2222-222222222222 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-119,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         103 | 00008482-3333-3333-3333-333333333333 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-119,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         104 | 00008483-4444-4444-4444-444444444444 |
+-------------+--------------------------------------+
*************************** 1. row ***************************
@@global.gtid_executed: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-119,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
Should you decide that you don't want MySQL::Sandbox to do this transformation, you can skip it by setting the variable KEEP_UUID before running any sandbox installation command.
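For example, assuming the variable is picked up from the environment, something along these lines should leave the original UUIDs untouched (the version number is just an example):
$ export KEEP_UUID=1
$ make_replication_sandbox 5.6.26
Unset the variable (or open a new shell) to get the readable UUIDs back for later installations.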
Another change that was long overdue is the behavior of the './clear' script within the sandboxes. With the introduction of MySQL 5.7, I faced the problem of having InnoDB tables in places where there had been none before (i.e. the 'mysql' database). My first solution was to make a copy of that database with mysqldump right after the installation, and then use it to re-create the database after a cleanup. That solution doesn't work well, as it leads to more problems than benefits, so I changed the behavior: there is no mysqldump backup, and there is no removal of InnoDB files. Moreover, since the sys schema was introduced, it is not safe to truncate 'mysql.proc', as that would also make the sys schema ineffective. Now, if a cleanup that just removes non-system schemas is enough for you, go for the './clear' script. If your testing has messed up the stored routines, you are better off with a reinstallation (which takes just a few seconds anyway).
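In practice, assuming a sandbox installed under the default $SANDBOX_HOME layout (the directory name and version below are illustrative), the two options look roughly like this:
$ # quick cleanup: drops the non-system schemas and leaves everything else alone
$ ~/sandboxes/msb_5_7_8/clear
$ # heavier option: remove the sandbox and install it again from scratch
$ sbtool -o delete -s ~/sandboxes/msb_5_7_8
$ make_sandbox 5.7.8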
And one more thing: this release of MySQL::Sandbox is ready for MySQL 5.7.9. The MySQL team at Oracle has graciously provided a preview build to test some early bug fixes, and MySQL::Sandbox 3.1.0 works smoothly with it.



1. For those who are new to my blog, let me make clear that MySQL::Sandbox is my personal project, and my company has no relation to or influence on it.

2 comments:

Mark Leith said...

FWIW, if you want to easily re-create the sys schema for any reason, just running mysql_upgrade against the instance will re-install it for you.

You don't even need to drop the schema in the above case (mysql.proc being truncated), we just detect the object count being different to the expected count, and do a re-install:

https://github.com/mysql/mysql-server/blob/mysql-5.7.8/client/upgrade/program.cc#L48
https://github.com/mysql/mysql-server/blob/mysql-5.7.8/client/upgrade/program.cc#L270

(Counts will be different in 5.7.9).

Otherwise, just dropping the "broken" sys schema, and running mysql_upgrade will do the install as well.

Giuseppe Maxia said...

Thanks, @Mark,
I will keep this in mind for a refactoring. Here the main goal was to have a procedure that works for all versions.
I can probably make a customized procedure for 5.7, once its behavior has stabilized.