Thursday, April 19, 2012

A few hacks to simulate mysqldump --ignore-database

A few days ago, Ronald Bradford asked for a mysqldump –ignore-database option.

As a workaround, he proposes:
mysqldump --databases `mysql --skip-column-names \
   -e "SELECT GROUP_CONCAT(schema_name SEPARATOR ' ') \
   FROM information_schema.schemata WHERE schema_name \
   NOT IN ('mysql','performance_schema','information_schema');" \
   >` >/mysql/backup/rds2.sql

It's a clever solution, but unfortunately it only works if you have a handful of schemas. If your databases happens to have several dozens (or hundreds or thousands) of schemas (which is where you need this option more), then the output will be truncated to the length of group_concat_max_len (by default, 1024.)

There are two alternative methods.

The all-shell method

This method lets shell commands filter the wanted databases. In its simplest way, it goes
DATABASE_LIST=$(mysql -NBe 'show schemas' | grep -wv 'mysql\|personnel\|buildings')
mysqldump --all-databases $DATABASE_LIST

Notice that, when you use --all-databases, information_schema and performance_schema are filtered off by default.

This method works, because the default length of the command line in Unix is much longer than group_concat_max_len:

$ getconf ARG_MAX
131072  # Linux

$ getconf ARG_MAX
262144  # Mac OSX

(in Windows it's much shorter: 8191, but since I haven't used Windows for ages, I don't really care).

A more elaborate method would require a scripting wrapper around the above instructions, but I think that as it is, it's simple enough to be remembered.

The options file method

If the length of all your database names combined is more than the maximum allowed by the shell, and all you want is filtering a few databases off a huge list, there is still hope.

Let's assume that you want to ignore mysql, personnel, and buildings from your backup.

echo '[mysqldump]' > mydump.cnf
mysql -NBe "select concat('ignore-table=', table_schema, '.', table_name) \
  from information_schema.tables \
  where table_schema in ('mysql', 'personnel', 'buildings')" \
  >> mydump.cnf

Now the options file looks like this:

[mysqldump]
ignore-table=mysql.db
ignore-table=mysql.host
ignore-table=mysql.user
[...]
What we need to do is tell mysqldump to get its information from this options file, and it will duly skip all the tables that are listed in there.
mysqldump --defaults-file=./mydump.cnf  -u $DBUSER -p$DBPWD --all-databases
There are two drawbacks with this approach:
  • There will be a DROP DATABASE IF EXISTS and CREATE DATABASE for each of the excluded schemas, although no tables will be dumped.
  • This method only works with --default-options-file. Theoretically, it should also work with --default-extra-file, but it doesn't. Therefore, if you are relying on an options file for connection parameters, they should be added to this file or listed in the command line.
Summing up, there is no perfect solution, but there are enough hacks available that you probably would find what suits you.

Tuesday, April 17, 2012

Some lessons from MySQL Conference 2012

The Percona Live MySQL Conference and Expo 2012 is over. Together with the SkySQL solutions day, it has kept me occupied for 4 full days, from early morning to late at night.

I have to say that I am pleased. The quality of the organization was very high, with a very good lineup of speakers and an excellent technical support.

As usual, I have learned a lot during this week, either directly, by attending talks, or indirectly, by meeting people who told me what was juicy at the talks that I had missed. And I have met new interesting people, and caught up with the people that I know already.

This conference was particularly intense also because I got myself involved in 5 talks, which was probably more than I should have. How did I end up with such a task? It's a long story.

It all started when the CfP opened. In the review committee, we all knew that Oracle was not eager to participate, but we hoped that it would change its mind and send someone in the end. So we planned ahead, and some of us proposed talks aimed at beginner and intermediate users, with topics that are usually best covered by the people who work at the MySQL team. I proposed Replication 101 and What's new in MySQL 5.5 and 5.6 replication, with the idea that I would hand them over to a couple of Oracle engineers, or have them as co-speakers. That, however, didn't happen. So I had to prepare and present these two talks, in addition to the one that I wanted to do on my own (Testing MySQL creatively in a sandbox).

That makes 3 talks. Then I got tasked with organizing the lightning Talks, which is not a big deal per se, but it adds to the global effort. 4 talks.

And finally, SkySQL organized another beautiful conference on Friday, and I got to present a fifth talk. I enjoyed every bit of them, but boy! the conference was intense!.

I have learned not only from the talks that I have attended, but also from the preparation of my own talks. The biggest source of surprises was my talk about MySQL 5.6 replication. I was expecting a mature release, but I found a collection of features that don't play very well together, and can sometimes lead to an unstable server. Since I was trying to get my demos working, rather than isolating the bugs, I didn't submit any reports, but I will come back to that version and do a more thorough analysis as soon as I catch up with my day-by-day work.

Speaking about demos, it's quite common for me to include a demo in a technical talk. First, because getting a demo done will make me better acquainted with the features that I am presenting, and also because a presentation with a demo conveys the idea of a mature and reliable product (or the idea that I, as the speaker, know what I am talking about). Either way, I know prepare a demo for every talk where I have sufficient time to show one, and sometimes even for a lightning talk. So it was surprising to hear comments that praised my talks because they contain demos. Is this practice so unusual? I should start taking count of how often this is done.

My most satisfactory demo (and the one that almost got me in trouble) happened at the last talk, on Friday, when I had to show features from three different Tungsten topologies, using three separate remote clusters. For these demos to be successful, I needed good internet connection, a solid confidence in the product and the strength of its tests, and to remember the sequence of operations for each demo. To my surprise, everything went so smoothly, that someone in the audience thought that I was running a simulation in my laptop, instead of interacting with servers that were 10,000 Km away. So much for my rehearsals! I must remember to add at least a tiny mistake in an otherwise perfect sequence of tasks, to make the audience aware that I am playing live.

The slides for my presentations are available at Slideshare.