Tuesday, May 17, 2011

The price of safe data - Benchmarking semi synchronous replication

Some time ago I wrote about MySQL 5.5 semi-synchronous replication. Since then, I have wanted to benchmark the overhead of semi-synchronous replication with a decent server. Now the occasion presented itself, thanks to some related business that I had to benchmark, and thus I did a few simple runs with and without semi-synchronous replication enabled, to see the impact of this feature on performance.

If you haven't read the article on semi-synchronous replication, the bottom line is that, with this feature enabled, the master waits until at least one slave has acknowledged receipt of the data before returning a positive result to the client. This means that for each commit there are two network calls between master and slave. My gut feeling was that this feature would be costly in terms of query response time, although I was not prepared for an impact as big as the one I found in my test.

I needed a substantial set of data, and I got it by exporting the employees table from the employees test database, using one INSERT per record. Thus, I had about 300,000 records, which is a fair amount for this kind of test. Had I sent the records in big multiple-insert chunks of 10,000 records each, I would have had only 30 commits, which would not have been easy to measure. So, here goes.

regular replication
$ time mysql < employees.sql 

real 0m27.997s
user 0m1.394s
sys 0m1.046s

semi-synchronous replication
$ time mysql < employees.sql

real 1m24.277s
user 0m3.842s
sys 0m6.270s

Semi-synchronous replication was three times slower than regular replication (about 84 seconds against about 28). The test was run with one master on one host and one slave on each of two other hosts. The measurements were the same whether I had one or both slaves enabled. Using row-based replication instead of statement-based did not make any substantial difference.

Now my question is: who would be prepared to accept such a performance impact for the sake of more data safety? Data is important, but response time to customers is also important. Your mileage may vary. I know many customers who would think twice before accepting this onerous trade-off. I am curious to know what experience others have had with this feature, and how much performance they are willing to sacrifice for safety.
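
For readers who want to put the two timings in perspective, here is a rough back-of-the-envelope sketch in Python. It converts the two `real` values reported above into seconds, then computes the slowdown factor and the extra wall-clock time per commit. The commit count is an assumption: it takes "about 300,000" as exactly 300,000 and assumes that each INSERT is committed on its own. The function and variable names are mine, for illustration only.

```python
import re


def parse_real(text):
    """Convert a `time` value such as '1m24.277s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Assumption: "about 300,000" single-row INSERTs, each one its own commit.
COMMITS = 300_000

# The two "real" values from the benchmark runs above.
regular = parse_real("0m27.997s")
semi_sync = parse_real("1m24.277s")

# Ratio of the two wall-clock times.
slowdown = semi_sync / regular

# Extra wall-clock time added to each commit, in microseconds.
extra_us_per_commit = (semi_sync - regular) / COMMITS * 1e6

print(f"slowdown: {slowdown:.2f}x")
print(f"extra time per commit: {extra_us_per_commit:.0f} microseconds")
```

Under these assumptions the slowdown comes out at roughly 3x, and the added cost is on the order of 0.19 milliseconds per commit. This is only arithmetic on the two runs shown here, not a separate measurement.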

Friday, May 06, 2011

Open Database Camp 2011 opens today!

The Open Database Camp 2011 opens today with the Welcome Party, starting at 7pm CEST. The party (with good Italian food and drinks) is open to everyone who has registered in the Attendees list.
By car, you have to reach Pula, take Via Nora (Nora Street), then Via Sant'Efisio (Sant'Efisio Street), and follow it to the end, directly to the party location.
Organisers will also make a bus available on Friday 6 May, leaving from Pula Hotels (Nora Club Hotel - Villa Madau - Baia Di Nora - Is Molas - Marin Hotel - Is Morus Hotel) around 18:30 and reaching Nora. From Nora back to Pula Hotels a bus will leave around 21:00 and 22:00.
The conference itself will start on Saturday, May 7th, at 9am. Travel arrangements to reach the venue are listed in the conference wiki (wiki: Travel).
There will be a bus collecting participants from the hotels at 8:30am, and the same bus will take them from the conference venue (Sardegna Ricerche) back to the hotels in the evening.

Sessions

Open Database Camp is an un-conference. As with the previous editions, the schedule will be decided on the spot, on Saturday. You can list your intended sessions and your wishes in the Sessions page.

Logistics

You will need an ID for your wifi access. (Sorry, it's a legal requirement.)(*) To get your username and password, collect them at the reception as soon as possible.

Customized meetings

There will be room for 1:1 meetings between attendees, if you would like to have one. You will find a board with the names of all attendees and their affiliations, and you can easily schedule a meeting with any of them.

(*) The law has expired, but after having put the fear of the state into every internet provider, the lawmakers have not said how the new regulations should be applied. Regretfully, we still have to live with the old rules.

Monday, May 02, 2011

Introducing the Flying Clusters, and more than MySQL replication

My colleague Linas Virbalas has just crossed the boundary between real and virtual and has started a blog, titled Flying Clusters.
Linas is a gifted developer who takes care of the special projects. One such project is replication between MySQL and PostgreSQL, which works quite well.
Another project, which has just started, is about providing PostgreSQL with Advanced Logical Replication using Tungsten Replicator. As you probably know, recent versions of PostgreSQL can do physical replication, which has its pros and cons. With this project, PostgreSQL users can also have the choice of using logical replication. Not only that: since Tungsten Replicator already supports MySQL, cross-DBMS replication clusters are not far away.
We are also more ambitious, and we are exploring ways of replicating to NoSQL entities. We will start with MongoDB at a dedicated SQL to NoSQL Replication Hackathon, where we will attempt the creation of an applier for MongoDB during the Open Database Camp conference.