Saturday, June 06, 2009

Unreliable androids

There is an article with a cute title that is gaining popularity:
Do Androids Count Electric Sheep with DB2 or MySQL?
Allegedly, DB2 is demonstrated as incredibly faster than MySQL, with a benchmark that repeats the same COUNT query 100 times.

This is a naive (at best) benchmark that doesn’t tell me nothing about the database potential. But anyway, if you enable the query cache in MySQL, the repetition of 100 queries is at least three times faster than DB2.

Try
set global query_cache_size=1024*1024;
and repeat that test.

Moreover: the table structure doesn’t correspond to the data from the freebase project.

CREATE TABLE `people` (
`id` int(11) DEFAULT NULL auto_increment PRIMARY KEY,
`name` varchar(255) NOT NULL,
`fbid` varchar(255) NOT NULL,
`gender` varchar(255),
`profession` varchar(255)
) ENGINE=InnoDB

Too bad there is no integer ID in that table, and there are 23 fields instead of 5.
These androids don’t strike me as extremely reliable.

A cross-system benchmark is reliable only if you either (a) know both systems very well or (b) you invite one expert of the systems involved to make sure you are using both of them in a fair way.
Oh, and if you can make a table that has smarter fields than VARCHAR(255), your benchmark may look more credible.

6 comments:

JanL said...

Making your database 10x faster might even not show up in an end-to-end profile on a website (c.f. Benchmarks: You're Doing it Wrong)

Antonio Cangiano said...

Giuseppe, thank you for joining the discussion. Let me repost a comment I added as a reply to my own blog entry. It may address some of the concerns mentioned here or at least provide some rationale for them. Here goes:

Guys, I just want to clarify a few points. InnoDB was used because it's the default - and most widely adopted - engine in the Rails community. You may notice that an id column was added for ActiveRecord, and that I only used 4 columns from the initial dataset for the sake of simplicity. To keep things simple, I used the default limits which are provided by ActiveRecord (e.g., varchar(255)). This is a simple, micro-benchmark which doesn't need to prove much. It simply shows that with this data, under these conditions, DB2 is much faster than MySQL.

The reasons for choosing DB2 over MySQL are many, and they are not all related to performance. And even when considering performance alone, this post didn't set out to prove that DB2 is faster than MySQL in general. There are industry benchmarks that cost millions of dollars to run that take into account all the possible moving parts. You can't expect that from a simple post. I observed some data under particular circumstances and reported the outcome. While it is undisputed that DB2's counting abilities are still faster than MySQL with the InnoDB engine, this post only claims that with this data, for the query used, DB2 was faster than MySQL. And I think that's fair.

The "100 times repetition" freaked a few people out. I understand that. Please bear in mind though that I went for 100 iterations only after having seen somewhat comparable times obtained for the first execution of each query (i.e., n=1).

PS: The Freebase project was only used so as to have a lot of "real" records.

Mark Callaghan said...

Yet another example of bad benchmarking. If you are not willing or able to explain your results, then you shouldn't publish them. My guess in this case is that DB2 has a much larger buffer cache than InnoDB on this test.

Antonio Cangiano said...

Mark, how would you test them?

Mark Callaghan said...

Antonio -- the test is fine. You should explain the 10x difference in results. It is easy to understand a 10x difference if DB2 has cached the table(s) in the DBMS buffer cache and InnoDB has not. But if both have cached the tables or both are scanning all data from disk and DB2 is 10x faster then that would be very interesting.

Mark Callaghan said...

I checked the results for InnoDB on my hardware and ~0.4 seconds per query seems reasonable. So maybe Antonio has found something that we need to investigate and fix.