Tuesday, March 15, 2011

Quick benchmarking trick

I have been doing quite a lot of benchmarking recently.
I needed to find a safe way of measuring the time spend by the database doing a long task, like catching up on a huge backlog of accumulated replication updates. The problem with measuring this event is that I can record when it starts, but I can't easily detect when it finishes. My initial approach was to monitor the database and count the tables rows to see when the task was done, but I ended up affecting the task performance with my additional queries. So I thought of another method.
Since I had control on what was sent from the master to the slave, I used the following:
The initial time is calculated as the minimum creation time of the databases that I know are created during the exercise. Let's say that I had 5 databases named from db1 to db5:
set @START = (select min(create_time) from information_schema.tables where table_schema like "db%")
Then, to make sure that I catch the exact moment that the task is finished, I added to the master a command for each database:
create table db1.last_table (i int);
create table db2.last_table (i int);
create table db3.last_table (i int);
create table db4.last_table (i int);
create table db5.last_table (i int);
To know if the task is done, I query the database as follows:
select count(*) from information_schema.tables where table_schema like "db%" and table_name="last_table";
If the count is less than 5 (the number of databases that were in my binary logs), I wait more.
Finally, when the count matches the expected one, I get the end time:

set @END = (select max(create_time) from information_schema.tables where table_schema like "db%" and table_name="last_table"');
Now I have two values, @START, and @END
select timediff(@END,@START) as elapsed;
+----------+
| elapsed  |
+----------+
| 00:09:44 |
+----------+
It does not matter if I query the database immediately, or hour after coming back from my errands. Using the table creation times makes sure that I get a clean start and finish time.
I put all the above in a script, and I can check the elapsed time without fear of mistakes.

1 comment:

Ilan said...

Hi,
If I understand:
I should add a create table command in the master at the beginning of the process and another one at the end of the process.
Then all I should do, when the replication is done, is to run the following queries:
set @START = (select min(create_time) from information_schema.tables where table_schema like "db%");

set @END = (select max(create_time) from information_schema.tables where table_schema like "db%" and table_name="last_table"');

select timediff(@END,@START) as elapsed;

Am I right?
Thanks