Monday, December 20, 2010

Looking for a hack - Passing comment-like info through the binary log

hacker I am facing an interesting problem. I need to mark somehow a statement in such a way that the comment is preserved through the binary log.
I don't have control on how the statement is generated or using which client software. For the sake of example, let's say that I need to mark a CREATE PROCEDURE statement in such a way that, if I extract the query from the binary log and apply it to another server, the information is still available.

Background

Normally, I would use a comment. The first thing I would think is
CREATE PROCEDURE p1(i int) select "hello" /* This is my text */
But most client libraries will strip it.
There was a clever trick by Roland Bouman that allowed users to bypass this limitation. You could use a qualified comment such as /*!999999 This is my test */, but unfortunately it only works in MySQL 5.0.x, while MySQL 5.1 strips everything down, even if the comment is a legitimate keyword.

create procedure p9 (j int) insert /*!50000 INTO */ t1 values (j) ;
Query OK, 0 rows affected (0.00 sec)

 show create procedure p9\G
*************************** 1. row ***************************
           Procedure: p9
            sql_mode: 
    Create Procedure: CREATE DEFINER=`msandbox`@`%` PROCEDURE `p9`(j int)
insert  INTO  t1 values (j)
character_set_client: latin1
collation_connection: latin1_swedish_ci
  Database Collation: latin1_swedish_ci
1 row in set (0.00 sec)

Other tricks

Something else that I have tried: I can inject a query before or after the one that I need to monitor.
create table if not exists comments (t varchar(100)) engine=blackhole;
 
update comments set t='the next statement is what I need';
create procedure p1(i int) insert into t1 values (j);
update comments set t='the previous statement is what I need';
This approach does possibly introduce some overhead.

Or I can add a comment clause in the statement.
create procedure p1(i int) comment="this is what I need" insert into t1 values (j);
This approach requires parsing the SQL, and dealing with artistic indentation and usage of other options in the query. And if I need to deal with commands that don't support the "comment" option, I am back to square one.

Advice wanted

So far, the only method that works almost always is the blackhole trick (1)
I would like to know if there is any method of introducing a piece of information related to a given statement, in such a way that the comment survives after one of the following:
  • The binary log is converted to queries and passed to a MySQL client that applies the stream of queries to another server.
  • The binary log is associated with another master, and then passed to a slave through regular replication.
In both cases, I need to find the original information in the last server's binary log.

As a related matter, I know that MySQL, in regular replication, passes some information across binary logs, and that information is the server-id. If I set an intermediate server as relay slave, the server-id of the original master is associated with the query recorder in the binary log of every slave. I don't know if I can use this information for my purposes, but I would like to know how does the replication process maintain the server ID across servers.

Maybe it's too late for me and I can't see an obvious solution. I will appreciate any suggestion. Thanks in advance

(1) If the blackhole is disabled, the method fails, or introduce unacceptable overhead.

15 comments:

Mark Callaghan said...

I want a similar feature but for insert, update, delete and replace statements. We should talk about this in 2011.

Robert Hodges said...

@Mark - We have a couple of tricks for Tungsten that we use to support multi-master. For DDL we add a comment to the end of the statement. For DML we add an extra update at the end of the transaction to a table that we can parse. This also works for row replication, which is nice. The nastiness is really DDL statements like CREATE PROCEDURE that are auto-commit *and* strip comments.

shantanu said...

You can change the case of one of the keywords for e.g. check how insert word is written...

create procedure p9 (j int) iNsErT /*!50000 INTO */ t1 values (j) ;

Mats Kindahl said...

There is WL#4033 that adds informational log events to the binary log. It is mainly intended to be used to add the original statement for row-based replication, but with the addition of WL#5404 there is a way to maintain comments in the binary log and inspect them using mysqlbinlog. What is lacking though is a way for a user to add their own "comment events" to the binary log.

Mark Daems said...

A very dirty and not so practical trick : add "and 'comment'!='the real comment'" to your query. Just make sure the condition always evaluates to true.
I think the parser is smart enough to ignore this condition. Should be query cache safe too.

Mark

Giuseppe Maxia said...

Thanks for all the comments.

@shantanu,
Mixed case in a keyword is a nice idea, but it does not produce enough meaningful information for my needs.

@Mark_Daems,
This trick does not work with DDL (where I have the problem at the moment). Also, regular queries don't always end with the WHERE clause, but they can have a GROUP BY, ORDER BY, and LIMIT. So, to apply this trick I would need to parse SQL, which is an expensive task that I can't afford at this stage.

Unknown said...

Hi Giuseppe,
I hope you're enjoying life post-Oracle!
You mention the ServerId. This is stored in the header of Binlog events, and when the Slave SQL thread is applying an event, it temporarily sets its thread-local ServerId to the received ServerId. This causes the Binlogged events to carry the originator's ServerId. In the Cluster branches (5.1-telco-X.Y) we recently added a MySQLD server option --server-id-bits=n. This causes the slave code to only use the n lsbs of the ServerIds received in the header of Binlog events as ServerId, for the purposes of loop detection etc. The full 32 bits are 'passed through' the Server and written to the Binlog entries (if log-slave-updates is true).

So for example, setting this to 24 gives 32-24 = 8 bits of 'user' information per Binlog event.

Cluster uses RBR and injects its own events into the Binlog, so it's easy to set these 'user' bits. Also, Cluster provides a row change listening Api, so the 'user bits' can be extracted by users on the slave(s). Normal MySQLD might need a session variable or somesuch to set and read the values.

This is by no means a pleasant mechanism, but it can let you transport some bits of extra information for *existing events*.

Ideally the Binlog format could be extended to allow some sort of user comments to travel with Binlog events, but who knows when that will happen. Perhaps to encourage it, we should list some use-cases for this mechanism.

Frazer

Mark Callaghan said...

The binlog format has always been extensible in theory (more attributes could be added). It should now be extensible in practice because we found/fixed a bug that prevented it while doing global transaction IDs for the Google patch and that bug was also fixed in official MySQL. Filtering on attribute values is more efficient and accurate than parsing query comments, but doing that also requires changes to the server that others might not want to rely on.

Mark Callaghan said...

@Robert -- are the comments stripped on the server? The mysql command line client has code to strip comments before sending SQL to the server. That really makes life hard for people who use query comments to debug production problems. In the Facebook and Google patch we changed the mysql command-line client to not strip comments by default.

Robert Hodges said...

@Mats
It looks as if there is room for a couple of useful patches here. One is to set the server ID externally, e.g., as a session rather than global value. The second (and far more useful) is to add metadata in the form of name/value pairs to specific transactions. Tungsten has this feature in its log. We use it to track things like MySQL server ID, shard ID, etc. Here's an example.

SEQ# = 19 / FRAG# = 0 (last frag)
- TIME = 2010-12-20 11:51:09.0
- EVENTID = 001903:0000000000002902;29495
- SOURCEID = logos1-u1
- STATUS = COMPLETED(2)
- SCHEMA = test
- METADATA = [mysql_server_id=14;service=sjc_west_w;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 1, foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8, collation_connection = 8, collation_server = 8]
- SQL(0) = create table foobar (id int) /* ___SERVICE___ = [sjc_west_w] */

In MySQL it might amount to special session variables that get added into the log automatically for each statement, which addresses current problems with auto-commit DDL as well as RBR updates. We're up for helping on this at Continuent.

Robert Hodges said...

@Mark
Comments are not stripped for DDL in general on the server side if you get them past mysql comment handling. Marcus' drizzle JDBC thankfully does not alter SQL, so that has not been a big problem in operation for Tungsten.

Within CREATE PROC/TRIGGER/FUNCTION it appears they are stripped inconsistently. Guiseppe saw differences between 5.0/5.1, so I'm expecting we'll find more corner cases as we dig through specific DDLs. This was not an interesting issue before we did multi-master, which is why we are learning about it now.

Mark Callaghan said...

Here is another interesting example. In some cases, transaction start is not indicated by "BEGIN" in the binlog.

Justin Swanhart said...

You can avoid using the blackhole engine by just creating a temporary table and referencing session variables. This will result in easy to parse 'SET' statements that don't have varying whitespace, etc. You have to revert to statement based replication though (if only temporarily):

node1 [localhost] {msandbox} ((none)) > set binlog_format=statement;
Query OK, 0 rows affected (0.00 sec)

node1 [localhost] {msandbox} ((none)) > select name_const('i_want_this_in_the_binlog','This is my message 0x12345');
+----------------------------+
| i_want_this_in_the_binlog |
+----------------------------+
| This is my message 0x12345 |
+----------------------------+
1 row in set (0.00 sec)

node1 [localhost] {msandbox} ((none)) > create temporary table `no$exist`.tmp select @i_want_this_in_the_binlog;
Query OK, 1 row affected (0.01 sec)
Records: 1 Duplicates: 0 Warnings: 0

# at 116767
#101221 21:12:04 server id 101 end_log_pos 116851 User_var
SET @`i_want_this_in_the_binlog`:=_latin1 0x54686973206973206D79206D6573736167652030783132333435 COLLATE `latin1_swedish_ci`/*!*/;
# at 116851
#101221 21:12:04 server id 101 end_log_pos 116981 Query thread_id=352 exec_time=0 error_code=0
SET TIMESTAMP=1292994724/*!*/;
SET @@session.pseudo_thread_id=352/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
create temporary table `no$exist`.tmp select @i_want_this_in_the_binlog
/*!*/;
# at 116981

Unknown said...

See Monty Program Work Log #47 "Store in binlog text of statements that caused RBR events":

http://askmonty.org/worklog/Server-Sprint/?tid=47

Here's a patch for 5.1.52:

http://bazaar.launchpad.net/%7Epercona-dev/percona-server/release-5.1.52-12-rnt/annotate/head%3A/wl47.patch

Mats Kindahl said...

@Robert

We have actually discussed that internally as well, and I was considering ways of adding more structure to the comment. One of the Replication team members (Sven) suggested to use JSON, which I think would be really cool. It's a very basic language, roughly as expressive as XML, but not as chatty.