Friday, October 11, 2013

Tungsten Replicator Filters: A trove of golden secrets unveiled

UPDATE: This post is obsolete, for most of the URLs referenced here have been removed by the contents owner.


Since I joined the company in late 2010, I have known that one of the strong points of Tungsten Replicator is its ability of setting filters. The amazing capabilities offered by Tungsten filters cannot be fully grasped unless we explain how stage replication works.

There are several default stages in the replication stream. Every stage has an extraction task and an apply task. The extraction task will get data from the previous step repository and the apply task will save the data to the next repository, which can be either a temporary storage (memory queue, THL file) or the final destination (slave database server). Consider that the architecture allows developers to add stages, and you will appreciate its full power. For every stage, we can insert one or more filter between the two tasks. This architecture gives us ample freedom to design our pipelines.


Tungsten Replication stages

What’s more interesting, filters in Tungsten can be of two types:

  • built-in filters, written in Java, which require users to compile and build the software before the filter is available;
  • Javascript filters, which can do almost everything the built-in scripts can do, but don’t require recompiling. All you need is deploy the filter in the appropriate folder, and run a tpm command to install or update the replicator.

The Tungsten team has developed a large number of filters, either to achieve general purpose pipeline control (such as renaming objects, excluding or including schemas and tables from replication) or filters providing very specific tasks that were needed to meet customer requirements or to help implementing heterogeneous pipelines (such as converting ENUM and SET columns to strings, dropping UPDATE and DELETE events, normalize DDL, etc). There are 23 built-in and 18 Javascript filters available. They were largely undocumented, or sparsely documented in the code. Now, thanks to my colleague MC Brown, Continuent director of documentation, the treasure is there for the taking.

Tungsten filters are a beautiful feature to explore, but also a powerful tool for users with special needs who want to create their own customized pipeline. A word of warning, though. Filters are powerful and sometimes fun to implement and use, but they don’t come free. There are two main things to keep in mind:

  • Using more than one filter for the same stage may result in performance loss. The amount of performance loss depends on the stage and on your systems resources. For example if your replication weakest point is data transfer across the network, adding two or three filters to this stage (remote-to-thl) will make things worse. If you can apply the filter to the previous or next stages, you may achieve your purpose without many side effects. As always, the advice is: “test, benchmark, repeat.”
  • Filters that alter data (such as exclude schemas or table, drop events) will make the slave unsuitable for promotion. You should always ask yourself if the filter may make the slave a lousy master, and if it does, make sure you have at least another slave that is replicating without filters.

As a final parting thought, be aware that yours truly and the above-mentioned MC Brown will be speaking at Percona Live London 2013, with a full, take-no-prisoners, 6 hours long complete Tungsten Replicator tutorial, where we will cover filters and give some explosive and sensational examples.

Wednesday, October 09, 2013

Speaking at the MySQL NoSQL & Cloud Conference & Expo in Buenos Aires

I am on my way to Argentina, where I will be speaking at the MySQL NoSQL & Cloud Conference & Expo.

I have two talks: one on my pet project MySQL Sandbox and one on replication between MySQL and MongoDB (using another project dear to me, Tungsten Replicator.

It’s my first visit to Argentina and I will try to look around a bit before the conference. And I look forward to see many ex colleagues and well known speakers at the conference. The lineup includes speakers from Percona, EffectiveMySQL, PalominoDB, MariaDB, SkySQL, Tokutek, and OpenStack.

I am looking forward to this trip. My presentation on MongoDB replication is a first for me, and I am always pleased when I can break new ground. I have the feeling that the networking is going to be more exciting than the sessions.

If you are in Buenos Aires, come to the conference and say hi!