Sunday, December 04, 2005

MySQL 5 general purpose routine library - III : Using "FOR EACH" loops

(Third part of the MySQL 5 general purpose routine library presentation.)
How many times, while using a MySQL client, you were facing a problem that seemed quite simple to solve, if only you could perform a loop of some sort without creating an external script?
I often found myself doing things like
  $ for $X in `mysql -B -N -e 'show tables'` ; \
do perl -pe "something with $X" | \
mysql -e "something else with $X" ; done
Or perhaps I write a quick Perl script to achieve the result I need. But how much better it would be if I could use a FOR EACH loop from mysql command line! Hence, when stored procedures became available, I started planning this kind of features, and now I can say that I have made my life simpler with these powerful language extensions.

complete, normal, simple.

Each group of routines has one "complete" method, i.e. a routine with as many details as it is humanly bearable. This routine is called for_each_SOMETHING_complete. Then there is another routine that does the same thing, but without so many arguments, and it is called for_each_SOMETHING. In some cases, it is possible to reduce the number of parameters even more, an then you'll find a routine called for_each_SOMETHING_simple. The reference guide has the details of what you can do with each one of them.

placeholders

Within the loop, there are elements that you may want to use in your SQL statements. For this reason, in almost each routine of this class you are provided with placeholders that refer to such elements. For example, when using a counter, there is a placeholder for the counter itself, when looping through a table list, you have placeholders for the table name, the database name, the engine type, and so on. Again, the reference guide tells you which placeholders are available for each routine.

user variables

Sometimes, there are too many details that you should take care of, but there are already so many parameters in your routine, that adding more would make its convenience fade away. Instead of adding parameters for every small detail of your routine, we prefer passing these less important arguments as user variables. For example, one routine allows up to 6 different placeholders, and every one of them can be customized. Rather than adding 6 more parameters, you can change one or more of those placeholders by setting the appropriate user variable, as explained in the reference guide.

For each table

The idea of this routine is to provide a database name, and to execute an SQL command for each table of the database, possibly with a filter to choose for which tables the task should be carried out. This option is easier if we take into account the tables available in MySQL 5 INFORMATION_SCHEMA database. Using the information listed in INFORMATION_SCHEMA.TABLES, it is easy to filter tables by name, type, engine, number of rows.
The easiest call is for_each_table(database_name, condition_text, sql_command). The first parameter is self explanatory. The second one is a condition that applies to INFORMATION_SCHEMA.TABLES. For example, if you want to limit the command applicability to the tables with a name starting by 'emp', then set condition_text to 'table_name like "emp%"'. Notice my usage of single and double quotes. Since you are passing a string, every quote inside the string must be escaped, or you should use different quote types. The last parameter, sql_command, is the query you need to execute for each table. This query, to be really effective, needs to use some table attributes, and this feat can be achieved by using placeholders.
A simple example:
    call for_each_table(
'myoffice',
'table_name like "tmp%"',
'drop table $D.$T');

WARNING! You can really do this kind of commands, and you can drop all tables in your database in a second. Use this function with extreme care!

Another example
    call for_each_table(
'myoffice',
'table_name like "emp%"',
'create table another_db.$T like $D.$T' );

This one will create a copy of all tables with name starting by 'emp' in another_db.
If your requirements are more complex, then you may want to try for_each_table_complete, which works like [RP]for_each_table, but has three additional parameters: a query to execute before the main command, another to execute after the main command, and a modifier telling if such queries should be executed before and after the loop ('once') or for each command in the loop ('many'). Let's see an example first:

    call for_each_table_complete(
'world',
'table_name !="mycounts"',
'insert into world.mycounts select "$T", count(*) from $D.$T ',
'create table if not exists world.mycounts(t varchar(50), nrows int)',
'select sum(nrows) from world.mycounts',
'once' );
The first three parameters work just like for_each_table. The fourth parameter is a query to be executed before the main command. As you see, it creates a table used by the main command to insert some values from each table. The fifth parameter is a query to get the summary of the operation. The last parameter says that these two queries should be executed just once, not within the loop.
Let's see an example of these two different behaviours:
    call for_each_table_complete(
'world', -- database
'', -- no filter
'select @SUM := @SUM + ( select count(*) from $D.$T )', -- main command
'select @SUM := 0', -- before
'select @SUM', -- after
'once' ); -- modifier
First, let's call the routine with a 'once' modifier, and see what happens:
+-----------+
| @SUM := 0 |
+-----------+
| 0 |
+-----------+

+----------------------------------------------------+
| @SUM := @SUM + ( select count(*) from world.City ) |
+----------------------------------------------------+
| 4079 |
+----------------------------------------------------+

+-------------------------------------------------------+
| @SUM := @SUM + ( select count(*) from world.Country ) |
+-------------------------------------------------------+
| 4318 |
+-------------------------------------------------------+

+---------------------------------------------------------------+
| @SUM := @SUM + ( select count(*) from world.CountryLanguage ) |
+---------------------------------------------------------------+
| 5302 |
+---------------------------------------------------------------+

+------+
| @SUM |
+------+
| 5302 |
+------+
What happened was that the variable @SUM was set before the loop, and it got increased by the number of rows in each table. The final query gave us the total. Let's see what happens if we call the same routine with a 'many' modifier:
    call for_each_table_complete(
'world', -- database
'', -- no filter
'select @SUM := @SUM + ( select count(*) from $D.$T )', -- main command
'select @SUM := 0', -- before
'select @SUM', -- after
'many' ); -- modifier

+-----------+
| @SUM := 0 |
+-----------+
| 0 |
+-----------+

+----------------------------------------------------+
| @SUM := @SUM + ( select count(*) from world.City ) |
+----------------------------------------------------+
| 4079 |
+----------------------------------------------------+

+------+
| @SUM |
+------+
| 4079 |
+------+

+-----------+
| @SUM := 0 |
+-----------+
| 0 |
+-----------+

+-------------------------------------------------------+
| @SUM := @SUM + ( select count(*) from world.Country ) |
+-------------------------------------------------------+
| 239 |
+-------------------------------------------------------+

+------+
| @SUM |
+------+
| 239 |
+------+

+-----------+
| @SUM := 0 |
+-----------+
| 0 |
+-----------+

+---------------------------------------------------------------+
| @SUM := @SUM + ( select count(*) from world.CountryLanguage ) |
+---------------------------------------------------------------+
| 984 |
+---------------------------------------------------------------+

+------+
| @SUM |
+------+
| 984 |
+------+
Now the variable gets set to 0 for each loop, and thus the final value we get is not the total, but just the number of rows of the last table processed.
If you look at the reference guide for for_each_table_complete, you'll see that there are many more things you can do. Do some experiment and you'll realize that this routine, coupled with some imagination, can do really wonderful things.

For each counter

This is the classic "for loop" that you can find in programming language such as C, Perl, Java. You have a counter with a starting and ending value and a delta value to increment the counter. As in the for_each_table routine, you have a for_each_counter and a [RP]for_each_counter_complete to play with. The principle is the same: the normal routine gets just the minimum parameters to get the job done (counter_start, counter_end, counter_delta, sql_command) and the "complete" version has the same additional parameters seen for for_each_table_complete (sql_before, sql_after, {'once', 'many'} ).
Here is some simple usage of this routine:
 call for_each_counter(1,4,1, 'create table test.test_$N(i int)');
Query OK, 0 rows affected (0.06 sec)

show tables from test like 'test%';
+------------------------+
| Tables_in_test (test%) |
+------------------------+
| test_1 |
| test_2 |
| test_3 |
| test_4 |
+------------------------+

call for_each_counter(1,400,1, 'insert into test.test_1 values ($N)');
Query OK, 0 rows affected (0.08 sec)

select count(*) from test_1;
+----------+
| count(*) |
+----------+
| 400 |
+----------+
Here's another common example from real life experience. One of the most frustrating problems is when you have a table with date values, and you are supposed to have values for every date in a given period. To check that this is correct, you should either use an external program, or, if you have a for_each_counter routine, you can get a quick solution:
 create table test.mydates(d date);

call for_each_counter(1,365,1,
'insert into test.mydates values ("2005-01-01" + interval $N day)');

select d from mydates limit 10;
+------------+
| d |
+------------+
| 2005-01-02 |
| 2005-01-03 |
| 2005-01-04 |
| 2005-01-05 |
| 2005-01-06 |
| 2005-01-07 |
| 2005-01-08 |
| 2005-01-09 |
| 2005-01-10 |
| 2005-01-11 |
+------------+
10 rows in set (0.01 sec)
Now all you need to do to check that your original table is complete, is a RIGHT OUTER JOIN with mydates and you get a list of the missing dates very quick.
   SELECT
d
FROM
mydates
LEFT OUTER JOIN my_real_table on (d=date_column)
WHERE
my_real_table.PK_column is null;

For each dataset row

This is possibly the most complicated routine in this collection. It allows you to walk through a dataset and execute a query for each row, referencing in your query up to three values per row. Does it sound complicated? It is. But it is also powerful, and it works. The syntax for for_each_table_value is
  call for_each_table_value(
database_name,
table_name,
wanted_column1,
wanted_column2,
wanted_column3,
search_condition,
sql_command);
database_name and table_name are just the identifiers of the place from where we get the values. wanted_column1, wanted_column2, and wanted_column3 are the columns you want to get from the table to use in your query. search_condition is the WHERE clause to select which rows we should get from the table. And sql_command is the query where you can use placeholders to reference database and table ($D, $T), each of the three columns ($I1, $I2, $I3) and a counter ($N).
Here's an example. It may seem silly, but this is not something you can easily do in normal SQL.
  call for_each_table_value(
'world', -- database
'City', -- table
'District', -- first column
'Name', -- second column
NULL, -- third (not needed) column
'CountryCode="ITA" limit 10', -- filter
'create table world.$I2(`$I1` varchar(50))');
This query will create a table named after a city, having a column named after the city's district.
 show tables;
+-----------------+
| Tables_in_world |
+-----------------+
| Bari |
| Bologna |
| Catania |
| City |
| Country |
| CountryLanguage |
| Firenze |
| Genova |
| Milano |
| Napoli |
| Palermo |
| Roma |
| Torino |
+-----------------+

desc Firenze;
+---------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| Toscana | varchar(50) | YES | | NULL | |
+---------+-------------+------+-----+---------+-------+

desc Genova;
+---------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| Liguria | varchar(50) | YES | | NULL | |
+---------+-------------+------+-----+---------+-------+
Using the same method, you can drop those tables at once:
  call for_each_table_value(
'world',
'City',
'Name',NULL,NULL,
'CountryCode="ITA" limit 10', 'drop table world.$I1');

show tables;
+-----------------+
| Tables_in_world |
+-----------------+
| City |
| Country |
| CountryLanguage |
+-----------------+

For each array item

If you have seen the documentation about the array routines, you knew that this one was coming. Executing a loop for each element of an array is a common programming need, and this was in my mind since I started planning the array routines.
The simplest case is for_each_array_item_simple, which requires an array name and a sql command. Then the routine walks through the array items, executing the SQL command for each one of them, allowing the SQL to reference the item and the counter using placeholders.
So, for instance, if your array 'colors' contains three values: 'red', 'blue', and 'white', calling
    for_each_array_item_simple('colors','select "$N","$I"')
will execute three times, printing each array item to the screen.
A more advanced routine, for_each_array_item, will take two more parameters to mark the minimum and maximum array index to use in the loop.
As you may guess, having seen the previous routines, for_each_array_item_complete will also take parameters for queries to be executed before and after the main command.

For each list item

Last, but not least, routines for treating lists are the most dynamic of them all. You know that you can execute a query for each item of an array, and you know (if you don't, see array_from_list) that you can create an array from a comma separated list. Therefore, you can execute an SQL command from a list converted into an array. Rather than forcing the user to write
    call for_each_array_item(
array_from_list('red,blue,white','colors'),'select "$N","$I"' );
there is a routine for_each_list_item that does just that behind the scenes.
    call for_each_list_item('red,blue,white', 'select "$N","$I"' );
which does the same thing, with less effort.
As usual, there is also a for_each_list_item_complete to do additional wonders, as the reference guide will tell you.

No comments: