Friday, January 27, 2006

MySQL 5 general purpose routine library - VII: Testing and best practices

(Seventh part of the MySQL 5 general purpose routine library presentation.)

Test units

Testing is often considered a boring activity, something that is not cool enough for a developer to take into consideration.
Sometimes, it's true. Testing is almost as exciting as having a haircut or trimming your nails.
It needs to be done, but you hardly welcome the task.
However, that happens when you don't test with method. Because testing can be fun and it can be as rewarding as any other programming activity.

What you need to do, is to move your testing schedule up to the development schedule and put it in front of the code.That way, you don't have just "some testing", which is usually fumbling with a few examples to ensure that your routine does what you thought it should do.
That's not testing. That's sampling.
And if that fails, you will test even less after you fix the bug (or you think you did),
because that sort of testing is really boring and tiresome.

What you should do instead is to create testing units.
The practice of unit testing (http://en.wikipedia.org/wiki/Unit_testing) calls for the creation of a set of tests to probe all facets of the code you are about to check.
This practice is well known to Perl programmers, since almost all modules from Perl itself and from CPAN come with a standard set of tests. Some modules have even thousands of them.
The clever addition of recent practice, though, is not just writing the testing units,
but writing them before the actual code.
The idea is this: write your specifications as testing units, i.e. put down a stream of cases to prove that the code being produced meets your expectations. If your specifications are detailed and complex, so will be the testing units, which will include as many detail as your specification requires.
Using this approach, you will know that your routine works when it passes all the tests.

Unit testing in MySQL

Testing in Perl is a powerful experience. There are so many tools available, that testing it's becoming an art in itself. There are modules to help you test just about everything. Writing your tests in Perl becomes a very easy task, even pleasant sometimes.
Coming from such experience, the testing environment for MySQL stored routines looks just, well, lacking.
Therefore my first thought when I was facing the task was "let's do it in Perl", and so I did it.
My first test unit (for the arrays module) was in Perl. It did not take long to write, but I found myself writing quite a lot to describe how the tests work in Perl.
And then I realized that, for a non Perl programmer, installing the Perl DBI and DBD::mysql modules is not an easy task, and it can be actually a frightening experience.
The same goes for another well established testing environment for C and C++. Who is not used to compiling and linking a C program will find even a simple task very demanding.
Thus I came to the logical conclusion that testing for stored routines has to be done with MySQL resources alone. It is not very advanced, and it pales compared to Perl, but it can be enhanced with some well thought structures and routines.
So, here goes.
Let's see how unit testing can be performed with MySQL and the general purpose stored routines.

Testing support in mysql-sr-lib

To ease the task of testing, there are a few routines. The most important one, and perhaps the only one that you may need in setting a test, is log_test, which expects four parameters

procedure log_test(
p_description varchar(200),
p_result text,
p_expected text,
p_outcome boolean)
The first one is a description of what we are testing, perhaps the routine name, with some additional short comment. The second one is the result we are inspecting. The third one is a description of what we are expecting. The last one is a boolean, and it should receive an expression, the evaluation of which will tell us whether the test succeeded or not. I know that it sounds obscure, but it will become clearer with the examples below.
Furthermore, we have some generalized tests for something that is quite common during our testing tasks. It is very likely that, testing stored routines, we want to test if such routine exists. And it is equally common to test if a given table exists. So, there are procedures to perform these simple repetitive tasks: check_routine and check_table do the job for you, and if you look at the source code for them, you will see just a clear example of how a test should be carried out. If you have several routines to check at once (as it is the case of this collection, for instance) you can benefit from check_routine_simple and check_for_routines_existence. The first one will let you set a user variable @database and use it to check your routines, so you will set the database parameter only once. The second one gets the routine names form a table (_routine_list) and calls check_routine for each of them. This latest case will raise a question: "how do I know how the table structure?" Don't worry. Before starting your testing work, call initialize_tests, and that, among other hidden tasks, will be done for you. To see different uses of these routines, inspect the source code of test_*.mysql in the "tests" directory. for now, it's enough to say that these three uses are equivalent:
 call initialize_tests();

connect;
call check_routine(database(), 'myp1', 'procedure');
call check_routine(database(), 'myf2', 'function');

connect;
set @database = database();
call check_routine_simple('myp1', 'procedure');
call check_routine_simple('myf2', 'function');

connect;
insert into _routine_list (routine_name, routine_type) values
('myp1', 'procedure'),
('myf2', 'function');
call check_for_routines_existence(database());
Last, there is show_test_results, which gives you an account of what you have tested, listing the result of each test, a summary of passed and failed ones, and a further list of failed test for your easy review. And now it's time for some hands on.

Short tutorial: Writing a testing unit

For this brief exercise, we want to create two stored routines. A procedure that creates a table with a character column and two numeric columns, and a function that will return the sum of the two numbers given the string.
Some facts: The name of the table is 'my_data'. The procedure, 'make_table,' won't have any parameters, the function, 'get_result,' will have one. These facts are enough to set our tests.
   -- test_tutorial.mysql
call initialize_tests();

call check_routine(database(), 'make_table', 'procedure');
call check_routine(database(), 'get_result', 'function');

connect;
call make_table();
call check_table(database(), 'my_data');

connect;
insert into my_data values('first', 1, 2), ('second',10,20),('third', 100, 200);
set @my_res = get_result('first');
call log_test('get_result first', @my_res, '@my_res = 3', @my_res = 3 );

set @my_res = get_result('second');
call log_test('get_result second', @my_res, '@my_res = 30', @my_res = 30 );

set @my_res = get_result('third');
call log_test('get_result third', @my_res, '@my_res = 300', @my_res = 300 );

set @my_res = get_result('fourth');
call log_test('get_result non-existent', @my_res, '@my_res is null',
@my_res is null );

call show_test_results();
This test is what we expect the actual routines to do. It can be part of the user's requirements to signify what you want to achieve. That done, it's time to write the actual code:
   -- test_t1.mysql

delimiter //

drop procedure if exists make_table //
create procedure make_table ()
deterministic
begin
create table my_data
(
name char(10) not null primary key,
val1 int,
val2 int
);
end//

drop function if exists get_result //
create function get_result ( p_name char(10))
returns int
reads sql data
begin
declare mysum int;
select val1*val2 into mysum -- intentional bug
from my_data
where name = p_name;
return mysum;
end//

delimiter ;
Now we have all the components. To perform the tests, we need to install the test utilities and the globals module.


$ mysql databasename < globals.mysql
$ mysql databasename < test_utilities.mysql


and then we can install the two routines:

   $ mysql databasename < test_t1.mysql


Finally, we run the test. Don't forget to add the '-t' option to mysql.


$ mysql -t databasename < test_tutorial.mysql
+---------+-------------------------+--------+-----------------------+------------+
| test no | description | result | expected | outcome |
+---------+-------------------------+--------+-----------------------+------------+
| 1 | make_table p exists | 1 | routine_exists = true | pass |
| 2 | get_result f exists | 1 | routine_exists = true | pass |
| 3 | table my_data exists | 1 | table_exists = true | pass |
| 4 | get_result first | 2 | @my_result = 3 | ** fail ** |
| 5 | get_result second | 200 | @my_result = 30 | ** fail ** |
| 6 | get_result third | 20000 | @my_result = 300 | ** fail ** |
| 7 | get_result non-existent | NULL | @my_result is null | pass |
+---------+-------------------------+--------+-----------------------+------------+
7 rows in set (0.00 sec)

+-------------+--------+--------+-------------------------+
| total tests | passed | failed | passed tests percentage |
+-------------+--------+--------+-------------------------+
| 7 | 4 | 3 | 57.14 |
+-------------+--------+--------+-------------------------+
1 row in set (0.00 sec)

+---------+-------------------+--------+------------------+------------+
| test no | description | result | expected | outcome |
+---------+-------------------+--------+------------------+------------+
| 4 | get_result first | 2 | @my_result = 3 | ** fail ** |
| 5 | get_result second | 200 | @my_result = 30 | ** fail ** |
| 6 | get_result third | 20000 | @my_result = 300 | ** fail ** |
+---------+-------------------+--------+------------------+------------+


The tests worked, but something surely went wrong, because three tests were unsuccessful. We expected the result to be 3, 30,300, and instead we got 2,200,20000. Time to inspect the source, and we find quickly that get_result has val1*val1 where it should have been val1+val2. Let's correct the mistake, and run the test again

   $ mysql databasename < test_t1.mysql
$ mysql -t databasename < test_tutorial.mysql
+---------+-------------------------+--------+-----------------------+---------+
| test no | description | result | expected | outcome |
+---------+-------------------------+--------+-----------------------+---------+
| 1 | make_table p exists | 1 | routine_exists = true | pass |
| 2 | get_result f exists | 1 | routine_exists = true | pass |
| 3 | table my_data exists | 1 | table_exists = true | pass |
| 4 | get_result first | 3 | @my_result = 3 | pass |
| 5 | get_result second | 30 | @my_result = 30 | pass |
| 6 | get_result third | 300 | @my_result = 300 | pass |
| 7 | get_result non-existent | NULL | @my_result is null | pass |
+---------+-------------------------+--------+-----------------------+---------+
7 rows in set (0.00 sec)

+-------------+--------+--------+-------------------------+
| total tests | passed | failed | passed tests percentage |
+-------------+--------+--------+-------------------------+
| 7 | 7 | 0 | 100.00 |
+-------------+--------+--------+-------------------------+


Now we got what we wanted. The routine works reasonably well. Notice that we also included a test for failure, when we asked for a value 'fourth', which does not exist in the table.

best practices



Enlightened by the testing example, we can draw some guidelines for developing stored routines. Actually, this advice can be successfully applied to any programming language, and I owe this great advice to Damian Conway (Ten essential development practices). Follow these simple rules, and you'll see the quality of your routines improving soon.


Write the interface

Before writing any code, write down the interface, i.e. how you would like to call the routine, with the necessary parameters. Possibly, write down the documentation for such a routine, describing how to invoke it and what it should do. Write down every detail of the implementation such as environment variables, dependencies, side effects, and so on.

Write a test unit for your routine

Before writing the code, there is yet another step. Write some tests for your routine. You already know the interface, so you can create the tests for your routine. For every feature described in the documentation, you should prepare at least one test. Make tests for success and for failure. For example, if your routine has some pre-requisites, test that it fails when such requisites aren't met.


As soon as you implement one feature of the routine, test it

When finally writing the code, run the tests on your routine, to see if that feature works as expected. If the test for that feature fails, don't go on developing further features until the failing one is fixed.

When your routine passes all tests, you're done

Your work is finished as soon as your routine passes all the tests. You'll be amazed at the low number of bugs that this approach leaves behind. Most of them are caught during the test.

MySQL 5 general purpose routine library - VI : Globals

(Sixth part of the MySQL 5 general purpose routine library presentation.)
Among the General Puropose Routines there are some that are more general than others. I refer to those small utility routines that don't belong to any specific module, or that are useful and required by any module. I call this group globals because it applies to all the modules, and also because one of its objectives is to create global variables that are available across sessions.

Finding existing objects

This category has some functions to report on the existence of a specific database object, mainly tables, views and routines.
Although this is a task mainly related to testing, I didn't put it among the test utilities, because they are really general, and I believe that they should be useful to many developers in several different tasks.
The list starts with three simple functions table_exists, table_or_view_exists, view_exists, which can tell you in one call if a given table or view exists.
Similar ones are routine_exists, function_exists and procedure_exists. The first one tells you if a given routine exists, providing the database name, the routine name and the routine type (procedure or function). The other two functions are just shortcuts toward calling routine_exists.

Using global variables

The main course in globals, and the reason why this module holds this name is due to the implementation of global variables, i.e. variables that keep their value across sessions. Unlike user variables, that lose their value after a disconnection, global variables can stay stored for as long as you need them.
To set a global variable, you can use either a procedure global_var_set or the homonymous function.
  set @myvar = global_var_set('varname','an important value');
call global_var_set('varname', 'an important value');
Both these calls will create a global variable named 'varname'. The first usage has the side effect of actually returning the variable's value, which is immediately assigned to a user variable. The second form will just create a global variable, assigning its value to a @global_var_set user variable.
To retrieve the value, you use global_var_get.
  set @othervar = global_var_get('varname');
Global variables are created on a user basis. With the default implementation, several users can define variables with the same name, and they will be kept separated, each one being available to their respective creator. It is possible to set the global variables to be truly global, each one available to all users, by changing the return value of library_user, which is used, incidentally, by the arrays module as well.
You can drop global variables with global_var_drop, and inquire about the existence of a global variables using global_var_exists.

Sunday, January 08, 2006

Checking MySQL weak passwords

Do you have a policy of strong passwords?
Do you want to enforce it? If yes, then you may want a quick way of checking if your users are abiding to the rules.
You can either force the users to change passwords through a smart application, to prevent them from using weak passwords, or, if you are assessing an existing environment, you can check the existing passwords against a dictionary, to see if there are some imeediately exploitable passwords among your server accounts.

The mechanics of passwords checking

Checking passwords against a dictionary is a costly business. It can be made more affordable if we manage to put all the items to check in the database itself, so that we can use the DBMS force to do our job.
What we have to do is to collect a large list of words that we would not like our users to set as their password. Immediate candidates are user names, both from the operating system and the database, user full names, and of course words from as many dictionaries we believe our users could make advantage of.


WARNING Password checking is a breech of privacy. If you need to use the techniques explained in this post, be sure to:
  • Inform the users about a policy of strict passwords;
  • Inform them that you will run sanity checks from time to time;
  • Make them sign a form to acknowledge that they know and understand the policy;
  • If you are not the boss, get the management approval (in written!);
  • Have the company lawyer review and approve the policy wording and the forms.
Failure to do so may result in unpleasant situations, ranging from slight embarrassment to criminal charges, depending on the company you work with, your country regulation, and your ill luck.
The table we need to create will have a word, which is what we would not want to be used as a password. A password, which is the word encoded with MySQL password function, and a reversed_password, which is the word spelled backwards encoded with password. If you have time and patience, you may add more variations (such as all lowercase, all uppercase).
Using such table, finding the weak passwords is quite easy. To have something to find, let's create some dumb users:
GRANT all on test.* to test1 identified by "test";        # name of a database
GRANT all on test.* to test2 identified by "test2"; # name of a user
GRANT all on test.* to test3 identified by "master"; # a common word
GRANT all on test.* to test4 identified by "retsam"; # a reversed common word
GRANT all on test.* to test5 identified by "marshmallow"; # an uncommon word
Now let's get them:
select
user, word as 'chosen password', word as 'original word', 'direct' as 'password type'
from
passwords
inner join mysql.user on password=pwd
union
select
user, reverse(word),word,'reverse' as 'password type'
from
passwords
inner join mysql.user on password=reverse_pwd
;
+-----------+-----------------+---------------+---------------+
| user | chosen password | original word | password type |
+-----------+-----------------+---------------+---------------+
| test5 | marshmallow | marshmallow | direct |
| test3 | master | master | direct |
| test4 | retsam | master | reverse |
| test1 | test | test | direct |
| test2 | test2 | test2 | direct |
+-----------+-----------------+---------------+---------------+

Here we are. All the offending users came to light, with a simple (and very efficient) query.

Creating the passwords table


The following perl script will create the passwords table and populate it, using a dictionary of English words and the other above mentioned elements.

#!/usr/bin/perl 
use strict;
use warnings;
use Data::Dumper;
use DBI;

my $words_file = '/usr/share/dict/words';
my $groups_file = '/etc/group';
my $users_file = '/etc/passwd';
my $DEBUG = $ENV{DEBUG};
my $deferred_keys = $ENV{DEFERRED_KEYS} || 1;
my $max_records = $ENV{MAX_RECORDS} || 30_000;
my $base_query = qq{INSERT IGNORE INTO passwords VALUES };
my $query = $base_query;
my $record_count = 0;
my $total_recs = 0;
my $database = 'test';
my $host = 'localhost';
my $conf_file = "$ENV{HOME}/.my.cnf";
my $table_struct = qq{
create table passwords (
word char(50) collate utf8_bin NOT NULL,
pwd char(41) collate utf8_bin NOT NULL,
reverse_pwd char(41) collate utf8_bin NOT NULL,
PRIMARY KEY (word),
KEY (pwd),
KEY (reverse_pwd)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
};

my $dbh=DBI->connect("dbi:mysql:$database;host=$host"
. ";mysql_read_default_file=$conf_file",
undef,
undef,
{RaiseError => 1})
or die "Can't connect: $DBI::errstr\n";

$dbh->do(qq{drop table if exists passwords});
$dbh->do($table_struct);
$dbh->do(qq{ALTER TABLE passwords DISABLE KEYS}) if $deferred_keys;

# get the database users
my $users = $dbh->selectall_arrayref(qq{select user from mysql.user});

# get the database names
my $databases = $dbh->selectall_arrayref(qq{show databases});

# get the table names
my $tables = $dbh->selectall_arrayref(qq{
select distinct table_name from information_schema.tables});

# add all table, users, database names to the word list
for ((@{$users}, @{$databases}, @{$tables})) {
process_query( $_->[0] );
}

# get the OS group names
open my $IN, q{<}, $groups_file
or die "unable to open $groups_file\n";
process_query((split(q{:}, $_))[0]) while <$IN>;
close $IN;

# get the OS user names
open $IN, q{<}, $users_file
or die "unable to open $users_file\n";
while (<$IN>) {
my ($user_id, $full_name) = (split(q{:}, $_))[0,4];
process_query($user_id);
process_query($full_name) if $full_name;
}
close $IN;

# get the words from a dictionary
# repeat this step for every other
# dictionary file you may want to include
open $IN, q{<}, $words_file
or die "unable to open $words_file\n";
process_query( $_ ) while <$IN>;
close $IN;

# finish off leftovers
if ($record_count) {
$dbh->do($query);
$total_recs += $record_count;
print "$record_count ($total_recs)\n" if $DEBUG;
}

$dbh->do(qq{ALTER TABLE passwords ENABLE KEYS}) if $deferred_keys;

sub process_query {
my ($word) = @_;
chomp $word;
if ($record_count) {
$query .= q{,};
}
my $quoted_word = $dbh->quote($word);
$query .= q{(}
. $quoted_word
. q{, password(}
. $quoted_word
. q{), password(reverse(}
. $quoted_word
. q{)))};
if (++$record_count >= $max_records) {
$dbh->do($query);
$total_recs += $record_count;
print "$record_count ($total_recs)\n" if $DEBUG;
$record_count = 0;
$query = $base_query;
}
}

Some efficiency issues

Why do we need a Perl script to achieve such goal? Everything could be done with a few one-liners, and this is actually what I tried in the beginning.
Of course you can use some one-liners, but you will end up inserting one record at the time, which becomes unbearably slow when you pass the fifty thousand record mark.
As an example, to insert five hundred thousand records, this script takes about one minute, because it makes multiple insert statements of 30,000 records each, while inserting records one by one took about 17 minutes.
One more problem may arise if you are not careful about the column types. Your "password" and "reversed_password" columns must have exactly the same type, character set and collation of the "password" column in mysql.user table. If you don't make them match exactly, MySQL will refuse to use any index from that table when joining them. The query I showed before the script runs in less than one second. If you use a different character set, it will require a table scan, and it can take as much as one minute for a table of one million words.