Friday, June 12, 2009

A quick look at Google Fusion Tables

I was curious about Google Fusion Tables, and gave it a try.
I uploaded the employees table from the employees test database, 16 MB of data, about 300,000 rows. Since the maximum limit per table is 100 MB, I expected interesting results.
However, one of my first tests, an aggregation, was quite disappointing.
A simple group by gender was executed in about 30 seconds.

InnoDB on my laptop did a much better job:

select gender, count(*) from employees group by gender;
+--------+----------+
| gender | count(*) |
+--------+----------+
| M      |   179973 |
| F      |   120051 |
+--------+----------+
2 rows in set (0.32 sec)
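
For anyone without the employees test database at hand, the local side of this comparison can be approximated. What follows is only a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for InnoDB, and a hypothetical one-column table filled with synthetic rows whose gender counts match the output above. The helper names build_table and timed_group_by are made up for illustration, and the timing it prints is not comparable to the MySQL figure.

```python
import sqlite3
import time

# Row counts taken from the MySQL output above; everything else is an assumption.
GENDER_COUNTS = {"M": 179973, "F": 120051}


def build_table(conn):
    """Create a hypothetical one-column stand-in for the employees table."""
    conn.execute("CREATE TABLE employees (gender TEXT NOT NULL)")
    for gender, count in GENDER_COUNTS.items():
        conn.executemany(
            "INSERT INTO employees (gender) VALUES (?)",
            ((gender,) for _ in range(count)),
        )
    conn.commit()


def timed_group_by(conn):
    """Run the same aggregation as in the post and return (rows, seconds)."""
    start = time.perf_counter()
    rows = conn.execute(
        "SELECT gender, COUNT(*) FROM employees GROUP BY gender"
    ).fetchall()
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory SQLite, not InnoDB
    build_table(conn)
    rows, elapsed = timed_group_by(conn)
    for gender, count in rows:
        print(gender, count)
    print(f"{elapsed:.2f} sec")
```

The point is simply that a full scan with grouping over roughly 300,000 rows is a small job for a local engine, which is the baseline the 30 seconds is being measured against.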

Here's the link to my test table, if you want to give it a try. You need to have a Google account to see it.

6 comments:

Anonymous said...

You're missing the point about Fusion Tables. It's really about 3 things: simple collaboration, immediate visualization, and smart joining. The latter is pretty much non-existent, but they will likely leverage research from state of the art "smart" data integration techniques. Give it some time and see what they roll out. But don't be expecting performance to be the #1 thing they are concerned about -- collaboration, visualization, and data integration is where it will be at.

Mchl said...

Still, 30 seconds is a lot, and it makes me wonder what's under Fusion Tables' hood.

Giuseppe Maxia said...

@BrightStarSystem,
I may be missing the point, but then why is there an "aggregate" function in Fusion Tables? Is it a "this-is-available-but-don't-touch-it" feature?
Of course, if there is an "aggregate" function, I will try it. And of course I will compare it with what I have on my laptop, or else why should I switch from local storage to an online service?
Simple collaboration with small data is fine. If it is only meant to work with 1 MB of data or so, then it's a different game.
But if Fusion Tables claims that it can handle 100 MB entities, I believe I am being more than fair when I test an available feature with less than 1/5th of the maximum limit.

Anonymous said...

People, try to keep in mind that (1) they just released it, (2) it's in Google Labs, not intended as any kind of production solution, and (3) the name is "Google Fusion tables", not "The Google Database". Read their research blog post if you don't believe me about their goals. If you spent any time around data integration research, you'd not dismiss it so easily. Certainly, it's primitive right now, but give it a chance. To compare a new online data integration to something like MySQL, which has been around for years, is ludicrous (although I suspect Fusion Tables could even run "select count(*)" queries faster than MySQL 5.0. :-)

Arjen Lentz said...

If it's built on top of BigTable, it won't do aggregates natively. So it'd be pretty inefficient for that kind of task.

I do agree with you however, the idea of Fusion inevitably involves aggregates/grouping most of the time. Perhaps Google is working on supporting such functionality in BigTable, possibly with Fusion as a testcase. I think it's important foo.

ptiemann said...

In the filter section, I missed a 'like' operator. I am sure they are going to add it.
Or at least a 'begins with..' operator.