Thursday, May 22, 2008

A look at the Open Source Census

Yesterday I had a look at the Open Source Census (OSC), a project that aims to count the installations of open source packages.

It is a collaborative effort. The OSC provides the infrastructure, and it is open to cooperation. It works with a disk scanner, and relies on the principle that contributions are voluntary and anonymous.
If you want to contribute, you need to register, and you are encouraged to do so anonymously. The OSC is not interested in who you are, but the registration gives you a chance to track your results alongside those of the rest of the community.
The registration is also necessary to avoid duplicates, and to track your installations over time, should you decide to do that on a regular basis.
If you want to give it a try, the procedure goes like this:
  1. register (with any fancy name you want);
  2. download the scanner package (it's open source);
  3. scan your computer (or some selected directories), using your unique identifier;
  4. look at the results;
  5. send the results to the census (using the same tool).
Bear in mind that there is no way for the OSC to track down who you are, because you are not asked in the first place. So the results are truly anonymous.

Some concerns:
  • The scan package is open source. However, even the simplest package available (the pure Ruby one) is more than 6000 lines of code. If you are security conscious, examining the code is not a quick task. You will have to rely on peer review from the community, or do a test scan in a virtual machine to examine what the software does (that's what I did);
  • If you have a large machine, the scan may take hours. You will have to plan for night scans if you want to contribute seriously.

On the plus side, the process of scanning is open. You can contribute the signatures of your favorite open source tools, to be included in the next scans.
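To make the idea of "signatures" a bit more concrete, here is a minimal Python sketch of how a signature-based scan could work in principle. It is not the OSC scanner's code, and it assumes (this is my assumption, not something the census documents here) that a signature is simply the SHA-1 hash of a known file mapped to a package name and version. The names `sha1_of_file`, `scan_tree`, and the `signatures` table are hypothetical.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_tree(root, signatures):
    """Walk `root` and report every file whose hash is a known signature.

    `signatures` is a hypothetical table: {sha1_hex: (package, version)}.
    Returns a list of (package, version, path) tuples.
    """
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                file_hash = sha1_of_file(path)
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
            match = signatures.get(file_hash)
            if match is not None:
                package, version = match
                findings.append((package, version, path))
    return findings
```

Under this assumption, contributing a signature would amount to adding one more entry to the table, and hashing every file also suggests why a scan of a large machine takes so long.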
Looking at the results may bring some surprises. I did not know that I had two different versions of MySQL-connector/J on my laptop, or two versions of the PostgreSQL connector! And I wonder how I managed to get 5 (FIVE) different versions of docbook-xml on my laptop, given that I never asked to install it.
However, the scanner gives you more than the public list of packages found in your box. For your own consumption (it is not sent to the census), it produces a detailed list of where each package was found. So you can analyze that list and, if you wish, clean up the system.
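If you want to find duplicates like the ones above without reading the whole list by eye, a few lines of Python can do the grouping. This is only a sketch: it assumes you have already turned the detailed list into (package, version, path) tuples, which is not necessarily the format the scanner writes, and the function name and sample paths are made up for illustration.

```python
from collections import defaultdict


def packages_with_multiple_versions(findings):
    """Group findings by package and keep only packages seen in 2+ versions.

    `findings` is an iterable of (package, version, path) tuples
    (a hypothetical format, not the scanner's own output).
    Returns {package: {version: [sorted paths]}}.
    """
    versions = defaultdict(lambda: defaultdict(list))
    for package, version, path in findings:
        versions[package][version].append(path)
    return {
        package: {version: sorted(paths) for version, paths in by_version.items()}
        for package, by_version in versions.items()
        if len(by_version) > 1
    }


# Made-up sample data, for illustration only.
sample = [
    ("docbook-xml", "4.2", "/usr/share/xml/docbook/4.2"),
    ("docbook-xml", "4.5", "/usr/share/xml/docbook/4.5"),
    ("ruby", "1.8.6", "/usr/bin/ruby"),
]
for package, by_version in packages_with_multiple_versions(sample).items():
    print(package, sorted(by_version))
```

Packages that show up in a single version are left out of the report, so what remains is the list of candidates for a cleanup.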
Other surprising results: I checked how often MySQL was installed, and it turns out to be present on 37% of the scanned boxes. The real surprise, however, is the distribution of old versions. About 20% of MySQL servers are still using version 4.x. Fascinating!
The project is young, but very promising, in my opinion. Adoption by large corporations may be a problem (security policies will be hard to deal with), but if the community picks it up, it may produce good results. Give it a try.
