Archive for March, 2008



The latest arrival on the data-aggregation scene is infochimps, by Philip Kromer at the University of Texas at Austin. Infochimps is most similar to Numbrary.

The site is clean and well organized, and the tagging of datasets is quite thorough. A feature unique to infochimps is its recognition of data fields across sources, so you could find, for instance, all sources that have figures for “Personal Income”. So far I’ve been unable to find any fields that actually appear in multiple sources, but this is presumably just a matter of adding metadata to the datasets in the repository.
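To make the cross-source field idea concrete, here is a minimal sketch of how such a lookup could work. It is not infochimps' actual implementation; the dataset names, field lists and the `normalize` rule are all hypothetical, invented for illustration. The idea is simply an index from a normalized field name to every dataset that carries that field.

```python
from collections import defaultdict

# Hypothetical metadata (not real infochimps data): dataset name -> field names.
datasets = {
    "state_income_2006": ["State", "Personal Income", "Population"],
    "county_income_2005": ["County", "personal income"],
    "rainfall_by_city": ["City", "Annual Rainfall"],
}

def normalize(field):
    """Lower-case a field name and collapse whitespace so spelling variants match."""
    return " ".join(field.lower().split())

def build_field_index(datasets):
    """Map each normalized field name to the set of datasets that contain it."""
    index = defaultdict(set)
    for name, fields in datasets.items():
        for field in fields:
            index[normalize(field)].add(name)
    return index

def sources_for(index, field):
    """Return the sorted names of all datasets that have the given field."""
    return sorted(index.get(normalize(field), set()))

index = build_field_index(datasets)
print(sources_for(index, "Personal Income"))
# prints ['county_income_2005', 'state_income_2006']
```

A lookup like this is only as good as the metadata behind it, which fits the observation above: a field shows up under multiple sources only once someone has tagged it consistently in each dataset.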

There’s no online viewing of dataset contents, but every dataset is available in (compressed) CSV, YAML and Excel formats. All the reformatting of the source data has been done by infochimps, saving users from repeating this chore. Like Numbrary, infochimps provides links to the source documents that were used to construct the datasets on the site.

So far infochimps has accumulated almost 1,500 datasets, and many more are promised on their blog.


Opinion on future prospects for Swivel and Many Eyes

Over a year ago, Steve Few reviewed two public data visualization services, Swivel and Many Eyes, on his blog. Steve expressed a strong preference for the quality of the visualizations produced by Many Eyes.

Over the past three months, daily Many Eyes contributions have outpaced Swivel contributions by 5 to 1 or more. The gap is widening: Swivel contributions are declining, and Many Eyes contributions are increasing.

My congratulations to Many Eyes for growing their user base, but there’s still a long way to go. Even at a rate approaching a hundred new contributions per day on the site, this level of user-generated activity is low compared to other online collaborative environments that are perceived as “successful”. I group data visualization services with other services that allow participants to complete useful work in a collaborative fashion – LinkedIn and SourceForge are examples of this. Many Eyes is not getting enough traffic to sustain the site as anything other than the research project that it is.

My conclusions, in brief:

  • Swivel is doing poorly, and could fail before the end of 2008.
  • Neither Swivel nor Many Eyes is sufficiently compelling to users for the site to be self-sustaining.

Data Mining: Text Mining, Visualization and Social Media: GapMinder: The Opportunity

Matthew Hurst of Microsoft Live Labs reports on a Trendalyzer presentation by Ola Rosling at O’Reilly’s Competitive Intelligence Foo Camp in February.

Rosling gave little information on Google’s plans for the Trendalyzer software, but it appears that these plans do not include community participation in any way. I agree with Matt that this is unfortunate. Collaborative analysis can’t happen without collaboration.