The problem with some scientific research is not the research itself but the way people choose to use it. What better example than research on the blogosphere itself to show how you can twist a reasonably simple study for self-interested ends or just get it completely back-asswards? The reaction to the study itself is potentially the source of new research into blogger psychology: "I bloviate therefore I am".
A team from Carnegie Mellon University decided to look at how blogs link to each other as part of a wider study into where to place sensors to detect pollution or disease as quickly as possible, without spending a shedload of money putting them everywhere. The slightly counter-intuitive conclusion is that points with a high overall flow do not make the best positions: you want the sensors on those small channels that have the largest effect on the whole network.
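To make the sensor-placement idea concrete, here is a much-simplified greedy sketch: model each outbreak (or cascade) as the set of nodes it passes through, and repeatedly pick the node that would detect the most still-undetected cascades. The data and the greedy rule are my own toy illustration, not the researchers' actual algorithm or dataset.

```python
# Toy greedy sensor placement. Each cascade is the set of network
# nodes it passes through; all names and numbers are invented.
cascades = [
    {"hub", "a", "b"},
    {"hub", "c"},
    {"d", "e"},
    {"d", "f"},
    {"d", "g"},
]

def place_sensors(cascades, budget=2):
    """Greedily choose `budget` nodes, each time taking the node
    present in the most cascades not yet covered by earlier picks."""
    undetected = list(cascades)
    chosen = []
    for _ in range(budget):
        if not undetected:
            break
        nodes = set().union(*undetected)
        best = max(nodes, key=lambda n: sum(n in c for c in undetected))
        chosen.append(best)
        undetected = [c for c in undetected if best not in c]
    return chosen

print(place_sensors(cascades))  # ['d', 'hub']
```

Note that the quiet node `d` beats the busy `hub`: it sits on more distinct cascades, which is the flavour of the "small channels" result described above.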
The team picked blogs as a study area largely because blogs have some interesting parallels with the spread of contagion through a network. They also make it easy to study that spread. They are time-stamped; they link to other blogs. You can trace the flow of 'information' relatively easily.
The researchers picked a large subset of blogs – 45 000 from a possible total of 2.5 million – and crunched through their links, taking account of which links went outside the dataset and which stayed inside. They monitored posts that pointed to largish information cascades – effectively blogger pile-ons. To qualify as a cascade, a subject had to accumulate at least 10 posts. That's big enough for a small pile-on in my book.
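The cascade cut-off described above is easy to sketch: count how many posts point at each subject and keep the subjects that clear the 10-post bar. The record layout here is illustrative, not the CMU dataset's actual schema.

```python
from collections import defaultdict

# Toy post records: (blog, subject_linked_to). Invented for illustration.
posts = [
    ("blog_a", "story1"), ("blog_b", "story1"), ("blog_c", "story1"),
    ("blog_d", "story1"), ("blog_e", "story1"), ("blog_f", "story1"),
    ("blog_g", "story1"), ("blog_h", "story1"), ("blog_i", "story1"),
    ("blog_j", "story1"),                        # story1 reaches 10 posts
    ("blog_a", "story2"), ("blog_b", "story2"),  # story2 does not
]

CASCADE_THRESHOLD = 10  # the cut-off described in the study

def find_cascades(posts, threshold=CASCADE_THRESHOLD):
    """Return the subjects that attracted at least `threshold` posts."""
    counts = defaultdict(int)
    for _blog, subject in posts:
        counts[subject] += 1
    return {s for s, n in counts.items() if n >= threshold}

print(find_cascades(posts))  # {'story1'}
```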
The CMU team then computed which blogs – from the subset they picked – were most likely to take part in blogger pile-ons, versus those where a high proportion of posts never joined one. This gave them a cost function that led to a final list of 100 'top' blogs.
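One plausible reading of that trade-off is scoring each blog by the share of its posts that joined a cascade and taking the top scorers. This is my own simplified sketch of the idea with invented numbers, not the paper's actual cost function.

```python
# Hypothetical per-blog tallies: (posts_in_cascades, total_posts).
blog_stats = {
    "blog_a": (40, 50),    # 80% of posts joined a pile-on
    "blog_b": (5, 200),    # mostly original posting
    "blog_c": (30, 60),    # 50% pile-on
}

def rank_blogs(stats, top_n=2):
    """Rank blogs by the fraction of their posts that joined a cascade."""
    scored = sorted(stats.items(),
                    key=lambda kv: kv[1][0] / kv[1][1],
                    reverse=True)
    return [blog for blog, _ in scored[:top_n]]

print(rank_blogs(blog_stats))  # ['blog_a', 'blog_c']
```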
This is where the fun started. People on the list found that they were on some form of top 100 and started to brag about it. "It's scientific so it must be true" was Neville Hobson's considered opinion. Then people started to wonder why such a weird bunch of blogs made up the researchers' top 100. A commenter at Nick Carr's Rough Type wondered why a blog that had effectively been run off the farm by an angry mob was in the listing. Had they spent about ten seconds looking at the text at the top of the list, they might have realised that the corpus used by CMU came from 2006. That's right, folks: this is not a list of current blogs, only those active up to about a year ago.
There is one other point that those crowing about being on the list might want to bear in mind. If it is any kind of ranking, this is a list of the pile-on addicts of 2006. If you wanted to know where to rubberneck at the biggest accidents on the blogiverse a year ago, these were your go-to guys.
Based on this, I think there is a strong argument for building a feedreader that uses this lot as a filter against your real list of RSS feeds: it would take out the mob rule and leave you with a lot more original information. (To be fair, there are some on the list I would want to keep in the feedreader).
The irony that this post itself is part of a pile-on is not lost on me.