14 August 2006

The one place where you would expect the Long Tail model of markets to apply is on websites. After all, they spawn pages easily, and serving up an unpopular page is no more expensive than serving the same one time and time again to many different people. OK, with caching the way it is, that's not quite true. But the cost gap between the two is far smaller than the gap between stocking unpopular CDs and stocking top sellers.

Website usability consultant Jakob Nielsen thought he would analyse his own website to see how well it fit the Long Tail model - a curve that follows Zipf's law. It hugs the axes on a regular graph, giving you the impression that only a few elements are important because they score so much higher than all the others. But, if you add the contributions from the small fry together, they turn out to be as important as the few hits. One characteristic that Nielsen noted is that if you plot this kind of distribution on a log-versus-log graph, you get a straight line.
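To make the straight-line claim concrete, here is a minimal Python sketch. It uses made-up numbers, not Nielsen's data: the 1,000 pages, the 10,000 views for the top page and the exponent of 1 are all assumptions chosen for illustration, as are the function names. It builds an idealised Zipf curve, measures its slope on log-log axes, and compares the views of the ten biggest hits with everything below them.

```python
import math


def zipf_views(n_pages, top_views=10000.0, exponent=1.0):
    """Idealised page views for ranks 1..n_pages under Zipf's law.

    All three parameters are illustrative assumptions, not measured values.
    """
    return [top_views / rank ** exponent for rank in range(1, n_pages + 1)]


def loglog_slope(values):
    """Least-squares slope of log(value) plotted against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


views = zipf_views(1000)
slope = loglog_slope(views)  # a pure power law gives minus the exponent
head = sum(views[:10])       # the ten most popular pages
tail = sum(views[10:])       # the other 990 pages combined
print(f"log-log slope: {slope:.3f}")
print(f"top 10 pages: {head:.0f} views, remaining pages: {tail:.0f} views")
```

With these assumed figures the line is straight with a slope of minus one, and the 990 small fry together collect more views than the ten hits. Real traffic logs will be noisier than this.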

Nielsen took the web statistics and plotted them to see how they did against the straight line. The most-visited 300 or so pages fit the line pretty well - incredibly well if you think about it. But then the curves went their separate ways. By the time you get to the 500th most popular page, there are 10 times fewer hits than Zipf's law would predict based on the page views of the more popular pages. The bottom 200 pages accounted for just one visit or less during the sampling period according to the graph.
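One way to put a number on that droop is to fit the straight line to the head of the curve only, extend it to a deeper rank, and divide the predicted views by the observed ones. The Python sketch below does that on synthetic data. Nothing in it comes from Nielsen's logs: the 700 pages, the knee at rank 300 and the extra steepness of 4.5 are assumptions picked so the curve droops roughly the way he describes, and the function names are hypothetical.

```python
import math


def fit_loglog(views, n_head):
    """Fit log(views) = intercept + slope * log(rank) over the top n_head ranks."""
    xs = [math.log(rank) for rank in range(1, n_head + 1)]
    ys = [math.log(v) for v in views[:n_head]]
    mean_x = sum(xs) / n_head
    mean_y = sum(ys) / n_head
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    return slope, mean_y - slope * mean_x


def shortfall(views, rank, n_head=300):
    """How many times fewer views a page gets than the head's trend line predicts."""
    slope, intercept = fit_loglog(views, n_head)
    predicted = math.exp(intercept + slope * math.log(rank))
    return predicted / views[rank - 1]


def synthetic_views(n_pages=700, top_views=10000.0, knee=300, extra=4.5):
    """Made-up traffic: Zipf down to the knee, then a much steeper fall-off."""
    views = []
    for rank in range(1, n_pages + 1):
        v = top_views / rank
        if rank > knee:
            v *= (knee / rank) ** extra  # assumed droop, not a fitted value
        views.append(v)
    return views


views = synthetic_views()
print(f"shortfall at rank 100: {shortfall(views, 100):.2f}x")
print(f"shortfall at rank 500: {shortfall(views, 500):.2f}x")
```

A shortfall close to 1 means the page sits on the line. Anything well above 1 marks the point where the tail has started to droop.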

So far so good. Nielsen has plotted a phenomenon that a lot of people with a website notice - that nothing but tumbleweed blows through a goodly number of the pages. He arrived at an oddly different conclusion: adding more pages to the site would "wag the drooping tail of the dog". Somehow, adding more content would make more of the plotted page views fit the Zipf law distribution.

That all sounds very well, until you consider that most visitors to Nielsen's site or any other do not consider themselves to be part of a Long Tail distribution. They just happen to be looking for something or browsing through. Pages that rank very high in search engines and are heavily linked from within the site or from other sites will do very well. Others will just appear to drop out of existence altogether. There is almost a Zen question in there: "Does an unindexed web page have content?"

Simply adding thousands of pages to a website on the basis that you will suddenly snap the line back onto the one proposed by Zipf and dusted off by Chris Anderson does not strike me as a winning strategy. Unless you count bundles of tumbleweed blowing in as legitimate visits.

Nielsen's statistics are for just one site but two curves taken from data 10 years apart show pretty much the same effect. The curve that did fit the Zipf distribution was the one for inbound links. That makes some sense as the whole concept of this type of power law distribution is based on interconnections rather than content. It is not so much that Nielsen's site has accumulated insufficient pages to follow the Long Tail plot all the way - rather that there are too many unremarked pages to ever get back on the curve.

I suspect that if Nielsen were to do this same study in ten years, he would notice that more pages fit the straight line on a log-log graph, but that the drooping tail would still be there. It might even account for a greater proportion of the overall graph. It all depends on how those pages are favoured by links.

It would be interesting to see how other supposed Long Tail model industries would fare given a similar treatment. I would be more surprised if music sales did not show a similar drooping tail. If nobody knows something exists, why are they going to go look for it? Let's face it, Chris Anderson's original article on the Long Tail in Wired opened on how a book suddenly turned into a best-seller because people found links to it from another unrelated work, and closed on the need for links and recommendations to power Long Tail economics. But you can't make links to everything and expect each one to have equal weight, and that alters the economics of Anderson's favoured business model. Does the tail of the Long Tail need further examination?

And, as I posted the first version of this entry, I noticed that Nick Carr had pointed to observations by Douglas Galbi on the frequency of children's names. It is another example of how demand is not necessarily connected to supply in markets. Looks like the Long Tail's tail is getting some scrutiny.



As it happens, I gave a speech at Google over the weekend on just this point. Sometimes a "drooping tail" is due to navigational or other fixable problems, and sometimes it's the natural shape of the market (a "lognormal" distribution rather than a power law). I've posted an explanation of all that here.


Chris Anderson

"Does an unindexed web page have content?"

This is Schrödinger's Cat's site.

Most of the long-tail examples I see are about matters of taste - every artist is somebody's favourite artist. The theory applies very well to books, games, names, tracks, pictures, t-shirts, etc. All the really good examples in the book.

The same isn't going to work for items that have a fixed value of zero, no matter what your tastes. Some web pages have no value to anyone whatsoever. I know because I've written some. ;)

Suppose I start a hardware store online, and my long tail items include chocolate pokers, bent nails and glass hammers (I am not a great business man). Surely, I should expect to sell none of those items, as opposed to a few every now and then, because that's what the theory says?

