Meeja: January 2009 Archives

Pimp my CMS

31 January 2009

This is a long-winded reply to a short retweet that came from Retail Week online editor Martin Stabe of a question from developer Mark Ng: "Journalists: do you think internet story research tools built into your CMS would be a good idea?"

My initial response was along the lines of "let's try walking before we start with the whole running thing". Content management systems promise much but frequently fail to deliver anything vaguely usable. Their biggest failing lies in the idea that you should work in them entirely, yet they fail to provide tools that word processors have had for 20 years. And writing in them directly is like playing chicken: how long do you have in which to compose and edit a story before it loses all the copy?

Compare that with blogging software, which is often better written – thanks to having more complaining users – and more flexible. I very rarely use the CMS that Movable Type provide. Instead, I write most posts in MarsEdit and then, to publish, use the XML-RPC function that it provides. It also lets me work offline, which is a major plus. It is something that most CMS writers fail to understand: reliable network connections are not always present, particularly when you are out and about.

So, the idea of forcing even more of my working day into the CMS fills me with dread. However, it doesn't mean that the idea of putting research tools into a CMS is an entirely bad one. It just depends what they are.

As Martin points out, WordPress has plug-ins that help with linking to related stories. This is the kind of thing to include in an editorially focused CMS. Mark suggests storing notes in the CMS and then having the system go out and find information that relates to them.

The problem with this idea is that web research is often more useful early in the process. What's going on? Is this story actually news? Has someone already reported on this or something very similar? What's the background?

As you progress into research, the questions are much more focused. So the idea of having a tool go out and sift the web for more is nice to have but probably just going to contribute to information overload. In reality, you are going to be cross-checking individual claims from a set of interviews. And very rarely do I transcribe an entire interview from shorthand, just key quotes and references to important sections. I may scan the pages but that's just to make the notes easier to find later on.

I've found, for long-term storage, a private blog or a wiki to be as good as anything for holding transcriptions. The other useful tools are outliners and, occasionally, mind maps for structuring material and spreadsheets and databases for those situations where the story or feature revolves around numbers or timelines. If you are doing financial reporting, then a way of just adding quarterly results to a rolling spreadsheet is probably going to be handy. But do you need to put that in a CMS or just stick it all in something like Google Docs? Probably, a database is going to be more useful as you have more flexible ways of querying the data.

One area where I think a CMS could make a difference would be if stories acquired more metadata. Again, this is an area fraught with difficulty as the last thing you want to do is force writers to add metadata just to keep the machines happy, which is what tends to happen today. But if you were able to do something akin to what Microsoft tried to do with Smart Tags earlier in the decade, I can see some advantages down the road. Instead of writing "Q4 revenues were $2.4bn", you point to the source data within the financial repository. When someone clicks on the tag, they get taken to a virtual P&L sheet which shows the same number and, if it all works properly, whether that number was later restated.
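To make the idea concrete, here is a minimal sketch of what such a tag might look like. Everything in it is hypothetical: the class names, the repository layout, the company and the restated number are invented for illustration and do not describe any real CMS or how Smart Tags actually worked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportedFigure:
    """One number held in a (hypothetical) financial repository."""
    company: str
    period: str
    metric: str
    value_usd_bn: float
    restated_usd_bn: Optional[float] = None  # set only if the figure was later restated


@dataclass(frozen=True)
class FigureTag:
    """What the story stores instead of a bare number: a pointer to the source."""
    company: str
    period: str
    metric: str


def render(tag: FigureTag, repository: dict) -> str:
    """Turn a tag back into readable copy, flagging any later restatement."""
    fig = repository[(tag.company, tag.period, tag.metric)]
    text = f"{fig.period} {fig.metric} were ${fig.value_usd_bn}bn"
    if fig.restated_usd_bn is not None:
        text += f" (later restated to ${fig.restated_usd_bn}bn)"
    return text


# Invented example data: "ExampleCo" and the restated value are made up.
repository = {
    ("ExampleCo", "Q4", "revenues"): ReportedFigure("ExampleCo", "Q4", "revenues", 2.4),
}
tag = FigureTag("ExampleCo", "Q4", "revenues")
original_copy = render(tag, repository)

# If the repository is updated later, the same tag picks up the restatement.
repository[("ExampleCo", "Q4", "revenues")] = ReportedFigure(
    "ExampleCo", "Q4", "revenues", 2.4, restated_usd_bn=2.3
)
restated_copy = render(tag, repository)
print(original_copy)
print(restated_copy)
```

The point of the design is that the writer only ever picks the figure once; the story text and the restatement note both come from the repository rather than from retyped numbers.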

Similarly, you might tag people so that the system can log all stories about them and build dynamic timelines so you can see when they moved companies or said certain things. That would go some way to making the information that goes into stories more remixable. You don't create new stories from the parts, but you can at least extract some structured information that may prove useful to a reader.
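A rough sketch of the people-tagging idea, assuming each story carries a date, a headline and a list of tagged names. The story records and the names in them are invented; a real system would also need to deal with two people sharing a name, which this ignores.

```python
from collections import defaultdict
from datetime import date

# Invented example stories; the "people" list is the metadata the writer adds.
stories = [
    {"date": date(2008, 11, 3), "headline": "Doe joins ExampleCo as CTO",
     "people": ["Jane Doe"]},
    {"date": date(2007, 5, 14), "headline": "Doe predicts chip shortage",
     "people": ["Jane Doe", "John Roe"]},
]


def build_timelines(stories):
    """Group stories by tagged person, oldest first."""
    timelines = defaultdict(list)
    for story in sorted(stories, key=lambda s: s["date"]):
        for person in story["people"]:
            timelines[person].append((story["date"], story["headline"]))
    return dict(timelines)


timelines = build_timelines(stories)
for when, headline in timelines["Jane Doe"]:
    print(when.isoformat(), headline)
```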

Looking back at that, I think the future for an editorial CMS has less to do with research for the journalist per se than with serving all the people who might use a news site.

When you bought your last personal computer, you probably didn't think you had two gadgets in one. But, whether you wanted it or not, the machine doubles up as a heater. So, it should come as no big surprise that these things burn through electricity like an electric fire. And it's not hard to make comparisons between the energy used by a computer and boiling a kettle, which is what the Sunday Times did.

The problem is that the story got mixed up with the idea that using Google to search is a particularly energy-intensive thing to do. In a way, it is. It's just that, even though it owns massive data centres, Google doesn't have a whole lot to do with how much energy is used in using it to search.

What confuses me about Alex Wissner-Gross's complaint about the Times story is that he claimed his study did not mention Google and only talked about the search engine and power in broad terms. But, in the semi-ghosted piece (it's written by a journalist but in the first person as though spoken by Wissner-Gross) that appears attached to the main story, this paragraph is associated with him:

"Google does not divulge its energy use or carbon footprint but, based on publicly available information, we have calculated that each Google search generates an estimated 5-10g of CO2, in part because Google's unique infrastructure replicates queries across multiple servers, which then compete to provide the fastest answer to your query. On the other hand, just browsing a basic website generates about 20mg of CO2 for every second you view it."

Google quickly issued its own estimate of how much energy a search needs on its own equipment, and it is way, way less than the 15Wh or so needed to generate 7g of CO2, as claimed in the story. In reality, most of the consumption is the user's own PC. This also makes me suspicious of Wissner-Gross's most recent protests as his business, CO2stats, is all about offsetting the carbon emissions from users' own PCs. He probably didn't want to just focus on Google: he wants to sell the offsets to midrange website owners. That is, sites running fewer than 5 million page views a month.
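The 15Wh-for-7g pairing implies a conversion factor between electricity and CO2, which is handy for the sums later on. A quick sketch, assuming only that pairing (the factor is my back-calculation, not a figure from the Times or from Google):

```python
# Pairing used in the text: roughly 15 Wh of electricity for 7 g of CO2.
STORY_ENERGY_WH = 15.0
STORY_CO2_G = 7.0

# Implied emissions factor, in grams of CO2 per watt-hour.
grams_per_wh = STORY_CO2_G / STORY_ENERGY_WH


def energy_for_co2(co2_g: float) -> float:
    """Watt-hours needed to emit the given mass of CO2 at the implied factor."""
    return co2_g / grams_per_wh


low_wh = energy_for_co2(5.0)    # lower end of the quoted 5-10g range
high_wh = energy_for_co2(10.0)  # upper end of the quoted 5-10g range

print(f"{grams_per_wh:.2f} g of CO2 per Wh")              # 0.47 g of CO2 per Wh
print(f"5-10g is about {low_wh:.1f} to {high_wh:.1f} Wh")  # 10.7 to 21.4 Wh
```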

One problem with even analysing PC power consumption is that people often look at the power-supply rating of their PC and assume that's what it chews through every second. Power supplies are normally over-specced for good reason. You don't really want the thing tripping out every time you fire up Call of Duty. So, unless you're running the biggest, baddest graphics cards, it's unlikely that a system unit will consume 300W or more. But it's not unreasonable to expect it to demand in the region of 130W, dropping by around a half when idle but not asleep.

Tom's Hardware did tests at the end of 2007 on a variety of systems and came up with figures ranging from 132W for a system based on a Core 2 Duo to more than 200W for a Pentium D 800, with two Pentium 4 cores on a chip.

This is a fair amount, but not quite as bad as the 300W assumed in this analysis of Second Life's energy demand – not the one done by Nick Carr.

A 17in LCD monitor is likely to consume somewhere in the region of 30 to 50W, depending on how efficient it is. The bad news is that LCD monitor screen sizes moved up as prices fell – as with TVs. As most of the power consumption is in the backlight, the power consumed increases with screen area. LED backlights will increase efficiency a little but we won't see a dramatic improvement until organic LED displays – or something better – appear. So, for a 20in LCD, let's call it 50W.

So, that gives us a total of something like 180W for a regular consumer PC. Then add to that the consumption of the ADSL gateway, which is likely to be around 6W all the time it's switched on. There are plans to improve this but that's what the EU reckons is a good target for low power consumption. Even in a low-power state, which is not a given in current equipment, it will still draw something like 2W all the time it's switched on. However, that's nothing on a satellite receiver that can chew through 50W all day every day, whether you are watching the box or not.

That pushes the power up to around 190W for the PC and display, dropping to 130W or so when the processor is idling – that is, you are reading something on the screen, but there's not much going on. It's never really idling in that mode, as there are always things going on in the background in operating-system land. But nothing is causing the fans to speed up.
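For anyone who wants to check the adding up, here is the budget as a short sketch. It uses only the round numbers above and takes "dropping by around a half" literally for the system unit, which is an assumption on my part; done that way, the idle total lands nearer 120W than 130W, which is well within the rounding going on here.

```python
# Round figures from the text, all in watts.
SYSTEM_UNIT_ACTIVE_W = 130
SYSTEM_UNIT_IDLE_W = SYSTEM_UNIT_ACTIVE_W / 2  # "around a half when idle", taken literally
DISPLAY_20IN_LCD_W = 50
ADSL_GATEWAY_W = 6


def total_power_w(system_unit_w: float) -> float:
    """System unit plus display plus ADSL gateway."""
    return system_unit_w + DISPLAY_20IN_LCD_W + ADSL_GATEWAY_W


active_w = total_power_w(SYSTEM_UNIT_ACTIVE_W)
idle_w = total_power_w(SYSTEM_UNIT_IDLE_W)

print(f"active: {active_w:.0f}W")  # active: 186W
print(f"idle:   {idle_w:.0f}W")    # idle:   121W
```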

It means that the PC is consuming roughly as much power as a moderately big flat-screen TV – something in the 30in range – most of the time. Now, how does that compare with the kettle?

I think it's fair to say that the PC will be fairly near to idling when doing simple searching and browsing; maybe it's consuming about 150W. The next question is: how long does it take to do a search on Google? Assuming the machine is already running, it probably takes about 10 seconds to type in a query and hit return, and less than a second for the response to appear. You then have to read it and click on a result. Let's call that 20 seconds.

I ought to factor in how much power is consumed in the network but the reality is that, with a Google search, so little data traverses the network that it's likely to be a minuscule demand compared with that of the user's PC. However, the ADSL line card will consume power purely on behalf of the user, so that needs to be factored in. This is likely to be in the 5W range, maybe less as it's sharing a lot of circuitry with equipment used for other lines. Maybe add on another 5W for good measure – it's not going to make a big difference.

Energy needed: less than 1Wh, which equates to about 0.5g of CO2 emitted. Oh dear, way below the Times figure.

However, before you relax, consider that you are generally searching for something to look at or read. Five minutes reading the thing you were searching for doesn't seem unreasonable. Now we're talking real money: 13Wh, or a little under 7g of CO2. So, if you analyse the power consumption attributable to a PC, the headline figure quoted in the Times is not a bad one. It's just that the figure doesn't have a lot to do with Google – it applies to any amount of surfing you do.
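Both sums can be reproduced in a few lines. This is a sketch under stated assumptions: 150W for the browsing PC, 10W for the line card plus the extra allowance, 20 seconds for the whole search, and a CO2 factor back-calculated from the 15Wh-for-7g pairing quoted earlier.

```python
PC_BROWSING_W = 150
LINE_CARD_ALLOWANCE_W = 10   # 5W line card plus 5W "for good measure"
GRAMS_CO2_PER_WH = 7 / 15    # assumption: implied by 15 Wh for 7 g of CO2

SEARCH_SECONDS = 20          # assumption: the whole search takes 20 seconds
READING_SECONDS = 5 * 60     # five minutes reading the result


def energy_wh(power_w: float, seconds: float) -> float:
    """Energy in watt-hours for a constant power draw over a duration."""
    return power_w * seconds / 3600


def co2_grams(wh: float) -> float:
    """CO2 in grams at the assumed emissions factor."""
    return wh * GRAMS_CO2_PER_WH


power_w = PC_BROWSING_W + LINE_CARD_ALLOWANCE_W
search_wh = energy_wh(power_w, SEARCH_SECONDS)
reading_wh = energy_wh(power_w, READING_SECONDS)

print(f"search:  {search_wh:.2f} Wh, {co2_grams(search_wh):.2f} g CO2")    # 0.89 Wh, 0.41 g
print(f"reading: {reading_wh:.2f} Wh, {co2_grams(reading_wh):.2f} g CO2")  # 13.33 Wh, 6.22 g
```

The search itself comes out under 1Wh and the five minutes of reading at about 13Wh, in line with the figures above; the reading, not the search, accounts for almost all of it.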

The figure in the Times story that I'm suspicious of is the ten-fold increase in power for handling video or interactive stuff. Video sent over the Internet is highly compressed, so most of the work winds up being done on the user's PC again. It's going to be more than the 30 per cent extra that the calculations above imply but that's a long way from 10x.
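My reading of the "30 per cent extra" is the step from the roughly 150W browsing figure to the roughly 190W full-load figure; that interpretation is an assumption, as is applying the ten-fold claim to the PC alone. On that basis, a quick sketch of how far apart the two are:

```python
BROWSING_W = 150    # near-idle browsing figure used above
FULL_LOAD_W = 190   # PC, display and gateway with the processor busy

extra_fraction = FULL_LOAD_W / BROWSING_W - 1  # extra power when busy
tenfold_w = 10 * BROWSING_W                    # what a true 10x increase would need

print(f"extra when busy: {extra_fraction:.0%}")  # extra when busy: 27%
print(f"10x browsing power: {tenfold_w}W")       # 10x browsing power: 1500W
```

A 1500W draw would be five times the 300W that only the heaviest gaming rigs approach, which is why the ten-fold figure looks suspect if most of the work is done on the user's PC.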