If there is one thing that troubles processor architects right now, it's working out how many cores they should stick on a die. The number of transistors they can plant on a chip doubles every two years and there's no sign of that supply running dry in the next five years.
What's the problem? Just take the processor core you have already and then step and repeat it across the die. It's worked for graphics processors.
Unfortunately, only some software parallelises well enough to spread across many cores. Often, the overhead of distributing the work outweighs the advantage you get from running the code in parallel. This, in effect, is the modification that Gene Amdahl made to his eponymous law of performance in computers.
In its most basic form, Amdahl's Law says it's only worth speeding up things that you do a lot. Big, nested loops are good targets. Lots of branching straight-line code? Not worth the effort. With parallel processors, if you can spread the work of loops over many of them, you see a speed-up. But there is a limit governed by how much code you need to run on just one processor.
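To put rough numbers on that limit, here is a minimal sketch of the usual textbook form of Amdahl's Law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be spread across n processors. The function names are made up for illustration, and the sketch assumes the parallel part divides perfectly with no cost for distributing the work, so real code will do worse.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Speedup over one processor under the textbook form of Amdahl's Law.

    Assumes the parallel part splits evenly across all processors and
    that distributing the work costs nothing (an idealisation).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def speedup_limit(parallel_fraction: float) -> float:
    """Ceiling on speedup as the processor count grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return float("inf")  # nothing serial left, so no ceiling
    return 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical workload: 90 per cent of the run time parallelises.
    for n in (1, 2, 10, 100, 1000):
        print(f"{n:5d} processors: {amdahl_speedup(0.9, n):.2f}x")
    print(f"ceiling: {speedup_limit(0.9):.2f}x")
```

With 90 per cent of the work parallel, ten processors give a little over five times the performance of one, and no number of processors can push it past ten times. The serial 10 per cent sets the ceiling.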
In a paper published in this month's IEEE Computer, a pair of researchers from the University of Wisconsin-Madison, one of whom has now moved to Google, have attempted to extend Amdahl's Law to the world of multicore processors where you do not necessarily make all the processors the same size.

