You wait ages for a GPU programming environment to come along and then...

12 June 2008

A couple of months ago, nVidia's Jen-Hsun Huang decided to stick his head out of the window and shout that he wasn't going to take it anymore. Or at least, gather a bunch of analysts together at the graphics chipmaker's HQ and tell them he wasn't going to take it anymore. The trigger was Intel's Developer Forum in China, where Intel's Pat Gelsinger declared the death of today's graphics processor (GPU). Curiously, Gelsinger made that claim just ahead of talking about Larrabee, Intel's latest foray into the GPU business (it's a different kind of GPU, you understand).

The argument from the Intel side was that traditional processors would take over many of the rendering functions in 3D graphics, largely because there are going to be so many of them. Huang made the opposite argument: GPUs already have lots of processors on them, so why not use them to offload work from the host processor?

And so the stage is set for a new kind of architecture war in which you have different kinds of microprocessor fighting over the same ground.

At the analysts' meeting, nVidia lined up a bunch of demos from people operating in supercomputer-land who have tried plugging together thousands of Xeons and Athlons and decided they have had enough of it. They want something new. IBM's Cell looked promising for a while, but a number of supercomputer users have decided that it does not deliver quite what they want, despite the headline performance claims made for Roadrunner.

Their main options are GPUs and field-programmable gate arrays (FPGAs), chips that let you define whatever hardware circuitry you like. FPGAs are not that great at the kind of floating-point code that host processors handle well, but they tear through jobs such as gene-sequence and chemical-structure matching.

The problem is finding a programming environment that will handle all of the above. The supercomputer users want to be able to work with a combination of x86, GPU and FPGA-based processors, but there isn't anything out there that will work across more than one type of accelerator. Got an nVidia card? Use CUDA. But that's no good if you happen to have an ATI GPU; for that, your only option is the long-in-the-tooth Brook environment. Intel is lining up its own architecture under the banner of QuickAssist. Its current implementation is oriented towards FPGAs but the scope is likely to widen as Larrabee gets closer to shipping.
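
To see why that lock-in bites, here's roughly what the simplest possible GPU computation looks like in CUDA. A minimal sketch, assuming an nVidia card and the standard CUDA toolkit; the vadd kernel and the buffer sizes are invented for illustration. Everything from the __global__ qualifier to the explicit host-to-device copies is nVidia-specific, which is exactly why none of it carries over to an ATI part or an FPGA:

    #include <stdio.h>
    #include <cuda_runtime.h>

    // CUDA kernel: each thread adds one pair of elements.
    // The __global__ qualifier and the thread/block built-ins
    // are nVidia's dialect; Brook or an FPGA flow would express
    // the same loop completely differently.
    __global__ void vadd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        // Host buffers.
        float *a = (float *)malloc(bytes);
        float *b = (float *)malloc(bytes);
        float *c = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        // Device buffers: the explicit copies across the bus are
        // part of the programming model, not an implementation detail.
        float *da, *db, *dc;
        cudaMalloc(&da, bytes);
        cudaMalloc(&db, bytes);
        cudaMalloc(&dc, bytes);
        cudaMemcpy(da, a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, b, bytes, cudaMemcpyHostToDevice);

        // Launch one thread per element, 256 threads per block.
        vadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

        cudaMemcpy(c, dc, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", c[0]);   /* expect 3.0 */

        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(a); free(b); free(c);
        return 0;
    }

Brook would express the same computation as an operation over streams, and an FPGA flow as a hardware description, so a shop that wants all three today is maintaining three codebases.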

There is an open effort under the OpenFPGA banner but, again, as the name suggests, it is focused on FPGAs. A number of the specialist vendors in the supercomputer world like the idea of OpenFPGA because it has potential as an independent standard. However, the amount of money behind the other players suggests that any standard that emerges is likely to be vendor-derived.

And the GPU programming environments will be oriented towards each vendor's own hardware - the exception being the FPGAs handled by QuickAssist, as Intel hasn't made FPGAs since the mid-1990s.

Then there are the unplayed hands of Apple and Microsoft. Apple provided hints of what it is doing in this area on Monday, when it talked briefly about the Snow Leopard release of Mac OS X:

Snow Leopard further extends support for modern hardware with Open Computing Language (OpenCL), which lets any application tap into the vast gigaflops of GPU computing power previously available only to graphics applications. OpenCL is based on the C programming language and has been proposed as an open standard.
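
Apple has said little beyond "based on the C programming language", so any code is pure speculation at this point. Still, if OpenCL follows the data-parallel kernel style that CUDA and Brook established, the same vector addition might look something like this sketch; the __kernel and __global qualifiers and get_global_id() are guesses at what a vendor-neutral dialect might provide, not published API:

    /* Speculative sketch of a C-based, vendor-neutral kernel in the
       data-parallel style CUDA established. Each work-item handles
       one element; the runtime decides how work-items map onto
       whichever GPU is in the machine. */
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *c)
    {
        int i = get_global_id(0);   /* which element this work-item owns */
        c[i] = a[i] + b[i];
    }

The syntax matters less than the promise: one kernel, compiled at run time for whichever GPU - AMD, Intel or nVidia - happens to be in the machine.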

This, potentially, is a smart move on Apple's part. The company would not only get one language that works with all the GPUs it could buy in from AMD, Intel or nVidia, it would also get a chance to wrong-foot Microsoft. The software giant has projects looking into acceleration using GPUs and FPGAs - a lot of the work is being carried out by people such as Satnam Singh at Cambridge - but has very little in the way of product plans. Apple gets the chance to define the programming model of the future, giving it a lot more architectural control than it has now, and to come out looking like the good guy by providing a software layer that works with a much wider range of hardware than anything the chipmakers plan to offer.

At this stage of the game, unless Microsoft is genuinely ready to go public with an API, the only realistic counter-proposal would be for nVidia to port CUDA to other architectures. True, AMD could propose the same thing, but it is coming from further behind. And it's hard to see anyone accepting an Intel-proposed API unless they really have no other choice.

There is plenty that can go wrong with OpenCL. Programming GPUs to run code at all is one thing; getting that code to run as fast as you'd expect 128 processors to manage is quite another, according to supercomputer users, some of whom turned up at the MRSC conference in Belfast a couple of months ago to talk about their experiences. But, unless Microsoft can come out with a real API and library first, Apple is in a good position simply by virtue of controlling a desktop computing environment rather than having to sell chips into existing platforms.