Finding work for the GPU

21 October 2008

Returning from a visit to Germany that took in the company's Berlin-based raytracing and 3D-animation subsidiary Mental Images, nVidia's CEO Jen-Hsun Huang and senior vice president of marketing Dan Vivoli stopped off in London. In between working out which series of Star Trek was the best - Huang favours Voyager - and marvelling at Vivoli's mental stamina - he is apparently able to stand up to repeated viewings of Chevy Chase movies like Caddyshack and Christmas Vacation - we talked about the future of the graphics processing unit (GPU) and how nVidia is looking less at graphics and more at physics.

Huang calls the plan 'Graphics+'. I got the feeling that he had been working on the elevator pitch for the plan: "We are going to take graphics to the next level. We think 3D is cool and great but we are going to find a way to take graphics in a giant leap to the next level. We will surprise users so they will buy a new generation of computers."

The key to that, naturally, is the GPU, reborn as a massively parallel processor that can crunch through graphics-related jobs that a couple of years ago you could only run on server farms tucked away in the basements of Hollywood animation houses.

If you look at nVidia's activities over the past year, you can see the company picking off technologies that demand huge chunks of processing power and which lend themselves to parallel processing. It has bought physics-processing specialist Ageia and the Berlin film-effects house Mental Images, which has in its portfolio a raytracing engine. Older readers will remember how Inmos used to deck out its exhibition stands with screens of lots of silver balls, all lovingly rendered using raytracing. Then the technology disappeared for about 15 years. And now it's back.

Intel argues that raytracing is why the days of the GPU are numbered and that people will want something that works more like Larrabee. This is a parallel processor, similar in concept to a GPU but which functions more like a regular multicore processor. This is where Huang reckons Intel has made a fundamental mistake. Asked why I shouldn't think Intel would simply steamroller nVidia out of the way, he argued that CPU and GPU architectures work differently and will continue to work differently.

I can foresee two possible futures for the GPU. One favours nVidia and the ATI bit of AMD. In this future, the GPU slowly takes work away from the CPU until all the host does is handle interrupts. Then you have the "Intel will kill you all" scenario in which the world's number-one chipmaker grabs all the processors, stuffs them onto one chip, and turns the graphics unit into a dumb pixel-sprayer.

Huang argued that CPUs have different needs that make it hard for them to compete with GPUs. They have caches and they need to make decisions quickly to stop applications from gumming up. For a CPU, latency is crucial. It needs an answer and it needs it now. A GPU works on tasks where throughput is everything but latency less of an issue.

"If the dataset is so large and we will be crunching on it for a while, then it will be highly unlikely that you will hit a data dependency that will hold everything up until you get an answer," said Huang.

GPUs do have to deal with memory delays because they don't have caches, although they do have tiny chunks of local memory. But because you can wait a long time before a decision becomes crucial, a GPU can make heavy use of multithreading. Waiting for some data to turn up? Don't worry about it; go and run the threads that have data on-chip already.
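That latency-hiding trick can be sketched in a few lines of plain Python. This is a toy model, not how a real GPU scheduler is built, and all the names in it are illustrative: each fake "thread" stalls for a few steps waiting on a memory fetch, and a round-robin scheduler simply moves on to whichever thread already has its data.

```python
# Toy sketch of GPU-style latency hiding (illustrative only): when a
# thread stalls on a memory fetch, the scheduler runs another thread
# whose data has already arrived, so throughput stays high even though
# any single thread may wait a long time.
from collections import deque

def thread(tid, fetch_delay):
    """A fake GPU thread: wait fetch_delay steps for data, then compute."""
    for _ in range(fetch_delay):
        yield "waiting"                  # data not on-chip yet
    yield f"thread {tid}: computed"      # data arrived; do the work

def run(threads):
    """Round-robin scheduler: a stalled thread just yields 'waiting'
    and goes to the back of the queue; a ready thread gets to compute."""
    ready = deque(threads)
    results = []
    while ready:
        t = ready.popleft()
        try:
            status = next(t)
            if status != "waiting":
                results.append(status)
            ready.append(t)              # re-queue; it may have more to do
        except StopIteration:
            pass                         # thread finished, drop it
    return results

# Threads with different memory latencies: the zero-latency thread
# computes first, and no cycle is wasted waiting on the slowest one.
out = run([thread(i, delay) for i, delay in enumerate([3, 0, 5, 1])])
```

The point of the sketch is the re-queue: the scheduler never blocks on a waiting thread, which is exactly why a GPU can tolerate long memory delays that would stall a latency-sensitive CPU.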

For Huang, the difference between CPU and GPU is between two radically different approaches to processor design. "You have thousands of threads running on many cores versus very heavily cached one-, two- or four-processor machines. The two architectures became radically different," he said. Plus, he noted, Intel does not have much of a track record of breaking into markets that do not revolve around the x86.

What would you want to run on a machine with many processors sitting inside it? Physics is one thing and not necessarily just in games. "We believe that the next level of computer graphics will be based on how the world behaves and not how it looks," Huang claimed.

Another target is the 'creative consumer' - the people taking pictures and video and plopping them onto Flickr and YouTube. Parallel processing can attempt to rescue the images that got away and the videos that got hit with over-enthusiastic compression. By looking at multiple versions of a photograph, smart processing could change the point of focus or bring a blurred image into focus. Huang cited MotionDSP as a company whose software can analyse YouTube videos and boost their apparent resolution. In effect, the software borrows pixels from adjacent frames to fill in missing details.
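The simplest version of that pixel-borrowing idea can be shown with NumPy. This sketch assumes the frames are already perfectly aligned - registering the frames against each other is the hard part that products like MotionDSP's actually solve - but it shows why a stack of noisy frames contains more detail than any one of them.

```python
# Minimal sketch of multi-frame detail recovery (assumes perfectly
# aligned frames; real systems must first register the frames, which
# is the hard part). Averaging aligned frames cancels independent
# noise, recovering detail no single frame shows cleanly.
import numpy as np

rng = np.random.default_rng(0)

# A "true" 8x8 image the camera never captures cleanly...
truth = rng.uniform(0.0, 1.0, size=(8, 8))

# ...and sixteen noisy frames of the same scene.
frames = [truth + rng.normal(0.0, 0.2, size=truth.shape) for _ in range(16)]

# Borrow pixels across frames: the per-pixel mean over the stack.
restored = np.mean(frames, axis=0)

# The stack average lands much closer to the truth than any one frame.
err_single = np.abs(frames[0] - truth).mean()
err_stack = np.abs(restored - truth).mean()
```

Each pixel is independent of every other, which is what makes the job such a natural fit for a massively parallel processor.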

Vivoli claimed: "We took a 480p DVD video and, using the same technology, we created enough pixels for 1080p."

"Technologies like this make the GPU a more valuable companion to the CPU," Huang argued.

There is something for the gamers. Huang is becoming a big fan of the idea of stereoscopic displays for gaming. Imagine the kind of GPU you need to run Crysis across not just one but two high-resolution displays.


The statement that "throughput is everything but latency less of an issue" is true when you limit the use of massively parallel computing architectures to simulating the real world. What if you want to deploy this sort of computing power to solve difficult control problems? For example, those that require on-the-fly inversion of large matrices - an ideal candidate for data parallelism. Putting such parallel architectures into real-time control loops definitely needs a focus on latency. In reality there is a place for both approaches. This sort of argument is a distraction from the real challenge, which is to simplify the programming of such parallel architectures.
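To make the data-parallel shape of that kind of workload concrete, here is a small NumPy sketch - the sizes and setup are illustrative, not taken from any real controller. Solving many independent small linear systems in one batched call is exactly the layout that maps well onto a parallel architecture.

```python
# Illustrative sketch of a data-parallel control workload: many
# independent small linear systems solved in one batched call.
# NumPy does this on the CPU; the same batched layout is what a GPU
# library would exploit. Sizes here are arbitrary.
import numpy as np

rng = np.random.default_rng(1)

n, k = 1000, 4                       # 1000 independent 4x4 systems
A = rng.normal(size=(n, k, k))
A += 10.0 * np.eye(k)                # keep every system well-conditioned
x_true = rng.normal(size=(n, k))
b = np.einsum("nij,nj->ni", A, x_true)

# One batched call solves all 1000 systems in data-parallel fashion.
x = np.linalg.solve(A, b)
```

The data parallelism is easy to see; whether the answers arrive inside a control loop's deadline is the latency question the comment raises.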


The short answer is: don't use a GPU for latency-sensitive work. However, acceptable latency is in the eye of the beholder. In 3D graphics, the device is churning out usable results from gigabytes of data 30-100 times a second. For a system controlling convection in a furnace, that's probably going to be plenty of headroom. It might not be so useful in an AGV (automated guided vehicle), though, so maybe you dump the GPU and use FPGAs to get the acceleration.