http://www.hpcwire.com/features/Pat...CL-with-New-GPU-Compiler-97089024.html?page=1
A Fermi HPC port? How *much* could this affect Nvidia's ability to convert lower-profit GeForce SKUs into high-profit Tesla SKUs?
A 15 to 30 percent performance boost doesn't sound like much if Nvidia always controls the direction of architecture development.
But if they can make writing code less tedious (as mentioned in the article), could this help pave the way for innovation in applications? Lower HPC programming costs plus the ability to use regular gaming cards sounds like a chance for higher-risk projects.
According to PathScale CTO Christopher Bergström, interest in doing a GPU compiler began shortly after the company rebooted last year. Since NVIDIA was leading the GPGPU charge, they started with the idea of targeting the Tesla GPU line. Hoping to reuse some of NVIDIA's CUDA stack, they quickly found that the code generator and driver were not optimized for performance computing. "Their drivers, which really dictate quite a bit of what you can do, are supporting everything from gaming to HPC," says Bergström. "It's not that they haven't built a good solution. It's just not focused enough for HPC."
Moreover, they found writing CUDA code for performance tedious, requiring a lot of programmer hand-holding to optimize performance. In particular, the PathScale engineers found that the register usage pattern in the CUDA compiler was generalized for all types of GPU cards, so performance opportunities for Tesla were simply missed.
The twist here is that the GPU ISA is volatile, at least more so than that of a CPU. Fortunately, the instruction and register enhancements tend to be incremental. Bergström says they will support all the latest GPU cards being used for HPC, that is, essentially all the cards supported in the three generations of Tesla products. PathScale has a working pre-"Fermi" driver now and is working on the compiler port. "We just got access to the hardware last month," explains Bergström. "So we've basically had 30 days to start tackling the ISA and the registers." He predicts they'll have a fairly robust Fermi port within the next 60 to 90 days.
Bergström is careful not to claim performance superiority over the CUDA technology just yet. He says ENZO is currently in the alpha or early beta stage. According to him, PathScale engineers have hand-tuned some code using GPU assembly, and have achieved a 15 to 30 percent (or better) performance boost. In other cases, they're not quite there and need to find the right optimizations. Bergström is confident that those hand-coded optimizations can be incorporated into the compiler infrastructure. They have identified a number of areas where they can reduce register pressure, hide latency, reduce stalls and improve instruction scheduling. "We know the performance is there," says Bergström.