
What are CUDA cores/stream processors? (explained)

Nukemann

Junior Member
Pardon my n00bish question, but what exactly is CUDA or stream processing? Doesn't it sound a bit similar to Intel's Hyper-Threading, where the CPU can use virtual cores? Does a GPU with CUDA/stream processors have hundreds or thousands of "cores" or processing units? Is each CUDA core on a GPU like a "lane" on an HT processor, which is supposed to make more use of the processor? The difference I see is that the GPU stream processors use a proprietary API, like CUDA, but what's stopping Intel or AMD from making an x86 CPU that has hundreds of those "cores" and runs at several times the clock speed?
 
Yes, a GPU has hundreds or thousands of individual processing units, but they are much smaller and more limited than the general-purpose cores in a CPU.

Intel tried exactly that with something called the Xeon Phi (something like 60 or so cores on a card), but the Phi cores are also cut-down, limited-purpose cores.

You couldn't just bolt a hundred full-size Haswell cores together and run them at full speed, because they're big, complex cores, and they'd pull more power than your entire house.

Quality vs. Quantity.
 
A bit more technical explanation: stream processors are very good at embarrassingly parallel processing, where you have a large amount of data, and you need to perform the same operations on each element. However, as different cores start branching to different execution paths, the parallel performance quickly drops. In CUDA, all cores operate in exact lockstep, always receiving the same instruction as all the others. If you only need to perform an instruction on a subset of cores, the other cores will still spend time executing that instruction, and then throw away the result. This makes them essentially useless for multitasking.
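The cost of branching described above can be sketched with a toy model in plain Python. This is not GPU code: `run_lockstep` and its "lane slot" accounting are hypothetical names made up for illustration, and the model assumes every lane steps through each branch that at least one lane needs.

```python
# Toy model of lockstep execution with masking. Not real GPU code:
# run_lockstep and the "lane slot" count are illustrative assumptions.

def run_lockstep(data, predicate, then_op, else_op):
    """Apply an if/else to every element, with all lanes moving in lockstep.

    Returns (results, lane_slots). lane_slots counts the work spent,
    including lanes that computed a value and then threw it away.
    """
    lanes = len(data)
    mask = [predicate(x) for x in data]
    results = list(data)
    lane_slots = 0

    # Pass 1: if any lane wants the "then" side, every lane executes it.
    if any(mask):
        for i, x in enumerate(data):
            value = then_op(x)      # computed on every lane
            if mask[i]:
                results[i] = value  # kept only where the mask is set
        lane_slots += lanes

    # Pass 2: if any lane wants the "else" side, every lane executes it too.
    if not all(mask):
        for i, x in enumerate(data):
            value = else_op(x)
            if not mask[i]:
                results[i] = value
        lane_slots += lanes

    return results, lane_slots


data = list(range(8))

# All lanes take the same path: one pass over the lanes.
uniform, uniform_slots = run_lockstep(
    data, lambda x: True, lambda x: x * 2, lambda x: x + 100)

# Even and odd lanes disagree: both sides run on all lanes.
divergent, divergent_slots = run_lockstep(
    data, lambda x: x % 2 == 0, lambda x: x * 2, lambda x: x + 100)

print(uniform_slots, divergent_slots)
```

In this model the divergent case spends twice the lane slots of the uniform one for the same amount of useful output, which is the drop in parallel performance the post is talking about.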
 
"In CUDA, all cores operate in exact lockstep, always receiving the same instruction as all the others."

A modern GPU isn't as bad as that when it comes to multiple instruction, multiple data (MIMD). The cores are organized into higher-level groups of a few dozen cores (exact counts depend on the specific architecture), and each group can work on a different instruction stream.
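To illustrate why the grouping matters, here is a rough Python sketch that counts wasted work when lanes are split into independent groups. Everything in it is an assumption for illustration: `lane_slots` is a made-up helper, and the group size of 4 is only there to keep the numbers small (NVIDIA calls its groups warps, and they are 32 threads wide).

```python
# Toy model: lockstep applies only within a group, and groups are
# independent of each other. lane_slots and GROUP_SIZE are hypothetical.

def lane_slots(mask, group_size):
    """Count lane slots spent on one if/else when lanes are grouped.

    mask[i] is True if lane i takes the "then" side of the branch.
    """
    total = 0
    for start in range(0, len(mask), group_size):
        group = mask[start:start + group_size]
        # A group whose lanes all agree runs one side of the branch;
        # a mixed group has to run both sides on all of its lanes.
        passes = 1 if (all(group) or not any(group)) else 2
        total += passes * len(group)
    return total


GROUP_SIZE = 4  # kept small for readability; real groups are larger

interleaved = [i % 2 == 0 for i in range(16)]  # every group is mixed
blocked = [i < 8 for i in range(16)]           # every group is uniform

interleaved_cost = lane_slots(interleaved, GROUP_SIZE)
blocked_cost = lane_slots(blocked, GROUP_SIZE)
one_big_group_cost = lane_slots(blocked, 16)   # whole chip in lockstep

print(interleaved_cost, blocked_cost, one_big_group_cost)
```

With the same half-and-half branch, the blocked layout costs half as much as the interleaved one, because no single group has to run both sides. If all 16 lanes were one lockstep group, the blocked layout would pay the full penalty as well.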

OP, in a way stream processors/CUDA cores are the opposite of Hyper-Threading. The purpose of HT is to take more advantage of your limited CPU cores by rapidly switching between multiple different instruction streams (threads) on the same hardware. GPUs do the opposite: they execute a small number of threads on a lot of different hardware.
 