- Apr 23, 2016
- 237
- 0
- 0
The term IPC (Instructions Per Cycle) is misunderstood and misused a lot when it is applied to GPUs.
It originated in the CPU world, where it indicates the average number of instructions executed per clock, per core, for a given workload. While the complexity of a modern CPU architecture cannot be captured by a single number, IPC remains a succinct and fairly intuitive way to track how good a core is. It's not always very useful for comparing cores, though, especially if they don't use the same ISA.
The first problem with applying it to GPUs comes from the fact that, unlike CPUs, there are many other (often independent) units that work together to keep the GPU busy. A typical example is shadow map rendering, where the GPU cores are idle most of the time while other subsystems, such as rasterizers and ROPs, can be fully loaded. In such a scenario IPC tells us close to nothing about our GPU.
To make things worse, a decreased IPC in a new architecture might signal an improvement of the cores while the rest of the system has not been improved. In the aforementioned shadow map rendering case the cores could be idling even more, but it would be an error to consider this an issue or a sign of a poorer performance/product. This is the first misconception about IPC, and it would be preferable to come up with a new GPU-only metric of work done per unit time. That goes beyond the scope of this post, so from now on I'll assume that IPC applied to GPUs is such a new metric. I suspect most are already using the term IPC this way anyway, without giving it much thought, although it is important to understand what it means and where it comes from.
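To make the first point concrete, here is a minimal sketch (Python, with entirely made-up numbers) of what a shadow-map-like pass could look like: the shader cores issue very few instructions, so their IPC looks terrible, even though the rasterizer and ROPs keep the GPU fully busy.

```python
# Minimal sketch with hypothetical numbers: a shadow-map-style pass where the
# bottleneck is fixed-function work (rasterization / depth writes), not shader math.
cycles              = 1_000_000   # GPU cycles spent on the pass
instructions_issued = 150_000     # shader instructions (simple vertex work, no pixel shading)
rasterizer_busy     = 0.95        # fraction of cycles the rasterizer/ROPs are busy

core_ipc = instructions_issued / cycles
print(f"shader core IPC:        {core_ipc:.2f}")        # ~0.15 -> looks "bad"
print(f"rasterizer utilization: {rasterizer_busy:.0%}") # ~95%  -> GPU is actually busy
```

A low shader-core IPC here says nothing about the quality of the cores; the pass simply doesn't have much shader work to give them.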
The second and far worse misconception is about IPC and GPU frequency. I see a lot of posts where IPC is computed as GPU performance per clock in a given workload (the first misconception; it's acceptable, no big deal) and compared across different architectures running at completely different frequencies. Invariably the higher clocked GPU shows lower "IPC" and people boldly claim "IPC went down.. this is bad, company X sucks, etc." *Too bad this comparison is completely and utterly meaningless, because IPC is almost always inversely proportional to frequency.* This is really straightforward to understand: memories don't scale up like cores do, so as we increase core frequency the likelihood of our cores (and other GPU units) starving for data and stalling increases (ergo IPC goes down).
Let me repeat it: IPC is a function of frequency. If frequency goes up, IPC will likely go down, especially at higher frequencies. If you think a GPU architecture is worse because, in this context, IPC goes down, you are swapping cause with effect and you are sooo wrong.
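A toy throughput model makes this frequency dependence obvious. In the sketch below (Python, all numbers invented) instruction throughput is capped either by the cores or by a fixed memory bandwidth; since the bandwidth does not scale with the core clock, "IPC" inevitably falls once the clock is high enough.

```python
# Toy model, all numbers hypothetical: instruction throughput is the minimum of
# what the cores can issue and what a fixed memory bandwidth can feed.
cores             = 16
issue_per_clock   = 1.0     # instructions per core per clock when never stalled
bytes_per_instr   = 8.0     # average memory traffic per instruction
mem_bandwidth_gbs = 200.0   # GB/s, does NOT scale with the core clock

def gpu_ipc(freq_ghz):
    compute_limit = cores * issue_per_clock * freq_ghz * 1e9   # instr/s if never stalled
    memory_limit  = mem_bandwidth_gbs * 1e9 / bytes_per_instr  # instr/s the memory can feed
    instr_per_sec = min(compute_limit, memory_limit)
    return instr_per_sec / (freq_ghz * 1e9)                    # whole-GPU instructions per clock

for f in (1.0, 2.0, 3.0):
    print(f"{f:.1f} GHz -> IPC = {gpu_ipc(f):.1f}")
# 1.0 GHz -> IPC = 16.0  (compute bound)
# 2.0 GHz -> IPC = 12.5  (memory bound)
# 3.0 GHz -> IPC = 8.3   (memory bound, even lower)
```

The cores do not get any worse between 1 and 3 GHz; the lower number per clock is just the same fixed memory bandwidth spread over more clocks.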
Of course GPU architects can deliberately modify a GPU to lower the cores' IPC in order to scale up frequency, but you can't prove this by testing at different frequencies and with different memory types and bandwidths. To demonstrate that IPC was lowered "on purpose" you ideally have to compare the different GPU cores running the same workload, at the same frequency, with the same memory bandwidth (and memory type too). This is not always possible.
I don't think what I wrote here is going to change the way people misuse IPC all the time, but when it happens I (and you) can point them to this post.