- Oct 14, 2006
Hi all,
This is my first post, and I hope it is relevant here. I have been trying to measure the peak ALU performance of CPUs. My machine has an Athlon 3200+, and I am trying to write demo code that shows the maximum GFLOP/s it can reach. For this I am using the SSE intrinsics in Visual Studio 2005. I have written a small program that loads two float[4] arrays into two __m128 variables and then performs dependent adds on them, one after the other. Finally, it stores the result back. There are only 3 memory operations (2 loads and 1 store), while the number of ALU operations is very high (around 200K), so memory cannot be the bottleneck.
However, when I run this, I measure a peak of only 1.7 GFLOP/s, while my CPU runs at 1.8 GHz. Shouldn't the theoretical peak be 1.8 x 4 = 7.2 GFLOP/s, so that I should see around 5 or so even in the worst case? Could there be dependency-related issues, or other ALU optimizations that I am missing? I can paste the code if someone wants to take a look at it.
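For reference, here is a minimal sketch of the kind of kernel described above. It is not the actual program: the function names `chain_adds` and `parallel_adds` are hypothetical, and the second function is only an assumed variant for comparison, in which four accumulators are updated independently so that no add has to wait on the add issued just before it.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: n dependent adds in a single chain.
   Each add needs the result of the previous one. */
void chain_adds(const float *start, const float *step, long n, float *out)
{
    __m128 acc = _mm_loadu_ps(start);   /* load 1 */
    __m128 inc = _mm_loadu_ps(step);    /* load 2 */
    for (long i = 0; i < n; ++i)
        acc = _mm_add_ps(acc, inc);     /* 4 float adds per instruction */
    _mm_storeu_ps(out, acc);            /* the single store */
}

/* Hypothetical variant: four independent chains, 4*n adds in total.
   The adds within one iteration do not depend on each other. */
void parallel_adds(const float *start, const float *step, long n, float *out)
{
    __m128 inc = _mm_loadu_ps(step);
    __m128 a0 = _mm_loadu_ps(start);
    __m128 a1 = a0, a2 = a0, a3 = a0;
    for (long i = 0; i < n; ++i) {
        a0 = _mm_add_ps(a0, inc);
        a1 = _mm_add_ps(a1, inc);
        a2 = _mm_add_ps(a2, inc);
        a3 = _mm_add_ps(a3, inc);
    }
    /* Combine the four accumulators so none of them is dead code. */
    _mm_storeu_ps(out, _mm_add_ps(_mm_add_ps(a0, a1), _mm_add_ps(a2, a3)));
}
```

Timing both functions with the same n and dividing the number of float adds (4*n for the first, 16*n for the second) by the elapsed time would show whether the dependency chain is what limits the measured rate. An optimizing compiler may transform such simple loops, so it is worth checking the generated assembly before trusting the numbers.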
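The arithmetic behind that estimate can be written out as a rough sketch. The symbols are introduced here only for illustration: f is the clock rate, W the number of floats per SSE add, and L the latency of one add in cycles. The 7.2 figure assumes the CPU can start one 4-wide add every cycle; the second line shows what a single chain of dependent adds would allow if each add had to wait L cycles for the previous result. The value L = 4 is purely an assumed example, not a figure taken from a data sheet.

```latex
\begin{align*}
P_{\text{peak}}  &= f \cdot W = 1.8\,\text{GHz} \times 4 = 7.2\ \text{GFLOP/s} \\
P_{\text{chain}} &= \frac{f \cdot W}{L} = \frac{1.8 \times 4}{4} = 1.8\ \text{GFLOP/s} \quad (\text{assuming } L = 4)
\end{align*}
```

Under that assumption the dependent chain alone would cap the result near 1.8 GFLOP/s, which is in the neighborhood of the 1.7 measured. The real latency and issue rate would have to be checked in the processor's optimization guide.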
Thanks and regards,
Kshitij.