AnandTech article about ARM floating point perf

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
I thought it was an excellent piece. I would love to get hold of the code you used for determining the FP throughput per clock.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
I thought it was really good as well. The only thing I wish it had added was a bit of historical perspective with x86, i.e. how does current ARM stack up against early Athlon 64 X2s or Core Duos instead of i5s?

We all know that mobile tech is slower than modern desktop/laptop tech, but I assume that my S4 is more powerful than my first MacBook, and that fact is amazing.
 

rahulgarg

Junior Member
Oct 3, 2012
14
0
0
Thanks for the kind words, everyone. About the code, I do plan to open-source it at some point within a few months, perhaps after we get one or two more articles out.
It is not exactly rocket science though. With the description in the article and a little bit of experimenting, you can figure it out pretty easily. It is a standard technique, so you should be able to find articles and maybe existing code about it on the web.
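
The article's own code is not shown in this thread. As a rough sketch of the general idea only (time a known number of independent floating-point operations, then divide by the cycles elapsed), something like the following Python could work. The function names, the four-accumulator layout, and the 1.5 GHz clock are all assumptions made for illustration. Interpreter overhead means Python will report far less than the hardware's real peak, so an actual test would use assembly or intrinsics.

```python
import time


def flops_per_cycle(total_flops, elapsed_seconds, clock_hz):
    """Convert a timed run into floating-point operations per clock cycle."""
    return total_flops / (elapsed_seconds * clock_hz)


def timed_multiply_add(iterations):
    """Run four independent multiply-add chains; return (flops, seconds).

    The accumulators do not depend on each other, so one chain never has
    to wait for another. That is what makes this a throughput test
    rather than a latency test.
    """
    a0, a1, a2, a3 = 1.0, 2.0, 3.0, 4.0
    m, c = 0.999999, 0.000001
    start = time.perf_counter()
    for _ in range(iterations):
        a0 = a0 * m + c
        a1 = a1 * m + c
        a2 = a2 * m + c
        a3 = a3 * m + c
    elapsed = time.perf_counter() - start
    # 4 chains x (1 multiply + 1 add) per iteration = 8 flops.
    return 8 * iterations, elapsed


if __name__ == "__main__":
    ASSUMED_CLOCK_HZ = 1.5e9  # hypothetical clock speed; set to the real one
    flops, seconds = timed_multiply_add(1_000_000)
    print(flops_per_cycle(flops, seconds, ASSUMED_CLOCK_HZ))
```

The clock frequency has to be known and stable during the run, otherwise the per-cycle figure is off by whatever the frequency scaling did.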

Comparisons with x86: Well, I wanted to avoid detailed comparisons in this article.
Cross-ISA comparisons of instruction throughput are generally a minefield. Ideally, they are done at the application level rather than at the synthetic instruction-throughput level.

Anyway, since you asked, here is some data from memory:

Core 2 Duo: 4 DP flops/cycle, 8 SP flops/cycle
Nehalem: 4 DP flops/cycle, 8 SP flops/cycle
K10: 4 DP flops/cycle, 8 SP flops/cycle
Sandy/Ivy: 8 DP flops/cycle, 16 SP flops/cycle
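
These per-core figures follow from vector width times the number of vector operations issued per cycle. A small sketch of that arithmetic, assuming each of these cores can issue one vector add and one vector multiply per cycle (128-bit SSE on Core 2, Nehalem, and K10; 256-bit AVX on Sandy/Ivy Bridge); the function and table names are made up for illustration:

```python
def peak_flops_per_cycle(vector_bits, element_bits, ops_per_cycle=2):
    """Peak flops/cycle = SIMD lanes x vector ops issued per cycle.

    ops_per_cycle=2 models one add plus one multiply per cycle (assumed).
    """
    lanes = vector_bits // element_bits
    return lanes * ops_per_cycle


# Vector register width in bits for each core (illustrative table).
VECTOR_BITS = {
    "Core 2 Duo": 128,
    "Nehalem": 128,
    "K10": 128,
    "Sandy/Ivy": 256,
}

for name, bits in VECTOR_BITS.items():
    dp = peak_flops_per_cycle(bits, 64)  # double precision: 64-bit elements
    sp = peak_flops_per_cycle(bits, 32)  # single precision: 32-bit elements
    print(f"{name}: {dp} DP flops/cycle, {sp} SP flops/cycle")
```

Multiplying a flops/cycle figure by clock speed and core count gives a theoretical peak, which real code rarely reaches.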

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
rahulgarg said:
Comparisons with x86: Well, I wanted to avoid detailed comparisons in this article.
Generally cross-ISA comparisons about instruction throughputs are a minefield. Ideally cross-ISA comparisons are best done at the application level rather than synthetic instruction throughput level.

Ah OK, good to know, thanks.

rahulgarg said:
Anyway, since you asked, here is some data from memory:

Core 2 Duo: 2 DP flops/cycle, 4 SP flops/cycle
Nehalem: 4 DP flops/cycle, 8 SP flops/cycle
k10: 4 DP flops/cycle, 8 SP flops/cycle
Sandy/Ivy: 8 DP flops/cycle, 16 SP flops/cycle

Oh wow, that is awesome. Thank you again for the article and the further information!
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,393
8,552
126
My dumb ass actually believes the Nvidia slides.

Where are we then? Katmai level? Northwood level?

P54C on a per-clock basis.

Note: I have no idea what I'm talking about, but I plan on reading the article when I'm off the phone.
 

ChronoReverse

Platinum Member
Mar 4, 2004
2,562
31
91
I'd say Pentium III level, actually, and probably higher in absolute terms (assuming we're talking about the A15).