Comparing current x86 and ARM chips (performance)

-Slacker-

Golden Member
Feb 24, 2010
1,563
0
76
So, in light of the recent ARM threads in the forums, I would imagine that some people are curious how an ARM chip, or any RISC-based chip found in tablets, would hold up against the closest comparable chips Intel or AMD sell, and by "some people" I actually mean me, of course ... also now I'm hungry for potato chips for some reason... :(

I'm wondering if there's a method of comparing - say an A5 apu with one of those 5w bobcat apus - if not in terms of real world applications then, maybe, in some highly synthetic benchmarks that measure raw general performance/potential*

*I'm sorry, I pulled that phrase right out of my ***, I actually have no idea of what I'm talking about :oops:


Anyway, despite my lack of know-how, I'm sure that at least the general purpose of the OP is fairly clear.
 

LOL_Wut_Axel

Diamond Member
Mar 26, 2011
4,310
8
81
They're basically a lot slower, but also use a lot less power. And no, they're not more power efficient. Again, they use a lot less power because they're a lot less powerful.

The only real, somewhat meaningful benchmark we have right now is Coremark, and that does seem to be biased in favor of ARM SoCs. It also measures only integer performance, and assumes almost 100% scaling of all cores. Regardless, with the latest version, the upcoming Tegra 3 SoC (the high-end model, that is) scores ~11,400 points and a Core 2 Duo T7200 CPU scores ~15,000 points.

image2.png


That means the Core 2 Duo T7200 is ~32% faster in this synthetic benchmark than the Tegra 3 SoC (15,000 / 11,400 ≈ 1.32). It should be noted that the T7200 is a five-year-old CPU, and it still has higher integer performance. It also consumes a lot more power. We also have newer Intel CPUs that consume the same amount of power but are several times faster.
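
The relative speedup implied by the two approximate scores quoted above can be worked out directly. This is only a sketch: the helper name `percent_faster` is made up for illustration, and the inputs are the rounded figures from this post rather than exact benchmark results.

```python
def percent_faster(fast_score: float, slow_score: float) -> float:
    """Return how much faster `fast_score` is than `slow_score`, in percent."""
    return (fast_score / slow_score - 1.0) * 100.0

# Approximate CoreMark scores quoted in the post (higher is better).
tegra3 = 11_400
core2duo_t7200 = 15_000

print(f"T7200 vs Tegra 3: {percent_faster(core2duo_t7200, tegra3):.0f}% faster")
```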

As far as floating point performance goes, that's ARM's huge problem. I don't have numbers for upcoming ARM SoCs, but a 600MHz Cortex A8 achieves 23 MFLOPS on Linpack, compared to 933 MFLOPS for an Atom N270. That means the Atom is about 40x faster in this metric. Both are outdated and have been replaced, but the comparison still serves as a good overview of where performance stands.

As an editor at Ars Technica said: "there's no magical ARM performance elves."
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
I'm wondering if there's a method of comparing - say an A5 apu with one of those 5w bobcat apus - if not in terms of real world applications then, maybe, in some highly synthetic benchmarks that measure raw general performance/potential*

It's simple enough in theory. The most controlled way to do this is to install some Linux distro on each system and perform the benchmarks that way. However, for the A5, I don't know if people have been able to install Linux on the iPad2 or iPhone 4S yet. For other Android phones it's possible. So even then you can have a more controlled comparison between different types of ARM processors (Qualcomm vs. Samsung vs. Marvell vs. etc.).
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
To get direct comparisons, someone would need to get some eval boards, and try a compatible Linux on them. There's tons of software canned in Debian and friends, so it should be possible to do. But, somebody would need to spend their own money to do that, so I wouldn't expect it any time soon.

There are benchmarks between phones, but they don't tend to help much, comparing to PCs. More than anything, such benchmarks tend to show that Qualcomm knows what they're doing, when designing chips for smart phones.
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
A review comparing ARM v x86 Atom is available: http://www.slideshare.net/napoleani...l-atomarchitectural-and-benchmark-comparisons

It's a long paper, but if you scroll to the end you'll find a table of benchmark results and some analysis. The raw figures for what we care about are below:

                Dhrystone (DMIPS)   Whetstone (MIPS)   Linpack (KFLOPS)
ARM Cortex A8   883                 100                23376
Atom N330       1822                1667               933638

What can a full-on x86 core do, i.e. a 2600K? Well, http://www.guru3d.com/article/core-i5-2500k-and-core-i7-2600k-review/13 breaks out the Dhrystone and Whetstone results. The results on guru3d are in giga-IPS, so they are three orders of magnitude off from the figures above; I've converted them to MIPS below:

2600K           118000              83000              ~40000000

It's basically hundreds to thousands of times faster than the ARM chip in synthetic benchmarks. These results aren't directly comparable, because Sandra's Dhrystone and Whetstone are different beasts from the benchmarks linked, but if you run Linux you can quite happily download and run them yourself and see how quick your system is in comparison. The differences are orders of magnitude, however, so no amount of messing with the benchmark is going to stop the 2600K being many, many times faster. In short, there is a massive difference.
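
To put numbers on the gap, the ratios can be computed from the figures quoted in this post. This is a rough sketch only: the dictionary layout and the `ratios` helper are made up for illustration, the 2600K Linpack value is the approximate one given above, and, as already said, the Sandra numbers are not strictly comparable with the paper's.

```python
# Figures as quoted in the post: (Dhrystone DMIPS, Whetstone MIPS, Linpack KFLOPS).
# The 2600K Linpack value is the post's rough estimate.
results = {
    "ARM Cortex A8": (883, 100, 23_376),
    "Atom N330": (1_822, 1_667, 933_638),
    "Core i7 2600K": (118_000, 83_000, 40_000_000),
}
benchmarks = ("Dhrystone", "Whetstone", "Linpack")

def ratios(fast: str, slow: str) -> dict:
    """Per-benchmark speedup of `fast` over `slow`."""
    return {
        name: results[fast][i] / results[slow][i]
        for i, name in enumerate(benchmarks)
    }

for slow in ("ARM Cortex A8", "Atom N330"):
    line = ", ".join(
        f"{name}: {value:.0f}x"
        for name, value in ratios("Core i7 2600K", slow).items()
    )
    print(f"2600K vs {slow}: {line}")
```

On these figures the gap ranges from roughly 40x to roughly 1,700x, depending on the pair of chips and the benchmark.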
 
Mar 10, 2006
11,715
2,012
126
I forgot to mention that my 10000 figure was actually comparing a single core version of the Cortex A15 with an 8 core Haswell.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
http://forums.anandtech.com/showpost.php?p=30819112&postcount=29

According to the above post Cortex A9 is competitive with Atom clock for clock.


------------------------------------------------------------------------

CPUperf.jpg


The above graph comes from the Anandtech Moorestown article: http://www.anandtech.com/show/3696/...00-series-the-fastest-smartphone-processor/14

Keep in mind that SpecInt measures single-threaded performance. So we are looking at a single 1.5 GHz Atom core vs. a single 1 GHz Cortex A9 core (not one Atom core vs. two Cortex A9 cores).
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Some info I found a while back on the upcoming Cortex A15:

Integer performance increases 50% (clock for clock) over Cortex A9 accompanied with good clockspeed increases for the Tablet versions of the CPU (eg, Qualcomm's Krait A15 core is spec'd at 2.5 Ghz)

------------------------------------------------------------------------------------

http://eda360insider.wordpress.com/...sor-ip-core-need-a-new-category…superstar-ip/

A very interesting article I found on Cortex A15.

According to this Integer performance is 52% better than Cortex A9.

a15-relative-performance-numbers.jpg


Memory, Floating Point, Gaming and other comparisons are also listed.

a15-performance-graphs.jpg


For the tech savvy here is a chart of the A15 pipeline:

a15-pipeline.jpg
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
To get direct comparisons, someone would need to get some eval boards, and try a compatible Linux on them. There's tons of software canned in Debian and friends, so it should be possible to do. But, somebody would need to spend their own money to do that, so I wouldn't expect it any time soon.

An eval board isn't necessary. If your smartphone can be rooted, then there are some options:

1) Install a Linux distribution on it (eg Debian)
2) If it's already running Android, then you can cross-compile a static benchmark program for native ARM and run it.
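
Option 2 could look roughly like the sketch below, which only prints the commands instead of running them. Everything specific here is an assumption rather than something from this thread: a Debian-style cross compiler called `arm-linux-gnueabi-gcc`, a benchmark source file named `bench.c`, and a phone reachable through `adb` with `/data/local/tmp` writable.

```python
import shlex

# Assumptions (not from the thread): a Debian-style cross toolchain named
# arm-linux-gnueabi-gcc, a benchmark source file called bench.c, and a
# rooted phone reachable through adb.
CROSS_CC = "arm-linux-gnueabi-gcc"
SOURCE = "bench.c"
BINARY = "bench-arm"
DEVICE_DIR = "/data/local/tmp"

def build_commands() -> list:
    """Return the commands to build a static ARM binary and run it on the phone."""
    return [
        # -static avoids depending on the C library installed on the device.
        [CROSS_CC, "-O2", "-static", "-o", BINARY, SOURCE],
        ["adb", "push", BINARY, f"{DEVICE_DIR}/{BINARY}"],
        ["adb", "shell", "chmod", "755", f"{DEVICE_DIR}/{BINARY}"],
        ["adb", "shell", f"{DEVICE_DIR}/{BINARY}"],
    ]

# Print the commands for review rather than executing them.
for cmd in build_commands():
    print(shlex.join(cmd))
```

Building the same source with the same optimization flags on the x86 side keeps the comparison as even as the compilers allow.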
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Like I keep saying: Intel's doomed!

I don't think Intel is doomed. (They are way too smart to let that happen.)

However, I do think comparing raw performance here is somewhat tricky for the following reasons:

1. Intel and ARM don't run the same software. This prevents us from comparing performance in the same way we compare Intel and AMD chips.

2. Having more raw performance does not guarantee a processor's success. I think a good example of this was the original Intel chip beating the high-performance RISC designs of the day. The Intel chip was actually lower performance, but look who is the high-performance king now!
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
1. Intel and ARM don't run the same software. This prevents us from comparing performance in the same way we compare Intel and AMD chips.

Not exactly. Have both architectures run some Linux distribution and compile an open source benchmark (e.g. Linpack, on which almost any x86 processor, even VIA's CPUs, will handily beat ARM).
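
For a feel of what a Linpack-style number measures, here is a minimal sketch: it times one dense linear solve and converts it to MFLOPS using the usual Linpack operation count of 2/3·n³ + 2·n². It is pure Python, so the result mostly reflects interpreter overhead and is not comparable with the compiled Linpack figures quoted in this thread; a real comparison would build the same C source on both machines. The function names and the matrix size are arbitrary choices.

```python
import random
import time

def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting (in place)."""
    n = len(b)
    for k in range(n):
        # Pick the row with the largest pivot to keep the elimination stable.
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        row_k = a[k]
        for i in range(k + 1, n):
            row_i = a[i]
            f = row_i[k] / row_k[k]
            for j in range(k, n):
                row_i[j] -= f * row_k[j]
            b[i] -= f * b[k]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def linpack_mflops(n=100, seed=1):
    """Time one n-by-n solve; return (MFLOPS, max error of the solution)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [sum(row) for row in a]  # chosen so the exact solution is all ones
    start = time.perf_counter()
    x = solve(a, b)
    elapsed = time.perf_counter() - start
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    error = max(abs(v - 1.0) for v in x)
    return flops / elapsed / 1e6, error

mflops, error = linpack_mflops()
print(f"{mflops:.1f} MFLOPS (max error {error:.2e})")
```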
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
An eval board isn't necessary. If your smartphone can be rooted, then there are some options:

1) Install a Linux distribution on it (eg Debian)
2) If it's already running Android, then you can cross-compile a static benchmark program for native ARM and run it.
A low-speed 32-bit memory interface would limit even a CPU like the Atom. 64-bit wide 8GB/s+ DDR2 or DDR3 should be there for a decent comparison, but I don't know what you could buy retail with decent RAM bandwidth, 2-4 A9 cores at 1GHz+, and the ability to run any software you want.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Not exactly. Have both architectures run some Linux distribution and compile an open source benchmark (e.g. Linpack, on which almost any x86 processor, even VIA's CPUs, will handily beat ARM).

I know Linpack measures "Floating point".

For whatever reason the ARM specific programs seem to minimize those calculations.

However, with Cortex A15 it seems ARM has decided to beef up floating point:

a15-performance-graphs.jpg


P.S. From what I gather the ARM CPU is biased towards Integer performance rather than floating point.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
A low-speed 32-bit memory interface would limit even a CPU like the Atom. 64-bit wide 8GB/s+ DDR2 or DDR3 should be there for a decent comparison, but I don't know what you could buy retail with decent RAM bandwidth, 2-4 A9 cores at 1GHz+, and the ability to run any software you want.

Well, you could always write a custom benchmark program that fits in the processor cache.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
A low-speed 32-bit memory interface would limit even a CPU like the Atom. 64-bit wide 8GB/s+ DDR2 or DDR3 should be there for a decent comparison, but I don't know what you could buy retail with decent RAM bandwidth, 2-4 A9 cores at 1GHz+, and the ability to run any software you want.

Well, with a server or "desktop" ARMv8 this would obviously change right?

When that happens it will be very interesting to compare the ARMv8 and Intel cores on a level playing field. (die size vs. power vs.performance, etc)
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Well, you could always write a custom benchmark program that fits in the processor cache.
I could, but that would be a stupid way to compare processors. Coremark is already worthless enough. I would take SPECint and SPECfp, with plain kernels and compiler options, the Phoronix test suite, or some other array of benchmarks based on tasks the hardware may be asked to perform, which might actually stress the hardware (given that Linux/FOSS is the path of least resistance, more server-centric tests are inevitable, but they can still provide good comparative results).

You couldn't buy anything arm with decent ram bandwidth....
They do exist, though. Not in a high performance sense, but there are a few companies with chips that can run 64-bit wide DDR3-1066, which would be good enough to compare fairly against Atom and Bobcat. Nufront has shown working chips, for example. Getting one to play with, however...
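
For reference, theoretical peak memory bandwidth is just bus width in bytes times the transfer rate. The sketch below applies that to the 64-bit DDR3-1066 case mentioned above, with a 32-bit interface at the same rate for contrast. The function name is made up, and real sustained bandwidth will be lower than these peak figures.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# 64-bit DDR3-1066, the configuration mentioned above.
print(f"64-bit DDR3-1066: {peak_bandwidth_gb_s(64, 1066):.1f} GB/s")
# Same transfer rate on a 32-bit interface, for contrast.
print(f"32-bit DDR3-1066: {peak_bandwidth_gb_s(32, 1066):.1f} GB/s")
```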
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Some info I found a while back on the upcoming Cortex A15:

Integer performance increases 50% (clock for clock) over Cortex A9 accompanied with good clockspeed increases for the Tablet versions of the CPU (eg, Qualcomm's Krait A15 core is spec'd at 2.5 Ghz)

Krait is not exactly an A15, but Qualcomm does seem to put much more powerful floating point capabilities in their custom cores (Scorpion and Krait) than ARM does.

So the A15 has a 15-24 stage pipeline? Seems a little long.

It's probably something like 15 stages for integer operations, 16 stages for loads that hit in the L1 cache, and 24 stages for stuff like floating point adds. It's normal for the FP pipe to be much deeper (for example, this site says K8 had a 12 stage integer pipeline but a 17 stage floating point pipeline).


Edit: Apparently the CPU in my tablet supports 333MHz/266MHz LPDDR2 (presumably single-channel 64 bit?)... any benchmarks against a laptop would require seriously underclocking the laptop's memory (or at the very least, measuring benchmark scaling vs. DDR speed)
 

-Slacker-

Golden Member
Feb 24, 2010
1,563
0
76
Hmm

Seems to me like x86 is still not there in terms of performance per watt, at least at that tdp level (.5 ~ 2w), in non floating point dependent operations ... Is that correct?
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Don't compare power used by processor. Compare power used per unit of work. I think you will be surprised.
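
In other words, compare energy per task (average power multiplied by the time the task takes) rather than power alone. A tiny sketch of that arithmetic follows; the two chips and all of their numbers are hypothetical, invented purely to illustrate the point, and are not measurements of any real part.

```python
def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy used to finish one task: average power multiplied by time."""
    return avg_power_watts * runtime_seconds

# Hypothetical figures, invented purely for illustration (not measurements).
chips = {
    "low-power chip": {"watts": 1.0, "seconds": 40.0},
    "faster chip": {"watts": 5.0, "seconds": 5.0},
}

for name, c in chips.items():
    print(f"{name}: {energy_joules(c['watts'], c['seconds']):.0f} J per task")
```

With these made-up numbers, the chip that draws five times the power still uses less energy per task, because it finishes eight times sooner.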
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Hmm

Seems to me like x86 is still not there in terms of performance per watt, at least at that tdp level (.5 ~ 2w), in non floating point dependent operations ... Is that correct?
Depends on what you are doing. Like IPC, it can only really be compared doing the same sort of work, and no work anyone actually cares about entirely fits in the cache, so the whole platform (but especially memory) matters. For a situation like a phone or tablet, though, no, x86 isn't even close. x86 code is so filled with direct memory operations that you can't just turn off big chunks of the chip to idle low, like with a RISC ISA. Intel will have to brute force their way into those markets, mostly through process advantage.

Well, with a server or "desktop" ARMv8 this would obviously change right?

When that happens it will be very interesting to compare the ARMv8 and Intel cores on a level playing field. (die size vs. power vs.performance, etc)
Maybe. What will really change, if ARM is serious about servers, is making readily available the most fundamental of RAS features: error detection. ARMs with ECC for all their IO, down to low-level caches, are rare clandestine devices, if not vaporware, at the moment. A few are out there that I know of with ECC RAM support, but I'm not sure about ECC caches (optional on some CPUs, but I've found no mention for the A9), nor on-chip IO.

With error checking from the peripheral IO to the lower caches, and MCA to flag anything else in the chip, you can know that there either has been an error, or hasn't (well, at least with high enough confidence that you can usually ignore what might make it through). You can know with some confidence that if it hasn't been logged, it hasn't happened. If you know you can do this, you can design in your preferred redundancy. For anything more important than cheap oversold-to-the-nines web serving, I'd prefer having that over trying to handle it entirely by software (synchronously duplicate, and at critical points compare, all work done, across multiple physical computers, with the ability to roll back several such points on an error, and retry--by the time you've got it working, it would probably have been better to get a few big bad x86 servers from the start, and software methods still might leave holes for errors to occur and propagate).
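
The basic idea of hardware error detection can be shown with the simplest possible scheme, a single parity bit stored next to each word. This is only a toy sketch with made-up function names: real ECC memory uses stronger codes that can also correct single-bit errors.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") % 2

def store(word: int):
    """Keep the parity bit alongside the data, as a checked memory would."""
    return word, parity(word)

def check(word: int, stored_parity: int) -> bool:
    """True if the word still matches the parity recorded at store time."""
    return parity(word) == stored_parity

data, p = store(0b1011_0010)
corrupted = data ^ (1 << 3)   # flip one bit, as a soft error might

print(check(data, p))         # unchanged word passes the check
print(check(corrupted, p))    # single-bit flip is detected
```

A flip of two bits in the same word would pass this check unnoticed, which is one reason real designs use stronger codes than plain parity.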

P.S. http://perspectives.mvdirona.com/2009/10/07/YouReallyDONeedECCMemory.aspx
 