Question ARMv8 vs. x86 ISA comparison - is x86 outdated?


Richie Rich

Senior member
Jul 28, 2019
470
229
76
I found an interesting study comparing the impact of instruction set architecture to CPU performance.

There is comparison of these ISA:
ARMv8
x86-64
Alpha

There are some very interesting results showing that the ISA matters. We can see less ISA influence on in-order cores and a much bigger influence on out-of-order cores. Basically, an OoO core based on ARM needs approximately 20% fewer cycles to perform the same task, which results in 20% higher performance per clock (PPC/IPC) and also 20% less power consumption over the same time.
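As a quick check on the arithmetic, here is a minimal sketch that converts a cycle reduction into a performance-per-clock ratio. The cycle counts are made up for illustration and are not figures from the study. At the same clock, 20% fewer cycles works out to 1/0.8 = 1.25x, so closer to 25% higher performance per clock than 20%.

```python
def speedup_from_cycle_reduction(reduction: float) -> float:
    """Return the performance-per-clock ratio when a task needs
    `reduction` (a fraction, e.g. 0.20) fewer cycles."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Hypothetical cycle counts, chosen only to illustrate a 20% gap.
baseline_cycles = 1_000_000
reduced_cycles = 800_000

reduction = 1 - reduced_cycles / baseline_cycles
ratio = speedup_from_cycle_reduction(reduction)
print(f"{reduction:.0%} fewer cycles -> {ratio - 1:.0%} higher performance per clock")
```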

Cycle_counts.png

uOP_counts.png

Exec_relative_to_ARM-Haswell.png

Citation:

4.6 Summary of the Findings
This section discusses a summary of the findings based on the aforementioned examples and results.
1. On average, ARMv8 outperforms other ISAs on similar microarchitectures, as it offers better instruction-level parallelism and has lower number of dynamic μ-ops compared to the other ISAs in most of the cases.
2. The average behavior of ISAs can be very different from their behavior for a particular phase of execution, which agrees with Venkat and Tullsen’s findings [20].
3. The performance differences across ISAs are significantly reduced in in-order cores compared to out-of-order cores.
4. On average, x86 has the highest number of dynamic μ-ops. This agrees with previous findings when compared to Alpha [20]. There are few examples where Alpha exceeds x86 in the number of μ-ops, but ARMv8 always has lower or equal number of μ-ops when compared to x86.
5. x86 seems to have over-serialized code due to ISA limitations, such as use of implicit operands and overlap of one of the source and destination registers as observed by [21]. x86 has the highest average degree of use of registers (the average number of instructions, which consume the value generated by a particular instruction).
6. The total number of L1-instruction cache misses is very low across all ISAs for the studied cores. This infers that the sizes of L1 instruction caches that are used are sufficient to eliminate any ISA bottlenecks related to code size for the studied benchmarks.
7. Based on our results, the number of L1-data cache misses are similar across all ISAs in case of in-order cores, but the numbers can vary significantly in case of out-of-order cores.
8. On average, the number of branch mispredictions are very close across ISAs for all cores with few exceptions such as gobmk, qsort and povray.
9. μ-ops to instructions ratio on x86 is usually less than 1.3, as observed by Blem et al [40]. However, the overall instructions count and mixes are ISA-dependent, which contradicts Blem et al’s [40] conclusion that instruction counts do not depend on ISAs.
10. Significant microarchitectural changes affect performance more than an ISA change does on a particular microarchitecture.
11. According to Blem et al’s study [40], performance differences on studied platforms are mainly because of microarchitectures. We see performance differences on exactly similar microarchitectures; which means ISAs are responsible for those performance differences. Moreover, since performance differences across ISAs are different for different microarchitectures, we can conclude that the behavior of ISA depends on microarchitecture as well but they certainly have a particular role in performance.
 

chrisjames61

Senior member
Dec 31, 2013
721
446
136
That article aside, how much advantage/disadvantage are we talking about here overall, in terms of ISA? Will that even matter that much in the big picture given the bazillion other variables involved?


Off topic, but this is actually the basis of a Stargate episode. Aliens decide to consult humans for help on how to defeat an enemy. The enemy's technology is impervious to the aliens' advanced energy weapons (which are centuries ahead of anything humans have), but can be severely damaged with human projectile weapons.

They are not looking to get human guns. Instead, they consulted the humans because the aliens felt they were so advanced now that they were not stupid enough to come up with such primitive ideas that can actually still work.

I have to check that out. Cool premise.
 

Hitman928

Diamond Member
Apr 15, 2012
5,244
7,793
136
Any comparisons with PowerPC, SPARC, or MIPS?

One of the papers referenced in the thesis included MIPS. This was the conclusion:

We analyze measurements on seven platforms spanning three ISAs (MIPS, ARM, and x86) over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors’ performance and energy efficiency. We find that ARM, MIPS, and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant.

Basically they found that when taking into account the different workloads each CPU was designed for, the ISA made no appreciable difference and that the microarchitectural design for the target workloads was what made the difference.

Interesting that the other study shows ISAs becoming equal when using an OoO engine, because this study shows exactly the opposite. The thing is that there is only one super wide engine.... the famous 6xALU Apple core, and it is based on ARM. So I tend to believe that the physical evidence of such a wide scalar core supports this study.

Apple's ALU count is a microarchitecture design decision, not an ISA comparison.

I don't want to create thread about x86 vs. ARM war and who is gonna win because that's about economy. I'd like to keep it technical here. Later we can discuss ARMv9 pros and cons.

If you want to keep economics out of it, that's fine, but then it'll be a very short discussion. You still have to face the fact that the best hardware design in the world doesn't mean anything if there isn't software to run it. I don't think anyone disagrees that ARM is a more modern and efficient ISA, the question is by how much (not that significant by those who have done studies on it) and does it actually matter for markets where x86 is dominant because of the software component. This obviously leads to the economic side of things but we can just stop there.

IMHO there are no huge restrictions under the hood of an x86 CPU. x86 uses a CISC/RISC abstraction layer, so the difference in register file size, for example, is overcome (ARM has 31 64-bit registers while x86-64 has only 16) because internally it uses bigger resources. But here comes the problem: this abstraction layer dealing with old bottlenecks costs transistors and power, and also produces inefficient code (the compiler doesn't see the bigger resources of a modern CPU and must produce code that fits a resource-poor i386 or x86-64 Atom core). However, the most important question is how big the penalty is. Because if it is small, then who cares, and you have all that great compatibility. But this penalty increases over time, so while K8 & C2D were doing great, Ice Lake might start to struggle.

Modern compilers are really, really good at extracting performance from x86. It's not like the compilers aren't aware of how modern x86 CPUs function. Compilers also allow you to target optimizations for different architectures (Zen, Haswell+, Atom, etc.) so I don't see having to support old/small x86 cores being a big deal from a compiler perspective.

This leads me to reason why we don't see some super wide 6xALU design at x86 similar to Apple (and if Matterhorn is suggested to have IPC of A12 then it will have probably 6xALUs too). x86 core with 6xALU is possible but it would struggle much more than Ice Lake so it would need more time to develop.

Because it's not as simple as just slapping a couple more ALUs on the chip and winning. There are a lot of trade-offs that are made when designing a CPU and Intel/AMD have very different design goals than Apple does.

That's probably why Intel and AMD focus much more on FPU/SIMD performance, because new ISA extensions like AVX are modern and bottleneck free. That's why I'm starting to believe that the Zen3 performance rumors make sense (+17% int IPC, +50% fpu).

AVX+ isn't purely about floating point operations, there are integer instructions as well though the focus is on floating point. Again though, design goals and trade-offs.

That article aside, how much advantage/disadvantage are we talking about here overall, in terms of ISA? Will that even matter that much in the big picture given the bazillion other variables involved?

Depends on what types of instructions are being executed but from the paper referenced earlier where they cycle accurate simulate the difference between the two ISAs with OoO designs, it appears to be 10 - 15% or so on average.
 

Schmide

Diamond Member
Mar 7, 2002
5,586
718
126
I found an interesting study comparing the impact of instruction set architecture to CPU performance.

There is comparison of these ISA:
ARMv8
x86-64
Alpha

There are some very interesting results showing that the ISA matters. We can see less ISA influence on in-order cores and a much bigger influence on out-of-order cores. Basically, an OoO core based on ARM needs approximately 20% fewer cycles to perform the same task, which results in 20% higher performance per clock (PPC/IPC) and also 20% less power consumption over the same time.

(snip)

I really like this paper, but I dislike you placing your 20% (less, more, power) spin on it. I do not see those conclusions in the paper and I read the whole thing. A less diligent reader would interpret this conclusion as that of the author.

Moreover if you look at the simulation (3.3) there were some serious liberties taken with regards to real life systems.

Every system has 25.4 GB/s bandwidth on varying bus frequencies and latencies. Yes this artificially removes memory as a factor, but then when you normalize the frequency across different ISAs it would inflate those with lesser speed to even higher relative transfer rates. The fact that the author does this and neglects to mention it anywhere in the article is kind of troubling. It could be a typo as well.
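To make the bandwidth point concrete, here is a rough sketch of what a fixed 25.4 GB/s looks like per core cycle at different clocks. The clock speeds are placeholders I picked for illustration, not the values from section 3.3 of the paper.

```python
BANDWIDTH_GB_S = 25.4  # memory bandwidth quoted in the post above


def bytes_per_cycle(bandwidth_gb_s: float, core_ghz: float) -> float:
    """Memory bytes available per core clock cycle."""
    if core_ghz <= 0:
        raise ValueError("core_ghz must be positive")
    return (bandwidth_gb_s * 1e9) / (core_ghz * 1e9)


# Placeholder core clocks in GHz (assumptions for illustration only).
for ghz in (1.0, 2.0, 3.4):
    print(f"{ghz:.1f} GHz core: {bytes_per_cycle(BANDWIDTH_GB_S, ghz):.1f} bytes/cycle")
```

The slower the core clock, the more bytes it gets per cycle from the same bus, which is the inflation effect described above once results are normalized per cycle.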

A higher frequency processor has a higher penalty for cache misses. A deeper pipeline can run at higher frequencies but serial operations on those extra stages are extra cycles blocking any dependent instruction. In the two main comparisons, ARM and Haswell, you have 15 and 19 pipeline stages respectively. 15/19 = 0.79, hmm?

The code examples do show some of the true disparities between RISC and CISC. ARM's ability to encode operations with more than 2 operands alleviates the need to copy values before destroying one of them. x86's ability to encode loads and stores into the operations removes the need for explicit loads.
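A toy model of the operand point, as a sketch only: it lowers three-operand operations to a destructive two-operand form (dest op= src) and counts the extra copies. The mnemonics, register names, and the `tmp` scratch register are made up; this is neither real ARM nor real x86 assembly, and it ignores things a real core or compiler can do to hide the copies.

```python
from typing import List, Tuple

# A three-operand operation: (op, dest, src1, src2), meaning dest = src1 op src2.
ThreeAddr = Tuple[str, str, str, str]

COMMUTATIVE = {"add", "and", "or", "xor", "mul"}


def lower_to_two_operand(code: List[ThreeAddr]) -> List[str]:
    """Lower three-operand ops to a destructive two-operand form."""
    out: List[str] = []
    for op, dest, src1, src2 in code:
        if dest == src1:
            # Already destructive: dest op= src2.
            out.append(f"{op} {dest}, {src2}")
        elif dest == src2 and op in COMMUTATIVE:
            # Operands can be swapped, so no copy is needed.
            out.append(f"{op} {dest}, {src1}")
        elif dest == src2:
            # Non-commutative and dest aliases src2: save src2 in a scratch register.
            out.append(f"mov tmp, {src2}")
            out.append(f"mov {dest}, {src1}")
            out.append(f"{op} {dest}, tmp")
        else:
            # Copy src1 first so it is not destroyed by the operation.
            out.append(f"mov {dest}, {src1}")
            out.append(f"{op} {dest}, {src2}")
    return out


program = [
    ("add", "d", "a", "b"),  # d = a + b, a and b stay live
    ("sub", "e", "d", "a"),  # e = d - a
    ("add", "e", "e", "b"),  # e = e + b (already destructive)
]
two_op = lower_to_two_operand(program)
print(len(program), "three-operand ops ->", len(two_op), "two-operand ops")
```

In this made-up program the three operations become five once the copies are added, which is the kind of overhead the three-operand encoding avoids.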

The one underlying nuance that kept playing in my head as I read the paper: neither ARM nor x86 is pure CISC or RISC. ARM has allowed their instruction set to grow to the point where reduced really doesn't apply. x86 has equally used micro-ops and fused-ops to reduce its complexity.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
Off topic, but this is actually the basis of a Stargate episode. Aliens decide to consult humans for help on how to defeat an enemy. The enemy's technology is impervious to the aliens' advanced energy weapons (which are centuries ahead of anything humans have), but can be severely damaged with human projectile weapons.

They are not looking to get human guns. Instead, they consulted the humans because the aliens felt they were so advanced now that they were not stupid enough to come up with such primitive ideas that can actually still work.
Asgard are funny that way - they also seemed to be oblivious to the fact that while their cloned bodies were slowly degrading with each cloning that they can 'replicate' things from scratch which makes that point null and void, or that they can download their minds into their computers clearly with little difficulty and therefore could survive in robotic bodies if necessary without worry of biological degradation.
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
It will still be a looong road before we will actually get Console - like experiences on Mobile...

Big words, small actions, as always in this industry.

However, you guys have no idea, how much I support any ARM initiative...

It may actually make entry-level gaming more approachable, and may make Linux go-to gaming platform for this.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
It will still be a looong road before we will actually get Console - like experiences on Mobile...

Big words, small actions, as always in this industry.

However, you guys have no idea, how much I support any ARM initiative...

It may actually make entry-level gaming more approachable, and may make Linux go-to gaming platform for this.
It depends what you mean by "console like" - touch screens are the opposite of ideal for gaming, what we need is more manufacturers with the balls to try an Xperia Play type design, or something like the Switch as I'm pretty sure there are a few in development at the moment.
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
It depends what you mean by "console like" - touch screens are the opposite of ideal for gaming, what we need is more manufacturers with the balls to try an Xperia Play type design, or something like the Switch as I'm pretty sure there are a few in development at the moment.
To be fair, CPU side might already be ready for prime time gaming, as we are hardly CPU bound these days, and the mobile games are rendered in sometimes higher resolutions, than FullHD.

That being said - we still need more powerful GPUs. Samsung licensing RDNA Architecture from AMD could be a good direction.

I for one wish that one day I can take an ARM board from any retail store that has an SoC on it, put cooling, RAM, and an SSD on it, load Linux, and experience a cheap mobile-based gaming console on my TV. This is what I mean by console-like experience.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
It will still be a looong road before we will actually get Console - like experiences on Mobile...

Big words, small actions, as always in this industry.

However, you guys have no idea, how much I support any ARM initiative...

It may actually make entry-level gaming more approachable, and may make Linux go-to gaming platform for this.
Nintendo Switch says hi. 🙂
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
Nintendo Switch says hi. 🙂
Is it really console-like experience? :)

Ticking boxes from this post(ignoring the last paragraph)?:
To be fair, CPU side might already be ready for prime time gaming, as we are hardly CPU bound these days, and the mobile games are rendered in sometimes higher resolutions, than FullHD.

That being said - we still need more powerful GPUs. Samsung licensing RDNA Architecture from AMD could be a good direction.

I for one wish that one day I can take an ARM board from any retail store that has an SoC on it, put cooling, RAM, and an SSD on it, load Linux, and experience a cheap mobile-based gaming console on my TV. This is what I mean by console-like experience.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
That being said - we still need more powerful GPUs. Samsung licensing RDNA Architecture from AMD could be a good direction.
The SD 845 and maybe even 835 are already superior to the TX1 used in the Switch GPU wise.

SD 855 slaughters it and the 8cx/SG1 just nukes it out of existence.

So there is already more than enough GPU power in the mobile marketplace to justify better games - albeit with the same problem PC's have with multiple hardware configs leading to lower general optimisation vs a console.

Samsung/AMD's new GPU would be similar in performance to 8cx/SG1 if Adreno scales linearly, the question remains as to its power consumption class vs 8cx.

Is it really console-like experience? :)
Yes, I've played on it - Switch is basically the dream of what mobiles could be like with docking stations and good physical controllers.

Once you dock, it's basically a home console - albeit at best equal to WiiU. Nintendo haven't made a truly great spec jump across the board since Gamecube over N64 sadly, though with a state of the art SoC they could be superior to the PS4 on a reasonable mobile power budget.
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
The SD 845 and maybe even 835 are already superior to the TX1 used in the Switch GPU wise.

SD 855 slaughters it and the 8cx/SG1 just nukes it out of existence.

So there is already more than enough GPU power in the mobile marketplace to justify better games - albeit with the same problem PC's have with multiple hardware configs leading to lower general optimisation vs a console.

Samsung/AMD's new GPU would be similar in performance to 8cx/SG1 if Adreno scales linearly, the question remains as to its power consumption class vs 8cx.


Yes, I've played on it - Switch is basically the dream of what mobiles could be like with docking stations and good physical controllers.

Once you dock, it's basically a home console - albeit at best equal to WiiU. Nintendo haven't made a truly great spec jump across the board since Gamecube over N64 sadly, though with a state of the art SoC they could be superior to the PS4 on a reasonable mobile power budget.
Which is exactly what I'm implying. If we truly would want ARM to explode, we need CPUs (we already have those to handle games in a console-like experience) and we need GPUs that are at least on the same level as Xbox One S and PlayStation 4 Slim, possibly even in form factors that support both OEM and DIY markets.

Stuff like this makes me think that in the madness that ChromeOS appears to be right now, there might be a method. Because ChromeOS will have(already does it?) access to ALL of Android Games.

Heck, it even makes Apple ARM rumors, Apple's Gaming PC, and Apple Arcade put into completely different perspective!
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,829
136
Once you dock, it's basically a home console - albeit at best equal to WiiU. Nintendo haven't made a truly great spec jump across the board since Gamecube over N64 sadly, though with a state of the art SoC they could be superior to the PS4 on a reasonable mobile power budget.

Nintendo likes cheap. Something based on SD865 or 8cx/SG1 would be quite a bit more expensive to build. Sony and MS aren't tapping into that potential market, and Nintendo won't go there. So someone would have to come out of left field with a good lineup of titles. The hardware would be easy. The software? Not so much.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
Nintendo likes cheap. Something based on SD865 or 8cx/SG1 would be quite a bit more expensive to build. Sony and MS aren't tapping into that potential market, and Nintendo won't go there. So someone would have to come out of left field with a good lineup of titles. The hardware would be easy. The software? Not so much.
Oddly Nintendo are making that software problem easier for competitors already.

By accepting more hardcore AAA games to the Switch platform they have opened the mobile console space up to a wider, more intense audience - I don't know offhand what the sales numbers are for those games, but it must be worth it to port to a console closer to PS3 than PS4 in GPU power, and all the optimisation effort that goes with that.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
If we truly would want ARM to explode
It already has - the mobile market is (or was) worth far more than the PC and console market in sales numbers.

Don't be fooled into thinking that it hasn't because it's not in all consoles or PC's - it's already in Switch, in 3DS (and DS predecessors), and in the PS Vita.

For now, the main reason that x86 continues on in the desktop and laptop space is its continuing vendor momentum and legacy software userbase - too many companies rely on legacy SW and emulation/binary translation is not an attractive option to such people.

Had AMD continued to produce less than attractive CPU hardware coupled with Intel's process foibles that momentum could very well have slowed, market enthusiasm cooled, and allowed ARM to take a more serious foothold - as it is now I expect it will take much longer to reach such a point with AMD's phoenix like rise and Intel finally moving if not back up to full speed yet.

An area that ARM can dominate is those tiny SBC's - if they can sponsor a company to produce a decent one with a more up to date CPU and GPU core than RPi 4, then they could make some head way as a super compact SFF PC. I could imagine such systems would go down well as digital signage machines in the right places too.

One of the problems that most RPi like SBC's have is the "some assembly required" mentality that attracts devs, when many consumers just want a fully pre packaged and assembled product ready to go out of the box.
 

Glo.

Diamond Member
Apr 25, 2015
5,705
4,549
136
It already has - the mobile market is (or was) worth far more than the PC and console market in sales numbers.

Don't be fooled into thinking that it hasn't because it's not in all consoles or PC's - it's already in Switch, in 3DS (and DS predecessors), and in the PS Vita.

For now, the main reason that x86 continues on in the desktop and laptop space is its continuing vendor momentum and legacy software userbase - too many companies rely on legacy SW and emulation/binary translation is not an attractive option to such people.

Had AMD continued to produce less than attractive CPU hardware coupled with Intel's process foibles that momentum could very well have slowed, market enthusiasm cooled, and allowed ARM to take a more serious foothold - as it is now I expect it will take much longer to reach such a point with AMD's phoenix like rise and Intel finally moving if not back up to full speed yet.

An area that ARM can dominate is those tiny SBC's - if they can sponsor a company to produce a decent one with a more up to date CPU and GPU core than RPi 4, then they could make some head way as a super compact SFF PC. I could imagine such systems would go down well as digital signage machines in the right places too.

One of the problems that most RPi like SBC's have is the "some assembly required" mentality that attracts devs, when many consumers just want a fully pre packaged and assembled product ready to go out of the box.

This is EXACTLY what I meant when I said what ARM needed to genuinely explode. I can easily see the reality, that Samsung is doing exactly the same thing, but with AMD.

We will get more ARM ports of software, and this is genuine chance for Linux to Explode, which is for me - the most important perspective.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136

This is EXACTLY what I meant when I said what ARM needed to genuinely explode. I can easily see the reality, that Samsung is doing exactly the same thing, but with AMD.

We will get more ARM ports of software, and this is genuine chance for Linux to Explode, which is for me - the most important perspective.
I already read about that, can't say I'm impressed at the reported 4/8 cores for such a huge board when compared to SBC's like Hikey 970 cutting a far smaller footprint.

It also has an optical network port - I can't think many average consumers could make use of such a port.
 

Schmide

Diamond Member
Mar 7, 2002
5,586
718
126
Since this thread seems to be dead. Some final words.

ARM64 is certainly moving in the right direction. We'll have to see if the recipe of today's higher ARM processors will lend themselves to higher end products capable of matching the current desktop.

Ghz? The aggressively large L2 cache with extremely low latency may have trouble at higher frequencies as will the tight pipeline.

Ecosystem. AFAIK there is one socket for ARM, the ThunderX2's, which they list as "Optionally with LGA 4077 socket". How can something grow when you have many players all producing incompatible products? As much as Apple or nVidia produce killer products in their own sphere of influence, coding for that only provides a finite amount of growth. Jetson seems to be big enough to fill a niche market, yet with performance comes price. An $800 SBC to get desktop performance with zero upgrade ability will only tempt a few. (I've thought about it)

With the amount of rancor around the B550 and its generational compatibility, the chance of the above making inroads is slim to none.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
How can something grow when you have many players all producing incompatible products?
Ask Intel and AMD - their sockets are incompatible, and have been so since AMD became an independent vendor rather than a spare supplier for Intel chips.
Jetson seems to be big enough to fill a niche market, yet with performance comes price. An $800 SBC to get desktop performance with zero upgrade ability will only tempt a few. (I've thought about it)
Huawei/HiSilicon did an SBC called Hikey for the K960 and 970 for much cheaper - sadly no 980+ model was produced to give us A76 cores to play with :(
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,013
136

Here's the summary in one picture:

embed.php


It all depends on pricing, of course.
 

gdansk

Platinum Member
Feb 8, 2011
2,078
2,559
136
Here's the summary in one picture:

embed.php


It all depends on pricing, of course.
Power draw of Graviton2? It would be pretty similar in performance per watt if Anandtech's estimate is correct (~110W). But obviously Graviton would lose its efficiency if it were clocked 50% higher (to be competitive).
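A back-of-the-envelope sketch of why efficiency drops with clock, assuming performance scales with frequency and dynamic power scales with V^2 * f (leakage ignored). The voltage increases needed for a 1.5x clock are hypothetical inputs, not measured Graviton2 figures.

```python
def relative_perf_per_watt(freq_scale: float, voltage_scale: float) -> float:
    """Perf/W relative to baseline, assuming performance scales with
    frequency and dynamic power scales with V^2 * f (leakage ignored)."""
    if freq_scale <= 0 or voltage_scale <= 0:
        raise ValueError("scales must be positive")
    power_scale = voltage_scale ** 2 * freq_scale
    return freq_scale / power_scale


# 1.5x clock as in the post; the voltage bumps are hypothetical.
for v in (1.0, 1.1, 1.2, 1.3):
    print(f"voltage x{v:.1f}: perf/W x{relative_perf_per_watt(1.5, v):.2f}")
```

Under these assumptions perf/W only holds steady if the higher clock needs no extra voltage, and it falls with the square of whatever voltage increase is required.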
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
2021 - we expect the new ARMv9 when ARM Matterhorn arrives. This should be a 64-bit evolution of ARMv8 including the 128-bit to 2048-bit SVE2 vector extension (effectively replacing 128-bit NEON). From next year ARM will take the lead in ISA development over x86 with its AVX512.

IMHO the more important thing is the effort to keep the ISA clear and simple. Compare x86 with its extensions:
  1. MMX
  2. SSE
  3. SSE2
  4. SSE3
  5. SSSE3
  6. SSE4.1
  7. SSE4.2
  8. SSE4a
  9. AVX
  10. AVX2
  11. AVX512 (including 17 instruction subsets)
x86 legacy is kind of an iron ball. The beauty of SVE2 is that it's width agnostic (128-bit to 2048-bit), so it can fully replace 128-bit NEON in cheap tiny smartphone cores while allowing expansion to 2048-bit in servers and supercomputers (like Fujitsu A64FX).
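To show what width agnostic means in practice, here is a sketch in plain Python that models the idea: the loop is written once, asks how many lanes the "hardware" has, and uses a per-lane predicate for the tail (similar in spirit to SVE's WHILELT). The function name and the 64-bit element size are my assumptions; this is a model of the programming style, not SVE code.

```python
from typing import List


def vla_add(a: List[float], b: List[float], vector_bits: int) -> List[float]:
    """Add two arrays with one loop that works for any vector width."""
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector_bits must be a multiple of 128 in [128, 2048]")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")

    lanes = vector_bits // 64  # number of 64-bit elements per vector
    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        # Predicate: lane k is active only while i + k < n, so the final
        # partial vector needs no separate scalar tail loop.
        active = [i + k < n for k in range(lanes)]
        for k, on in enumerate(active):
            if on:
                out[i + k] = a[i + k] + b[i + k]
        i += lanes
    return out


xs = [float(v) for v in range(11)]
ys = [10.0 * v for v in range(11)]
# The same function gives the same result at every modeled vector width.
results = {bits: vla_add(xs, ys, bits) for bits in (128, 512, 2048)}
print(results[128] == results[512] == results[2048])
```

The point of the sketch is that nothing in the loop body hard-codes the width, so the same code could run on a 128-bit phone core or a much wider server core.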

It looks like x86 is missing a lot of innovation. Plus there is the old legacy stuff they need to keep in the core even though nobody uses it anymore.... IMHO x86 is outdated in terms of ISA and is losing in comparison to ARM in every way right now.