Question ARMv8 vs. x86 ISA comparison - is x86 outdated?

Richie Rich

Senior member
Jul 28, 2019
326
157
76
I found an interesting study comparing the impact of instruction set architecture on CPU performance.

It compares these ISAs:
ARMv8
x86-64
Alpha

There are some very interesting results showing that the ISA matters. The ISA has less influence on in-order cores and a much bigger influence on out-of-order cores. Basically, an OoO core based on ARM needs approximately 20% fewer cycles to perform the same task, which means higher performance per clock (PPC/IPC; strictly, 20% fewer cycles works out to about 25% more work per cycle) and, at the same clock and power, roughly 20% less energy for the task.
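To make the arithmetic behind the cycle-count claim concrete, here is a minimal Python sketch. It is not taken from the study; the function names are made up for illustration, and the energy estimate assumes both cores run at the same clock and draw the same average power, which is a simplification.

```python
def speedup_from_cycle_reduction(reduction: float) -> float:
    """Performance-per-clock ratio when a task needs `reduction` fewer cycles.

    reduction is a fraction, e.g. 0.20 for "20% fewer cycles".
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    # Same work done in (1 - reduction) of the cycles.
    return 1.0 / (1.0 - reduction)


def relative_energy(reduction: float) -> float:
    """Energy for the task relative to the baseline.

    Assumption (not from the study): identical clock and identical average
    power, so energy scales directly with the number of cycles.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 - reduction


if __name__ == "__main__":
    r = 0.20
    print(f"performance per clock: {speedup_from_cycle_reduction(r):.2f}x")
    print(f"relative energy:       {relative_energy(r):.2f}")
```

So a 20% cycle reduction is a 1.25x per-clock advantage, not 1.20x; the two numbers only look the same for small reductions.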

Cycle_counts.png

uOP_counts.png

Exec_relative_to_ARM-Haswell.png

Citation:

4.6 Summary of the Findings
This section discusses a summary of the findings based on the aforementioned examples and results.
1. On average, ARMv8 outperforms other ISAs on similar microarchitectures, as it offers better instruction-level parallelism and has lower number of dynamic μ-ops compared to the other ISAs in most of the cases.
2. The average behavior of ISAs can be very different from their behavior for a particular phase of execution, which agrees with Venkat and Tullsen’s findings [20].
3. The performance differences across ISAs are significantly reduced in in-order cores compared to out-of-order cores.
4. On average, x86 has the highest number of dynamic μ-ops. This agrees with previous findings when compared to Alpha [20]. There are few examples where Alpha exceeds x86 in the number of μ-ops, but ARMv8 always has lower or equal number of μ-ops when compared to x86.
5. x86 seems to have over-serialized code due to ISA limitations, such as use of implicit operands and overlap of one of the source and destination registers as observed by [21]. x86 has the highest average degree of use of registers (the average number of instructions, which consume the value generated by a particular instruction).
6. The total number of L1-instruction cache misses is very low across all ISAs for the studied cores. This infers that the sizes of L1 instruction caches that are used are sufficient to eliminate any ISA bottlenecks related to code size for the studied benchmarks.
7. Based on our results, the number of L1-data cache misses are similar across all ISAs in case of in-order cores, but the numbers can vary significantly in case of out-of-order cores.
8. On average, the number of branch mispredictions are very close across ISAs for all cores with few exceptions such as gobmk, qsort and povray.
9. μ-ops to instructions ratio on x86 is usually less than 1.3, as observed by Blem et al [40]. However, the overall instructions count and mixes are ISA-dependent, which contradicts Blem et al’s [40] conclusion that instruction counts do not depend on ISAs.
10. Significant microarchitectural changes affect performance more than an ISA change does on a particular microarchitecture.
11. According to Blem et al’s study [40], performance differences on studied platforms are mainly because of microarchitectures. We see performance differences on exactly similar microarchitectures; which means ISAs are responsible for those performance differences. Moreover, since performance differences across ISAs are different for different microarchitectures, we can conclude that the behavior of ISA depends on microarchitecture as well but they certainly have a particular role in performance
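Point 5 above (over-serialized code when one source register is also the destination) can be illustrated with a toy dataflow model. This is only a sketch under loud assumptions and not the thesis's methodology: unit latency for every instruction, unlimited issue width, perfect register renaming, and no move elimination (real x86 cores often eliminate register-to-register moves, so the model overstates the cost). The register names and the two example programs are hypothetical.

```python
from typing import Dict, List, Tuple

# One instruction = (destination register, tuple of source registers).
Instr = Tuple[str, Tuple[str, ...]]


def critical_path(program: List[Instr]) -> int:
    """Length of the longest true-dependency chain, in unit-latency steps."""
    ready: Dict[str, int] = {}  # register -> step at which its newest value exists
    longest = 0
    for dest, srcs in program:
        # An instruction can start once all of its sources are available.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        finish = start + 1
        ready[dest] = finish
        longest = max(longest, finish)
    return longest


# Goal: t = a + b and u = a * c, with a, b and c all still live afterwards.

# Three-operand form (AArch64-like): the destination is independent of the sources.
THREE_OPERAND: List[Instr] = [
    ("t", ("a", "b")),  # add t, a, b
    ("u", ("a", "c")),  # mul u, a, c
]

# Two-operand destructive form (classic x86-like): copy first, then overwrite.
TWO_OPERAND: List[Instr] = [
    ("t", ("a",)),      # mov t, a
    ("t", ("t", "b")),  # add t, b
    ("u", ("a",)),      # mov u, a
    ("u", ("u", "c")),  # imul u, c
]

if __name__ == "__main__":
    for name, prog in (("three-operand", THREE_OPERAND), ("two-operand", TWO_OPERAND)):
        print(f"{name}: {len(prog)} instructions, critical path {critical_path(prog)}")
```

In this toy case the destructive form needs twice the instructions and a dependency chain twice as long for the same result, which is the kind of effect the finding describes; how much of it survives in a real core depends on the microarchitecture.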
 

Hitman928

Platinum Member
Apr 15, 2012
2,450
1,489
136
Interesting thesis paper but as it mentions, it does contradict some peer reviewed journal publications. One such is:

A. A. Abudaqa, T. M. Al-Kharoubi, M. F. Mudawar and A. Kobilica, "Simulation of ARM and x86 microprocessors using in-order and out-of-order CPU models with Gem5 simulator," 2018 5th International Conference on Electrical and Electronic Engineering (ICEEE), Istanbul, 2018, pp. 317-322.

This paper shows ARM leading in cycle count and efficiency but the x86 ISA largely catches up when going OOO. I can't post all the results because I accessed it through IEEE but if it is published elsewhere for free, you could view all the results. Here's a quote from the conclusion.

The results show that ARM outperforms x86 in the most cases. X86 architecture works much better when the CPU model is out-of-order.
I don't think anyone would argue that x86 carries some extra baggage from being a more "legacy" architecture than ARM. However, how much of an advantage is it when running modern code with modern instruction sets? The difference isn't that large and it comes down more to the actual microarchitectural design and the trade-offs the designers made for what they were trying to do with their design. The real question becomes, is the ARM advantage enough to overcome the foundation of x86 in laptops, desktops, workstations, servers, and even a lot of embedded applications?

The whole no legacy baggage thing is a two edged sword for ARM. ARM was able to dominate the mobile market because it was a new market, everything had to be created from scratch anyway, people didn't expect a lot from the first several generations, and efficiency is king, so ARM fit the bill nicely. Trying to move into deeply entrenched x86 markets where ARM's advantages aren't as important and x86 has decades of foundation to topple is a whole different ballgame. Will ARM get there? Maybe. But Intel and AMD aren't sitting still waiting for it to happen so ARM has an uphill battle to fight on many fronts. I'm sure the debate will continue for years.
 

chrisjames61

Senior member
Dec 31, 2013
563
269
136
Internal combustion engines are outdated, kinetic energy projectiles such as bullets are outdated. Yet nothing has succeeded in supplanting them because they do what they are designed for very well. ARM has a long, long way to go and may never get there due to x86 being so entrenched and it does what it needs to do. No matter how much you flog it on forums.
 

soresu

Golden Member
Dec 19, 2014
1,124
380
136
kinetic energy projectiles such as bullets are outdated. Yet nothing has succeeded in supplanting them because they do what they are designed for very well.
Chemically fired projectile weapons using bullets are simple and provide their propulsive energy within the casing, this is the most significant reason they haven't yet been replaced - it's far cheaper and easier to build simple things and make a huge profit on it.

Maybe in the future if they can ever meet or exceed the volumetric/kinetic energy density of petroleum fuel in an electrochemical battery, or even better a leakage resistant super capacitor design.

Right now the barrier to more advanced weapons and electric aircraft is energy density - it's become good enough now combined with more efficient powertrain designs to accommodate electric automobiles, but the other things are a ways off as yet.

The likes of rail guns using electromagnetically propelled projectiles is hampered by the complexity of designing the eponymous rails that warp when fired, a problem made by the fact that designing rails that are capable of both propelling the projectiles well magnetically, and being strong/durable enough to resist warping are very difficult to manage at the same time, not altogether different from finding materials that act as good thermoelectric converters (materials that conduct converted electricity well tend to conduct heat well too, and thermoelectrics that conduct heat are not very good for converting in the first place).
 

Hi-Fi Man

Senior member
Oct 19, 2013
588
110
106
Interesting thesis paper but as it mentions, it does contradict some peer reviewed journal publications. One such is:

A. A. Abudaqa, T. M. Al-Kharoubi, M. F. Mudawar and A. Kobilica, "Simulation of ARM and x86 microprocessors using in-order and out-of-order CPU models with Gem5 simulator," 2018 5th International Conference on Electrical and Electronic Engineering (ICEEE), Istanbul, 2018, pp. 317-322.

This paper shows ARM leading in cycle count and efficiency but the x86 ISA largely catches up when going OOO. I can't post all the results because I accessed it through IEEE but if it is published elsewhere for free, you could view all the results. Here's a quote from the conclusion.



I don't think anyone would argue that x86 carries some extra baggage from being a more "legacy" architecture than ARM. However, how much of an advantage is it when running modern code with modern instruction sets? The difference isn't that large and it comes down more to the actual microarchitectural design and the trade-offs the designers made for what they were trying to do with their design. The real question becomes, is the ARM advantage enough to overcome the foundation of x86 in laptops, desktops, workstations, servers, and even a lot of embedded applications?

The whole no legacy baggage thing is a two edged sword for ARM. ARM was able to dominate the mobile market because it was a new market, everything had to be created from scratch anyway, people didn't expect a lot from the first several generations, and efficiency is king, so ARM fit the bill nicely. Trying to move into deeply entrenched x86 markets where ARM's advantages aren't as important and x86 has decades of foundation to topple is a whole different ballgame. Will ARM get there? Maybe. But Intel and AMD aren't sitting still waiting for it to happen so ARM has an uphill battle to fight on many fronts. I'm sure the debate will continue for years.
Any comparisons with PowerPC, SPARC, or MIPS?
 

Carfax83

Diamond Member
Nov 1, 2010
5,863
541
126
This paper shows ARM leading in cycle count and efficiency but the x86 ISA largely catches up when going OOO. I can't post all the results because I accessed it through IEEE but if it is published elsewhere for free, you could view all the results. Here's a quote from the conclusion.
There's been rumors for years that Intel is planning a major revision of the x86-64 ISA, with a lot of the legacy stuff removed. Assuming this is true, how feasible is it to do something like that without greatly disturbing backward compatibility?

I mean, what kind of stuff could they outright remove and get away with?
 

chrisjames61

Senior member
Dec 31, 2013
563
269
136
Chemically fired projectile weapons using bullets are simple and provide their propulsive energy within the casing, this is the most significant reason they haven't yet been replaced - it's far cheaper and easier to build simple things and make a huge profit on it.

Maybe in the future if they can ever meet or exceed the volumetric/kinetic energy density of petroleum fuel in an electrochemical battery, or even better a leakage resistant super capacitor design.

Right now the barrier to more advanced weapons and electric aircraft is energy density - it's become good enough now combined with more efficient powertrain designs to accommodate electric automobiles, but the other things are a ways off as yet.

The likes of rail guns using electromagnetically propelled projectiles is hampered by the complexity of designing the eponymous rails that warp when fired, a problem made by the fact that designing rails that are capable of both propelling the projectiles well magnetically, and being strong/durable enough to resist warping are very difficult to manage at the same time, not altogether different from finding materials that act as good thermoelectric converters (materials that conduct converted electricity well tend to conduct heat well too, and thermoelectrics that conduct heat are not very good for converting in the first place).
That is what I was saying. As of now a primitive technology that really hasn't changed in a few hundred years is still the best option.
 

AmericanLocomotive

Junior Member
Apr 30, 2020
16
21
36
There's been rumors for years that Intel is planning a major revision of the x86-64 ISA, with a lot of the legacy stuff removed. Assuming this is true, how feasible is it to do something like that without greatly disturbing backward compatibility?

I mean, what kind of stuff could they outright remove and get away with?
There's all kinds of weird legacy stuff going on in x86 processors. The newest bleeding edge x86-64 CPUs still have "8086 VME" extensions introduced in the original Pentium 1! VME was actually broken in the 1st gen Ryzen processors, but was later fixed.

But more than that, modern x86-64 CPUs still fully support 16-bit "Real Mode" introduced with the original 8086 back in 1978. I wonder how much engineering effort is wasted ensuring that the latest CPUs can still accurately run code that's 40 years old? I like backwards compatibility just as much as the next person, but at some point you just gotta pull the plug and start cleaning up the architecture.
 

zir_blazer

Senior member
Jun 6, 2013
916
65
91
There's been rumors for years that Intel is planning a major revision of the x86-64 ISA, with a lot of the legacy stuff removed. Assuming this is true, how feasible is it to do something like that without greatly disturbing backward compatibility?

I mean, what kind of stuff could they outright remove and get away with?
That is a risky move, because you can be sure that A LOT of edge cases would start popping up in the wild if backwards compatibility is outright removed. Remember that there is virtualization too, and that from that point of view, remaining backwards compatible makes more sense now than it did before.

Besides, not being backwards compatible isn't a good choice to begin with. Ever heard about the Intel 80376? If you never did, it is for a reason. The history of the evolution of the IBM PC and x86 Processors is completely intertwined with making new Hardware remain broken enough so that popular Software that did things in ways that it wasn't intended to would keep working. For example, this.


There's all kinds of weird legacy stuff going on in x86 processors. The newest bleeding edge x86-64 CPUs still have "8086 VME" extensions introduced in the original Pentium 1! VME was actually broken in the 1st gen Ryzen processors, but was later fixed.

But more than that, modern x86-64 CPUs still fully support 16-bit "Real Mode" introduced with the original 8086 back in 1978. I wonder how much engineering effort is wasted ensuring that the latest CPUs can still accurately run code that's 40 years old? I like backwards compatibility just as much as the next person, but at some point you just gotta pull the plug and start cleaning up the architecture.
The Pentium VME is a minor thing in comparison to the impact that the earlier A20 Gate had. You can go and read the OS/2 Museum's multitude of articles about the A20 Gate. It became particularly chaotic when Processors began to include Cache.
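For anyone who hasn't run into the A20 Gate before, the behaviour it preserves fits in a few lines. A minimal Python sketch of real-mode address formation (the function name is made up for illustration; it models only the address arithmetic, not caches or any real chipset):

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Physical address for a 16-bit real-mode segment:offset pair."""
    # Real mode: physical = segment * 16 + offset, both values 16 bits wide.
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        # With the gate closed, address line 20 is forced low, so addresses
        # just above 1 MiB wrap around to the bottom of memory, as they did
        # on the 8086 with its 20 address lines.
        linear &= ~(1 << 20)
    return linear


if __name__ == "__main__":
    for a20 in (False, True):
        addr = real_mode_address(0xFFFF, 0x0010, a20_enabled=a20)
        print(f"FFFF:0010 with A20 {'on' if a20 else 'off'}: {addr:#08x}")
```

FFFF:0010 lands at 0x000000 with the gate closed and at 0x100000 with it open. Software that relied on the wraparound is why the gate had to exist once CPUs grew more than 20 address lines.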



I think that as open source Software that uses portable libraries gains ground, it will become more feasible for ISAs to be interchangeable, since you may get away with recompilation or simple ports. If you're a Windows user and you use tons of proprietary Software, there is no other option for you but x86, unless your performance needs make emulation somehow viable. Compared to that, a Linux-only user wouldn't really care whether he is using x86 or a PowerPC like the Talos II, or ARM, or whatever.
 

Carfax83

Diamond Member
Nov 1, 2010
5,863
541
126
But more than that, modern x86-64 CPUs still fully support 16-bit "Real Mode" introduced with the original 8086 back in 1978. I wonder how much engineering effort is wasted ensuring that the latest CPUs can still accurately run code that's 40 years old? I like backwards compatibility just as much as the next person, but at some point you just gotta pull the plug and start cleaning up the architecture.
That's exactly my point. If ARM has any advantage over x86, it's definitely that. Intel and AMD both need to overhaul the x86-64 ISA and make it much cleaner, but if this is going to happen, Intel will have to collaborate with AMD; something they're not known for doing very well.

The rumor I was talking about came out in 2016, and it stipulated that Tiger Lake will be the last x86-64 based microarchitecture as we've known it. I guess Golden Cove, which is the next core architecture will supposedly sacrifice some backwards compatibility and legacy overhead for increased performance and power efficiency.
 

Richie Rich

Senior member
Jul 28, 2019
326
157
76
This paper shows ARM leading in cycle count and efficiency but the x86 ISA largely catches up when going OOO. I can't post all the results because I accessed it through IEEE but if it is published elsewhere for free, you could view all the results. Here's a quote from the conclusion.
Interesting that the other study shows the ISAs becoming equal with an OoO engine, because this study shows exactly the opposite. The thing is that there is only one super-wide engine: the famous 6xALU Apple core, and it is based on ARM. So I tend to believe that the physical evidence of such a wide scalar core supports this study.

I don't think anyone would argue that x86 carries some extra baggage from being a more "legacy" architecture than ARM. However, how much of an advantage is it when running modern code with modern instruction sets? The difference isn't that large and it comes down more to the actual microarchitectural design and the trade-offs the designers made for what they were trying to do with their design. The real question becomes, is the ARM advantage enough to overcome the foundation of x86 in laptops, desktops, workstations, servers, and even a lot of embedded applications?
I don't want to turn this into a thread about the x86 vs. ARM war and who is going to win, because that's about economics. I'd like to keep it technical here. Later we can discuss ARMv9 pros and cons.

IMHO there are no huge restrictions under the hood of an x86 CPU. x86 uses a CISC/RISC abstraction layer, so the difference in register count, for example, is overcome (ARM has 31x 64-bit registers while x86-64 has only 16) because internally it uses bigger resources. But here comes the problem: this abstraction layer for dealing with old bottlenecks costs transistors and power, and also produces inefficient code (the compiler doesn't see the bigger resources of a modern CPU and must produce code that fits a resource-poor i386 or x86-64 Atom core). However, the most important question is how big the penalty is. If it is small, then who cares, and you get all that great compatibility. But this penalty increases over time, so while K8 and C2D were doing great, Ice Lake might start to struggle.
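The "internally it uses bigger resources" part is register renaming. A minimal Python sketch of the idea, with loud assumptions: the `Renamer` class, the register names and the size of the physical register file are all hypothetical, and physical registers are never freed here (a real core releases them when instructions retire).

```python
from typing import Dict, List, Tuple


class Renamer:
    """Toy rename stage: maps architectural registers onto a larger physical file."""

    def __init__(self, arch_regs: List[str], num_physical: int) -> None:
        if num_physical < len(arch_regs):
            raise ValueError("need at least one physical register per architectural register")
        # Initially each architectural register owns one physical register.
        self.mapping: Dict[str, int] = {r: i for i, r in enumerate(arch_regs)}
        self.free: List[int] = list(range(len(arch_regs), num_physical))

    def rename(self, dest: str, srcs: List[str]) -> Tuple[int, List[int]]:
        """Return (new physical destination, physical sources) for one instruction."""
        # Sources read whichever physical register currently holds each value.
        phys_srcs = [self.mapping[s] for s in srcs]
        if not self.free:
            raise RuntimeError("out of physical registers: the front end would stall")
        # Every write gets a fresh physical register, so reusing an
        # architectural name does not create a false dependency.
        new_dest = self.free.pop(0)
        self.mapping[dest] = new_dest
        return new_dest, phys_srcs


if __name__ == "__main__":
    renamer = Renamer(["rax", "rbx", "rcx"], num_physical=8)
    first = renamer.rename("rax", ["rbx"])   # e.g. mov rax, rbx
    second = renamer.rename("rax", ["rcx"])  # e.g. mov rax, rcx (independent of the first)
    print("first write to rax  ->", first)
    print("second write to rax ->", second)
```

The two writes to `rax` end up in different physical registers, so the hardware can run them in parallel. What renaming cannot undo is the extra spill and reload code the compiler had to emit because it only had 16 names to work with, which is the penalty described above.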

This leads me to the reason why we don't see a super-wide 6xALU x86 design similar to Apple's (and if Matterhorn is suggested to have the IPC of the A12, then it will probably have 6 ALUs too). An x86 core with 6 ALUs is possible, but it would struggle much more than Ice Lake, so it would need more time to develop. That's probably why Intel and AMD focus much more on FPU/SIMD performance, because new ISA extensions like AVX are modern and bottleneck-free. That's why I'm starting to believe that the Zen3 performance rumors make sense (+17% int IPC, +50% FPU).

But more than that, modern x86-64 CPUs still fully support 16-bit "Real Mode" introduced with the original 8086 back in 1978. I wonder how much engineering effort is wasted ensuring that the latest CPUs can still accurately run code that's 40 years old? I like backwards compatibility just as much as the next person, but at some point you just gotta pull the plug and start cleaning up the architecture.
That's exactly what I have in mind. AMD buried the 3DNow! extension for the same reason too.
 

Richie Rich

Senior member
Jul 28, 2019
326
157
76
That's exactly my point. If ARM has any advantage over x86, it's definitely that. Intel and AMD both need to overhaul the x86-64 ISA and make it much cleaner, but if this is going to happen, Intel will have to collaborate with AMD; something they're not known for doing very well.

The rumor I was talking about came out in 2016, and it stipulated that Tiger Lake will be the last x86-64 based microarchitecture as we've known it. I guess Golden Cove, which is the next core architecture will supposedly sacrifice some backwards compatibility and legacy overhead for increased performance and power efficiency.
But wouldn't a new ISA change be announced a few years ahead, to allow discussion and let other x86 contenders change their development plans accordingly? Something like what ARM Holdings does with AArch64 and SVE. If not, the whole x86 ecosystem would be hurt by disunity.
 

NTMBK

Diamond Member
Nov 14, 2011
8,643
1,568
126
But wouldn't a new ISA change be announced a few years ahead, to allow discussion and let other x86 contenders change their development plans accordingly? Something like what ARM Holdings does with AArch64 and SVE. If not, the whole x86 ecosystem would be hurt by disunity.
That doesn't sound like Intel. Remember Itanium? They wanted to move to a new ISA that only they had the rights to use, and let x86 die off. It was only AMD's great Opteron processors (and the awful Intel chips of the era) that made AMD64 the success that it is.
 

soresu

Golden Member
Dec 19, 2014
1,124
380
136
To be frank, actually Intel's current position makes perfect opportunity for ARM to gain some traction.

A lot of traction.
Yes and no - the US government currently playing sanction piñata with China is making those prospects a bit iffy in the near term, though this may change if the administration changes hands in November.

Indeed it would be quite interesting if AMD were the ones to finally make a stamp on the x86 world again for the first time since AMD64 - SSE5/XOP sadly came with a less than appealing uArch to promote it and thus fell flat, to say nothing of Intel playing tricks with the FMA spec.
 

NostaSeronx

Platinum Member
Sep 18, 2011
2,976
565
126
sadly came with a less than appealing uArch to promote it and thus fell flat
Monolithic execution unit growth is exponential growth for power and decay for speed. However, a clustered execution unit growth is linear growth for power and decay for speed.

The least appealing part of the Family 15h is that it wasn't advanced enough in the CMT-development tree.

Suppose a Cortex-A85 existed and was a CMTx or derived module-esque microarchitecture, using the more modern form of clustered microarchitecture in which, in Windows terms, a single logical core can utilize all physical core resources in that module.

With that out of the way if Cortex-A85 was CMTx, then no current AMD or Intel design could compete. No design can compete with modern CMT designs that can scale up to 128 function units, while the modern SMT is stuck at a power/speed wall at 8 function units.
 

NostaSeronx

Platinum Member
Sep 18, 2011
2,976
565
126
Mate, forget about CMT - like fetch it just ain't happening.
Nah, CMT is the natural superset of SMT. It has all the capabilities of SMT, one thread can use all/one thread can use some/etc/plus more. Thus, most architectures that need speed and power efficiency will be clustered multithreaded, like POWER9. It is much better to skip SMT for CMT as it has every single feature of SMT.

fyi, P9 isn't perfect either
=> A single thread spans only one half of the SMT8-core resources allowing for the other half of the core resources to be power-gated when operating in single-thread mode
=> The POWER9 architecture was conceived to alternatively support a single thread executing across all eight slices of a SMT8 core; however, this was not implemented in the final design. In this alternate design option, each SMT4-core-resource behaves as an execution cluster with additional latency incurred when dependent instructions are split between the two clusters. Performance and power analysis showed that optimal performance would be achieved for a large number of applications by instead limiting execution of a single thread to only one of the SMT4-core-resources per SMT8 core and boosting the frequency. This approach also removes the complexity associated with managing instruction routing between the execution clusters, such as choosing between utilizing both clusters to optimize for high instruction level parallelism versus utilizing a single cluster to optimize for code that is sensitive to instruction latency.
^== they don't even allow a single thread to utilize the opposite cluster as a 2x 128-bit SIMD unit in single-threaded mode, sigh.

I think ARM would implement more in line with AMD's family 15h.
Two or more Integer/branch clusters + one FPU/NEON+SVE2 FPU cluster.
(Two Int clusters(2x4) + One Branch cluster(1x2) + One FPU cluster(1x2) being the ideal first implementation/ with the second having two FPU clusters)
 

Richie Rich

Senior member
Jul 28, 2019
326
157
76
That doesn't sound like Intel. Remember Itanium? They wanted to move to a new ISA that only they had the rights to use, and let x86 die off. It was only AMD's great Opteron processors (and the awful Intel chips of the era) that made AMD64 the success that it is.
I remember :D That's why I'm worried about the rise of ARM. Those guys cooperate on a common target - moving performance forward - while ARM Holdings acts as something like a wise arbiter. As ARM Holdings doesn't produce any silicon, it is not in competition with Apple, Nuvia or other core designers.

But Intel is in direct and hard competition with AMD. So regarding the ISA, instead of collaborating they try to screw each other. As a result, after 7 years AVX512 has not been fully adopted (all 17 subsets) on any CPU, even Intel's, and AMD has refused to adopt it to this day. That's why I'm worried about x86's flexibility and ability to fight against fast-moving ARM.
 
Mar 11, 2004
19,874
2,140
126
While they could probably get away with removing some older stuff, I doubt that makes much difference and quite a bit of it likely plays a role in more modern things (although I'm not a chip architect so perhaps not).

My guess is, next x86 change happens when they move to like a system on chip level version of what Intel had theorized would happen in like the mid 2010s, and was similar to what Cell did. It'd keep the general purpose core with specialized processors around it, so you can maintain some compatibility backwards but reap lots of gains in the specialized hardware. That or maybe adding a chunk of programmable gates to adjust performance for different use cases. I actually think that might be an idea behind some of the SMT testing that was rumored, where I think they could potentially have chips that can kinda flip between parallelism and longer pipeline. In essence the "reverse hyperthreading" thing that was rumored some time back. Where it'd be kinda like the 32/64bit where they can achieve some significant improvement but need to keep compatibility somewhat. Over time that'll shrink smaller and smaller and will end up some BIOS option to enable legacy mode or something.

Oh, forgot another potential option, when they start looking at replacing transistors with light/lasers.

Ok, that's my blather for this thread. D'oh, forgot to add my HBM spiel to the chorus of the ARM/SMT4 guy and Nosta. Uh, they make HBM part of the architecture and so it requires rewriting as then systems become self contained, the memory gets integrated into the chip such that it changes programming. Let's say 1GB per core or thread?
 

Eug

Lifer
Mar 11, 2000
22,653
261
126
That article aside, how much advantage/disadvantage are we talking about here overall, in terms of ISA? Will that even matter that much in the big picture given the bazillion other variables involved?

Internal combustion engines are outdated, kinetic energy projectiles such as bullets are outdated. Yet nothing has succeeded in supplanting them because they do what they are designed for very well. ARM has a long, long way to go and may never get there due to x86 being so entrenched and it does what it needs to do. No matter how much you flog it on forums.
Off topic, but this is actually the basis of a Stargate episode. Aliens decide to consult humans for help on how to defeat an enemy. The enemy's technology is impervious to the aliens' advanced energy weapons (which are centuries ahead of anything humans have), but can be severely damaged with human projectile weapons.

They are not looking to get human guns. Instead, they consulted the humans because the aliens felt they were so advanced now that they were not stupid enough to come up with such primitive ideas that can actually still work.
 
