- Jul 28, 2019
I found an interesting study comparing the impact of the instruction set architecture (ISA) on CPU performance.
It compares these ISAs:
ARMv8
x86-64
Alpha
The results are very interesting and show that the ISA matters. The ISA has less influence on in-order cores and a much bigger influence on out-of-order (OoO) cores. Basically, an OoO core based on ARM needs approximately 20% fewer cycles to perform the same task. At the same clock frequency that works out to roughly 25% higher performance per clock (PPC/IPC), since 1 / 0.8 = 1.25, and, assuming similar power draw, roughly 20% less energy for the same task.
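To put numbers on the cycle claim, here is a small sketch of the arithmetic. The function names and the `power_ratio` parameter are my own, and the fixed-clock, equal-power assumptions are simplifications of mine, not something the study states.

```python
def perf_per_clock_ratio(cycle_reduction: float) -> float:
    """Performance-per-clock ratio when a task needs `cycle_reduction`
    (a fraction, e.g. 0.20 for 20%) fewer cycles than the baseline."""
    if not 0.0 <= cycle_reduction < 1.0:
        raise ValueError("cycle_reduction must be in [0, 1)")
    # Same work done in (1 - reduction) of the cycles.
    return 1.0 / (1.0 - cycle_reduction)


def energy_ratio(cycle_reduction: float, power_ratio: float = 1.0) -> float:
    """Energy for the task relative to the baseline.

    Assumes both cores run at the same clock, so run time scales with
    cycle count. `power_ratio` is the (hypothetical) average power of
    the new core divided by the baseline's; 1.0 means equal power."""
    if not 0.0 <= cycle_reduction < 1.0:
        raise ValueError("cycle_reduction must be in [0, 1)")
    return power_ratio * (1.0 - cycle_reduction)


if __name__ == "__main__":
    reduction = 0.20  # the "approximately 20%" from the post
    print(f"performance per clock: {perf_per_clock_ratio(reduction):.2f}x")
    print(f"energy for the task:   {energy_ratio(reduction):.2f}x")
```

So 20% fewer cycles is a 1.25x gain per clock, not 1.20x, and the energy saving only holds if the faster core does not draw more power.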



Citation:
4.6 Summary of the Findings
This section discusses a summary of the findings based on the aforementioned examples and results.
1. On average, ARMv8 outperforms other ISAs on similar microarchitectures, as it offers better instruction-level parallelism and has lower number of dynamic μ-ops compared to the other ISAs in most of the cases.
2. The average behavior of ISAs can be very different from their behavior for a particular phase of execution, which agrees with Venkat and Tullsen’s findings [20].
3. The performance differences across ISAs are significantly reduced in in-order cores compared to out-of-order cores.
4. On average, x86 has the highest number of dynamic μ-ops. This agrees with previous findings when compared to Alpha [20]. There are few examples where Alpha exceeds x86 in the number of μ-ops, but ARMv8 always has lower or equal number of μ-ops when compared to x86.
5. x86 seems to have over-serialized code due to ISA limitations, such as use of implicit operands and overlap of one of the source and destination registers as observed by [21]. x86 has the highest average degree of use of registers (the average number of instructions, which consume the value generated by a particular instruction).
6. The total number of L1-instruction cache misses is very low across all ISAs for the studied cores. This infers that the sizes of L1 instruction caches that are used are sufficient to eliminate any ISA bottlenecks related to code size for the studied benchmarks.
7. Based on our results, the number of L1-data cache misses are similar across all ISAs in case of in-order cores, but the numbers can vary significantly in case of out-of-order cores.
8. On average, the number of branch mispredictions are very close across ISAs for all cores with few exceptions such as gobmk, qsort and povray.
9. μ-ops to instructions ratio on x86 is usually less than 1.3, as observed by Blem et al [40]. However, the overall instructions count and mixes are ISA-dependent, which contradicts Blem et al’s [40] conclusion that instruction counts do not depend on ISAs.
10. Significant microarchitectural changes affect performance more than an ISA change does on a particular microarchitecture.
11. According to Blem et al’s study [40], performance differences on studied platforms are mainly because of microarchitectures. We see performance differences on exactly similar microarchitectures; which means ISAs are responsible for those performance differences. Moreover, since performance differences across ISAs are different for different microarchitectures, we can conclude that the behavior of ISA depends on microarchitecture as well but they certainly have a particular role in performance.
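
The "degree of use" metric in finding 5 is easier to picture with a toy example. This is only my own sketch of the definition quoted above (the average number of instructions that consume the value produced by an instruction), run on a made-up five-instruction trace with hypothetical register names. It is not the tooling or the data from the study.

```python
from typing import List, Optional, Tuple

# One trace entry: (destination register or None, source registers).
Instr = Tuple[Optional[str], Tuple[str, ...]]


def average_degree_of_use(trace: List[Instr]) -> float:
    """Average number of later instructions that read each produced value.

    A value stops collecting consumers once its register is overwritten."""
    last_writer = {}   # register -> index of the instruction that last wrote it
    consumers = {}     # producer index -> number of instructions reading its value
    for i, (dest, srcs) in enumerate(trace):
        # Sources are read before the destination is written, so an
        # instruction that reads and writes the same register counts
        # as a consumer of the old value.
        for reg in set(srcs):
            producer = last_writer.get(reg)
            if producer is not None:
                consumers[producer] += 1
        if dest is not None:
            last_writer[dest] = i
            consumers[i] = 0
    if not consumers:
        return 0.0
    return sum(consumers.values()) / len(consumers)


# Made-up trace: two loads of constants, then add, sub, mul.
toy_trace: List[Instr] = [
    ("r1", ()),            # r1 = const
    ("r2", ()),            # r2 = const
    ("r3", ("r1", "r2")),  # r3 = r1 + r2
    ("r4", ("r1", "r2")),  # r4 = r1 - r2
    ("r5", ("r3", "r4")),  # r5 = r3 * r4
]

if __name__ == "__main__":
    print(f"average degree of use: {average_degree_of_use(toy_trace):.2f}")
```

Here r1 and r2 are each read twice, r3 and r4 once, and r5 never, so the average is 6 / 5 = 1.2 consumers per produced value.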