Arachnotronic
You can't really benchmark an ISA independent of an implementation, so this discussion is pretty moot, IMO.
I've never met a group of people more opposed to giving credit where credit is due, and admitting that I'm right.

Because the scenario you outlined earlier is just as absurd. Unless you have gobs of money at your disposal, what you're proposing won't be possible, and then you're left with discussing theory. And in theory there is no difference between theory and practice. In practice, there is.
I've asserted that ARM, particularly its 32-bit variety, is a better ISA than x86, and that RISC-V is better than either, if we were to ignore compatibility as a factor. I suppose I should limit the scope of this claim to pertain to modern, commonly used consumer software only.
It seems ridiculous to assert that no ISA can be better than another -- that would mean no ISA has ever improved on another, and that every ISA after the very first was redundant.
What exactly does his post add? Speaking of being rude, you certainly were when you related the scenario I posited to searching for a mythological animal.

I think your perceptions are a bit off. For example, I see no instance where dmens was being rude. As for being right, please see Arachnotronic's post right above yours.
I've been continually told no, but no one has bothered delving into the subject. Everyone's simply tiptoed around it.
That's not what I was really arguing, though, and I agree that it's not likely.

I don't think you're ever going to get a scenario where you can really say one ISA is objectively better than another in all possible cases.
I updated my statement earlier, in regards to RISC-V, with the caveat that it is superior for typical consumer workloads. I, of course, don't actually know this for certain.

You've said several times that RISC-V is objectively better than ARMv8. But RISC-V is missing a lot of operations that ARMv8 has, most of which are simple and have a low cost of implementation. I can guarantee you that there are algorithms that will benefit tremendously from having access to ARMv8's instructions that are missing on RISC-V. There are also algorithms that benefit from instruction idioms that are only on x86.
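To make that concrete with one small, hedged example (assuming a compiler targeting base RV64I/RV64GC without the Zbb bit-manipulation extension; the function name is just mine for illustration): a variable rotate is a single instruction on both x86-64 (ROR) and AArch64 (RORV), but the base RISC-V ISA has no rotate at all, so the compiler has to spell it out with two shifts and an OR.

Code:
#include <stdint.h>

/* Rotate right by a variable amount.
 * AArch64: typically one RORV instruction; x86-64: one ROR.
 * Base RV64I (no Zbb): two shifts plus an OR, because the base
 * ISA has no rotate instruction. */
uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;                                  /* keep shift amount in range */
    return (x >> n) | (x << ((64 - n) & 63)); /* well-defined even for n == 0 */
}

Whether the extra instructions matter in practice is exactly what the benchmarking argument in this thread is about; Zbb closes this particular gap, but only on cores that implement it.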
While some design decisions will be worse than others, pretty much everything has some kind of tradeoff. RISC-V is designed to be extensible so that it can be used for the next 50 years without breaking compatibility. And it's designed to be a good fit for the smallest microcontrollers all the way up to big workstations. Those two constraints have a tangible cost that may not apply to ARM, who has different ISAs for microcontrollers vs. big 64-bit machines, and who probably expects replacing the ISA over the next 50 years to be a surmountable problem (as evidenced by the fact that they did it recently).

So would it not be possible to evaluate those tradeoffs through benchmarking? I've given a pretty rough outline of how I think one would go about doing so, which I'll restate:
E.g., take however many iterations of MIPS there have been (or fewer, as desired), do the same for ARM (A-series, R-series, or M-series, whichever is most appropriate), and compare. These would often share the same process, from the same foundry, and have had similar design goals and target markets. For the process, you would either compare one product from each family on each shared node, doing this for several node/product combinations and comparing the calculated means, or normalize the results with a simulated impact of the process used (process parameters are typically pretty easy to come by).
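For what it's worth, here is a minimal sketch of what that "normalize for process" step could look like, assuming you already trust a single per-node scaling factor; every number and name in it is made up for illustration, not taken from this thread.

Code:
#include <math.h>
#include <stdio.h>

/* Hypothetical per-benchmark scores for two cores built on different
 * nodes. process_factor_b is an assumed speedup attributable to core
 * B's newer node alone; dividing it out leaves a rough
 * architecture-plus-ISA comparison. */
struct result { const char *name; double score_a; double score_b; };

static const struct result results[] = {
    { "bench1", 100.0,  118.0 },   /* invented numbers */
    { "bench2",  42.0,   39.5 },
    { "bench3", 910.0, 1010.0 },
};

int main(void)
{
    const double process_factor_b = 1.15;  /* assumed node advantage of B */
    const int n = (int)(sizeof results / sizeof results[0]);
    double log_sum = 0.0;

    for (int i = 0; i < n; i++) {
        double ratio = (results[i].score_b / process_factor_b) / results[i].score_a;
        printf("%s: normalized B/A = %.3f\n", results[i].name, ratio);
        log_sum += log(ratio);
    }
    /* The geometric mean is the usual way to aggregate performance ratios. */
    printf("geomean normalized B/A = %.3f\n", exp(log_sum / n));
    return 0;
}

The bookkeeping is trivial; the whole argument in this thread is about whether that single process_factor_b (and the "equal budgets, equal engineers" assumptions) can ever be justified.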
If you're not going to listen, please refrain from commenting further.

What you proposed is the same mythological scenario.
III-V said:
take however many iterations of MIPS there have been (or fewer, as desired), do the same for ARM (A-series, R-series, or M-series, whichever is most appropriate), and compare. These would often share the same process, from the same foundry, and have had similar design goals and target markets. For the process, you would either compare one product from each family on each shared node, doing this for several node/product combinations and comparing the calculated means, or normalize the results with a simulated impact of the process used (process parameters are typically pretty easy to come by).

As with most/all surveys, the larger the sample size, the clearer and more reliable your findings. For ARM, it'd be most useful to stick to vanilla implementations, seeing as there are about two decades' worth of iterations.

Here's what you proposed first:

III-V said:
Say you have 100 programs, compiled for a processor utilizing ARMv8, and compare to a processor utilizing x86-64. Assume the target workloads for these processors are identical. Also assume their development budgets were identical. Assume they use the same manufacturing process. The compilers used were created with an equal development investment. Assume all personnel involved in the creation of the software and hardware mentioned are equally competent.
Measure the performance of those 100 programs, and compare between the two processors.
Not possible. Which ARMv8 processor should we use? Tegra K1 Denver? Snapdragon 810? Snapdragon 410? Apple A7? Which x64? Piledriver? Broadwell? Silvermont? And you want to assume the compilers are equivalent? They're not all that equivalent for x64 (icc vs. gcc) let alone between different ISAs.
I'm not going to do this, nor have I given any indication that I would. All of this has been conjecture.

This is feasible. Please report back with your results.
I don't believe I or anyone else has argued otherwise.

Back in the real world, any advantage in power or transistor count is more than compensated by the difference in node and other architectural decisions. For example, A15 (at least the earlier versions of the core) was much less efficient than other ARM cores. I'd bet if ISA is neglected, Silvermont is still more efficient than Krait at 28nm.
I actually feel the inefficient, first-gen A15 daily. My Nexus 10, from the end of 2012, has a Samsung SoC: two A15 cores at 32nm with a T604 GPU. Even though it has a much bigger battery than my LG G2, the SoC is drawing at least twice the juice in the Nexus 10.

Back in the real world, any advantage in power or transistor count is more than compensated by the difference in node and other architectural decisions. For example, A15 (at least the earlier versions of the core) was much less efficient than other ARM cores. I'd bet if ISA is neglected, Silvermont is still more efficient than Krait at 28nm.
I thought that the A15 was designed to be a server chip, or at least a proof that the ARM ISA could work there. Wasn't big.LITTLE introduced after Samsung made the Exynos 5 Dual and it became obvious that power consumption was too high for mobile without a more efficient core to back it up?
Big.Little was first used in the Tegra 3. It was an idea from ARM (so they could design many cores with a weak dynamic range with their tiny budget).
I think you're very right here about power consumption.
The A15 supposedly had 2x the processing power of one A9 core (2 A15 cores = 4 A9 cores).
But after the Exynos 5 Dual (the Nexus 10 SoC), Samsung kinda found out that a dual-core A15 SoC with a T604 and a 10" screen at 2560x1600 would not be power efficient. However, when you look at the Snapdragon 800, it has 4 cores that are stronger than 4 A15s, while being MUCH more efficient. My LG G2 has *2x the battery life of my Nexus 10*.
* 1080p screen, 4G LTE.
Seems to me that the A15 was a core that was rushed. It was not ready in 2012.
Tegra 3 didn't use big.LITTLE. NVIDIA's solution uses the external PL310 (L2) cache controller for both the 4-core cluster and the low-power single core. ARM big.LITTLE works by having two clusters, each with its own L2 cache. This also means ARM needs a specific, more or less complex interconnect to handle coherent requests across clusters, something NVIDIA didn't need for T3.

Big.Little was first used in the Tegra 3. It was an idea from ARM (so they could design many cores with a weak dynamic range with their tiny budget).
Indeed, and later revisions of Cortex-A15 have improved power efficiency.

Seems to me that the A15 was a core that was rushed. It was not ready in 2012.
Might be. But in a device with a 5-watt battery, a quad-core SoC that is stronger than a dual-core A15 SoC and uses less power is always preferred.

A15 is stronger clock for clock than Krait (S800).
I'm afraid this is not correct. The instruction set CISC/RISC complexity argument comes up every single time but it's been irrelevant for over two decades.
The big winner, strangely karmic, is that the x86 instruction set is so freaking compact that it winds up being a benefit for modern-day cpu designs rather than a detractor, due to stalls being primarily a problem of cache misses in modern designs.
You can't measure density by just measuring average instruction size; you also have to take into account the number of instructions required to do the job. Last time I measured that on S2K gcc, I got about a 10% advantage for x86-64 over ARMv8 AArch64 (IIRC the average instruction size for x86-64 was 3.5 bytes, the same as your figure for pre-x86-64). That was very early in ARMv8's life, so compilers might have improved.

Only x86 isn't really "so freaking compact." Average instruction size was around 3.5 bytes way back before x86-64 and before SSE2. Both have increased average instruction size, although they also increased average work/instruction. But starting with 3.5 bytes isn't really very impressive for a highly variable-width instruction set.
These days the code density is probably somewhat better than ARMv8 and somewhat worse than Thumb-2.
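The arithmetic behind that kind of density comparison is just total bytes = instruction count x average size. Here is a tiny sketch with invented numbers (not measurements from this thread) showing how a variable-width ISA with a higher per-instruction average can still produce a smaller binary if it needs fewer instructions for the same job.

Code:
#include <stdio.h>

/* Code density depends on instruction count as well as average
 * instruction size. All figures below are invented for illustration. */
int main(void)
{
    double x86_avg_bytes = 3.8, x86_insns = 0.95e6;  /* hypothetical x86-64 build */
    double a64_avg_bytes = 4.0, a64_insns = 1.00e6;  /* AArch64 is fixed 4-byte   */

    double x86_total = x86_avg_bytes * x86_insns;
    double a64_total = a64_avg_bytes * a64_insns;

    printf("x86-64 : %.0f bytes\n", x86_total);
    printf("AArch64: %.0f bytes\n", a64_total);
    printf("x86-64 smaller by %.1f%%\n", 100.0 * (1.0 - x86_total / a64_total));
    return 0;
}

Change either the instruction counts or the average sizes and the conclusion flips, which is why measurements on real compiled code (like the S2K numbers above) are the only ones worth arguing about.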
Here is another quote: http://seekingalpha.com/article/1296521-intel-48-per-share-in-4-years#comment-16749321
Here's a comment (and the next two from him as well) about the topic that might be worth checking out.
TL;DR:
Utterly wrong. That guy is an obvious Intel fanboy with little knowledge of the competition.

So much of an advantage that ARM was forced to admit to the problem and have introduced a short-form instruction in their 64-bit architecture to try to address it.