The RISC Advantage

witeken · Dec 16, 2014

III-V said:
Then call it ARMv8 vs. x86-64, which is extremely relevant. Don't dismiss the subject entirely over a semantic issue.

Actually, if I remember correctly, Atom uses/used a few nice x86 tricks that helped quite a bit, so maybe x86 is actually better.

ThatBuzzkiller · Dec 16, 2014

III-V said:
Then call it ARMv8 vs. x86-64, which is extremely relevant. Don't dismiss the subject entirely over a semantic issue.

That's not the point of this thread ...

videogames101 · Dec 16, 2014

dmens said:
Unless your code is just meandering around and not doing anything meaningful, yes there is a bottleneck somewhere for every workload. Doesn't have to be on the CPU either.

AFAIK that can result from two things: a bad compiler, or uncommon user code that relies heavily on microcode flows that nobody cares about optimizing for. The decode stage is mostly a fixed pipeline, it is very hard for it to be a bottleneck. Unless you consider the decode width to be a bottleneck, in which case the entire machine must grow to accommodate decode width growth.

Heavily interpreted code can result in a simpler (and hence smaller) back-end implementation. Of course it's not a rule and I have no absolutely data to quote. It is just a personal observation.

What level of code are you referring to? (heavily interpreted assembly?) And what do you consider "back-end" for a cpu? (I always think of BEOL metal when someone says back-end...)

dmens · Dec 16, 2014

videogames101 said:
What level of code are you referring to, and what do you consider "back-end"? (I always think of BEOL metal when someone says back-end)

Sorry, let me clarify. The proprietary code that the decoder spits out after the front-end is done with it is what I meant by interpreted code. The back-end is the rest of the machine other than the decoder, which consumes that proprietary code and performs computations, memory accesses, etc.

The silver lining of x86 in having a complicated decode stage is that it allows engineers to change the definition of interpreted code as they see fit. That off-loads the complexity into the decode stage and can simplify the back-end stage of the machine.

Again, just personal observations.

III-V · Dec 16, 2014

witeken said:
http://www.anandtech.com/show/6529/busting-the-x86-power-myth-indepth-clover-trail-power-analysis

The 5% advantage get easily lost in the noise of all other factors that make up a CPU's speed and power.

The "5%" bit you are quoting is outside of the "on-paper" advantage that I am referring to. These are real world tests, subject to compiler optimizations, different processes, different budgets, different devices with different cooling capabilities, and so on and so forth. It is impossible to determine a possible advantage one way or the other, unless the difference was extremely noticeable. Thus, it is important that if we're going to talk about what ISA/uarch is superior, we have to keep it in the sterile vacuum.

He is not lying because it simply isn't a big deal: it would be different if the difference were 2X. BTW, the specific question was if there's anything with ARM that makes it do so well in the mobile space.

It's not as big of a deal as other factors, but I would not say that it is not a big deal. I would argue that ARM would look pretty damn awful right now if ARMv8 weren't benefiting from being a superior ISA/uarch, but I can't back that up, as there are too many factors that would prevent such an elucidation.

All this ISA talk stands in contrast to the architectural decisions, which do significantly impact performance, and process technology, which impacts power consumption dramatically, more than any ISA ever could.

Sure, but it's important to keep in mind. A lot of the architectural decisions you make are reliant on the instruction set.

III-V · Dec 16, 2014

dmens said:
Unless your code is just meandering around and not doing anything meaningful, yes there is a bottleneck somewhere for every workload. Doesn't have to be on the CPU either.

I am not sure why you are stating this -- doesn't my comment already suggest this?

Unless you consider the decode width to be a bottleneck, in which case the entire machine must grow to accommodate decode width growth.

Right, and that's what I was talking about. It's important to consider that the back ends have gotten much wider for Intel's flagship arch, while the decode width has not changed in quite some time. We are probably not terribly far off from the point where increasing the decode width would be a wise decision, given this -- perhaps another tock or two.

witeken · Dec 16, 2014

III-V said:
Sure, but it's important to keep in mind. A lot of the architectural decisions you make are reliant on the instruction set.

Don't forget that ISAs can be updated. Can ARMv8 compete against TSX, AVX2?

III-V · Dec 16, 2014

witeken said:
Don't forget that ISAs can be updated. Can ARMv8 compete against TSX, AVX2?

Certainly yes, as those two instruction sets are quite niche.

dmens · Dec 16, 2014

III-V said:
while the decode width has not changed in quite some time. We are probably not terribly far off from the point where increasing the decode width would be a wise decision, given this -- perhaps another tock or two.

Increasing decode width is pointless unless you increase the width of the entire machine, and it is immensely difficult to increase machine width across the board. Then there is justifying the inevitable frequency hit and lengthy power studies to see if all the work was all worth it.

This is not unique to x86, I'm sure ARM has to deal with the same trade-offs. Decode width is not an argument in favor of or against either ISA.

III-V · Dec 16, 2014

dmens said:
Increasing decode width is pointless unless you increase the width of the entire machine, and it is immensely difficult to increase machine width across the board. Then there is justifying the inevitable frequency hit and lengthy power studies to see if all the work was all worth it.

This is not unique to x86, I'm sure ARM has to deal with the same trade-offs. Decode width is not an argument in favor of or against either ISA.

Where have I given you any indication that I don't understand this?

witeken · Dec 16, 2014

III-V said:
Certainly yes, as those two instruction sets are quite niche.

I thought this was a discussion in a vacuum?

dmens · Dec 16, 2014

III-V said:
Where have I given you any indication that I don't understand this?

If you know it's not relevant, why bring it up?

krumme · Dec 16, 2014

There is quad arm a7 in loads of midrange phones today. That will be 2015 lowend and new midrange will be another in order quad a53. Many cores. Small cores. A7 is 0.5mm2 including L1 on 28nm. Decoding area means a lot here.

And try and imagine how an x86 on 0.5mm2 28nm would perform?
Johans points have never been more important as arm actually play a role today and ipc seems to slow down and more cores are used.

videogames101 · Dec 16, 2014

dmens said:
Sorry, let me clarify. The proprietary code that the decoder spits out after the front-end is done with it is what I meant by interpreted code. The back-end is the rest of the machine other than the decoder, which consumes that proprietary code and performs computations, memory accesses, etc.

The silver lining of x86 in having a complicated decode stage is that it allows engineers to change the definition of interpreted code as they see fit. That off-loads the complexity into the decode stage and can simplify the back-end stage of the machine.

Again, just personal observations.

I don't pretend to understand modern CPU logic design but arguing from first principles, presumably having a simpler instruction set means you can get the same output from your decoder with less complexity, no? And if so, it doesn't seem like a unique benefit. Even my Hennessy 5 stage pipeline "abstracted" the opcodes from the actual execution of the operations, and made it easy to define how operations should be interpreted.

III-V · Dec 16, 2014

witeken said:
I thought this was a discussion in a vacuum?

I answered your question; was there anything wrong with that?

dmens said:
If you know it's not relevant, why bring it up?

If you're having issues with seeing the relevance of the points I am making, then I suggest you go back and read through the comments I have made, and the posts of the people I have replied to.

witeken · Dec 16, 2014

III-V said:
I answered your question; was there anything wrong with that?

Yes, the straw man.

III-V · Dec 16, 2014

witeken said:
Yes, the straw man.

I was asked, quite plainly, if TSX and AVX defeat any chance that ARMv8 has to compete. I said no. I'm sorry if this wasn't what you meant to say.

The primary issue with new instruction sets is that you need a recompile to take advantage of them. By the time software has been updated to use the new instructions, you've already got something new out that uses even newer instruction sets. Unfortunately, the impact ISAs can have are rarely documented by tech sites.

Idontcare · Dec 16, 2014

Bikes are less complex than cars, cost less, and are more power efficient.

Look out Detroit!

III-V · Dec 16, 2014

Idontcare said:
Bikes are less complex than cars, cost less, and are more power efficient.

Look out Detroit!

I don't feel as if that's a fair analogy. To make it more fair, we'd have to say these "bikes" are targeting the same market as the cars, and have most of the same bells and whistles, including air conditioning and heating, air bags, and so on. Or the cars were peddle-powered.

dmens · Dec 16, 2014

videogames101 said:
I don't pretend to understand modern CPU logic design but arguing from first principles, presumably having a simpler instruction set means you can get the same output from your decoder with less complexity, no? And if so, it doesn't seem like a unique benefit. Even my Hennessy 5 stage pipeline "abstracted" the opcodes from the actual execution of the operations, and made it easy to define how operations should be interpreted.

That is true. I guess the point is the hardware required to decode x86 into a proprietary back-end format allows for interesting optimizations to both the proprietary format and back-end implementation. The impact must be assessed as a whole, not from the size of the decoder unit alone.

dmens · Dec 16, 2014

III-V said:
If you're having issues with seeing the relevance of the points I am making, then I suggest you go back and read through the comments I have made, and the posts of the people I have replied to.

Still don't see it. Frankly, I don't see how one ISA can be better than another. It's a definition. What is the metric for assessing them? How can you isolate the impact of the ISA alone when so many other factors are at play? There is no such thing as a vacuum, all-others-being-equal, whatever.

Also,

No, ARM's objectively better. You get more registers (which are critical for performance), you don't have to decode to micro ops, power's down, area's down, performance is up.

- no. of logical registers - register renaming
- don't have to decode, power, area, performance: it's just different kind of decoding and there are trade-offs there, as always.

AnandThenMan · Dec 16, 2014

Idontcare said:
Bikes are less complex than cars, cost less, and are more power efficient.

Look out Detroit!

If you try and make a car do the job of a bike, then the bike is the better solution.

SlimFan · Dec 16, 2014

AnandThenMan said:
If you try and make a car do the job of a bike, then the bike is the better solution.

Of course, at least in America, that's completely irrelevant. Everyone drives even if they could have walked. Not to mention, in some places it's too hot to ride a bicycle regularly.

I need my features! Air conditioner! Ass warmer!!

That's kind of the problem with the "x86 tax" argument. Eventually you end up needing features (NEON, performance, security, virtualization) and then everyone is just as big as everyone else. In a world where Apple's core is multiple integer ratios larger than an Atom core, IMHO the "x86 tax" is not relevant.

III-V · Dec 16, 2014

dmens said:
Still don't see it.

Please point out, verbatim, where I posted irrelevant statements. Secondly, please point out what statements I made that were deserving of your condescending statements that explained things that go without saying.

How can you isolate the impact of the ISA alone when so many other factors are at play? There is no such thing as a vacuum, all-others-being-equal, whatever.

Then you get them as equal as possible, for the purpose of this discussion.

Frankly, I don't see how one ISA can be better than another. It's a definition. What is the metric for assessing them?

...

Also,

- no. of logical registers - register renaming
- don't have to decode, power, area, performance: it's just different kind of decoding and there are trade-offs there, as always.

Say you have 100 programs, compiled for a processor utilizing ARMv8, and compare to a processor utilizing x86-64. Assume the target workloads for these processors are identical. Also assume their development budgets were identical. Assume they use the same manufacturing process. The compilers used were created with an equal development investment. Assume all personnel involved in the creation of the software and hardware mentioned are equally competent.

Measure the performance of those 100 programs, and compare between the two processors.

I'd appreciate it if you'd stop nitpicking semantics, and address what I'm actually trying to say. It's really frustrating that I've had to go and spell all this out.

jhu · Dec 16, 2014

III-V said:
Say you have 100 programs, compiled for a processor utilizing ARMv8, and compare to a processor utilizing x86-64. Assume the target workloads for these processors are identical. Also assume their development budgets were identical. Assume they use the same manufacturing process. The compilers used were created with an equal development investment. Assume all personnel involved in the creation of the software and hardware mentioned are equally competent.

Measure the performance of those 100 programs, and compare between the two processors.

I'd appreciate it if you'd stop nitpicking semantics, and address what I'm actually trying to say. It's really frustrating that I've had to go and spell all this out.

While we're at it, let's assume invisible pink unicorns exist

The RISC Advantage

Diamond Member

Golden Member

Diamond Member

Platinum Member

Senior member

Senior member

Diamond Member

Senior member

Platinum Member

Senior member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Senior member

Diamond Member

Senior member

Elite Member

Senior member

Platinum Member

Platinum Member

Diamond Member

Member

Senior member

Lifer