x86 CPU internal execution opcode


velis

Senior member
Jul 28, 2005
What is the benefit of actually keeping the x86 decoders on an x86 CPU, as opposed to simply introducing a new mode that would expose the bare RISC instruction set of the execution units? (Of course, the x86 decoders would remain for backwards compatibility, but in 5-10 years they could be phased out once all the software had been compiled for the new mode.)

The question occurred to me while following the "modern computing platform" thread. The quotes around its name are there for IMO obvious reasons.

I'd assume the decoder adds a stage or two to the pipeline, whereas a directly exposed RISC instruction set wouldn't need multiple decode stages, aside from the one that converts memory contents into actual opcodes for the CPU to process.

So I figure there must be a reason they keep the x86 instruction set. Either because:
1. Decoding x86 is now so cheap there's no need to reinvent the wheel, because doing so would not bring any performance benefit, or
2. Some legacy bullpoop / inertia is preventing the designers from actually doing something about it.

They don't seem to have had any problems introducing new modes in the past, so I don't think adding a mode is the problem.

So, what do you guys think?
 

NTMBK

Lifer
Nov 14, 2011
Decoupling the instruction set from the internal micro-ops is generally A Good Thing. CPU manufacturers get the chance to completely throw away the internal representation and come up with something better in a new generation, and to have totally different architectures (say, Silvermont and Haswell) support the same binaries.

If you're a software engineer, think of the ISA like a software library's API. You don't want to change the library's API, but you can totally re-engineer the guts of it so long as it still behaves as your user would expect.
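Something like this, to make the analogy concrete (a made-up little library, not anything from a real codebase):

Code:
/* Public "ISA": the declaration callers compile against never changes. */
#include <stddef.h>

size_t count_set_bits(const unsigned char *buf, size_t len);

/* Internal "microarchitecture": one possible implementation today.
   Tomorrow it could become a lookup table or SIMD code, and no caller
   would know or care, as long as the results stay the same.
   (Assumes GCC/Clang for the __builtin_popcount intrinsic.) */
size_t count_set_bits(const unsigned char *buf, size_t len)
{
    size_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += (size_t)__builtin_popcount(buf[i]);
    return total;
}

The header stays stable for decades while the guts get rewritten every generation. That's exactly the ISA / micro-op split.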

Now, x86 may not be the best ISA in the world; but an abstracted ISA is still preferable to naked uOps.
 

velis

Senior member
Jul 28, 2005
Quite honestly, having done some assembly in my earlier life, I will admit that I liked Motorola's 6800(0) ISA more, but that's more because of its consistency than because of any actual beauty of the instruction set itself.
The x86 ISA just seemed half-thought-out compared to the 68000. The same instruction only worked on a limited set of registers: early versions typically tied a given instruction to one specific target register, though that was all fixed later on.
But this is off-topic, I suppose.

I guess the more exact question would be: if the internal format is cheaper in terms of cycles to process, why not completely remodel the ISA into something that fits the execution units' format more closely?

Which brings me to a third possible hypothesis:
3. The x86 ISA as it is today is a prime example of CISC and in no way inferior to any of the RISC ISAs, be it in performance or functionality.

If the above is correct, it would mean decoding x86 imposes no performance penalty to speak of versus decoding a RISC ISA. Even if there are additional pipeline stages involved, they ultimately don't play a significant role in final code performance.
 

Cerb

Elite Member
Aug 26, 2000
What is the benefit of actually keeping the x86 decoders on an x86 CPU, as opposed to simply introducing a new mode that would expose the bare RISC instruction set of the execution units? (Of course, the x86 decoders would remain for backwards compatibility, but in 5-10 years they could be phased out once all the software had been compiled for the new mode.)
First, because the new "mode" would result in dead-end software, while x86 code can survive onto future CPUs. Each new CPU's internal RISC-like instruction set is (a) based on x86 (not as in derived from it, but as in designed to run x86 front-end code), and (b) free to change in each new CPU. Ivy Bridge's isn't going to be the same as Haswell's, for instance.

Second, bandwidth. A simple RISC-like instruction set produces big code, and shuffling it around caches and main memory is a chore. It has been a problem for RISC CPUs in the past, and it continues to be one of many limits on practical fast MIPS implementations. x86-64 code is not small, but it's typically still smaller than ARM or PPC code, both of which are quite decent for RISCs (it will be interesting to see how practical software comes out with AArch64). As long as the wide internal format stays inside the CPU, which has hard-coded paths for it, its size is not a big deal.
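A rough back-of-the-envelope on that (byte counts are ballpark encodings, not measurements from any particular compiler):

Code:
/* The hot loop of a trivial reduction, and roughly what it costs in
   instruction bytes per iteration (approximate, illustrative only). */
void accumulate(long *sum, const long *data, long n)
{
    for (long i = 0; i < n; i++)
        *sum += data[i];
    /* x86-64 can fold the load into the add, with the sum kept in a register:
           add rax, [rsi + rcx*8]          ; ~4 bytes, 1 instruction
       A fixed-width load/store RISC needs two 4-byte instructions:
           ldr x3, [x1, x2, lsl #3]        ; 4 bytes
           add x0, x0, x3                  ; 4 bytes
       Multiply that kind of gap across a few hundred thousand hot
       instructions and it decides how much code fits in the I-cache and
       how many cache lines of code get pulled from memory. */
}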

Third, nobody except hardware people who lament the loss of things like fast FP emulation really even wants it. Breaking changes in an ISA, or cases where the ISA doesn't prevent new chips from running the same code differently, have happened before (particularly in early MIPS, ARM, and Alpha), and they are generally not liked. The reality is that they'd have to keep compatibility over time with whatever internal ISA they exposed.

I'd assume the decoder adds a stage or two to the pipeline, whereas a directly exposed RISC instruction set wouldn't need multiple decode stages, aside from the one that converts memory contents into actual opcodes for the CPU to process.
Sure, but those stages are so well hidden that nobody cares in practice. As long as the decoding bandwidth is there and decoding doesn't take too long, you only "see" it when there's a cache miss, because typically >98% of the time the CPU has predicted what needs to be run many cycles before it actually has to run. And when there is a miss, even going out to L2 on a new Intel takes far longer than decoding and executing most instructions.

Inertia keeps pushing x86 specifically, but that kind of design, where the chip's internals can be decoupled from the external interface, is there for good reason, is not unique to x86 (RISC ISA CPUs do it, too), and will stick around.
 

A5

Diamond Member
Jun 9, 2000
x86 decoding is cheap enough not to bother. Why make ICC, GCC, and MSVC throw out 20+ years of optimizations to get rid of something that isn't causing any problems?
 

NTMBK

Lifer
Nov 14, 2011
I guess the more exact question would be: if the internal format is cheaper in terms of cycles to process, why not completely remodel the ISA into something that fits the execution units' format more closely?

Because it will only fit today's execution units. But what about in 5 or 10 years' time? Once you settle on an ISA, you're stuck with it, because nobody wants to invalidate years of useful software. Remember, x86 was a damn good fit for the execution units of the 8086, 35 years ago.

And frankly, the ISA cost is negligible these days. Back in the CISC vs. RISC war of the 90s you could plausibly argue that x86 was a performance sucker, but today's processors are 100 times more powerful, and the instruction set isn't 100 times more complex. The "x86 penalty" is basically gone, and the cost of ditching x86 (the complete death of legacy software) outweighs the benefits. Hell, Intel themselves tried to kill it with Itanium, and look how that worked out.
 

glugglug

Diamond Member
Jun 9, 2002
Starting with the P4's trace cache (and again with Sandy Bridge's µop cache), decoded RISC-like micro-ops are cached inside the CPU, so the x86 decode happens when code is pulled into that cache, not again on every pass through a loop. Also, the x86 code takes fewer instructions and less space than the equivalent RISC-style micro-ops (so x86 has had the unintended benefit of acting as effective code compression). Since the x86 code is smaller, it takes less time to read it from memory into the cache, and I've read that it's an overall performance gain even with the initial decode overhead.
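A toy sketch of the "decode once, reuse many times" idea (the structure and numbers are invented for illustration; real hardware is organized very differently):

Code:
#include <stdio.h>

#define CACHE_LINES 64

/* One entry of a pretend decoded-micro-op cache, keyed by instruction address. */
typedef struct { int valid; unsigned addr; int uop_count; } uop_entry;

static uop_entry uop_cache[CACHE_LINES];
static int decode_count = 0;

/* The expensive step we want to avoid repeating: expand one variable-length
   x86-style instruction into simpler micro-ops. */
static int decode(unsigned addr)
{
    (void)addr;
    decode_count++;
    return 2;               /* e.g. "add reg, [mem]" -> load uop + add uop */
}

static int fetch(unsigned addr)
{
    uop_entry *e = &uop_cache[addr % CACHE_LINES];
    if (!e->valid || e->addr != addr) {    /* miss: decode and fill the entry */
        e->valid = 1;
        e->addr = addr;
        e->uop_count = decode(addr);
    }
    return e->uop_count;                   /* hit: reuse the decoded micro-ops */
}

int main(void)
{
    /* A hot loop that re-executes the same 4 instructions 1000 times. */
    for (int iter = 0; iter < 1000; iter++)
        for (unsigned addr = 0; addr < 4; addr++)
            fetch(addr);
    printf("decodes performed: %d\n", decode_count);   /* prints 4, not 4000 */
    return 0;
}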
 

velis

Senior member
Jul 28, 2005
Good point on compiler optimizations, guys.

So basically, all those x86 bashers better shut up, because saying x86 is bad really amounts to admitting they don't know anything about CPU performance, right?
 

SecurityTheatre

Senior member
Aug 14, 2011
Good point on compiler optimizations, guys.

So basically, all those x86 bashers better shut up, because saying x86 is bad really amounts to admitting they don't know anything about CPU performance, right?

There are legitimate benefits to RISC architectures, but in a practical sense those benefits are small and are outweighed by the maturity of the existing x86 toolchain and software.

If we could magically have a RISC compiler with 10 years of tweaking, then sure, I'd go with that over x86, but starting from scratch is difficult and fraught with unseen challenges.

Intel tried with IA-64. It was a cool idea, but it fell down due to a lack of optimization and limited compiler support. However, carefully optimized code (like LINPACK) benefited hugely from the new architectural concepts.
 

glugglug

Diamond Member
Jun 9, 2002
Intel tried with IA-64. It was a cool idea, but it fell down due to a lack of optimization and limited compiler support. However, carefully optimized code (like LINPACK) benefited hugely from the new architectural concepts.

Actually, the RISC part of IA-64 slowed it down relative to x86, because it made the code more bloated and it took too long to fetch that code into the cache.

The cool innovation in IA-64, which would have been a tremendous boost if it had become the mainstream architecture (or been grafted onto x86 as a similar extension), was that instruction interdependencies were, to some extent, figured out by the compiler rather than by the CPU's hardware scheduler: sets of instructions were explicitly flagged as independent so the CPU didn't have to make that determination (hence EPIC, explicitly parallel instruction computing). The thing is, the explicitly parallel sets only went up to 3 instructions, so the CPU scheduler still ended up needing to find more parallelism to keep the pipelines busy anyway. They should have had a flag on each set of 3 parallel instructions indicating whether it could also run in parallel with the next 3.
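Something along these lines, as a toy model (the encoding and "instructions" are invented purely to illustrate compiler-marked independence, not the real IA-64 bundle format):

Code:
#include <stdio.h>

/* Each pretend instruction carries a compiler-set stop flag: everything up to
   (and including) the next stop can issue in the same cycle, and the hardware
   never has to work out the dependencies itself. */
typedef struct { const char *text; int stop; } insn;

int main(void)
{
    insn code[] = {
        { "ld  r1, [a]",     0 },
        { "ld  r2, [b]",     1 },  /* two independent loads: one group     */
        { "add r3, r1, r2",  1 },  /* depends on both loads: its own group */
        { "st  [c], r3",     1 },  /* depends on the add: its own group    */
    };
    int n = sizeof code / sizeof code[0];

    int cycle = 0;
    for (int i = 0; i < n; cycle++) {
        printf("cycle %d:", cycle);
        do {                                    /* issue until a stop flag */
            printf("  %s", code[i].text);
        } while (!code[i++].stop && i < n);
        printf("\n");
    }
    return 0;
}

Running it prints the two loads issuing together in cycle 0, with the dependent add and store each taking their own cycle, which is all the issue logic has to know.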

Furthermore, the real difference between RISC and CISC is the addressing modes available to instructions other than MOV. In RISC, moving operands from memory to registers, doing math on them, and putting the results back in memory takes three instructions, which obviously depend on each other and have to run sequentially. In CISC, that is one instruction, which can run in parallel with other similar instructions operating on different addresses. Marking small sets of CISC instructions as explicitly parallel would probably end up being a lot more useful than doing the same with RISC, as IA-64 did.
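For a concrete picture (the instruction sequences below are approximate, written for flavor rather than copied from any compiler's output):

Code:
/* A read-modify-write on memory, the case where the addressing-mode
   difference shows up. */
void scale(int *a, long i, int k)
{
    a[i] += k;
    /* CISC (x86-64) can express the whole thing as one memory-destination
       instruction, roughly:
           add dword ptr [rdi + rsi*4], edx
       A load/store RISC needs three dependent instructions that must run
       one after another:
           ldr w3, [x0, x1, lsl #2]
           add w3, w3, w2
           str w3, [x0, x1, lsl #2]
       Several of the one-instruction CISC versions, each hitting a different
       address, could be flagged as a parallel group; the RISC triplets are
       serial chains by construction. */
}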
 