Question Should Intel or AMD break backward compatibility and do a clean x86-64 reboot?

Carfax83 · Dec 25, 2020

This is one of the greatest criticisms I hear about x86-64, that it is too old and burdened with legacy instruction sets that nobody uses anymore. The other major criticism has to do with the variable instruction length, but that's not nearly as clear cut in my opinion. From what I have read, it seems that having a variable instruction length can be advantageous at times, and increase instruction density. Also, micro op caches and ops fusion seem to do a great job of mitigating the problem of having to add more decoders to achieve greater instruction level parallelism. So at least there are solid workarounds for that one. But the legacy stuff is another matter entirely.... How disruptive would such a move be if Intel or AMD decided to throw out most or all of the legacy x86 stuff and design a cleaner more modern architecture?

I doubt it would affect end consumers that much, more so the business and enterprise sectors. But maybe I'm wrong. There was also a rumor about this several years back from bits n' chips.it, that perhaps Willow Cove would be the last x86-64 CPU designed by Intel that had wide support for legacy instruction sets. So I guess that would be Golden Cove, assuming the rumor is true.

zir_blazer · Dec 25, 2020

Google about the Intel 80376 and tell me what you find about it.

damian101 · Dec 25, 2020

I think an operating system kernel should be able to translate/emulate unsupported instructions for older binaries. With a performance penalty of course. That would make software compiled with old x86-64 compilers compatible with the new hardware, as long as you're using the right operating system.
I'm not an expert though.

Or maybe a hardware solution would make more sense: A small area on the die that just translates the legacy instructions before feeding them to the cores, again, with a heavy performance penalty.
That way full hardware compatibility could be maintained while also providing the performance and efficiency benefits for software compiled towards the new reduced x86 instruction set. Again, not sure about that.

sandorski · Dec 25, 2020

There were people complaining about these issues long before AMD came up with x86-64. As an End User I really don't care if anyone "fixes" the issue.

HutchinsonJC · Dec 26, 2020

I would bring to attention that when AMD was at a stall and Intel held the performing advantage, these forums and others had those who started to wonder about things like Moore's Law. We had those that threw around terms like Low Hanging Fruit in the context that there maybe just wasn't any more. We also had those who debated that Intel hadn't had any real competition and their stall in performance upgrades was a result of that.

And then the last few years showed us an AMD who was almost done for financially swing back into a place of increasing mind share with Ryzen and the following series that brought percentage boosts that were very obviously not strictly the result of the process technology brought in by TSMC. AMD seems to have found its stride. We'll see in the next iteration or two if they can maintain these kinds of performance boosts iteration over iteration or if they massively stall now that they've reached a kind of parity with Intel.

I've also seen these legacy criticisms, and with the Intel stall, I started to wonder more about it as I watched ARM moving at the pace it has been. Now that I'm seeing what AMD is doing, and hearing how an interview cited that it was better to just do a refactoring every now and then instead of just building on and building on, I'm less concerned about it because in that single explanation, it seemed to indicate a confident AMD that knew how to bring improvements iteration over iteration.

I say this with basically very little knowledge of CPU layouts/logic, though. Someone who does understand these things at the transistor/logic level maybe could answer the question of how much legacy existing on a CPU impacts performance. Is it as similar as in a program code If This - Then That... Else - This Other Thing? Is that the kind of slow down we're talking about removing by removing Legacy? We need one less check? Or is the Legacy being there more impactful, less impactful, or even more nuanced than that? As I try to imagine through the possibilities, is it just the wattage burned for it existing?

A part of me wants to agree with sandorski, mostly on the basis that I like the software library that exists on x86 and I'm seeing a seeming increased pace in x86 performance /shrug

TheELF · Dec 26, 2020

That's literally what Reduced instruction set computers (RISC) are.
You rip out anything old and only keep whatever is useful at the time.
So why would AMD or Intel bother with something that others have decades of a head start in?! Give up a ~~mono~~ oligopoly position to become a simple player in a game with a huge amount of competition.

Insert_Nickname · Dec 26, 2020

Carfax83 said:
How disruptive would such a move be if Intel or AMD decided to throw out most or all of the legacy x86 stuff and design a cleaner more modern architecture?

It really depends on how you define legacy, but I don't think there'd be too much disruption. A lot of legacy compatibility has already been thrown out of modern x86_64 platforms. It's f.x. not as if run can run WinXP on newer hardware without some form of either emulation or virtual machine interface.

It's perfectly -possible- to emulate old x86 on f.x. ARM with a performance penalty. MS has already done an x86-to-ARM emulator, and are working on a x86_64 version. The question is how much that penalty matters. If you have some older legacy code which just needs to run, emulation is perfectly viable (remember just how powerful modern CPUs are compared to f.x. 10-20 years ago). If you need performance, I'd expect you're already on a modern platform, that there is a modern binary available, or you can do a recompile against a modern instruction set.

my 2c worth of opinion.

Fanatical Meat · Dec 26, 2020

HutchinsonJC said:
I would bring to attention that when AMD was at a stall and Intel held the performing advantage, these forums and others had those who started to wonder about things like Moore's Law. We had those that threw around terms like Low Hanging Fruit in the context that there maybe just wasn't any more. We also had those who debated that Intel hadn't had any real competition and their stall in performance upgrades was a result of that.

And then the last few years showed us an AMD who was almost done for financially swing back into a place of increasing mind share with Ryzen and the following series that brought percentage boosts that were very obviously not strictly the result of the process technology brought in by TSMC. AMD seems to have found its stride. We'll see in the next iteration or two if they can maintain these kinds of performance boosts iteration over iteration or if they massively stall now that they've reached a kind of parity with Intel.

I've also seen these legacy criticisms, and with the Intel stall, I started to wonder more about it as I watched ARM moving at the pace it has been. Now that I'm seeing what AMD is doing, and hearing how an interview cited that it was better to just do a refactoring every now and then instead of just building on and building on, I'm less concerned about it because in that single explanation, it seemed to indicate a confident AMD that knew how to bring improvements iteration over iteration.

I say this with basically very little knowledge of CPU layouts/logic, though. Someone who does understand these things at the transistor/logic level maybe could answer the question of how much legacy existing on a CPU impacts performance. Is it as similar as in a program code If This - Then That... Else - This Other Thing? Is that the kind of slow down we're talking about removing by removing Legacy? We need one less check? Or is the Legacy being there more impactful, less impactful, or even more nuanced than that? As I try to imagine through the possibilities, is it just the wattage burned for it existing?

A part of me wants to agree with sandorski, mostly on the basis that I like the software library that exists on x86 and I'm seeing a seeming increased pace in x86 performance /shrug

Yeah I’m not the guy to talk specifics of this point but I can talk at a 10,000 foot overview.
Seems like performance gains will be limited in the foreseeable future. I do wonder if something on the OS/software side is the solution. Maybe virtualized machines for backwards compatibility? However would everything just run in the virtual environment?
I don’t know...

TheELF · Dec 26, 2020

Insert_Nickname said:
It really depends on how you define legacy, but I don't think there'd be too much disruption. A lot of legacy compatibility has already been thrown out of modern x86_64 platforms. It's f.x. not as if run can run WinXP on newer hardware without some form of either emulation or virtual machine interface.

Older OSes only problem is that there are no drivers for newer hardware and that's what VM provides, a basic chipset and basic devices that can be run on the default drivers that came with the OS.

If you have virtualization activated the VM is running 1:1 the original legacy code.

ajc9988 · Dec 26, 2020

Carfax83 said:
This is one of the greatest criticisms I hear about x86-64, that it is too old and burdened with legacy instruction sets that nobody uses anymore. The other major criticism has to do with the variable instruction length, but that's not nearly as clear cut in my opinion. From what I have read, it seems that having a variable instruction length can be advantageous at times, and increase instruction density. Also, micro op caches and ops fusion seem to do a great job of mitigating the problem of having to add more decoders to achieve greater instruction level parallelism. So at least there are solid workarounds for that one. But the legacy stuff is another matter entirely.... How disruptive would such a move be if Intel or AMD decided to throw out most or all of the legacy x86 stuff and design a cleaner more modern architecture?

I doubt it would affect end consumers that much, more so the business and enterprise sectors. But maybe I'm wrong. There was also a rumor about this several years back from bits n' chips.it, that perhaps Willow Cove would be the last x86-64 CPU designed by Intel that had wide support for legacy instruction sets. So I guess that would be Golden Cove, assuming the rumor is true.

Corporations are the real issue, as i see it. People assume that there are newer binaries and newer versions that can do the same thing. This ignores the long and drawn out test beds to verify stability and fitness for the intended purpose. This also ignored corporations paid previously for custom software solutions to be made which run their databases even though, at times, decades old. This could cause large expenses for many. So if this road is taken, it would be best to check with commercial purchasers to check impact. On the consumer side, this is practically not a problem. How many are still running windows 98 or XP. Some, but not many. And those doing so are on older hardware with the instruction set needed for it.

One of the reasons to do it is to reduce energy consumption and free up die space/ transistors that then can be applied for other purposes. One example of an instruction set that was created and removed is FMA4. That isn't legacy, but an example of instruction sets being removed.

There were also other times Intel created instruction sets AMD did not adopt (aside from not adopting yet, like AVX512, which is die space intensive, energy intensive, and used by a very small subset of users, primarily in the commercial space).

Now, part of this is to say that instruction sets do not have to be uniform across x86 vendors. It also is to say removing some legacy instruction sets does not automatically make the architecture a RISC chip no more than not adopting an instruction set or dropping an instruction set does.

Advantages of getting rid of legacy instructions is optimization of instruction registers, optimization of the load/ store (which has recently helped AMD), potential freeing up of die space for other use, etc. So some of the benefits seen in RISC customized ASICs would be seen here, but that is by the nature of optimization, not because x86 would become a RISC chip.

Hulk · Dec 26, 2020

HutchinsonJC said:
I would bring to attention that when AMD was at a stall and Intel held the performing advantage, these forums and others had those who started to wonder about things like Moore's Law. We had those that threw around terms like Low Hanging Fruit in the context that there maybe just wasn't any more.

Valid points but I have a different take on the "low hanging fruit" statement.

Anand was one of the first people to talk about "low hanging fruit" in relation to x86 IPC improvements. I believe he was absolutely correct in his statement and it holds to this day. At the time he was referring to the massive IPC improvement that came from the move from Netburst to Core where we saw a doubling of IPC.

Anand is/was a genius and his analysis of the technology was astounding. It would be a mistake to take what he has written lightly without first examining the facts.

Some pointed evidence of the lack of "low hanging fruit" would be the Handbrake bench thread in this forum. Let's assume Tiger Lake has approximately the same IPC in Handbrake as Zen 3, which I don't believe it does but we will go with this assumption. 11th generation Tiger Lake IPC in this Handbrake bench is 53% faster than 4th generation Haswell. 53% in 7 generations or 100% generation-to-generation for Netburst to Core. That's 14.5% increase per generation on average. Actually lower since Tiger Lake doesn't do the IPC that Zen 3 does.

Since then there hasn't been a lot of low hanging fruit so Anand was correct. Making the back end or front end wider for maybe 10 or 15% IPC increase is no easy task. And all of the smaller architectural changes (increasing micro-op cache size, cache algorithms, OoO scheduler improvements, etc...) have resulted in perhaps a couple of percent IPC improvement. It's been a struggle since Core to find the "fruit."

So 100% IPC increase from Netburst to Core and then perhaps 15% on average for the next 7 generations. Core did indeed pick all of the low hanging fruit.

Perhaps the best evidence of all to support his statement is the fact that I (and many others) have held onto computers from 7 or 8 years ago and still use them as their main systems today. My main system is a 4770k. I used to upgrade every 2 years. But since the low hanging fruit was all picked I haven't seen enough improvement to upgrade. Now with Zen 3 and Rocket Lake there has finally been enough improvement from Haswell that I will most likely upgraded in the next year.

HutchinsonJC · Dec 26, 2020

Hulk said:
held onto computers from 7 or 8 years

I can put myself in that crowd having upgraded from a 3960x (Intel - 2012 purchase) to a 3950x (AMD - 2020 purchase).

I agree that picking the low hanging fruit for Core specifically is basically at an end, but when Intel chooses to stay with it for as long as it has, it kinda makes folks wonder if its nearing the end for x86 improvements. But then AMD does what it did, and now it's got me re-questioning just what x86 can really do.

I saw a few posts above that seems to confirm the idea that Legacy existing on a CPU is wattage burned up, but wouldn't it be true too that the wattage burned up by that Legacy existing becomes an increasingly smaller amount of the total wattage used by the CPU as transistor counts go up and fab size goes down? It makes me wonder if fab process improvements can render the idea of removing legacy as fairly negligible. Who could begin to theorize how many watts are burned by Legacy transistors sitting on the CPU idle comparing fab process generation over generation?

Leeea · Dec 26, 2020

x86 is not just software backwards compatibility, it is a whole ecosystem. x86 comes with a BIOS that frequently handles the tricky bits of integrating with hardware. Most hardware is designed to work with x86 from the get go.

It is not that way outside of x86 land, things do not just work.

There have been cpus that have been much faster then x86 (sparc, power pc, aplha), and ones that have been cheaper. But between its ecosystem and its $ / performance it has held its own. Yes, expensive RISC/ARM cpus can keep up with x86 again, but that is no where good enough.

x86 will be dominate until something is cheaper, faster, and easier. Nothing is close to achieving that.

Hulk · Dec 26, 2020

I don't think the architecture is inherently the problem. Part of the problem is that there are only so many instructions that can be executed in a processor due to one instruction relying values from the one before it. A lot of work has gone into out of order scheduling and keeping things moving in the core and HT or SMT is one method to alleviate this.

The onus of performance may be moving more to developers to create code that is not only highly multithreaded but also able to be highly "parallel processed" through each individual core.

Or more likely better communication between software and hardware engineers.

This is one area where Apple has a great advantage with their closed system. They can potentially build the processors, the operating systems, and the software to work together with new levels of efficiency. Microsoft may see this coming and my be thinking the same thing and developing their own hardware for this very reason.

The progress that TSMC has made allows producing processors more open to what we would normally consider software companies.

turtile · Dec 26, 2020

I don't think x86 or ARM IPC is going to very important in the next 5-10 years. From what I see, I think we are headed to a future of a general purpose processor along with a bunch of specialized co-processors.

moinmoin · Dec 26, 2020

Carfax83 said:
This is one of the greatest criticisms I hear about x86-64, that it is too old and burdened with legacy instruction sets that nobody uses anymore.

I'd like to start with that sentence, because it's mistaken and that shows us where a realistic way forward for x86 as a whole could be. x86-64 (or x64 or AMD64) is not "too old and burdened with legacy instruction sets that nobody uses anymore", it's just one mode of many modes of which most are actually too old and nobody uses anymore.

x86 up to now kept the practice of perfect back compatibility, which includes at this time honestly ridiculous support for real mode (so 8086 and 8088), protected mode (so 80286) and enhanced mode (so 80386).

A clean break could be removing support for those 16 and 32 bit modes and retain just x86-64. In case 32 bit applications should continue to run, the OS would need to offer an emulation layer. Since the introduction of 64 bit UEFI the POST doesn't rely on anything older than 64 bit anymore so that part shouldn't be a problem.

Schmide · Dec 26, 2020

I think the whole cost of legacy modes is kind of overblown. For the most part it's a few segment registers that shift address and a few default operand sizes. The simplicity of 90s processor technology hardly seems like it would make a dent in the today's transistor budget.

The claim modern x86 is slow or power hungry because of the legacy modes, to me, is illogical.

thigobr · Dec 27, 2020

On top of what has been said, many legacy and more complex instructions are actually run through micro code inside current CPUs. You can always design the circuits to execute most common instructions faster and rely on this kind of "emulation".

Leeea · Dec 27, 2020

moinmoin said:
A clean break could be removing support for those 16 and 32 bit modes and retain just x86-64. In case 32 bit applications should continue to run, the OS would need to offer an emulation layer. Since the introduction of 64 bit UEFI the POST doesn't rely on anything older than 64 bit anymore so that part shouldn't be a problem.

I would not buy an x86 cpu that did not support 32 bit.

32 bit is the default compile target for Visual Studio last I checked. From the software writing aspect, write once, and run anywhere for decades to come is very attractive.

TheGiant · Dec 27, 2020

Leeea said:
x86 is not just software backwards compatibility, it is a whole ecosystem. x86 comes with a BIOS that frequently handles the tricky bits of integrating with hardware. Most hardware is designed to work with x86 from the get go.

It is not that way outside of x86 land, things do not just work.

There have been cpus that have been much faster then x86 (sparc, power pc, aplha), and ones that have been cheaper. But between its ecosystem and its $ / performance it has held its own. Yes, expensive RISC/ARM cpus can keep up with x86 again, but that is no where good enough.

x86 will be dominate until something is cheaper, faster, and easier. Nothing is close to achieving that.

oh I couldn't say it better after 2 weeks as macbook air m1 owner
for the topic my opinion is Intel is starting to break the backwards compatibility with alder lake -s, the hybrid design
when the atom core reaches the "acceptable" performance- skylake level and I guess it will be in like 2021 the legacy apps needs 4- cores to run well enough
but 4- atom cores is a tiny block of the entire package
too many people forget how many computer parts and other connected tech just works plug and play with x86
after 2 weeks of m1ing I discovered year 1998 with with windows 95...

JoeRambo · Dec 27, 2020

The elefant in the room is -> that this "clean" reboot is already out and is already in millions if not billions of machines.

And the name of that "reboot" is AArch64 and it even has x86 emulators for MAC/Windows and X64 emulator for MAC ( and soon Windows) as patents have expired.

Any potential new instruction set has to be evaluated against this rather clean, freshly designed instruction set that has little in common with horrors (necessary evils committed in the name of code density) of previous ARM sets.

"Inventing", designing and implementing CPU is not enough -> whole ecosystem is needed and ARM has a lot going for it, including the coveted 32 AND 64bit mode X86 emulation. It takes millions $$$ of effort to support architecture in OS like Linux, get rudimentary support in GCC/LLVM, but it would take billions in effort ( and bribing MS? ) to release new architecture and get it supported by Windows.
Would need to have tremendous advantages in performance and power usage versus ARM to happen and whatever tricks this was using would get copied by ARM.

And pipe dreams of another breakthrough paradigm shift (on the level of OoO) => the question is what would keep existing players from implementing it?

Mopetar · Dec 27, 2020

Haven't both AMD and Intel largely moved away from the legacy x86 designs of the past to the extent that they're basically a RISC architecture internally with support for the older CISC instructions to be translated into some kind of microcode the processor actually operates on?

Unless the older instructions are being used a lot, there isn't as much need for that silicon to be used. Sure it's extra die space that could be used for something else, but it need not always be powered on and consuming energy. Compilers can avoid using those instructions as well so that more recent software avoids the problem entirely.

amrnuke · Dec 27, 2020

Mopetar said:
Haven't both AMD and Intel largely moved away from the legacy x86 designs of the past to the extent that they're basically a RISC architecture internally with support for the older CISC instructions to be translated into some kind of microcode the processor actually operates on?

That's what I understand too.

Mopetar said:
Unless the older instructions are being used a lot, there isn't as much need for that silicon to be used. Sure it's extra die space that could be used for something else, but it need not always be powered on and consuming energy. Compilers can avoid using those instructions as well so that more recent software avoids the problem entirely.

I do wonder if there could be some flag an application could pass to bypass/turn off the x64->microcode component, and just compile the app for the processor-specific uops (in and of itself this probably would make finding the proprietary uop code of each chip maker easier too...). Of course, enabling such a switch would in and of itself require another controller, and would need to ensure all dependencies are also not needing the uop decode. On top of that, I'm not sure how "integrated" the uop decode is into the chip at large for current x64 uarch, so I don't know how hard it would be to turn it off and still have the rest of the CPU work as usual, and I'm also not sure what kind of gains could be expected. It would seem to me that as long as the translation from x64->microcode isn't inherently slower than the rest of the pipeline, then it may be more fruitful to work on other aspects of the pipeline.

JoeRambo · Dec 27, 2020

The devil with x86 problems is always in details. For example Skylake has 5 decoders, ARM N1 has 4, Skylake has 25% advantage? Nope...

ARM decoding - take 32bits, thats your instruction, if you have 4 decoders, that's 16 bytes per cycle needed. Each is copy-paste copy of the others.

Decoding is damn complex on X86, instructions are sized 1 byte to 15 bytes, with complex nightmare of opcodes, modes and so on.
In fact things are so complex, that chips usually have predecode step, where instruction boundaries are found in current blocks.

Decoders then operate on said block, there used to be 1 complex decoder on Intel and 3-4 simple ones:
Complex one can decode any* instruction, simple ones - only the simplest that result in 1 microOP.
Your code contains several complex instructions in a row? Tough luck, wait for the next cycle, even if you decoded just 2 uOps this cycle.
You have decoded 15 bytes out of 16 byte block and the last one is simple instruction? Tough luck, only single uOP this cycle for you.
Your complex decoder has decoded something complex? Tough luck, less simple decoders will produce results this cycle. 5 is the limit.

Pentium PRO had 1 complex decoder and 2 simple ones, we haven't moved very far from 1994, cause it is hard problem in X86.

And such complexities go on and on and on. People used to be bitten by crazy cases like when adding byte or two sized NOP made hot loop go much faster (cause bytes aligned and decoders were happy).

uOP cache is a necessity, but each instruction needs to be decoded at least once, codebases grow and uOP cache is not infinite, it also gets invalidated on context switches and overall a literal ton of transistors to implement and power needs to be spent operating it.

So compare a single line for ARM with this mini wall of text that is in no way complete.

beginner99 · Dec 27, 2020

Schmide said:
The claim modern x86 is slow or power hungry because of the legacy modes, to me, is illogical.

But in conclusion this means that Apple is a lot better, a lot better at designing CPU's than Intel and AMD. Either x86 has a penalty or Intel/AMD are year(s) behind Apple in CPU design, especially efficiency which I do not really think. I think x86 "forces" you into certain design decisions which impact IPC and performance and efficiency.

Question Should Intel or AMD break backward compatibility and do a clean x86-64 reboot?

Diamond Member

Golden Member

Senior member

No Lifer

Senior member

Diamond Member

Diamond Member

Lifer

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Diamond Member

Senior member

Diamond Member

Diamond Member

Senior member

Diamond Member

Senior member

Golden Member

Diamond Member

Golden Member

Golden Member

Diamond Member