It was a stupid idea and Robinson and his boys were very plainly told "no".Having the feature natively support different lengths was kind of the point of AVX10 though.
I don't know how much there is to elaborate on. AVX10 doesn't do anything differently that would make it easier to support in a CPU with hybrid architecture (p-cores and e-cores). Intel could've made a hybrid architecture CPU that supports AVX512.
I don't want to speak for OneEng2, but he seems to be conflating the width of the vectors in the instructions with the width of the execution unit. As other commenters have said, you can execute a 512-bit instruction with a "double-pumped" 256-bit execution unit.
Having the feature natively support different lengths was kind of the point of AVX10 though.
Robinson and team already worked on the OG AVX-512 they were mostly from area point of view and other stuff. AVX-512 will take a decent dev time for themIt was a stupid idea and Robinson and his boys were very plainly told "no".
Where did he get $50M 😱
I believe they are, but I am interested in hearing your argument on this.Your understanding here is still incorrect. Hybrid architectures aren't any easier with AVX10 than AVX512.
OK. After reading MORE about this topic, I have an even clearer explanation and a correction to my last post.I don't know how much there is to elaborate on. AVX10 doesn't do anything differently that would make it easier to support in a CPU with hybrid architecture (p-cores and e-cores). Intel could've made a hybrid architecture CPU that supports AVX512.
I don't want to speak for OneEng2, but he seems to be conflating the width of the vectors in the instructions with the width of the execution unit. As other commenters have said, you can execute a 512-bit instruction with a "double-pumped" 256-bit execution unit.
Yes, it absolutely IS the point.Having the feature natively support different lengths was kind of the point of AVX10 though.
If the official spec doc from intel removing the phrase "optional 512-bit" doesn't convince you, then I don't know what will. At this point I'm done arguing. The discussion has run it's course and it's blocking up the thread.I believe they are, but I am interested in hearing your argument on this.
OK. After reading MORE about this topic, I have an even clearer explanation and a correction to my last post.
Before I go into ANY detail, the MOST important thing to understand about AVX10.2 is that it is a VLA ISA (Vector Length Agnostic). This is done EXPLICITLY to support the same compiled code across different bit width implementations of different CPU's PEROID. This is the entire POINT of the new instruction set.
Let's look at this from 3 very different aspects:
AVX10.2 supports 3 different register widths (I incorrectly said it required 32 x 512b registers):
- Register width
- "Vector Size" (Width of the instructions)
- Execution Path size
XMM (128b)
YMM (256b)
ZMM (512b)
AVX10.2 supports 3 different Vector (instruction) widths:
XMM 128b operands
YMM 256b operands
ZMM 512b operands
Now lets look at Data Paths.
NOTHING IN AVX10.2 REQUIRES 256b OR 512b DATA PATHS. The entire data path could be 128b.
Now, finally, within a core, there are FEATURE flags in the CPUID that determine the LEVEL of AVX10.2 that is supported.
LEVEL1:
Max architectural width 256b (note MAX)
Supports XMM & YMM
DOES NOT HAVE ZMM registers OR ZMM Instructions/Vectors.
LEVEL2:
Max architectural width 512b (again note MAX)
Supports XMM, YMM and ZMM
NOTE: These are NOT in the instructions. The instructions are agnostic to the bit width supported. This is what makes it special.
Yes, it absolutely IS the point.
So what this MEANS is that it is very easy to make a hybrid architecture CPU with different bit width cores BUT all the cores within the CPU MUST be the same level to avoid issues with scheduling unsupported instructions.
The LEVEL is at the CPU level, not the core level. This DOES allow Intel to use the SAME P Core on server as they do in client BUT the CPU stays LEVEL 1 AVX10.2 support. The server part using the SAME core could advertise LEVEL 2 AVX10.2 support.
BOTH Server AND Client CPU's would run the SAME compiled code though.
If I have messed this up, someone let me know.
Oh that is the cute part, they can release upgraded versions every 2-3 years instead of 7.Now that's an argument for a new market - PC gaming market basically, and initially it might be good, so first year sales might be 2-3 mln, but consoles are designed for 6-7 and now looks like even 8 year lifecycles, but PCs evolve, so this PC will be fixed in time few years after release, sales will drop to near zero.
What?Plus it's competing against big OEMs who make far more boxes and get far better economies of sale than something that will sell at best level of Steam Deck (8 mln units).
If they only launch AT2 then you are getting barely any more perf from discrete, unless you pay the green tax.One big negative is that form factor will be limiting upgrades, so it's not real desktop that can last for a while, doubt a lot of PC gamers will be interested - it's basically neither here nor there, and in 2027/28 when this things comes out RDNA5 dGPU will be available for decent upgrades in PCs.
Well it is 36GB, and that is still way cheaper than the equiv desktop PC when it launches.Also there is question of margin - they won't sell it at a loss or even break even, so we might see $1200-$1500 range with the way memory inflation gone: 32 GB GDDR7 won't be cheap.
The whole point is they no longer compete with the PS6, it is all alone, except the handheld is designed to crush Switch 2 into a fine paste.PS6 will kill it.
AMD definitely doesn't need the complication of AVX10.2 as long as compilers support AVX512. AMD has decided that having different architecture e and p cores isn't their strategy so it really doesn't help AMD at all (except for the new instructions in 10.2).Long and short, intel realized how stupid it would be to carve out a 256b niche for a dying platform.
The instructions, the compilers, and the programs are already out there. If you have an avx512 program, you expect it to run on future hardware that exposes the avx512 flags.
If you look at the instructions.
byte 0 EVEX 0x62 (old bound instruction) this tells the system it is a special instruction and the next 3 bytes are going to define the operand.
byte 1 register map
byte 2 data width and opcode map
byte 3 vector length
bytes after. variable length from the opcode map in byte 2.
map 1-3 legacy 2-3 byte opcodes.
map 5,6 avx 512 opcodes.
It gets even more complicated after that.
If intel was to define a 256b path with the above. They would have to replicate avx512 in the remaining maps. Well guess what APX took maps 4 and 7.
No older hardware is going to expose new flags for a 256b path for avx512. So any implementation is left to future.
Putting out half ass future cores that will be the only users of this subset or replicated subset just makes you look lazy (crazy?) considering amd's unified implementation.
This is the point you realize it's just easier to add full avx512 support even if it runs slower using lesser hardware.
Zen 6 supports most everything in AVX10.2 including FP16, Bit Reversal, and possibly 32 zmm registers.AMD definitely doesn't need the complication of AVX10.2 as long as compilers support AVX512. AMD has decided that having different architecture e and p cores isn't their strategy so it really doesn't help AMD at all (except for the new instructions in 10.2).
I don't expect to see AVX10.2 support in any Zen or AMD architecture until such a time as AMD decides to make different architecture cores (not just different cache and different layouts to suit needs).... unless AMD wants to keep x86 consolidated to fend off ARM?
APX is another story. I think AMD should have full support of APX NO LATER than ZEN 7 (Really should be in Zen 6 IMO).
I mean, I'm glad that Intel FINALLY got drug into X86-64 by AMD and we went from 8 registers to 16 GPR's (one has to wonder why they didn't make the jump to 32 registers when they made the jump to 64 bit). I think that APX will bring x86 up to snuff (mostly) with ARM in areas that it is lagging. Note, I still think x86 is ahead of ARM in other aspects.
That's true for AVX512 with extensions too. Both AVX512 and AVX10 do not define how hardware should implement them. What has been already said in this thread...NOTHING IN AVX10.2 REQUIRES 256b OR 512b DATA PATHS. The entire data path could be 128b.
That's wrong. Like AVX512 it just offers 3 fixed lengths to choose from. That's nowhere near true VLA like ARM's SVE.Before I go into ANY detail, the MOST important thing to understand about AVX10.2 is that it is a VLA ISA (Vector Length Agnostic).
Once again it's wrong. Take a look at the example below. AVX512 and AVX10.2 compile to the same assembly. You can clearly see that both zmm and ymm registers are used. The ARM SVE example (furthest to the right) is true VLA example as nowhere does is make a distinction about the size of the target register. Everything is just Z{number}.s where s denotes we are dealing with 32b floats.NOTE: These are NOT in the instructions. The instructions are agnostic to the bit width supported. This is what makes it special.
Yes you did.If I have messed this up, someone let me know.
No. That is simply wrong. AVX10.2 is not VLA. The vector lengths are encoded into instructions. If code is compiled to use 128-bit registers, it will run on any AVX10.2 CPU, but always use 128-bit registers. If it is compiled to use 512-bit registers, it will only run on implementations with 512-bit registers.Before I go into ANY detail, the MOST important thing to understand about AVX10.2 is that it is a VLA ISA (Vector Length Agnostic). This is done EXPLICITLY to support the same compiled code across different bit width implementations of different CPU's PEROID. This is the entire POINT of the new instruction set.
This is incorrect. Any instruction may use any register size, but the register size is encoded into instructions.NOTE: These are NOT in the instructions. The instructions are agnostic to the bit width supported. This is what makes it special.
AMD has already announced support for AVX10.AMD definitely doesn't need the complication of AVX10.2 as long as compilers support AVX512. AMD has decided that having different architecture e and p cores isn't their strategy so it really doesn't help AMD at all (except for the new instructions in 10.2).

That's wrong. Like AVX512 it just offers 3 fixed lengths to choose from. That's nowhere near true VLA like ARM's SVE.
I stand corrected. Since this is true, what advantage does AVX10.2 have?No. That is simply wrong. AVX10.2 is not VLA. The vector lengths are encoded into instructions. If code is compiled to use 128-bit registers, it will run on any AVX10.2 CPU, but always use 128-bit registers. If it is compiled to use 512-bit registers, it will only run on implementations with 512-bit registers.
For what purpose I wonder?AMD has already announced support for AVX10.
They have the functionality already. In fact zen5 supports the most avx512 flags of any processor.For what purpose I wonder?
APX is another story. I think AMD should have full support of APX NO LATER than ZEN 7 (Really should be in Zen 6 IMO).
I would assume that APX could arrive with Zen 6. For Zen 6 AMD announced "new AI datatype support" and "more AI pipelines". That could be AMX and/or APX. But that is not clear to me. AMD does not state AMX or APX in any of their documents or blogposts (only Mark Papermaster had APX on one slide regarding x86 consortium, AMX was never shown). Zen 6 brings at least FP16 support and INT8 VNNI for AVX512 (where we have documented evidence).Zen 6 supports most everything in AVX10.2 including FP16, Bit Reversal, and 32 zmm registers?
The only question is will they add the APX (32 gprs) as well.
Source: https://www.techpowerup.com/342763/amd-zen-6-isa-to-bring-avx512-fp16-vnni-int8-and-more"A set of GNU compiler patches confirms this, as the new open-source enablement adds AVX512_BMM, AVX_NE_CONVERT, AVX_IFMA, AVX_VNNI_INT8, and AVX512_FP16 to GCC. These instructions are particularly interesting due to their intended use cases. AVX-512 BMM is designed for bit matrix manipulation, significantly accelerating local AI deployments. With native FP16 calculations and AVX VNNI in INT8 format on desktops, users will no longer need to rely on Intel Xeon CPUs for AVX-related development and process acceleration. This development benefits more users across the entire product family based on Zen 6. AMD is now directly competing with Intel in AVX development, meaning the difference between the two will come down to implementation strategy. There is also increasing evidence that Intel's next-generation "Nova Lake" will reintroduce AVX-512 functionality to desktops, suggesting that advanced vector and matrix acceleration is set to expand on consumer PCs."
Simplified feature detection. To be clear, it just doesn't add much over AVX512. The original plan was to have basically AVX-512 with different register lengths, and a runtime way of figuring out which code path to dispatch. The industry revolted, no-one wants the complication of shipping multiple code paths. What's left is just AVX-512 with a better feature detection scheme bolted on top. (Which no-one will use for decades, because there is a significant base of AMD AVX-512 hardware, both on desktop and in server, and people will use the old style of feature detection to make sure code runs on them too.)I stand corrected. Since this is true, what advantage does AVX10.2 have?
For the same purpose they have added all the previous Intel SIMD extensions. Compatibility means following even the missteps. It helps that it is very cheap to add, as it is just feature detection.For what purpose I wonder?
It is the standard that all future CPUs from AMD and Intel will implement.I stand corrected. Since this is true, what advantage does AVX10.2 have?
To put an end to the endless list of AVX512 subsets.For what purpose I wonder?
Cuz it's the latest SIMD ISA.For what purpose I wonder?
Ok, so semi-custom design costs will have to be amortised over 2-3 years rather than 7-8, on fairly new processes too - I don't know exactly how much it costs, but it should be pretty expensive, doubt it's less than $100 mln, more likely a few hundred millions: if you sell just 4-6 mln units then it's a big %.Oh that is the cute part, they can release upgraded versions every 2-3 years instead of 7.
Since y'know, it is a PC.
Selling cheap gaming PC is competing directly with OEMs that make Microsoft far more money from software sales.What?
Nah it won't be and in any case a lot more expensive than PS6, which is the primary market it will compete in.Well it is 36GB, and that is still way cheaper than the equiv desktop PC when it launches.
APX is another story. I think AMD should have full support of APX NO LATER than ZEN 7 (Really should be in Zen 6 IMO).
The Magnus SoC die can last 7 years just fine, just swap the GMD with a shiny new one.Ok, so semi-custom design costs will have to be amortised over 2-3 years rather than 7-8, on fairly new processes too - I don't know exactly how much it costs, but it should be pretty expensive, doubt it's less than $100 mln, more likely a few hundred millions: if you sell just 4-6 mln units then it's a big %.
Well it will likely lack all the freedoms of a DIY PC.Selling cheap gaming PC is competing directly with OEMs that make Microsoft far more money from software sales.
If you wanna play PC games and do (some) PC stuff, PS6 is not an option.Nah it won't be and in any case a lot more expensive than PS6, which is the primary market it will compete in.
It is Z7 with the full new featureset yes. Maybe by Z9 they can have a proper new AArch64 style clean sheet ISA.Zen6 seems highly unlikely to me, I'd love to be pleasantly surprised.
So you're thinking that Intel would also go along with that new ISA do-over?It is Z7 with the full new featureset yes. Maybe by Z9 they can have a proper new AArch64 style clean sheet ISA.
Depends on Unified core plans, that is ~Z8 time so could get interesting.So you're thinking that Intel would also go along with that new ISA do-over?
It's very unrealistic yes. Given APX was mostly spearhead by Intel and only finalized in the late 2024, APX support in Zen 6 seems very unlikely, the RTL would've been long done by then. CPU core roadmap from FAD '25 would've mentioned this as well.Zen6 seems highly unlikely to me, I'd love to be pleasantly surprised.