
Question Zen 6 Speculation Thread

Page 407
Unified is Zen 7 time for client.

Zen 6 ISA is final: it's APX and AVX10.1, alongside FRED

‘novalake’
Intel Nova Lake CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, AES, PREFETCHW, PCLMUL, RDRND, XSAVE, XSAVEC, XSAVES, XSAVEOPT, FSGSBASE, PTWRITE, RDPID, SGX, GFNI-SSE, CLWB, MOVDIRI, MOVDIR64B, WAITPKG, ADCX, AVX, AVX2, BMI, BMI2, F16C, FMA, LZCNT, PCONFIG, PKU, VAES, VPCLMULQDQ, SERIALIZE, HRESET, AVX-VNNI, UINTR, AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD, AVXVNNIINT16, SHA512, SM3, SM4, PREFETCHI, APX_F, AVX10.1, AVX10.2 and MOVRS instruction set support.

‘znver6’
AMD Family 1ah core based CPUs with x86-64 instruction set support. (This supersets BMI, BMI2, CLWB, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED, MWAITX, SHA, CLZERO, AES, PCLMUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, RDPID, WBNOINVD, PKU, VPCLMULQDQ, VAES, AVX512F, AVX512DQ, AVX512IFMA, AVX512CD, AVX512BW, AVX512VL, AVX512BF16, AVX512VBMI, AVX512VBMI2, AVX512VNNI, AVX512BITALG, AVX512VPOPCNTDQ, GFNI, AVXVNNI, MOVDIRI, MOVDIR64B, AVX512VP2INTERSECT, PREFETCHI, AVXVNNIINT8, AVXIFMA, AVX512FP16, AVXNECONVERT, AVX512BMM and 64-bit instruction set extensions.)
 
32 GPRs make compiler engineers' lives a bit easier, and that's it.
Well, I am not a compiler engineer, and I would love to have 32 GPRs when writing my SIMD code, to be able to hold more pointers for SIMD loads in registers, loop control variables, etc. With the current 16 it takes some gymnastics to avoid spills.

Not to mention that with 32 you can afford to spend some of them on quality-of-life improvements, like having frame pointers always on instead of always off.
 
Well, I am not a compiler engineer, and I would love to have 32 GPRs when writing my SIMD code, to be able to hold more pointers for SIMD loads in registers, loop control variables, etc. With the current 16 it takes some gymnastics to avoid spills.

Not to mention that with 32 you can afford to spend some of them on quality-of-life improvements, like having frame pointers always on instead of always off.
Nova Lake buyer confirmed
 
Well, I am not a compiler engineer, and I would love to have 32 GPRs when writing my SIMD code, to be able to hold more pointers for SIMD loads in registers, loop control variables, etc. With the current 16 it takes some gymnastics to avoid spills.

Not to mention that with 32 you can afford to spend some of them on quality-of-life improvements, like having frame pointers always on instead of always off.
When did you realise you needed more registers, i.e. in which year?

Serious question.
 
Considering everything is held in physical register files that are already 224+ entries, it really comes down to doubling the register alias table and implementing decode of the APX instructions. Seems pretty probable.

I also think it is doable, but is it probable?

Doable as a microcode update. But it may be a bit risky; AMD will let this one go until Zen 7.
 
When did you realise you needed more registers, i.e. in which year?

Serious question.
After I got Zen 4 and ported my homebrew FFT code to use a radix-8 kernel (which itself was possible because AVX-512 gives you 32 regs). So after Nov 2022. I can't tell you exactly what day, but while tuning the code I spotted that clang was spilling a lot of GPRs to the stack (roughly, it was precomputing strided addresses ahead of time, partially spilling them to the stack, and later reading them back). When I rewrote the loop to force it to compute the addresses as it goes, I got some perf back.

If it had more GPRs available, it would be able to keep them all in registers together with the loop control and other aux stuff.

So yes, in that instance it was possible to work around the problem, but having more GPRs would make my life simpler 😉

In general, from a software point of view, having more of them never hurts. Only when you think about the HW implementation does this become a game of trade-offs 😉
 
1991, when I took my assembly-level programming class in college. Looks like we will get them by 2031 🙂
So you never evolved since then?

Although there may be some routines that can do register magic and catch them all.

Most operations do not need that many and can actually just use and forget (register renaming will take care of it). Moreover, the fact that you can address memory for a single use makes use-and-forget even easier. This is, in my opinion, the main advantage of x86.
 
After I got Zen 4 and ported my homebrew FFT code to use a radix-8 kernel (which itself was possible because AVX-512 gives you 32 regs). So after Nov 2022. I can't tell you exactly what day, but while tuning the code I spotted that clang was spilling a lot of GPRs to the stack (roughly, it was precomputing strided addresses ahead of time, partially spilling them to the stack, and later reading them back). When I rewrote the loop to force it to compute the addresses as it goes, I got some perf back.

If it had more GPRs available, it would be able to keep them all in registers together with the loop control and other aux stuff.

So yes, in that instance it was possible to work around the problem, but having more GPRs would make my life simpler 😉

In general, from a software point of view, having more of them never hurts. Only when you think about the HW implementation does this become a game of trade-offs 😉
Good write-up; looks like a future Zen CPU with APX will make your life easier. Also, it's fine if you don't remember the day lol 😂
 
After I got Zen 4 and ported my homebrew FFT code to use a radix-8 kernel (which itself was possible because AVX-512 gives you 32 regs). So after Nov 2022. I can't tell you exactly what day, but while tuning the code I spotted that clang was spilling a lot of GPRs to the stack (roughly, it was precomputing strided addresses ahead of time, partially spilling them to the stack, and later reading them back). When I rewrote the loop to force it to compute the addresses as it goes, I got some perf back.

If it had more GPRs available, it would be able to keep them all in registers together with the loop control and other aux stuff.

So yes, in that instance it was possible to work around the problem, but having more GPRs would make my life simpler 😉

In general, from a software point of view, having more of them never hurts. Only when you think about the HW implementation does this become a game of trade-offs 😉

So there is another ATer smashing their face into FFT programming. I don't force all my calculations into registers; I use a ping-pong buffer for those radix stages. I may go back and code it with just registers, but my bottlenecks really lie below.

radix16-512 no issues.

From 1024 on, the thing that's kicking my ass is way and spinlock (barrier) management. If you're grabbing data from greater-than-64K strides and have the twiddles and the spinlock each taking a way, you can easily blow your way budget.

On Zen 4 and 5 this isn't an issue, as you have 12-16 ways, but you have to code for the lowest common denominator.
 
I also think it is doable, but is it probable?

Doable as a microcode update. But it may be a bit risky; AMD will let this one go until Zen 7.
I think you are likely correct.... but I kinda wish you weren't 😉.
So you never evolved since then?

Although there may be some routines that can do register magic and catch them all.

Most operations do not need that many and can actually just use and forget (register renaming will take care of it). Moreover, the fact that you can address memory for a single use makes use-and-forget even easier. This is, in my opinion, the main advantage of x86.
Well, kind of.

I haven't written much asm for a very long time. I write C as little as possible and generally stay in C++ most of the time (with a little smattering of Flutter/Dart in there just to really confuse the soul).

Even in the embedded stuff I oversee, this level of code is generally only in the startup code on the micro (register setup of the chip's pins and interfaces), and even that is now mostly done by configuration tools provided by the micro OEM (kids have it so easy these days) 🙂.
 
I think you are likely correct.... but I kinda wish you weren't 😉.

Well, kind of.

I haven't written much asm for a very long time. I write C as little as possible and generally stay in C++ most of the time (with a little smattering of Flutter/Dart in there just to really confuse the soul).

Even in the embedded stuff I oversee, this level of code is generally only in the startup code on the micro (register setup of the chip's pins and interfaces), and even that is now mostly done by configuration tools provided by the micro OEM (kids have it so easy these days) 🙂.
I don't write assembly anymore; intrinsics made that a thing of the past. But I do look at the disassembly of my code. You can basically write assembly-level code with tight C/C++.
 