> Depends on Unified core plans, that is ~Z8 time so could get interesting.
Unified is Zen 7 time for client.
Zen 6 ISA is final; it's APX 10.1 alongside FRED.
> Unified is Zen 7 time for client.
Uhh no, it is firmly a year after at minimum.
> Uhh no, it is firmly a year after at minimum.
Unified? Nope, it's around the Zen 7 timeline. Not firmly a year; it would be, like, a quarter at most.
> What's left is just AVX-512 with a better feature detection scheme bolted on top.
That's kinda what I was thinking as well.
> That's kinda what I was thinking as well.
32 GPRs make compiler engineers' lives a bit easier and that's it.
x86-64 would greatly benefit from APX though. The GROSSLY outdated 16 GPRs needed to go decades ago. Can't believe this has been around so long!
> 32 GPRs make compiler engineers' lives a bit easier and that's it.
Well, I am not a compiler engineer, and I would love to have 32 GPRs when writing my SIMD code, to be able to hold more pointers for SIMD loads in registers, loop control variables, etc. With the current 16 it takes some gymnastics to avoid spills.
> Well, I am not a compiler engineer, and I would love to have 32 GPRs when writing my SIMD code, to be able to hold more pointers for SIMD loads in registers, loop control variables, etc. With the current 16 it takes some gymnastics to avoid spills.
> Not to mention, with 32 you can afford to spend some of them on quality-of-life improvements like having frame pointers always on instead of always off.
Nova Lake buyer confirmed
> Nova Lake buyer confirmed
Makes no economic sense for me to buy that unless it blows Zen6 out of the water 😉
> From what I've checked, APX_F doesn't necessarily mean full 32 GPR support. It may be partial support.
You know, the official docs explain that just fine. The AI model you used is hallucinating pretty badly...
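For anyone who wants to check the bits rather than argue about them, here is a minimal sketch of the two tests involved. It assumes, from my reading of Intel's APX architecture specification, that APX_F is reported in CPUID.(EAX=07H,ECX=1):EDX bit 21 and that the OS enables the R16-R31 state through XCR0 bit 19; please verify both against the current spec. The function names are made up, and the helpers only decode values you pass in, so nothing here executes CPUID or XGETBV itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; check them against the current Intel APX spec. */
#define APX_F_EDX_BIT 21u /* CPUID.(EAX=7,ECX=1):EDX[21] */
#define XCR0_APX_BIT  19u /* XCR0[19]: OS saves/restores the extended GPRs */

/* True if the CPUID leaf 7, subleaf 1 EDX value advertises APX_F. */
bool cpu_reports_apx_f(uint32_t leaf7_sub1_edx) {
    return ((leaf7_sub1_edx >> APX_F_EDX_BIT) & 1u) != 0;
}

/* True if the XCR0 value shows the OS has enabled the APX register state. */
bool os_enables_apx_state(uint64_t xcr0) {
    return ((xcr0 >> XCR0_APX_BIT) & 1u) != 0;
}

/* Code may only touch R16-R31 when both the CPU and the OS agree. */
bool apx_usable(uint32_t leaf7_sub1_edx, uint64_t xcr0) {
    return cpu_reports_apx_f(leaf7_sub1_edx) && os_enables_apx_state(xcr0);
}

/* Usage example with made-up register values (not read from real hardware). */
void apx_detection_example(void) {
    uint32_t edx_with_apx  = 1u << APX_F_EDX_BIT;
    uint64_t xcr0_avx_only = 0x7; /* x87 | SSE | AVX */
    uint64_t xcr0_with_apx = xcr0_avx_only | (1ull << XCR0_APX_BIT);

    assert(!apx_usable(edx_with_apx, xcr0_avx_only)); /* CPU yes, OS no */
    assert(apx_usable(edx_with_apx, xcr0_with_apx));  /* both agree */
    assert(!apx_usable(0u, xcr0_with_apx));           /* CPU does not report it */
}
```

On real hardware the inputs would come from the compiler's CPUID and XGETBV helpers; those are left out so the sketch builds on any target.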
Sony ain't releasing PC ports ever again.
> Well, I am not a compiler engineer, and I would love to have 32 GPRs when writing my SIMD code, to be able to hold more pointers for SIMD loads in registers, loop control variables, etc. With the current 16 it takes some gymnastics to avoid spills.
> Not to mention, with 32 you can afford to spend some of them on quality-of-life improvements like having frame pointers always on instead of always off.
When did you realise you needed more registers, i.e. in which year?
> When did you realise you needed more registers, i.e. in which year?
> Serious question.
1991, when I took my assembly-level programming class in college. Looks like we will get them by 2031 🙂
Considering everything is held in physical register files with 224+ entries, it really comes down to doubling the register alias table and implementing the decode of the APX instructions. Seems pretty probable.
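To make the "double the alias table, keep the physical file" point concrete, here is a toy model of a register alias table. It is a sketch under loud assumptions: the 224-entry physical file is just the figure from the post, the `Rat` type and every function name are invented for illustration, and real renamers (checkpoints, retirement, move elimination) are far more involved.

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_REGS 224 /* "224+" from the post; fixed at 224 for this toy */
#define MAX_ARCH  32  /* 16 GPRs today, 32 with APX */

typedef struct {
    int      num_arch;             /* number of architectural GPRs */
    uint16_t map[MAX_ARCH];        /* architectural reg -> physical reg */
    uint16_t free_list[PHYS_REGS]; /* stack of unused physical regs */
    int      free_count;
} Rat;

/* Map arch reg i to physical reg i and put the rest on the free list. */
void rat_init(Rat *r, int num_arch) {
    assert(num_arch > 0 && num_arch <= MAX_ARCH);
    r->num_arch = num_arch;
    for (int i = 0; i < num_arch; i++)
        r->map[i] = (uint16_t)i;
    r->free_count = 0;
    for (int p = PHYS_REGS - 1; p >= num_arch; p--)
        r->free_list[r->free_count++] = (uint16_t)p;
}

/* Rename a write to `arch`: take a fresh physical reg, report the old one
 * through *old so it can be released later. Returns -1 if none is free. */
int rat_rename_dest(Rat *r, int arch, uint16_t *old) {
    assert(arch >= 0 && arch < r->num_arch);
    if (r->free_count == 0)
        return -1; /* a real core would stall here */
    *old = r->map[arch];
    r->map[arch] = r->free_list[--r->free_count];
    return r->map[arch];
}

/* Return a physical reg to the free list once its old value is dead. */
void rat_release(Rat *r, uint16_t phys) {
    assert(r->free_count < PHYS_REGS);
    r->free_list[r->free_count++] = phys;
}
```

In this model going from 16 to 32 architectural registers only grows `map`; the physical file is untouched, but the pool left over for in-flight renames shrinks from 208 to 192 entries, which is one of the hardware trade-offs involved.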
> When did you realise you needed more registers, i.e. in which year?
> Serious question.
After I got Zen4, I ported my homebrew FFT code to use a Radix8 kernel (which itself was possible because AVX512 gives you 32 regs). So after Nov 2022. I cannot tell you exactly what day, but while tuning the code I spotted that clang was spilling a lot of GPRs to the stack (let's say it was precomputing strided addresses ahead of time, partially spilling them to the stack and later reading them back). When I rewrote the loop to force it to compute the addresses as it goes, I got some perf back.
> I also think it is doable, but is it probable?
> Doable as a microcode update. But it may be a bit risky; AMD will let this one go until Zen 7.
Zen6 doesn't have APX or AVX10.2.
> 1991, when I took my assembly-level programming class in college. Looks like we will get them by 2031 🙂
So you never evolved since then?
> After I got Zen4, I ported my homebrew FFT code to use a Radix8 kernel (which itself was possible because AVX512 gives you 32 regs). So after Nov 2022. I cannot tell you exactly what day, but while tuning the code I spotted that clang was spilling a lot of GPRs to the stack (let's say it was precomputing strided addresses ahead of time, partially spilling them to the stack and later reading them back). When I rewrote the loop to force it to compute the addresses as it goes, I got some perf back.
> If it had more GPRs available, it would be able to keep them all in GPRs together with the loop control and other aux stuff.
> So yes, in that instance it was possible to work around the problem, but having more GPRs would make my life simpler 😉
> In general, having more of them never hurts from a software point of view. Only when you think about the HW implementation does this become a game of trade-offs 😉
Good write-up; looks like a future Zen CPU with APX will make your life easier. Also, it's fine if you don't remember the day lol 😂
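For readers who have not hit this, the two loop shapes described in the quoted post look roughly like the scalar sketch below. This is not the poster's FFT kernel; the function names, the eight-way stride and the plain summation are all assumptions chosen to show the register pressure. Variant A keeps eight strided pointers live across the loop, variant B keeps one base pointer and a stride and forms each address when it is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Variant A: eight strided addresses are computed up front and stay live
 * for the whole loop, on top of the index, the bound and the accumulator. */
float sum_precomputed(const float *base, size_t stride, size_t n) {
    assert(base != NULL);
    const float *p0 = base;
    const float *p1 = base + 1 * stride;
    const float *p2 = base + 2 * stride;
    const float *p3 = base + 3 * stride;
    const float *p4 = base + 4 * stride;
    const float *p5 = base + 5 * stride;
    const float *p6 = base + 6 * stride;
    const float *p7 = base + 7 * stride;
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += p0[i] + p1[i] + p2[i] + p3[i] + p4[i] + p5[i] + p6[i] + p7[i];
    return acc;
}

/* Variant B: only the base and the stride stay live; each address is
 * derived from the previous one right before the load. */
float sum_on_the_fly(const float *base, size_t stride, size_t n) {
    assert(base != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        const float *p = base + i;
        for (int k = 0; k < 8; k++) {
            acc += *p;
            p += stride;
        }
    }
    return acc;
}
```

Whether a given compiler actually spills in variant A depends on the compiler version, the flags and the surrounding code, so the disassembly is the only reliable check. The two variants also add in a different order, so with general floating-point data their results can differ by rounding.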
> Zen6 doesn't have APX or AVX10.2.
> You guys need to stop doing this fantasy stuff.
Great, now we are at a point where people are saying new ISA extensions can be delivered via microcode updates.
> I also think it is doable, but is it probable?
> Doable as a microcode update. But it may be a bit risky; AMD will let this one go until Zen 7.
I think you are likely correct... but I kinda wish you weren't 😉.
> So you never evolved since then?
> Although there may be some routines that can do register magic and catch them all.
> Most operations do not need that many and can actually just use and forget (register renaming will take care of it). Moreover, the fact that you can address memory for single use makes use-and-forget even easier. This is, in my opinion, the main advantage of x86.
> I don't write assembly anymore; intrinsics made that a thing of the past, but I do look at the disassembly of my code. You can basically write assembly-level code with tight C/C++.
Well, kind of.
I haven't written much asm for a very long time. I write C as little as possible and generally stay in C++ most of the time (with a little smattering of Flutter/Dart in there just to really confuse the soul).
Even in the embedded stuff I oversee, this level of code is generally only in the startup code on the micro (register setup of the chip's pins and interfaces)... and even this is now mostly done by configuration tools provided by the micro OEM (kids have it so easy these days) 🙂.
> Great, now we are at a point where people are saying new ISA extensions can be delivered via microcode updates.
What, who said that? lmao