Question Zen 6 Speculation Thread


Fjodor2001

Diamond Member
Feb 6, 2010
4,158
564
126
Wasn't that one of the reasons Intel gave up on AVX512 on Alder Lake? You could never have AVX512 and E cores enabled at the same time, IIRC.
So do we even have a problem here then, or is it just a theoretical one? If Intel is also using the same ISA on E and P cores in practice, then this whole "AVX512 is only available on one of the core types" problem is moot.
 
Last edited:

marees

Golden Member
Apr 28, 2024
1,301
1,865
96
It should be 2nm. Unlike the Xbox, which releases next, the PS6 could easily be 2028, and by then 2nm prices would have stabilized.

Plus, Sony always has some custom features which might not make it into RDNA 5 in time.
I think the PS6 handheld will be on 3nm to save costs.
But the box console I would expect to be on 2nm (so that Sony can max out frequencies, as they usually do).
 
  • Like
Reactions: Tlh97 and Mopetar

StefanR5R

Elite Member
Dec 10, 2016
6,580
10,355
136
Perhaps it is possible for Windows to take topology hints from chipset drivers that would push storage I/O operations to the LPE cores that are located on the I/O die? It makes no sense to saturate the CCD link with disk I/O traffic that isn't necessarily needed by any of those cores. If the file system is doing housekeeping, or an AV scanner or auditing process is banging away at the SSD, why drag that stuff into the CCD? Even the LPE cores will be plenty fast enough to keep up with disk IO without pushing too high in the VF curve. Same thing with network I/O.
This would need a very deep understanding by the scheduler of what a thread actually does, at runtime.
I/O wait times of a thread should already be known to the kernel.
Might not work as intended, though, if the respective application has separate I/O initiator and data-processing threads.
Shouldn't really matter. I/O requests are made to the OS anyway in most situations. The OS should have a perfect understanding of what is a disk I/O request and what isn't.
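For illustration only, here is a minimal userspace sketch of that kind of policy, not how Windows or the Linux scheduler actually does it: watch a thread's block-I/O delay accounting and, once it is clearly spending its time waiting on the disk, pin it onto assumed LPE cores so the big CCD can stay power-gated. The LPE core IDs (14, 15) and the threshold are invented for the example.

[CODE]
/* Hedged sketch: a userspace helper that pins a thread onto assumed LPE
 * cores once it has accumulated noticeable block-I/O delay.
 * Build: cc -O2 -o ioaffine ioaffine.c */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* delayacct_blkio_ticks: field 42 of /proc/<tid>/stat, clock ticks the task
 * has spent blocked on block I/O (needs delay accounting enabled). */
static unsigned long long blkio_ticks(pid_t tid)
{
    char path[64], buf[2048];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)tid);
    FILE *f = fopen(path, "r");
    if (!f) return 0;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *p = strrchr(buf, ')');            /* comm may contain spaces */
    if (!p) return 0;
    p += 2;                                 /* now at field 3 (state) */
    for (int field = 3; field < 42; field++) {
        p = strchr(p, ' ');
        if (!p) return 0;
        p++;
    }
    return strtoull(p, NULL, 10);
}

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <tid>\n", argv[0]); return 1; }
    pid_t tid = (pid_t)atoi(argv[1]);

    if (blkio_ticks(tid) > 100) {           /* "mostly waiting on disk" */
        cpu_set_t lpe;
        CPU_ZERO(&lpe);
        CPU_SET(14, &lpe);                  /* hypothetical LPE core IDs */
        CPU_SET(15, &lpe);
        if (sched_setaffinity(tid, sizeof lpe, &lpe) != 0)
            perror("sched_setaffinity");
    }
    return 0;
}
[/CODE]

A real implementation would obviously live in the scheduler itself and would also migrate the thread back once it turns compute-heavy.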

To a certain extent, this is similar to what Apple was doing with the T1/T2 coprocessor on their PCs and laptops, functions that have now been migrated into the M series processors...
I was thinking of a multithreaded AV scanner: While it may be desired to put its computational threads on slow energy-saver cores, the OS might miss these threads if the OS only watches the I/O syscalls or I/O locks.

Edit:
Have we considered the possibility of OS-restricted cores? In other words, some cores are only available for OS-internal process usage. They don't need big FPUs or AVX2 and beyond, just mainly energy-efficient integer and I/O grunt.
This could address battery drain in connected standby,¹ but not drain caused by semi-idle userland (media playback, videoconferencing, web browser bloat...).

¹) And even this is not a trivial scenario. Does some hashing/encryption/decryption/compression/decompression have to be done in such modes? If yes, should the energy-saver cores have accelerators, or should the high-power core complexes be woken up for that?
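There is no Windows knob for "OS-only cores", but as a rough Linux-flavored illustration of the idea, a cgroup-v2 cpuset can confine all of userland to the big cores so that a couple of reserved cores only ever see kernel and system work. The paths and core IDs below are assumptions (cgroup v2 mounted at /sys/fs/cgroup, cpuset controller enabled on the parent), so treat it as a sketch rather than a recipe.

[CODE]
/* Sketch: reserve cores 12-13 for system/kernel work by confining userland
 * (user.slice) to cores 0-11. Assumes cgroup v2 at /sys/fs/cgroup with the
 * cpuset controller enabled on the parent; run as root. */
#include <stdio.h>

static int write_str(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return -1; }
    int rc = (fputs(value, f) == EOF) ? -1 : 0;
    fclose(f);
    return rc;
}

int main(void)
{
    /* Userland may only use the "big" cores 0-11... */
    write_str("/sys/fs/cgroup/user.slice/cpuset.cpus", "0-11");
    /* ...while the system slice keeps access to the reserved cores too. */
    write_str("/sys/fs/cgroup/system.slice/cpuset.cpus", "0-13");
    return 0;
}
[/CODE]

Kernel threads are not confined by cgroups at all, so they can still run on the reserved cores, which is roughly the "OS internal" split described above.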
 
Last edited:

LightningZ71

Platinum Member
Mar 10, 2017
2,370
2,990
136
My idea is more that the OS cores should be minimum spec for what the OS would need. Having support for the instructions needed for those specific types of activities would be in line with that idea.

As for "semi-idle" with userland running, you can still have a core or two that are full spec to support userland, but there's a lot of background stuff that Windows does all the time, and there are activities that it performs on behalf of userland processes. Here's the crazy part: if we're making these cores not fully compliant with the rest of them with respect to instruction support, there's no reason they even have to use the same base ISA. They could be RISC-V or ARM. They just have to be able to perform the needed functions. It's just likely easier to manage the code if it's still x86.
 

StefanR5R

Elite Member
Dec 10, 2016
6,580
10,355
136
The point of extra power-saver cores would be that there are periods during which the regular cores and associated portions of uncore can be powered off for a while.

As long as there is at least one regular core which cannot be powered down anyway, all of the CPU-non-intensive stuff of the operating system can be run on that core.

In other words: Would power-saver cores which only the OS itself can make use of actually save any noteworthy amount of power at all?
 

LightningZ71

Platinum Member
Mar 10, 2017
2,370
2,990
136
On their own, in an idle-but-still-doing-userland-stuff scenario, no. For ANY of this to work, there needs to be a "low power island" where the LP cores live, all etched with high-density, low-leakage transistors and sharing a low-power uncore. This has to be separate from the high-performance core complex, be it on a different die or a different section of the chip itself. So, in AMD's case, there can be a CCD that's etched for maximum performance, power be dXXXXd, and the I/O die can have a little section of cores for the OS, plus a userland core for anything important that can be used for those functions.
 

Saylick

Diamond Member
Sep 10, 2012
3,938
9,168
136
Looks like Playstation may also use chiplets:

View attachment 128028
lol, press X to doubt. 160W board power for RTX 4080 levels of raster? With N3P or N3C, that's too far-fetched IMO, especially since the CPU also eats into that power limit. Even if it's on some form of N2, which is costly, and consoles aren't usually on the bleeding-edge node, I don't believe N2 has enough perf/W improvement over N3P to allow 4080 levels of performance in a sub-160W envelope. The 4080 uses 300W under gaming loads, so for the PS6 to somehow magically get 2x the perf/W is just ludicrous. I could possibly believe this if it were 260W, not 160W.
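Putting rough numbers on that, under the same assumption of equal raster performance (so perf/W scales as 1/power):

[CODE]
/* Back-of-the-envelope check of the perf/W gap implied by the rumor,
 * assuming equal raster performance (so perf/W scales as 1/power). */
#include <stdio.h>

int main(void)
{
    double p_4080 = 300.0;   /* typical 4080 gaming draw, W */
    printf("needed at 160 W: %.2fx the 4080's perf/W\n", p_4080 / 160.0);
    printf("needed at 260 W: %.2fx the 4080's perf/W\n", p_4080 / 260.0);
    return 0;
}
[/CODE]

That works out to roughly 1.9x the 4080's perf/W at 160W, versus only about 1.15x at 260W.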
 

511

Diamond Member
Jul 12, 2024
3,230
3,170
106
lol, press X to doubt. 160W board power for RTX 4080 levels of raster? With N3P or N3C, that's too far-fetched IMO, especially since the CPU also eats into that power limit. Even if it's on some form of N2, which is costly, and consoles aren't usually on the bleeding-edge node, I don't believe N2 has enough perf/W improvement over N3P to allow 4080 levels of performance in a sub-160W envelope. The 4080 uses 300W under gaming loads, so for the PS6 to somehow magically get 2x the perf/W is just ludicrous. I could possibly believe this if it were 260W, not 160W.
So, consoles have special optimizations that regular dGPUs don't get, the 4080 was on N4, and this is most likely N3P/C. It's not far-fetched considering the hardware.
 
  • Like
Reactions: Tlh97

Saylick

Diamond Member
Sep 10, 2012
3,938
9,168
136
So, consoles have special optimizations that regular dGPUs don't get, the 4080 was on N4, and this is most likely N3P/C. It's not far-fetched considering the hardware.
Sure, if you want to compare raster performance on the PS6 with FSR4/PSSR v2 enabled against a 4080 without DLSS.
 

511

Diamond Member
Jul 12, 2024
3,230
3,170
106
Sure, if you want to compare raster performance on the PS6 with FSR4/PSSR v2 enabled against a 4080 without DLSS.
Don't forget we will get better hardware in RDNA5, so it's not that difficult, also considering a node shrink... wait a second, I take that back, I didn't read carefully: he said board power, as in the entire SoC power. Yeah, you are right, it's bull****.
 
  • Like
Reactions: Tlh97

yottabit

Golden Member
Jun 5, 2008
1,630
759
146
I'd love to see LP / LPE cores become a standard thing for x86, even on desktop. Would have loved to see Microsoft push that requirement instead of wasting sand on NPUs.

I'd leave the task handling to the engineers, but it would be great if just leaving my computer idle could drop package power to a couple of watts or less.

Android, iOS and macOS are able to take advantage of LPE cores, but… I don't really see it happening for Windows with the current trajectory. It might even be better to handle it all in hardware and keep those cores invisible to the OS than to rely on Windows scheduling.
 

511

Diamond Member
Jul 12, 2024
3,230
3,170
106
I'd love to see LP / LPE cores become a standard thing for x86, even on desktop. Would have loved to see Microsoft push that requirement instead of wasting sand on NPUs.

I'd leave the task handling to the engineers, but it would be great if just leaving my computer idle could drop package power to a couple of watts or less.

Android, iOS and macOS are able to take advantage of LPE cores, but… I don't really see it happening for Windows with the current trajectory. It might even be better to handle it all in hardware and keep those cores invisible to the OS than to rely on Windows scheduling.
It's happening with Nova Lake and Zen 6.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,158
564
126
Also, if you are familiar with Linux kernel code, then it would be polite to point to the specific parts you are referring to, as it is an enormous repository and not everyone knows how to navigate it. Otherwise you give off the impression that you are just throwing links around to appear smart.
Nah, it's only ~40M lines of code and you've got the whole weekend to investigate it. Also, don't forget to fill in the TPS report. ;)
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,158
564
126
No, but seriously, it was a few years ago that I last poked around in the Linux kernel and driver code, and even then I was not involved in scheduler code specifically. I could probably dig out code related to how AVX512 is handled, but it would likely take some time (unless I got lucky and found it quickly). And I intend to spend the weekend doing other stuff. :)

However, I found an interesting related research paper suggesting a method to deal with the frequency reduction that occurs when executing AVX512 instructions. More specifically, on CPUs with HT, when one thread executes AVX512 instructions and lowers the core frequency, the frequency reduction also negatively affects the sibling thread on that core, even though it does not execute AVX512 instructions. See this paper:


Basically, they divide the cores into those that execute AVX512 instructions and those that don't. They do this even though all cores actually support AVX512, as I understand it. Then they migrate threads between cores as needed, using a method that they propose.

It's too much to describe it all here, but check out section 3 which describes it in more detail. See also the Abstract in [1] below.

[1] "Automatic Core Specialization for AVX-512 Applications", Abstract:
"Advanced Vector Extension (AVX) instructions operate on wide SIMD vectors. Due to the resulting high power consumption, recent Intel processors reduce their frequency when executing complex AVX2 and AVX-512 instructions.Following non-AVX code is slowed down by this frequency reduction in two situations: When it executes on the sibling hyperthread of the same core in parallel or – as restoring the non-AVX frequency is delayed – when it directly follows the AVX2/AVX-512 code. As a result, heterogeneous workloads consisting of AVX-512 and non-AVX code are frequently slowed down by 10% on average.

In this work, we describe a method to mitigate the frequency reduction slowdown for workloads involving AVX-512 instructions in both situations. Our approach employs core specialization and partitions the CPU cores into AVX-512 cores and non-AVX-512 cores, and only the former execute AVX-512 instructions so that the impact of potential frequency reductions is limited to those cores. To migrate threads to AVX-512 cores, we configure the non-AVX-512 cores to raise an exception when executing AVX-512 instructions. We use a heuristic to determine when to migrate threads back to non-AVX-512 cores. Our approach is able to reduce the frequency reduction overhead by 70% for an assortment of common benchmarks."
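To make the mechanism a bit more concrete, here is a heavily simplified userspace sketch of the idea; the paper actually implements it inside the kernel. The assumption is that AVX-512 has been disabled on the "non-AVX-512" cores so that executing such an instruction there raises SIGILL; the handler then migrates the thread onto the AVX-512 partition and simply returns, which re-executes the faulting instruction on an allowed core. The core IDs are made up, and the migrate-back heuristic is left out.

[CODE]
/* Heavily simplified userspace sketch of core specialization as described
 * in the paper; the real mechanism (and the migrate-back heuristic) lives
 * in the kernel. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>

static cpu_set_t avx512_cores;

static void on_sigill(int sig)
{
    (void)sig;
    /* Pin the current thread to the AVX-512 partition; returning from the
     * handler retries the faulting instruction there. */
    sched_setaffinity(0, sizeof avx512_cores, &avx512_cores);
}

int main(void)
{
    CPU_ZERO(&avx512_cores);
    CPU_SET(0, &avx512_cores);      /* assumed AVX-512 cores: 0 and 1 */
    CPU_SET(1, &avx512_cores);

    struct sigaction sa = { .sa_handler = on_sigill };
    sigaction(SIGILL, &sa, NULL);

    /* ... an AVX-512 heavy kernel would run here; on a non-AVX-512 core it
     * faults once, gets migrated, and then carries on. */
    printf("handler installed\n");
    return 0;
}
[/CODE]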
 
Last edited:

Abwx

Lifer
Apr 2, 2011
11,854
4,829
136
lol, press X to doubt. 160W board power for RTX 4080 levels of raster? With N3P or N3C, that's too far-fetched IMO, especially since the CPU also eats into that power limit. Even if it's on some form of N2, which is costly, and consoles aren't usually on the bleeding-edge node, I don't believe N2 has enough perf/W improvement over N3P to allow 4080 levels of performance in a sub-160W envelope. The 4080 uses 300W under gaming loads, so for the PS6 to somehow magically get 2x the perf/W is just ludicrous. I could possibly believe this if it were 260W, not 160W.
At the same perf, N2 has 40-45% lower power than N4, so 300W would shrink to 165-180W;
that's what can be expected from a shrunk stock 9070 XT, and eventually 140-150W for a 9070. Likewise, a 24C Zen 6 should have about 70-75% higher throughput than the 9950X at the same power, assuming an 11% MT IPC uplift.
 

Abwx

Lifer
Apr 2, 2011
11,854
4,829
136
160W should be enough for an N2P-shrunk 56 CU 9070 + CPU processing power.

The average for a 9070 is 220W; with N2P this yields about 115W. The remaining 45W out of 160W is enough for the CPU, whatever the uarch, since they, for sure, won't design something less efficient than the current gen.
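Spelling that budget out, with the iso-performance power factor being the post's assumption rather than foundry data:

[CODE]
/* Back-of-the-envelope check of the budget above (numbers from the post). */
#include <stdio.h>

int main(void)
{
    double p_9070 = 220.0;   /* average gaming draw of a 9070, W */
    double gpu_n2p = 115.0;  /* the post's N2P estimate for the same GPU, W */
    double budget = 160.0;   /* rumored total board power, W */

    printf("implied iso-perf power reduction: %.0f%%\n",
           (1.0 - gpu_n2p / p_9070) * 100.0);             /* ~48% */
    printf("left for the CPU: %.0f W\n", budget - gpu_n2p); /* 45 W */
    return 0;
}
[/CODE]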
 
  • Like
Reactions: marees

511

Diamond Member
Jul 12, 2024
3,230
3,170
106
160W should be enough for an N2P-shrunk 56 CU 9070 + CPU processing power.

The average for a 9070 is 220W; with N2P this yields about 115W. The remaining 45W out of 160W is enough for the CPU, whatever the uarch, since they, for sure, won't design something less efficient than the current gen.
The GPU is N3P and the CPU N2/N2P, according to MLID rumors.
 
Last edited: