Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

Ajay · May 18, 2024

IEC said:
That's setting yourself up for disappointment.

I don't expect both a 40% IPC gain (already very high) and an fMax boost in the same generation. If they manage to pull a rabbit out of a hat and complete a magic trick to do that, Zen 5 will be unobtanium.

Eh, all that needs to happen is for *one* benchmark to increase by more than 30% and it's winner, winner, chicken dinner. All the leakers will simultaneously orgasm.

Markfw · May 18, 2024

Ajay said:
Eh, all that needs to happen is for *one* benchmark to increase by more than 30% and it's winner, winner, chicken dinner. All the leakers will simultaneously orgasm.

The leaker that said 39% said it was in one specific benchmark. So ????

cherullo · May 18, 2024

A few months back, an AMD patent about an op-cache that could spill its contents into the L1I made the rounds in the forums:

Method and apparatus for virtualizing the micro-op cache - https://patents.justia.com/patent/11586441

Rather than dropping the evicted micro-operations, the evicted micro-operations are written to the conventional cache subsystem.

There is a "pre-decode cache" which stores whether a cache line stores instructions or uops. There may be distinct pre-decode caches covering each cache level (L1I, L2, L3) or a single, global cache.
Since instructions are usually denser than uops, the patent lists some possibilities upon eviction of uops from the uop-cache: compressing immediate values in the uop cache entry, using two cache lines (two ways) simultaneously, or simply discarding the decoded uops.
When uops are evicted from the L3, the patent doesn't say whether they should be written to memory. So this is volatile and not Denver-like.
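
To make the eviction options above concrete, here is a toy Python sketch. It is only an illustration, not the patent's mechanism: the class name, the 64-byte line size, the LRU policy, and the size thresholds are all assumptions, and the immediate-compression option is left out.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size, not taken from the patent


class ToyVirtualizedOpCache:
    """Hypothetical model: an LRU uop cache that spills victims into the L1I."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.uop_cache = OrderedDict()  # addr -> size of the decoded uops in bytes
        self.l1i = {}                   # addr -> ("uops", ways used); the tag plays the pre-decode role
        self.stats = {"one_line": 0, "two_ways": 0, "dropped": 0}

    def insert(self, addr, uop_bytes):
        """Install decoded uops and evict least-recently-inserted entries if over capacity."""
        self.uop_cache[addr] = uop_bytes
        self.uop_cache.move_to_end(addr)
        while len(self.uop_cache) > self.capacity:
            victim, size = self.uop_cache.popitem(last=False)
            self._spill(victim, size)

    def _spill(self, addr, size):
        """Pick one of the eviction outcomes based on how large the uop entry is."""
        if size <= LINE_BYTES:
            self.l1i[addr] = ("uops", 1)   # fits in a single L1I line
            self.stats["one_line"] += 1
        elif size <= 2 * LINE_BYTES:
            self.l1i[addr] = ("uops", 2)   # occupies two lines (two ways)
            self.stats["two_ways"] += 1
        else:
            self.stats["dropped"] += 1     # too large: discard, decode again later

    def lookup(self, addr):
        """Report where the uops would come from; does not update LRU state."""
        if addr in self.uop_cache:
            return "op_cache"
        if addr in self.l1i:
            return "l1i_uops"              # spilled uops found, decode is skipped
        return "miss"                      # fetch x86 bytes and decode again
```

For example, with `capacity=2`, inserting a 48-byte, a 100-byte and a 200-byte entry followed by two more entries spills the first into one line, the second into two ways, and drops the third.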

While I was looking for it, I also found this:
Processor with multiple op cache pipelines - https://patents.justia.com/patent/11907126
Basically, the op-cache would be able to retrieve uops from two different addresses on the same clock. One possibility would be to fetch the instructions up to a branch and the instructions following the target address.
AFAIK current Zens can't do this (even for branches not taken), so branches do reduce the instruction throughput out of the uop-cache. Could this be what was called Zero-Bubble Branch Predictor in the Zen5 slides?
Another interesting possibility that Zen4 can't do is to fetch uops from different threads in the same cycle. This behavior can be steered by external policies (QoS) and how full each downstream uop queue is. Could make SMT even more efficient.
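
A rough way to see why a second op-cache pipeline would help with branchy code is the toy cycle counter below. Everything in it is an assumption for illustration: the function name, the 8-uop delivery width, and the rule that each pipeline reads from one basic block per cycle. It does not describe any real Zen core.

```python
def cycles_to_fetch(block_sizes, width=8, pipelines=1):
    """Count cycles needed to deliver all uops from the op-cache.

    block_sizes: uops per basic block, each block ending in a taken branch.
    width:       max uops delivered per cycle in total (assumed value).
    pipelines:   how many different addresses can be read per cycle.
    """
    if width < 1 or pipelines < 1:
        raise ValueError("width and pipelines must be at least 1")
    remaining = list(block_sizes)
    i = 0          # index of the block currently being fetched
    cycles = 0
    while i < len(remaining):
        budget = width       # uops still deliverable this cycle
        ports = pipelines    # addresses still readable this cycle
        while i < len(remaining) and budget > 0 and ports > 0:
            take = min(budget, remaining[i])
            remaining[i] -= take
            budget -= take
            ports -= 1       # this pipeline is used up for the cycle
            if remaining[i] == 0:
                i += 1       # block done, continue at the branch target
        cycles += 1
    return cycles


# Four 3-uop blocks: one pipeline stops at every taken branch,
# two pipelines can also fetch from the branch target in the same cycle.
single = cycles_to_fetch([3, 3, 3, 3], pipelines=1)
dual = cycles_to_fetch([3, 3, 3, 3], pipelines=2)
```

In this model the gain only shows up when blocks are shorter than the delivery width; a long straight-line block takes the same number of cycles either way.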

So, did we get any more information about it? Maybe LLVM or Linux kernel patches with related info?
Do you guys think that any of this made it into Zen5?

Markfw · May 18, 2024

cherullo said:
A few months back, an AMD patent about an op-cache that could spill its contents into the L1I made the rounds in the forums:

Method and apparatus for virtualizing the micro-op cache - https://patents.justia.com/patent/11586441

There is a "pre-decode cache" which stores whether a cache line stores instructions or uops. There may be distinct pre-decode caches covering each cache level (L1I, L2, L3) or a single, global cache.
Since instructions are usually denser than uops, the patent lists some possibilities upon eviction of uops from the uop-cache: compressing immediate values in the uop cache entry, using two cache lines (two ways) simultaneously, or simply discarding the decoded uops.
When uops are evicted from the L3, the patent doesn't say whether they should be written to memory. So this is volatile and not Denver-like.

While I was looking for it, I also found this:
Processor with multiple op cache pipelines - https://patents.justia.com/patent/11907126
Basically, the op-cache would be able to retrieve uops from two different addresses on the same clock. One possibility would be to fetch the instructions up to a branch and the instructions following the target address.
AFAIK current Zens can't do this (even for branches not taken), so branches do reduce the instruction throughput out of the uop-cache. Could this be what was called Zero-Bubble Branch Predictor in the Zen5 slides?
Another interesting possibility that Zen4 can't do is to fetch uops from different threads in the same cycle. This behavior can be steered by external policies (QoS) and how full each downstream uop queue is. Could make SMT even more efficient.

So, did we get any more information about it? Maybe LLVM or Linux kernel patches with related info?
Do you guys think that any of this made it into Zen5?

Zen 5 was sampling months ago, no way any of that is in it. Zen 6, maybe even Zen 7 is where it might be.

cherullo · May 18, 2024

These patents are from late 2020. Too soon?

Markfw · May 18, 2024

cherullo said:
These patents are from late 2020. Too soon?

You said a few months back. 2020? Then it's a maybe on Zen 5. I will wait a few weeks to see the real info.

Kepler_L2 · May 18, 2024

cherullo said:
These patents are from late 2020. Too soon?

There are patents from 2022 in Zen5. Definitely not too soon.

del42sa · May 19, 2024

Kepler_L2 said:
There are patents from 2022 in Zen5. Definitely not too soon.

too late for ZEN4

adroc_thurston · May 19, 2024

del42sa said:
too late for ZEN4

?

biostud · May 19, 2024

With the price cuts of zen4 processors, they are probably sure that they won't sell as well once Zen5 is launched.

adroc_thurston · May 19, 2024

biostud said:
With the price cuts of zen4 processors, they are probably sure that they won't sell as well once Zen5 is launched.

Yeah, not contained to desktop either.

branch_suggestion · May 19, 2024

biostud said:
With the price cuts of zen4 processors, they are probably sure that they won't sell as well once Zen5 is launched.

Every current AMD client part will go LOD before 2026, possibly before H2 2025.

Fjodor2001 · May 19, 2024

branch_suggestion said:
Every current AMD client part will go LOD before 2026, possibly before H2 2025.

LOD=? in this context?

biostud · May 19, 2024

adroc_thurston said:
Yeah, not contained to desktop either.

Since 12th Gen Core laptops are still being sold, I think we can find Zen 4 laptops for the foreseeable future.

branch_suggestion · May 19, 2024

Fjodor2001 said:
LOD=? in this context?

Last Order Date.

The chart here aligns more with software but hardware operates similarly.
For example, LOD would be the date AMD stops ordering wafers of a specific product from TSMC, usually when customers stop requesting it or when a new product has obsoleted it.

leoneazzurro · May 19, 2024

biostud said:
Since 12th Gen Core laptops are still being sold, I think we can find Zen 4 laptops for the foreseeable future.

True, I just got an MSI Katana 15 with Alder Lake and a 4070 for 1100€. These machines and the Zen4 ones are still very capable, so with the right pricing (and the fact that the costs of the N5 and N4 processes will come down with time), they will easily occupy the lower end/mainstream part of the market.

Ajay · May 19, 2024

Markfw said:
The leaker that said 39% said it was in one specific benchmark. So ????

So that one person, if that's the number, will prove that he actually has an excellent source compared to the rest. The rest will still boast about getting it right because they have held 8 different positions on desktop Zen5 CPUs and will claim victory, aka that one piece of spaghetti actually did stick to the wall.

Fjodor2001 · May 19, 2024

Zen5 is already old news. Leaks about Zen6 starting to appear now:

AMD Zen 6 To Feature Three CCD Configurations: 8, 16, & Up To 32 Cores, Zen 5C Packs 16 Cores In Single CCX

AMD's next-gen Zen 5 and Zen 6 core configurations have allegedly been revealed with the latter featuring up to 32 cores per CCD.

wccftech.com

Timorous · May 19, 2024

Fjodor2001 said:
Zen5 is already old news. Leaks about Zen6 starting to appear now:

AMD Zen 6 To Feature Three CCD Configurations: 8, 16, & Up To 32 Cores, Zen 5C Packs 16 Cores In Single CCX

AMD's next-gen Zen 5 and Zen 6 core configurations have allegedly been revealed with the latter featuring up to 32 cores per CCD.

wccftech.com

8c client, 16c server, 32c for the dense (Zen 6c) variant would be my guess.

Fjodor2001 · May 19, 2024

Timorous said:
8c client, 16c server

Why use 2x8c instead of 1x16c, for the 16c client CPU variants of Zen6?

Assuming there will be 16c CCD of Zen6 available anyway, why not use them on client too? Also opens up for 2x16c on client CPUs, and 1x16c + 1x8c.

Timorous · May 19, 2024

Fjodor2001 said:
Why use 2x8c instead of 1x16c, for the 16c client CPU variants of Zen6?

Assuming there will be 16c CCD of Zen6 available anyway, why not use them on client too? Also opens up for 2x16c on client CPUs, and 1x16c + 1x8c.

Clockspeed, yields, node? 8c could be on an older node than 16c and 32c so would be quite a bit cheaper. V-cache compatibility.

Those are just off the top of my head.

kir123 · May 19, 2024

I think 3D V-Cache is everywhere.
Zen6 cores on tsmc N3e/p/s on top.
L3 cache and InFO on N4p/e at the bottom.
Thus, 16/32s CCD is very easy to obtain

Thunder 57 · May 19, 2024

Fjodor2001 said:
Why use 2x8c instead of 1x16c, for the 16c client CPU variants of Zen6?

Assuming there will be 16c CCD of Zen6 available anyway, why not use them on client too? Also opens up for 2x16c on client CPUs, and 1x16c + 1x8c.

Maybe they can only do 16 cores with dense variants which would hurt performance in desktop models.

StefanR5R · May 19, 2024

Apparently it's not just 8c/16c/32c CCDs, but 8c/16c/32c CCXs even.

As for the client CCDs, remember that it has been claimed that Zen 6 desktop will no longer be server derived, but mobile derived.
Regarding client CCXs, the larger and more complex the last-level cache, the higher its latency tends to be.

Also notable is that InstLatX64 claims that EPYC 9006 is on socket SP7.

adroc_thurston · May 19, 2024

Fjodor2001 said:
Why use 2x8c instead of 1x16c, for the 16c client CPU variants of Zen6?

Because 16c is niche.

StefanR5R said:
Apparently it's not just 8c/16c/32c CCDs, but 8c/16c/32c CCXs even.

I don't think the 16c CCD for Venice classic is frozen just yet.
@Kepler_L2 fatwa issued.

kir123 said:
I think 3D V-Cache is everywhere

No lmao.
