Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

AMDK11 · Feb 17, 2024

Tuna-Fish said:
Zen is wider than 4 if you run from the uop cache. The cache can deliver up to 9 uops per clock, but alignment restrictions reduce average throughput. The pipeline after that (crucially, rename) is 6 wide

We are writing about the physical widths of decoders.

naukkis · Feb 17, 2024

majord said:
I think you need to look back at Excavator's front end, including the op cache, before being too surprised at Zen's current IPC despite only 4 decoders. Pushing any more IPC out of it, i.e. Zen 5, without going wider? Yeah, that will start to get interesting.

Zen's main L1i cache is effectively the micro-op cache. Decoder width is pretty much irrelevant: when executing code that totally misses the quite large mop cache, it's pretty much guaranteed that the data caches will miss too. So 4-wide decode is probably way more than needed; the core probably won't even come close to reaching 1 IPC in those situations. Apple cores are totally different, as they don't have a micro-op cache at all and need a decode width equal to their core's execution width. It's more beneficial to use that 4-wide decode more efficiently, by trying to predecode branches earlier. I remember seeing some speculation that Zen 5 tries to do that by combining its branch target buffer with the decoders.
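
A rough way to see why decode width matters little once most micro-ops come from the op cache is a toy throughput model. This is only a sketch, not a description of any real Zen front end. The 6-wide rename and 4-wide decode figures come from the posts above; the hit rates and the function name `frontend_uops_per_cycle` are made up for illustration, and the model assumes one micro-op per decoded instruction and no stalls of any kind.

```python
def frontend_uops_per_cycle(hit_rate, opcache_width=6.0, decode_width=4.0):
    """Average uops/cycle when a fraction `hit_rate` of uops comes from the op cache.

    Hypothetical toy model: each uop costs 1/width cycles on whichever path
    delivers it, so the two paths combine as a time-weighted (harmonic) mix.
    opcache_width is capped at the 6-wide rename stage mentioned in the thread.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cycles_per_uop = hit_rate / opcache_width + (1.0 - hit_rate) / decode_width
    return 1.0 / cycles_per_uop


# Illustrative hit rates only; these are not measurements.
for hit_rate in (0.0, 0.5, 0.9, 1.0):
    rate = frontend_uops_per_cycle(hit_rate)
    print(f"op cache hit rate {hit_rate:.0%}: {rate:.2f} uops/cycle")
```

In this model the decoders only set the floor (the 0% hit-rate case); at high hit rates the result sits close to the rename width regardless of how wide the decoders are.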

AMDK11 · Feb 17, 2024

So why did Intel add 6 decoders? Couldn't they have done 2 complex + 3 simple instead of 1 complex + 5 simple?

Why grow the ROB to as many as 512 entries when you can get a similar IPC with fewer resources? Could it be poor design?

I hope that we will not have to wait long for the leaks to be verified with the actual situation to compare Zen5 with LionCove.

naukkis · Feb 17, 2024

AMDK11 said:
So why did Intel add 6 decoders? Couldn't they have done 2 complex + 3 simple instead of 1 complex + 5 simple?

Complex instructions have to be decoded for compatibility; when used, those instructions will destroy performance anyway. Simple instructions are what matter. And they go wider if they can't go smarter. So when the BTB directs the core to decode instructions that are not in the micro-op cache, Intel probably misses the L1i cache too, but after the L2, L3 or memory latency it can decode 6 instructions per cycle, at high power usage. Going smarter instead, the BTB could be used to fetch and decode uncached instructions before they are needed, and to cache those mops so they are ready when the BTB hits. To be fair to Intel, that scheme is actually partially used in their E-cores, which can also decode 6 instructions, but 3 for the current stream and the other 3 for a possible branch target. But being able to predecode branches into the mop cache is an order of magnitude more sophisticated solution. With that scheme, decoding 4 instructions per clock is more than enough, and staying at 4-wide decode gives much better energy efficiency.

itsmydamnation · Feb 17, 2024

AMDK11 said:
So why did Intel add 6 decoders? Couldn't they have done 2 complex + 3 simple instead of 1 complex + 5 simple?

Why grow the ROB to as many as 512 entries when you can get a similar IPC with fewer resources? Could it be poor design?

I hope that we will not have to wait long for the leaks to be verified with the actual situation to compare Zen5 with LionCove.

Intel's complex decoders are for microcoded instructions plus a few exceptions; almost all instructions are "simple".

igor_kavinski · Feb 17, 2024

AMDK11 said:
Why grow the ROB to as many as 512 entries when you can get a similar IPC with fewer resources? Could it be poor design?

Reeks of an unfinished design. It's like they fired the lead or took the project away from one team and handed it to another. Probably from USDC to IDC.

itsmydamnation · Feb 17, 2024

igor_kavinski said:
Reeks of an unfinished design. It's like they fired the lead or took the project away from one team and handed it to another. Probably from USDC to IDC.

It's not just one core where Intel has been like this.
I would say anything post-Skylake has been spending big on resources for not much in terms of IPC gains.

igor_kavinski · Feb 17, 2024

itsmydamnation said:
I would say anything post-Skylake has been spending big on resources for not much in terms of IPC gains.

Cluelessly shooting in the dark have they been? Don't they use simulators to figure out if their idea is worth pursuing?

itsmydamnation · Feb 17, 2024

igor_kavinski said:
Cluelessly shooting in the dark have they been? Don't they use simulators to figure out if their idea is worth pursuing?

Remember, these cores are designed by hundreds of engineers; at that point it's about culture and management.

igor_kavinski · Feb 17, 2024

itsmydamnation said:
Remember, these cores are designed by hundreds of engineers; at that point it's about culture and management.

Seems that's their problem. Keller said huge teams are basically unmanageable. The intra communication overhead becomes too great.

Nothingness · Feb 17, 2024

igor_kavinski said:
Cluelessly shooting in the dark have they been? Don't they use simulators to figure out if their idea is worth pursuing?

They do. But the thing is that for a very long time their engineers were very specialized in specific blocks. And they were also designing so close to the metal that big changes were very difficult to make. I don't know where they stand now, but I hope they have moved away from that culture.

itsmydamnation · Feb 17, 2024

igor_kavinski said:
Seems that's their problem. Keller said huge teams are basically unmanageable. The intra communication overhead becomes too great.

It's more than that. In a high-performance team/culture your "average" engineer delivers above-average output; in a low-performance team/culture an above-average engineer will deliver average output. Especially in fields that require lots of problem solving and perseverance.

In all the original Zen 1 fluff AMD put out, you can hear them talking about these exact types of things, as they had to fight really hard to keep IPC from dropping as the core became more mature/complete.

Goop_reformed · Feb 18, 2024

I have a feeling granite ridge is only about 15% - 20% > zen 4/rpl 🙁

SteinFG · Feb 18, 2024

Goop_reformed said:
I have a feeling granite ridge is only about 15% - 20% > zen 4/rpl 🙁

"only" 🤣

Goop_reformed · Feb 18, 2024

SteinFG said:
"only" 🤣

15% - 20% would be the lowest performance uplift in the history of Zen. Zen+ doesn't count.
Not sure why you're laughing, to be honest.

inf64 · Feb 18, 2024

Goop_reformed said:
I have a feeling granite ridge is only about 15% - 20% > zen 4/rpl 🙁

15-20% of what? Total ST uplift, MT uplift, IPC uplift?

Goop_reformed · Feb 18, 2024

inf64 said:
15-20% of what? Total ST uplift, MT uplift, IPC uplift?

Single thread. I think MLID has a really good source this time.

igor_kavinski · Feb 18, 2024

It's a monster core and it should be phenomenal in AVX-512.

The only people with a predicament are the ones not on AM5 coz they may get very tempted to switch platforms and it would hurt their savings account.

And 4090 owners coz it's so woefully CPU limited.

SteinFG · Feb 18, 2024

Goop_reformed said:
I have a feeling granite ridge is only about 15% - 20% > zen 4/rpl 🙁

Zen 5 has -5% IPC actually, so temper your expectations!

CakeMonster · Feb 18, 2024

I figure most people here would like to see ST improve, given that currently Intel has the lead in some applications. Who knows if Intel is able to improve further on that... but it seems sensible to me that AMD would have at least wanted to improve ST by now even as they designed Z5 probably 5+ years ago.

AMDK11 · Feb 18, 2024

Goop_reformed said:
I have a feeling granite ridge is only about 15% - 20% > zen 4/rpl 🙁

I think that for Zen 5 an average of +15-20% across the entire spectrum of the IPC growth curve is a reasonable and safe assumption. Of course, AVX512 will have the biggest gains at the end of the curve, as was the case with Sunny Cove and Golden Cove, although for the latter AVX512 was ultimately disabled.

If there is more growth, just be happy.

Edit:
The question is what about the clock speed. I hope it stays at least at the same level as Zen 4.
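
Why the clock question matters can be put as simple arithmetic: single-thread performance scales roughly as IPC times frequency, so the two changes multiply. A minimal sketch, assuming that first-order relation holds; the function name `single_thread_uplift` and the clock deltas below are hypothetical, and only the +15-20% IPC range comes from the post above.

```python
def single_thread_uplift(ipc_gain, clock_gain):
    """Combined ST uplift from fractional IPC and clock changes.

    First-order model only: performance ~ IPC * frequency, so the
    relative changes multiply. Gains are fractions (0.15 means +15%).
    """
    return (1.0 + ipc_gain) * (1.0 + clock_gain) - 1.0


# IPC range taken from the post; clock changes are made-up scenarios.
for ipc_gain in (0.15, 0.20):
    for clock_gain in (-0.05, 0.0, 0.05):
        total = single_thread_uplift(ipc_gain, clock_gain)
        print(f"IPC {ipc_gain:+.0%}, clock {clock_gain:+.0%} -> ST {total:+.1%}")
```

So a clock regression eats directly into whatever the IPC gain turns out to be, which is why holding Zen 4 clocks is the baseline hope here.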

APU_Fusion · Feb 18, 2024

I predict ipc to be in range of -5% to +40% of zen 4 myself. Wins thread.

inf64 · Feb 18, 2024

Goop_reformed said:
Single thread. I think MLID has a really good source this time.

Well he stated 10-15% IPC improvement (not ST) according to that slide he leaked last year. I don't know what else he claimed since his range is absolutely degenerate (as always) - this is how he can claim he is spot on.

My guess is that Zen 5 will have similar ST jump as Zen 4 (vs Zen 3), so 27+%. MT will likely be lower though, and I expect ~15% on average.

Goop_reformed · Feb 18, 2024

inf64 said:
Well he stated 10-15% IPC improvement (not ST) according to that slide he leaked last year. I don't know what else he claimed since his range is absolutely degenerate (as always) - this is how he can claim he is spot on.

My guess is that Zen 5 will have similar ST jump as Zen 4 (vs Zen 3), so 27+%. MT will likely be lower though, and I expect ~15% on average.

No this is different. I believe mlid is still hiding the slides. Gotta milk the viewers.

Also RGT is doubling down here:

I think the quotes are from here

H433x0n said:
I’ve had more than 1 person tell me that CB R23 1T is >=2800 which is a >=40% increase. If I had to take a geomean of Zen 5 leakers it’d probably be a 40% 1T perf increase overall.

I don’t personally believe that but it’s at least consistent. The only person saying it’s not hype™ is MLID, whose track record is spotty. Although he seemed to get the details about the Zen 5 delay and 800 series chipset right .. so ymmv.
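
For anyone wanting to sanity-check the numbers in that quote, the claim pins down the baseline it is being compared against. A quick sketch using only the two figures quoted (a score of 2800 and a 40% increase); the helper names are made up for illustration, and nothing here says whether the rumored score is real.

```python
def implied_baseline(new_score, increase):
    """Baseline score implied by a new score and a fractional increase."""
    return new_score / (1.0 + increase)


def fractional_increase(new_score, baseline):
    """Fractional uplift of new_score over baseline (0.40 means +40%)."""
    return new_score / baseline - 1.0


# Figures from the quoted post: CB R23 1T >= 2800, described as >= 40%.
baseline = implied_baseline(2800, 0.40)
print(f"implied baseline: about {baseline:.0f} points")
```

In other words, the quote only hangs together if the comparison part scores around 2000 in the same test.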

Circle jerking all around Xd

Glo. · Feb 18, 2024

Paul asked a few people about this score, and they say that IN THEIR OPINION it might be a legit score.
