
Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Page 289 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.
Zen is wider than 4 if you run from the uop cache. The cache can deliver up to 9 uops per clock, but alignment restrictions reduce average throughput. The pipeline after that (crucially, rename) is 6 wide.
We are writing about the physical widths of decoders.
 
I think you need to look back at Excavator's front end, including the op cache, before being too surprised at Zen's current IPC despite only 4 decoders. Pushing any more IPC out of it (i.e. Zen 5) without going wider? Yeah, that will start to get interesting.

Zen's main L1 instruction cache is effectively the micro-op cache. Decoder width is pretty much irrelevant: when executing code that totally misses the quite large micro-op cache, it's pretty much guaranteed that the data caches will miss too. So 4-wide decode is probably way more than needed; the core probably won't even come close to 1 IPC in those situations. Apple cores are totally different, as they don't have a micro-op cache at all and need a decode width equal to the core's execution width. It's more beneficial to use that 4-wide decode more efficiently, by trying to predecode branches earlier. I remember seeing some speculation that Zen 5 tries to do that by combining its branch target buffer with the decoders.
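To put rough numbers on the "decoder width is pretty much irrelevant" argument, here is a minimal toy model. It takes the widths quoted in this thread (op cache up to 9 uops/clock, rename 6 wide, 4 decoders) and assumes, purely for illustration, that one decoded instruction equals one uop and that 90% of uops come from the op cache. The function names and the hit fraction are hypothetical, and the model ignores alignment restrictions, stalls and cache misses.

```python
def sustained_width(source_width: float, rename_width: float) -> float:
    """Per-cycle throughput of one front-end path, capped by the rename stage."""
    return min(source_width, rename_width)


def average_uops_per_cycle(opcache_hit_fraction: float,
                           opcache_width: float,
                           decode_width: float,
                           rename_width: float) -> float:
    """Average front-end throughput when uops come from two paths.

    Cycles per uop are weighted by the fraction of uops on each path,
    so the result is a weighted harmonic mean of the two path widths.
    """
    if not 0.0 <= opcache_hit_fraction <= 1.0:
        raise ValueError("opcache_hit_fraction must be between 0 and 1")
    opcache_path = sustained_width(opcache_width, rename_width)
    decode_path = sustained_width(decode_width, rename_width)
    cycles_per_uop = (opcache_hit_fraction / opcache_path
                      + (1.0 - opcache_hit_fraction) / decode_path)
    return 1.0 / cycles_per_uop


if __name__ == "__main__":
    HIT_FRACTION = 0.9  # hypothetical, not a measured value
    four_wide = average_uops_per_cycle(HIT_FRACTION, 9, 4, 6)
    six_wide = average_uops_per_cycle(HIT_FRACTION, 9, 6, 6)
    print(f"4 decoders: {four_wide:.2f} uops/clock")
    print(f"6 decoders: {six_wide:.2f} uops/clock")
    print(f"gain from wider decode: {100 * (six_wide / four_wide - 1):.1f}%")
```

Under these assumptions the gap between 4 and 6 decoders shrinks as the op cache hit fraction rises, which is the point being made above. A real core would need measured hit rates to say anything firmer.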
 
So why did Intel add 6 decoders? Couldn't they have done 2 complex + 3 simple instead of 1 complex + 5 simple?

Why make the ROB as large as 512 entries when you can get similar IPC with fewer resources? Could it be poor design?


I hope we won't have to wait long for the leaks to be checked against real hardware, so we can compare Zen 5 with Lion Cove.
 
So why did Intel add 6 decoders? Couldn't they have done 2 complex + 3 simple instead of 1 complex + 5 simple?

Complex instructions have to be decoded for compatibility, but those instructions will destroy performance anyway when used. Simple instructions are what matter. And they go wider if they can't go smarter. When the BTB points at instructions that aren't in the micro-op cache, Intel probably misses the L1i cache too, and only after the L2, L3 or memory latency can it decode 6 instructions per cycle, at high power. The smarter alternative is to use the BTB to fetch and decode uncached instructions before they are needed, and cache those micro-ops so they are ready when the BTB hits. To be fair to Intel, that scheme is partially used in their E-cores, which can also decode 6 instructions, but as 3 for the current stream and 3 for a possible branch target. But being able to predecode branches into the micro-op cache is an order of magnitude more sophisticated. With that scheme, decoding 4 instructions per clock is more than enough, and a decode that is only 4 wide gives much better energy efficiency.
 
So why did Intel add 6 decoders? Couldn't they have done 2 complex + 3 simple instead of 1 complex + 5 simple?

Why make the ROB as large as 512 entries when you can get similar IPC with fewer resources? Could it be poor design?


I hope we won't have to wait long for the leaks to be checked against real hardware, so we can compare Zen 5 with Lion Cove.
Intel's complex decoder is for microcoded instructions plus a few exceptions; almost all instructions are "simple".
 
Cluelessly shooting in the dark have they been? Don't they use simulators to figure out if their idea is worth pursuing?
They do. But for a very long time their engineers were highly specialized in specific blocks. They were also designing so close to the metal that big changes were very difficult to make. I don't know where they stand now, but I hope they have moved away from that culture.
 
Seems that's their problem. Keller said huge teams are basically unmanageable. The intra communication overhead becomes too great.
It's more than that. In a high-performance team/culture your "average" engineer delivers above-average output; in a low-performance team/culture an above-average engineer delivers average output. Especially in fields that require lots of problem solving and perseverance.

In all the original Zen 1 fluff AMD put out, you can hear them talking about exactly these kinds of things, as they had to fight really hard to keep IPC from dropping as the core became more mature/complete.
 
It's a monster core and it should be phenomenal in AVX-512.

The only people with a predicament are the ones not on AM5 coz they may get very tempted to switch platforms and it would hurt their savings account.

And 4090 owners coz it's so woefully CPU limited.
 
I figure most people here would like to see ST improve, given that currently Intel has the lead in some applications. Who knows if Intel is able to improve further on that... but it seems sensible to me that AMD would have at least wanted to improve ST by now even as they designed Z5 probably 5+ years ago.
 
I have a feeling Granite Ridge is only about 15-20% faster than Zen 4/RPL 🙁
I think that for Zen 5 an average IPC gain of +15-20% across the whole range of workloads is a reasonable and safe assumption. Of course, AVX-512 will show the biggest gains at the top end of that range, as it did on Sunny Cove and Golden Cove, although on the latter AVX-512 was ultimately disabled.

If there is more growth, just be happy.

Edit:
The question is what about the clock speed. I hope it stays at least at the same level as Zen 4.
 
Single thread. I think MLID has a really good source this time.
Well he stated 10-15% IPC improvement (not ST) according to that slide he leaked last year. I don't know what else he claimed since his range is absolutely degenerate (as always) - this is how he can claim he is spot on.

My guess is that Zen 5 will have similar ST jump as Zen 4 (vs Zen 3), so 27+%. MT will likely be lower though, and I expect ~15% on average.
 
Well he stated 10-15% IPC improvement (not ST) according to that slide he leaked last year. I don't know what else he claimed since his range is absolutely degenerate (as always) - this is how he can claim he is spot on.

My guess is that Zen 5 will have similar ST jump as Zen 4 (vs Zen 3), so 27+%. MT will likely be lower though, and I expect ~15% on average.
No, this is different. I believe MLID is still hiding the slides. Gotta milk the viewers.

Also RGT is doubling down here:


I think the quotes are from here

I’ve had more than 1 person tell me that CB R23 1T is >=2800 which is a >=40% increase. If I had to take a geomean of Zen 5 leakers it’d probably be a 40% 1T perf increase overall.
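For anyone wanting to check the arithmetic, here is a small sketch. It derives the baseline score implied by "2800 is a 40% increase" and shows one way to take a geometric mean of several claimed uplifts. The helper names are made up for illustration, and the list of claims is a hypothetical placeholder, not actual leaker data.

```python
from statistics import geometric_mean


def implied_baseline(score: float, uplift_percent: float) -> float:
    """Baseline score that yields `score` after an uplift of `uplift_percent`."""
    return score * 100.0 / (100.0 + uplift_percent)


def geomean_uplift_percent(uplift_percents: list[float]) -> float:
    """Geometric mean of percentage uplifts, computed on the speedup ratios."""
    ratios = [1.0 + p / 100.0 for p in uplift_percents]
    return (geometric_mean(ratios) - 1.0) * 100.0


if __name__ == "__main__":
    # Figures from the post above: a CB R23 1T score of 2800 as a 40% increase.
    print(f"implied baseline: {implied_baseline(2800, 40):.0f}")

    # Hypothetical placeholder claims (in percent), NOT real leaker numbers.
    hypothetical_claims = [35.0, 40.0, 45.0]
    print(f"geomean uplift: {geomean_uplift_percent(hypothetical_claims):.1f}%")
```

The geometric mean is taken over ratios (1.40 rather than 40) because percentage gains compound multiplicatively; averaging the raw percentages would slightly overstate the result when claims are far apart.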

I don’t personally believe that but it’s at least consistent. The only person saying it’s not hype™ is MLID, whose track record is spotty. Although he seemed to get the details about the Zen 5 delay and 800 series chipset right .. so ymmv.

Circle jerking all around Xd
 