What would you do if your engineering teams were developing something exciting and it turned out to be a turd, like Zen 5 for example?
I am wondering whether this is the reason David Suggs has been gone from AMD for a year and a half now.
I wonder if they realized early on that Z5 was going to suck, but they were already 4 years into development.
He was chief architect of Zen 2 and Zen 5.
Z3 and Z4 seem OK; Z4 especially got helped a lot by clocks.
Z6 is going to suffer the same fate, being a derivative architecture.
Saying it sucks is a bit harsh and premature considering the whole lineup isn't even out yet. The 9 series may fare better with more traditional TDPs.
Well, it is not exactly stellar. Calling it a mild improvement is being too generous considering the time frame involved.
I think the Turin successor to Genoa 9184X is going to be the real Zen 5 we all deserve but will never be able to afford, short of a miraculous windfall.
It'll still suck in games though. Gamers' desire for a 6GHz, triple-stack 32+192MB L3, single-CCX 8-core part with a 170W TDP cannot and will not be met.
More theory crafting ...
If Z4 got delayed to accommodate CXL (as per Forrest) and COVID played some part, that would leave Z5 with a very long dev time.
It could be that they were trying hard to polish this turd so it would not regress as badly as BD.
However, they could have done something in the uncore and address the BW and latency shortcomings and shore up the perf a bit.
To me this release feels like a consequence of misreading the room when the development of Zen 5 started, which realistically was 5-6 years ago.
At the time, Intel had a big lead in FP and vector throughput in HEDT/server with SKL-X and then followed it up by bringing it to client with ICL and TGL.
To me it feels like AMD decided to match them in this respect no matter what and dedicated the bulk of the resources to FP throughput (L1 -> FP PRF doubled, doubled the FP register file, went for the most overkill AVX-512 implementation known to man).
Little did they know that Intel would ditch the thing and ARM would become a major threat with their ultra-wide OOO machines with ridiculous integer throughput.
Couple that with Suggs' propensity for large FP units and bean counters reverting Zen 5 to N4P, and you have a perfect storm for the lowest gen-on-gen INT gain.
Intel did not ditch exotic and expensive stuff. Intel server chips still keep pushing AVX512, AMX, accelerators, etc. The goal for server SKUs has been set to match those instructions.
I'd also add that Zen 3 was more than OK; it was a goated gen-on-gen jump: 16 months after Zen 2, minuscule area increase, massive improvement in INT throughput.
He is no longer at AMD. They knew at least a couple of years earlier that it would turn out this way.
They did on client, though. Whereas AMD with their "one size fits all" approach ended up with a core that dedicates a large portion of its area to stuff that's almost irrelevant.
On the other hand, while Z6 would also be a minor iterative core architecturally, it is going to benefit from clocks by being on N3E.
So I think the physical implementation team would be able to come to their rescue here. They would have had enough time.
I think there is potential uplift from improving the uncore too, which can help.
We'll have our preview with STX Halo soon enough I guess.
As I said before, we need to wait for the APX instruction set to be implemented before we see a huge IPC increase.
It's not a given.
The uncore is just pure cope at this point.
Discussing the bits and pieces of architectural weaknesses and how to overcome them is not coping. I don't know why this word is used so much in the shillicon Twitter universe.
I don't mean that discussing it is cope; it isn't. I meant that the uncore is just poor. It's downright funny that a CCD is unable to use all of the memory bandwidth because of a single GMI3 link.
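To put rough numbers on the single-link complaint, here is a back-of-the-envelope sketch. Every figure in it is an assumption and not something stated in this thread: a GMI3 link moving 32 bytes per fabric clock on reads, a 2000 MHz FCLK, and dual-channel DDR5-6000. The helper names are made up for illustration.

```python
# Back-of-the-envelope peak numbers; all constants below are assumptions.
GMI_READ_BYTES_PER_CLK = 32   # assumed read width of one CCD-to-IOD link
FCLK_MHZ = 2000               # assumed fabric clock
DDR5_MT_PER_S = 6000          # assumed memory speed (DDR5-6000)
CHANNELS = 2                  # dual-channel, 64 bits (8 bytes) per channel
BYTES_PER_TRANSFER = 8


def link_read_gbs(bytes_per_clk: int, clk_mhz: float) -> float:
    """Peak read bandwidth of one link in GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_clk * clk_mhz / 1000


def dram_peak_gbs(mt_per_s: float, channels: int, bytes_per_transfer: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s."""
    return mt_per_s * bytes_per_transfer * channels / 1000


ccd_read = link_read_gbs(GMI_READ_BYTES_PER_CLK, FCLK_MHZ)
dram = dram_peak_gbs(DDR5_MT_PER_S, CHANNELS, BYTES_PER_TRANSFER)
print(f"one CCD can read at most {ccd_read:.1f} GB/s")
print(f"DRAM peak is {dram:.1f} GB/s")
print(f"share of DRAM peak one CCD can pull: {ccd_read / dram:.0%}")
```

Under those assumptions a single CCD tops out at roughly two thirds of what the memory could deliver in theory. Real sustained numbers will be lower on both sides.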
I am saying this based on the visible improvements in SPEC int at fixed clock when Z4 is equipped with 3D V-Cache. I would think removing the uncore bottlenecks that 3D V-Cache attempts to work around would improve the situation, until the next bottleneck at least.
Also, Z4 in MI300A benefits from LLC prefetching, as per AMD themselves.
Additionally, they have prefetching updates for L1/L2 instead of just stream, stride, burst, nextline.
Well, it is not exactly stellar. Calling it a mild improvement is being too generous considering the time frame involved.
I am mostly looking at Alexander Yee's blog to make this statement.
Other than AVX512 there is not much improvement.
Yee did point out huge improvements in scalar integer though. It's everything in between that's stagnating.
So this is another typical AMD launch. A couple of users overhype the product. Others fall for the hype. When the product is actually released, everyone feels disappointed. For me the performance meets expectations from the architectural perspective.
It is well known that it is very difficult to increase integer IPC. The number of general purpose registers is a bottleneck. More read/write ports will help, but they may also increase power usage. As I said before, we need to wait for the APX instruction set to be implemented before we see a huge IPC increase.
Having said that, there is still lots of potential left in AVX. With AVX512 they can probably go over 16 execution units.
My real disappointment is that there is no 24/32 core AM5 Zen5 CPU.
How would you feed these cores in AM5?
Wouldn't matter as much if there were a realistic chance of the cadence catching up, but it seems to get worse instead.
It's about the same time-between-releases as Zen 4. But for this length of time people expect bigger gains (even if the process uplift was less).
Yee did point out huge improvements in scalar integer though.
But it is constrained by memory bandwidth; that is, they have throughput only as long as there is no data to be fetched from somewhere lower in the memory hierarchy.
It's about the same time-between-releases as Zen 4.
Which was known to be delayed to account for CXL. So Zen 5 spending the same time means it is actually doubly delayed instead of catching up with the intended cadence.
But it is constrained by memory bandwidth; that is, they have throughput only as long as there is no data to be fetched from somewhere lower in the memory hierarchy.
But they kept the L2 at 1MiB and kept the L2 to L3 at 32B/cycle. So no respite there either.
That was to be expected though, considering we already knew bigger uncore/IO changes would only happen with Zen 6, going by previous gens.
Let's just put it this way. Only once did a Zen land on time, and that was Zen 3. And Zen 5 is right on average. If all Zens but one are delayed then, well, what's the exception? It isn't Zen 5.
If DDR4 is enough for 16 cores, I am sure DDR5 with double the bandwidth is enough for 32 cores. At least it should be enough for 24 cores.
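The arithmetic behind that claim can be sketched quickly. The memory speeds (DDR4-3200, DDR5-5600, DDR5-6400) are assumed for illustration, as is the idea that theoretical peak bandwidth per core is the right yardstick; the function names are hypothetical.

```python
def dual_channel_gbs(mt_per_s: float) -> float:
    """Theoretical peak of two 64-bit channels in GB/s (8 bytes per transfer)."""
    return mt_per_s * 8 * 2 / 1000


def per_core_gbs(mt_per_s: float, cores: int) -> float:
    """Peak bandwidth available per core if all cores share it evenly."""
    return dual_channel_gbs(mt_per_s) / cores


# Assumed configurations, not taken from the thread.
configs = [
    ("DDR4-3200, 16 cores", 3200, 16),
    ("DDR5-5600, 24 cores", 5600, 24),
    ("DDR5-5600, 32 cores", 5600, 32),
    ("DDR5-6400, 32 cores", 6400, 32),
]

for label, speed, cores in configs:
    print(f"{label}: {per_core_gbs(speed, cores):.2f} GB/s per core")
```

With exactly double the transfer rate, 32 cores get the same per-core share as 16 cores did on DDR4-3200. At DDR5-5600 the 32-core case falls a bit short of that share while 24 cores come out ahead. This only compares theoretical peaks and says nothing about how much bandwidth a Zen 5 core actually needs.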
Soo... hype train back on tracks?
Yes, my body is ready again.
More good news about the 9600X.
AMD just released their new architecture, ZEN 5%
