Speculation: Ryzen 4000 series/Zen 3

Richie Rich · Oct 28, 2019

itsmydamnation said:
All this SMT4 stuff is just stupid. If ILP only averages something like 1.5 on SPEC, why would SMT2 only add ~25% performance given there are 11 pipelines in a Zen core? Because the bottleneck isn't in execution!
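The utilization argument in this quote can be made concrete with a toy model. The sketch below assumes that each thread sustains the quoted average ILP on its own and that SMT threads simply add their ILP until the 11 pipelines saturate. That is a deliberate simplification for illustration, not a model of how Zen actually schedules work; the helper names are hypothetical.

```python
# Toy model (an assumption for illustration, not a microarchitecture simulator):
# each thread sustains `ilp` instructions per cycle on its own, and SMT threads
# simply add their ILP until the core's execution pipelines are saturated.

def execution_utilization(ilp: float, pipelines: int) -> float:
    """Fraction of execution pipelines kept busy per cycle by one thread."""
    return ilp / pipelines


def smt_speedup_if_execution_bound(ilp: float, pipelines: int, threads: int) -> float:
    """Throughput gain over a single thread if execution width were the only limit."""
    combined = min(ilp * threads, pipelines)
    return combined / ilp - 1.0


ILP = 1.5                  # average ILP figure quoted in the post
PIPELINES = 11             # Zen pipeline count quoted in the post
OBSERVED_SMT2_GAIN = 0.25  # ~25% SMT2 gain quoted in the post

util = execution_utilization(ILP, PIPELINES)
ideal = smt_speedup_if_execution_bound(ILP, PIPELINES, 2)

print(f"single-thread pipeline utilization: {util:.0%}")
print(f"SMT2 gain if only execution width mattered: {ideal:.0%}")
print(f"observed SMT2 gain: {OBSERVED_SMT2_GAIN:.0%}")
```

Under this model, a second thread would roughly double throughput if execution units were the limit. The much smaller observed gain is the quoted poster's evidence that the bottleneck lies elsewhere (front end, memory, etc.).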

Why are you mixing 1.5 CISC ILP with RISC execution units? Keller said Ice Lake is executing 3-6 instructions at once. Maybe you can explain why Apple moved from A7 (4xALU, 2xLSU) to the wider A11/12/13 (6xALUs with still only 2xLSU). I think they had a pretty good reason to do that (especially when we know there is a massive +58% IPC gain over Skylake).

soresu said:
The problem that ATV and yourself seem to be discounting is that a leak can contain elements of truth without being wholly truthful, perhaps some of these information dispersals are intentionally planted within companies like AMD to identify leakers when they are suspected to exist - it's what I would do.

The interesting thing is that ATV saw exactly the same slides (graphics), just with SMT4 on them. That's the point: either they put it there to identify leakers, or Zen 3 is SMT4 capable. Could be both.

Guru said:
We know for a fact that it will have a unified L3 cache, we know it will have faster gates, better branch predictor and smaller latency between cores. They might actually do some sort of L4 cache as well, to improve flow and feed the cores faster.

Cache, cache, cache. I feel like I'm in the Tron movie, surrounded by programs caught in an endless cycle. No offense, but it's funny how many people want to speed up code execution without adding execution units. The leaked Zen 3 IPC gain of >8% (others say >10%) cannot be achieved by the L3 cache alone.

BTW a comparison of evolution of Apple/Intel cores:

2012 - Intel Ivy Bridge (3xALU)... Apple A6 (2xALU) .... Apple is way behind
2013 - Intel Haswell (4xALU)... Apple A7 (4xALU) .... Apple is on par with Intel
2017 - Intel Coffee Lake (4xALU)... Apple A11 (6xALU) .... Apple became tech leader

Isn't this interesting?

Ajay · Oct 28, 2019

Richie Rich said:
Another interesting thing regarding AMD's presentation about Milan: SMT2 and unified L3 cache. AdoredTV claims that his earlier leaked version of that slide originally said SMT4.

There are two possible explanations for why AMD changed that to SMT2:

AMD didn't want to reveal a killer feature like SMT4 at this Zen 2 event (way too early before the Zen 3 unveiling)

AMD will disable SMT4 for the whole Zen 3 generation (because of performance issues from an FPU bottleneck? Zen 4 on 5nm could solve this)

This guy is such a moron with his ‘special’ knowledge. He just cannot drop this SMT4 rumor that he started. He’s impervious to facts.

soresu · Oct 28, 2019

Ajay said:
This guy is such a moron with his ‘special’ knowledge. He just cannot drop this SMT4 rumor that he started. He’s impervious to facts.

At the very least he let a retweet from Lisa Su go to his head.

soresu · Oct 28, 2019

Richie Rich said:
BTW a comparison of evolution of Apple/Intel cores:

2012 - Intel Ivy Bridge (3xALU)... Apple A6 (2xALU) .... Apple is way behind

2013 - Intel Haswell (4xALU)... Apple A7 (4xALU) .... Apple is on par with Intel

2017 - Intel Coffee Lake (4xALU)... Apple A11 (6xALU) .... Apple became tech leader

Isn't this interesting?

Not even remotely, they don't compete in the same area currently - and the platform that is closest to any competing market (iPad OS) is more or less closed from software freedom, unlike the multitude of platforms that x86 potentially supports (excepting MacOS of course).

Apple Axx would be infinitely more interesting to me if they made iOS a more open system, or in some strange parallel world ran Android or Chrome OS on the latest Axx SoC out of the box.

NostaSeronx · Oct 28, 2019

What would be weird is if AMD launched multiple Zen3 SKUs with dropping ASPs:
Ryzen 3000 (new ASP/2x16 MB *2) -> Ryzen 3003 (Zen3 (N7P) - lower ASP/1x32 MB *2) -> Ryzen 3005 (Zen3+ (N6) - even lower ASP/1x32+ MB *2) -> Ryzen 3007 (Zen3++ (N5) - lowest ASP/1x32+ MB *2)
Slowly making it the budget line. Then there will be Vermeer on AM6 and Genesis Peak on SP5/TR6. <== DDR5 for capacity and HBM2E/3 for speed, first to N5.

Richie Rich · Oct 29, 2019

soresu said:
Not even remotely, they don't compete in the same area currently - and the platform that is closest to any competing market (iPad OS) is more or less closed from software freedom, unlike the multitude of platforms that x86 potentially supports (excepting MacOS of course).

Apple Axx would be infinitely more interesting to me if they made iOS a more open system, or in some strange parallel world ran Android or Chrome OS on the latest Axx SoC out of the box.

Why do you escape to SW stuff? I'm talking about HW core development, and I'd appreciate staying there. Another area is not an excuse for Intel. Moreover, desktop and server CPUs should be at the top of IPC and absolute performance. Obviously, Intel was overtaken by a mobile processor, and this is a big shame. I don't understand how somebody can defend Intel's CPU development stagnation.

soresu · Oct 29, 2019

Richie Rich said:
Why do you escape to SW stuff? I'm talking about HW core development, and I'd appreciate staying there. Another area is not an excuse for Intel. Moreover, desktop and server CPUs should be at the top of IPC and absolute performance. Obviously, Intel was overtaken by a mobile processor, and this is a big shame. I don't understand how somebody can defend Intel's CPU development stagnation.

It's not an escape if it is relevant. Everyone praises Apple's hardware to the heavens, but their SW platform restrictions make it useless in my opinion, little more than overpowered paperweights. I found it amusing years ago when a Dolphin (GC/Wii) emulator developer praised Apple's Axx cores and damned ARM for not matching them, never mind that he needed to jailbreak an Apple device merely to test that code, so rather a hollow argument.

As to server CPUs needing to be at the top of IPC, not all servers need that. In a great many server use cases they are simply serving data to a network rather than doing any significant compute on it that would benefit from high IPC: sending/receiving e-mails, routing video streams, database queries, sending HTML....

These lighter workloads benefit from more cores, more memory and more IO - which AMD is providing along with steady IPC improvement in EPYC.

Either way your pivot to a knock on Intel is just as pointless in this thread as me talking about Apple SW platforms - AMD is advancing not stagnating, and this is a thread about Zen3 after all, I don't care if Intel is slipping into the seventh circle of hell as long as AMD keeps moving forward now that they have momentum.

soresu · Oct 29, 2019

Richie Rich said:
2013 - Intel Haswell (4xALU)... Apple A7 (4xALU) .... Apple is on par with Intel

Also, you seem to be missing parts of the equation there - IPC is not the only part of the problem. How did the SPEC/GB scores compare to actual Haswell CPUs, rather than at exactly the same frequency?

If I recall correctly, A7 typically tanked the battery if it ran at full whack for more than short bursts - which is probably why Apple later introduced the little cores to improve power efficiency.

Richie Rich · Oct 29, 2019

soresu said:
As to server CPUs needing to be at the top of IPC, not all servers need that. In a great many server use cases they are simply serving data to a network rather than doing any significant compute on it that would benefit from high IPC: sending/receiving e-mails, routing video streams, database queries, sending HTML....

So why doesn't Intel offer a Xeon CPU built from twice as many Atom cores? Why are 2xALU in-order ARM server CPUs failing? Why did 2xALU Bulldozer fail? The answer is that 90% of workloads benefit from high IPC. Moreover, you can cover the remaining 10% (or whatever the percentage is) by implementing advanced techniques such as SMT2 and SMT4. This is the secret power of x86 CPUs today: they are superior on almost any generic code, with no expensive optimization needed. That's why Apple moved to a 6xALU design; they widened the code-crunching window even further. Analogously for Zen 3, IMHO Mike T. Clark is a great man for a core redesign: going wider with ALUs and AGUs, plus SMT4. Such a Zen 3 would be fast on every kind of code, old and new, just like Apple A12 is great in SPEC2006 (12-year-old code). That's why I think 4xALU core designs are obsolete nowadays, and why Zen 3 will most likely have 6xALUs IMHO.

soresu · Oct 29, 2019

Richie Rich said:
That's why I think 4xALU core designs are obsolete nowadays, and why Zen 3 will most likely have 6xALUs IMHO.

Something you keep overlooking is power consumption - sure, A12 has great performance at full whack, but it also drains the battery tout de suite, which is why the little cores are needed.

If Zen3 went 6 wide, it would need much more area than the 20% 7nm+ brings, and far more than the meagre 15% (ideally, not necessarily realistic...) power efficiency improvement, an improvement that would likely not even be enough to cover the increased power consumption from going 6 wide, let alone SMT4 too.

NostaSeronx · Oct 29, 2019

soresu said:
which is why the little cores are needed.

The little cores are needed because of DVFS complexity. Why build a single super-complex core when they can build a cheap high-IPC core and a cheap EPI core?

AMD doesn't fit in that narrative since Zen2 already has a top-of-the-line 0.3-1.5V sensing AVFS. Higher IPC w/ SMT4 can convert a four core boost into a two core boost.

soresu · Oct 29, 2019

Richie Rich said:
The answer is that 90% of workloads benefit from high IPC.

A citation or 4 would make your argument less of an opinion.

Richie Rich said:
This is the secret power of x86 CPU today - these are superior in almost every generic code, no expensive optimization needed.

The secret power of x86 is the Wintel collaboration that spread it everywhere but mobile, and even then it's because Intel management was too shortsighted to see the potential when Apple came-a-calling during the iPhone development - now ARM owns the mobile space and grew up to gobble MIPS market because of that complacency.

soresu · Oct 29, 2019

NostaSeronx said:
Higher IPC w/ SMT4 can convert a four core boost into a two core boost.

Am I seeing things or are the last 2 things back to front?

NostaSeronx said:
The little cores are needed because of DVFS complexity. Why build a single super-complex core when they can build a cheap high-IPC core and a cheap EPI core?

Yeah, it would make Apple's big core even bigger still, not to mention I imagine that AMD and Intel have a lot of patents on that sort of dynamic functionality which Apple would have to either license or do a lot of R&D to find an alternative solution which could likely be inferior.

NostaSeronx · Oct 29, 2019

soresu said:
Am I seeing things or are the last 2 things back to front?

No, higher IPC w/ SMT4 means two cores host eight threads, whereas in Zen2 eight threads are spread across four weaker cores. Zen2's four-core boost is lower than its two-core boost. Higher IPC and SMT4 don't necessarily mean higher energy, given that Zen2 is mostly a port. A new architecture with higher IPC would either handle the higher current of 7nm better or be able to use 6-track cells for an improved frequency/power curve on N7P.
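The thread-packing point can be sketched in a few lines: the number of physical cores an N-thread load keeps busy is N divided by the SMT width, rounded up. This is a minimal illustration assuming the scheduler packs threads onto as few cores as possible; the helper name is hypothetical, and real OS schedulers often spread threads across idle cores instead.

```python
import math


def active_cores(threads: int, smt_ways: int) -> int:
    """Minimum number of physical cores needed to host `threads` hardware threads.

    Assumes threads are packed onto as few cores as possible (hypothetical
    policy for illustration; real schedulers may spread them out).
    """
    return math.ceil(threads / smt_ways)


for name, smt in (("Zen 2 (SMT2)", 2), ("hypothetical SMT4 core", 4)):
    print(f"{name}: 8 threads -> {active_cores(8, smt)} active cores")
```

With SMT4 the same eight threads occupy half as many cores, which is the basis of the claim that such a load could run at the two-core rather than the four-core boost clock.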

Milan => isn't a new CPU architecture (process-optimization) / K17.4
Vermeer => is a new CPU architecture (inflection) / K19.2

Given the above, the new core is on N5, where the HPC library is 6T and the mobile library is 5T. There is a huge area/power shrink to be traversed for the Ryzen 4000 family/Vermeer.

darkswordsman17 · Oct 29, 2019

soresu said:
Something you keep overlooking is power consumption - sure, A12 has great performance at full whack, but it also drains the battery tout de suite, which is why the little cores are needed.

If Zen3 went 6 wide, it would need much more area than the 20% 7nm+ brings, and far more than the meagre 15% (ideally, not necessarily realistic...) power efficiency improvement, an improvement that would likely not even be enough to cover the increased power consumption from going 6 wide, let alone SMT4 too.

Yeah, plus it seems like AMD is pushing efficiency on Zen 3. I think that's a calculated move to get Zen into laptops (and other similarly constrained form factors). It helps them on servers/Threadripper (where they offer improved perf/w and performance via higher core counts; for servers I think it'll be the start towards keeping per CPU power in check so that they can increase core counts some but can also increase sockets as their means of offering higher density per rack/server). For the consumer space it enables them to cram a GPU in. I think that would be an easy sell for OEMs where they could sell smaller form factor stuff.

soresu · Oct 29, 2019

NostaSeronx said:
Higher IPC and SMT4 doesn't explicitly mean higher energy given that Zen2 is mostly a port.

Higher IPC and SMT are likely to require significant extra transistors, those don't come for free unless you decrease power consumption somewhere else.

Anyway, how is Zen2 a port, mostly or otherwise?

It doubled Zen1/1+ FP resources amongst other changes like TAGE branch predictor, I'd hardly call that a mere port/shrink by any standards.

NostaSeronx · Oct 29, 2019

soresu said:
Anyway, how is Zen2 a port, mostly or otherwise?

Same physical design as Zen. A new design doesn't use the same macro-tiles. Hence, because it is mostly re-using Zen assets it's mostly a port.

soresu · Oct 29, 2019

darkswordsman17 said:
I think that's a calculated move to get Zen into laptops (and other similarly constrained form factors).

Yes, I honestly believe AMD want to get a second VR collaboration too.

Given the last one used the Carrizo SoC (Sulon Q), it would be a huge jump to Zen3 and Navi2/RDNA2 - both for power efficiency and raw performance.

moinmoin · Oct 29, 2019

soresu said:
The secret power of x86 is the Wintel collaboration that spread it everywhere but mobile

I'm sure glad it's not Wintel in the server market, just Intel.

soresu · Oct 29, 2019

NostaSeronx said:
Same physical design as Zen. A new design doesn't use the same macro-tiles. Hence, because it is mostly re-using Zen2 assets, it's mostly a port.

.....

You just said re-using Zen2, after saying same physical design as Zen1.

Perhaps you mean Zen3 is a port of Zen2?

That does make more sense if unified L3 CCD is the only significant change.

soresu · Oct 29, 2019

moinmoin said:
I'm sure glad it's not Wintel in the server market, just Intel.

Ya darn tootin!

I've only tried BSD a little, but it seems extremely stable, especially by comparison to Winblows.

Richie Rich · Oct 30, 2019

Q3'2019 Lisa Su Q&A said:
We will transition to the 5-nanometer node at the appropriate time and get great benefit from that as well. But we’re doing a lot in architecture. And I would say, that the architecture is where we believe the highest leverage is for our product portfolio going forward.

Another proof that AMD is concentrating heavily on architecture improvements for Zen 3. Remember another of Lisa's statements after the Zen 2 unveiling: that she's not leaving AMD for IBM because the best is yet to come. The unified L3 cache is just a tiny bit of what Zen 3 will bring.

Regarding Zen 3 being wider with a 6xALU core: this might not impact area much, as the 4xALU block is one of the smallest areas in the core (ALUs are 5x smaller than the LSU or 10x smaller than the FPU). Going wide to 6xALU would cost almost nothing in terms of die size (some other parts of the core, like the scheduler, would need to grow accordingly too). However, it would cost a lot of brainpower to do that. IMHO that's exactly what Lisa Su is talking about. Just look at the picture of the Zen core: https://en.wikichip.org/w/images/th...tated).png/500px-amd_zen_core_(annotated).png

DrMrLordX · Oct 30, 2019

Haven't we beaten the "wider core" and SMT4 arguments to death already? It's getting repetitive to the point of absurdity.

TheGiant · Oct 30, 2019

DrMrLordX said:
Haven't we beaten the "wider core" and SMT4 arguments to death already? It's getting repetitive to the point of absurdity.

How is it possible for current x86 models (Zen 3k, CFL, the upcoming Ice Lake, which is better) to reach the IPC of the A13 while maintaining ~4 GHz clocks?
Is it possible with current tech?
Which parts of the cores are the bottlenecks?

DrMrLordX · Oct 30, 2019

TheGiant said:
How is it possible for current x86 models (Zen 3k, CFL, the upcoming Ice Lake, which is better) to reach the IPC of the A13 while maintaining ~4 GHz clocks?

I would assume that 256-bit vector processing is much faster on x86 hardware already, since that really isn't a thing on mobile hardware. There are also scenarios where AMD's implementation of SMT in particular makes Zen2 much more attractive. For example, I can easily clear an MT score of 14000 in Geekbench 5 on a 3900x with clockspeeds sitting in, I dunno, the 4.2 GHz range? An A13 with 2 Lightning (2.66 GHz) and 4 Thunder (??? GHz) cores scores a measly GB5 MT score of 3400-3500 (varies). I have twice the cores and . . . I guess ~57% (or more) higher clockspeed than an A13, but better than 400% of the MT performance. Take two A13s, jack up their clockspeeds by +57%, and you get an MT score of around 11k (hypothetically). Yeah, my 3900x sucks power, but big deal. Let's see Apple scale that A13 up to a 95W TDP (or higher).
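The back-of-envelope comparison above can be reproduced with the numbers quoted in the post. This sketch assumes the midpoint of the quoted 3400-3500 A13 range, uses only the Lightning (big core) clock for scaling, and assumes MT score scales linearly with chip count and clock. Real chips would not scale that cleanly, so the result is only as rough as the original estimate.

```python
# Figures quoted in the post (GB5 = Geekbench 5 multi-thread score).
A13_MT_SCORE = 3450        # midpoint of the quoted 3400-3500 range (assumption)
R9_3900X_MT_SCORE = 14000  # quoted 3900x MT score
A13_CLOCK_GHZ = 2.66       # quoted Lightning (big core) clock
R9_3900X_CLOCK_GHZ = 4.2   # quoted approximate 3900x clock

# How much higher the 3900x clocks, and how large its MT score advantage is.
clock_ratio = R9_3900X_CLOCK_GHZ / A13_CLOCK_GHZ
mt_ratio = R9_3900X_MT_SCORE / A13_MT_SCORE

# Hypothetical: two A13s, with MT score assumed to scale linearly with both
# chip count and clock speed (an idealization, not a measured result).
scaled_two_a13 = 2 * A13_MT_SCORE * clock_ratio

print(f"clock advantage: +{clock_ratio - 1:.0%}")
print(f"MT score ratio: {mt_ratio:.0%} of A13")
print(f"two clock-scaled A13s: ~{scaled_two_a13:,.0f}")
```

The clock ratio works out to roughly +57-58%, the MT ratio to just over 400%, and the idealized pair of A13s lands a bit under 11k, in line with the figures in the post.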

That ST score is scary, and the MT score may be more a result of throttling than anything else. So A13 deserves a lot of credit. Just not all the credit.
