Speculation: Ryzen 4000 series/Zen 3

Page 36 - AnandTech community

moinmoin

Diamond Member
Jun 1, 2017
Do you guys think we’ll get another New/Next Horizon-like event this year with some early Zen 3 details?
That leaked and since taken-down video, mentioned multiple times already, contained the first early Zen 3 detail: a unified 32MB+ cache per CCD. We are bound to get more with time, but as usual many parts won't be confirmed until close to (or even after) the launch.
unless they find Intel's damage control PR pieces fun.
Actually I'm pretty sure they do (internally, unofficially etc.). ;)

NostaSeronx

Diamond Member
Sep 18, 2011
Why the hell are we still discussing SMT4 in Zen 3?
Well...
Then Milan was designed to further erase any asterisks that remain, so in thinking about it, in the original strategy, Milan was where we expected to be back to IPC (or better) parity across all workloads.
- https://www.anandtech.com/show/14568/an-interview-with-amds-forrest-norrod-naples-rome-milan-genoa

Zen 2's FPU, across its hi (128-bit) and lo (128-bit) halves, has FP pipes MUL0/MUL1/MUL2/MUL3 and ADD0/ADD1/ADD2/ADD3, while all ports can do VMISC/FMISC, which is enough to support AVX-512.

This is where SMT4 comes into play: with 4x FMUL + 4x FADD, each thread gets 1x FMUL + 1x FADD. Since the majority of legacy code is 128-bit, and some workloads can't be scaled to 256-bit, it makes more sense to support 128-bit units over 256-bit/512-bit units.
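A minimal sketch of the per-thread split described above. It assumes the claimed 4x FMUL + 4x FADD layout, which is speculation from this post and not a confirmed Zen 2 or Zen 3 design, and a simple even static partition between threads. `units_per_thread` and `LANE_BITS` are hypothetical names used only for illustration:

```python
# Hypothetical model: FP pipes from the speculative 4x FMUL + 4x FADD layout,
# split evenly among active SMT threads. Not a documented AMD design.
LANE_BITS = 128  # assumption: each pipe is 128 bits wide (hi/lo halves)


def units_per_thread(fmul: int, fadd: int, threads: int) -> tuple[int, int]:
    """Return (FMUL, FADD) pipes per thread under an even static split."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return fmul // threads, fadd // threads


for t in (1, 2, 4):
    mul, add = units_per_thread(4, 4, t)
    print(f"SMT{t}: {mul}x FMUL + {add}x FADD per thread")

# Assumption from the post: ganging four 128-bit pipes covers one 512-bit op.
print(f"aggregate width: {4 * LANE_BITS} bits")
```

With one thread, all four pipe pairs are available for wide work; with four threads, each gets a single 128-bit FMUL + FADD pair, which is the trade-off the post is arguing for.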

Going by AMD's hiring of researchers, SMT4 is a constrained quantity, combined with the dynamic nature of the new SMT model, i.e. at prefetch or dispatch it goes 1T/2T/3T/4T on demand. It is more effective than any previous x86 SMT implementation.
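To illustrate what going 1T/2T/3T/4T on demand could mean, here is a toy model. It is entirely hypothetical and not AMD's (unpublished) mechanism: each cycle, dispatch slots are handed round-robin only to threads that currently have work, so the effective mode is simply the number of busy threads. `dispatch_slots` and the default width of 4 are assumptions:

```python
from itertools import cycle


def dispatch_slots(ready: list[bool], width: int = 4) -> dict[int, int]:
    """Toy on-demand SMT: split `width` dispatch slots among threads with work.

    Idle threads get nothing, so one busy thread runs as 1T and
    four busy threads run as 4T. Hypothetical policy, for illustration only.
    """
    active = [tid for tid, has_work in enumerate(ready) if has_work]
    slots = {tid: 0 for tid in active}
    if not active:
        return slots
    # Round-robin over the active threads until every slot is assigned.
    for _, tid in zip(range(width), cycle(active)):
        slots[tid] += 1
    return slots


for ready in ([True, False, False, False], [True, True, False, True], [True] * 4):
    slots = dispatch_slots(ready)
    print(f"{len(slots)}T mode: {slots}")
```

A fixed SMT4 split would leave slots idle whenever some threads have nothing to run, while this version hands a lone thread the whole width.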

No value added to Milan = Intel regains their lead. Icelake-SP (XCC*2) with SunnycoveX (10nm++ core) isn't a low-volume product, nor does it have fewer than 64 cores. It's a >72-core monster that easily replaces Xeon Phi. There is also the ultra-secret Icelake-MDFI with mesh chiplets (with an L4 depot cache).

soresu

Platinum Member
Dec 19, 2014
NostaSeronx said:
No value added to Milan = Intel regains their lead.
They recently made a statement about Milan.

They said it will have superior perf/watt to IceLake, not superior raw performance.

I expect modest IPC gains with some power efficiency gains too; better to expect that and be surprised by more if it comes.

maddie

Diamond Member
Jul 18, 2010
soresu said:
They recently made a statement about Milan.

They said it will have superior perf/watt to IceLake, not superior raw performance.

I expect modest IPC gains with some power efficiency gains too; better to expect that and be surprised by more if it comes.
If you continue with your line of reasoning, then you're implying that power drops are coming. I take it to mean that the increased perf/W will translate to higher performance, as I really don't see them lowering the TDP ratings. Do you?

soresu

Platinum Member
Dec 19, 2014
maddie said:
If you continue with your line of reasoning, then you're implying that power drops are coming. I take it to mean that the increased perf/W will translate to higher performance, as I really don't see them lowering the TDP ratings. Do you?
That meant (in my parlance anyway) modest IPC/clock gains and modest power drops, but not a great amount of either, given the process change is fairly meagre.

Others have stated otherwise, and some have claimed, rather over-optimistically, that a change to 6-wide is coming, but I prefer to expect less and receive more (if indeed there is more); it's better that way.

Of course I'd be just as happy to get a regular 20% IPC bump per generation, but even ARM can't do that every time; the A73 is a case in point.

Having said that, does anyone have a concrete figure for the Cortex A57 -> A72 IPC improvement?

DisEnchantment

Golden Member
Mar 3, 2017
Yotsugi said:
It's upper teens for IPC and some clocks to boot, the silicon is already up and running, like, Windows.

In the video (around the 104-second mark) they said they are already sampling the chip.

soresu

Platinum Member
Dec 19, 2014
maddie said:
as I really don't see them lowering the TDP ratings. Do you?
Depends on the segment. The 2700E may have been nigh on impossible to lay your hands on, but it was a significant TDP drop (to 45W) for a meagre clock decrease.

Also, with 14nm Zen there was only a single 65W 8-core product (1700); now we have several at 7nm. Wasn't there a 12-core 65W one too?

In the APU segment I would definitely expect a sub-15W TDP SKU. They have more than enough efficiency now to build a very good performer at 10W or below, especially if Navi performs as efficiently as you might hope at lower clock speeds.

maddie

Diamond Member
Jul 18, 2010
soresu said:
Depends on the segment. The 2700E may have been nigh on impossible to lay your hands on, but it was a significant TDP drop (to 45W) for a meagre clock decrease.

Also, with 14nm Zen there was only a single 65W 8-core product (1700); now we have several at 7nm. Wasn't there a 12-core 65W one too?

In the APU segment I would definitely expect a sub-15W TDP SKU. They have more than enough efficiency now to build a very good performer at 10W or below, especially if Navi performs as efficiently as you might hope at lower clock speeds.
Doesn't matter. As long as they keep the 95W or any other existing rating, increased perf/W really is increased perf.
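The point above, put as arithmetic: performance = (perf/W) × power, so at a fixed TDP any perf/W gain shows up one-for-one as performance, while holding performance fixed turns it into a power drop instead. The 10% perf/W figure and the 95W rating below are hypothetical round numbers, and treating TDP as actual power draw is a simplification:

```python
def perf_at_power(perf_per_watt: float, watts: float) -> float:
    """Performance in arbitrary units: (performance per watt) * watts."""
    return perf_per_watt * watts


TDP_W = 95.0     # hypothetical fixed power rating
PPW_GAIN = 1.10  # hypothetical +10% perf/W from the new core/process

# Option A: keep the TDP, take the gain as performance.
speedup = perf_at_power(PPW_GAIN, TDP_W) / perf_at_power(1.0, TDP_W)
print(f"same {TDP_W:.0f} W: {speedup:.2f}x performance")  # 1.10x

# Option B: keep the performance, take the gain as lower power.
print(f"same performance: {TDP_W / PPW_GAIN:.1f} W")  # 86.4 W
```

In practice perf/W is not constant along the voltage/frequency curve, so neither option scales exactly linearly.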

tomatosummit

Member
Mar 21, 2019
maddie said:
Doesn't matter. As long as they keep the 95W or any other existing rating, increased perf/W really is increased perf.
This.
A perfect example was the R7 1700 to the R7 2700. Although the effect was amplified by the better boost algorithm, the 2700 maintains higher clock speeds at the same 65W TDP thanks to the 12nm improvements.

soresu

Platinum Member
Dec 19, 2014
maddie said:
Doesn't matter. As long as they keep the 95W or any other existing rating, increased perf/W really is increased perf.
Not really how I see it, but I have offline CG rendering goggles on.

I will always prefer more cores at the same power over a few hundred MHz on the same number of cores.

If a 65W 16-core model comes out, I would probably buy it even if it costs more than the higher-clocked model at 95W-105W.
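For offline rendering, which usually scales close to linearly with core count, throughput is roughly cores × clock × IPC. That is why trading clocks for cores at the same power can win. A sketch with hypothetical parts (12 cores at 4.0 GHz vs 16 cores at 3.5 GHz, equal IPC, perfect scaling assumed); these are not real product specs:

```python
def render_throughput(cores: int, clock_ghz: float, ipc: float = 1.0) -> float:
    """Relative throughput of an ideally parallel render job: cores * clock * IPC."""
    return cores * clock_ghz * ipc


# Hypothetical parts at the same power budget.
fewer_faster = render_throughput(cores=12, clock_ghz=4.0)
more_slower = render_throughput(cores=16, clock_ghz=3.5)
print(f"16 cores @ 3.5 GHz vs 12 cores @ 4.0 GHz: {more_slower / fewer_faster:.2f}x")
```

Lightly threaded workloads flip the result, since there only the clock (and IPC) term matters.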

Saylick

Diamond Member
Sep 10, 2012
It will have superior everything.

It's upper teens for IPC and some clocks to boot, the silicon is already up and running, like, Windows.
Out of curiosity, how do you know the IPC gains are upper teens? Is this just speculation?

Saylick

Diamond Member
Sep 10, 2012
Nah, the cat's outta the bag already in China.
I haven't been keeping up too frequently in this thread so sorry if it's already been posted, but can you provide the source (assuming it's on Chiphell or some Chinese forum) of this info?

Also, "upper teen" IPC improvement implies the jump from Zen 2 to Zen 3 is as large as, if not larger than, the jump from Zen+ to Zen 2. I'd really like to see some proof because that's a BIG jump in IPC.

Saylick

Diamond Member
Sep 10, 2012
if Zen3 has the same amount or more development effort then high teens doesn't seem unreasonable, they did what 13-15%ish while also doubling datapath width with Zen2.
I have an easier time understanding the 15% IPC gains in Zen 2 because the larger mop cache and improved predictor are items that I've seen in the past that directly improve IPC. What I'm curious about is what else is there to do that would also give another 15% IPC on top of the 15% that Zen 2 brought. Larger registers? Another L/D unit? More ALUs?
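Successive IPC gains compound multiplicatively rather than adding. The sketch below uses the round figures from the thread: about 15% for Zen 2, a hypothetical 15% for Zen 3, and roughly 10% per Intel tock. These are not measured results:

```python
def compound(*gains: float) -> float:
    """Combine successive fractional gains (0.15 == +15%) into one total gain."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


print(f"two 15% steps: {compound(0.15, 0.15):.2%}")  # 32.25%
print(f"two 10% steps: {compound(0.10, 0.10):.2%}")  # 21.00%
```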

Yotsugi

Golden Member
Oct 16, 2017
Saylick said:
but can you provide the source (assuming it's on Chiphell or some Chinese forum) of this info?
Later, when I dig it out of Twitter DMs.
Saylick said:
Also, "upper teen" IPC improvement implies the jump from Zen 2 to Zen 3 is as large as, if not larger than, the jump from Zen+ to Zen 2
What's so special about that? Every numbered core is a tock.
Saylick said:
What I'm curious about is what else is there to do that would also give another 15% IPC on top of the 15% that Zen 2 brought
You'll see.

itsmydamnation

Platinum Member
Feb 6, 2011
Saylick said:
I have an easier time understanding the 15% IPC gains in Zen 2 because the larger mop cache and improved predictor are items that I've seen in the past that directly improve IPC. What I'm curious about is what else is there to do that would also give another 15% IPC on top of the 15% that Zen 2 brought. Larger registers? Another L/D unit? More ALUs?
The answer's easy: keep the core fed. So a bigger, better front end, yes more PRF, more dispatch/retire, and a larger OoOE window. The known L3 cache change can make a big IPC difference to various workloads.

In terms of ALUs, I'm still waiting for this mythical single-thread integer workload that doesn't do any loads or stores and has an IPC of >4, with heaps of ILP just lying around waiting for more ALUs.
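Here is a rough way to see why extra ALUs only pay off when that much ILP exists. Assume an unbounded window, unit latency, and no loads, stores or branch mispredicts (all simplifications). Under those assumptions, a block of N ops needs at least max(critical path, ⌈N / ALUs⌉) cycles. `critical_path` and `ipc_upper_bound` are illustrative helpers, not a real simulator:

```python
from math import ceil


def critical_path(deps: list[list[int]]) -> int:
    """Longest dependency chain, assuming every op takes 1 cycle.

    deps[i] lists the earlier ops (by index) that op i consumes.
    """
    depth: list[int] = []
    for parents in deps:
        depth.append(1 + max((depth[p] for p in parents), default=0))
    return max(depth, default=0)


def ipc_upper_bound(deps: list[list[int]], alus: int) -> float:
    """Idealized IPC bound: limited by dataflow (critical path) or by ALU count."""
    n = len(deps)
    if n == 0:
        return 0.0
    cycles = max(critical_path(deps), ceil(n / alus))
    return n / cycles


independent = [[] for _ in range(12)]    # 12 ops, no dependencies (lots of ILP)
chain = [[]] + [[i] for i in range(11)]  # 12 ops, each needs the previous result

for name, prog in (("independent", independent), ("chain", chain)):
    print(name, [ipc_upper_bound(prog, alus) for alus in (4, 6)])
```

The dependency chain stays at IPC 1 no matter how many ALUs are added, while the 12 independent ops go from 4 to 6. Real code sits somewhere in between.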

Saylick

Diamond Member
Sep 10, 2012
3,171
6,404
136
Yotsugi said:
Later, when I dig it out of Twitter DMs.

I look forward to it.

Yotsugi said:
What's so special about that? Every numbered core is a tock.

That's a fair point, but then again, Intel has used ~10% IPC gains as a tock, and that's considered a respectable IPC gain for an architectural improvement. 15% on top of another 15% is a refreshing change of pace given the incremental improvements we've seen from Intel in the last few years.

itsmydamnation said:
The answer's easy: keep the core fed. So a bigger, better front end, yes more PRF, more dispatch/retire, and a larger OoOE window. The known L3 cache change can make a big IPC difference to various workloads.

In terms of ALUs, I'm still waiting for this mythical single-thread integer workload that doesn't do any loads or stores and has an IPC of >4, with heaps of ILP just lying around waiting for more ALUs.
Hahaha, fair enough. That's like me asking, "How do you make a new Corvette faster than the generation before it?", and you replying, "Well, you can give it more horsepower, fatter tires, better weight distribution, and more downforce." I mean, you'd be right because it's true, but it's the smaller details and the reasoning behind certain design decisions that I think are more interesting.

itsmydamnation

Platinum Member
Feb 6, 2011
2,776
3,156
136
Saylick said:
Hahaha, fair enough. That's like me asking, "How do you make a new Corvette faster than the generation before it?", and you replying, "Well, you can give it more horsepower, fatter tires, better weight distribution, and more downforce." I mean, you'd be right because it's true, but it's the smaller details and the reasoning behind certain design decisions that I think are more interesting.
Well, of course.

Have a look at the patent for how they made the 3rd AGU work; it's quite interesting. It makes me wonder whether the reason Apple has 4-cycle latency on simple ALU ops is that they are doing the same kind of thing on the ALUs that AMD did for the AGUs. If they did something like that, they probably wouldn't need any kind of internal clustering of ALUs.

Cardyak

Member
Sep 12, 2018
There’s loads of potential for further increases, and that’s without needing radical redesigns.

Just some basic stuff off the top of my head:

- More execution units (doesn’t have to be ALUs; can be AGUs, LEA, FPU, etc.)
- Larger caches
- Larger ROB, memory and scheduler buffers
- More ports to dispatch instructions to execution units and reduce back-end bottlenecks

amd6502

Senior member
Apr 21, 2017
itsmydamnation said:
In terms of ALUs, I'm still waiting for this mythical single-thread integer workload that doesn't do any loads or stores and has an IPC of >4, with heaps of ILP just lying around waiting for more ALUs.

You're seeing it all the time. It's just that it's only (likely) a smaller percentage of the code.

4 ALUs is already quite good and got Zen the 40%+ IPC gain.

The potential single-thread gains from 4 ALUs to 5 (or 6) are going to be much smaller. But here, even a ~5% IPC increase is going to count for a lot. And for (SMT2) multithreaded IPC, the gains are bound to be double digits.

The slight downside is that more idle pipes means it would need gating to avoid losing efficiency. Or a 4-way MT scheme; SMT2+?
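The "smaller percentage of the code" argument is essentially Amdahl's law. If only a fraction of execution time has ILP above 4, widening from 4 to 6 ALUs speeds up only that fraction. The fractions and the 1.5x local speedup below are hypothetical, and SMT throughput gains are not modeled:

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when only `fraction` of execution time runs `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


LOCAL = 6 / 4  # hypothetical: high-ILP regions run up to 1.5x faster on 6 ALUs vs 4

for f in (0.1, 0.2):
    gain = amdahl_speedup(f, LOCAL) - 1.0
    print(f"{f:.0%} of time with ILP > 4: {gain:.1%} single-thread IPC gain")
```

With 10% or 20% of time spent in such regions, this gives roughly 3% and 7%, in the same ballpark as the ~5% figure above.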