Speculation: Ryzen 4000 series/Zen 3


Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Renoir is monolithic.

Renoir is interesting because it isn't the same chiplet again... we might see some bug fixes and optimizations for Zen 2. It's possible (but unlikely) that it turns out to be a "Zen 2+", like Raven Ridge was.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
AMD has Dual 10G NIC in the embedded R1000 series 2C/4T w/ Vega 3.
That's licensed. Once their mobile APUs fetch higher prices I can imagine them doing that there as well, but so far they haven't. For mobile they'll want a WNIC, though.

The crazy thing is that Sony are still posting optimisations for Jaguar even now, I suspect the rapid iteration of Zen will cause some backlog in serious optimisation - the price of progress in a hardware oriented smaller company I guess.
The really bad part is not that the optimizations don't exist in GCC, but that existing optimizations are slow to get enabled for the Zen targets.
 
  • Like
Reactions: amd6502

inf64

Diamond Member
Mar 11, 2011
3,697
4,015
136
Some food for thought.
AMD initially targeted a 40% IPC gain for a brand new core (Zen1) over the previous generation (Carrizo). The 40% figure just gives us an idea of what AMD had in mind and what they thought was achievable. Fast forward to 2020: if Zen3 offers a 15-20% (mean 17.5%) IPC boost over Zen2, we end up with a cumulative boost of 1.035 (Zen+) x 1.15 (Zen2) x 1.175 (Zen3) ≈ 1.4, or 40%.

As Forrest Norrod stated, Zen1 was a tock and Zen3 will be a tock (Zen2 and Zen4 are/will be ticks, but not traditional ones, as he noted). Zen1 was targeted for a 40% IPC jump; Zen3 might be targeted for the same goal, and as can be seen above it's easily achievable. AMD overshot the target for both the Zen1 tock and the Zen2 tick, so that bodes well for the tock that Zen3 is.
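The compounding in the post above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the Zen+ and Zen2 figures are the ones quoted in the post, the Zen3 range is the post's speculation, and the function name `cumulative_gain` is made up for illustration.

```python
# Per-generation IPC uplifts as quoted in the post.
# The Zen3 numbers are speculation from the post, not measured data.
ZEN_PLUS = 0.035
ZEN2 = 0.15
ZEN3_GUESSES = {"low": 0.15, "mean": 0.175, "high": 0.20}


def cumulative_gain(steps):
    """Compound a sequence of fractional uplifts into one overall uplift."""
    total = 1.0
    for gain in steps:
        total *= 1.0 + gain
    return total - 1.0


for label, zen3 in ZEN3_GUESSES.items():
    gain = cumulative_gain([ZEN_PLUS, ZEN2, zen3])
    print(f"Zen3 {label:>4} guess: {gain:.1%} cumulative over Zen1")
```

With the 17.5% midpoint the product is about 1.3985, which is where the "1.4 or 40%" figure comes from. Note that this is the gain over Zen1 across three generations, not a 40% jump for Zen3 on its own.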
 
  • Like
Reactions: Richie Rich

ClockHound

Golden Member
Nov 27, 2007
1,108
214
106
Some food for thought.
AMD initially targeted a 40% IPC gain for a brand new core (Zen1) over the previous generation (Carrizo). The 40% figure just gives us an idea of what AMD had in mind and what they thought was achievable. Fast forward to 2020: if Zen3 offers a 15-20% (mean 17.5%) IPC boost over Zen2, we end up with a cumulative boost of 1.035 (Zen+) x 1.15 (Zen2) x 1.175 (Zen3) ≈ 1.4, or 40%.

As Forrest Norrod stated, Zen1 was a tock and Zen3 will be a tock (Zen2 and Zen4 are/will be ticks, but not traditional ones, as he noted). Zen1 was targeted for a 40% IPC jump; Zen3 might be targeted for the same goal, and as can be seen above it's easily achievable. AMD overshot the target for both the Zen1 tock and the Zen2 tick, so that bodes well for the tock that Zen3 is.

I like the way you're tocking. Keep on ticking too. ;-)
 

Ajay

Lifer
Jan 8, 2001
15,408
7,833
136
Yeah, that had confused me in light of its absence from high-end desktop chipsets, though 10GBASE-T market penetration has been woefully slow/stagnant in general given that it is more than a decade old now.
I would imagine that with so much office work relying on server and cloud based apps - there isn’t much need to move large enough amounts of data to merit 10G networks. All the bottlenecks are elsewhere.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
- though 10GBASE-T market penetration has been woefully slow/stagnant in general given that it is more than a decade old now.

I think the reason it hasn't caught on outside of servers is that 1GbE is already quite fast and very adequate for most purposes.
 
  • Like
Reactions: spursindonesia

Thunder 57

Platinum Member
Aug 19, 2007
2,670
3,788
136
Some food for thought.
AMD initially targeted a 40% IPC gain for a brand new core (Zen1) over the previous generation (Carrizo). The 40% figure just gives us an idea of what AMD had in mind and what they thought was achievable. Fast forward to 2020: if Zen3 offers a 15-20% (mean 17.5%) IPC boost over Zen2, we end up with a cumulative boost of 1.035 (Zen+) x 1.15 (Zen2) x 1.175 (Zen3) ≈ 1.4, or 40%.

As Forrest Norrod stated, Zen1 was a tock and Zen3 will be a tock (Zen2 and Zen4 are/will be ticks, but not traditional ones, as he noted). Zen1 was targeted for a 40% IPC jump; Zen3 might be targeted for the same goal, and as can be seen above it's easily achievable. AMD overshot the target for both the Zen1 tock and the Zen2 tick, so that bodes well for the tock that Zen3 is.

Easily achievable? No. Also, no way Zen 3 raises IPC by 40%.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
I think the reason it hasn't caught on outside of servers is that 1GbE is already quite fast and very adequate for most purposes.
The reason it hasn't come down in price is that it isn't really needed for home use, which is what would make it a cheap commodity. It isn't hard to do these days. A lot of consumer-level tech is actually pushing higher speeds than 10G Ethernet. Video cables and such require ridiculous speeds to handle uncompressed 4K video. Due to the high prices, since it is still mostly a server technology, I am still stuck with a cluster connected by gigabit Ethernet at work. I could probably get higher speeds than gigabit with a good AC WiFi router. It is ridiculous.
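To put a rough number on the "uncompressed 4K video" comparison, here is a back-of-the-envelope sketch. The resolution, refresh rate and colour depth (3840x2160, 60 Hz, 24 bits per pixel) are assumptions picked for illustration, and the calculation ignores blanking intervals and link encoding overhead, so real cable data rates are higher than this.

```python
def video_gbit_per_s(width, height, fps, bits_per_pixel):
    """Raw pixel data rate in Gbit/s (no blanking, no link encoding overhead)."""
    return width * height * fps * bits_per_pixel / 1e9


TEN_GBE_GBIT_PER_S = 10.0  # nominal 10G Ethernet line rate

# Assumed example format: 4K UHD, 60 frames per second, 8 bits per colour channel.
uhd_60 = video_gbit_per_s(3840, 2160, 60, 24)

print(f"4K60 at 24 bpp: {uhd_60:.2f} Gbit/s")
print("exceeds 10GbE line rate:", uhd_60 > TEN_GBE_GBIT_PER_S)
```

Under those assumptions the raw pixel stream alone is just under 12 Gbit/s, so it already exceeds what a 10G Ethernet link carries.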
 
  • Like
Reactions: Ranulf

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Easily achievable? No. Also, no way Zen 3 raises IPC by 40%.

IPC is application specific. There are two different areas where AMD does not have Intel beat. One is certain high-end server applications (database servers and such) that can make use of a large monolithic last-level cache. On Zen 2, any one core can only access up to 16 MB with good latency. Intel Xeons mostly go up to 38.5 MB. They have one specialized 55 MB part. AMD will be looking to surpass Intel with Zen 3 in this area as well. They need to make sure IT departments don't have any excuses to continue buying Intel. I hope that they have a large-cache variant for niche server applications and HPC.

The other area is AVX512 applications. It is obvious that some of the cases are software de-optimizations: Intel software forcing bad code paths on competing products. I suspect that AMD may support AVX512 with Zen 3. Zen 2 only has 64 bytes per clock of read and 32 bytes per clock of write bandwidth to the L1 cache. That fits pretty well with 256-bit AVX, since 256 bits = 32 bytes. Even if they only want to increase the 256-bit AVX throughput, they would probably need to double that to 128 bytes per clock of read bandwidth. They could then support AVX512 with a full 512-bit unit, by combining two 256-bit units, or by executing 512-bit instructions over 2 clocks. Intel has 2 AVX512 units per core in some CPUs, though. AMD would need at least four 256-bit AVX units to match that, so they would need to double the cache bandwidth over Zen 2. I tend to think that if you have code that can really make use of 512-bit units, then you should probably be looking at running it on a GPU anyway, which is much more parallel and has much more bandwidth available. The memory capacity issues on GPUs will be much less of a problem with the new HBM2e. It allows 16 GB of capacity per stack, so a 4-stack device could have 64 GB. I assume AMD wants people to use GPUs more as well, but they still need to keep up with Intel, or surpass them if possible.
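The bandwidth argument above comes down to dividing bytes per clock by the vector width. A minimal sketch, using the Zen 2 L1 figures quoted in the post (64 bytes/clock read, 32 bytes/clock write) and the post's hypothetical doubling; the helper name `vector_ops_per_clock` is made up for illustration.

```python
def vector_ops_per_clock(bytes_per_clock, vector_bits):
    """How many full-width vector loads (or stores) the L1 can feed per clock."""
    return bytes_per_clock / (vector_bits // 8)


# Figures quoted in the post for Zen 2.
ZEN2_L1_READ_BYTES = 64
ZEN2_L1_WRITE_BYTES = 32
# The post's hypothetical: doubled read bandwidth on a future core.
DOUBLED_READ_BYTES = 128

for bits in (256, 512):
    loads = vector_ops_per_clock(ZEN2_L1_READ_BYTES, bits)
    stores = vector_ops_per_clock(ZEN2_L1_WRITE_BYTES, bits)
    future = vector_ops_per_clock(DOUBLED_READ_BYTES, bits)
    print(f"{bits}-bit vectors: {loads:g} loads/clk, {stores:g} stores/clk "
          f"now; {future:g} loads/clk with doubled read bandwidth")
```

With the quoted numbers, 256-bit vectors get two loads and one store per clock, while 512-bit vectors would get one load and half a store, which is why the post ties any 512-bit support to wider L1 paths.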

About the only other thing is just raw clock speed. There are probably some applications that do respond to raw clock speed, but I don’t think they are that important. I have to wonder if the ridiculous low resolution gaming benchmarks are responding more to high clock than low memory latency. Intel is going to have a very hard time competing with themselves if that is the case. Their 10 and 7 nm parts are not going to clock as well as the 14 nm process that they have been tweaking for like 5+ years. This one isn’t going to be something AMD can do, but it looks like Intel can’t do it either. Saying that Intel is better for gaming seems ridiculous at this point. For actual reasonable quality settings, most benchmarks are gpu limited and there is no difference.

There doesn't seem to be a core count increase for Zen 3, but I don't think they really need it. I had thought that SMT4 might be a substitute for increasing the core count, but Intel isn't going to be able to compete on core count anytime soon. Also, after seeing the speed increases of the 3950X, 3960X and 3970X, I don't think they will need a higher core count if Zen 3 is another large increase. The 2000 series Threadrippers are way behind the 3000 series parts, and that wasn't even supposed to be that big of an upgrade.

So, 40% IPC in general is certainly not going to happen. For some specific cases, it may be possible. If you go from something that doesn’t fit in cache to something that does, the performance increase can be huge. They also could get massive performance increases if they double the floating point throughput again, but only for specific applications.
 
  • Like
Reactions: lightmanek

soresu

Platinum Member
Dec 19, 2014
2,650
1,853
136
The reason it hasn't come down in price is that it isn't really needed for home use, which is what would make it a cheap commodity. It isn't hard to do these days. A lot of consumer-level tech is actually pushing higher speeds than 10G Ethernet. Video cables and such require ridiculous speeds to handle uncompressed 4K video. Due to the high prices, since it is still mostly a server technology, I am still stuck with a cluster connected by gigabit Ethernet at work. I could probably get higher speeds than gigabit with a good AC WiFi router. It is ridiculous.
I got the impression that the PHY was somewhat less than efficient, at least for 10G BASE-T

Something like ports still drawing a lot of power even on more modern fab processes for the chips.

Perhaps owing to it being designed before certain assumptions about how much chip area they could spend on it or something, but as you say considering the multitude of other standards it does seem odd.

Though bear in mind that those ultra-high data rate consumer standards aren't meant to carry data over a significant range like Ethernet cables do, at least not without some signal loss on purely copper variants?
 

soresu

Platinum Member
Dec 19, 2014
2,650
1,853
136
AVX512 is not happening with Zen3. You can drop that line of thought. Maybe Zen4, but not with Zen3. There are way bigger fish to fry, AVX512 is still niche.
To the point that even Intel's own-brand encoders (SVT) currently get a sound smacking from Epyc despite their AVX512 optimizations, though more will follow and will likely close the gap somewhat.
 
  • Like
Reactions: uzzi38

soresu

Platinum Member
Dec 19, 2014
2,650
1,853
136
The memory capacity issues on gpus will be much less with new HBM 2e. It allows 16 GB capacity per stack, so a 4 stack device could have 64 GB.
Max stack height (for the spec) is 12 now, with density per die/layer being 16 Gbit - so max density per stack can go as high as 24 GB, though no vendors have actually announced products yet to that effect.

Though Samsung recently claimed to have cracked 12 high 3D/TSV packaging, so we will likely hear an announcement soon enough.

Even with 96 GB for 4 stacks, the DRAM capacity per card still lags painfully far behind main system memory per socket (2 TB+?) - this is I think the main impetus for those Radeon SSG cards with huge flash memory buffers.
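The capacity figures in this exchange follow directly from stack height times die density. A small sketch of that arithmetic, using the numbers quoted in the posts (16 Gbit per die, 8-high or 12-high stacks, 4 stacks per device); the function names are made up for illustration.

```python
def stack_capacity_gb(dies_per_stack, gbit_per_die=16):
    """Capacity of one HBM stack in GB (8 Gbit = 1 GB)."""
    return dies_per_stack * gbit_per_die / 8


def device_capacity_gb(stacks, dies_per_stack, gbit_per_die=16):
    """Total capacity of a device carrying several identical stacks."""
    return stacks * stack_capacity_gb(dies_per_stack, gbit_per_die)


for height in (8, 12):
    per_stack = stack_capacity_gb(height)
    total = device_capacity_gb(4, height)
    print(f"{height}-high stack: {per_stack:g} GB per stack, "
          f"{total:g} GB for a 4-stack device")
```

An 8-high stack gives the 16 GB per stack (64 GB for four stacks) mentioned in the quoted post, and a 12-high stack gives the 24 GB per stack (96 GB for four stacks) mentioned in the reply.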
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
No, Banded Kestrel was an early internal name for the 2c4t counterpart to the 4c8t Raven Ridge (originally Great Horned Owl), so Zen 1 with Vega 10 plus VCN 1.0. It ended up launching only very late for some reason (likely the Raven Ridge dies were cheap enough to be used for cut-down chips like the Athlon APUs), so the only product officially using it right now is the R1000 embedded series.

Old slides from early 2016:
[Two slide images, not reproduced here]
Turns out this Banded Kestrel/Raven2/R1000 die is actually used in the new Athlon 3000G, that's pretty cool. (Don't think the Dual 10G NIC is accessible on AM4 though.)

 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
It's also used in the 300U.
To be honest I expected it to be used only in embedded and mobile for longer. The fact that it made it to the desktop as well, essentially replacing the whole previous Athlon line there, tells me the die is already more profitable for AMD there too.
 

uzzi38

Platinum Member
Oct 16, 2019
2,607
5,822
146
To be honest I expected it to be used only in embedded and mobile for longer. The fact that it made it to the desktop as well, essentially replacing the whole previous Athlon line there, tells me the die is already more profitable for AMD there too.

Well, I don't think I've seen a delidded 300GE, but I'd imagine that is also the Raven2 die.
Also, with Dali coming in less than a year, clearing as much stock as possible is probably a good idea.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
RedGamingTech rumor:
- Zen 3 has 40 - 50% higher IPC in FPU workloads
- Zen 3 has 40% increased L1 cache bandwidth

If this is true, Zen 3 is a beast.

My comments:
- A +40% FPU IPC boost cannot be done by AVX512. AMD probably widened the FPU from 4 to 8 pipes.
- This suits SMT4. In theory Zen3 could double FPU throughput with SMT4, or gain +40% with SMT2.
- There must also be some significant upgrade in ALUs+AGUs to avoid bottlenecking the FPU. So maybe 6x ALUs?
 

soresu

Platinum Member
Dec 19, 2014
2,650
1,853
136
RedGamingTech rumor:
- Zen 3 has 40 - 50% higher IPC in FPU workloads
- Zen 3 has 40% increased L1 cache bandwidth

If this is true, Zen 3 is a beast.

Doesn't necessarily mean general FP32 workloads, it could mean FP4 or FP8 for AI/ML workloads.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,670
3,788
136
RedGamingTech rumor:
- Zen 3 has 40 - 50% higher IPC in FPU workloads
- Zen 3 has 40% increased L1 cache bandwidth

If this is true, Zen 3 is a beast.

My comments:
- A +40% FPU IPC boost cannot be done by AVX512. AMD probably widened the FPU from 4 to 8 pipes.
- This suits SMT4. In theory Zen3 could double FPU throughput with SMT4, or gain +40% with SMT2.
- There must also be some significant upgrade in ALUs+AGUs to avoid bottlenecking the FPU. So maybe 6x ALUs?

I've heard the L1 bandwidth rumor. It sounds plausible and should be a nice benefit. I'm not buying the 40-50% better IPC in FPU though. There will be no SMT4. I doubt we'll see 6 ALUs either. I do think they need another ALU or two, but two at once seems a bit much. Not to mention, they'd probably want to add an AGU if they have 6 ALUs. That seems like too much die space, especially if they bump up the L3 again. Maybe we will see something like that on 5nm.
 
  • Like
Reactions: soresu and amd6502

Richie Rich

Senior member
Jul 28, 2019
470
229
76
I've heard the L1 bandwidth rumor. It sounds plausible and should be a nice benefit. I'm not buying the 40-50% better IPC in FPU though. There will be no SMT4. I doubt we'll see 6 ALUs either. I do think they need another ALU or two, but two at once seems a bit much. Not to mention, they'd probably want to add an AGU if they have 6 ALUs. That seems like too much die space, especially if they bump up the L3 again. Maybe we will see something like that on 5nm.
Why would they boost L1 bandwidth without an IPC increase? Any idea?
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
I got the impression that the PHY was somewhat less than efficient, at least for 10G BASE-T

Something like ports still drawing a lot of power even on more modern fab processes for the chips.

Perhaps owing to it being designed before certain assumptions about how much chip area they could spend on it or something, but as you say considering the multitude of other standards it does seem odd.

Though bear in mind that those ultra-high data rate consumer standards aren't meant to carry data over a significant range like Ethernet cables do, at least not without some signal loss on purely copper variants?

Ethernet does support much longer cable runs, but I don't actually need that. The cluster I am dealing with fits in one cabinet. I don't think there are any cheaper options for short-distance cabling, though. I at least have a switch with a 10 Gb uplink from the NFS server.
 
  • Like
Reactions: soresu