Speculation: Ryzen 4000 series/Zen 3

inf64 · Nov 27, 2019

Some food for thought.
AMD initially targeted a 40% IPC gain for a brand new core (Zen 1) over the previous generation (Carrizo). The 40% figure just gives us an idea of what AMD had in mind and what they thought was achievable. Fast forward to 2020: if Zen 3 offers a 15-20% (mean 17.5%) IPC boost over Zen 2, we end up with a cumulative boost of 1.035 (Zen+) x 1.15 (Zen 2) x 1.175 (Zen 3) = 1.4, or 40%.
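For anyone who wants to check the compounding, here is a minimal Python sketch. The three uplift figures are simply the ones quoted in this post (the Zen 3 number is speculation, not a measurement), and the names `gains` and `cumulative_gain` are made up for illustration.

```python
from math import prod

# Per-generation IPC uplifts as quoted in the post.
# These are assumed figures; the Zen 3 value is pure speculation.
gains = {"Zen+": 0.035, "Zen 2": 0.15, "Zen 3 (speculated)": 0.175}


def cumulative_gain(uplifts):
    """Compound fractional uplifts multiplicatively; return the total fractional gain."""
    return prod(1.0 + g for g in uplifts) - 1.0


total = cumulative_gain(gains.values())
print(f"Cumulative IPC gain over Zen 1: {total:.1%}")
```

With these inputs the product is about 1.3985, so the "1.4 or 40%" figure is a slight round-up.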

As Forrest Norrod stated, Zen 1 was a tock and Zen 3 will be a tock (Zen 2 and Zen 4 are/will be ticks, but not traditional ones, as he noted). Zen 1 was targeted for a 40% IPC jump; Zen 3 might be targeted for the same goal and, as can be seen above, it's easily achievable. AMD overshot the target for both the Zen 1 tock and the Zen 2 tick, so that bodes well for Zen 3, which is a tock.

ClockHound · Nov 27, 2019

inf64 said:
Some food for thought.
AMD initially targeted a 40% IPC gain for a brand new core (Zen 1) over the previous generation (Carrizo). The 40% figure just gives us an idea of what AMD had in mind and what they thought was achievable. Fast forward to 2020: if Zen 3 offers a 15-20% (mean 17.5%) IPC boost over Zen 2, we end up with a cumulative boost of 1.035 (Zen+) x 1.15 (Zen 2) x 1.175 (Zen 3) = 1.4, or 40%.

As Forrest Norrod stated, Zen 1 was a tock and Zen 3 will be a tock (Zen 2 and Zen 4 are/will be ticks, but not traditional ones, as he noted). Zen 1 was targeted for a 40% IPC jump; Zen 3 might be targeted for the same goal and, as can be seen above, it's easily achievable. AMD overshot the target for both the Zen 1 tock and the Zen 2 tick, so that bodes well for Zen 3, which is a tock.

I like the way you're tocking. Keep on ticking too. ;-)

Ajay · Nov 27, 2019

soresu said:
Yeah, that had confused me in the light of its absence in high end desktop chipsets - though 10 Gig BASE T market penetration has been woefully slow/stagnant in general given it is more than a decade old now.

I would imagine that with so much office work relying on server and cloud based apps - there isn’t much need to move large enough amounts of data to merit 10G networks. All the bottlenecks are elsewhere.

amd6502 · Nov 27, 2019

- though 10 Gig BASE T market penetration has been woefully slow/stagnant in general given it is more than a decade old now.

I think the reason it hasn't caught on outside of servers is that 1GbE is already quite fast and very adequate for most purposes.

Thunder 57 · Nov 28, 2019

inf64 said:
Some food for thought.
AMD initially targeted a 40% IPC gain for a brand new core (Zen 1) over the previous generation (Carrizo). The 40% figure just gives us an idea of what AMD had in mind and what they thought was achievable. Fast forward to 2020: if Zen 3 offers a 15-20% (mean 17.5%) IPC boost over Zen 2, we end up with a cumulative boost of 1.035 (Zen+) x 1.15 (Zen 2) x 1.175 (Zen 3) = 1.4, or 40%.

As Forrest Norrod stated, Zen 1 was a tock and Zen 3 will be a tock (Zen 2 and Zen 4 are/will be ticks, but not traditional ones, as he noted). Zen 1 was targeted for a 40% IPC jump; Zen 3 might be targeted for the same goal and, as can be seen above, it's easily achievable. AMD overshot the target for both the Zen 1 tock and the Zen 2 tick, so that bodes well for Zen 3, which is a tock.

Easily achievable? No. Also, no way Zen 3 raises IPC by 40%.

jamescox · Nov 28, 2019

amd6502 said:
I think the reason it hasn't caught on outside of servers is that 1GbE is already quite fast and very adequate for most purposes.

The reason it hasn’t come down in price is that it isn’t really needed for home use, and home volume is what would make it a cheap commodity. It isn’t hard to do these days. A lot of consumer level tech is actually pushing higher speeds than 10G Ethernet. Video cables require ridiculous speeds to handle uncompressed 4K video. Due to the high prices, since it is still mostly a server thing, I am still stuck with a cluster connected by gigabit Ethernet at work. I could probably get higher speeds than gigabit with a good AC WiFi router. It is ridiculous.
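As a rough sanity check on the video cable comparison, a back-of-the-envelope sketch in Python. The inputs are my own assumptions (3840x2160, 24 bits per pixel, 60 Hz, blanking intervals and link encoding overhead ignored), and the function name is just for illustration.

```python
def uncompressed_video_gbps(width, height, bits_per_pixel, fps):
    """Raw pixel data rate in Gbit/s, ignoring blanking and link encoding overhead."""
    return width * height * bits_per_pixel * fps / 1e9


# Assumed format: 4K UHD, 8 bits per colour channel (24 bpp), 60 frames per second.
rate = uncompressed_video_gbps(3840, 2160, 24, 60)
print(f"Uncompressed 4K60 video: {rate:.2f} Gbit/s (10G Ethernet: 10.00 Gbit/s)")
```

Under those assumptions the pixel data alone comes to a bit under 12 Gbit/s, which is already more than a 10G Ethernet link carries.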

uzzi38 · Nov 28, 2019

Pretty sure he meant the jump from Zen -> Zen 3 is 40%.

If you take the 3% from Zen+ alongside the 13% from Zen 2, then he's not all THAT far off.
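Turning that around, a quick Python sketch of how much Zen 3 would have to add to reach 40% over Zen 1 with those numbers. The 3% and 13% inputs are the figures from this post, the 40% target is from inf64's post, and `required_uplift` is a hypothetical helper name.

```python
def required_uplift(target_total, prior_uplifts):
    """Fractional uplift still needed to hit target_total after compounding prior_uplifts."""
    base = 1.0
    for g in prior_uplifts:
        base *= 1.0 + g
    return (1.0 + target_total) / base - 1.0


# 3% (Zen+) and 13% (Zen 2) as quoted above; 40% cumulative target over Zen 1.
needed = required_uplift(0.40, [0.03, 0.13])
print(f"Zen 3 would need roughly {needed:.1%} over Zen 2")
```

With these more conservative inputs the required Zen 3 uplift lands at about 20%, the top of the 15-20% range being discussed.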

inf64 · Nov 28, 2019

Thunder 57 said:
Easily achievable? No. Also, no way Zen 3 raises IPC by 40%.

Re-read my post please. 40% is over Zen1.

jamescox · Nov 28, 2019

Thunder 57 said:
Easily achievable? No. Also, no way Zen 3 raises IPC by 40%.

IPC is application specific. There are two different areas where AMD does not have Intel beat. One is certain high end server applications (database servers and such) that can make use of a large monolithic last level cache. Zen 2 can only access up to 16 MB from any one core with good latency. Intel Xeons mostly go up to 38.5 MB. They have one specialized 55 MB part. AMD will be looking to surpass Intel with Zen 3 in this area also. They need to make sure IT departments don’t have any excuses to continue buying Intel. I hope that they have a large cache variant for niche server applications and HPC.

The other area is AVX512 applications. It is obvious that some of the cases are software de-optimizations: Intel software forcing bad code paths for competing products. I suspect that AMD may support AVX512 with Zen 3. Zen 2 only has 64 bytes per clock of read and 32 bytes per clock of write bandwidth to the L1 cache. That fits pretty well with 256-bit AVX, since 256 bits = 32 bytes. Even if they only want to increase the 256-bit AVX throughput, they would probably need to double that to 128 bytes per clock of read bandwidth. They could then support AVX512 with a full 512-bit unit, by combining two 256-bit units, or by doing 512-bit instructions in 2 clocks. Intel has 2 AVX512 units in each core in some CPUs though. AMD would need at least 4 256-bit AVX units to match that, so they need to double the cache bandwidth over Zen 2. I tend to think that if you have code that can really make use of 512-bit units then you probably should be looking at running it on a GPU anyway, which is much more parallel with much more bandwidth available. The memory capacity issues on GPUs will be much less with the new HBM2E. It allows 16 GB capacity per stack, so a 4 stack device could have 64 GB. I assume AMD wants people to use GPUs more also, but they still need to keep up with Intel or surpass them if possible.
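To make the bytes-per-clock arithmetic concrete, a rough sketch in Python. The Zen 2 figures (64 bytes per clock read, 32 bytes per clock write) are the ones stated in this post; the function name and the 128 bytes per clock case are hypothetical, and this ignores everything else that limits real throughput.

```python
def vector_ops_per_clock(l1_bytes_per_clock, vector_bits):
    """How many full-width vector loads (or stores) the L1 port bandwidth can feed per clock."""
    return l1_bytes_per_clock // (vector_bits // 8)


# Zen 2 L1 bandwidth as stated in the post (assumed here, not measured).
ZEN2_READ_BYTES = 64
ZEN2_WRITE_BYTES = 32
DOUBLED_READ_BYTES = 128  # hypothetical doubled read bandwidth

for label, read_bw in (("Zen 2", ZEN2_READ_BYTES), ("doubled", DOUBLED_READ_BYTES)):
    for bits in (256, 512):
        n = vector_ops_per_clock(read_bw, bits)
        print(f"{label:8s} {read_bw:3d} B/clk read -> {n} x {bits}-bit loads per clock")
```

The point is only that 64 bytes per clock covers two 256-bit loads but a single 512-bit load, so feeding two 512-bit units would take the doubled read bandwidth.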

About the only other thing is just raw clock speed. There are probably some applications that do respond to raw clock speed, but I don’t think they are that important. I have to wonder if the ridiculous low resolution gaming benchmarks are responding more to high clock than low memory latency. Intel is going to have a very hard time competing with themselves if that is the case. Their 10 and 7 nm parts are not going to clock as well as the 14 nm process that they have been tweaking for like 5+ years. This one isn’t going to be something AMD can do, but it looks like Intel can’t do it either. Saying that Intel is better for gaming seems ridiculous at this point. For actual reasonable quality settings, most benchmarks are gpu limited and there is no difference.

There doesn’t seem to be a core count increase for Zen 3, but I don’t think they really need it. I had thought that SMT 4 might be a substitute for increasing the core count, but Intel isn’t going to be able to compete on core count anytime soon. Also, after seeing the speed increases of the 3950X, 3960X and 3970X, I don’t think they will need a higher core count if Zen 3 is another large increase. The 2000 series ThreadRippers are way behind the 3000 series parts, and that wasn’t even supposed to be that big of an upgrade.

So, 40% IPC in general is certainly not going to happen. For some specific cases, it may be possible. If you go from something that doesn’t fit in cache to something that does, the performance increase can be huge. They also could get massive performance increases if they double the floating point throughput again, but only for specific applications.

uzzi38 · Nov 28, 2019

AVX512 is not happening with Zen3. You can drop that line of thought. Maybe Zen4, but not with Zen3. There are way bigger fish to fry, AVX512 is still niche.

soresu · Nov 28, 2019

jamescox said:
The reason it hasn’t come down in price is that it isn’t really needed for home use, and home volume is what would make it a cheap commodity. It isn’t hard to do these days. A lot of consumer level tech is actually pushing higher speeds than 10G Ethernet. Video cables require ridiculous speeds to handle uncompressed 4K video. Due to the high prices, since it is still mostly a server thing, I am still stuck with a cluster connected by gigabit Ethernet at work. I could probably get higher speeds than gigabit with a good AC WiFi router. It is ridiculous.

I got the impression that the PHY was somewhat less than efficient, at least for 10GBASE-T.

Something like ports still drawing a lot of power even on more modern fab processes for the chips.

Perhaps owing to it being designed before certain assumptions about how much chip area they could spend on it or something, but as you say considering the multitude of other standards it does seem odd.

Though bear in mind those ultra high data rate consumer standards aren't meant to carry data over a significant range like Ethernet cables do, at least not without some signal loss on purely copper variants?

soresu · Nov 28, 2019

uzzi38 said:
AVX512 is not happening with Zen3. You can drop that line of thought. Maybe Zen4, but not with Zen3. There are way bigger fish to fry, AVX512 is still niche.

To the point that even Intel's own brand encoders (SVT) currently get a sound smacking from Epyc despite their AVX512 optimizations, though more will follow and likely close the gap some.

soresu · Nov 28, 2019

jamescox said:
The memory capacity issues on GPUs will be much less with the new HBM2E. It allows 16 GB capacity per stack, so a 4 stack device could have 64 GB.

Max stack height (for the spec) is 12 now, with density per die/layer being 16 Gbit - so max density per stack can go as high as 24 GB, though no vendors have actually announced products yet to that effect.

Though Samsung recently claimed to have cracked 12 high 3D/TSV packaging, so we will likely hear an announcement soon enough.

Even with 96 GB for 4 stacks, the DRAM capacity per card still lags painfully far behind main system memory per socket (2 TB+?) - this is I think the main impetus for those Radeon SSG cards with huge flash memory buffers.
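The capacity numbers above are just multiplication, sketched here in Python. The 16 Gbit per die and 12-high figures are the ones from this post, the 8-high case corresponds to the 16 GB per stack quoted from jamescox, and the function name is made up for illustration.

```python
def stack_capacity_gb(dies_per_stack, gbit_per_die):
    """Capacity of one HBM stack in GB (8 Gbit = 1 GB)."""
    return dies_per_stack * gbit_per_die / 8


GBIT_PER_DIE = 16  # density per die/layer as stated in the post
STACKS_PER_CARD = 4  # assumed 4-stack device, as in the quoted example

for height in (8, 12):
    per_stack = stack_capacity_gb(height, GBIT_PER_DIE)
    per_card = per_stack * STACKS_PER_CARD
    print(f"{height}-high: {per_stack:.0f} GB per stack, {per_card:.0f} GB per card")
```

That gives 16 GB per stack (64 GB per card) at 8-high and 24 GB per stack (96 GB per card) at 12-high.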

moinmoin · Nov 28, 2019

moinmoin said:
No, Banded Kestrel was an early internal name for the 2c4t counterpart to the 4c8t Raven Ridge (originally Great Horned Owl), so Zen 1 with Vega 10 plus VCN 1.0. It ended up only launching very late for some reason (likely the Raven Ridge dies were cheap enough to be used for cut down chips like the Athlon APUs), so the only product officially using it right now is the R1000 embedded series.

Old slides from early 2016:

DisEnchantment said:
AMD has Dual 10G NIC in the embedded R1000 series 2C/4T w/ Vega 3.

https://www.amd.com/en/products/embedded-ryzen-r1000-series?gclid=Cj0KCQiA2vjuBRCqARIsAJL5a-I2CZXdni4vbQNUMax0LCAFSDcPqR5h-4G08SIHpv144k2-oxQg_mYaAgEaEALw_wcB

Turns out this Banded Kestrel/Raven2/R1000 die is actually used in the new Athlon 3000G, that's pretty cool. (Don't think the Dual 10G NIC is accessible on AM4 though.)

https://twitter.com/x/status/1199354469227487232

uzzi38 · Nov 28, 2019

moinmoin said:
Turns out this Banded Kestrel/Raven2/R1000 die is actually used in the new Athlon 3000G, that's pretty cool. (Don't think the Dual 10G NIC is accessible on AM4 though.)

https://twitter.com/x/status/1199354469227487232

It's also used in the 300U.

moinmoin · Nov 28, 2019

uzzi38 said:
It's also used in the 300U.

To be honest I expected it to be only used in embedded and mobile for longer. The fact that it made it to the desktop as well, essentially replacing the whole previous Athlon line there, tells me the die is already more profitable for AMD there too.

uzzi38 · Nov 28, 2019

moinmoin said:
To be honest I expected it to be only used in embedded and mobile for longer. The fact that it made it to the desktop as well, essentially replacing the whole previous Athlon line there, tells me the die is already more profitable for AMD there too.

Well I don't think I've seen a delidded 300GE, but I'd imagine that is also the Raven2 die.
Also, with Dali coming in under a year, clearing as much stock as possible is probably a good idea.

Richie Rich · Nov 28, 2019

RedGamingTech rumor:
- Zen 3 has 40 - 50% higher IPC in FPU workloads
- Zen 3 has 40% increased L1 cache bandwidth

If this is true, Zen 3 is a beast.

My comments:
- +40% FPU IPC boost cannot be done by AVX512. Probably AMD widened FPU pipes from 4 to 8 pipes.
- this suits SMT4. In theory Zen3 could double FPU throughput with SMT4. Or +40% with SMT2.
- there must be some significant upgrade also in ALUs+AGUs to avoid bottlenecking the FPU. So maybe 6xALUs?

soresu · Nov 28, 2019

Richie Rich said:
RedGamingTech rumor:
- Zen 3 has 40 - 50% higher IPC in FPU workloads
- Zen 3 has 40% increased L1 cache bandwidth

If this is true, Zen 3 is a beast.

Doesn't necessarily mean general FP32 workloads, it could mean FP4 or FP8 for AI/ML workloads.

Thunder 57 · Nov 28, 2019

inf64 said:
Re-read my post please. 40% is over Zen1.

Indeed. I misread your post, my apologies.

Thunder 57 · Nov 28, 2019

Richie Rich said:
RedGamingTech rumor:
- Zen 3 has 40 - 50% higher IPC in FPU workloads
- Zen 3 has 40% increased L1 cache bandwidth

If this is true, Zen 3 is a beast.

My comments:
- +40% FPU IPC boost cannot be done by AVX512. Probably AMD widened FPU pipes from 4 to 8 pipes.
- this suits SMT4. In theory Zen3 could double FPU throughput with SMT4. Or +40% with SMT2.
- there must be some significant upgrade also in ALUs+AGUs to avoid bottlenecking the FPU. So maybe 6xALUs?

I've heard the L1 bandwidth rumor. Sounds plausible and should be a nice benefit. I'm not buying the 40-50% better IPC in FPU though. There will be no SMT4. I doubt we see 6 ALUs either. I do think they need another ALU or two, but two at once seems a bit much. Not to mention, they'd probably want to add an AGU if you have 6 ALUs. Seems like too much die space, especially if they bump up the L3 again. Maybe we will see something like that on 5nm.

Richie Rich · Nov 28, 2019

Thunder 57 said:
I've heard the L1 bandwidth rumor. Sounds plausible and should be a nice benefit. I'm not buying the 40-50% better IPC in FPU though. There will be no SMT4. I doubt we see 6 ALUs either. I do think they need another ALU or two, but two at once seems a bit much. Not to mention, they'd probably want to add an AGU if you have 6 ALUs. Seems like too much die space, especially if they bump up the L3 again. Maybe we will see something like that on 5nm.

Why would they boost L1 bandwidth without an IPC increase? Any idea?

jamescox · Nov 28, 2019

soresu said:
I got the impression that the PHY was somewhat less than efficient, at least for 10GBASE-T.

Something like ports still drawing a lot of power even on more modern fab processes for the chips.

Perhaps owing to it being designed before certain assumptions about how much chip area they could spend on it or something, but as you say considering the multitude of other standards it does seem odd.

Though bear in mind those ultra high data rate consumer standards aren't meant to carry data over a significant range like Ethernet cables do, at least not without some signal loss on purely copper variants?

Ethernet does support much longer cable runs, but I don’t actually need that. The cluster I am dealing with fits in one cabinet. I don’t think there are any cheaper options for short distance cabling though. I at least have a switch with a 10 Gb uplink from the NFS server.

soresu · Nov 28, 2019

Richie Rich said:
Why would they boost L1 bandwidth without an IPC increase? Any idea?

IPC is not a single figure. Mere boosts to the memory/cache system of the Cortex-A12 led to its quick revision/renaming to A17 and a boost from 3.5 to 4 DMIPS per clock.

It's not impossible that a boost to the L1 could increase IPC all by itself by allowing current resources to be better utilised, though likely not a huge change, probably Zen+ level at most without further changes.

Thunder 57 · Nov 28, 2019

Richie Rich said:
Why would they boost L1 bandwidth without an IPC increase? Any idea?

There will be IPC increases. We just don't know where they are coming from just yet.
