Speculation: Ryzen 3000 series


What will Ryzen 3000 for AM4 look like?



DrMrLordX

Lifer
Apr 27, 2000
21,634
10,850
136
The dead horse has been sufficiently beaten; can we please bury the poor thing and move on?

Could you ship it to Citadel Station? SHODAN needs some more corpses. Thanks.

Edit: wrt the memory latency of the sample, that seems really fishy. What CAS latency (CL) is it running? What timings?
 

PotatoWithEarsOnSide

Senior member
Feb 23, 2017
664
701
106
I think that in the era of "fake news" the concept of what it means to be fake has been lost; something being incorrect does not make it fake.
 

PotatoWithEarsOnSide

Senior member
Feb 23, 2017
664
701
106
Regarding that benchmark and the latency graph: the various articles are suggesting that the steep rise in latency between 16MB and 32MB indicates 32MB of L3 cache, but if the rise is between the two, surely that must indicate it's 16MB of L3, not 32MB? What am I missing?
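(For context, latency-vs-size graphs like the one being discussed come from a dependent pointer chase over a working set of a given size: latency stays near the L3 figure while the set fits in cache and steps up once it spills out, so the position of the step is what marks the cache size. Below is a minimal sketch of that idea in Python; the sizes, hop count, and the use of Python at all are purely illustrative, since the interpreter adds far more overhead per hop than the memory access itself, and real tools do the chase in C or assembly.)

import numpy as np
import time

def latency_ns(size_bytes, hops=1_000_000):
    # One random cycle through the working set: every load depends on the
    # previous one, so hardware prefetchers can't hide the latency.
    n = size_bytes // 8                     # 8-byte indices
    order = np.random.permutation(n)
    nxt = np.empty(n, dtype=np.int64)
    nxt[order] = np.roll(order, -1)         # order[i] points to order[i+1]
    idx = 0
    t0 = time.perf_counter()
    for _ in range(hops):
        idx = nxt[idx]
    return (time.perf_counter() - t0) / hops * 1e9   # ns per dependent load

for mb in (4, 8, 12, 16, 24, 32, 48, 64, 128):
    print(f"{mb:>4} MB working set: {latency_ns(mb * 2**20):6.1f} ns/hop")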
 

maddie

Diamond Member
Jul 18, 2010
4,744
4,679
136
PotatoWithEarsOnSide said:
Regarding that benchmark and the latency graph: the various articles are suggesting that the steep rise in latency between 16MB and 32MB indicates 32MB of L3 cache, but if the rise is between the two, surely that must indicate it's 16MB of L3, not 32MB? What am I missing?
My thoughts also. A cache-harvested chiplet for now, or will only Rome get the full cache? Cache should be close to 50% of the die area.
 

dnavas

Senior member
Feb 25, 2017
355
190
116
DrMrLordX said:
wrt the memory latency of the sample, that seems really fishy.

Does it? Cross-CCX latency was 120ns-ish on the 1800X, so if the separate IO die is only adding (say) 30ns to RAM access, that actually sounds like the better end of what could have been expected. Clearly the separate IO die is going to add latency; that isn't a surprise...
 

Hitman928

Diamond Member
Apr 15, 2012
5,313
7,960
136
dnavas said:
Does it? Cross-CCX latency was 120ns-ish on the 1800X, so if the separate IO die is only adding (say) 30ns to RAM access, that actually sounds like the better end of what could have been expected. Clearly the separate IO die is going to add latency; that isn't a surprise...

Signals travel really fast over such a short distance, so the propagation delay added by putting the IO on a separate die will be tiny. The buffering logic will probably be the main contributor to signal delay. I'd be surprised if it added anywhere close to 30 ns of latency.

I'm not as familiar with the inner workings of IF and such. How much serializing are they doing from chiplet to IO die, does anyone know? Is it fully parallel?
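(A back-of-the-envelope check on the wire-delay part of that: with an assumed ~25 mm of substrate trace between chiplet and IO die and a signal speed of roughly half the speed of light, the flight time is a tiny fraction of a nanosecond, so whatever latency the IF link adds would come from serialization and buffering rather than distance. The numbers below are assumptions, not measured values.)

distance_m = 0.025            # assumed ~25 mm chiplet-to-IO-die trace
speed_m_s = 0.5 * 3.0e8       # assumed ~0.5c propagation on the organic substrate
print(f"one-way propagation delay: {distance_m / speed_m_s * 1e9:.2f} ns")  # ~0.17 ns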
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
PotatoWithEarsOnSide said:
Regarding that benchmark and the latency graph: the various articles are suggesting that the steep rise in latency between 16MB and 32MB indicates 32MB of L3 cache, but if the rise is between the two, surely that must indicate it's 16MB of L3, not 32MB? What am I missing?

Well, that's the first time the test hits a real higher-latency point. That could be one of three things: cross-CCX talk (a la Ryzen 1 and 2), crossing the MCM (chiplet to IO die), or heading out to memory. I'm more weirded out that it continues at that latency all the way up to 128 MB. If it's cross-CCX, then it would mean that a chiplet has 32MB of cache (so a 64MB L3 for the CPU). If it's at the chiplet level, then 32MB of cache per CPU. If it's going out to memory, then that looks pretty bad.
 

Hitman928

Diamond Member
Apr 15, 2012
5,313
7,960
136
Topweasel said:
Well, that's the first time the test hits a real higher-latency point. That could be one of three things: cross-CCX talk (a la Ryzen 1 and 2), crossing the MCM (chiplet to IO die), or heading out to memory. I'm more weirded out that it continues at that latency all the way up to 128 MB. If it's cross-CCX, then it would mean that a chiplet has 32MB of cache (so a 64MB L3 for the CPU). If it's at the chiplet level, then 32MB of cache per CPU. If it's going out to memory, then that looks pretty bad.

It sees a significant rise at 16 MB and then spikes at 32 MB, where it stays level (actually decreases, but I'm going to assume that's some sort of anomaly). If it were a 32 MB cache, I would expect latency to increase when trying to use more...
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
Hitman928 said:
It sees a significant rise at 16 MB and then spikes at 32 MB, where it stays level (actually decreases, but I'm going to assume that's some sort of anomaly). If it were a 32 MB cache, I would expect latency to increase when trying to use more...
It would be nice if they broke them down at intermediate sizes instead of testing at 8MB, then 16, then 32. But apparently, if you look at this 2920X run (https://www.userbenchmark.com/UserRun/14075612#MEMORY_KIT), it follows the same pattern. I don't know why neither one tanks after it hits its cache limits. But if I had to guess, each chiplet is either one 8-core mesh (or one 8c CCX), or AMD really fixed the cross-CCX latency, and each chiplet has 32MB, so a dual-chiplet CPU will have upwards of 64MB.
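(Purely illustrative: a sweep with a few geometrically spaced sizes per octave would bracket a 16MB-vs-32MB boundary much more tightly than 8/16/32 alone. The values below are hypothetical test points, not sizes the benchmark in question is known to support, and they could feed a pointer-chase like the sketch earlier in the thread.)

# Hypothetical test points: four geometrically spaced sizes per octave, 8-64 MB.
sizes_mb = [round(8 * 2 ** (i / 4), 1) for i in range(13)]
print(sizes_mb)
# [8.0, 9.5, 11.3, 13.5, 16.0, 19.0, 22.6, 26.9, 32.0, 38.1, 45.3, 53.8, 64.0]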
 

naukkis

Senior member
Jun 5, 2002
706
578
136
PotatoWithEarsOnSide said:
Regarding that benchmark and the latency graph: the various articles are suggesting that the steep rise in latency between 16MB and 32MB indicates 32MB of L3 cache, but if the rise is between the two, surely that must indicate it's 16MB of L3, not 32MB? What am I missing?

If Zen 2 is still a 2-CCX design, it has 2 L3 caches like Zen 1, and memory access latency rises past the 16MB L3 chunk just as it does past 8MB with Zen 1.
 

PotatoWithEarsOnSide

Senior member
Feb 23, 2017
664
701
106
Topweasel said:
It would be nice if they broke them down at intermediate sizes instead of testing at 8MB, then 16, then 32. But apparently, if you look at this 2920X run (https://www.userbenchmark.com/UserRun/14075612#MEMORY_KIT), it follows the same pattern. I don't know why neither one tanks after it hits its cache limits. But if I had to guess, each chiplet is either one 8-core mesh (or one 8c CCX), or AMD really fixed the cross-CCX latency, and each chiplet has 32MB, so a dual-chiplet CPU will have upwards of 64MB.
I'm not sure it is following the same pattern.
That TR graph clearly shows a difference between going cross-CCX and going out to memory: it goes cross-CCX between 8MB and 16MB, then to memory beyond that.
The ES goes straight from 16MB of L3 to what looks like memory by the time it hits 32MB. I would expect the big jump to come after 32MB if there were 32MB of L3, but we see that it is already taking that hit at 32MB.
No idea why it then drops off slightly beyond that, though. Seems unusual.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
The only bright spot from that graph is the fact that each CCX seems to have access to ~16MB of cache. I doubt it is cut to 12MB, as the rise would be much steeper as the block size nears 16MB. What is not nice is what happens once the workload no longer fits in a CCX's L3; those latency numbers do not look good.

Btw, it is 12C/24T, so it supposedly has 2 chiplets with 64MB of L3 total?
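(A quick tally of the numbers being assumed here, purely as arithmetic on the thread's guesses: 16 MB of L3 per CCX, two CCXs per chiplet, and two chiplets for the 12C/24T sample.)

l3_per_ccx_mb   = 16   # assumed, per the latency graph discussion
ccx_per_chiplet = 2    # assumed Zen 1-style 2-CCX chiplet
chiplets        = 2    # assumed for the 12C/24T sample
l3_per_chiplet  = l3_per_ccx_mb * ccx_per_chiplet   # 32 MB
l3_total        = l3_per_chiplet * chiplets         # 64 MB
print(l3_per_chiplet, "MB per chiplet,", l3_total, "MB total")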
 

PotatoWithEarsOnSide

Senior member
Feb 23, 2017
664
701
106
JoeRambo said:
The only bright spot from that graph is the fact that each CCX seems to have access to ~16MB of cache. I doubt it is cut to 12MB, as the rise would be much steeper as the block size nears 16MB. What is not nice is what happens once the workload no longer fits in a CCX's L3; those latency numbers do not look good.

Btw, it is 12C/24T, so it supposedly has 2 chiplets with 64MB of L3 total?

https://www.techpowerup.com/249952/amd-doubles-l3-cache-per-ccx-with-zen-2-rome
That fits with the earlier data regarding Rome.
 