Speculation: Ryzen 4000 series/Zen 3

Page 39

Ajay

That's not the way it's described by Mike Clark at Hot Chips (unless I'm misinterpreting you); I linked it in this thread somewhere. Each core has a path to each L3 slice, a buffer sits in front of each L3 slice, and that's what the cores request from and write to. Hashing is based on the memory address, so a core knows which slice to check to see whether data or shadow tags reside there.
Yes, it’s fully meshed, but I assumed the buffers were connected to the cache controllers. All this interplay between the coherency protocol and the local cache-line read/write policy needs some pretty solid logic. Given the apparent size of the L3$CTL unit, the buffers seem to be a decent size (assuming I know what I’m looking at; we don’t get the same level of detail on physical design now that CPUs have massive xtor counts).
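To make the address-hashing point concrete, here is a minimal Python sketch of how every core could pick the same one of four L3 slices for a given physical address. AMD has not published its actual hash, so the XOR-fold below (and the names `slice_for_address`, `LINE_BITS`, `NUM_SLICES`) is a hypothetical stand-in that only illustrates the idea: the slice is a pure function of the address, so no lookup is needed to know which slice to check.

```python
# Hypothetical L3 slice selection by address hashing.
# This is NOT AMD's real (unpublished) hash; it only illustrates the idea.
LINE_BITS = 6    # assumed 64-byte cache lines: the low 6 bits are the line offset
NUM_SLICES = 4   # four L3 slices per CCX, as discussed in the thread
SLICE_BITS = 2   # log2(NUM_SLICES)


def slice_for_address(phys_addr: int) -> int:
    """Return the L3 slice index (0..NUM_SLICES-1) for a physical address.

    The line offset is dropped, then the remaining address bits are
    XOR-folded two at a time, so every bit influences the choice and
    each aligned group of four consecutive lines covers all four slices.
    """
    line_addr = phys_addr >> LINE_BITS
    result = 0
    while line_addr:
        result ^= line_addr & (NUM_SLICES - 1)
        line_addr >>= SLICE_BITS
    return result


if __name__ == "__main__":
    # Every core evaluates the same function, so all cores agree on which
    # slice owns a given line without any lookup table.
    for line in range(8):
        addr = line << LINE_BITS
        print(f"line {line} (addr {addr:#06x}) -> slice {slice_for_address(addr)}")
```

Any bytes within the same 64-byte line map to the same slice, since the offset bits are discarded before hashing.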
 

DrMrLordX

The only connections between the cores are between the L3 cache controllers. The memory hierarchy must be kept consistent, coherent, etc. That’s all.

Then why the different latency times between cores in a CCX and L3?
 

soresu

Then why the different latency times between cores in a CCX and L3?

Presumably because the L3 is not a perfect square of silicon (it's closer to rectangular), so a diagonal link may have longer wires than a straight one.
 

Ajay

Then why the different latency times between cores in a CCX and L3?
So there are significant differences in access times between one core and the four L3 slices (L2 -> L3 cache line eviction time, for example)?
 

DrMrLordX

So there are significant differences in access times between one core and the four L3 slices (L2 -> L3 cache line eviction time, for example)?

It seems so. At least according to my system measurements on Matisse, intraCCX latency (core-to-core) is 15-16ns while L3 cache latency is 20ns. That's only 4-5ns, but by percentage . . .
 

Ajay

It seems so. At least according to my system measurements on Matisse, intraCCX latency (core-to-core) is 15-16ns while L3 cache latency is 20ns. That's only 4-5ns, but by percentage . . .
What the heck does core-to-core even mean? What tool are you using?
 

NostaSeronx

Hot and spicy rumor, get out your table salt.

AMD has canned all 7nm+(7nm EUV) projects that are APU/CPU/GPU 2.5D/3D. AMD instead will be launching them on the 5nm node at the soonest.

7nm DUV, 7nmP DUV, 5nm EUV is the new timeline from an unnamed individual from TW.

Stuff to watch out for: RRD, three new CCDs at TSMC. Of these, one is for the new Renoir (7nm) and its successor Rembrandt (5nm). The D one is for a semi-custom project on 6nm (not AMD mainline; however, it might get pushed to 5nm).

TSMC doesn't like the 7nm/7nm+ as first customer, so they want AMD to be on 5nm.
 

TheGiant

Hot and spicy rumor, get out your table salt.
yep, this one deserves

[image]
 

DrMrLordX

What the heck does core-to-core even mean? What tool are you using?

SiSoft Sandra. Hold on, let me re-run it and I'll cap some screenies. These tests were run at 4375 MHz. Intercore latency:

[Screenshot: Sandra inter-core latency results]

(this pattern is repeated for all the other permutations)
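Sandra's U0, U1, U2 ... labels are logical processors, not physical cores, which matters for reading the matrix. A minimal Python sketch of the mapping, assuming SMT siblings are numbered 2k and 2k+1 (the usual enumeration, but worth checking on a given system) and a 3900X-style layout of three active cores per CCX; `locate` and `classify_pair` are hypothetical helpers, not part of Sandra:

```python
# Hypothetical mapping of Sandra's logical-processor labels (U0, U1, ...) to
# physical cores and CCXs. ASSUMPTIONS: SMT siblings are numbered 2k / 2k+1,
# and each CCX has 3 active cores (3900X-style); adjust for other parts.
THREADS_PER_CORE = 2
CORES_PER_CCX = 3


def locate(logical_id: int) -> tuple:
    """Return (physical core, CCX) for a logical processor index."""
    core = logical_id // THREADS_PER_CORE
    return core, core // CORES_PER_CCX


def classify_pair(a: int, b: int) -> str:
    """Describe which path a core-to-core test between Ua and Ub exercises."""
    core_a, ccx_a = locate(a)
    core_b, ccx_b = locate(b)
    if core_a == core_b:
        return "same core (SMT siblings)"
    if ccx_a == ccx_b:
        return "same CCX (shared L3)"
    return "different CCX"


if __name__ == "__main__":
    for a, b in [(0, 1), (0, 2), (0, 4), (0, 8)]:
        print(f"U{a}-U{b}: {classify_pair(a, b)}")
```

Under these assumptions U0-U1 never leaves the core, U0-U2 and U0-U4 stay inside one CCX, and U0-U8 crosses to the other CCX.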

Note that U0-U1 is actually 11.4ns this time around, whereas the last time I ran it, that score and other similar ones were in the 15-16ns range. Not sure what lowered those scores, but whatever; it might be the static clockspeed, might be something else. Anyway, we can see the lowest latency when one core is essentially communicating with itself, and I have to confess that I screwed up in interpreting this data last time around, since I forgot that logical cores were included in this benchmark. Oops! Looking at U0-U2, we can see that communication within a CCX is actually taking around 26.5ns, since U2 and U4 are physical cores distinct from U0. U8 is on a different CCX, and that took ~65ns. Moving right along:

[Screenshot: Sandra cache latency results]

Ignoring the obviously wrong memory latency of 28.7ns (lulz), we see cache latency. L3 is 19 clocks this time around, better than the result of 20 that I got last time with this benchmark. And that comes out to . . . 4.34ns? Nahhh, that can't be right. Can it? Either Sandra is smoking the good stuff or I'm reading this data wrong!
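The 4.34ns figure is simply the cycle count divided by the clock, so Sandra is at least internally consistent: 19 cycles at a fixed 4375 MHz is 19 × 1000 / 4375 ≈ 4.34 ns. A small Python sketch for checking such conversions (the helper names are hypothetical). The second line converts a 9.6 ns L3 latency (the figure from the 3900X sample mentioned in this post) to cycles at the same 4375 MHz, which is an assumption for comparison only, since that sample's clock isn't given:

```python
def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds.

    One cycle at f MHz lasts 1000 / f ns (about 0.229 ns at 4375 MHz).
    """
    return cycles * 1000.0 / clock_mhz


def ns_to_cycles(ns: float, clock_mhz: float) -> float:
    """Inverse conversion: nanoseconds to core clock cycles."""
    return ns * clock_mhz / 1000.0


if __name__ == "__main__":
    print(f"{cycles_to_ns(19, 4375):.2f} ns")        # prints "4.34 ns"
    print(f"{ns_to_cycles(9.6, 4375):.1f} cycles")   # prints "42.0 cycles"
```

At the same clock, a 9.6 ns L3 would be about 42 cycles, more than twice the 19 cycles Sandra reports.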

Anyway, I can say that L3 cache latency isn't lining up with intercore latency on the CCX, at least not according to Sandra. Here's a sample of a 3900X from overclockers.com:


Specifically:


Which shows 9.6ns. Still very fast! And much faster than the core->core latency within a CCX reported by Sandra. Which, mind you, might be smoking the good stuff . . .
 

DrMrLordX

Memory latency testing is hard; all the popular tools out there are just wrong. Core-to-core round trip is basically 2x L3 latency, which makes sense, as that's essentially how it works.

So what is supposed to be the normal L3 latency of a 3900x? According to the one review I found that actually reported L3 latency, they claimed 9.6 ns. That's low enough that the reported core-to-core scores I got out of Sandra are still significantly longer than what should be reported assuming the review was even remotely correct (should have been around 19ns, not 26ns).
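The estimate in this post follows from the quoted rule of thumb that a core-to-core round trip costs roughly two L3 accesses. A minimal Python sketch of that back-of-envelope comparison, using only numbers from this thread (9.6 ns reported L3 latency, ~26.5 ns Sandra intra-CCX result); the factor of 2 is the rule of thumb, not a measured constant:

```python
L3_LATENCY_NS = 9.6            # L3 latency reported in the 3900X review
MEASURED_INTRA_CCX_NS = 26.5   # Sandra's U0-U2 result from the earlier post


def expected_core_to_core(l3_ns: float, factor: float = 2.0) -> float:
    """Rule-of-thumb estimate: a round trip costs about `factor` L3 accesses."""
    return factor * l3_ns


expected = expected_core_to_core(L3_LATENCY_NS)
excess = MEASURED_INTRA_CCX_NS - expected
print(f"expected ~{expected:.1f} ns, measured {MEASURED_INTRA_CCX_NS} ns, "
      f"gap {excess:.1f} ns ({excess / expected:.0%})")
```

The roughly 7 ns gap between the ~19 ns estimate and the 26.5 ns measurement is the discrepancy the post is pointing at.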
 

soresu

TSMC doesn't like the 7nm/7nm+ as first customer, so they want AMD to be on 5nm.
My only response to this is, huh?

While the timing in availability of a certain process may affect what a customer uses for their design, the fab corporations don't actually dictate what process a customer runs their designs on.

TSMC are already expanding their capacity due to increased demand (link); it would be insane to push a customer onto a more advanced process that is not yet commercially viable and has lower yields.

5nm may well get used for Zen4, but the lack of a commitment to it on their CPU roadmap suggests AMD is wavering; possibly Samsung 3nm/MBCFET will be ready, with superior power efficiency, by the time Zen4 is design-complete.
 

soresu

So what is supposed to be the normal L3 latency of a 3900x? According to the one review I found that actually reported L3 latency, they claimed 9.6 ns. That's low enough that the reported core-to-core scores I got out of Sandra are still significantly longer than what should be reported assuming the review was even remotely correct (should have been around 19ns, not 26ns).
Doesn't L3 latency mean just L2 -> L3, whereas core-to-core, as you put it, would be L2 -> L3 -> L3 -> L2?

Couldn't the latency scores be affected by occupancy from things other than the benchmarking software running on the CCD?

i.e. the OS and background processes.
 

DrMrLordX

Doesn't L3 latency mean just L2 -> L3, whereas core-to-core, as you put it, would be L2 -> L3 -> L3 -> L2?

If it were L2-L3-L3-L2, then that works out to around 25ns. So that makes more sense.

Couldn't the latency scores be affected by occupancy from things other than the benchmarking software running on the CCD?

That is always true of memory and cache benchmarks. I don't think you can 100% accurately measure such things without booting software dedicated to that purpose.
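For context on why tools disagree: latency benchmarks typically time a chain of dependent loads (pointer chasing) laid out in random order so the prefetchers can't hide the latency. Below is a minimal Python sketch of building such a chain with Sattolo's algorithm, which produces a single cycle through every slot. This is a generic illustration, not how Sandra specifically works, and Python's interpreter overhead swamps cache latency, so it shows the access pattern rather than giving meaningful nanosecond numbers; `build_chase_chain` and `walk` are hypothetical names.

```python
import random
import time


def build_chase_chain(n: int, seed: int = 0) -> list:
    """Return a next-index array forming one random cycle over n slots.

    chain[i] is the slot visited after slot i. Sattolo's algorithm is
    Fisher-Yates with j strictly less than i, which guarantees a single
    n-cycle, so a walk touches every slot before repeating.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def walk(chain: list, steps: int, start: int = 0) -> int:
    """Follow the chain; each step's index depends on the previous load."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


if __name__ == "__main__":
    chain = build_chase_chain(1 << 16)
    steps = 1_000_000
    t0 = time.perf_counter_ns()
    walk(chain, steps)
    elapsed = time.perf_counter_ns() - t0
    # In Python this is dominated by interpreter overhead, not cache latency.
    print(f"{elapsed / steps:.1f} ns per step")
```

A native tool does the same walk with raw pointer loads and sizes the buffer to fit the cache level under test, which is also why background activity sharing that cache can skew the result.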
 

NostaSeronx

While the timing in availability of a certain process may affect what a customer uses for their design, the fab corporations don't actually dictate what process a customer runs their designs on.
TSMC's relationship with AMD is the inverse of AMD's relationship with GlobalFoundries. AMD can cancel nodes with GlobalFoundries, but AMD can't cancel nodes with TSMC. AMD needs TSMC; however, TSMC only wants AMD. On top of that, TSMC wants DL/ML instructions, which aren't present in 7nm/7nm+ designs but are in 5nm. Either AMD steps up, or TSMC is dropping them for HiSilicon (server component).
TSMC are already expanding their capacity due to increased demand (link); it would be insane to push a customer onto a more advanced process that is not yet commercially viable and has lower yields.
"According to TSMC, $1.5 billion of the $4 billion will be spent to increase its 7 nm capacity, whereas $2.5 billion will be used to increase 5 nm capacity."

More money is going to 5nm. On top of that, 5nm has higher yields than 7nm. 5nm is being done at a separate foundry, whereas 7nm/7nm+ are being shared with other lines. There's only so much room for 7nm before they have to cut out 16nm (/12nm) and 28nm (/22nm), both high-revenue sources.
 

VirtualLarry

With that TSMC wants DL/ML instructions which aren't present in 7nm/7nm+ designs
This statement makes no sense. TSMC is a fab. They don't demand or care what opcodes a particular processor being fabbed there supports, or whether it's even a processor.
 

NostaSeronx

This statement makes no sense. TSMC is a fab. They don't demand or care what opcodes a particular processor being fabbed there supports, or whether it's even a processor.
TSMC is on EPYC, TSMC is the first customer. If TSMC doesn't like it, they will cancel. Hi1630: >3 GHz, SMT, SVE with ML+DL, DDR5 support; versus >2.75 GHz, SMT, AVX without ML+DL, no DDR5. Hence, TSMC is kicking AMD towards 5nm.
 

VirtualLarry

TSMC is on EPYC, TSMC is the first customer. If TSMC doesn't like it, they will cancel.
So, you're saying that TSMC is not only fabbing the processors, they are also a processor customer? And they have attached stipulations in their fabbing contract with AMD, specifying features that the processors must have for TSMC to be a processor customer AND fab the processors? That sounds downright unethical to me.

If TSMC gets a paying customer who wants them to fab a chip, and TSMC wants to use that customer's chips internally, then to me, that seems like they should be separate deals.

Edit: So what you are telling me is that TSMC is threatening a paying customer with not fabbing their chip unless the chip designer meets their demands on the chip's final design?

What if TSMC wanted Apple's A13 CPUs to contain a back door that TSMC and their gov't could use to access all of Apple's customers' iPhones before TSMC would fab the chips? Do you think that would somehow similarly be ethical business behavior?

Edit: And is TSMC refusing to fab any ARM-architecture-derivative CPUs that don't contain appropriate DL/ML opcodes? If not, then that sets up a pretty clear case of non-fab-related discrimination against their own customers. I would think that would go over like a lead balloon.

How many 3rd-tier ARM CPUs are fabbed by TSMC for cheap Chinese low-end Android phones? Are they refusing to fab those chips (probably on something older than 7nm) if the designers don't customize their "stock" ARM CPU designs to add DL/ML opcodes just for TSMC to utilize, if they so wish? Or insert backdoors, as another example?

Pardon me while I get my larger salt-shaker.

Edit: I guess I really don't like the concept that a foundry would get actively involved in architecture design for the products being fabbed, other than things like cell support and fab-process-constraint-related issues. That just rubs me the wrong way.
 

NostaSeronx

So what you are telling me is that TSMC is threatening a paying customer with not fabbing their chip unless the chip designer meets their demands on the chip's final design?
It isn't quid pro quo.

TSMC needs that ML/DL acceleration to keep up their Moore's Law gambit. AMD is willing to completely change their plans for the betterment of TSMC. AMD is a small-order customer compared to the other leading-edge customers. AMD's fate is tied to how much they can offer TSMC compared to natural customers.

Like always, it is the victim's fault => AMD.
 

soresu

TSMC's relationship with AMD is the inverse of AMD's relationship with GlobalFoundries. AMD can cancel nodes with GlobalFoundries, but AMD can't cancel nodes with TSMC. AMD needs TSMC; however, TSMC only wants AMD. On top of that, TSMC wants DL/ML instructions, which aren't present in 7nm/7nm+ designs but are in 5nm. Either AMD steps up, or TSMC is dropping them for HiSilicon (server component).
So you are implying that TSMC is actually a customer of AMD even as they fab for them?

I doubt that anything AMD lacks in ML capability is also lacking with nVidia, given their concentration in this area for some time.

Also, AMD and nVidia aren't the only players in this arena; even Google have designed their own TPU hardware. If TSMC were so desperate for it, they would probably design it themselves rather than depend on a customer.

TSMC is even less likely to depend on a Chinese design vendor than an American one, given the political problems Taiwan has with China.
 

NostaSeronx

So you are implying that TSMC is actually a customer of AMD even as they fab for them?
TSMC's "partnership" with AMD isn't just IT back-end with EPYC; they want the whole backbone (process simulation, co-design simulations, all of it) to be EPYC/INSTINCT, etc. TSMC is AMD's first customer: they profit from outward sales (AMD buys from them) and reduced-cost inward purchases (TSMC gets EPYC/INSTINCT at lower risk and cost). AMD has proven that they can quickly go from node to node.

AMD is technically already sampling the 5nm products for TSMC:
[image]

It might seem bad, but TSMC is moving AMD to its better interests. Just don't expect TSMC to keep AMD if they can't keep up.
 

VirtualLarry

Just don't expect TSMC to keep AMD if they can't keep up.
See, again, that strikes me as an implied quid-pro-quo, and seems ... unethical, to me.

You're directly implying that TSMC won't fab for AMD in the future if AMD's architecture designs aren't up to TSMC's wishes (as a CPU customer).

Edit: And I really don't think that you should spread rumors of this sort of thing unless you actually have some proof. This is the kind of crap that can affect stock prices.
 

soresu

Just don't expect TSMC to keep AMD if they can't keep up.
They might drop AMD as a supplier, but not as a customer; that would basically be refusing revenue they get from AMD, which won't be trivial given how well Zen2 is doing.
 

soresu

TSMC's "partnership" with AMD isn't just IT back-end with EPYC; they want the whole backbone (process simulation, co-design simulations, all of it) to be EPYC/INSTINCT, etc. TSMC is AMD's first customer: they profit from outward sales (AMD buys from them) and reduced-cost inward purchases (TSMC gets EPYC/INSTINCT at lower risk and cost). AMD has proven that they can quickly go from node to node.
You are basically implying a severely unequal relationship here, with TSMC benefiting far more than AMD.

You imply AMD sells Epyc/Instinct to them for a lower cost than to other customers, but gets no corresponding reduction in fab costs to recompense them.
 