> Considering Intel has stated that Tiger Lake is a 10-65W design, you will likely never see parts that are <10W. Both Intel and AMD rarely release 6W parts. AMD's 6W parts are still 14nm, as an example.

If their plan is to replace the Kaby Lake Y series with a Lakefield successor in late 2021 or early 2022, I think Van Gogh has nothing to fear from Intel.
> Notice how there is a Navi IGP on Raphael; it looks like CPU-only is going to disappear.

The only thing I'll say for a long time in regards to AM5 is that a lot of possibilities are opened up with the new socket.
It's literally Renoir but with Zen 3 cores. Also, Cezanne is Vega 7? WTH AMD.
> Not sure how you brought 5nm into this.

Actually, Cezanne's Vega isn't like Renoir's Vega.
Renoir's Vega re-uses Vega2x (MI50/MI60/Radeon VII) IP with RDNA1 blocks.
Cezanne's Vega re-uses Vega-H (the non-numbered name of Vega3x/Arct1x, i.e. MI100) IP with RDNA2 blocks.
Of those products, CPUs and GPUs were leaked (in 1H2020) to be moving to 5nm. However, the leaks don't specify when the decision was made, which would have been shortly after trial production in 2018 and before risk production in 2019.
5nm didn't just miraculously appear in 2019...
View attachment 28515
N5P isn't as far off either:
"Design kits of N5P technology will be available in the next N5 revision in the second quarter of 2020."
And if you go all the way back to 2016:
View attachment 28516
> We've been over this before.

We have. But I cannot remember any consensus building around your interpretation. Actually, I'm surprised that you maintain this interpretation and present it as fact. For those interested, see the discussion earlier in this thread:
> Yes, very sure, because otherwise one of the L3 slices would be much faster than the other 3, due to only having to do one hop. Their speeds are too similar for this.

As pointed out in the previous discussion, L3 access latency uniformity is achieved by address interleaving.
> No, Tuna is right; it's covered in the Zen 1 Hot Chips Q&A, when the Intel guys keep asking him cache questions. The way Michael Clark describes how each single-ported cache slice handles requests.

Do you have a link or quotation?
> PS. Here is Michael Clark's presentation at Hot Chips 2016, discussing the L3 (note what he says about address interleaving and average latency):

He says "Every core sees the same average latency out of the L3", which seems to support @Tuna-Fish's point.
> He says "Every core sees the same average latency out of the L3", which seems to support @Tuna-Fish's point.

I'm surprised that you come to that conclusion. The way it sounds to me, Clark makes the point that memory interleaving is used to achieve uniform latency on average, thus weakening, not supporting, the slice-aware L2 interpretation.
> Do you have a link or quotation?

I was pretty specific in stating where it was.
I suspect there is confusion. The idea that each L2 controller is a 4-link crossbar to the L3 slices seems wasteful (complexity, area, power). It would also make address interleaving pointless, as all L3 cache slices would be equidistant. Address interleaving is pointed out in the AMD slides referred to earlier.
PS. Here is Michael Clark's presentation at Hot Chips 2016, discussing the L3 (note what he says about address interleaving and average latency):
> Anyway, my point is that uniform latency is not an argument for Tuna-Fish's interpretation. Considering he has pointed out that his interpretation is resting on this fallacy, it makes me suspect confusion and incorrect interpretation.

It totally is.
> I was pretty specific in stating where it was.

Sorry, but to me this doesn't clarify anything about the topology issue. They are discussing single vs. multiple transfers at a single point in time (single-ported vs. multi-ported), and Clark is reluctant to go into detail, only saying "we have buffering around it to handle that".
> We have. But I cannot remember any consensus building around your interpretation. Actually, I'm surprised that you maintain this interpretation and present it as fact. For those interested, see the discussion earlier in this thread:

I have seen and read that; at some point I just stopped bothering to respond.
> I think the conventional interpretation, as described in AMD's presentations and slides, is the correct one: the L3 cache controller acts as a crossbar between the 4 cache slices in a CCX, requiring 6 links for a fully connected topology.

That is not the conventional interpretation, and it is not backed by AMD's slides.
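For reference, the "6 links" figure in the quoted interpretation is just the pairwise count among the 4 slices. A quick sketch (illustrative arithmetic only; "link" here simply means one point-to-point connection):

```python
# Pairwise link count for a fully connected topology: n choose 2.
from itertools import combinations

def fully_connected_links(nodes: int) -> int:
    """Number of links needed to connect every pair of nodes directly."""
    return len(list(combinations(range(nodes), 2)))

print(fully_connected_links(4))  # 6: the figure cited for 4 L3 slices in a CCX
```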
> I'm surprised that you come to that conclusion. The way it sounds to me, Clark makes the point that memory interleaving is used to achieve uniform latency on average, thus weakening, not supporting, the L2 crossbar interpretation.

How exactly would address interleaving produce uniform latency in your topology? Just to make sure you get the basics right: every cache line lives in only one of the slices. Interleaving is done between adjacent cache lines. That is, cache line 0 (addresses 0x0..0x3f) is in slice 0, cache line 1 (0x40..0x7f) is in slice 1, and so on until cache line 4 (0x100..0x13f) is again in slice 0. It is easy to confirm this by allocating an array that fits into the L3, accessing only every fourth cache line, measuring the throughput, and comparing to accessing all of the array linearly. Accessing all of it gets substantially higher throughput.
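The interleaving scheme described above can be sketched in a few lines (a toy model, not AMD's actual mapping; it simply assumes line index mod 4 picks the slice):

```python
# Toy model of L3 line interleaving: each 64-byte cache line maps to slice
# (line_index mod 4). Consecutive lines spread across all four slices,
# while a stride of four lines keeps hitting the same slice.

CACHE_LINE = 64  # bytes
NUM_SLICES = 4

def slice_of(addr: int) -> int:
    """Slice holding the cache line that contains this address."""
    return (addr // CACHE_LINE) % NUM_SLICES

# Linear walk: lines 0..15 cycle through slices 0, 1, 2, 3, 0, 1, ...
linear = [slice_of(a) for a in range(0, 16 * CACHE_LINE, CACHE_LINE)]

# Every 4th line: all requests land on slice 0, so only one slice's
# bandwidth is available, which is why the strided walk measures slower.
strided = [slice_of(a) for a in range(0, 16 * CACHE_LINE, 4 * CACHE_LINE)]

print(linear)   # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
print(strided)  # [0, 0, 0, 0]
```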
Yes. These latency differences correspond well to the actual distance differences for a fully connected topology. If there was an actual step down from the metal layers and processing, I would expect ~10 cycles minimum for one additional hop (there and back).
> Yes. These latency differences correspond well to the actual distance differences for a fully connected topology. If there was an actual step down from the metal layers and processing, I would expect ~10 cycles minimum for one additional hop (there and back).

Agreed. AMD has a fully connected topology without any doubt:
> The physical links themselves are free, because they occupy an area of the die which would be just plain blank without them. As for routing logic, your interpretation has a 1x4 crossbar at every L3 slice (just because one of those links leads to the L3 slice itself doesn't mean you can leave it out); mine has a 1x4 crossbar at every core.

This! It essentially implies that both your and Vattila's proposals are topologically the same as far as the 4x L3$ is concerned; they are just drawn differently. You are arguing about nothing. It's just that Vattila has the 1x4 splitter implicitly. There also needs to be a 4x1 merger in addition, making the whole thing a 4x4 crossbar.
> RX 570 and RX 5500 XT have similar FP32 performance, yet performance differs by 20%. And Vega/Polaris were manufactured on GlobalFoundries' 14nm process; TSMC's 7nm is vastly better than this.

The 7nm process allowed the RX 5500 XT to reach FP32 performance comparable to the RX 570 despite having far fewer CUs, because it clocks a lot higher; this is also thanks to a better architecture.
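The comparison above checks out with the usual GCN/RDNA peak-FP32 formula (CUs x 64 lanes x 2 ops per FMA x clock). The CU counts and boost clocks below are the published figures for these cards, so treat the results as approximate peak numbers:

```python
# Back-of-the-envelope peak FP32 throughput, GCN/RDNA style:
# TFLOPS = CUs * 64 lanes * 2 ops per FMA * clock (GHz) / 1000.

def fp32_tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000.0

rx570    = fp32_tflops(32, 1.244)  # Polaris, GloFo 14nm, boost clock
rx5500xt = fp32_tflops(22, 1.845)  # Navi 14, TSMC 7nm, boost clock

print(f"RX 570:     {rx570:.2f} TFLOPS")    # ~5.10
print(f"RX 5500 XT: {rx5500xt:.2f} TFLOPS") # ~5.20
```

Despite having roughly two thirds the CUs, the 7nm part lands at essentially the same peak FP32 rate purely through clock speed.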
> Only thing I'll say for a long time in regards to AM5 is a lot of possibilities are opened with the new socket.

The GPU may also get a speed bump. It may be Vega by name, but Vega in the APU is a totally different beast from desktop Vega.
> It's literally Renoir but Zen 3 cores.
> So every core has a link to every L3 slice, or 4*4 = 16. With an 8-core CCX with 8 L3 slices (and I keep pointing this out: there is no fundamental reason why the L3 slice count must equal the core count!), there would be 8*8 = 64 links.

So, basically, a fully meshed design. I get how that helps maintain a more even latency, so I suppose it is better (also from an area perspective). Do we have any idea how many slices there will be per L3$?
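The per-core-crossbar arithmetic in the quoted post is straightforward to restate (a sketch; "link" again just means a dedicated core-to-slice connection):

```python
# Core-to-slice link count when every core has a dedicated link to every
# L3 slice (the "fully meshed" reading discussed above).

def crossbar_links(cores: int, slices: int) -> int:
    return cores * slices

print(crossbar_links(4, 4))  # 16 links in a 4-core CCX with 4 slices
print(crossbar_links(8, 8))  # 64 links if an 8-core CCX kept 8 slices
```

The quadratic growth (16 to 64 links when doubling both counts) is why slice count need not track core count in larger CCX designs.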