AMD "Next Horizon Event" Thread


Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
Yeah, 8 core CCX seems like an assumption. My guess is that whatever is inside each chiplet doesn't look like the traditional CCX layout and that the L3 is unified between all eight cores, since that seems like obvious low hanging fruit, but there are a number of topologies that could enable that.
I think it will still be 16 MB L3 cache within each chiplet with a larger L4 cache within the IO die. Someone mentioned that the IO die will need to have a large perimeter to interface with each of the chiplets, even more so if there are 8 of them. The interior space within this IO die can be used for a large, shared L4 cache.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
8x8 makes perfect sense to me because:
1) 4C CCX keeps things simple with respect to the number of interconnects between cores, and reduces the development cost from Zen 1
2) 2-CCX or 8C desktop parts will still be the norm for most consumers. If AMD went with an 8C CCX approach, it'd be overkill for consumers.
3) Threadripper can still use a MCM approach with up to 32 cores (4 dies w/ 8 cores each).

Basically, only EPYC 2 would use a chiplet approach. All other Zen 2 products remain similar to Zen 1 but with improved IF, IPC, clocks, power efficiency, etc.
This is my thinking too. AMD is now going to capitalize on market segmentation, and you're going to start seeing a distinction between their pro line and consumer line similar to what Intel has w/ its Xeon vs desktop CPUs. What we got with the first two iterations of Zen was a capture attempt and a flat manufacturing strategy to allow AMD to gain market share w/ low production cost. Now that they have money in the bank and buy-in, they'll start segmenting out products and sucking up profits.

Normal Ryzen will probably go to an 8-core CCX, but not two of them. So you will get lower power/higher performance/clocks but not more cores. I also don't think they'll move a lot of this tech down to Threadripper, i.e. given that there won't be any consumer GPUs with an Infinity Fabric CPU<->GPU link, it won't make much sense for AMD to push a Threadripper w/ such capability. So I think you can already reason through why certain features won't make it down to the consumer.
 

Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
Threadripper 2990WX has four CCXs and each CCX has 8 cores, and Rome/Epyc 2 is the same thing: 8-core CCXs, as we can see.

WCCFTech is not that drunk, "they must know these facts", so what is missing is only desktop Ryzen 2/Zen 2.
No, TR 2990WX is 4-dies with 8 cores on each die (or two 4-core CCXes per die). In total, there are 8 CCXs in TR 2990WX.

https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review

WCCFTech is indeed that drunk, or high off of whatever they keep smoking...
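
To spell out the count, here is a minimal Python sketch of the arithmetic for the 2990WX layout described above (4 dies, two 4-core CCXs per die). The function name is purely illustrative and not anything from AMD:

```python
# Topology of the Threadripper 2990WX as described in the post above.
DIES = 4            # dies on the package
CCX_PER_DIE = 2     # core complexes (CCXs) per die
CORES_PER_CCX = 4   # cores per CCX

def package_totals(dies, ccx_per_die, cores_per_ccx):
    """Return (total CCXs, total cores) for a multi-die package."""
    total_ccx = dies * ccx_per_die
    total_cores = total_ccx * cores_per_ccx
    return total_ccx, total_cores

total_ccx, total_cores = package_totals(DIES, CCX_PER_DIE, CORES_PER_CCX)
print(f"{total_ccx} CCXs, {total_cores} cores")
```

With these inputs it comes out to 8 CCXs and 32 cores, which is why "four CCXs with 8 cores each" gets the core total right but the CCX layout wrong.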
 

HurleyBird

Platinum Member
Apr 22, 2003
2,670
1,250
136
3) Threadripper can still use a MCM approach with up to 32 cores (4 dies w/ 8 cores each)

Not going to happen. A big advantage of the 8 + 1 approach is that the consumer dies don't need all those IF links and server IO stuff. Keeping that in the consumer dies just for TR is completely implausible. TR will likely be built using the same IO die + chiplet arrangement as Epyc.
 

Glo.

Diamond Member
Apr 25, 2015
5,661
4,419
136
A single 8-core core complex means there is no NUMA, and moving the northbridge out of the CPU core chiplet into the IO die proves this.
 

Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
Not going to happen. A big advantage of the 8 + 1 approach is that the consumer dies don't need all those IF links and server IO stuff. Keeping that in the consumer dies just for TR is completely implausible.
I think AMD will have 4 different dies for Zen 2:
1) Ryzen desktop die: monolithic die similar to Ryzen 1000/2000 but with better clocks, IPC, IF, energy efficiency. Up to 8 cores for Zen 2 Ryzen, up to 32 for Zen 2 TR.
2) Ryzen APU die: monolithic die similar to Raven Ridge but with better clocks, IPC, IF, energy efficiency. Up to 8 cores for Zen 2 Raven Ridge.
3) EPYC 2 core chiplet: only contains the cores and execution units of Zen 2 (and associated L1, L2, and L3 cache) along with just enough IO to interface with...
4) EPYC 2 IO die: contains the IO and interconnects to interface with the EPYC 2 chiplets.
 
Mar 11, 2004
23,031
5,495
146
K, so obvious question is obvious....
Infinity fabric link CPU<->GPU...

How exactly are they going to physically pull this off? Mobo level? cable? What of the generic PCIE interface? Augmented? Some type of nearby connection? Something over PCIE 4.0?

What is the physical connection going to look like here?

For reference, what I'm looking to see from AMD (Nvidia's NVlink tech) :

Have they detailed this yet? Is this going to flow down to consumer zen2 or be cut out and delayed? Seems all the mobos would have incompatibility as this is board level?

I thought that it used the PCIe wiring, but because of the upgraded signaling hardware it could exceed the PCIe spec? That's also because it doesn't have to support all the same features that PCIe does (so basically, when it has the hardware, the software would put it in that compatibility mode, enabling higher performance). PCIe is capable of more; it's just that because it supports various other features (and is targeting a wide variety of systems) it doesn't push things as far as it can. In other words, they can use the same hardware but push it higher and achieve more, but they want things to be certified for it; kinda like how Ethernet and HDMI keep the same pin layout but require stricter cable/wire specs, as well as the two ends having the proper signaling capability, to guarantee that performance level. I believe that the shorter distance between the GPU slots is why it can run higher than the PCIe CPU-GPU links (although the talk about things being on a ring is interesting). PCIe 5 should offer about 128GB/s on an x16 link, counting both directions.
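
For a rough sanity check on the PCIe figure, here is a small Python sketch that derives x16 bandwidth from the per-lane signalling rate. It assumes 128b/130b encoding for PCIe 3.0 through 5.0 and ignores packet/protocol overhead, so real-world throughput will be lower; the helper name is made up for illustration:

```python
# Per-lane signalling rates in GT/s; PCIe 3.0, 4.0 and 5.0 all use 128b/130b encoding.
GT_PER_S = {"3.0": 8, "4.0": 16, "5.0": 32}
ENCODING = 128 / 130

def link_gb_per_s(gen, lanes=16):
    """Approximate raw bandwidth in GB/s for one direction, ignoring protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes / 8

for gen in GT_PER_S:
    one_way = link_gb_per_s(gen)
    print(f"PCIe {gen} x16: ~{one_way:.1f} GB/s each way, ~{2 * one_way:.1f} GB/s both ways")
```

Under those assumptions PCIe 5.0 x16 lands at roughly 63 GB/s in each direction, so the 128GB/s number only works as a rounded bidirectional total.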

I could be incredibly mistaken on this though, and I'd expect that it won't be activated on consumer boards, as I'm not sure it has much benefit right now (I don't believe that games are limited even by the PCIe 3.0 spec, as most of the time x16 shows very little if any improvement over x8 in games; if they do try to make multiple GPUs function as a single monolithic one it probably would help though). It will be interesting to see if Threadripper supports it, as those users should be able to make use of it for their tasks.

The other thing to take into account is latency. I'm not sure it is a huge issue, but I'd be curious what the latency is. I think the ring aspect might be there partly to manage that (so if you have a system of 8 GPUs, the ring would try to have nearby GPUs communicate and be aware of how the system is set up to maximize performance, or it might have GPUs 1 and 5 talk, 2 and 6, 3 and 7, and 4 and 8, so that there's the same latency between all of them).
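
To illustrate the pairing idea, here is a tiny Python sketch that counts hops on a ring of 8 GPUs. It assumes a simple bidirectional ring where every hop costs the same, which is my own simplification and not something AMD has described:

```python
def ring_hops(a, b, n=8):
    """Shortest hop count between GPUs a and b on a bidirectional ring of n GPUs."""
    d = abs(a - b) % n
    return min(d, n - d)

# Pairing from the post: 1-5, 2-6, 3-7, 4-8.
pairs = [(i, i + 4) for i in range(1, 5)]
hops = {pair: ring_hops(*pair) for pair in pairs}
print(hops)

# Neighbouring GPUs, for comparison (GPU 8 wraps around to GPU 1).
print(ring_hops(1, 2), ring_hops(8, 1))
```

In this model every one of those pairs sits the same number of hops apart (the farthest possible on the ring), so the latency would be equal across pairs but not minimal; neighbours are a single hop away.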

AMD did explicitly say it doesn't require bridges or switches.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,670
1,250
136
Assuming that Epyc 2 chiplets aren't the same as consumer Zen 2 dies, then TR will be based on the former and not the latter. There's absolutely no reason to blow up the consumer die sizes with server crap just for TR.
 

Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
Assuming that Epyc 2 chiplets aren't the same as consumer Zen 2 dies, then TR will be based on the former and not the latter. There's absolutely no reason to blow up the consumer die sizes with server crap just for TR.
Hmmm... you know what, you're right. Considering that TR and EPYC share the same socket, what you just stated makes sense.
 

Asterox

Golden Member
May 15, 2012
1,026
1,775
136
No, TR 2990WX is 4-dies with 8 cores on each die (or two 4-core CCXes per die). In total, there are 8 CCXs in TR 2990WX.

https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review

WCCFTech is indeed that drunk, or high off of whatever they keep smoking...

Well then, this is your only logical answer: if all Ryzen/Threadripper/Epyc CPUs use the 4-core CCX combination, what is left on the table?

In the end, what is left is only Epyc 2 or Zen 2, with a very logical move to an 8-core CCX. The red combination is a very expected Zen 2/7nm CPU design improvement, no doubt.
 

Glo.

Diamond Member
Apr 25, 2015
5,661
4,419
136
Assuming that Epyc 2 chiplets aren't the same as consumer Zen 2 dies, then TR will be based on the former and not the latter. There's absolutely no reason to blow up the consumer die sizes with server crap just for TR.
They are the same. There are only three designs made on 7 nm: Rome/Matisse chiplets, Navi and Vega.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
Infinity Fabric can operate either through PCIe or SATA PHY while running Hypertransport-esque transfers. If both sides run IF, then it runs at IF speeds which can go up to 512 GB/s.
You speak as if this is a given and has already been detailed. Where does it operate in this capacity today outside of CPU socket linking? Correct me if I'm wrong, but there's a lot of overhead if they are piping this across physical PCIe 4.0, latency being the most stand-out aspect. You can't just flip a switch and use a physical interface differently. More specifically, you have to encapsulate to satisfy the underlying hardware/pin-out etc.?

Again, please correct me if I'm wrong, and if so, please link.
NVLink doesn't work in the manner you described. There's a physical layer and it's specific to the link. In such cases you can only encapsulate, which has overhead.

PCIE 4.0(Infinity fabric) ..

NVLink has a physical pinout and trace from CPU to GPU.
 

Hitman928

Diamond Member
Apr 15, 2012
5,182
7,632
136
AMD showed a C-Ray comparison of 1x 64-core Rome against 2x 28-core Intel Xeon. AMD finished the render 7% faster with 14% more cores. They said the Rome system was a prototype and not at the highest performance (clocks) they'll achieve with the part.
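
As a back-of-the-envelope check on those numbers, here is a short Python sketch. "7% faster" is ambiguous, so it tries both readings (7% higher render rate vs. 7% shorter render time); it also assumes C-Ray scales linearly with core count, which is only an approximation:

```python
rome_cores = 64          # 1x 64-core Rome
xeon_cores = 2 * 28      # 2x 28-core Xeon

core_ratio = rome_cores / xeon_cores   # 64/56, i.e. about 14% more cores

# Reading A: Rome's render rate is 7% higher.
speedup_a = 1.07
# Reading B: Rome's render time is 7% shorter.
speedup_b = 1 / (1 - 0.07)

for label, speedup in (("rate +7%", speedup_a), ("time -7%", speedup_b)):
    per_core = speedup / core_ratio
    print(f"{label}: per-core throughput = {per_core:.3f}x the Xeon's")
```

Either way the prototype's per-core throughput comes out a bit below the Xeon's in this one benchmark, which fits the comment that the clocks were not final.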
 

Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
[Image: amd-rome-7nm-bg_575px.png]


If only we could zoom and enhance... ;)
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
AMD showed a C-Ray comparison of 1x 64-core Rome against 2x 28-core Intel Xeon. AMD finished the render 7% faster with 14% more cores. They said the Rome system was a prototype and not at the highest performance (clocks) they'll achieve with the part.
Intel BTFO once again.
Pure losing for years to come
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,324
1,462
136
Both the IO die and the chiplets look larger than I thought they would be.

Yeah, the chiplets are huge. Didn't expect that.

As to the IO die, they are still under the WSA. 14nm silicon will be so cheap it is effectively free for them. Why not fill it out with a gigantic L4 memory-side cache?

It will help keep twice the number of cores fed with much less than twice the RAM bandwidth.
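
A rough way to put numbers on that idea, as a Python sketch. This is a deliberately crude model of a hypothetical L4: it assumes memory demand scales linearly with core count and that every L4 hit removes one DRAM access. Nothing here comes from AMD, and the function name is made up:

```python
def min_hit_rate(core_scale, dram_bw_scale):
    """Smallest L4 hit rate that keeps DRAM traffic within the scaled bandwidth.

    Assumes memory demand scales linearly with core count and that every
    L4 hit removes one DRAM access (a deliberately simple model).
    """
    return max(0.0, 1.0 - dram_bw_scale / core_scale)

for bw in (1.0, 1.25, 1.5, 2.0):
    print(f"2x cores, {bw:.2f}x DRAM bandwidth -> L4 hit rate >= {min_hit_rate(2.0, bw):.1%}")
```

So with twice the cores and unchanged DRAM bandwidth, the cache would need to catch about half of the memory traffic to break even in this model; any extra DRAM bandwidth lowers that bar.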