Speculation: The CCX in Zen 2


How many cores per CCX in 7nm Zen 2?

  • 4 cores per CCX (3 or more CCXs per die): 55 votes (45.1%)
  • 6 cores per CCX (2 or more CCXs per die): 44 votes (36.1%)
  • 8 cores per CCX (1 or more CCXs per die): 23 votes (18.9%)
  • Total voters: 122

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
That's my point: Zen+ only matters if they either lower power usage at speed or improve IPC. But as far as we know, it's just a move to a higher-clock process. The only place this matters is the desktop Ryzen market. TR is already straining what AMD wants to use on a socket, and EPYC sits much farther back, right in the efficiency range of Zeppelin. Higher clocks do it no good.

Hell, even Ryzen mobile (APU) would do better on LPP. Zen+ looks to be a 7700/8700/7820> and i5 competitor and not much else.

The idea of a higher-performance process is exactly that: higher performance at iso-power, or lower power at iso-speed. E.g., 14LPU offers higher speed at the same power vs. 14LPC (which is a lower-cost version of 14LPP).

https://news.samsung.com/global/sam...ndry-offerings-with-14lpu-and-10lpu-processes

"Samsung’s fourth-generation 14nm process technology, 14LPU, delivers higher performance at the same power and design rules compared to its third-generation 14nm process (14LPC). 14LPU will be optimally suited for high-performance and compute-intensive applications."
 
  • Like
Reactions: CatMerc

PeterScott

Platinum Member
Jul 7, 2017
2,605
1,540
136
I am really close to 50/50 on this one.

I started out choosing 6 core CCX (as it makes everything else simpler and a drop in replacement).

But after reading through the thread, I switched my vote to 4 core CCX with more CCX modules, the rationale being that 4 Core CCX is just so balanced, with easy to manage internal interconnects. It's a perfect building block.

This one seems too close to call for me.

But I definitely don't think it will be soon either way. I don't think there is a massive need to get to 48 cores for Epyc in a hurry, and that is really just about the only place it matters.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
Switching to the faster "performance" [14nm+] process will not only create higher current draws from higher available clock rates, but the process itself is more power-hungry to begin with, from what I'm seeing online. Processors pushing into the 4.5 GHz range will be consuming a LOT more power.

At the risk of derailing my own thread into a discussion about process technology, can you clarify? As I understand it, a high-performance process may be more power-efficient at higher clocks, for which it is tuned, while it may lose at lower clocks, compared to a low-power process. I.e. the power vs frequency curves for the two processes may cross at some frequency.

kalel2.png
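
The crossing-curves idea can be put into a toy model. This is only a sketch: it assumes power is a fixed static (leakage) term plus a dynamic term that grows with the cube of frequency, and every coefficient below is made up for illustration rather than taken from any foundry data.

```python
# Toy model of two processes: power(f) = static + k * f**3.
# All coefficients are hypothetical; nothing here is measured data.

def power(freq_ghz, static_w, k):
    """Static (leakage) power plus a cubic dynamic term, in watts."""
    return static_w + k * freq_ghz ** 3

def crossover_ghz(static_a, k_a, static_b, k_b):
    """Frequency where curve A and curve B meet, or None if they never cross."""
    dk = k_a - k_b
    ds = static_b - static_a
    if dk == 0 or ds / dk <= 0:
        return None
    return (ds / dk) ** (1 / 3)

LOW_POWER = (5.0, 1.0)   # hypothetical: low leakage, steep dynamic term
HIGH_PERF = (15.0, 0.8)  # hypothetical: higher leakage, flatter dynamic term

f_cross = crossover_ghz(*LOW_POWER, *HIGH_PERF)
print(f"curves cross near {f_cross:.2f} GHz")
for f in (3.0, 4.5):
    print(f, "GHz:", power(f, *LOW_POWER), "W vs", power(f, *HIGH_PERF), "W")
```

With these invented numbers the low-power process wins below the crossover and the high-performance process wins above it, which is all the graph is meant to convey.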
 
  • Like
Reactions: CatMerc

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
While very simplified, your graph is, in essence, right. However, the curves are not straight lines; both are curved. The higher performance process (in this case, the LPP process) does start off with a higher power draw than the LPU process. There is a point where they cross, which is likely a bit before the LPU process hits its limits and starts to draw power along a curve that steepens very rapidly. The LPP process will have a range, likely between 3.8 GHz and 4.1 GHz, where it does indeed draw less power than the current LPU process, but then it will begin its own climb to near vertical. The problem is that neither is going to be very suitable for low-power applications (mobile, SFF) in that speed range, as both will be drawing a lot of power there. Below that, you'd do better on the LPU process. Above that, it's all HEDT territory, with nasty power draws and lots of heat. There is room for more core refinement to help mitigate that, and it certainly could improve the base case I am describing here, but I don't think it will make a drastic difference. I think that, on 14LPP, we won't see a commercial volume product in the Ryzen line that exceeds a 3.8-4 GHz base clock and a boost clock much north of 4.4 GHz. The power draw will simply be too high.

Again, this is all opinion based on what I've read about the various processes online. If anyone from the actual foundry who has real working knowledge of these specific process techs cares to chime in, I'm all ears.
 

maddie

Diamond Member
Jul 18, 2010
4,747
4,689
136
At the risk of derailing my own thread into a discussion about process technology, can you clarify? As I understand it, a high-performance process may be more power-efficient at higher clocks, for which it is tuned, while it may lose at lower clocks, compared to a low-power process. I.e. the power vs frequency curves for the two processes may cross at some frequency.

kalel2.png
I guess that is why so much work was put into fine-grained clock gating and shutting off portions of the die. That's one of the uses of IF. It's not a straight either/or anymore, as you can get at least some of the benefits of both worlds.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
That's my point: Zen+ only matters if they either lower power usage at speed or improve IPC. But as far as we know, it's just a move to a higher-clock process. The only place this matters is the desktop Ryzen market. TR is already straining what AMD wants to use on a socket, and EPYC sits much farther back, right in the efficiency range of Zeppelin. Higher clocks do it no good.

Hell, even Ryzen mobile (APU) would do better on LPP. Zen+ looks to be a 7700/8700/7820> and i5 competitor and not much else.

There may not even BE a Zen+ at this point. I was a strong backer of that slide, because it makes sense. AMD needs to stick with a yearly refresh for products. However, 14nm+ may be a 'backup plan' with 7nm as the focus. GlobalFoundries IS claiming volume 7nm applications for 2H 2018. I'll go out on a limb and say that AMD probably gets priority on the 7nm rollout. It is completely possible for AMD to launch 7nm in March or April, or at least do a soft launch with products shipping in May and/or June.

A side note for the second part of your quote: Ryzen already competes very well with the 7800, 7840, etc., and Coffee Lake won't be any faster from an IPC standpoint, so I don't know why you felt the need to include this in your post. How about we stick to the topic at hand and not play the fanboy card?
 

Lodix

Senior member
Jun 24, 2016
340
116
116
The next-generation 14nm process, LPU, won't increase power consumption at any power target. It will be more efficient across the whole performance curve, just like Intel's 14nm+/++.
14nmcharacteristic.png


And GF will start risk production in 1H 2018. Their claim of mass production in 2H 2018 is too unrealistic, because it would be too late and too low-yielding to launch any relevant/big-die product. Don't expect any high-end CPU/GPU before 1Q 2019.
 
  • Like
Reactions: Vattila and Olikan

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
There may not even BE a Zen+ at this point. I was a strong backer of that slide, because it makes sense. AMD needs to stick with a yearly refresh for products. However, 14nm+ may be a 'backup plan' with 7nm as the focus. GlobalFoundries IS claiming volume 7nm applications for 2H 2018. I'll go out on a limb and say that AMD probably gets priority on the 7nm rollout. It is completely possible for AMD to launch 7nm in March or April, or at least do a soft launch with products shipping in May and/or June.

A side note for the second part of your quote: Ryzen already competes very well with the 7800, 7840, etc., and Coffee Lake won't be any faster from an IPC standpoint, so I don't know why you felt the need to include this in your post. How about we stick to the topic at hand and not play the fanboy card?

Coffee Lake is launching next month and AMD needs a response. The CFL 8700K at 4.3 GHz all-core turbo is scoring 1523 points in CB R15. That puts it on par with the AMD 1700X in a multithreaded benchmark where Ryzen is very strong. For single-thread performance, the 8700K is hitting 213. That's a massive single-thread performance lead over the 1700X. AMD's current product stack won't look good against Coffee Lake. AMD needs Pinnacle Ridge on 14nm+ to compete against Coffee Lake. Moreover, GF 7LP is scheduled for risk production in Q1/Q2 2018, with volume production starting by late 2018 or early 2019. Do not expect 7nm Ryzen before Q2 2019.

The next-generation 14nm process, LPU, won't increase power consumption at any power target. It will be more efficient across the whole performance curve, just like Intel's 14nm+/++.
14nmcharacteristic.png


And GF will start risk production in 1H 2018. Their claim of mass production in 2H 2018 is too unrealistic, because it would be too late and too low-yielding to launch any relevant/big-die product. Don't expect any high-end CPU/GPU before 1Q 2019.

Very well said. GF has not built a track record of execution, so their claims are always to be taken with a lot of salt. GF claims risk production in early 2018 and volume production in H2 2018. It normally takes 12 months to go from risk to volume production. IMO the best case would be GF 7LP going into volume production by Dec 2018, with 7nm Ryzen 2 launching in early Q2 2019. The realistic case would be 7nm Ryzen launching in mid-to-late Q2 2019.
 
  • Like
Reactions: CatMerc

scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
There may not even BE a Zen+ at this point. I was a strong backer of that slide, because it makes sense. AMD needs to stick with a yearly refresh for products. However, 14nm+ may be a 'backup plan' with 7nm as the focus. GlobalFoundries IS claiming volume 7nm applications for 2H 2018. I'll go out on a limb and say that AMD probably gets priority on the 7nm rollout. It is completely possible for AMD to launch 7nm in March or April, or at least do a soft launch with products shipping in May and/or June.

A side note for the second part of your quote: Ryzen already competes very well with the 7800, 7840, etc., and Coffee Lake won't be any faster from an IPC standpoint, so I don't know why you felt the need to include this in your post. How about we stick to the topic at hand and not play the fanboy card?
IBM likely gets the first go at 7nm. It's their node after all. But yes, AMD is likely a close second.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
There may not even BE a Zen+ at this point. I was a strong backer of that slide, because it makes sense. AMD needs to stick with a yearly refresh for products. However, 14nm+ may be a 'backup plan' with 7nm as the focus. GlobalFoundries IS claiming volume 7nm applications for 2H 2018. I'll go out on a limb and say that AMD probably gets priority on the 7nm rollout. It is completely possible for AMD to launch 7nm in March or April, or at least do a soft launch with products shipping in May and/or June.

Entirely possible. All the more reason not to have to re-qualify two higher-end options if this is more of a 6-month, short-term bump.

A side note for the second part of your quote: Ryzen already competes very well with the 7800, 7840, etc., and Coffee Lake won't be any faster from an IPC standpoint, so I don't know why you felt the need to include this in your post. How about we stick to the topic at hand and not play the fanboy card?

Lol, this is comic gold. I am running 3 AMD video cards. I have a Ryzen 1700, my second processor ever was a Thunderbird Socket A 700 MHz, and my second most expensive CPU was a 4400+ Athlon X2 ($500), as I was an early adopter of dual core and 64-bit (I was one of the few purchasers of XP 64-bit). I am probably the most unbiased AMD fanboy you can run into. I prefer, when possible, to get AMD products for myself, but I also recognize their weaknesses.

There are several weaknesses the R7 has in comparison to the higher core count Intel alternatives. Coffee Lake especially will put the R7 in a sensitive spot, maintaining its clock and IPC advantages while closing in on multitasking (depending on clocks, possibly beating it). Even a 6c/6t i5 will look like a really good option next to the current R5/R7 lineup, with more actual cores than the 7600K while maintaining clocks for games. A clock boost to the mid-4 GHz range would do a lot to help keep the R7/R5 competitive until Zen 2.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
IBM likely gets the first go at 7nm. It's their node after all. But yes, AMD is likely a close second.
IBM are too low volume for their "first go" to affect AMD in any significant way in terms of volumes.

Both can get priority without significantly affecting the other.
 
  • Like
Reactions: Ajay

Ajay

Lifer
Jan 8, 2001
15,468
7,871
136
IBM are too low volume for their "first go" to affect AMD in any significant way in terms of volumes.

Both can get priority without significantly affecting the other.

And they use different FABs anyway, IIRC.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
And they use different FABs anyway, IIRC.
For now yes, as AMD is going 12LP, while IBM is going 14HP.
But when both converge on a single process (7nm LP), they'll have to share production bandwidth.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
AMD has advertised to server customers that the Zen 2 Epycs that come with the GloFo 7nm will be compatible with the current motherboards and that they will have 48 cores per package.

We know that they are going up to 48 cores per CPU, we just don't know the topology. We also know that they intend to still have 8 memory channels, which implies 12 cores per chip.

So eight channels for 48 cores.....but is this also for the newest motherboards? Or will the newest motherboards for Zen 2 EPYC have 1 memory channel per CCX (in the event each CCX is four cores)?

Yup, whether they go with 4-cores or 6-cores per CCX, by sticking with the 8 memory channels per socket for Epyc 2, memory controllers will have to be shared. That's 2 channels per die, which is 2 channels per 12 cores (3 x 4 or 2 x 6) for a 48-core Epyc 2.
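
The per-die arithmetic above can be checked in a few lines. This is just a sketch: the 48 cores and 8 channels come from the posts, while four dies per package is an assumption carried over from current Epyc.

```python
# Figures from the discussion: 48 cores and 8 memory channels per package.
CORES_PER_PACKAGE = 48
MEMORY_CHANNELS = 8
DIES_PER_PACKAGE = 4  # assumption: still four dies per package, as on current Epyc

cores_per_die = CORES_PER_PACKAGE // DIES_PER_PACKAGE
channels_per_die = MEMORY_CHANNELS // DIES_PER_PACKAGE
cores_per_channel = CORES_PER_PACKAGE / MEMORY_CHANNELS

# How many CCXs per die each candidate CCX size would need.
ccx_layouts = {size: cores_per_die // size for size in (4, 6)}

print(f"{cores_per_die} cores and {channels_per_die} channels per die")
print(f"{cores_per_channel:.0f} cores share each memory channel")
for size, count in ccx_layouts.items():
    print(f"{size}-core CCX -> {count} CCXs per die")
```

Either layout ends up with 6 cores per channel, against 4 cores per channel on today's 32-core, 8-channel Epyc.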
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
So eight channels for 48 cores.....but is this also for the newest motherboards? Or will the newest motherboards for Zen 2 EPYC have 1 memory channel per CCX (in the event each CCX is four cores)?
No change in socket. Hell, no change in platform at all, since EPYC is a full-on SoC. Still 8 channels.
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
I've finally had the time to sit down and dig through the programming guide and some of their other published information. After reviewing it, I'm now rather firmly in the camp of Zen 2 including a third CCX. Trying to change the CCX significantly enough to make it 6 cores would be a massive tear up of the CCX, which just wouldn't be worth the time and effort. The CCXs talk to the rest of the world through the scalable control and data fabric. They don't have a direct connection between the L3 and the memory controller, despite being next to each other on the actual die. Keeping the same number of memory controllers is irrelevant to the number of CCXs. As long as the relative bandwidth available per core is maintained, it shouldn't matter. Also, the addition of another CCX means another 8MB of L3, giving a total of 24MB of L3. That can hide a lot of memory accesses.

Looking at the logical diagrams, it looks to be MUCH easier to just add another CCX and attach it to the data and control fabrics. It would also not affect the IF links between dice on TR or Epyc either, aside from heavily over-subscribing them in some scenarios. I'm going to go out on a limb and say that AMD would likely combine this step with a memory controller and fabric speed upgrade to DDR4-3200 or so (Ryzen and TR) and 2666-2800 on EPYC. Why? Because the extra bandwidth would be vital to keep up with the additional inter-core communications traffic between the CCXs. As for the die dimensions, given that they want to stay on AM4, there is no driving need to make the die physically smaller. Adding the third CCX can be done in the middle of the die without changing the exterior dimensions. Interestingly enough, if they do that, they have the room to also add a large L4 cache. To make it useful, it would need to be about 2X the L3 caches, and it would need to be exclusive. That would make it 48MB. The other option is that there's enough space to add a modest number of GCN cores. This would allow it to better compete with the Intel chips.
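
For reference, the cache totals in this post work out as follows. A sketch only: the three-CCX die and the L4 cache are the speculation above, not anything AMD has announced, and the helper name is made up.

```python
L3_PER_CCX_MB = 8  # each current Zen CCX carries 8 MB of L3

def total_l3_mb(ccx_count):
    """Total L3 on a die with the given number of CCXs."""
    return ccx_count * L3_PER_CCX_MB

current_l3 = total_l3_mb(2)    # today's two-CCX die
proposed_l3 = total_l3_mb(3)   # speculative three-CCX die
proposed_l4 = 2 * proposed_l3  # "about 2X the L3 caches", per the post

print(current_l3, proposed_l3, proposed_l4)
```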
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Given that eight DDR4 memory channels (running at higher speed) works for 48 7nm cores (at the same TDP as 32 14nm cores), what is the chance a third DDR4 memory controller on a 3 CCX 12 core Zen 2 die could still exist? Either that or two DDR5 controllers alongside two DDR4 controllers?

Usage on other platforms besides Zen 2 EPYC? Or maybe Zen 2 EPYC at a higher TDP than 180W?
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,355
1,549
136
So eight channels for 48 cores.....but is this also for the newest motherboards? Or will the newest motherboards for Zen 2 EPYC have 1 memory channel per CCX (in the event each CCX is four cores)?

The promise is that you can buy a motherboard with current EPYC CPU, and upgrade it to an EPYC 2 CPU with 48 cores some time down the line. So, 8 channels and 48 cores.

I still do not see any point in the "connection topology" argument for 4-core CCX. Cores do not talk to other cores, they talk to L3 slices. Cores don't have "their own" L3 slices, it's now clear that all L3 slices are treated equally by all cores within a CCX. So the current system does not have 4 links, it has 16. (Each core links to each L3 slice). There is no reason why the amount of L3 slices would necessarily need to equal the amount of cores.
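
The link count is just cores times slices when every core connects to every L3 slice. A small sketch; the configurations other than 4 cores with 4 slices are hypothetical.

```python
def core_to_slice_links(cores, slices):
    """Number of links when every core connects to every L3 slice."""
    return cores * slices

# (cores, slices): the current CCX first, then hypothetical variants.
configs = [(4, 4), (6, 4), (6, 8), (8, 8)]
link_counts = {cfg: core_to_slice_links(*cfg) for cfg in configs}

for (cores, slices), links in link_counts.items():
    print(f"{cores} cores x {slices} slices = {links} links")
```

Nothing in this count forces the number of slices to match the number of cores.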
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
At one point, it was a not uncommon belief that the CCXs were linked directly between separate Zeppelin dies on the MCM processors. That isn't the case, though.
 

Schmide

Diamond Member
Mar 7, 2002
5,587
719
126
Cores don't have "their own" L3 slices, it's now clear that all L3 slices are treated equally by all cores within a CCX. So the current system does not have 4 links, it has 16. (Each core links to each L3 slice). There is no reason why the amount of L3 slices would necessarily need to equal the amount of cores.

They do. (AFAIK) The L3 is partitioned into slices for each core/thread, and each slice is a designated eviction area. Each core/thread can access any of the other cores' partitions, but it cannot evict its own data to them.

A cache is an economy of scale. The more ways you put into it the slower it gets.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,355
1,549
136
They do. (AFAIK) The L3 is partitioned into slices for each core/thread, and each slice is a designated eviction area. Each core/thread can access any of the other cores' partitions, but it cannot evict its own data to them.

This is incorrect (or rather, it's how eviction happens between CCXs). The slices inside the CCX are interleaved strictly based on the low-order bits of the address. No matter which core evicts a line, if the physical address of that line ends in 00, it goes to the first slice; if it ends in 01, it goes to the second, and so on. If you organize your data to only occupy every 4th cache line, you can force all your accesses to go to one slice. The L3 is roughly as fast (± a few cycles) regardless of which slice you pick, meaning none of the 4 slices is "your own" that is better to access. If you make all 4 cores use the same slice, it cuts your cache bandwidth to a quarter.

edit:

And the reason for this is that it makes the cache faster and easier to access, and reduces coherency traffic: you always know which slice a line is in just from looking at its address. This is the same policy that Intel's L3 has used since at least SNB.

This means that they probably don't want to do non-power-of-2 amounts of cache slices, but it also means there is no practical reason why the amount of cache slices must equal the amount of CPU cores. I would not be surprised if Zen2 CCX is 6 cores but 4 or 8 slices.
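
The interleaving described above can be sketched like this. It assumes 64-byte cache lines and that the two bits just above the line offset pick the slice; the real hardware mapping may differ, and the function name is made up.

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines
NUM_SLICES = 4   # four L3 slices per CCX, as described above

def l3_slice(phys_addr):
    """Slice chosen by the low-order bits of the cache-line address."""
    line_index = phys_addr // LINE_BYTES
    return line_index % NUM_SLICES

# Consecutive cache lines rotate through all four slices.
sequential = [l3_slice(i * LINE_BYTES) for i in range(8)]

# Touching only every 4th cache line keeps hitting the same slice.
strided = [l3_slice(i * 4 * LINE_BYTES) for i in range(8)]

print("sequential:", sequential)
print("every 4th line:", strided)
```

The slice depends only on the address, never on which core issued the access, which is why a power-of-2 slice count is convenient and why it need not match the core count.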
 

IRobot23

Senior member
Jul 3, 2017
601
183
76
Why do people think that latency between CCXs is a huge problem?
What I expect is the same design (4C/8MB) with a faster DF and the same IMC clocks.
 

kjboughton

Senior member
Dec 19, 2007
330
118
116
AMD will *not* release a 48-core CPU for the same reason Intel won't release a 48-core CPU: currently, Windows has an inherent limit on the size of a processor group of at most 64 logical processors. This is based on the current x2APIC spec and cannot be modified without a coordinated industry change. So that means, assuming Intel HTT or AMD SMT are here to stay, no more than 32 real cores per node/socket. You could go NUMA on a single socket, like Intel does with Cluster on Die (COD) on Xeon v3+, but then you'd lose Creator Mode, as there would be no way to throw all the cores at a non-NUMA-aware app (most apps, and certainly games, these days).
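
The arithmetic behind that claim, as a rough sketch. It only counts how many 64-wide Windows processor groups a socket's logical processors would span; the helper name is made up, and it says nothing about what AMD or Intel will actually ship.

```python
import math

GROUP_LIMIT = 64  # max logical processors in one Windows processor group

def groups_needed(cores, threads_per_core=2):
    """Processor groups needed to cover every logical processor on a socket."""
    logical = cores * threads_per_core
    return math.ceil(logical / GROUP_LIMIT)

for cores in (32, 48):
    print(f"{cores} cores with SMT: {cores * 2} logical CPUs,",
          f"{groups_needed(cores)} group(s)")
print("48 cores without SMT:", groups_needed(48, threads_per_core=1), "group(s)")
```

A 32-core part with SMT fits exactly in one group, while 48 cores with SMT spill into a second one.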

If you want more cores
a) No
b) you're going to have to provide a really compelling reason as to why you don't have enough now, and
c) see point a

EDIT: typographical errors only
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
So, the implication there is that Windows servers (which, let's face it, is what you will run if you are buying an Epyc, want to use Windows, and actually go bare metal) are the only major stumbling block for this? This excludes all the *nix servers, the VMware virtual hosts, and the other niche applications that don't have that limitation. It also excludes the far larger volume of Ryzen and TR processors that would be shipped where that limitation would never even come into play.

There is no need to limit based on that. They already have Epyc processors that have fewer than the max possible dies; there is no reason not to continue that practice going forward to satisfy that portion of the industry.