CPCHardware: 2nd gen AMD EPYC will have 64 cores, 256 MB (!) L3, 8x DDR4-3200 and 128 PCIe 4.0 lanes


itsmydamnation

Diamond Member
Feb 6, 2011
3,072
3,897
136
Remember that IBM's 14nm process clocks a 695 sq mm SoC whose CPU pipelines are 12 stages at 4 GHz. Now, how high could that process clock a 200 sq mm die with a ~16-stage pipeline? Then how much improvement would one expect going from IBM 14nm to 7nm SoC or HPC? If 12nm gets PR to a standard boost of 4.4 GHz and XFR of 4.5-4.6 GHz (10% over Zeppelin), and given the high-end wall nature of 14nm LPP, I think 5 GHz on SoC is realistic and not far-fetched.
 

Spartak

Senior member
Jul 4, 2015
353
266
136
So I guess this rumor effectively kills the possibility of a 6C CCX? 4D*3CCX*6C would be 72 cores in total.

Looks like 4C CCX remains the basic building block for Zen2.
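
A quick sketch of the arithmetic behind that, assuming the "4D" (four dies per package) from the post and a few hypothetical CCX sizes. The die count, the candidate CCX sizes and the function name are illustrative assumptions, not confirmed specs:

```python
# Which (CCX per die, cores per CCX) combinations land exactly on the rumored core count?
DIES_PER_PACKAGE = 4   # the "4D" in the post; an assumption, not a confirmed spec
TARGET_CORES = 64      # the rumored EPYC core count


def configs_hitting_target(target, dies, ccx_sizes=(4, 6, 8), max_ccx_per_die=4):
    """Return (ccx_per_die, cores_per_ccx) pairs whose total equals target."""
    hits = []
    for cores_per_ccx in ccx_sizes:
        for ccx_per_die in range(1, max_ccx_per_die + 1):
            if dies * ccx_per_die * cores_per_ccx == target:
                hits.append((ccx_per_die, cores_per_ccx))
    return hits


print(configs_hitting_target(TARGET_CORES, DIES_PER_PACKAGE))  # [(4, 4), (2, 8)]
print(4 * 3 * 6)  # the 4D*3CCX*6C case from the post: 72
```

With four dies, a 6-core CCX gives multiples of 24 (24, 48, 72, 96), so it never hits 64; only 4 x 4C or 2 x 8C per die does.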
 

Spartak

Senior member
Jul 4, 2015
353
266
136
I place my bet on 8-core CCX. With 32MB L3 per CCX.

That would be a lot of wasted die space (2 cores disabled on each CCX) for the 48 core part. Seems more logical that they would have 3CCX (Ryzen2) and 4CCX (TR2) dies.

1CCX+1GPU for ultra low power 7-15W
2CCX+2GPU for low to medium power 25W - 65W
3CCX for Desktop
4CCX for Workstation
 
  • Like
Reactions: CatMerc

Spartak

Senior member
Jul 4, 2015
353
266
136
Also: they'd need to develop two different CCX's for mobile and desktop/server which kind of defeats the whole purpose of the CCX modularity.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
Also: they'd need to develop two different CCX's for mobile and desktop/server which kind of defeats the whole purpose of the CCX modularity.
Quite the opposite, a large part of the reason for CCX modularity is that the time and cost it takes to design a product is greatly reduced, allowing them to release multiple designs more easily.

With the setup we're proposing, servers up to 48 cores will be based upon desktop 12 core designs, meaning the same reuse that's happening now will still keep going. What we're also suggesting is a 16 core server die that wouldn't be economical for consumers on early 7nm, but would be able to command a large price on servers due to the amount of computational power and cache offered, just like GP100 and GV100 sell for obscene amounts. This same die can be reused for Threadripper, as that platform is less price sensitive, and as 7nm matures could even be offered as an upgrade for mainstream desktop users.
 
  • Like
Reactions: stockolicious

Spartak

Senior member
Jul 4, 2015
353
266
136
Quite the opposite, a large part of the reason for CCX modularity is that the time and cost it takes to design a product is greatly reduced, allowing them to release multiple designs more easily.

With the setup we're proposing, servers up to 48 cores will be based upon desktop 12 core designs, meaning the same reuse that's happening now will still keep going. What we're also suggesting is a 16 core server die that wouldn't be economical for consumers on early 7nm, but would be able to command a large price on servers due to the amount of computational power and cache offered, just like GP100 and GV100 sell for obscene amounts. This same die can be reused for Threadripper, as that platform is less price sensitive, and as 7nm matures could even be offered as an upgrade for mainstream desktop users.

I think you are conflating their current modular/interconnect strategy, which is based on 'multiple designs' using the same CCX, with 'multiple designs' of that CCX module. There is just one CCX design right now. With an eight-core CCX you'd need two, and it would still be suboptimal.

You can't make a 12-core desktop part from an 8-core CCX module without disabling four cores in a 2x8CCX die.

My point is twofold: with an 8CCX you'd need to develop both a 4CCX APU and an 8CCX CPU part, and the 8CCX parts would need to run with many cores disabled. Having just the same small 4CCX unit means you have much more flexibility in how you mix and match your CPU parts from ultramobile to server.

The chances of an 8CCX module are nearly zero.
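
To put rough numbers on the disabled-core argument, here is a minimal sketch. It assumes a part is built from the smallest whole number of identical CCXs that covers the target core count; the function name and that packing rule are illustrative assumptions, not anything AMD has stated:

```python
import math


def cores_disabled(target_cores, cores_per_ccx):
    """Return (CCXs needed, cores that must be fused off) for a target core count."""
    ccx_needed = math.ceil(target_cores / cores_per_ccx)
    return ccx_needed, ccx_needed * cores_per_ccx - target_cores


# 12-core desktop die built from 4-core vs 8-core CCXs
for ccx_size in (4, 8):
    ccx_needed, disabled = cores_disabled(12, ccx_size)
    print(f"{ccx_size}C CCX: {ccx_needed} CCX, {disabled} cores disabled")
```

Under those assumptions a 12-core die needs three 4C CCXs with nothing disabled, or two 8C CCXs with four cores disabled, which is the 2x8CCX case described above.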
 
  • Like
Reactions: Topweasel

CatMerc

Golden Member
Jul 16, 2016
1,114
1,153
136
I think you are conflating their current modular/interconnect strategy, which is based on 'multiple designs' using the same CCX, with 'multiple designs' of that CCX module. There is just one CCX design right now. With an eight-core CCX you'd need two, and it would still be suboptimal.

You can't make a 12-core desktop part from an 8-core CCX module without disabling four cores in a 2x8CCX die.

My point is twofold: with an 8CCX you'd need to develop both a 4CCX APU and an 8CCX CPU part, and the 8CCX parts would need to run with many cores disabled. Having just the same small 4CCX unit means you have much more flexibility in how you mix and match your CPU parts from ultramobile to server.

The chances of an 8CCX module are nearly zero.
Who said anything about 8 core CCX?

Edit: I see I missed the context.
 

turtile

Senior member
Aug 19, 2014
633
315
136
I think you are conflating their current modular/interconnect strategy, which is based on 'multiple designs' using the same CCX, with 'multiple designs' of that CCX module. There is just one CCX design right now. With an eight-core CCX you'd need two, and it would still be suboptimal.

If AMD jumps to 64 cores on EPYC, we can assume AMD will use 8 cores on APUs. However, I think they will use 4 x 4 core CCX modules per die and 2 x 4 core CCX in APU.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,508
3,190
136
I don't think that AMD will overhaul the CCX design until zen3. Zen2 may have some errata and performance "tweaks", but nothing substantial.
 

stockolicious

Member
Jun 5, 2017
80
59
61
Folks, bear in mind that the area shrink is going to be massive. 7SoC with 6T is optimized for designs running at 3.5 GHz. At 14LPP, to hit > 3 GHz you needed 9T libraries. 7.5T was only for mobile CPUs running in the 2 - 2.4 GHz range.

https://www.globalfoundries.com/sites/default/files/product-briefs/product-brief-14lpp.pdf
https://www.semiwiki.com/forum/cont...alfoundries-discloses-7nm-process-detail.html

Cell Height = Minimum Metal Pitch x Track count
Contacted Poly Pitch x Cell Height is the new measure for transistor density

14LPP = 78nm x 64 nm x 9 tracks = 44928
7SoC = 56nm x 40nm x 6 tracks = 13440

13440/44928 = 0.299 or 0.3. That's a 70% area shrink from 14LPP 9T. A single full-node shrink takes you from 1 to 0.5, and another full node takes you to 0.25. 7SoC with 6T is literally bringing close to 2 generations of density increase. GF's comparison of 7SoC 6T vs 14LPP 9T shows a 60% power reduction at iso-perf or a 40% perf increase at iso-power.
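
The same calculation as a small sketch, using only the pitch and track figures quoted above. The function and variable names are made up for illustration, and the "equivalent full nodes" line assumes each full node halves area, as the post describes:

```python
import math


def relative_cell_area(cpp_nm, mmp_nm, tracks):
    """Contacted poly pitch x cell height, where cell height = metal pitch x track count."""
    cell_height_nm = mmp_nm * tracks
    return cpp_nm * cell_height_nm


area_14lpp_9t = relative_cell_area(78, 64, 9)  # 14LPP with 9T libraries
area_7soc_6t = relative_cell_area(56, 40, 6)   # 7SoC with 6T libraries
ratio = area_7soc_6t / area_14lpp_9t

print(area_14lpp_9t, area_7soc_6t, round(ratio, 3))  # 44928 13440 0.299

# Assuming each full node halves area, how many full nodes is this shrink worth?
equivalent_full_nodes = math.log(ratio, 0.5)
print(round(equivalent_full_nodes, 2))
```

The ratio sits between one full node (0.5) and two (0.25), closer to two, which matches the "close to 2 generations" reading.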

https://m.eet.com/content/images/eetimes/1 7 12 14 copared x 800_1505972923.jpg

AMD should be able to pack 64 Zen 2 cores while doubling L3 cache per core and still should be able to keep die size <= 200 sq mm. I think AMD knows they have an opportunity to take a decisive lead in servers and are going for the kill. Intel EMIB and 10++ will arrive with server first in 2020 (most probably H2) and Icelake-SP is not going to be able to bring 64 cores to market in 2019. If AMD can launch Rome with 64 cores in Q1 2019 they will catch Intel totally off guard.

"If AMD can launch Rome with 64 cores in Q1 2019 they will catch Intel totally off guard"

I think they already have. AMD already has more cores, but forget 64: how does Intel handle the 48-core problem? Until INTC moves to an MCM design, more cores will mean a bigger cost advantage for AMD. Either INTC hands over market share or squashes margins.
 

DrMrLordX

Lifer
Apr 27, 2000
22,900
12,965
136
2018 might be the year when Acorn based servers start to make a serious push; could be competition for x86 finally. https://www.extremetech.com/computi...-server-cpu-benchmarks-mean-big-trouble-intel
no SMT yet still energy efficient (prbly thx to power gating).

Interesting the article mentions Calxeda but not Cavium. In somewhat-related news, Marvell just bought Cavium for $6 billion.

Also interesting that ARM is going for the server market and practically owns the entire mobile market, but skipped the PC sector.
 
  • Like
Reactions: amd6502

gOJDO_n

Member
Nov 13, 2017
33
7
81
Remember that IBM's 14nm process clocks a 695 sq mm SoC whose CPU pipelines are 12 stages at 4 GHz. Now, how high could that process clock a 200 sq mm die with a ~16-stage pipeline? Then how much improvement would one expect going from IBM 14nm to 7nm SoC or HPC? If 12nm gets PR to a standard boost of 4.4 GHz and XFR of 4.5-4.6 GHz (10% over Zeppelin), and given the high-end wall nature of 14nm LPP, I think 5 GHz on SoC is realistic and not far-fetched.
This is an apples-to-watermelons comparison. The number of stages is absolutely irrelevant when comparing frequencies of completely different architectures on the same lithography node. It's like comparing two people, one eating 3 apples and the other 2 watermelons. So, who ate more fruit?
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
Taking a modern arch like Ryzen, even with good insight into it, is it easy to say whether it's 19 or 20 stages? I mean, are the definitions strict even for those involved? And does the number mean anything beyond a rough assessment of whether it's a very low or high fmax design?
 

CHADBOGA

Platinum Member
Mar 31, 2009
2,135
833
136
Will these future Epyc processors be a good choice as a gaming CPU as well, or will they always be just for non-gaming applications?
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,072
3,897
136
This is an apples-to-watermelons comparison. The number of stages is absolutely irrelevant when comparing frequencies of completely different architectures on the same lithography node. It's like comparing two people, one eating 3 apples and the other 2 watermelons. So, who ate more fruit?

Yeah, I know.
POWER9 has:
8x decode vs 4x
8x rename/issue vs 6x
4x L/S vs 3x L/S
4-way SMT vs 2-way
All that extra complexity in very latency-sensitive parts of the core!

They are both large modern OOOE cores with very large instruction windows. Yes, they have made uarch changes (8 to 9), but a lot of these saved cycles are in decode and retire L/S, and they are extremely latency-sensitive parts of an OOOE core's pipeline. You try to make out like pipeline length to clock is completely irrelevant, but the simple fact is that if you look across ARM, x86 and POWER, pipeline length vs operating clocks has a consistent relationship, right up until POWER9.

But overall it points in a positive direction for 7nm, which was/is IBM-based, given how their 14nm is performing.
 

krumme

Diamond Member
Oct 9, 2009
5,956
1,596
136
Well, Keller said Ryzen was a high-frequency design. We know 14LPP is not tailored for that. 7nm, at least in the high-power variant as used by POWER, is made for high frequency. So there is surely an opportunity for a very large uplift. Let's hope they go that high-power way, at least for a desktop variant, and leave the standard process for the EPYC and APU stuff.
 
  • Like
Reactions: amd6502