Intel Announces 48-Core Cascade-AP Multi-Chip Package

IEC

Super Moderator
Jun 10, 2004
13,571
411
136
#1
Source:
https://www.anandtech.com/show/13535/intel-goes-for-48cores-cascade-ap

The good:
-Up to 48 cores per socket
-Aimed at 2S servers
-12 DDR4 channels
-Possible 5903 pin LGA socket
-Launch 1H2019?

The bad:
-14nm...
-UPI connection - no EMIB yet
-No mention of hyperthreading
-28-core XCC dies --> 24-cores "glued" together

The ugly:
-A current 24-core Xeon Platinum runs at 205W
-AMD launch of 7nm Epyc Rome imminent
-They disabled SMT on Epyc for performance comparisons
 

french toast

Senior member
Feb 22, 2017
943
59
106
#2
They disabled SMT on Epyc? Desperation is setting in.
 

jpiniero

Diamond Member
Oct 1, 2010
6,430
272
126
#3
Pretty sure AT is wrong on it being an LGA socket and that it is BGA. TDP is I'm guessing 300 at least.

The earlier rumors had the AP being a different die with 4 AVX-512 units per core but that doesn't seem to have happened.
 

thecoolnessrune

Diamond Member
Jun 8, 2005
9,311
33
126
#6
I also remember reading in the past that this is likely to be a BGA Socket to allow for easier complex routing in the limited amount of space available.

As long as these are being targeted for the ultra-dense and high-end markets, 350, even 400 watt TDP's should not be that big of a deal as I would imagine BGA lends itself to "direct-to-node" liquid cooling like Lenovo's SD650 or HPE's Apollo f8000.
 

jpiniero

Diamond Member
Oct 1, 2010
6,430
272
126
#7
It is (well, it's supposed to be) the replacement for the Xeon Phi.
 

tamz_msc

Platinum Member
Jan 5, 2017
2,230
219
106
#8
SMT was off because the benchmark was Linpack.
 

TheGiant

Senior member
Jun 12, 2017
412
69
86
#9
Well, I wonder if that 12-channel DDR4 can be tested with CFD calculations and other numerics that require lots of bandwidth.
Does it behave like true 12-channel memory or not?
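
For a rough sense of scale, here is a minimal sketch of the theoretical peak bandwidth for 6 and 12 DDR4 channels. The DDR4-2666 speed is an assumption (the thread does not state a memory speed), and the function name is made up for illustration. It does not answer the "true 12CH" question: it only gives the ceiling, and says nothing about how much of the other die's memory a core can reach at full speed across the die-to-die link.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: channels x transfers per second x bytes per transfer.

    bus_bytes=8 corresponds to a 64-bit DDR4 channel.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# ASSUMPTION: DDR4-2666; the actual supported speed is not given in the thread.
ASSUMED_MTS = 2666

per_die = peak_bandwidth_gbs(6, ASSUMED_MTS)      # 6 channels, as on one XCC die
per_package = peak_bandwidth_gbs(12, ASSUMED_MTS)  # 12 channels for the whole package

print(f"6 channels:  {per_die:.1f} GB/s peak")
print(f"12 channels: {per_package:.1f} GB/s peak")
```

A bandwidth-bound CFD or STREAM-style run would show how close a real workload gets to the 12-channel figure.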
 

rainy

Senior member
Jul 17, 2013
395
25
116
#10
SMT was off because the benchmark was Linpack.
They chose Linpack because it works well with AVX-512, which AMD does not support.
Intel knows perfectly well that Epyc SMT scales better than Xeon HT, and the security measures are just a smokescreen.
 
Apr 27, 2000
11,927
1,082
126
#11
Why are they only 48-core and not 56-core chips? Is there some reason why 4 cores per die are disabled?
 
Mar 10, 2006
11,719
122
126
#12
Pretty sure AT is wrong on it being an LGA socket and that it is BGA. TDP is I'm guessing 300 at least.

The earlier rumors had the AP being a different die with 4 AVX-512 units per core but that doesn't seem to have happened.
Why would AP be a different die?
 

rainy

Senior member
Jul 17, 2013
395
25
116
#13
Why are they only 48-core and not 56-core chips? Is there some reason why 4 cores per die are disabled?
Most probably TDP - even with 48 cores it would be above 300W, with 56 cores that could be 350-400W.
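
As a back-of-the-envelope check on that estimate, here is a minimal sketch that scales TDP linearly with core count. Linear scaling at fixed clocks is an assumption, the 300W starting point is the guess from this thread rather than a published figure, and the function name is made up for illustration.

```python
def scale_tdp_by_cores(base_tdp_w: float, base_cores: int, new_cores: int) -> float:
    """Scale TDP linearly with core count (ASSUMPTION: same clocks, same power per core)."""
    return base_tdp_w * new_cores / base_cores


# ASSUMPTION: 300W for the 48-core part, as guessed earlier in the thread.
tdp_56 = scale_tdp_by_cores(300, 48, 56)
print(f"56 cores at the same per-core power: {tdp_56:.0f}W")
```

Under these assumptions the 56-core figure lands at the low end of the 350-400W range; a higher starting TDP pushes it further up.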
 

jpiniero

Diamond Member
Oct 1, 2010
6,430
272
126
#14
Why would AP be a different die?
Because it would have 3 additional AVX-512 units instead of just 1, since this was originally intended to be the Phi replacement. That was the rumor anyway; it may not have been true, or it may have been cancelled, and what we are getting instead is just two XCC dies glued together.
 
Mar 10, 2006
11,719
122
126
#15
Because it would have 3 additional AVX-512 units instead of just 1 since this was originally intended to be the Phi replacement. That was the rumor anyway, it may have not been true or cancelled.. and what we are getting instead is just two XCC dies glued together.
I follow the rumors closely and never heard of such a thing.
 

jpiniero

Diamond Member
Oct 1, 2010
6,430
272
126
#16
I follow the rumors closely and never heard of such a thing.
Think about it.. what exactly is the point of this versus a 4S XCC system? There's got to be more to it.

I should add that, according to the rumor, the need to do something like this first arose because Intel lost or scrapped a big HPC design win: they were told that the Phi wasn't all that useful for getting any kind of throughput.
 

IntelUser2000

Elite Member
Oct 14, 2003
6,112
230
126
#18
Think about it.. what exactly is the point of this versus a 4S XCC system? There's got to be more to it.
You are making no sense. They are using an MCM design because it had to be done quick. It's also named Cascade Lake. If it added 2 more AVX-512 units, they would need to make room for them. Do you think that comes for free?

A harder problem than finding extra space is the substantially increased TDP on top of having 48 cores. Vector units take 3/4 or more of the total power used. Double the part that takes 75%+ and tell me what that does to the whole.

Another problem combines the two. What's the point of a 75% higher TDP (making it a 450-500W part rather than 300W, perhaps) if it doesn't perform well? Further doubling of AVX capabilities requires things like the cache system and the load/store system to be substantially beefed up. That's a significant change in the floorplan.
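
The "double the 75% part" argument can be sketched as a simple weighted sum. The 75% vector share and the 300W baseline are the figures quoted in this post, not measured values; the function name is made up for illustration, and scaling vector power linearly with unit count is an assumption.

```python
def tdp_with_scaled_vector_units(base_tdp_w: float, vector_share: float, vector_scale: float) -> float:
    """Total power when only the vector share of the budget is scaled.

    ASSUMPTION: vector power grows linearly with the number of vector units
    and the rest of the chip is unchanged.
    """
    other_share = 1.0 - vector_share
    return base_tdp_w * (other_share + vector_share * vector_scale)


new_tdp = tdp_with_scaled_vector_units(300, 0.75, 2)
increase = new_tdp / 300 - 1
print(f"New TDP: {new_tdp:.0f}W ({increase:.0%} higher)")
```

With these inputs the straight-line estimate is a 75% increase over the 300W baseline, which is the same order as the 450-500W range above.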

As long as these are being targeted for the ultra-dense and high-end markets, 350, even 400 watt TDP's should not be that big of a deal as I would imagine BGA lends itself to "direct-to-node" liquid cooling like Lenovo's SD650 or HPE's Apollo f8000.
That makes sense, as the -AP series becomes a replacement for Xeon Phi. I've heard of the 5900-pin BGA arrangement too. Most of the pin count increase will go toward accommodating the 6 extra memory channels, and the rest toward extra power.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
17,788
1,444
136
#19
Looks to me like a desperate attempt to compete with the 64 core EPYC. And a bad one at that.
 

Abwx

Diamond Member
Apr 2, 2011
8,871
214
126
#20
In the comparison published by Intel the core/frequency product is 1.227x in favour of the Xeon.

They are implicitly admitting that for legacy FP performance up to AVX1 the Epyc setup is 17.6% faster per clock. It's all well and good to state 3.4x, but it's 1/4 of that value under the conditions I stated.
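
One way to reproduce the 17.6% figure from the numbers in this post is sketched below. This is a reading of the arithmetic, not something stated by Intel: the factor of 4 is the "1/4" from the paragraph above, and the variable names are made up for illustration. The 1.227x core/frequency product is not used in this step.

```python
# Figures taken from the post above.
claimed_linpack_speedup = 3.4   # Intel's stated advantage with AVX-512
avx512_factor = 4               # the "1/4" divisor used in the post (ASSUMPTION about its meaning)

# Xeon's relative throughput once the AVX-512 advantage is divided out.
xeon_legacy_ratio = claimed_linpack_speedup / avx512_factor

# Epyc's advantage is the inverse of that ratio.
epyc_advantage = 1 / xeon_legacy_ratio - 1

print(f"Xeon vs Epyc without AVX-512: {xeon_legacy_ratio:.2f}x")
print(f"Epyc advantage: {epyc_advantage:.1%}")
```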

In the TRIAD test the perf/clock is close to even, assuming perfect scaling from 64 to 96T. Even if that is not the case, it is stated that the software is optimised for Intel CPUs, so Epyc is not fully exploited in this comparison.

Edit: They state that for Triad they use AMD's numbers from June 2017, but how did they get the numbers with SMT disabled?

Or are the numbers with SMT enabled for both, because without HT the Xeon would be no match for an Epyc with SMT enabled...



https://www.computerbase.de/2018-11/intel-xeon-cascade-lake-ap-48-kerne-mcp/
 

krumme

Diamond Member
Oct 9, 2009
5,755
143
136
#21
Think about it.. what exactly is the point of this versus a 4S XCC system? There's got to be more to it.

Should add that the rumor stated that the need to do something like this initially started was because Intel lost or scrapped a big HPC design win, because they were told that the Phi wasn't all that useful in getting any kind of throughput out of it.
This is just a product for 2s niches where area and avx512 is what matters.

Simple as that.
 

jpiniero

Diamond Member
Oct 1, 2010
6,430
272
126
#22
You are making no sense. They are using an MCM design because it had to be done quick
That Intel was doing a dual die Xeon to replace the Phi had been rumored for some time. Cascade Lake-AP isn't a good replacement for the Phi though if it doesn't have the extra units. The real product may have been cancelled for one reason or another though, and this is what they hacked together to release something.
 
Apr 27, 2000
11,927
1,082
126
#23
Most probably TDP - even with 48 cores it would be above 300W, with 56 cores that could be 350-400W.
The sad thing is that anyone using a 4P system with 28-core Xeons probably has a faster overall system than a Cascade Lake-AP 2P system.
 

Atari2600

Senior member
Nov 22, 2016
774
250
106
#24
I cannot even smell the reek of desperation because of the smell of burnt PSUs...
 

IntelUser2000

Elite Member
Oct 14, 2003
6,112
230
126
#25
The real product may have been cancelled for one reason or another though, and this is what they hacked together to release something.
The real product would have been Cannonlake or Icelake.

The notion that they could stick on 2x AVX-512 units as easily as putting Scotch tape over damage is ludicrous. We know even Cascade Lake is a product that resulted from the company not preparing for the 10nm fallout. Cascade Lake AP is a further, and much more rushed, result.

The sad thing is, anyone using a 4P system with 28-core Xeons, probably has a faster overall system than a Cascade-Lake AP 2P system.
It's likely for density. There's an article that was saying the volume has been shifting from 8 socket systems to 4 as core counts balloon. Cascade Lake AP isn't the best example, but future efforts from both companies will steer towards 1/2 socket systems with 2x cores.
 
