Discussion Intel’s Unified Core: There is hope

DavidC1

Golden Member
Dec 29, 2023
If the translation is actually decent, it says that Diamond Rapids using a Lion Cove variant will close the gap, but overall Xeon 7 is "not good". Looks like they'll bleed server share at least until Xeon 8.

It talks about the politics: how the P core team was able to get all the high-performance projects until Skymont exceeded expectations. I read somewhere that some Intellers hated Skymont because of how well it did. Talk about politics! It's supposed to be one company!

Regardless of claims of technical superiority, in the end it all comes down to people and relationships. Supposedly some of the smartest people in the world, acting like children.

I doubt Unified Core is as "unified" as they claim. It'll likely be more akin to how Core 2 "unified" Pentium M and Netburst. Other than the quad-pumped bus and good vector performance, there was not much of Netburst in Conroe/Merom.

Most of the P core ideals are likely going to die with "Unified Core". Not gonna miss any part of the P core team. Even in their best days (Pentium M, Conroe, Sandy Bridge), IMO they did worse than the E core team.
 

zir_blazer

Golden Member
Jun 6, 2013
I doubt Unified Core is as "unified" as they claim. It'll likely be more akin to how Core 2 "unified" Pentium M and Netburst. Other than the quad-pumped bus and good vector performance, there was not much of Netburst in Conroe/Merom.
You're forgetting the Prescott branch predictor, which hardware analysts considered extremely good for that era.

Most of the P core ideals are likely going to die with "Unified Core". Not gonna miss any part of the P core team. Even in their best days (Pentium M, Conroe, Sandy Bridge), IMO they did worse than the E core team.
What?
 

DavidC1

Golden Member
Dec 29, 2023
You don't get it? Let me describe it to you.

The Haifa IDC team is very conservative in its changes. Nehalem as a core doesn't bring much; it's essentially a tick. SMT, IMC, QPI and cache hierarchy changes are all architecture-agnostic. Then you have Sandy Bridge, but Haswell, Skylake, Ice Lake, Sunny Cove, Golden Cove and Lion Cove are all expansions of previous concepts. No new architectural ideas.

Pentium M - 32+32KB L1 cache, Micro-op Fusion, dedicated Stack Manager, real dynamic SpeedStep, L2 way splitting to save power.
Core 2 - Macro Op Fusion, Memory Disambiguation
Sandy Bridge - Uop cache, physical register file, branch predictor improvements without a transistor increase, 256-bit AVX support using the same number of ports, Ring Bus, real Turbo Mode.

Even those cores greatly increased core size iso-process: 50% larger for 15-20% improvements. All of them also came with improved branch prediction.

Let's compare E cores:

Bonnell - 2-issue in-order core, SMT
Silvermont - Out-of-order execution, non-blocking memory instructions, SMT removed, same core size as previous gen, 50% faster per clock
Goldmont - 3-way decode, fully pipelined out-of-order FP, 16KB L2 predecode cache, fetch and I-cache decoupled, 20B fetch (up from 16B), 30% faster per clock
Goldmont Plus - Widens backend to 4-wide from 3, 64KB L2 predecode, 30% faster per clock
Tremont - 32KB D-cache (up from 24KB), 6-wide using 2x 3-wide clustered decode, 2x16B fetch, 128KB L2 predecode, 30% faster per clock
Gracemont - L1-I doubled to 64KB, removes the L2 predecode in favor of a new On-Demand Instruction Length Decoder (OD-ILD), clustered decode gains a load balancer that inserts a fake branch when there aren't enough branches to split on (meaning no clustering), 2x16B from OD-ILD and 2x32B from I-cache, supports AVX2 using 128-bit vector units, 30% faster per clock
Skymont - 9-wide using 3x 3-wide clusters, Nanocode to improve ILP, ultra-wide retirement to save overall resources, doubles FP from 2 to 4 units, 30% faster per clock in Int and 60% faster in FP
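A toy model of the clustered-decode idea from the list above may make the "fake branch" trick easier to picture. This is a minimal sketch under loudly stated assumptions, not Intel's actual algorithm: each decode cluster takes a contiguous block of instructions that starts after a taken branch, blocks are dealt round-robin to the clusters, and the hypothetical `max_run` threshold stands in for Gracemont's load balancer inserting a fake branch into long branch-free runs.

```python
from typing import List

# Toy model of clustered decode (hypothetical, heavily simplified).
# stream[i] is True if instruction i is a taken branch.


def split_points(stream: List[bool], max_run: int) -> List[int]:
    """Indices where a new decode block starts.

    A taken branch always ends the current block. The assumed load balancer
    also ends a block (a "fake branch") once a branch-free run reaches max_run.
    """
    points = [0]
    run = 0
    for i, is_branch in enumerate(stream):
        run += 1
        if is_branch or run >= max_run:
            if i + 1 < len(stream):
                points.append(i + 1)
            run = 0
    return points


def assign_blocks(stream: List[bool], n_clusters: int, max_run: int) -> List[int]:
    """Deal blocks round-robin to clusters; return instructions per cluster."""
    bounds = split_points(stream, max_run) + [len(stream)]
    load = [0] * n_clusters
    for block, (start, end) in enumerate(zip(bounds, bounds[1:])):
        load[block % n_clusters] += end - start
    return load


straight_line = [False] * 12  # one long branch-free run
print(assign_blocks(straight_line, 3, max_run=10**9))  # [12, 0, 0]: one cluster does all the work
print(assign_blocks(straight_line, 3, max_run=4))      # [4, 4, 4]: fake splits spread it out
```

Real clusters also have to stitch their decoded uops back into program order, which this sketch ignores.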

The efficiency gains, performance gains, execution efficiency and rate of innovation don't even come close between the two teams. The E core team is far superior, even compared to the best days of the P cores. And those amazing 30% gains came with only a linear area/power increase.
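The perf-per-area gap in that comparison can be put in numbers using the figures from this post. A rough sketch: the P core numbers (about 50% more area for 15-20% more performance) are the post's, while reading "linear area increase" for the E core as roughly 30% more area for 30% more performance is an assumption.

```python
def perf_per_area(perf_gain: float, area_growth: float) -> float:
    """Perf/area of a new core relative to its predecessor (1.0 = unchanged)."""
    return (1 + perf_gain) / (1 + area_growth)


# P core figures from the post: ~50% more area (iso-process) for 15-20% more perf.
p_low = perf_per_area(0.15, 0.50)
p_high = perf_per_area(0.20, 0.50)
# E core: 30% more perf per clock; ASSUMPTION: "linear" means ~30% more area.
e_core = perf_per_area(0.30, 0.30)

print(f"P core: {p_low:.2f}-{p_high:.2f}x perf/area, E core: {e_core:.2f}x")
# P core: 0.77-0.80x perf/area, E core: 1.00x
```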

In Silvermont they removed SMT and, with OoOE, got a 50% perf improvement at the same area iso-process. In Tremont, they added a novel feature (clustered decode). In Gracemont they addressed its weaknesses, while cutting out one feature (the L2 predecode) and replacing it with a better one (OD-ILD). Such a breakneck pace of changes without screwing up or regressing is amazing.
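Taking the quoted per-generation figures at face value, the gains compound. The sketch below multiplies them out; it assumes each figure is a per-clock integer uplift over the immediately preceding generation and that they can simply be chained, which real benchmark suites won't reproduce exactly.

```python
# Per-clock (integer) uplift of each -mont generation over the previous one,
# as quoted in the post. Skymont's FP uplift (60%) is left out.
UPLIFTS = [
    ("Silvermont", 0.50),  # vs Bonnell
    ("Goldmont", 0.30),
    ("Goldmont Plus", 0.30),
    ("Tremont", 0.30),
    ("Gracemont", 0.30),
    ("Skymont", 0.30),
]


def cumulative(uplifts):
    """Compound the per-generation gains into a running multiple of Bonnell."""
    total = 1.0
    result = {}
    for name, gain in uplifts:
        total *= 1 + gain
        result[name] = total
    return result


for name, multiple in cumulative(UPLIFTS).items():
    print(f"{name:<14} {multiple:.2f}x Bonnell per clock")
```

On these assumptions Skymont lands at roughly 5.57x Bonnell per clock.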

Meanwhile, the P core team stayed at 16B fetch all the way from Pentium II in 1998 to Sunny Cove in 2020. Only Golden Cove doubled it to 32B. While it can be argued that it's enough in the average scenario, since the average x86 instruction is about 4 bytes and 16B satisfies 4-way decode, there will be bottlenecks. Goldmont, a tiny core, increased it to 20B. In Gracemont it's 2x16B from OD-ILD and 2x32B from I-cache.
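The "4 bytes x 4-way decode = 16B" argument is simple division, and the same division shows where it breaks. A sketch under assumptions: the 5.5-byte average below is a hypothetical stand-in for prefix-heavy code, and the model ignores fetch-block alignment and taken branches, which only make the real picture worse.

```python
def insns_per_cycle(fetch_bytes: int, avg_insn_len: float) -> float:
    """Upper bound on instructions one fetch block can supply per cycle."""
    return fetch_bytes / avg_insn_len


# The post's average case: ~4-byte instructions, 16B fetch -> exactly 4-wide decode.
print(insns_per_cycle(16, 4.0))  # 4.0
# Hypothetical prefix-heavy code averaging 5.5 bytes: 16B no longer feeds 4 decoders...
print(round(insns_per_cycle(16, 5.5), 2))  # 2.91
# ...while Golden Cove's 32B fetch still has headroom.
print(round(insns_per_cycle(32, 5.5), 2))  # 5.82
```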

Skymont is 3x32B, while Lion Cove's 32B fetch has to serve all 8 decoders. Skymont also slightly outperforms Lion Cove in the all-important branch prediction, and it's wider too.
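Dividing fetch bandwidth by decoder count makes the comparison concrete. This sketch uses only the figures in the post; it deliberately ignores the uop cache (a P core feature since Sandy Bridge), which serves much of the hot code on P cores, so it describes the legacy decode path only.

```python
# Fetch bandwidth per decoder slot, using the figures in the post.
CORES = {
    "Skymont": {"fetch_bytes": 3 * 32, "decoders": 9},  # 3x32B feeding 3x 3-wide clusters
    "Lion Cove": {"fetch_bytes": 32, "decoders": 8},    # one 32B fetch for 8 decoders
}


def bytes_per_decoder(core: dict) -> float:
    return core["fetch_bytes"] / core["decoders"]


for name, core in CORES.items():
    print(f"{name:<10} {bytes_per_decoder(core):.2f} B per decoder per cycle")
# Skymont    10.67 B per decoder per cycle
# Lion Cove  4.00 B per decoder per cycle
```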

Why was the P core team so conservative in some areas, while blowing budget and power on others?
 

511

Platinum Member
Jul 12, 2024
No one invited me to this thread nooooo
If the translation is actually decent, it says that Diamond Rapids using a Lion Cove variant will close the gap, but overall Xeon 7 is "not good". Looks like they'll bleed server share at least until Xeon 8.
This is nonsense. Xeon 7 is not using Lion Cove; it's using Panther Cove without HT, and DMR is a solid upgrade over GNR. Not to mention we have Rouge River Forest based on Arctic Wolf, with at least 288 cores and APX/AVX10.
 

DavidC1

Golden Member
Dec 29, 2023
No one invited me to this thread nooooo

This is nonsense. Xeon 7 is not using Lion Cove; it's using Panther Cove without HT, and DMR is a solid upgrade over GNR. Not to mention we have Rouge River Forest based on Arctic Wolf, with at least 288 cores and APX/AVX10.
Isn't that pretty much Lion Cove on a new node? And I think it was adroc who said there's anywhere from no gains to negative gains over LNC?

Lion Cove in that time period will compete against Zen 6. It'll struggle.
 

Geddagod

Golden Member
Dec 28, 2021
Isn't that pretty much Lion Cove on a new node? And I think it was adroc who said there's anywhere from no gains to negative gains over LNC?

Lion Cove in that time period will compete against Zen 6. It'll struggle.
It's supposed to be a tock core. The problem is that the leaked NVL ST perf slide is horrendous, so the IPC uplift might honestly be on par with LNC's or even less, not GLC or SNC level.
I doubt it's nothing to negative gains. Negative gains especially make no sense....
Realistically, I doubt Zen 6 vs Panther Cove has any sort of real IPC gap.
The question in server would be how bad the all core boost frequency difference will be, considering the node gap + architectural weakness from Intel.
And ig also how the vectorized perf would be.
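The IPC-versus-clocks point boils down to per-core throughput being roughly IPC times frequency. A minimal sketch with hypothetical numbers (none of these clocks or IPC ratios are leaks or measurements), showing why a small IPC gap matters less than an all-core clock gap:

```python
def relative_throughput(ipc_ratio: float, freq_a_ghz: float, freq_b_ghz: float) -> float:
    """Per-core throughput of core A relative to core B: IPC ratio x clock ratio."""
    return ipc_ratio * (freq_a_ghz / freq_b_ghz)


# Hypothetical numbers: equal IPC (the post's guess), but A holds 3.6 GHz all-core vs B's 4.0 GHz.
print(f"{relative_throughput(1.0, 3.6, 4.0):.2f}")  # 0.90
# A 5% IPC edge doesn't close a 10% clock deficit.
print(f"{relative_throughput(1.05, 3.6, 4.0):.3f}")  # 0.945
```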
 

511

Platinum Member
Jul 12, 2024
The way things are going, are we sure that product is ever going to come out?
It's currently slated; I have not heard any news of it getting canned, unlike the low core count version of Clearwater Forest, which was canned. Only the 288C and its lower-bin variants remain.

Isn't that pretty much Lion Cove on a new node? And I think it was adroc who said there's anywhere from no gains to negative gains over LNC?

Lion Cove in that time period will compete against Zen 6. It'll struggle.
That is Cougar Cove. Panther is a proper tock with a shared L2 design, as long as the Haifa core team doesn't screw up like they did with LNC.
 

511

Platinum Member
Jul 12, 2024
It's supposed to be a tock core. The problem is that the leaked NVL ST perf slide is horrendous, so the IPC uplift might honestly be on par with LNC's or even less, not GLC or SNC level.
I doubt it's nothing to negative gains. Negative gains especially make no sense....
Realistically, I doubt Zen 6 vs Panther Cove has any sort of real IPC gap.
The question in server would be how bad the all core boost frequency difference will be, considering the node gap + architectural weakness from Intel.
And ig also how the vectorized perf would be.
What node gap lol? 18A-P would be more or less on par with N2, considering 18A is between N3P and N2.
There may be an architectural difference.
 

511

Platinum Member
Jul 12, 2024
If 18A-P is anywhere near N2 in PPA, NVL-S would not be N2.
It is in terms of PPW, not density though, and N2 was their backup plan due to timing and the DMR ramp; they can't have both lol.

10nm really gave both design and manufacturing a wake-up call; they now have backup plans for both design and manufacturing.

Main volume driver is 18A/AP for NVL.
 

Geddagod

Golden Member
Dec 28, 2021
It is in terms of PPW
18A-P is supposed to look best in HPC and high-voltage instances such as desktop. If 18A-P is not competitive there vs N2, I shudder to imagine the difference in the lower end of the v/f curve that the cores in server skus will be handling.
not density
Which is the more important metric for server skus, where static power improvements from physically smaller cores can improve Vmin or near Vmin perf.
and N2 was their backup plan
Despite other tiles on NVL also being on 18A-P? And PTL launching before NVL on 18A? And so much of the rest of their products being internal that if 18A-P or 18A doesn't work, the entire company nears bankruptcy regardless?

I think N2 and 18A-P will have a sizable gap. I think it would be a good sign if 18A-P is near N3P in PPA in terms of just the CPU cores, and not counting other parts of a SoC.
Are they putting 100 E cores on TSMC A14?
Lol
 

511

Platinum Member
Jul 12, 2024
18A-P is supposed to look best in HPC and high-voltage instances such as desktop. If 18A-P is not competitive there vs N2, I shudder to imagine the difference in the lower end of the v/f curve that the cores in server skus will be handling.

Which is the more important metric for server skus, where static power improvements from physically smaller cores can improve Vmin or near Vmin perf.

Despite other tiles on NVL also being on 18A-P? And PTL launching before NVL on 18A? And so much of the rest of their products being internal that if 18A-P or 18A doesn't work, the entire company nears bankruptcy regardless?

I think N2 and 18A-P will have a sizable gap. I think it would be a good sign if 18A-P is near N3P in PPA in terms of just the CPU cores, and not counting other parts of a SoC.

Lol
Have you thought about how you're going to cool 6GHz+ on BSPDN? You can't do that easily. Why do you think their main question, when they presented BSPDN, was about cooling?
SoC is on 18A and the CPU tiles are split between N2 (8+16/bLLC) and 18A-P.
Raichu has clearly said 18A is between N3P and N2.
Server CPUs are at what, 3.7-4GHz sustained at most, vs 5.3-5.4GHz sustained on desktop right now. That will increase, and BSPDN will have issues without some sort of cooling for it.
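The clock-vs-heat argument follows from the textbook switching-power relation P = C·V²·f: higher clocks need higher voltage, so power grows much faster than frequency, and on the same core area the power density grows by the same factor. A rough sketch; the voltage/frequency points are hypothetical, and leakage power is ignored.

```python
def dynamic_power(c_eff: float, volts: float, freq_ghz: float) -> float:
    """Classic CMOS switching power, P = C_eff * V^2 * f (arbitrary units)."""
    return c_eff * volts ** 2 * freq_ghz


# Hypothetical V/f points for the same core; real curves are node- and product-specific.
server = dynamic_power(1.0, 0.85, 3.8)   # ~server all-core clocks
desktop = dynamic_power(1.0, 1.30, 5.4)  # ~desktop sustained clocks
# Same core area, so power density rises by the same factor.
print(f"desktop/server: {desktop / server:.1f}x")  # 3.3x
```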
 

Geddagod

Golden Member
Dec 28, 2021
1,378
1,465
106
Have you thought about how you're going to cool 6GHz+ on BSPDN? You can't do that easily. Why do you think their main question, when they presented BSPDN, was about cooling?
The entire point of BSPD is supposed to be for these type of products.
Look at Intel's points on the benefits of BSPD: it's about increasing Fmax. Whatever thermal hotspot concerns they had, they mitigated (on their Intel 4 test chip), much like they have been mitigating them for years; thermal hotspots are becoming more of an issue with denser processes, BSPD or not.
I don't think it's a coincidence that TSMC has stressed so much that their A16 node is for HPC as well...
SoC is on 18A and the CPU tiles are split between N2 (8+16/bLLC) and 18A-P.
So if 18A isn't working, NVL doesn't launch anyway.
The "risk mitigation" argument doesn't work when, if Intel 18A fails, they won't be able to launch anything.
Raichu has clearly said 18A is between N3P and N2.
Raichu is not infallible, and if this were the case, Intel would almost certainly have done even the high end of NVL on 18A-P.
Intel using internal nodes for server products, in both GNR and DMR, should be a clear sign that they really don't want to go external unless they are forced to.
The N2 vs 18A-P comparison is bad enough that they feel they have to.
Server CPUs are at what, 3.7-4GHz sustained at most, vs 5.3-5.4GHz sustained on desktop right now. That will increase, and BSPDN will have issues without some sort of cooling for it.
So essentially what you are arguing is that BSPD is only good for a narrow slice of the market: where voltages/Fmax aren't so high that BSPD causes thermal concerns, but where Fmax is still important enough that customers will go through the effort to utilize BSPD (aka not mobile)?
 

511

Platinum Member
Jul 12, 2024
The entire point of BSPD is supposed to be for these type of products.
Look at Intel's points on the benefits of BSPD: it's about increasing Fmax. Whatever thermal hotspot concerns they had, they mitigated (on their Intel 4 test chip), much like they have been mitigating them for years; thermal hotspots are becoming more of an issue with denser processes, BSPD or not.
I don't think it's a coincidence that TSMC has stressed so much that their A16 node is for HPC as well...
HPC yes, but have they said anything about peak frequency, like the 'X' flavor of a node?
So if 18A isn't working, NVL doesn't launch anyway.
The "risk mitigation" argument doesn't work when, if Intel 18A fails, they won't be able to launch anything.
18A is not in HVM right now; we will see with PTL how good 18A is.
Raichu is not infallible, and if this were the case, Intel would almost certainly have done even the high end of NVL on 18A-P.
Intel using internal nodes for server products, in both GNR and DMR, should be a clear sign that they really don't want to go external unless they are forced to.
The N2 vs 18A-P comparison is bad enough that they feel they have to.
Not really, it's not as bad as people have been making it out to be.
So essentially what you are arguing is that BSPD is only good for a narrow slice of the market: where voltages/Fmax aren't so high that BSPD causes thermal concerns, but where Fmax is still important enough that customers will go through the effort to utilize BSPD (aka not mobile)?
It's not narrow, it's just not suitable for ultra-high clocks like desktop, server and mobile.
 

DrMrLordX

Lifer
Apr 27, 2000
It's currently slated; I have not heard any news of it getting canned, unlike the low core count version of Clearwater Forest, which was canned. Only the 288C and its lower-bin variants remain.
There have been intimations that Clearwater Forest would be the last of the -mont cloud compute Xeons due to lack of customer demand.
 

511

Platinum Member
Jul 12, 2024
There have been intimations that Clearwater Forest would be the last of the -mont cloud compute Xeons due to lack of customer demand.
Yeah, they said that, but Intel swings moods like no other; SKU cancellations and resurrections are the norm, and RRF is a planned SKU. I'm surprised MLID hasn't leaked it lol (now that I think about it, MLID never, like never, said anything about Skymont or E cores in general; pretty sure if he doesn't bash Intel he doesn't get views).
 

Geddagod

Golden Member
Dec 28, 2021
HPC yes, but have they said anything about peak frequency, like the 'X' flavor of a node?
No one uses the X flavors of nodes despite the Fmax improvements they bring.
And Intel has talked about how Fmax improves with BSPD too, in their Intel 4 BSPD paper, the same paper where they discussed thermal mitigations for BSPD.
18A is not in HVM right now; we will see with PTL how good 18A is.
Sure, but the reason they are going N2 isn't that they don't know how good 18A will turn out.
Not really, it's not as bad as people have been making it out to be.
Not really, but why?
Why else would Intel use external if not for at least a full node difference between the two nodes?
It's not narrow, it's just not suitable for ultra-high clocks like desktop, server and mobile.
So what would it then be suitable for?
You just listed all 3 of Intel's major markets.
 

511

Platinum Member
Jul 12, 2024
No one uses the X flavors of nodes despite the Fmax improvements they bring.
And Intel has talked about how Fmax improves with BSPD too, in their Intel 4 BSPD paper, the same paper where they discussed thermal mitigations for BSPD.
There were other mitigations; BSPDN causes the thermal issue in the first place.
Sure, but the reason they are going N2 isn't that they don't know how good 18A will turn out.
Ohh, that's not true. Intel basically knows everything about 18A/18A-P/N2; if anything, they have the most knowledge in this matter.
Not really, but why?
Why else would Intel use external if not for at least a full node difference between the two nodes?
Ask Intel's product group whether they are basing it on time, volume or PPA. People forget the first two and blame everything on PPA.
So what would it then be suitable for?
You just listed all 3 of Intel's major markets.
Oh F***, I need to edit that; I only meant to say desktop.
 

Geddagod

Golden Member
Dec 28, 2021
There were other mitigations; BSPDN causes the thermal issue in the first place.
No, those mitigations were BSPD specific.
BSPD undoubtedly causes thermal hotspot issues, sure. But tbf you don't even have to take advantage of the higher cell utilization (higher density, more thermal hotspot issues) either.
Ohh, that's not true. Intel basically knows everything about 18A/18A-P/N2; if anything, they have the most knowledge in this matter.
Yes, so when they go N2, they know it's better.
Ask Intel's product group whether they are basing it on time, volume or PPA. People forget the first two and blame everything on PPA.
Neither time nor volume is a factor. NVL desktop is launching in 2H 2026; there's no way 18A or 18A-P isn't ready by then.
Volume isn't a factor either. Intel claims they can greatly expand capacity in their EUV fabs if customers ask for it; they've already built out all they want/need for internal, and could build out even more if they want to.
 

511

Platinum Member
Jul 12, 2024
No, those mitigations were BSPD specific.
BSPD undoubtedly causes thermal hotspot issues, sure. But tbf you don't even have to take advantage of the higher cell utilization (higher density, more thermal hotspot issues) either.
The core was a Crestmont at 3.something GHz, and you are comparing it to a P core at 5.7GHz; the higher they clock, the more heat issues there will be.
Yes, so when they go N2, they know it's better.

Neither time nor volume is a factor. NVL desktop is launching in 2H 2026; there's no way 18A or 18A-P isn't ready by then.
Volume isn't a factor either. Intel claims they can greatly expand capacity in their EUV fabs if customers ask for it; they've already built out all they want/need for internal, and could build out even more if they want to.
Who is going to pay for expanding capacity? Ramping a node costs a fortune. Why do you think 20A got canned? It was due to money as well. As for "if customers ask", the customer has to pay for it in advance; they are not going to pay out of their own pocket. In Intel's case they are paying from their own pocket, and they don't want to pay for others anymore. It's not sustainable.
 

Geddagod

Golden Member
Dec 28, 2021
The core was a Crestmont at 3.something GHz, and you are comparing it to a P core at 5.7GHz; the higher they clock, the more heat issues there will be.
But Crestmont is only 20% of the area of an RWC core.
Sure you might be feeding it less power, but the core is also way, way more dense.
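The density point can be shown with a power-per-area division. Only the ~20% area ratio comes from the post; every absolute wattage and area below is hypothetical, picked to show that a core at a fifth of the power can sit at the same W/mm² as a core five times its size.

```python
def power_density(power_w: float, area_mm2: float) -> float:
    return power_w / area_mm2


# Only the ~20% area ratio comes from the post; the absolute numbers are hypothetical.
rwc_area = 5.0               # mm^2, hypothetical Redwood Cove core
cmt_area = 0.2 * rwc_area    # Crestmont at ~20% of that area
rwc = power_density(10.0, rwc_area)  # hypothetical 10 W at high clocks
cmt = power_density(2.0, cmt_area)   # hypothetical 2 W at ~3 GHz
print(rwc, cmt)  # 2.0 2.0 (W/mm^2): a fifth of the power, same density
```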
But also consider this, a P-core on N2 will likely also already be smaller than that same core on 18A-P because of N2's better density, so in that sense the N2 core could also have higher thermal density, right?
Thermal density is a problem that has been a thing for years and years and years, and companies always find a way to mitigate it time and time again. And yet BSPD apparently is the one problem they can't solve though?
The worst part about all of this, though, is that 18A-P was always claimed to be a performance-focused node, both by analysts and by Intel themselves. So if 18A-P doesn't have any performance advantage, due to thermal issues from BSPD, what exactly does it have then?
Who is going to pay for expanding capacity ramping node costs a fortune why do you think 20A got canned it was due to money as well and customer asks is customer has to pay for it in advance they are not going to pay out of their pocket but in Intel's case they are paying from their pocket and they don'
So Intel is willing to pay for it for Wildcat Lake, a low end, high volume product, but not for NVL-S compute tiles.
Intel is willing to pay for it for the IO die of NVL but not the compute tiles.
Intel is willing to pay for it for the iGPU tiles, but not the compute tiles.
If Intel had to decide what tiles they want to fab on 18A because they couldn't pay for the capacity they wanted, so many other tiles could have been easily shifted around to make room for NVL-S compute tiles being internal, which would obviously save them more money than the alternative.
And yet they didn't.
 