Discussion Intel current and future Lakes & Rapids thread

Markfw · Jun 6, 2022

IntelUser2000 said:
Actually on some very demanding HPC applications Hyperthreading is turned off because it performs better. Linpack for example performs higher when HT is off since Linpack saturates the FPU and the second thread has absolutely no room for enhancing performance.

SMT benefits performance by filling pipeline bubbles AND covering LLC misses. HPC applications LOVE memory bandwidth, the dataset is large, and fully saturates the execution units. Plus, even if HT was faster, it would have to clock lower because full AVX execution is very demanding anyway.

All those benchmarks were made by Intel. Worthless to me, they are just trying to justify their server chips any way they can, at least for the investors if nothing else.

Pretty pathetic though.

Exist50 · Jun 6, 2022

mikk said:
In the reddit leak "the biggest architectural change in CPU architecture since the Core architecture" was claimed for Nova Lake. There are rumors about Panther Lake after Lunar, I hope it doesn't mean it has been delayed one generation. If there is Panther Lake the Panther Cove core naming would make sense though.

So, I've put some thought into this. I do think it's reasonably likely that Royal first shows up in Nova Lake around 2026. If so, I see three possibilities.

1. There is a significant issue with Royal execution, which pushes development out that far. I'm optimistic on this not being the case, but admittedly that's just blind faith in the team.
2. Integrating Royal would require significant rework of other parts of the SoC (fabric, etc) that would not fit in the schedule for an incremental redesign.
3. IDC (who're developing Lunar Lake and [last I heard] Panther Lake) are not politically willing to take the core due to the influence of the big core team.

In the last two cases, I would hope that Royal's first appearance would be more like a "Royal 2", in which case I'd hope for IPC more like 2.5-3.0x Golden Cove, but maybe I'm getting ahead of myself.

lobz · Jun 7, 2022

Exist50 said:
Definitely not. Cost and board space are far more dominant concerns there than cooling.

Oh my dear God..... obviously not as a conscious design goal... but sure, from now on I'll try to communicate with you either through citing officially filed patents or court rulings to be as literal as possible.

Cardyak · Jun 7, 2022

Exist50 said:
So, I've put some thought into this. I do think it's reasonably likely that Royal first shows up in Nova Lake around 2026. If so, I see three possibilities.

There is a significant issue with Royal execution, which pushes development out that far. I'm optimistic on this not being the case, but admittedly that's just blind faith in the team.

Integrating Royal would require significant rework of other parts of the SoC (fabric, etc) that would not fit in the schedule for an incremental redesign.

IDC (who're developing Lunar Lake and [last I heard] Panther Lake) are not politically willing to take the core due to the influence of the big core team.

In the last two cases, I would hope that Royal's first appearance would be more like a "Royal 2", in which case I'd hope for IPC more like 2.5-3.0x Golden Cove, but maybe I'm getting ahead of myself.

Do we actually know any of the details for Royal Core yet?

I've heard lots of rumours/speculation, and I've been digging on a few patents, but I'm unable to source any information that links some of Intel's R&D directly to any ongoing projects. As an example, here's some of the ideas that have been floated over the past few years to massively push CPU performance forward:

Fusion Reconfigurable Multicore Architecture - Basically stitching together small cores dynamically to create big cores as and when needed, as opposed to "statically" partitioned Big/Little cores.
Post-silicon CPU adaptation made practical using machine learning - Scaling the size of components in a CPU core up & down using ML in order to improve efficiency. Golden Cove has started to do this with the BTB, but I expect to see a lot more of this going forward. Note also that this paper was co-authored by Ronak Singhal, a Senior Fellow at Intel.
Auto-Predication of Critical Branches - Seems to be an idea where you analyse branches, determine the "trouble makers" that are nearly impossible to predict, and basically give up and decode both sides of the branch as insurance. Seems like a costly insurance policy but if used sparingly on only Hard-to-Predict branches it could cause a massive performance increase, as in most programs a small number of branches cause the majority of mispredictions.
Selective Pipeline Flush - Implementing some sort of process so that when a branch mis-predict occurs, you only discard the illegitimate instructions, and retain the correct code, as opposed to flushing everything and throwing the baby out with the bath water. Incredibly convoluted, but the performance gains from doing this are obvious. Worth noting that this research wasn't conducted by Intel, but I'll be damned if they haven't looked into something like this.
Value Prediction - One of the key academics who has pioneered research in this area is André Seznec. He recently accepted a position at Intel. Value Prediction is arguably the "Holy Grail" of ST Perf increases and is pretty much required at some point to increase performance. This is a question of not if, but when. There's an absolute ton of research around this and the opportunity is enormous.
Shared front end components - No source for this one, but I stumbled across this when glancing at a thread on RealWorldTech - Due to the decode units being idle a large percentage of the time because the Micro-Op cache primarily feeds the back end, not every core needs a whole decode unit to itself. In the interest of saving die space it may be possible to have 1 large decode block that is shared among a small number of cores and behaves in a "round-robin" time slice sort of process.
Dropping x86 baggage such as old redundant instructions and emulating them in software to reduce the burden on the CPU. No idea how or even if this would work, but this is something people have spoken about before on numerous occasions, and it stands to reason we will have to have a clean slate at some point or another. (I highly doubt Intel will ever move away from x86 altogether)
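
To make the value prediction idea in the list above a bit more concrete, here is a minimal sketch of a "last value" predictor in Python. It is a toy model only and says nothing about what Intel or Royal Core actually does. The class name, the table keyed by instruction address, and the confidence threshold are all assumptions made up for illustration. The predictor only speculates once the same instruction has produced the same result a few times in a row, which is how confidence counters are typically used to keep mispredictions rare.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: int = 0        # last result seen for this instruction
    confidence: int = 0   # saturating counter of consecutive repeats


class LastValuePredictor:
    """Toy last-value predictor (hypothetical structure, for illustration)."""

    def __init__(self, threshold: int = 2, max_conf: int = 3):
        self.table = {}             # maps instruction address -> Entry
        self.threshold = threshold  # minimum confidence before predicting
        self.max_conf = max_conf    # saturation limit of the counter

    def predict(self, pc: int):
        """Return a predicted value, or None when not confident enough."""
        entry = self.table.get(pc)
        if entry is not None and entry.confidence >= self.threshold:
            return entry.value
        return None

    def update(self, pc: int, actual: int) -> None:
        """Train on the real result once the instruction has executed."""
        entry = self.table.setdefault(pc, Entry(value=actual))
        if entry.value == actual:
            entry.confidence = min(entry.confidence + 1, self.max_conf)
        else:
            # Value changed: remember the new one and start over.
            entry.value = actual
            entry.confidence = 0


def run(trace, predictor):
    """Replay (pc, result) pairs; return (correct, wrong) prediction counts."""
    correct = wrong = 0
    for pc, actual in trace:
        guess = predictor.predict(pc)
        if guess is not None:
            if guess == actual:
                correct += 1
            else:
                wrong += 1
        predictor.update(pc, actual)
    return correct, wrong
```

In real hardware a wrong guess costs a pipeline recovery, so the interesting trade-off is between coverage (how often the predictor dares to guess) and accuracy. Raising `threshold` in this sketch trades the first for the second.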

I have no idea if/what will feature in Royal Core; if anyone does, I would be glad if they shared information. @Exist50 seems to have reputable sources inside Intel so it would be great to have his insight into this.

DrMrLordX · Jun 7, 2022

igor_kavinski said:
. It makes Intel look bad but AMD can't provide enough chips for all enterprise customers so Intel still sells a lot of server chips despite having fewer cores.

Intel can't sell Cascade Lake in volume anymore. They're still struggling to supply 10nm anything to the server market. People are going to be on waiting lists no matter what.

uzzi38 said:
Correction: you can bet that AMD actively decided against using N3 after the mess it is.

Jfc I've been ragging on about this for a while now for a reason.

N3E isn't a mess though. I still don't think AMD will be an early adopter. Intel may be the first!

ashFTW · Jun 7, 2022

DrMrLordX said:
Intel can't sell Cascade Lake in volume anymore. They're still struggling to supply 10nm anything to the server market. People are going to be on waiting lists no matter what.

While it’s a fact that Intel has lost a lot of server market share to AMD, the above statement is patently false.

Intel Q4 2021 Earnings Call: “Our Data Center Group had its best quarter ever as customers continued rebuilding their confidence in choosing Intel. Enabled by our IDM advantage, Ice Lake servers shipped more than 1 million units, equal to the amount we had shipped in the prior 3 quarters combined.”

Intel shipped 7.71 million total servers that quarter. In contrast, AMD shipped 1.13 million total servers (EPYC2 and EPYC3). One can conclude that Icelake volume was very likely greater than that of EPYC3.

Intel Q1 2022 Earnings Call: “Our third-generation Intel Scalable processor Ice Lake has now shipped almost 4 million units and Amazon Web Services recently announced general availability of its EC2 I4i instance designed for storage and I/O intensive workloads. This is the 48th AWS instance powered by Ice Lake.

I am also pleased to say that as committed, we began shipping initial SKUs of our fourth gen Intel Xeon scalable processor, Sapphire Rapids, to select customers in Q1.”

Intel shipped almost 2 million Icelake servers last quarter, double that of the previous quarter, and again likely outshipped AMD’s EPYC2 and EPYC3 combined. AMD does have a superior product right now, one that can command much higher ASPs. That’s an awesome achievement from AMD, but Intel will be stronger with Sapphire Rapids soon.

Edit: And for those who continue to falsely claim that Intel can’t yield big chips on 10nm, the two IceLake server chips are 470 and 628 mm² respectively.
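
The ratios implied by the Q4 figures above can be checked with quick arithmetic. This is a small Python sketch using only the numbers quoted in this post. The function and variable names are made up for illustration, and the Ice Lake figure is treated as a lower bound because the call only says "more than 1 million".

```python
def shipment_ratios(intel_total_m: float, amd_total_m: float, icelake_m: float):
    """Return (Intel/AMD unit ratio, Ice Lake share of Intel units)."""
    return intel_total_m / amd_total_m, icelake_m / intel_total_m


# Figures in millions of units, as quoted above for Q4 2021.
ratio, share = shipment_ratios(intel_total_m=7.71, amd_total_m=1.13, icelake_m=1.0)
print(f"Intel vs AMD units: {ratio:.1f}x")
print(f"Ice Lake share of Intel units: at least {share:.0%}")
```

That works out to roughly 6.8x the unit volume, with Ice Lake at about 13% of Intel's total for that quarter.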

igor_kavinski · Jun 7, 2022

^^^

That's why Intel's too big to fail. I mean, 7X higher server chip volume than AMD??? Insane!

jpiniero · Jun 7, 2022

ashFTW said:
Edit: And for those who continue to falsely claim that Intel can’t yield big chips on 10nm, the two IceLake server chips are 470 and 628 mm² respectively.

You can make up somewhat for lousy yield by burning way way more wafers. That's the nice thing about being an IDM. Intel probably sells close to zero fully enabled HCC Icelake server chips and probably very little XCC in general. How far they have to cut the HCC to have any supply is something you can't really tell.
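
For a rough feel of how die area and defect density interact, here is a sketch using the simple textbook Poisson yield model, Y = exp(-A * D0). Only the 470 and 628 mm² die areas come from this thread. The defect densities are purely hypothetical placeholders, not Intel data, and the function names are made up for illustration.

```python
import math


def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of fully defect-free dies under Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * d0_per_cm2)


def wafer_ratio(area_mm2: float, d0_low: float, d0_high: float) -> float:
    """How many times more wafers are needed at d0_high than at d0_low
    to get the same number of fully working dies of the given area."""
    return poisson_yield(area_mm2, d0_low) / poisson_yield(area_mm2, d0_high)


if __name__ == "__main__":
    # Hypothetical defect densities in defects per cm^2 (assumptions).
    for d0 in (0.1, 0.3, 0.5):
        for area in (470, 628):
            y = poisson_yield(area, d0)
            print(f"D0={d0:.1f}/cm^2, {area} mm^2 die: {y:.1%} fully good")
```

The model only counts fully defect-free dies, so it understates sellable output: dies with a defect in one core can still ship as cut-down SKUs, which is exactly the salvage question raised above.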

From Dell's website you can see that they are still selling plenty of Cascade Lake.

Henry swagger · Jun 7, 2022

ashFTW said:
While it’s a fact that Intel has lost a lot of server market share to AMD, the above statement is patently false.

Intel Q4 2021 Earnings Call: “Our Data Center Group had its best quarter ever as customers continued rebuilding their confidence in choosing Intel. Enabled by our IDM advantage, Ice Lake servers shipped more than 1 million units, equal to the amount we had shipped in the prior 3 quarters combined.”

Intel shipped 7.71 million total servers that quarter. In contrast, AMD shipped 1.13 million total servers (EPYC2 and EPYC3). One can conclude that Icelake volume was very likely greater than that of EPYC3.

Intel Q1 2022 Earnings Call: “Our third-generation Intel Scalable processor Ice Lake has now shipped almost 4 million units and Amazon Web Services recently announced general availability of its EC2 I4i instance designed for storage and I/O intensive workloads. This is the 48th AWS instance powered by Ice Lake.

I am also pleased to say that as committed, we began shipping initial SKUs of our fourth gen Intel Xeon scalable processor, Sapphire Rapids, to select customers in Q1.”

Intel shipped almost 2 million Icelake servers last quarter, double that of the previous quarter, and again likely outshipped AMD’s EPYC2 and EPYC3 combined. AMD does have a superior product right now, one that can command much higher ASPs. That’s an awesome achievement from AMD, but Intel will be stronger with Sapphire Rapids soon.

Edit: And for those who continue to falsely claim that Intel can’t yield big chips on 10nm, the two IceLake server chips are 470 and 628 mm² respectively.

Well said.. supply volume is what customers want the most

Henry swagger · Jun 7, 2022

igor_kavinski said:
^^^

That's why Intel's too big to fail. I mean, 7X higher server chip volume than AMD??? Insane!

Intel is flexing its supply pipeline.. AMD would need two fabs supplying them to compete

ashFTW · Jun 7, 2022

jpiniero said:
You can make up somewhat for lousy yield by burning way way more wafers. That's the nice thing about being an IDM. Intel probably sells close to zero fully enabled HCC Icelake server chips and probably very little XCC in general. How far they have to cut the HCC to have any supply is something you can't really tell.

Yields always drop with a newer process. Intel has shown such graphs forever, and these things are discussed during earnings calls. In the last call, Intel reported "Gross margin for the quarter was 53%, exceeding our guidance by 100 basis points on improved manufacturing yields and lower factory costs."
So things are getting better. Intel also reported the crossover from 14 to 10nm a while ago.

jpiniero said:
Intel probably sells close to zero fully enabled HCC Icelake server chips and probably very little XCC in general. How far they have to cut the HCC to have any supply is something you can't really tell.

I don't have any data on this. They designed these processors with a certain yield, recoverability, and product mix in mind. And they are sticking with big die for at least their next two server releases - SPR, and EMR. Future trend is definitely towards smaller chiplets and advanced packaging, which will ameliorate these concerns.

jpiniero said:
From Dell's website you can see that they are still selling plenty of Cascade Lake.

Icelake volume was still only 1/7th of the total server volume in Q4; it's no surprise Cascade Lake outsells IceLake. Another reason IceLake has had a lower level of OEM interest, other than competitive pressure from AMD, is that IceLake is the sole Xeon release on the Whitley platform. That didn't give OEMs enough time to recoup their platform investments. They like at least 2 years/releases per platform. For the same reasons, most if not all large OEMs didn't even engage with IceLake Xeon-W. Intel typically has done 2 releases per platform, but they were forced by AMD to quickly move to the next platform. Now that Granite is delayed, Intel has decided to do EMR as a follow-on to SPR on Eagle Stream.

nicalandia · Jun 7, 2022

IntelUser2000 said:
Actually on some very demanding HPC applications Hyperthreading is turned off because it performs better. Linpack for example performs higher when HT is off since Linpack saturates the FPU and the second thread has absolutely no room for enhancing performance.

SMT benefits performance by filling pipeline bubbles AND covering LLC misses. HPC applications LOVE memory bandwidth, the dataset is large, and fully saturates the execution units. Plus, even if HT was faster, it would have to clock lower because full AVX execution is very demanding anyway.

Great way to spin that around. Except OpenFOAM benefits from HT on. They load 250 iterations on a 120-thread CPU vs. a 224-thread one.

Hyperthreading for CFD – Ground Effect Vehicle Aerodynamics

glypo.com

jpiniero · Jun 7, 2022

ashFTW said:
Intel typically has done 2 releases per platform, but they were forced by AMD to quickly move to the next platform.

Intel had two releases, but the Cooper Lake one got cancelled. Probably because OEMs didn't want to bother because the benefit versus Cascade wasn't much and there was doubt that Intel would be able to do Icelake Server. IIRC that was when they did the Refresh/Price Cut of Cascade instead.

And Intel has done anything but move fast. All they've done is delay because of the yields. In 2019, Intel was talking about Sapphire coming out in 2021. Now it's the end of 2022. They had to fill the void left from 7 nm being behind schedule with Emerald.

Exist50 · Jun 7, 2022

Cardyak said:
I have no idea if/what will feature in Royal Core; if anyone does, I would be glad if they shared information. @Exist50 seems to have reputable sources inside Intel so it would be great to have his insight into this.

So, understand that most of my knowledge is, somewhat ironically, about people. Who's working on what, how they feel about what they're working on, how they feel about what others are working on, etc. I've had good luck extrapolating technical information from that, but for something like specific features in a future core? Forget it. Companies keep those details very well guarded, and there's no value in me just guessing.

For that matter, it's not like I have any contacts actually within the Royal core team itself. They've just apparently given a couple of high level presentations to various other business units. "Who we are. What we're trying to do.", stuff like that. And so word's gotten around for the basic stuff. The overwhelming consensus is unabashed optimism, and I extrapolate from there.

Cardyak said:
Value Prediction - One of the key academics who has pioneered research in this area is André Seznec. He recently accepted a position at Intel. Value Prediction is arguably the "Holy Grail" of ST Perf increases and is pretty much required at some point to increase performance. This is a question of not if, but when. There's an absolute ton of research around this and the opportunity is enormous.

Ok, I have no idea how on earth you found that name, but yes, I think Seznec is working on Royal. Just decided to browse his page, and there're some interesting things on there, including history on DEC's Alpha EV8, and inventing the TAGE branch predictor. Pretty impressive resume.

André Seznec – PACAP

And a fun little snippet.

As a programmer I am fundamentally a sequential guy.. Therefore my past researches in computer architecture have essentially focussed on providing performance for sequential programs, mainly on a single processor.

Will need to look into that value prediction stuff, but sounds cool.

ashFTW · Jun 7, 2022

jpiniero said:
And Intel has done anything but move fast. All they've done is delay because of the yields. In 2019, Intel was talking about Sapphire coming out in 2021. Now it's the end of 2022. They had to fill the void left from 7 nm being behind schedule with Emerald.

True that! But according to their Q1 call, they did start shipping some SPR SKUs last quarter to select customers. This list of customers must include Argonne National Labs, and I would think that they want only the highest core count SPR with HBM. I expect Aurora to feature in the next Top 500 list.

Has Intel said end of ’22 publicly for full release, or is it just rumors and/or speculation? If not, I would rather wait a month and see what they say on their Q2 earnings call.

ashFTW · Jun 7, 2022

Exist50 said:
André Seznec: As a programmer I am fundamentally a sequential guy.. Therefore my past researches in computer architecture have essentially focussed on providing performance for sequential programs, mainly on a single processor.

I’m the opposite. I love to parallelize my workloads and run on as many threads as I can. It’s a joy to break up problems into smaller pieces and compose back the sub results. But I do have a big bias for multiprocessors vs multicomputers/clusters. I need to correct that bias in the future.

IntelUser2000 · Jun 7, 2022

nicalandia said:
Great way to spin that around. Except OpenFOAM benefits from HT on. They load 250 iterations on a 120-thread CPU vs. a 224-thread one.

Hyperthreading for CFD – Ground Effect Vehicle Aerodynamics

glypo.com

You have AMD's own marketing materials that have SMT off: https://www.amd.com/system/files/documents/amd-epyc-7Fx2-openfoam.pdf

This shows that in all applications tested, there are gains with lower core counts but not with higher ones: https://www.nas.nasa.gov/assets/nas/pdf/papers/NAS_Technical_Report_NAS-2015-05.pdf

You are talking about 6% gains in a specific scenario, with most cases seeing no gain or even a loss, and with recommendations to disable SMT.

jpiniero · Jun 7, 2022

ashFTW said:
Has Intel said end of ’22 publicly for full release, or is it just rumors and/or speculation?

They can launch Sapphire whenever. I believe last they were still talking about a 1H launch but the clock is ticking on that.

But actually buying it is probably like Icelake Server, where you won't be able to get it in any quantities until Q4.

jpiniero · Jun 7, 2022

Nvidia taps Intel’s Sapphire Rapids CPU for DGX H100 system

A win against AMD as a much bigger war over AI compute plays out

www.theregister.com

FWIW nVidia decided on using SPR for the x86 Hopper DGX.

ashFTW · Jun 7, 2022

jpiniero said:
They can launch Sapphire whenever. I believe last they were still talking about a 1H launch but the clock is ticking on that.

But actually buying it is probably like Icelake Server, where you won't be able to get it in any quantities until Q4.

A lot of the volume will go to the cloud vendors. The same is true of AMD. They are already receiving deliveries.

lyonwonder · Jun 7, 2022

I wouldn't be surprised if Meteor Lake or any of its successors drops legacy hardware features like real mode and hardware x87 since new motherboards no longer have (or are supposed to no longer have since it's an Intel mandate) legacy BIOS support with CSM.

Booting in 32-bit mode may eventually go by the wayside too, with future CPUs only booting in long mode and only supporting 64-bit OSs, though I doubt Intel will remove hardware support for 32-bit from the next several CPU generations since 32-bit applications aren't going away any time soon.

ashFTW · Jun 7, 2022

jpiniero said:
Nvidia taps Intel’s Sapphire Rapids CPU for DGX H100 system

A win against AMD as a much bigger war over AI compute plays out

www.theregister.com

FWIW nVidia decided on using SPR for the x86 Hopper DGX.

A small win for Intel. Could be due to the HBM version, don’t know if Genoa has one. Intel must have bribed them or given the processors away for free! /s

Henry swagger · Jun 7, 2022

ashFTW said:
A small win for Intel. Could be due to the HBM version, don’t know if Genoa has one. Intel must have bribed them or given the processors away for free! /s

Stop lying.. are you an AMD employee or something

uzzi38 · Jun 7, 2022

ashFTW said:
A small win for Intel. Could be due to the HBM version, don’t know if Genoa has one. Intel must have bribed them or given the processors away for free! /s

It has nothing to do with HBM. Intel just came in cheaper. Didn't matter last time back when Nvidia picked Rome because Intel didn't have a PCIe Gen 4 platform back then. This time around they have a competent platform and were actually an option.

JasonLD · Jun 7, 2022

uzzi38 said:
It has nothing to do with HBM. Intel just came in cheaper. Didn't matter last time back when Nvidia picked Rome because Intel didn't have a PCIe Gen 4 platform back then. This time around they have a competent platform and were actually an option.

One thing Sapphire Rapids does well over the competition is AI-related performance, and, well, that is a system focused on AI.
