Discussion Intel current and future Lakes & Rapids thread

IntelUser2000 · Jun 4, 2022

Hougy said:
12-21% IPC increase for MTL over RPL? Locuza and Semianalysis already did a great die shot analysis of MTL. It seems there were few changes to the core, so doesn't that make this IPC prediction completely impossible?

I've seen MLID be proved wrong many times before, but never as soon as he made the prediction 😆

It's not small. Here's what Raichu said:

About Meteor lake. MTL focus on how to improve the efficiency of the instruction execution, it will not widen the microarchitecture crazy like Alder lake.

More improvements maybe will focus on branch prediction, Micro-operation fusion, instruction dispatch, register remake, and EU execution efficiency.

Some are expecting Haswell-like gains, which is low 10%. 12% seems to be a good average. Haswell was also efficiency focused, so some have said they reduced potential performance to lower power.

Maybe they are doing small/Medium gains rather than zero/Large gains they are doing now. Sunny = Large, Willow = tiny/zero, Golden = Large

to,

Raptor = small, Redwood = Medium, etc. Which would be better for planning I guess?

biostud · Jun 4, 2022

Obviously we don't know if Zen 6 is going to be on AM5, but the continued socket changing definitely makes me want to stay with AMD.

Exist50 · Jun 4, 2022

IntelUser2000 said:
Yes desktop haha.

So, what I've heard of MTL desktop is...weird, and I'm not sure I've seen any leak quite in alignment. That said, my info might very well be out of date, but I'm ultimately quite curious what Intel decides to do. Timing of Arrow Lake is probably critical.

IntelUser2000 said:
I don't think the CPU comparisons will be bad, at least performance. The GPU is a problem.

I'm not nearly so optimistic for Intel. Obviously graphics will be a huge win for AMD, but they also have a full node jump + new architectures going for them on the CPU side. The efficiency gap is going to be pretty darn stark, and Intel's battery life numbers are bad enough as it is. I can't believe they actually regressed from Tiger Lake.

IntelUser2000 said:
But it doesn't really matter if Raptorlake mobile is having a comprehensive lineup. Then Meteorlake mobile is going to come 12 month later, period. [snip]

So, I get what you're saying here, but it would definitely be better for Intel to have Meteor Lake come out ASAP, and deal with any consequences to Raptor Lake or whatever. I consider myself a bit of an MTL pessimist, but it's still going to be a significant improvement over RPL, and I imagine the fabs are begging for something on Intel 4/3 to ramp with.

IntelUser2000 said:
It's always a tradeoff. So apples-to-apples the square root law says 30% performance needs a core that's 70% larger. Also the single core focused design will have higher frequencies, so that'll result in the core larger as well.

I dislike the naive application of the square root law here, as clearly there are many different designs in the market, some of which are strictly superior to others, and I think that's the root of my argument - the differences in the underlying engineering investment.

Like, if the single thread difference between the two is small enough, then it seems to me like a Zen 4/4c arrangement is better. Basically have a base design and a derivative for a different market, instead of "reinventing the wheel" for two separate architectures. Granted, massive political implications of that at a company like Intel, but still.
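For reference, the "square root law" quoted above (often called Pollack's rule) can be checked with a few lines. This is only a sketch of the rule of thumb itself, assuming performance scales with the square root of core area; the function names are made up for illustration, and real designs deviate from it, which is the objection raised in this post.

```python
def area_ratio_for_speedup(perf_ratio: float) -> float:
    """Area multiplier implied by the rule of thumb perf ~ sqrt(area)."""
    return perf_ratio ** 2


def speedup_for_area_ratio(area_ratio: float) -> float:
    """Inverse direction: performance multiplier for a given area multiplier."""
    return area_ratio ** 0.5


if __name__ == "__main__":
    # 30% more performance -> 1.3^2 = 1.69x area, i.e. roughly 70% larger.
    print(f"1.3x perf needs {area_ratio_for_speedup(1.3):.2f}x area")
    # A core twice the size would only be ~1.41x faster under this rule.
    print(f"2.0x area gives {speedup_for_area_ratio(2.0):.2f}x perf")
```

The 30% / 70% figures in the quote fall straight out of squaring 1.3; whether a given team actually lands on that curve is a separate question.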

Exist50 · Jun 4, 2022

IntelUser2000 said:
It's not small. Here's what Raichu said:

Some are expecting Haswell-like gains, which is low 10%. 12% seems to be a good average. Haswell was also efficiency focused, so some have said they reduced potential performance to lower power.

Maybe they are doing small/Medium gains rather than zero/Large gains they are doing now. Sunny = Large, Willow = tiny/zero, Golden = Large

to,

Raptor = small, Redwood = Medium, etc. Which would be better for planning I guess?

So, I admittedly took something of a leap here. The actual info I'm working with is the claim that IDC's best engineers (and the majority in general) rolled off from Golden Cove to Lion Cove, meaning that Redwood Cove and especially Raptor Cove get the scraps. Therefore, my logic goes, surely the IPC gains from either won't be competitive with Golden Cove. Plus, I've heard that a lot of effort on Lion Cove is focused on "modernizing" it. I'm still uncertain what numbers that will translate to for IPC, freq, etc., but the implication was that it's a painful, if necessary adjustment, so I interpreted it as a negative modifier. In any case, I do think that MLID's >30% number is too high. When's the last time any company achieved that gen/gen? Hah, then again, I guess I have hyped up LNC a bit myself.

Maybe I'll regret making such strong claims, but I'm sticking to my guns for now. I think they're justified based on the available information.

Plus, MLID has a pretty terrible track record with IPC claims, and usually overshoots. We're seeing that play out right now in the Zen 4 thread.

DrMrLordX · Jun 4, 2022

moinmoin said:
So aside of a 10 cores glitch with Comet Lake Intel is stagnating with 8 P-cores for years to come and hoping people won't notice because eventually 32 E-cores?

Maybe Intel will petition Congress to rescind Amdahl's Law.

LightningZ71 said:
Have we legitimately seen a DESKTOP consumer grade application that would really benefit from having more than 8 P cores as opposed to having 4 times as many additional E cores?

Handbrake comes to mind. Its scaling diminishes beyond a certain number of cores. I would certainly rather have 12-16P cores to commit to it than a bunch of e-cores. Also I would much rather have 16P cores for a unified gaming + streaming box.
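Since Amdahl's Law came up, here is a minimal sketch of what it predicts for diminishing scaling. The 90% parallel fraction is a made-up number for illustration, not a measurement of Handbrake, and the model treats all cores as identical, so it says nothing about P versus E core speed.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's Law: speedup over one core when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.90  # hypothetical parallel fraction, chosen only for illustration
    for n in (8, 16, 24, 40):
        print(f"{n:>2} identical cores: {amdahl_speedup(p, n):.2f}x")
```

With any serial portion at all, each extra batch of cores buys less than the last, which is why per-core speed still matters for workloads like this.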

Timmah! · Jun 4, 2022

moinmoin said:
So aside of a 10 cores glitch with Comet Lake Intel is stagnating with 8 P-cores for years to come and hoping people won't notice because eventually 32 E-cores?

yeah. Hard pass as far as i am concerned. Gimme 16 + 16 instead of 8 + 32 and then i might be interested.

It's interesting we are getting tidbits of info about 2024 products like Arrow Lake, meanwhile all there is to say about Fishhawk Falls, supposedly to be released this year, is that the sources are starting to “get briefed”. Tells you everything you need to know about it, I guess.

IntelUser2000 · Jun 4, 2022

Exist50 said:
Plus, MLID has a pretty terrible track record with IPC claims, and usually overshoots. We're seeing that play out right now in the Zen 4 thread.

Don't think it was just MLID that expected that gain. Pretty much everyone did. Maybe there was a confusion between total performance versus architectural.

Yea I don't know about Lion Cove. Maybe the efficiency gain will be great, but the absolute performance gain for the uarch is Golden Cove level. Like when we look at Haswell, it was great for laptops, but performance-wise? Not much.

A 20% gain takes an amazing amount of enhancements and additions to get there. 30% takes a serious, almost forced flaw in the predecessor to happen.

mikk · Jun 4, 2022

Hougy said:
12-21% IPC increase for MTL over RPL? Locuza and Semianalysis already did a great die shot analysis of MTL. It seems there were few changes to the core, so doesn't that make this IPC prediction completely impossible?

I've seen MLID be proved wrong many times before, but never as soon as he made the prediction 😆

There is a good analysis from Cardyak on this topic. Redwood is more focused on integer upgrades.

https://twitter.com/x/status/1531196797783314434

Some people say the LGA2551 picture from MLID belongs to BGA2551 and is likely the successor of BGA1964 (ADL-HX, basically ADL-S on BGA). If you say 8+16 won't exist, and assuming that's true, it doesn't seem like MTL-S is able to replace the upcoming 13900k.

jpiniero · Jun 4, 2022

Going back to this...

ashFTW said:
SPR on the other hand, since it only has a 1/4th subset of memory controllers, PCIe, CXL, UPI on each of the tiles, they cannot make a chip with full IO unless all 4 chiplets are used. That makes sense for large core count parts. For smaller core counts it’s much cheaper to make monolithic. It would be a disaster if they had to use 1600 mm2 silicon (not counting the EMIB tiles) for every SKU, some of them may even go as low as 8/12/16 cores. Keeping the 4 chiplet design for lower core count also doesn’t make sense, because if you decide to make smaller “1/4th split chiplets”, you might as well make a monolithic which will be far cheaper to make. Intel in the past has made several different size Xeon chips to address the core count range: XCC, MCC, LCC. So they are likely to make several size chips; they do not have the financial constraints to only make one tile (and its mirror) for SPR.

How big do you think a 24c/8ch/4xIO monolithic die would be? The problem is that 10 nm yield is still very mediocre to the point where you'd have to slash the core count way down to the point where it wouldn't really be viable as a Metal Xeon.

Now MAYBE you could make an HEDT monolithic die that could work if you are frugal enough. The problem with that is I don't think Xeon W demand is all that much. Kind of wondering if you would be able to satisfy the Xeon W demand with just partially busted EMIB chips.

Xeon E does have some demand. But you'd have to come up with something that would make sense. I'm sure there would be some fans of a ~~24~~ 15 GC core on LGA 1700.

nicalandia · Jun 4, 2022

moinmoin said:
So aside of a 10 cores glitch with Comet Lake Intel is stagnating with 8 P-cores for years to come and hoping people won't notice because eventually 32 E-cores?

Not sure about you...

But 8 P-cores with High IPC for gaming and the equivalent of a Xeon W-3175X for highly threaded apps on a Single CPU sounds like a Fine CPU to me.

That is a very powerful CPU, we are talking about a CPU that will likely match or beat a Zen3 ThreadRipper 5975WX in MT workloads.

Timmah! · Jun 4, 2022

nicalandia said:
Not sure about you...

But 8 P-cores with High IPC for gaming and the equivalent of a Xeon W-3175X for highly threaded apps on a Single CPU sounds like a Fine CPU to me.

That is a very powerful CPU, we are talking about a CPU that will likely match or beat a Zen3 ThreadRipper 5975WX in MT workloads.

Then again, you could have say 24 of those high IPC cores, with only say 8 of them boosting to 5+ GHz while gaming, and then most/all of them running on lower clocks for highly-threaded apps and very likely be significantly faster than Xeon 3175x, and not just its equivalent. As a bonus, you could do away with all the big.LITTLE core scheduling shenanigans and the lack of AVX-512 for the same reason.

Granted, that would require Intel to finally get their process to TSMC levels.

Doug S · Jun 4, 2022

mikk said:
There is a good analysis from Cardyak on this topic. Redwood is more focused on integer upgrades.

Can I just say that it is kind of amazing that someone was able to take pictures of wafers on a show floor of a high enough quality that we can not only determine die sizes, but do so even for the various functions on the die!

Glo. · Jun 4, 2022

Exist50 said:
Thanks. Was on mobile at the time. Didn't want to scrub through a video. Or give it a view, tbh. And quality is plenty fine.

So I'll just give my take item by item.

MTL:

Socket - No clue. I thought MTL/ARL would use LGA ~~1800~~ 1700 (Edit: Typo?), but I also knew they would not be platform compatible with ADL/RPL. Could see this being true, but little difference either way. Though a new socket typically hurts motherboard prices.

IPC - Bull. Maybe bigger than Raptor Lake, but 12-21%? Nah, he's just making stuff up.

Clock speed regressions - Probably bull. Maybe they'll lose 100-200MHz or so, but large enough to compare to ICL? Nah. I fully expect the node shrink to make up any regression from arch/design, and personally guess that clocks will ultimately be higher between comparable SKUs.

VPU - sure

2+8, 6+8 - sure, 8+16 - no

RPL/MTL volume split - Not sure, but certainly suspicious.

Timing - Sounds reasonable enough.

ARL:

8+32 on 20A - Think so?

LNC IPC - I'm thinking comparable-ish to Golden Cove's gains. Expecting >>GLC gains (like his previous "at least 30%" claim) is just nonsense. But make no mistake, LNC is probably the most important evolution of Core since its inception. Much better in a whole host of ways.

Lion Cove is not Royal. Royal is Royal. How hard is this to understand? Clearly no clue what he's talking about.

Skymont - We're in for a treat with this one.

Timing - Sounds reasonable enough.

In short, I think all the "new", important details range from suspect to nonsensical, and the rest just reiterating well established rumors.

And he also "leaked" a socket LGA2551 photo which, judging by the photo itself, is clearly a BGA type of socket.

ashFTW · Jun 4, 2022

jpiniero said:
How big do you think a 24c/8ch/4xIO monolithic die would be? The problem is that 10 nm yield is still very mediocre to the point where you'd have to slash the core count way down to the point where it wouldn't really be viable as a Metal Xeon.

28 core Icelake Xeon is around 470 mm2 **, and is shipping in high volume along with its bigger sibling, as has been said by Intel management on their earnings calls. I expect the SPR 24 core chip to be around the same size, maybe a tad smaller like 450 mm2. SPR is on Intel 7, an improved 10nm compared to ICL. Alder and Raptor Lake are also on Intel 7. Raptor will probably be half the size of the above monolithic SPR, which is again going to be produced in large volume.

So my conviction stands as before -- 1 (maybe even 2) SPR monolithic die at the lower core count end, and these being repurposed for Xeon-W3400 series. Lower core count servers are probably much higher volume, so addressing them separately (and not throwing 1600 mm2 plus at them, plus the complexity and cost of advanced packaging) makes total sense!

** The 40 core die is 628 mm2, and the 28 core die visually looks to be 3/4th the size (Source wikichip).
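The footnote's estimate is just a scaling of the larger die, and can be written out as below. The 628 mm2 figure and the 3/4 visual ratio are the ones given in the footnote; the variable and function names are only for illustration, and the result is an eyeballed estimate, not a measured die size.

```python
ICX_40C_DIE_MM2 = 628.0   # 40 core Icelake die size quoted in the footnote
VISUAL_SIZE_RATIO = 0.75  # "visually looks to be 3/4th the size"


def estimate_smaller_die_mm2(big_die_mm2: float, ratio: float) -> float:
    """Scale a known die area by an eyeballed size ratio."""
    return big_die_mm2 * ratio


if __name__ == "__main__":
    est = estimate_smaller_die_mm2(ICX_40C_DIE_MM2, VISUAL_SIZE_RATIO)
    print(f"Estimated 28 core die: ~{est:.0f} mm2")
```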

nicalandia · Jun 4, 2022

Doug S said:
Can I just say that it is kind of amazing that someone was able to take pictures of wafers on a show floor of a high enough quality that we can not only determine die sizes, but do so even for the various functions on the die!

And thanks to that we are able to further guesstimate future Products based on this info.

For example here. 8 + 16 Raptor Lake CPU 13900K Mock Up compared to a 8 + 16 Meteor Lake Compute Cluster with size difference and die size estimates based on info from Locuza and semianalysis

ashFTW · Jun 4, 2022

Doug S said:
Can I just say that it is kind of amazing that someone was able to take pictures of wafers on a show floor of a high enough quality that we can not only determine die sizes, but do so even for the various functions on the die!

Yes, indeed!

jpiniero · Jun 4, 2022

ashFTW said:
I expect SPR 24 core chip to be around the same size, maybe a tad smaller like 450 mm2. SPR is on Intel 7, an improved 10nm compared to ICL.

I crudely did it out, based upon the available SPR tile shot, and got 719 mm2, lol. 428 mm2 for 24 cores and the memory tiles, 228 mm2 for 4xIO and 63 for 4xmemory controllers. That can't be right but surely that's closer than your estimate. Golden Cove Server is a lot bigger than Client because of AMX and the extra AVX-512 unit.
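The breakdown above does add up to the stated total. A quick tally, using only the three component areas given in this post (the dictionary labels are paraphrased from it) and comparing against the 450 mm2 guess being replied to:

```python
# Component areas in mm2, as given in the post above.
components_mm2 = {
    "24 cores + memory tiles": 428,
    "4x IO": 228,
    "4x memory controllers": 63,
}

total_mm2 = sum(components_mm2.values())
other_estimate_mm2 = 450  # the estimate this post is replying to

if __name__ == "__main__":
    print(f"Total: {total_mm2} mm2")
    print(f"Ratio to the 450 mm2 estimate: {total_mm2 / other_estimate_mm2:.2f}x")
```

So the two estimates differ by roughly 1.6x, which is the gap the rest of this exchange is arguing about.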

IntelUser2000 · Jun 4, 2022

Timmah! said:
Then again, you could have say 24 of those high IPC cores, with only say 8 of them boosting to 5+ GHz while gaming, and then most/all of them running on lower clocks for highly-threaded apps and very likely be significantly faster than Xeon 3175x, and not just its equivalent.

24 cores would take an insane amount of space, even on the latest process from TSMC.

24P cores is actually roughly equal to 8 P and almost 64 E cores not 32.

mikk said:
There is a good analysis from Cardyak on this topic. Redwood is more focused on integer upgrades.

So he's saying it can get 20% gain for Integer but much less for FP? When they say "Integer" it means basically overall perf/clock improvement. It's like increasing the speed limit of the highway and widening it. It benefits every block that uses it, not just "integer". Branch prediction, ROBs, OoOE blocks, L/S units, micro op cache, all benefit all code. It's not like FPU is on a separate block connected by a ring bus.

It's way easier to get FP gains than Integer. With the latter, you cannot just double blocks and double performance that way.

Based on what @Exist50 is saying, I'd put that as wild ass speculation.

nicalandia · Jun 4, 2022

ashFTW said:
So my conviction stands as before -- 1 (maybe even 2) SPR monolithic die at the lower core count end

Just be prepared to be disappointed when Intel officially announces Sapphire Rapids for Workstations.

IntelUser2000 said:
24 cores would take an insane amount of space, even on the latest process from TSMC.

24P cores is actually roughly equal to 8 P and almost 64 E cores not 32.

24 P-Cores in a world where thermals don't matter would get 56,000 points in CB R23; a realistic 8 P + 32 E configuration gets you 50,000+ points in CB R23 (as estimated by Puget Systems).

Exist50 · Jun 4, 2022

IntelUser2000 said:
Yea I don't know about Lion Cove. Maybe the efficiency gain will be great, but the absolute performance gain for the uarch is Golden Cove level. Like when we look at Haswell, it was great for laptops, but performance-wise? Not much.

Just to be clear, do you mean Lion Cove or Redwood Cove here? I'm inclined to believe that Raichu is correct, and that Redwood Cove is more efficiency focused. I think Lion Cove is a much better opportunity from a performance perspective, but I think they're going to be careful to balance power and area as well.

IntelUser2000 said:
20% gain takes amazing amount of enhancement and additions to get there. 30% takes a serious, almost forced flaw in the predecessor to happen.

Well I expect Royal to be far, far beyond a mere +30%, but that's worth a thread in its own right. Maybe will make one if anything of substance actually leaks.

Exist50 · Jun 4, 2022

mikk said:
There is a good analysis from Cardyak on this topic. Redwood is more focused on integer upgrades.

https://twitter.com/x/status/1531196797783314434

Some people say the LGA2551 picture from MLID belongs to BGA2551 and is likely the successor of BGA1964 (ADL-HX, basically ADL-S on BGA). If you say 8+16 won't exist, and assuming that's true, it doesn't seem like MTL-S is able to replace the upcoming 13900k.

I am very unimpressed with this take, and think they're basically just reading the tea leaves and pulling actual numbers out of thin air. They have no idea what individual changes actually consist of, and while I can admire effort being spent into analyzing the information we have, I'm considerably colder towards any attempts to make confident assertions about the results.

Also, those two in the OP have been rather confidently incorrect on some past assertions. Remember Locuza's original floorplan for MTL? Yeah, that was way off...

IntelUser2000 · Jun 4, 2022

Exist50 said:
Just to be clear, do you mean Lion Cove or Redwood Cove here? I'm inclined to believe that Raichu is correct, and that Redwood Cove is more efficiency focused. I think Lion Cove is a much better opportunity from a performance perspective, but I think they're going to be careful to balance power and area as well.

I am addressing your whole post, so yes I'm talking about Lion Cove.

Yea I'm not sure if I buy the fantastic, 30%+ gains even for the Royal Cove project. We'll see when it happens. Yes I can believe an amazing amount of effort and reorganization of teams would happen, but in terms of absolute numbers I am skeptical.

Like when Nvidia was claiming some epic level adjustments with Pascal but it was nowhere near the hype in terms of numbers. Or how with FinFET they claimed some revolution but performance/watt gains were just in line with normal trends. Things like FinFET are an enabler to continue, that's all. RibbonFET and PowerVia seem awesome, but the performance gains are 15%, less than previous ones which were plain old boring FinFETs.

Of course if you are already at the top level in things, you want to toot your own horn about any progress you make, but I am not going to believe that you are increasing the trendline. It's like sprinters making 1ms progress each year. Eventually it's all marketing.

ashFTW · Jun 4, 2022

nicalandia said:
Just be prepared to be disappointed when Intel officially announces Sapphire Rapids for Workstations.

Same goes for you, my friend! 🙂

IntelUser2000 · Jun 4, 2022

nicalandia said:
24 P-Cores in a world where thermals don't matter would get 56,000 points in CB R23; a realistic 8 P + 32 E configuration gets you 50,000+ points in CB R23 (as estimated by Puget Systems).

Yea and 8+32 would take way less space and power. In area terms, 8+32 = 16P.

8+64 equals 24P. Now tell me how that performs!
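The area bookkeeping in this exchange can be sketched as below. The 4 E cores per P core ratio is not stated outright; it is the ratio implied by "24P is roughly 8 P and almost 64 E", and it is treated here as an approximation for area only, not performance. The function name is made up for illustration.

```python
E_CORES_PER_P_AREA = 4  # implied by 24P ~ 8P + 64E; approximate, area only


def p_core_area_equivalent(p_cores: int, e_cores: int,
                           e_per_p: float = E_CORES_PER_P_AREA) -> float:
    """Express a hybrid P+E configuration as an equivalent count of P cores by area."""
    return p_cores + e_cores / e_per_p


if __name__ == "__main__":
    for p, e in ((8, 16), (8, 32), (8, 64)):
        print(f"{p}+{e} ~ {p_core_area_equivalent(p, e):.0f}P of area")
```

Under that ratio, 8+32 occupies about the area of 16 P cores and 8+64 about the area of 24, which is the comparison being made above.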

And there's an additional benefit where the P core can be P+, and be bigger and more performant than otherwise for even better ST performance and responsiveness. That's the whole point of hybrid. You get to specialize the cores way more than otherwise.

The real promise is this: Rather than doing 8+64 in place of 24P, you do 8 supercharged P cores + 32 E cores. Of course the P cores would be a lot larger. Let's say 30% faster per clock and twice the size.

Remember, this is in addition to whatever they would do normally. So I believe for risk mitigation it'll be spread out over a few generations. So rather than a new gen P being 18% faster, you have it being 24% faster for the next 4-5 generations. And at the end, you have a very large P core and a sea of E cores. Supercharging it for low and high thread.
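One way to read the 18% versus 24% figures is that the extra 30% per-clock gain is spread multiplicatively across several generations on top of a normal 18% step. That reading is an assumption, not something the post spells out, and the function name is hypothetical; the sketch just shows the numbers are consistent with it.

```python
def per_generation_gain(baseline_gain: float, extra_total_gain: float,
                        generations: int) -> float:
    """Per-generation gain when an extra total gain is spread evenly
    (multiplicatively) over several generations on top of a baseline gain."""
    extra_per_gen = (1.0 + extra_total_gain) ** (1.0 / generations)
    return (1.0 + baseline_gain) * extra_per_gen - 1.0


if __name__ == "__main__":
    for n in (4, 5):
        gain = per_generation_gain(0.18, 0.30, n)
        print(f"Spread over {n} generations: {gain * 100:.1f}% per generation")
```

Spreading it over five generations gives about 24% per step, and over four about 26%, so the "24% for 4-5 generations" figure is in the right range under this assumption.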

ashFTW · Jun 4, 2022

I'm repeating myself a lot, but perhaps this helps...

We can break this discussion down to two questions. Let's answer the first one, and then answering the second becomes much easier.

Question 1. Intel has only publicly discussed the 400 mm2 XCC SPR tile with 15 cores each. Four of these are combined together with EMIB to make the SPR chip with (up to) 60 cores. It's not currently publicly known how Intel plans to make lower core count (8/12/16 etc) SPR chips. For example, how will Intel make something like the 16 core IceLake Xeon Silver 4314 with a full complement of PCIe lanes and memory channels (but reduced cache), which has an MSRP of only $750?

Option 1: Since only 1/4th the I/O and memory is on each SPR tile, four XCC SPR tiles could be used. Note that these tiles can have defective/fused-off cores but no unrecoverable defects in the I/O, memory, and EMIB PHY areas. There are also 10 additional EMIB tiles (totaling 215 mm2 **), as well as the added costs of advanced packaging. Given that lower end chips have a much larger volume and low MSRP, this option, as the only option, is complete madness! Intel may be forced to use this option to satisfy some portion of the low-core parts volume, or for SKUs with full L3 cache, but using this option exclusively would be a colossal money-losing proposition.

Intel 7 yield issues have been brought up before to support this option. But Intel 7 is making half this size die in high volume Alder/Raptor lake parts with no problem. And with extensive block repair/recovery methods, with 74% of the chip being recoverable, a large portion of the SPR tiles will be functional. Intel has also been selling 470 and 628 mm2 Icelake Xeon parts in high volume but on a slightly older 10nmSF.

** Estimated from the figure below from IEEE ISSCC 2022. Also note that EMIB takes over 1/8th of the SPR tiles.
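The yield argument under Option 1 can be illustrated with the standard Poisson yield model, Y = exp(-A * D0). This is only a sketch: the defect density used here is a made-up placeholder (Intel 7's real figure is not public), and treating the 74% "recoverable" area as fully defect-tolerant is a simplification of what block repair actually achieves. Only the 400 mm2 tile size and the 74% figure come from the post.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float,
                  critical_fraction: float = 1.0) -> float:
    """Poisson yield model: probability of zero killer defects in the critical area."""
    critical_area_cm2 = (area_mm2 / 100.0) * critical_fraction
    return math.exp(-critical_area_cm2 * defects_per_cm2)


if __name__ == "__main__":
    D0 = 0.2           # HYPOTHETICAL defects per cm2, for illustration only
    TILE_MM2 = 400.0   # XCC SPR tile size from the post
    NON_RECOVERABLE = 1.0 - 0.74  # post says 74% of the chip is recoverable

    perfect = poisson_yield(TILE_MM2, D0)
    usable = poisson_yield(TILE_MM2, D0, NON_RECOVERABLE)
    print(f"Defect-free tiles: {perfect:.0%}")
    print(f"Usable tiles (defects allowed in recoverable area): {usable:.0%}")
    print(f"Four usable tiles in one package: {usable ** 4:.0%}")
```

Whatever the real defect density is, the gap between "defect-free" and "usable" is what the repair/recovery point above relies on.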

Option 2: Keep the 4 tile design, but make the tiles smaller with reduced number of cores. Let's remove two rows of cores (total 8) per tile. Now we have 7 cores per tile, and 28 core parts with all cores functioning. We do have all the I/O and memory, but we also still have the complexity of the 4 tile design. The Silicon savings are there of course. I estimate that these tiles will be 250 mm2 or so. Two fewer EMIB tiles will be needed, but over 17% of the 1000 mm2 is now dedicated to die-to-die fabric.

Option 3: Build a monolithic die for lower core counts. For example, take one of the 15 core tiles and add 1-2 rows of cores. Use the area now dedicated to EMIB PHYs to add additional I/O and memory. Option 3 is superior to Option 2, because it's much simpler and cheaper to make, while still reusing large parts of the design. The size should be 450-500 mm2. This option also better covers the even lower core count (like 8 and 12) SKUs. With no multi-die fabric, the chip should perform better as well.

I have no insider information, but to me Option 3 makes the most sense for Sapphire and Emerald Rapids. Granite Rapids and Falcon Ridge are disaggregated with separate compute only tiles, in which case the number of cores can be scaled by just adding more (preferred) or bigger compute tiles.
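To put the three options side by side, here is a rough tally of compute silicon per package using only the estimates given above (400 mm2 and 250 mm2 tiles, 215 mm2 of EMIB tiles for Option 1, 450-500 mm2 monolithic). EMIB area for Option 2 is left out because the post gives no figure for it, and none of this accounts for yield or packaging cost.

```python
# All areas in mm2, taken from the estimates in this post.
option1 = {"tiles": 4 * 400, "emib": 215}        # four XCC tiles + 10 EMIB tiles
option2 = {"tiles": 4 * 250}                     # four cut-down tiles, EMIB not counted
option3 = {"monolithic_low": 450, "monolithic_high": 500}

option1_total = sum(option1.values())
option2_total = sum(option2.values())

if __name__ == "__main__":
    print(f"Option 1: {option1_total} mm2 (tiles + EMIB)")
    print(f"Option 2: {option2_total} mm2 (tiles only)")
    print(f"Option 3: {option3['monolithic_low']}-{option3['monolithic_high']} mm2")
    ratio = option1_total / option3["monolithic_high"]
    print(f"Option 1 uses at least {ratio:.1f}x the silicon of Option 3")
```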

Question 2. How will Intel make SPR workstation chips?

For a high core count professional workstation (say with 48-60 cores) 4 x XCC is the only option.

For a low core count enthusiast workstation with 16-24 cores and half the I/O and memory channels, my answer would be to use the monolithic die from Option 3 above, though 2 x XCC would also be fine here as @nicalandia has suggested. Or it could be a combination of the two, but for a low-volume product, that's unlikely.

This is not a competition, let's wait and see what Intel does.
