Discussion: Intel current and future Lakes & Rapids thread


nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Roughly 3-4 Gracemont cores can fit inside a Golden Cove core. I assume you multiplied the 15 cores on the Sapphire Rapids (SPR) tile by 4 to reach 60?

Correct, one can fit exactly 4 Gracemont cores in the space of a server-class Golden Cove core (which is larger than the client version).

Granite Rapids: So how many P-cores can one fit in 600 mm² (400 x 1.5) of Intel 7-equivalent area? 15 Golden Cove cores take 200 mm², or 13.3 mm² per core.
Your numbers are off.

The Xeon-based Golden Cove core's die area is 10.5 mm² with its L3$ slice. 10.5 x 16 = 168 mm²; the rest (32 mm²) is used by the mesh/ring interconnect, so with cache and interconnect the total comes to 12.5 mm² per core.

Let's make the new core size 15 mm² to account for new features. The answer is 40 P-cores per tile, or 160 cores per chip with 4 top tiles.

I don't expect Intel 3 to deliver the same die-area shrink that we are seeing from Intel 7 to Intel 4 (based on current information), and I don't expect Granite Rapids to add much to the new instruction set introduced with Sapphire Rapids.

We can extrapolate that a server-class Redwood Cove core on Intel 4 would have been about 9.37 mm² including the L3$ and the mesh/ring interconnect. So, using your number, that works out to 600/9.37 = exactly 64 cores per tile and 256 cores per CPU, and Sierra Forest has the die area to fit exactly 1,024 E-cores.
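
A quick sketch of the arithmetic in this estimate; the 10.5 mm² core-plus-L3 figure, the 32 mm² mesh share, the 9.37 mm² Redwood Cove extrapolation, and the 600 mm² tile are all the poster's own estimates rather than Intel figures:

```python
# Back-of-envelope sketch of the estimate above; every input here is the poster's
# own guess (core area, mesh share, Intel 7 -> Intel 4 scaling), not Intel data.

golden_cove_with_l3 = 10.5      # mm^2, server Golden Cove core + L3 slice (estimate)
core_slots = 16                 # 15 active cores plus the memory-controller slot
mesh_overhead = 32.0            # mm^2 attributed to the mesh/ring interconnect (estimate)

core_array = golden_cove_with_l3 * core_slots + mesh_overhead   # 168 + 32 = 200 mm^2
per_core_intel7 = core_array / core_slots                       # 12.5 mm^2 incl. cache and mesh

redwood_cove_intel4 = 9.37      # mm^2, extrapolated server Redwood Cove core on Intel 4
tile_area = 600.0               # mm^2, assumed Birch Stream-AP compute tile

p_per_tile = tile_area // redwood_cove_intel4    # ~64
p_per_cpu = p_per_tile * 4                       # ~256 with four compute tiles
e_per_cpu = p_per_cpu * 4                        # ~1024 if ~4 E-cores fit per P-core footprint

print(per_core_intel7, p_per_tile, p_per_cpu, e_per_cpu)   # 12.5 64.0 256.0 1024.0
```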
 
Last edited:

ashFTW

Senior member
Sep 21, 2020
316
236
126
Your numbers are off.

The Xeon-based Golden Cove core's die area is 10.5 mm² with its L3$ slice. 10.5 x 16 = 168 mm²; the rest (32 mm²) is used by the mesh/ring interconnect, so with cache and interconnect the total comes to 12.5 mm² per core.
I measured the die. See the picture above. Assuming the XCC die is 400 mm², which is what Intel reported, the area taken up by the cores, mesh, and cache is 200 mm² for 15 cores. I'm not going to argue the minuscule difference between 13.3 and 12.5.

I don't expect Intel 3 to be as big of a jump as we are seeing from Intel 7 to Intel 4 (based on current information), and I don't expect Granite Rapids to add much to the new instruction set introduced with Sapphire Rapids.
I was comparing Intel 7 to Intel 3 to come up with the 1.5x density increase. Intel 4 to Intel 3 is expected to be 8-10% denser for the high-performance library.

We can extrapolate that a server-class Redwood Cove core on Intel 4 would have been about 9.37 mm² including the L3$ and the mesh/ring interconnect. So, using your number, that works out to 600/9.37 = exactly 64 cores per tile and 256 cores per CPU, and Sierra Forest has the die area to fit exactly 1,024 E-cores.
If you are going to use the Redwood Cove size on Intel 4, then you should use 400 mm² to estimate the number of cores. And you will arrive at a similar number to mine -- 40 cores per tile, and 160 per chip. You need to reread my post.
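
For comparison, the same kind of sketch for the measurement-based estimate in this reply; the ~400 mm² die, the 1.5x Intel 7 to Intel 3 density gain, the 400 mm² usable top-tile area, and the 15 mm² "new core" size are working assumptions from the posts above:

```python
# Sketch of the die-measurement estimate; all inputs are assumptions from this post.

xcc_die = 400.0                      # mm^2, SPR XCC tile per Intel's ~400 mm^2 figure
core_region = xcc_die / 2            # ~200 mm^2 measured for cores + mesh + cache
per_core_intel7 = core_region / 15   # ~13.3 mm^2 per Golden Cove core (12.5 if divided by 16)

density_gain_7_to_3 = 1.5            # assumed Intel 7 -> Intel 3 density increase
gnr_top_tile = 400.0                 # mm^2, assumed usable area per Granite Rapids top tile
intel7_equiv_area = gnr_top_tile * density_gain_7_to_3   # 600 mm^2 of Intel 7-equivalent logic

new_core = 15.0                      # mm^2 (Intel 7 equivalent), grown to allow for new features
p_per_tile = intel7_equiv_area // new_core   # 40
p_per_chip = p_per_tile * 4                  # 160 with four top tiles

print(round(per_core_intel7, 1), p_per_tile, p_per_chip)   # 13.3 40.0 160.0
```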
 

jpiniero

Lifer
Oct 1, 2010
16,493
6,986
136
If you are going to use the Redwood Cove size on Intel 4, then you should use 400 mm² to estimate the number of cores. And you will arrive at a similar number to mine -- 40 cores per tile, and 160 per chip. You need to reread my post.

I'd say the issue is going to be more power consumption than feasibility. 160 Redwood Cove cores sounds very toasty.
 
  • Like
Reactions: moinmoin and ftt

ashFTW

Senior member
Sep 21, 2020
316
236
126
I'd say the issue is going to be more power consumption than feasibility.
Yes, agreed. That's why I watered it down from 160 to 128. Even then, the max TDP might be as high as 500-600 W, as I commented earlier.

Intel can go with fewer top chiplets to make lower-TDP parts with fewer cores. Unlike Sapphire Rapids, the platform capability (I/O, memory, etc.) will be in the base Foveros tiles.
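
Purely illustrative arithmetic for the TDP concern; the per-core and uncore power figures below are placeholders chosen to show the scale, not leaked numbers:

```python
# Illustrative only: why 160 P-cores looks "toasty" and why even 128 lands in the
# 500-600 W class if each core plus its cache draws a few watts under all-core load.

watts_per_core = 3.5    # hypothetical sustained watts per core incl. cache (placeholder)
uncore_watts = 100.0    # hypothetical mesh / memory / I/O power (placeholder)

for cores in (128, 160):
    print(cores, "cores ->", round(cores * watts_per_core + uncore_watts), "W")
# 128 cores -> 548 W, 160 cores -> 660 W
```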
 
Last edited:

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
I measured the die. See the picture above. Assuming the XCC die is 400 mm², which is what Intel reported, the area taken up by the cores, mesh, and cache is 200 mm² for 15 cores. I'm not going to argue the minuscule difference between 13.3 and 12.5.
That is where the discrepancy is. You need to divide 200 by 16 (15 active cores plus a memory-controller tile).


If you are going to use the Redwood Cove size on Intel 4, then you should use 400 mm² to estimate the number of cores.
I believe that Granite Rapids was always intended to use the same socket as Sierra Forest, and that is the much larger Intel Birch Stream-AP, which will have a much larger compute tile of about 600 mm².
 
  • Like
Reactions: Tlh97 and ftt

ashFTW

Senior member
Sep 21, 2020
316
236
126
That is where the discrepancy is. You need to divide 200 by 16 (15 active cores plus a memory-controller tile).
I wasn't sloppy in my calculation. I multiplied the area of the rectangle containing the cores and the memory controller by 15/16 to get the area occupied by just the cores. The small discrepancy probably comes from the 400 mm² die-size assumption; Intel only said ~400 mm².

I believe that Granite Rapids was always intended to use the same socket as Sierra Forest, and that is the much larger Intel Birch Stream-AP, which will have a much larger compute tile of about 600 mm².
Yes, they share the socket. The base tile will be around 600 mm², but you will lose area on it for connectivity to I/O, memory, etc. Look at Ponte Vecchio for inspiration. I assumed the top tile (or combination of sub-tiles) to be 500 mm². I then removed 20% for Foveros power delivery and chiplet interconnect to arrive at 400 mm² of actual area for the cores.
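
A short sketch of the area budget described here; the 500 mm² top tile and the 20% overhead are this post's assumptions, and the two per-core figures show where the 13.3-versus-12.5 discrepancy comes from:

```python
# Top-tile area budget and the source of the 13.3 vs 12.5 mm^2 discrepancy
# (all figures are the posters' estimates, not Intel data).

top_tile = 500.0                    # mm^2, assumed top tile (or assembly of sub-tiles)
usable = top_tile * (1 - 0.20)      # 400 mm^2 left after Foveros power delivery + interconnect

core_region = 200.0                 # mm^2 attributed to cores/mesh/cache on the SPR tile
print(round(core_region / 15, 1))   # 13.3 mm^2 if divided by the 15 active cores
print(core_region / 16)             # 12.5 mm^2 if the memory-controller slot counts as a 16th
print(usable)                       # 400.0
```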
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
It's quite possible that the 256 number is the planned max of Zen 5c, being falsely attributed to Turin.
That would still be a flat doubling over Genoa/Bergamo, which is 96/128c for Zen 4/4c.

And it would make sense for that to be unrealistic. They have a presumably large architectural change (which generally means a bigger die / more transistors), but only a minor density improvement from N4, and they are limited to the same socket. Where would they get the space for double the cores?

To try to get back on topic, Granite Rapids vs Turin is shaping up to be much more interesting than I expected. I was originally thinking we'd see a matchup in late 2023 between RWC-based Granite Rapids on Intel 4 and Zen5-based Turin on N3, which (for a healthy/more normal N3) would be a beatdown. Instead we're going to get rough process parity (Intel 3 vs TSMC 4) and probably Lion Cove vs Zen 5. Should be a much more "even" match up.

As for core count, I'm expecting Turin and Granite Rapids to be pretty similar at the end of the day, probably in the ballpark of 100-150 cores (hopefully towards the top end) for the max config. Also, I have no idea why people are referencing SPR's topology. Intel has shown Granite Rapids diagrams that are at least close enough.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Speculation: Granite Rapids and Sierra Forest with a disaggregated design, on the same platform. A "4-stack" Foveros and co-EMIB base die with all the I/O, memory, and cache. The P- and E-core tiles may be assemblies of 2 or more smaller chiplets. I expect the max core count for Granite to be 128, and Sierra to be 3-4 times more.

View attachment 63272

Edit: Fixed Typo (Sierra Rapids to Sierra Forest).

Updated image. At least 32 P-cores and 96 E-cores per tile.
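
Scaling out the speculated per-tile counts (the 32/96 split and the 4-stack layout are speculation from this post, not a roadmap):

```python
# Core-count sketch for the speculated 4-stack Foveros layout (speculation, not a roadmap).

stacks = 4
p_cores_per_tile = 32
e_cores_per_tile = 96

print("Granite Rapids max P-cores:", stacks * p_cores_per_tile)   # 128
print("Sierra Forest max E-cores: ", stacks * e_cores_per_tile)   # 384, ~3x the P-core count
```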
 
Last edited:

ashFTW

Senior member
Sep 21, 2020
316
236
126
What are those Diagrams that you speak of?

These ones?
View attachment 63311

View attachment 63310
I am working off the assumption** that Granite Rapids (GR), Sierra Forest (SF), and Falcon Shores (FS) will share the same or a similar platform, and that it will be a 4-stack Foveros design. That way you build the fewest tiles and reuse them across products. A 2-stack design (Fig 1 in your reply) will not be competitive, as it won't be able to host a sufficient number of cores for the top-end SKUs. GR and SF could be 3-stack designs (Fig 2 in your reply) using the same base tiles as FS, and Intel may do that for power/cost reasons.

** Look at the bottom-most x86-only chip!

I think Fig 1, Fig 2, and the FS one below show the evolution in Intel's thinking in response to the changing competitive landscape and the delays in server releases.

 
Last edited:

ashFTW

Senior member
Sep 21, 2020
316
236
126
Yes, those.
I think these figures (2- and 3-stack designs, respectively) and the Falcon Shores (4-stack) one that I posted above show the evolution in Intel's thinking regarding Granite Rapids and Sierra Forest, in response to the changing competitive landscape and the delays in their server releases.
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
I think these figures (2- and 3-stack designs, respectively) and the Falcon Shores (4-stack) one that I posted above show the evolution in Intel's thinking regarding Granite Rapids and Sierra Forest, in response to the changing competitive landscape and the delays in their server releases.
I'm not convinced that Falcon Shores even aligns with Granite Rapids/Sierra Forest. They didn't give any concrete timing in their latest update. If anything, "Angstrom-era process" probably puts it in 2025 at best. I think it's more likely that it coincides with a post-GNR architecture.

Also, with GNR clearly doing poorly schedule/quality-wise, it would be terrible for them to decide to make such major changes to the architecture as you're suggesting.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
I'm not convinced that Falcon Shores even aligns with Granite Rapids/Sierra Forest. They didn't give any concrete timing in their latest update. If anything, "Angstrom-era process" probably puts it in 2025 at best. I think it's more likely that it coincides with a post-GNR architecture.

Also, with GNR clearly doing poorly schedule/quality-wise, it would be terrible for them to decide to make such major changes to the architecture as you're suggesting.
These changes were probably locked in late last year, after Pat's review of the roadmaps, giving them three years to the Granite and Sierra releases in '24. Ponte Vecchio was done on a similar schedule. There is no point in continuing to fall short of AMD; bold steps are needed to regain market leadership!
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
These changes were probably locked in late last year, after Pat's review of the roadmaps, giving them three years to the Granite and Sierra releases in '24. Ponte Vecchio was done on a similar schedule. There is no point in continuing to fall short of AMD; bold steps are needed to regain market leadership!
Yeah, and Ponte Vecchio isn't on schedule either. When you're already behind, the very worst thing to do is to add extra work. One reason SPR is the disaster that it is.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Yeah, and Ponte Vecchio isn't on schedule either. When you're already behind, the very worst thing to do is to add extra work. One reason SPR is the disaster that it is.
You need bold decision-making and smart engineering to produce designs that are not only cutting-edge but also parsimonious, so that elements of them can be leveraged across multiple products and market segments, outpacing one's competitors. I have harped on this for a while.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
You need bold decision-making and smart engineering to produce designs that are not only cutting-edge but also parsimonious, so that elements of them can be leveraged across multiple products and market segments, outpacing one's competitors. I have harped on this for a while.
What you propose for Granite Rapids isn't bold; it's reckless. Again, this attitude of changing products in flight because you're behind is what results in disasters like SPR, and GNR is not in a healthier position. Intel needs to show they can execute anything to plan. Then they can worry about the competition.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
What you propose for Granite Rapids isn't bold; it's reckless. Again, this attitude of changing products in flight because you're behind is what results in disasters like SPR, and GNR is not in a healthier position. Intel needs to show they can execute anything to plan. Then they can worry about the competition.
Perhaps someone should have said that to AMD leadership before they came out of the ashes with a bold chiplet-based design that could be used up and down the stack, across client and server product lines. There are times for incremental steps, and there are times for bigger deltas that can completely change a company's trajectory. Intel's server business needs one of those plans now. Without major changes, the 80/20 server market split could soon turn 20/80, especially if AMD manages to solve its wafer supply constraints.

Anyways, we have a difference of opinion, and that’s ok; I respect what you have said.
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Perhaps someone should have said that to AMD leadership before they came out of the ashes with a bold chiplet-based design that could be used up and down the stack, across client and server product lines.
AMD didn't start out with a clean sweep over Intel. First-gen Zen was Broadwell-tier on a per-core basis, and Naples was competitive in some regards against Skylake-SP, but by no means the clear recommendation from a performance standpoint. And they were pretty far behind in other markets like mobile. What AMD did was continue to rapidly build on that advancement to eventually cover for all of their previous weaknesses. Intel needs to do the same. Release a product, on schedule, even if it doesn't beat AMD in everything. Then expand scope.

And either way, I'm not sure how you conflate topology with competitiveness. Even on the "old" GNR topology, they should still be able to support >100 cores per socket. Can you explain why you think the Foveros layout you propose justifies the extra silicon and packaging cost?
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Release a product, on schedule, even if it doesn't beat AMD in everything. Then expand scope.
Ice Lake was already one of those attempts. And Sapphire Rapids, if it had been released last year or even early this year, would have been close to if not on par with third-gen EPYC. The recent delays, as far as I know, are due to severe bugs discovered quite late in the cycle, not due to product-definition changes midstream.

And either way, I'm not sure how you conflate topology with competitiveness. Even on the "old" GNR topology, they should still be able to support >100 cores per socket. Can you explain why you think the Foveros layout you propose justifies the extra silicon and packaging cost?
The "old" GNR, as far as I can tell, was already a 2-stack Foveros design. Building upon the rationale in my previous posts, which I'm not going to repeat here, one can only get around 80 cores on Intel 3 in 2024! Clearly that's not competitive. I'm proposing a 4-stack design that shares tiles across several product lines. Far less engineering overall, and similar packaging costs.
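
A quick sketch tying this back to the earlier per-tile estimate; the 40-cores-per-tile figure comes from the area math earlier in the thread, not from Intel:

```python
# Stacks vs. maximum P-core count under the earlier ~40-cores-per-top-tile estimate.

cores_per_top_tile = 40   # from the earlier 400 mm^2 x 1.5 density / 15 mm^2-per-core estimate

for stacks in (2, 3, 4):
    print(stacks, "stacks ->", stacks * cores_per_top_tile, "P-cores max")
# 2 stacks -> 80 (the "old" GNR), 4 stacks -> 160 (before derating to ~128 for power)
```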
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
The recent delays, as far as i know, are due to severe bugs discovered quite late in the cycle, not due to product definition changes mid stream.
Those two things are extremely tightly correlated.

Building upon the rationale in my previous posts, which I’m not going to repeat here, one can only get around 80 cores on Intel 3 in 2024!
But that's simply not true. 2-3 reticle-limit compute tiles linked via EMIB should easily be sufficient to support >100 cores. Foveros isn't necessary at all.
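
A rough check of this counter-argument, assuming the usual ~850 mm² single-exposure reticle limit and reusing the per-core and density estimates from earlier in the thread (all approximations):

```python
# Large EMIB-linked compute tiles, no Foveros: do 2-3 tiles clear 100 cores?
# The reticle limit is the standard ~850 mm^2 ceiling; everything else reuses
# the thread's earlier estimates and is not an Intel figure.

reticle_limit = 850.0          # mm^2, approximate single-exposure reticle limit
usable_fraction = 0.8          # assume some area lost to EMIB PHYs, fabric, I/O
per_core_intel7_equiv = 15.0   # mm^2 per core in Intel 7-equivalent area (earlier estimate)
density_gain_7_to_3 = 1.5      # assumed Intel 7 -> Intel 3 density increase

cores_per_tile = (reticle_limit * usable_fraction * density_gain_7_to_3) // per_core_intel7_equiv
for tiles in (2, 3):
    print(tiles, "tiles ->", int(tiles * cores_per_tile), "cores")
# 2 tiles -> 136, 3 tiles -> 204: comfortably past 100 cores even with conservative assumptions
```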
 
  • Like
Reactions: Tlh97 and uzzi38

moinmoin

Diamond Member
Jun 1, 2017
5,234
8,442
136
Instead we're going to get rough process parity (Intel 3 vs TSMC 4)
Pat himself said Intel wants to reach process performance-per-watt parity with TSMC in 2024, so I'm not sure we can call anything earlier than that on par already.

Perhaps someone should have said that to AMD leadership before they came out of the ashes with a bold chiplet-based design that could be used up and down the stack, across client and server product lines.
The thing with AMD is that all the risk was counterbalanced by the flexibility you refer to, which actually reduces risk since it allows the design to be used wherever it happens to stick to the wall. Intel's designs so far don't appear to have that flexibility (the later tile-based designs may or may not get there).

Ice Lake was already one of those attempts.
I don't recall it being on schedule though?
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Those two things are extremely tightly correlated
I don't know enough about this to comment. Please elaborate.

But that's simply not true. 2-3 reticle-limit compute tiles linked via EMIB should easily be sufficient to support >100 cores. Foveros isn't necessary at all.
Two 600+ mm² tiles could reach 100 cores with perfect yield. So Foveros is not necessary, but do you really want to commit to making such large tiles on a new process (Intel 3)? I would rather stitch together much smaller (say 100 mm²) top tiles using Foveros; EMIB seems to take up a lot of space on the SPR die, so I'm a bit wary of it. Also, going 3D has advantages. If done properly, it should provide shorter paths to the I/O and memory interfaces on the base die. It also provides an opportunity for a large cache to be placed there.
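
The yield worry about 600+ mm² tiles can be illustrated with a simple Poisson defect-density model; the defect density below is a placeholder, not a known Intel 3 figure:

```python
import math

# Illustrative Poisson yield model: yield ~= exp(-defect_density * die_area).
# The defect density is a placeholder; only the relative gap between tile sizes matters here.

defect_density = 0.1   # defects per cm^2 (hypothetical for a young process)

def die_yield(area_mm2: float) -> float:
    """Expected fraction of defect-free dies for a given die area."""
    return math.exp(-defect_density * area_mm2 / 100.0)   # mm^2 -> cm^2

for area in (100, 400, 600):
    print(f"{area} mm^2 tile -> ~{die_yield(area):.0%} yield")
# ~90% for 100 mm^2 vs ~55% for 600 mm^2: smaller top tiles waste much less silicon per defect.
```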

And once you look at the Falcon Shores design, which needs x86 chiplets as well, why not build them that way to start with? Less engineering, faster time to market, etc.

Edit: I also want to point out that Foveros is not something fringe; with Meteor Lake as the only 14th-gen client part, it will be mass-produced in tens of millions of units by mid '23. Intel also has had good experience with very-high-TDP Foveros designs in Ponte Vecchio and Rialto Bridge. It's high time to bring those learnings to mainstream server designs as well in '24.
 
Last edited: