Discussion: Intel current and future Lakes & Rapids thread


nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Roughly 3-4 Gracemont cores can fit inside a Golden Cove core. I assume you multiplied the 15 cores on the Sapphire Rapids (SPR) tile by 4 to reach 60?

Correct, one can fit exactly 4 Gracemont cores in the space of a server-class Golden Cove core (which is larger than the client version).

Granite Rapids: So how many P-cores can one fit in 600 mm² (400 x 1.5) of Intel 7-equivalent area? 15 Golden Cove cores take 200 mm², or 13.3 mm² per core.
Your numbers are off.

The Xeon-based Golden Cove core's die area is 10.5 mm² with its L3$ slice. 10.5 x 16 = 168 mm²; the rest (32 mm²) is used by the mesh/ring interconnect, so with cache and interconnect the total comes to 12.5 mm² per core.

Let's make the new core size 15 mm² to account for new features. The answer is 40 P-cores per tile, or 160 cores per chip with 4 top tiles.

I don't expect Intel 3 to deliver the same die-area shrink that we are seeing from Intel 7 to Intel 4 (based on current information), and I don't expect Granite Rapids to add much to the new instruction set introduced with Sapphire Rapids.

We can extrapolate that a server-class Redwood Cove core on Intel 4 would have been about 9.37 mm² including the L3$ and the mesh/ring interconnect. So, using your number, that works out to 600/9.37 = exactly 64 cores per tile and 256 cores per CPU, and Sierra Forest has the die area to fit exactly 1,024 E-cores.
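
A quick sketch of the arithmetic in this estimate; the 10.5 mm² core-plus-L3 figure, the 32 mm² mesh share, the 9.37 mm² Redwood Cove extrapolation, and the 600 mm² tile are all the poster's own estimates rather than Intel figures:

```python
# Back-of-envelope sketch of the estimate above; every input here is the poster's
# own guess (core area, mesh share, Intel 7 -> Intel 4 scaling), not Intel data.

golden_cove_with_l3 = 10.5      # mm^2, server Golden Cove core + L3 slice (estimate)
core_slots = 16                 # 15 active cores plus the memory-controller slot
mesh_overhead = 32.0            # mm^2 attributed to the mesh/ring interconnect (estimate)

core_array = golden_cove_with_l3 * core_slots + mesh_overhead   # 168 + 32 = 200 mm^2
per_core_intel7 = core_array / core_slots                       # 12.5 mm^2 incl. cache and mesh

redwood_cove_intel4 = 9.37      # mm^2, extrapolated server Redwood Cove core on Intel 4
tile_area = 600.0               # mm^2, assumed Birch Stream-AP compute tile

p_per_tile = tile_area // redwood_cove_intel4    # ~64
p_per_cpu = p_per_tile * 4                       # ~256 with four compute tiles
e_per_cpu = p_per_cpu * 4                        # ~1024 if ~4 E-cores fit per P-core footprint

print(per_core_intel7, p_per_tile, p_per_cpu, e_per_cpu)   # 12.5 64.0 256.0 1024.0
```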
 
Last edited:

ashFTW

Senior member
Sep 21, 2020
316
236
126
Your numbers are off.

The Xeon-based Golden Cove core's die area is 10.5 mm² with its L3$ slice. 10.5 x 16 = 168 mm²; the rest (32 mm²) is used by the mesh/ring interconnect, so with cache and interconnect the total comes to 12.5 mm² per core.
I measured the die. See the picture above. Assuming the XCC die is 400 mm², which is what Intel reported, the area taken up by the cores, mesh, and cache is 200 mm² for 15 cores. I'm not going to argue the minuscule difference between 13.3 and 12.5.

I don't expect Intel 3 to be as big of a jump as we are seeing from Intel 7 to Intel 4 (based on current information), and I don't expect Granite Rapids to add much to the new instruction set introduced with Sapphire Rapids.
I was comparing Intel 7 to Intel 3 to come up with the 1.5x density increase. Intel 4 to Intel 3 is expected to be 8-10% denser for the high-performance library.

We can extrapolate that a server-class Redwood Cove core on Intel 4 would have been about 9.37 mm² including the L3$ and the mesh/ring interconnect. So, using your number, that works out to 600/9.37 = exactly 64 cores per tile and 256 cores per CPU, and Sierra Forest has the die area to fit exactly 1,024 E-cores.
If you are going to use the Redwood Cove size on Intel 4, then you should use 400 mm² to estimate the number of cores. And you will arrive at a similar number to mine -- 40 cores per tile, and 160 per chip. You need to reread my post.
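
For comparison, the same kind of sketch for the measurement-based estimate in this reply; the ~400 mm² die, the 1.5x Intel 7 to Intel 3 density gain, the 400 mm² usable top-tile area, and the 15 mm² "new core" size are working assumptions from the posts above:

```python
# Sketch of the die-measurement estimate; all inputs are assumptions from this post.

xcc_die = 400.0                      # mm^2, SPR XCC tile per Intel's ~400 mm^2 figure
core_region = xcc_die / 2            # ~200 mm^2 measured for cores + mesh + cache
per_core_intel7 = core_region / 15   # ~13.3 mm^2 per Golden Cove core (12.5 if divided by 16)

density_gain_7_to_3 = 1.5            # assumed Intel 7 -> Intel 3 density increase
gnr_top_tile = 400.0                 # mm^2, assumed usable area per Granite Rapids top tile
intel7_equiv_area = gnr_top_tile * density_gain_7_to_3   # 600 mm^2 of Intel 7-equivalent logic

new_core = 15.0                      # mm^2 (Intel 7 equivalent), grown to allow for new features
p_per_tile = intel7_equiv_area // new_core   # 40
p_per_chip = p_per_tile * 4                  # 160 with four top tiles

print(round(per_core_intel7, 1), p_per_tile, p_per_chip)   # 13.3 40.0 160.0
```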
 

jpiniero

Lifer
Oct 1, 2010
16,493
6,986
136
If you are going to use the Redwood Cove size on Intel 4, then you should use 400 mm² to estimate the number of cores. And you will arrive at a similar number to mine -- 40 cores per tile, and 160 per chip. You need to reread my post.

I'd say the issue is going to be more power consumption than feasibility. 160 Redwood Cove cores sounds very toasty.
 
  • Like
Reactions: moinmoin and ftt

ashFTW

Senior member
Sep 21, 2020
316
236
126
I'd say the issue is going to be more power consumption than feasibility.
Yes, agreed. That's why I watered it down from 160 to 128. Even then, the max TDP might be as high as 500-600 W, as I commented earlier.

Intel can go with fewer top chiplets to make lower-TDP parts with fewer cores. Unlike Sapphire Rapids, the platform capability (I/O, memory, etc.) will be in the base Foveros tiles.
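
Purely illustrative arithmetic for the TDP concern; the per-core and uncore power figures below are placeholders chosen to show the scale, not leaked numbers:

```python
# Illustrative only: why 160 P-cores looks "toasty" and why even 128 lands in the
# 500-600 W class if each core plus its cache draws a few watts under all-core load.

watts_per_core = 3.5    # hypothetical sustained watts per core incl. cache (placeholder)
uncore_watts = 100.0    # hypothetical mesh / memory / I/O power (placeholder)

for cores in (128, 160):
    print(cores, "cores ->", round(cores * watts_per_core + uncore_watts), "W")
# 128 cores -> 548 W, 160 cores -> 660 W
```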
 
Last edited:

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
I measured the die. See the picture above. Assuming the XCC die is 400 mm², which is what Intel reported, the area taken up by the cores, mesh, and cache is 200 mm² for 15 cores. I'm not going to argue the minuscule difference between 13.3 and 12.5.
That is where the discrepancy is. You need to divide 200 by 16 (15 active cores plus a memory-controller tile).


If you are going to use the Redwood Cove size on Intel 4, then you should use 400 mm² to estimate the number of cores.
I believe that Granite Rapids was always intended to use the same socket as Sierra Forest, and that is the much larger Intel Birch Stream-AP, which will have a much larger compute tile of about 600 mm².
 
  • Like
Reactions: Tlh97 and ftt

ashFTW

Senior member
Sep 21, 2020
316
236
126
That is where the discrepancy is. You need to divide 200 by 16 (15 active cores plus a memory-controller tile).
I wasn't sloppy in my calculation. I multiplied the area of the rectangle containing the cores and the memory controller by 15/16 to get the area occupied by just the cores. The small discrepancy probably comes from the 400 mm² die-size assumption; Intel only said ~400 mm².

I believe that Granite Rapids was always intended to use the same socket as Sierra Forest, and that is the much larger Intel Birch Stream-AP, which will have a much larger compute tile of about 600 mm².
Yes, they share the socket. The base tile will be around 600 mm², but you will lose area on it for connectivity to I/O, memory, etc. Look at Ponte Vecchio for inspiration. I assumed the top tile (or combination of sub-tiles) to be 500 mm². I then removed 20% for Foveros power delivery and chiplet interconnect to arrive at 400 mm² of actual area for the cores.
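
A short sketch of the area budget described here; the 500 mm² top tile and the 20% overhead are this post's assumptions, and the two per-core figures show where the 13.3-versus-12.5 discrepancy comes from:

```python
# Top-tile area budget and the source of the 13.3 vs 12.5 mm^2 discrepancy
# (all figures are the posters' estimates, not Intel data).

top_tile = 500.0                    # mm^2, assumed top tile (or assembly of sub-tiles)
usable = top_tile * (1 - 0.20)      # 400 mm^2 left after Foveros power delivery + interconnect

core_region = 200.0                 # mm^2 attributed to cores/mesh/cache on the SPR tile
print(round(core_region / 15, 1))   # 13.3 mm^2 if divided by the 15 active cores
print(core_region / 16)             # 12.5 mm^2 if the memory-controller slot counts as a 16th
print(usable)                       # 400.0
```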
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
It's quite possible that the 256 number is the planned max of Zen 5c, being falsely attributed to Turin.
That would still be a flat doubling over Genoa/Bergamo, which is 96/128c for Zen 4/4c.

And it would make sense for that to be unrealistic. They have a presumably large architectural change (which generally means a bigger die / more transistors), but only a minor density improvement from N4, and they are limited to the same socket. Where would they get the space for double the cores?

To try to get back on topic, Granite Rapids vs Turin is shaping up to be much more interesting than I expected. I was originally thinking we'd see a matchup in late 2023 between RWC-based Granite Rapids on Intel 4 and Zen5-based Turin on N3, which (for a healthy/more normal N3) would be a beatdown. Instead we're going to get rough process parity (Intel 3 vs TSMC 4) and probably Lion Cove vs Zen 5. Should be a much more "even" match up.

As for core count, I'm expecting Turin and Granite Rapids to be pretty similar at the end of the day, probably in the ballpark of 100-150 cores (hopefully towards the top end) for the max config. Also, I have no idea why people are referencing SPR's topology. Intel has shown Granite Rapids diagrams that are at least close enough.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Speculation: Granite Rapids and Sierra Forest with a disaggregated design, on the same platform. A "4-stack" Foveros and co-EMIB base die with all the I/O, memory, and cache. The P- and E-core tiles may be assemblies of 2 or more smaller chiplets. I expect the max core count for Granite to be 128, and Sierra to be 3-4 times more.

View attachment 63272

Edit: Fixed Typo (Sierra Rapids to Sierra Forest).

Updated image. At least 32 P-cores and 96 E-cores per tile.
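
Scaling out the speculated per-tile counts (the 32/96 split and the 4-stack layout are speculation from this post, not a roadmap):

```python
# Core-count sketch for the speculated 4-stack Foveros layout (speculation, not a roadmap).

stacks = 4
p_cores_per_tile = 32
e_cores_per_tile = 96

print("Granite Rapids max P-cores:", stacks * p_cores_per_tile)   # 128
print("Sierra Forest max E-cores: ", stacks * e_cores_per_tile)   # 384, ~3x the P-core count
```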
 
Last edited:

ashFTW

Senior member
Sep 21, 2020
316
236
126
What are those Diagrams that you speak of?

These ones?
View attachment 63311

View attachment 63310
I am working off the assumption** that Granite Rapids (GR), Sierra Forest (SF), and Falcon Shores (FS) will share the same or a similar platform, and that it will be a 4-stack Foveros design. That way you build the fewest tiles and reuse them across products. A 2-stack design (Fig 1 in your reply) will not be competitive, as it won't be able to host a sufficient number of cores for the top-end SKUs. GR and SF could be 3-stack designs (Fig 2 in your reply) using the same base tiles as FS, and Intel may do that for power/cost reasons.

** Look at the bottom-most x86-only chip!

I think Fig 1, Fig 2, and the FS one below show the evolution in Intel's thinking in response to the changing competitive landscape and the delays in server releases.

 
Last edited:

ashFTW

Senior member
Sep 21, 2020
316
236
126
Yes, those.
I think these figures (2- and 3-stack designs, respectively) and the Falcon Shores (4-stack) one that I posted above show the evolution in Intel's thinking regarding Granite Rapids and Sierra Forest, in response to the changing competitive landscape and the delays in their server releases.
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
I think these figures (2- and 3-stack designs, respectively) and the Falcon Shores (4-stack) one that I posted above show the evolution in Intel's thinking regarding Granite Rapids and Sierra Forest, in response to the changing competitive landscape and the delays in their server releases.
I'm not convinced that Falcon Shores even aligns with Granite Rapids/Sierra Forest. They didn't give any concrete timing in their latest update. If anything, "Angstrom-era process" probably puts it in 2025 at best. I think it's more likely that it coincides with a post-GNR architecture.

Also, with GNR clearly doing poorly schedule/quality-wise, it would be terrible for them to decide to make such major changes to the architecture as you're suggesting.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
I'm not convinced that Falcon Shores even aligns with Granite Rapids/Sierra Forest. They didn't give any concrete timing in their latest update. If anything, "Angstrom-era process" probably puts it in 2025 at best. I think it's more likely that it coincides with a post-GNR architecture.

Also, with GNR clearly doing poorly schedule/quality-wise, it would be terrible for them to decide to make such major changes to the architecture as you're suggesting.
These changes were probably locked in late last year, after Pat's review of the roadmaps, giving them three years to the Granite and Sierra releases in '24. Ponte Vecchio was done on a similar schedule. There is no point in continuing to fall short of AMD; bold steps are needed to regain market leadership!
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
These changes were probably locked in late last year, after Pat's review of the roadmaps, giving them three years to the Granite and Sierra releases in '24. Ponte Vecchio was done on a similar schedule. There is no point in continuing to fall short of AMD; bold steps are needed to regain market leadership!
Yeah, and Ponte Vecchio isn't on schedule either. When you're already behind, the very worst thing to do is to add extra work. One reason SPR is the disaster that it is.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Yeah, and Ponte Vecchio isn't on schedule either. When you're already behind, the very worst thing to do is to add extra work. One reason SPR is the disaster that it is.
You need bold decision-making and smart engineering to produce designs that are not only cutting-edge but also parsimonious, so that elements of them can be leveraged across multiple products and market segments, outpacing one's competitors. I have harped on this for a while.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
You need bold decision-making and smart engineering to produce designs that are not only cutting-edge but also parsimonious, so that elements of them can be leveraged across multiple products and market segments, outpacing one's competitors. I have harped on this for a while.
What you propose for Granite Rapids isn't bold; it's reckless. Again, this attitude of changing products in flight because you're behind is what results in disasters like SPR, and GNR is not in a healthier position. Intel needs to show they can execute anything to plan. Then they can worry about the competition.
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
What you propose for Granite Rapids isn't bold; it's reckless. Again, this attitude of changing products in flight because you're behind is what results in disasters like SPR, and GNR is not in a healthier position. Intel needs to show they can execute anything to plan. Then they can worry about the competition.
Perhaps someone should have said that to AMD leadership before they came out of the ashes with a bold chiplet-based design that could be used up and down the stack, across client and server product lines. There are times for incremental steps, and there are times for bigger deltas that can completely change a company's trajectory. Intel's server business needs one of those plans now. Without major changes, the 80/20 server market split could soon turn 20/80, especially if AMD manages to solve its wafer supply constraints.

Anyways, we have a difference of opinion, and that’s ok; I respect what you have said.
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Perhaps someone should have said that to AMD leadership before they came out of the ashes with a bold chiplet-based design that could be used up and down the stack, across client and server product lines.
AMD didn't start out with a clean sweep over Intel. First-gen Zen was Broadwell-tier on a per-core basis, and Naples was competitive in some regards against Skylake-SP, but by no means the clear recommendation from a performance standpoint. And they were pretty far behind in other markets like mobile. What AMD did was continue to rapidly build on that advancement to eventually cover for all of their previous weaknesses. Intel needs to do the same. Release a product, on schedule, even if it doesn't beat AMD in everything. Then expand scope.

And either way, I'm not sure how you conflate topology with competitiveness. Even on the "old" GNR topology, they should still be able to support >100 cores per socket. Can you explain why you think the Foveros layout you propose justifies the extra silicon and packaging cost?
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Release a product, on schedule, even if it doesn't beat AMD in everything. Then expand scope.
Ice Lake was already one of those attempts. And Sapphire Rapids, if it had been released last year or even early this year, would have been close to if not on par with third-gen EPYC. The recent delays, as far as I know, are due to severe bugs discovered quite late in the cycle, not due to product-definition changes midstream.

And either way, I'm not sure how you conflate topology with competitiveness. Even on the "old" GNR topology, they should still be able to support >100 cores per socket. Can you explain why you think the Foveros layout you propose justifies the extra silicon and packaging cost?
The "old" GNR, as far as I can tell, was already a 2-stack Foveros design. Building upon the rationale in my previous posts, which I'm not going to repeat here, one can only get around 80 cores on Intel 3 in 2024! Clearly that's not competitive. I'm proposing a 4-stack design that shares tiles across several product lines. Far less engineering overall, and similar packaging costs.
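
A quick sketch tying this back to the earlier per-tile estimate; the 40-cores-per-tile figure comes from the area math earlier in the thread, not from Intel:

```python
# Stacks vs. maximum P-core count under the earlier ~40-cores-per-top-tile estimate.

cores_per_top_tile = 40   # from the earlier 400 mm^2 x 1.5 density / 15 mm^2-per-core estimate

for stacks in (2, 3, 4):
    print(stacks, "stacks ->", stacks * cores_per_top_tile, "P-cores max")
# 2 stacks -> 80 (the "old" GNR), 4 stacks -> 160 (before derating to ~128 for power)
```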
 
Last edited:

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
The recent delays, as far as i know, are due to severe bugs discovered quite late in the cycle, not due to product definition changes mid stream.
Those two things are extremely tightly correlated.

Building upon the rationale in my previous posts, which I’m not going to repeat here, one can only get around 80 cores on Intel 3 in 2024!
But that's simply not true. 2-3 reticle-limit compute tiles linked via EMIB should easily be sufficient to support >100 cores. Foveros isn't necessary at all.
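
A rough check of this counter-argument, assuming the usual ~850 mm² single-exposure reticle limit and reusing the per-core and density estimates from earlier in the thread (all approximations):

```python
# Large EMIB-linked compute tiles, no Foveros: do 2-3 tiles clear 100 cores?
# The reticle limit is the standard ~850 mm^2 ceiling; everything else reuses
# the thread's earlier estimates and is not an Intel figure.

reticle_limit = 850.0          # mm^2, approximate single-exposure reticle limit
usable_fraction = 0.8          # assume some area lost to EMIB PHYs, fabric, I/O
per_core_intel7_equiv = 15.0   # mm^2 per core in Intel 7-equivalent area (earlier estimate)
density_gain_7_to_3 = 1.5      # assumed Intel 7 -> Intel 3 density increase

cores_per_tile = (reticle_limit * usable_fraction * density_gain_7_to_3) // per_core_intel7_equiv
for tiles in (2, 3):
    print(tiles, "tiles ->", int(tiles * cores_per_tile), "cores")
# 2 tiles -> 136, 3 tiles -> 204: comfortably past 100 cores even with conservative assumptions
```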
 
  • Like
Reactions: Tlh97 and uzzi38

moinmoin

Diamond Member
Jun 1, 2017
5,234
8,442
136
Instead we're going to get rough process parity (Intel 3 vs TSMC 4)
Pat himself said Intel wants to reach process performance-per-watt parity with TSMC in 2024, so I'm not sure we can call anything earlier than that on par already.

Perhaps someone should have said that to AMD leadership before they came out of the ashes with a bold chiplet-based design that could be used up and down the stack, across client and server product lines.
The thing with AMD is that all the risk was counterbalanced by the flexibility you refer to, which actually reduces risk since it allows the design to be used wherever it happens to stick to the wall. Intel's designs so far don't appear to have that flexibility (the later tile-based designs may or may not get there).

Ice Lake was already one of those attempts.
I don't recall it being on schedule though?
 

ashFTW

Senior member
Sep 21, 2020
316
236
126
Those two things are extremely tightly correlated
I don't know enough about this to comment. Please elaborate.

But that's simply not true. 2-3 reticle-limit compute tiles linked via EMIB should easily be sufficient to support >100 cores. Foveros isn't necessary at all.
Two 600+ mm² tiles could reach 100 cores with perfect yield. So Foveros is not necessary, but do you really want to commit to making such large tiles on a new process (Intel 3)? I would rather stitch together much smaller (say 100 mm²) top tiles using Foveros; EMIB seems to take up a lot of space on the SPR die, so I'm a bit wary of it. Also, going 3D has advantages. If done properly, it should provide shorter paths to the I/O and memory interfaces on the base die. It also provides an opportunity for a large cache to be placed there.
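
The yield worry about 600+ mm² tiles can be illustrated with a simple Poisson defect-density model; the defect density below is a placeholder, not a known Intel 3 figure:

```python
import math

# Illustrative Poisson yield model: yield ~= exp(-defect_density * die_area).
# The defect density is a placeholder; only the relative gap between tile sizes matters here.

defect_density = 0.1   # defects per cm^2 (hypothetical for a young process)

def die_yield(area_mm2: float) -> float:
    """Expected fraction of defect-free dies for a given die area."""
    return math.exp(-defect_density * area_mm2 / 100.0)   # mm^2 -> cm^2

for area in (100, 400, 600):
    print(f"{area} mm^2 tile -> ~{die_yield(area):.0%} yield")
# ~90% for 100 mm^2 vs ~55% for 600 mm^2: smaller top tiles waste much less silicon per defect.
```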

And once you look at the Falcon Shores design, which needs x86 chiplets as well, why not build them that way to start with? Less engineering, faster time to market, etc.

Edit: I also want to point out that Foveros is not something fringe; with Meteor Lake as the only 14th-gen client part, it will be mass-produced in tens of millions of units by mid '23. Intel also has had good experience with very-high-TDP Foveros designs in Ponte Vecchio and Rialto Bridge. It's high time to bring those learnings to mainstream server designs as well in '24.
 
Last edited: