> Any roadmap changes to Emerald Rapids, Granite Rapids and Sierra Forest?

Intel's main server team, and I can't believe I'm saying this, actually seems to be in a decent enough state now, from what I've been hearing. If nothing else, it's in far better shape than the GPU team, low bar though that may be. I doubt we'll see significant shifts in those product timelines at this point. And in terms of cost cutting, they would sacrifice pretty much anything else first.
> Intel's main server team, and I can't believe I'm saying this, actually seems to be in a decent enough state now, from what I've been hearing.

Glad to know that they are ready to underwhelm us again.
Is it possible for the Granite Rapids core to use substantially more HD cells in its design to reduce its footprint, and make the (what, 128-core?) GNR across 3 tiles seem much more plausible?
GLC primarily using Intel 7 HP cells would make sense, given how it keeps clocks higher than Zen 4 does on TSMC 5nm, a newer and much more mature node.
Regarding maturity, Intel 7 is just an advanced version of the 10nm process.
> I don't think they were planning on GNR cores using HD cells... like at all. Remember, GNR was originally on Intel 4. Intel 4 only has HP cells.

If they weren't planning that from the beginning, you mean. The density gains of Intel 4 are based on the HP library only, while Intel 3 should offer some improvement there, plus the availability of more libraries.
The Granite Rapids package is large, very large, so physically there is enough space to put 128 cores over 3 tiles. And the non-core portions are largely being moved to other tiles, so there's a big reduction there. Even on Intel 7 the Golden Cove server core is at most about 15mm2, so only about 230mm2 of each Sapphire Rapids tile is actually core area.
Bump the size back up to the 450mm2 range, shrink with Intel 3, make the core changes targeted ones rather than a radical expansion (plus some from RWC+ in Granite Rapids), move some I/O off to separate tiles, and it's very doable, especially if you consider they might push the tiles closer to the 500mm2 range. Also, some of us speculate that the Core line is oversized because of its high clock targets, so if they are bothering to do RWC+ modifications, maybe we'll get a bigger-than-normal shrink.
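For a rough sense of the arithmetic, here is a back-of-the-envelope sketch. Every input is speculation from this thread (the ~15mm2 Golden Cove server core, a guessed Intel 7 to Intel 3 area scaling, a guessed overhead for the on-tile mesh/fabric), not any official figure:

```python
# All inputs are speculative forum estimates, not Intel data.
CORE_AREA_INTEL7_MM2 = 15.0  # ~upper bound for a Golden Cove server core on Intel 7
INTEL3_SCALING = 0.5         # assumed area shrink factor, Intel 7 -> Intel 3 (a guess)
CORES = 128                  # rumored GNR core count
TILES = 3                    # rumored compute-tile count
OVERHEAD = 1.15              # assumed +15% per tile for mesh/fabric left on-die

core_area = CORE_AREA_INTEL7_MM2 * INTEL3_SCALING
tile_area = (CORES / TILES) * core_area * OVERHEAD
print(f"~{core_area:.1f} mm2 per core -> ~{tile_area:.0f} mm2 per compute tile")
```

Under those guesses each compute tile lands around 370mm2, in the same ballpark as the 400mm2-class tiles discussed here; shift any input by the ±20% mentioned later in the thread and the answer moves accordingly.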
If we're expecting them to go AMD-style with many 80-100mm2 dies, then sure. But it looks like Intel is instead using the chiplet strategy to go beyond traditional die-size limitations.
Look at companies that execute better like Nvidia. A die exceeding 800mm2 and no one bats an eye. But 400mm2 on Sapphire Rapids and suddenly the world is over?
Intel has traditionally been the leader in processes for high-clocked CPUs, while foundries like TSMC have been better at lower-leakage, higher-density designs.
Now, I don't know how much of that still holds, but they've been doing it for so long that it's pretty much culture.
> Regarding maturity, Intel 7 is just an advanced version of the 10nm process.

As for maturity, I shouldn't have said that. What I meant is that TSMC 5nm is a newer node, sure, and usually newer nodes also have a lower Fmax because they aren't as well optimized yet. But by the time Zen 4 arrived, TSMC 5nm was not in that situation.
> I'm aware you can't compare node names between companies, yes...

@Geddagod Even back in the good old days you couldn't directly compare processes using the naming.
For example, Intel themselves admitted that their 20nm-class 22nm process was only a 30% density increase over TSMC 28nm (the equivalent of Intel's 32nm generation). That means TSMC's 20nm-class beat Intel's 20nm-class in density by about 50%. Intel's lead of 2-2.5 years came from shipping a year or more earlier, plus transistors that performed vastly better.
Breaking down the 2-2.5 year lead: 1.5 years from coming to market earlier, 1.5 years from performance, minus 0.5 years from density.
(they claimed 4 years lead based on technology introductions, but no one sane would say Intel 22nm is anywhere comparable to TSMC 16nm. The latter is better in almost all aspects)
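Working through that density arithmetic: the 1.3x Intel 22nm vs TSMC 28nm figure is from the post above, while the ~1.9x TSMC 28nm-to-20nm scaling is my assumed full-node density jump, not a sourced number:

```python
# 1.3x is the figure cited above; 1.9x is an assumed 28nm -> 20nm density jump.
intel22_vs_tsmc28 = 1.3
tsmc20_vs_tsmc28 = 1.9
tsmc20_vs_intel22 = tsmc20_vs_tsmc28 / intel22_vs_tsmc28
print(f"TSMC 20nm-class vs Intel 22nm density: +{(tsmc20_vs_intel22 - 1) * 100:.0f}%")
```

That works out to roughly +46%, close to the ~50% figure the post cites.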
This means the almighty Intel TD team was subject to physics like everyone else: faster transistors, but lower density. Sure, they executed well.
Plus, they never competed directly back then, so who cares? Literally no one outside Intel management cared about density comparisons to foundries.
With 14nm, Intel tried to have their cake and eat it too, and with 10nm they pushed even further. So they failed the first time, and failed miserably the second.
Overall, I'll just say that if Intel plopped RWC into a 132-core-max GNR divided into 3 tiles, I would not be surprised if each compute tile were still not far off the die size of each SPR tile. In other words, bye bye profit margins.

SPR tiles being over 400mm2 is terrible for Intel though, because profit margins are going to suck.
AMD is also using chiplets to go beyond traditional die limitations, just with smaller chiplets for better economics.
> Even if SPR performed as well as Genoa, the choice to make larger "chiplets" is going to cause lower profit margins regardless. Being able to act as a 'monolithic' CPU is the only upside it looks like.

Let me rephrase that for you: a mediocre product that at best slots into the mid-to-high range sucks for profit margins.
> They made a ton of profit with 600mm2 dies.

When there was no competition?
> Larger tiles = worse yields = higher costs.

You are overestimating the impact of die size.
> If they can get it out by 2024 with ~120 cores, then they'll be in a way better position than Sapphire Rapids is, because it has a ballpark chance at the high end.

Maybe so, but that doesn't change the fact that Intel would like it way more if they were able to shrink their tiles considerably. Like, I don't get the point here?
> (Also, even 400mm2 for a new process is not that surprising. Historically it was 400mm2 for a Tick, and the Tock brought it back up to 600mm2+.)

The problem is that chiplets make it possible to greatly cut those costs. Intel is not using perhaps the most significant advantage of chiplets, cost cutting, and is instead suffering financially from its large dies.
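To illustrate why small chiplets cut costs, here is a minimal sketch using the simple Poisson yield model Y = exp(-A*D0), with the 0.5 defects/cm2 rule-of-thumb figure mentioned elsewhere in this thread as an assumed defect density; real fab economics involve much more than this:

```python
import math

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-A * D0), with area converted to cm^2."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

D0 = 0.5  # assumed defects per cm^2
for area in (80, 400):  # AMD-style chiplet vs an SPR-sized tile
    y = poisson_yield(area, D0)
    # 1/y approximates the relative silicon cost per *good* die of that size
    print(f"{area:>3} mm2: yield ~{y:.0%}, cost multiplier per good die ~{1 / y:.1f}x")
```

Under this toy model an 80mm2 chiplet yields about 67% while a 400mm2 tile yields about 14% at the same defect density, which is exactly the cost lever being argued about here.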
> And consider that the die size estimates can easily swing from armchair-engineer speculation by +-20%, depending on many, many factors. For instance, if they are putting in enough uarch changes to make it "Granite-wood Cove", what if they rework the blocks to make them smaller, since the frequency focus won't be as insane as on desktop?

The entire point of that was an estimate of what a 132-core GNR on Intel 4 might have looked like. I'm not claiming that's going to be the size of the actual GNR on Intel 3 with a new core. Seeing how large that would have been, it's pretty clear Intel would want to take advantage of the newer node and new libraries to shrink the core and, again, reduce die size.
> Like how we were wrong about the Alder Lake die being humongous due to the Golden Cove server core's size. Because even if some of us are engineers, none of us are Intel or AMD engineers.

I'm curious, who estimated that?
> You are saying one way is better, but I am saying it's so far the successful approach, and each has been successful in its own way.

One way certainly is better for cost. It might not end up being as good for performance, but so far AMD doesn't seem to be struggling there. Even if Intel can produce these large ~400mm2 dies and package them while staying competitive, it's still fair to say AMD's way is better, because their dies cost less to produce.
Larger tiles = worse yields = higher costs.
> It would be nice to know the actual yield data on 10ESF/Intel 7.

As soon as™ IFS is semi-independent and pushing for more outside customers, I think we'll get such numbers.
> As soon as™ IFS is semi-independent and pushing for more outside customers, I think we'll get such numbers.

Doubt it. Intel 7 isn't even being offered as part of IFS, and even if it were, fabs tend to keep defect density numbers quite close to their chests. A general rule of thumb, however, is that around 0.5 DD is when you see volume production. I'd imagine Intel hit that around Tiger Lake.
> Doubt it. Intel 7 isn't even being offered as part of IFS, and even if it were, fabs tend to keep defect density numbers quite close to their chests. A general rule of thumb, however, is that around 0.5 DD is when you see volume production. I'd imagine Intel hit that around Tiger Lake.

TSMC released this halfway through 2020:
> Someone might want to double check me on this... but is Zen 4's L2 cache data array larger than RWC's? It makes little sense to me, especially considering that on paper TSMC's 5nm is marginally better than Intel 4's SRAM density, but...

AMD's L2/L3 caches were never all that dense anyway, and now the L2 also includes TSVs for V-Cache, so I'm not that surprised.
> And before I go to sleep, what do you all think about the rumors that EMR is going to have 5MB of L3 per core?

Just wanted to comment on this. It's certainly an interesting choice gen over gen. It almost makes one think they under-specced the L3 with SPR. Or perhaps it was simply a less disruptive change than adding >64 cores. There will probably be a few niche cases where such a large unified LLC makes EMR an interesting option, but obviously not enough to save it vs Genoa.
I would believe more cache, but jumping to 5MB of L3 per core is a >2.5x increase per core.
If true, that just seems like a huge waste of L3$ on a product that I think is still going to get stomped by Genoa and Genoa/X in most workloads.
That would place it at 320MB of total L3 cache.

I mean, that would also place it not too far behind Genoa's 384MB of L3, all while being able to access the shared L3 across tiles much faster than Genoa can. But even if this setup is more beneficial, the perf/watt is way too far behind for it to matter much anyway, imo.
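The rumor's arithmetic, for reference. The 64-core count and the 5MB/core figure are the rumored numbers discussed here, and 1.875MB/core is SPR's slice size from later in the thread:

```python
cores = 64                 # rumored EMR core count
emr_l3_per_core = 5.0      # MB per core, rumored
spr_l3_per_core = 1.875    # MB per core, SPR's L3 slice
total_l3 = cores * emr_l3_per_core
growth = emr_l3_per_core / spr_l3_per_core
print(f"Total L3: {total_l3:.0f} MB (Genoa: 384 MB); per-core jump: {growth:.2f}x")
```

That gives the 320MB total and the >2.5x (about 2.67x) per-core jump mentioned above.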
> Even if SPR performed as well as Genoa, the choice to make larger "chiplets" is going to cause lower profit margins regardless. Being able to act as a 'monolithic' CPU is the only upside it looks like.

I think the subject of "optimal" chiplet sizes deserves a bit more nuance. A larger chiplet decreases yields, yes, but it also means less interconnect overhead (in both area and power) and a larger L3 domain (particularly useful for VM bucketing). AMD's solution is empirically successful, but I don't think it's necessarily the only viable path. And obviously the equilibrium depends heavily on what packaging tech is available.
> AMD's L2/L3 caches were never all that dense anyway, and now the L2 also includes TSVs for V-Cache, so I'm not that surprised.

Good point. Something interesting I found is that with Zen 1(?) and Zen 2, the L3 cache used HCC cells, and it wasn't until Zen 3 that they switched over to denser HD SRAM cells. They claimed that resulted in a 14% reduction in area. And funnily enough, 1MB of Zen 4's L3 is also ~15% smaller in area (from my own measurements, since I couldn't find the data online) than their 1MB L2 array.
> TSMC released this halfway through 2020:

Thanks. That's an interesting slide. It's been a while since I heard the 0.5 number, but it encompassed TSMC at the time as well, so I'm curious about the disconnect. Perhaps 0.5 is the earliest possible point, but TSMC's lead customers (Apple, historically Huawei) need better than that, pushing back the actual start of volume production.
View attachment 77713
Looks like TSMC starts HVM between 0.2 and 0.1 DD.
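Plugging those thresholds into a simple Poisson yield model (Y = exp(-A*D0)) shows why the HVM cutoff matters. The ~100mm2 die here is a hypothetical mobile-SoC-class chip, not any specific product:

```python
import math

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    # Poisson defect model: Y = exp(-A * D0), with area converted to cm^2
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

AREA = 100.0  # hypothetical mobile-class die, mm^2
for d0 in (0.5, 0.2, 0.1):  # rule-of-thumb DD vs TSMC's apparent HVM range
    print(f"D0 = {d0}: yield ~{poisson_yield(AREA, d0):.0%}")
```

Under this toy model, waiting for a DD of 0.1-0.2 instead of 0.5 is roughly the difference between ~60% and ~80-90% yield for a lead customer's die.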
1.875MB of L3, combined with 2MB of private(?) L2, for GLC. SNC had 1.5MB of L3 combined with 1.25MB of shared(?) L2. Seems like a decent enough uplift from the previous architecture. Certainly weird that the L3 slice is smaller than the L2.
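The per-core totals from those figures (cache sizes exactly as stated above, per core):

```python
glc = {"L2": 2.0, "L3": 1.875}   # Golden Cove (SPR), MB per core
snc = {"L2": 1.25, "L3": 1.5}    # Sunny Cove server, MB per core
glc_total = sum(glc.values())
snc_total = sum(snc.values())
print(f"GLC: {glc_total} MB/core vs SNC: {snc_total} MB/core "
      f"(+{(glc_total / snc_total - 1) * 100:.0f}%)")
```

So about a 41% increase in combined L2+L3 capacity per core generation over generation.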