Discussion Intel current and future Lakes & Rapids thread

Page 766

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Any roadmap changes to Emerald Rapids, Granite Rapids and Sierra Forest?
Intel's main server team, and I can't believe I'm saying this, actually seems to be in a decent enough state now, from what I've been hearing. If nothing else, far better than the GPU team, low bar though that may be. I doubt we'll see significant shifts in those product timelines at this point. And in terms of cost cutting, they would sacrifice pretty much anything else first.
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
Is it possible for the core in Granite Rapids to use quite a few more HD cells in its design to reduce its footprint, and make the, what, 128-core GNR across 3 tiles seem much more plausible?
Also, is RWC vs GLC a fair area-to-area comparison, considering that RWC could only be built out of Intel 4 HP cells? Or does GLC primarily use Intel 7 HP cells as well?
GLC using Intel 7 HP cells primarily would make sense given how they are able to keep clocks higher than Zen 4 on TSMC 5nm, a newer and much more mature node. However, it is also hard to reconcile that with the idea that HD cells would constitute most of the core, with HP or UHP cells only in critical blocks.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Is it possible for the core in Granite Rapids to use quite a few more HD cells in its design to reduce its footprint, and make the, what, 128-core GNR across 3 tiles seem much more plausible?

If they weren't planning that from the beginning, you mean. The density gains of Intel 4 are based on the HP library only, while Intel 3 should offer some improvement there, plus the availability of more libraries.

The Granite Rapids package is large, very large. So physically there is enough space to put 128 cores over 3 tiles. And the non-core portions are largely being moved to other tiles, so there's a big reduction there. Even on Intel 7 the Golden Cove server core is at most about 15mm2, so the cores account for only about 230mm2 of Sapphire Rapids.

Bump the size back up to the 450mm2 range, shrink with Intel 3, keep the core changes targeted rather than a radical expansion (plus some from RWC+ in Granite Rapids), move some I/O off to separate tiles, and it's very doable, especially if you consider they might push it closer to the 500mm2 range. Also, some of us speculate that due to high clock requirements the Core line is too big, and if they are bothering to do RWC+ modifications, maybe we'll get a bigger than normal shrink.

If we're expecting them to do AMD-style with many 80-100mm2 dies, then sure. But it looks like Intel is using the chiplet strategy to go beyond traditional die limitations.

Look at companies that execute better, like Nvidia. A die exceeding 800mm2 and no one bats an eye. But 400mm2 on Sapphire Rapids and suddenly the world is over?

GLC using Intel 7 HP cells primarily would make sense given how they are able to keep clocks higher than Zen 4 on TSMC 5nm, a newer and much more mature node,

Intel has traditionally been the leader in making processes for high-clocked CPUs, while foundries like TSMC have been better at making lower-leakage/higher-density designs.

Now, I don't know how much of that still holds, but they've been doing it for so long it's pretty much culture.

Regarding maturity: Intel 7 is just an advanced version of the 10nm process.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
@Geddagod Even back in the good old days you couldn't directly compare processes using the naming.

For example, Intel themselves admitted that their 20nm-class 22nm process was only 30% denser than TSMC 28nm (equivalent to Intel's 32nm generation). Therefore, TSMC's 20nm class beat Intel's 20nm class in density by 50%. Intel's lead of 2-2.5 years was due to them coming out a year or more earlier, plus their transistors performing vastly better.

2-2.5 years = 1.5 years from coming out earlier, plus 1.5 years from performance, minus 0.5 years from density.

(They claimed a 4-year lead based on technology introductions, but no one sane would say Intel 22nm is anywhere comparable to TSMC 16nm. The latter is better in almost all aspects.)

This means the almighty Intel TD team was subject to physics like everyone else. Faster, but lower density. Sure, they executed well.

Plus, they never competed directly back then, so who cared? There was literally no one outside Intel management that cared about density comparisons to foundries.

With 14nm Intel tried to have their cake and eat it too, and with 10nm they tried for even more. So they stumbled the first time, and failed miserably the second.
 

DrMrLordX

Lifer
Apr 27, 2000
21,316
10,497
136
Regarding maturity: Intel 7 is just an advanced version of the 10nm process.

Seems like they've had: 10nm (Cannon Lake), 10nm+ (Ice Lake-U/SP), 10SF (Tiger Lake), 10ESF/Intel 7 (Alder Lake, possibly Sapphire Rapids), 10ESF+/Intel 7 "Super" (Raptor Lake, possibly Sapphire Rapids, probably Emerald Rapids)
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
If they weren't planning that from the beginning, you mean. The density gains of Intel 4 are based on the HP library only, while Intel 3 should offer some improvement there, plus the availability of more libraries.

The Granite Rapids package is large, very large. So physically there is enough space to put 128 cores over 3 tiles. And the non-core portions are largely being moved to other tiles, so there's a big reduction there. Even on Intel 7 the Golden Cove server core is at most about 15mm2, so the cores account for only about 230mm2 of Sapphire Rapids.

Bump the size back up to the 450mm2 range, shrink with Intel 3, keep the core changes targeted rather than a radical expansion (plus some from RWC+ in Granite Rapids), move some I/O off to separate tiles, and it's very doable, especially if you consider they might push it closer to the 500mm2 range. Also, some of us speculate that due to high clock requirements the Core line is too big, and if they are bothering to do RWC+ modifications, maybe we'll get a bigger than normal shrink.

If we're expecting them to do AMD-style with many 80-100mm2 dies, then sure. But it looks like Intel is using the chiplet strategy to go beyond traditional die limitations.

Look at companies that execute better, like Nvidia. A die exceeding 800mm2 and no one bats an eye. But 400mm2 on Sapphire Rapids and suddenly the world is over?



Intel has traditionally been the leader in making processes for high-clocked CPUs, while foundries like TSMC have been better at making lower-leakage/higher-density designs.

Now, I don't know how much of that still holds, but they've been doing it for so long it's pretty much culture.

Regarding maturity: Intel 7 is just an advanced version of the 10nm process.
I don't think they were planning on GNR cores using HD cells... like at all. Remember, GNR was originally on Intel 4. Intel 4 only has HP cells.
AMD is also using chiplets to go beyond traditional die limitations, just with smaller chiplets for better economics.
SPR tiles being over 400mm2 is terrible for Intel though, because profit margins are going to suck. Say what you want about Nvidia, but their GPUs sell at insanely high profit margins because of their software lock-in and the lack of AMD competition in servers, so they can afford to build those massive 800mm^2 dies. Intel doesn't have that luxury.
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
If they weren't planning that from the beginning, you mean. The density gains of Intel 4 are based on the HP library only, while Intel 3 should offer some improvement there, plus the availability of more libraries.

The Granite Rapids package is large, very large. So physically there is enough space to put 128 cores over 3 tiles. And the non-core portions are largely being moved to other tiles, so there's a big reduction there. Even on Intel 7 the Golden Cove server core is at most about 15mm2, so the cores account for only about 230mm2 of Sapphire Rapids.

Bump the size back up to the 450mm2 range, shrink with Intel 3, keep the core changes targeted rather than a radical expansion (plus some from RWC+ in Granite Rapids), move some I/O off to separate tiles, and it's very doable, especially if you consider they might push it closer to the 500mm2 range. Also, some of us speculate that due to high clock requirements the Core line is too big, and if they are bothering to do RWC+ modifications, maybe we'll get a bigger than normal shrink.

If we're expecting them to do AMD-style with many 80-100mm2 dies, then sure. But it looks like Intel is using the chiplet strategy to go beyond traditional die limitations.

Look at companies that execute better, like Nvidia. A die exceeding 800mm2 and no one bats an eye. But 400mm2 on Sapphire Rapids and suddenly the world is over?



Intel has traditionally been the leader in making processes for high-clocked CPUs, while foundries like TSMC have been better at making lower-leakage/higher-density designs.

Now, I don't know how much of that still holds, but they've been doing it for so long it's pretty much culture.

Regarding maturity: Intel 7 is just an advanced version of the 10nm process.
As for maturity, I shouldn't have said that. What I meant is that TSMC 5nm is a newer node, sure, but newer nodes usually also have a lower Fmax, because they aren't as well optimized yet. TSMC 5nm, by the time Zen 4 launched, was not in that situation.
But if I wanted to be pedantic, I could argue that TSMC 5nm is still more mature than Intel 7. Intel 7 might be based on Intel 10nm, sure, but Intel 10nm was horrendously broken and had to undergo major changes (see SuperFin) before products that actually yielded well and reached decent clocks could be released. By the time Intel 10SF was rolling out in Tiger Lake, near the end of 2020, TSMC 5nm had already been in mass production for half a year. And that's not to mention that Intel's 10nm-class nodes still had ramp problems, with Intel at first limited to low-volume quad-core Tiger Lake chips. I think there's merit to this as well; TSMC 5nm faced none of these problems.
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
@Geddagod Even back in the good old days you couldn't directly compare processes using the naming.

For example, Intel themselves admitted that their 20nm-class 22nm process was only 30% denser than TSMC 28nm (equivalent to Intel's 32nm generation). Therefore, TSMC's 20nm class beat Intel's 20nm class in density by 50%. Intel's lead of 2-2.5 years was due to them coming out a year or more earlier, plus their transistors performing vastly better.

2-2.5 years = 1.5 years from coming out earlier, plus 1.5 years from performance, minus 0.5 years from density.

(They claimed a 4-year lead based on technology introductions, but no one sane would say Intel 22nm is anywhere comparable to TSMC 16nm. The latter is better in almost all aspects.)

This means the almighty Intel TD team was subject to physics like everyone else. Faster, but lower density. Sure, they executed well.

Plus, they never competed directly back then, so who cared? There was literally no one outside Intel management that cared about density comparisons to foundries.

With 14nm Intel tried to have their cake and eat it too, and with 10nm they tried for even more. So they stumbled the first time, and failed miserably the second.
I'm aware you can't compare node names between companies, yes...
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
Also, just for fun, estimating GNR tile sizes:
Each tile is rumored to have 44 cores. Let's assume the core is client RWC. First of all, this is a very optimistic approach: the mesh stops take up a bunch of space, and the AMX extension to each core isn't there either. I also believe GLC server has an extra set of EUs not present in client GLC or RWC, so that also increases size.
But let's ignore all that. 5.33mm^2 (RWC client core + L2) x 44 = ~235mm^2. This isn't including L3 cache, and unfortunately I couldn't find a die shot analysis of RWC that includes the L3$ size, so eyeballing it, the 3MB block of L3 on MTL looks like ~1/4 the size of the total RWC core, which comes in at 1.33mm^2. Now, let's keep the L3 per core of RWC server the same as GLC, at 1.875MB, so 1.875/3 x 1.33 x 44 = ~37mm^2. So around ~275mm^2 at the lowest, low end. Using that figure, I guess we could see it as an improvement over the 400mm^2 of SPR tiles, but nearly 300mm^2 for a new Intel 3 node is, at the very least, suboptimal for profit margins compared to what AMD is doing with their chiplets.
I love theorycrafting, so I'm going to continue to make a bunch of bad assumptions and try to estimate the size of each GNR tile. The AMX extension looks to be ~0.8mm^2 on GLC; assuming a best-case scenario of a 2x shrink (though the best case I saw in RWC vs GLC core components was a ~40% shrink in the int register file), we add another 0.4mm^2 to each RWC core, for 5.73mm^2. The mesh agent looks to be around another ~0.4mm^2, so +0.2mm^2 for RWC, bringing it to 5.93mm^2. And yeah, about the extra EUs: not sure if RWC has them or not (not mentioned in the die shot analysis), so I won't be adding that at all.
This would place the shrink of RWC server vs GLC server (I'm sure the size of this is somewhere on the internet, but I couldn't find it, so estimated from the die shot to be ~10mm^2, not including L3 and power banks) at a ~40% reduction in area.
Adding those numbers back into the GNR tile area gives 5.93 x 44 + 37 = ~300mm^2.
But we also forgot to add the memory controllers, which are rumored to be on the compute tiles.
On SPR, the IMC was around the same size as an entire core + L3, so ~12.2 / 2 = 6.1mm^2, and the DDR5 PHY adds another ~6.9mm^2, so roughly +15mm^2.
And as for EMIB, I'm just completely ignoring that. That should add a bunch to the die size as well.
Btw, all my estimations were just done by comparing structures in die shots in diagram.io to estimate approximate areas for each unit relative to each other, plus the known info that the entire GLC client core is 7.123mm^2 w/o L3$.
Overall, I'm just going to say: if Intel plopped RWC into a max 132-core GNR, divided into 3 tiles, I would not be surprised if each compute tile were still not too far from the die size of each SPR tile. In other words, bye-bye profit margins.
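The back-of-the-envelope math above can be sketched in a few lines (a rough reproduction of the estimate, using the thread's own eyeballed areas, not official figures):

```python
# Rough reproduction of the GNR tile-size estimate above.
# Every per-block area (mm^2) is an eyeballed guess from the thread.
RWC_CORE_PLUS_L2 = 5.33   # client RWC core + L2
AMX = 0.8 / 2             # GLC AMX block, assuming a best-case 2x shrink
MESH_AGENT = 0.4 / 2      # mesh agent, same optimistic halving
CORES_PER_TILE = 44       # rumored core count per tile
L3_3MB_SLICE = 1.33       # ~1/4 of an RWC core for a 3MB MTL L3 slice
L3_PER_CORE_MB = 1.875    # keep GLC's L3-per-core ratio

core = RWC_CORE_PLUS_L2 + AMX + MESH_AGENT                      # ~5.93 mm^2
l3_total = L3_PER_CORE_MB / 3 * L3_3MB_SLICE * CORES_PER_TILE   # ~37 mm^2
tile = core * CORES_PER_TILE + l3_total                         # ~300 mm^2
print(f"{core:.2f} {l3_total:.0f} {tile:.0f}")
```

Memory controllers (~15mm^2) and EMIB overhead would come on top of this, as noted above.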
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
And before I go to sleep, what do you all think about the rumors that EMR is going to have 5MB of L3 per core?
I would believe more cache, but jumping to 5MB of L3 per core is a >2.5x increase in L3 per core.
If true, that just seems like a huge waste of L3$ on a product that I think is still going to get stomped by Genoa and Genoa-X in most workloads.
This would place it at 320MB of total L3 cache.
I mean, this would also place it not too far behind the 384MB of L3 on Genoa, all while being able to access the shared L3 across chiplets much faster than Genoa can. But even if this setup is more beneficial, the perf/watt is way too far behind for it to matter much anyway, imo.
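A quick sanity check on those totals (the 64-core count for EMR is the rumored figure):

```python
# Rumored EMR config: 64 cores x 5MB of L3 per core.
emr_l3_total = 64 * 5
print(emr_l3_total)        # 320 (MB)

# Per-core jump vs SPR's 1.875MB L3 slice:
print(5 / 1.875)           # ~2.67x, i.e. >2.5x

# Genoa for comparison: 12 CCDs x 32MB of L3 each.
print(12 * 32)             # 384 (MB)
```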
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Overall, I'm just going to say: if Intel plopped RWC into a max 132-core GNR, divided into 3 tiles, I would not be surprised if each compute tile were still not too far from the die size of each SPR tile. In other words, bye-bye profit margins.

SPR tiles being over 400mm2 is terrible for Intel though, because profit margins are going to suck.

Let me rephrase that for you: a mediocre product that at best slots into the mid-to-high range is what sucks for profit margins.

They made a ton of profits with 600mm2 dies. You are overestimating the impact of die size. If they can get it out by 2024 with ~120 cores, then they'll be in a way better position than Sapphire Rapids is, because it has a ballpark chance at the high end. The estimates of Turin having 192 cores make no sense based on what they have revealed. They are closer to 120 than to 192.

(Also, even 400mm2 for a new process is not that surprising. Historically it was 400mm2 for a Tick, and the Tock brought it back up to 600mm2+.)

And consider that armchair-engineer die size estimates can easily swing by +-20% depending on many, many factors. Things like: are they putting in enough uarch changes to make it "Granite-wood Cove"? What if they rework the blocks to make them smaller, since the frequency focus won't be as insane as on desktop? Or what if Intel 3 really does have density improvements over Intel 4, as they say?

Like how we were wrong about the Alder Lake die being humongous based on Golden Cove server's size. Because even if some of us are engineers, none of us are Intel or AMD engineers. Nvidia said that for Pascal the circuitry was rebuilt for higher frequencies. Who would have theorycrafted that fact?

Golden Cove server doesn't have an extra set of execution units. Actually, I guess you mean the second AVX-512 unit, and yes, it does have one. But the rest of the architecture is identical.

I don't think they were planning on GNR cores using HD cells... like at all. Remember, GNR was originally on Intel 4. Intel 4 only has HP cells.
AMD is also using chiplets to go beyond traditional die limitations, just with smaller chiplets for better economics.

Intel 4 might have changed its scope significantly. They moved Granite Rapids off of it, so now they don't have to worry about that. The possibility of it being on HD is also something I read about, but I won't say it's a sure thing.

AMD's approach is way, way different: a dozen different tiny dies, versus as many cores as possible in the smallest number of tiles to make it monolithic-esque.

There's more than one way to skin a cat, and CPU engineers see it the same way. Why did AMD go for 4-4-4 decoders when Intel went 4-1-1? Or their distributed-port approach versus Intel's unified ports? A 64KB L1 with higher latency and lower associativity versus a 32KB one with lower latency and higher associativity? When Athlon was successful, people said that was the way to go. And Core 2 came out with similar things in certain areas but was better despite it. And the E-core team does its own thing.

You are saying one way is better, but I am saying it's just the approach that has been successful so far, and each has been successful in its own way.
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
Let me rephrase that for you: a mediocre product that at best slots into the mid-to-high range is what sucks for profit margins.
Even if SPR performed as well as Genoa, the choice to make larger "chiplets" is going to mean lower profit margins regardless. Being able to act as a 'monolithic' CPU looks like the only upside.
They made a ton of profits with 600mm2 dies.
When there was no competition?
You are overestimating the impact of die size.
Larger tiles = worse yields = higher costs.
If they can get it out by 2024 with ~120 cores, then they'll be in a way better position than Sapphire Rapids is, because it has a ballpark chance at the high end.
Maybe so, but that doesn't change the fact that Intel would like it way more if they were able to shrink their tiles considerably. Like, I don't get the point here?
(Also, even 400mm2 for a new process is not that surprising. Historically it was 400mm2 for a Tick, and the Tock brought it back up to 600mm2+.)
The problem is that chiplets enable you to greatly cut down on those costs. Intel is not using perhaps the most significant advantage of chiplets (cost cutting) and is instead suffering financially from using large dies.
And consider that armchair-engineer die size estimates can easily swing by +-20% depending on many, many factors. Things like: are they putting in enough uarch changes to make it "Granite-wood Cove"? What if they rework the blocks to make them smaller, since the frequency focus won't be as insane as on desktop?
The entire point of that was to estimate what a 132-core GNR on Intel 4 might have looked like. I'm not claiming it's going to be the size of the actual GNR on Intel 3 with a new core. Seeing how large that would have been, it's pretty clear Intel would want to take advantage of the newer node and new libraries to shrink the core and, again, reduce die size.
Like how we were wrong about the Alder Lake die being humongous based on Golden Cove server's size. Because even if some of us are engineers, none of us are Intel or AMD engineers.
I'm curious, who estimated that?
You are saying one way is better, but I am saying it's just the approach that has been successful so far, and each has been successful in its own way.
One way certainly is better for cost. It might not end up being as good for performance, but so far AMD doesn't seem to be struggling there. If Intel is able to produce these large 400mm^2 dies and package them while also being competitive, it's still fair to say that AMD's way is better because their chips cost less to produce.
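The "larger tiles = worse yields = higher costs" point can be sketched with the textbook Poisson yield model (the defect density and die sizes here are illustrative guesses, not Intel or TSMC figures):

```python
import math

def poisson_yield(defects_per_cm2: float, die_area_mm2: float) -> float:
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100)

# Same assumed defect density of 0.1/cm^2, two die sizes:
small = poisson_yield(0.1, 80)    # AMD-style ~80mm^2 chiplet
large = poisson_yield(0.1, 400)   # SPR-style ~400mm^2 tile
print(f"{small:.2f} {large:.2f}")  # ~0.92 vs ~0.67
```

So at the same defect density, the big tile throws away roughly a third of its candidates while the small chiplet loses under a tenth. Real yield models add clustering terms, but the direction is the same.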
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
Someone might want to double-check me on this...
but is Zen 4's L2 cache data array larger than RWC's? It makes little sense to me, especially considering that on paper TSMC 5nm's SRAM density is marginally better than Intel 4's, but...
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
As soon as™ IFS is semi-independent and pushing for more outside customers I think we'll get such numbers.
Doubt it. Intel 7 isn't even being offered as part of IFS, and even if it were, fabs tend to keep defect density numbers close to their chests. A general rule of thumb, however, is that around 0.5 DD is when you see volume production. I'd imagine Intel hit that around Tiger Lake.
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
Doubt it. Intel 7 isn't even being offered as part of IFS, and even if it were, fabs tend to keep defect density numbers close to their chests. A general rule of thumb, however, is that around 0.5 DD is when you see volume production. I'd imagine Intel hit that around Tiger Lake.
TSMC released this slide halfway through 2020:
[TSMC defect-density slide]
Looks like TSMC starts HVM between 0.2 and 0.1 DD.
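Plugging those DD numbers into the simple zero-defect yield formula shows why that range makes sense for HVM (the ~100mm^2 die is an arbitrary mobile-class example, not a specific product):

```python
import math

def poisson_yield(defects_per_cm2: float, die_area_mm2: float) -> float:
    """Zero-defect die fraction under a Poisson defect model."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100)

# What different defect densities mean for a ~100mm^2 die:
for dd in (0.5, 0.2, 0.1):
    print(dd, f"{poisson_yield(dd, 100):.0%}")
# 0.5 -> 61%, 0.2 -> 82%, 0.1 -> 90%
```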
 

uzzi38

Platinum Member
Oct 16, 2019
2,530
5,430
146
Someone might want to double-check me on this...
but is Zen 4's L2 cache data array larger than RWC's? It makes little sense to me, especially considering that on paper TSMC 5nm's SRAM density is marginally better than Intel 4's, but...
AMD's L2/L3 caches were never all that dense anyway, and now the L2 also includes TSVs for V-Cache, so I'm not that surprised.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
And before I go to sleep, what do you all think about the rumors that EMR is going to have 5MB of L3 per core?
I would believe more cache, but jumping to 5MB of L3 per core is a >2.5x increase in L3 per core.
If true, that just seems like a huge waste of L3$ on a product that I think is still going to get stomped by Genoa and Genoa-X in most workloads.
This would place it at 320MB of total L3 cache.
I mean, this would also place it not too far behind the 384MB of L3 on Genoa, all while being able to access the shared L3 across chiplets much faster than Genoa can. But even if this setup is more beneficial, the perf/watt is way too far behind for it to matter much anyway, imo.
Just wanted to comment on this. It's certainly an interesting choice gen/gen. Almost makes one think they underspecced the L3 with SPR. Or perhaps it was simply a less disruptive change than adding >64 cores. There will probably be a few niche cases where such a large unified LLC makes EMR an interesting option, but obviously not enough to save it vs Genoa.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Even if SPR performed as well as Genoa, the choice to make larger "chiplets" is going to mean lower profit margins regardless. Being able to act as a 'monolithic' CPU looks like the only upside.

When there was no competition?

Larger tiles = worse yields = higher costs.

Maybe so, but that doesn't change the fact that Intel would like it way more if they were able to shrink their tiles considerably. Like, I don't get the point here?

The problem is that chiplets enable you to greatly cut down on those costs. Intel is not using perhaps the most significant advantage of chiplets (cost cutting) and is instead suffering financially from using large dies.

The entire point of that was to estimate what a 132-core GNR on Intel 4 might have looked like. I'm not claiming it's going to be the size of the actual GNR on Intel 3 with a new core. Seeing how large that would have been, it's pretty clear Intel would want to take advantage of the newer node and new libraries to shrink the core and, again, reduce die size.

I'm curious, who estimated that?

One way certainly is better for cost. It might not end up being as good for performance, but so far AMD doesn't seem to be struggling there. If Intel is able to produce these large 400mm^2 dies and package them while also being competitive, it's still fair to say that AMD's way is better because their chips cost less to produce.
I think the subject of "optimal" chiplet sizes deserves a bit more nuance. A larger chiplet size decreases yields, yes, but it also means less interconnect overhead (in both area and power) and a larger L3 domain (particularly useful for VM bucketing). AMD's solution is empirically successful, but I don't think it's necessarily the only viable path. And obviously that equilibrium is heavily dependent on what packaging tech is available.

But I agree with IntelUser2000 here in that the specifics of their chiplet implementation aren't Intel's main problem right now. Sure, it weighs on their financials, but if they end up PnP-competitive with AMD, they can at least get decent revenue. And obviously, since Intel also fabs them, their effective wafer prices should be substantially cheaper than what AMD sees. Though with their talk of an "internal foundry model", that tradeoff might change somewhat.
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
AMD's L2/L3 caches were never all that dense anyway, and now the L2 also includes TSVs for V-Cache, so I'm not that surprised.
Good point. Something interesting I found: with Zen 1(?) and Zen 2, the L3 cache used HCC SRAM cells, and it wasn't until Zen 3 that they switched over to denser HD SRAM cells. They claimed that resulted in a 14% reduction in area. And funnily enough, 1MB of Zen 4's L3 is also ~15% smaller in area (from my own measurements, since I couldn't find that data online) than their 1MB array of L2.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
TSMC released this slide halfway through 2020:
[TSMC defect-density slide]
Looks like TSMC starts HVM between 0.2 and 0.1 DD.
Thanks. That's an interesting slide. It's been a while since I heard the 0.5 number, but it encompassed TSMC at the time as well, so I'm curious about the disconnect. Perhaps 0.5 is the earliest possible time, but for TSMC's lead customers (Apple, historically Huawei), they need better, pushing back the actual start of volume production.

Funny enough, I once heard Cannon Lake's DD number some years back. Not going to repeat it precisely, but let's just say that decimal point is going a long way to the right.
 

Geddagod

Golden Member
Dec 28, 2021
1,057
888
96
Just wanted to comment on this. It's certainly an interesting choice gen/gen. Almost makes one think they underspecced the L3 with SPR. Or perhaps it was simply a less disruptive change than adding >64 cores. There will probably be a few niche cases where such a large unified LLC makes EMR an interesting option, but obviously not enough to save it vs Genoa.
1.875MB of L3 combined with 2MB of private(?) L2 for GLC. SNC had 1.5MB of L3 combined with 1.25MB of shared(?) L2 cache. Seems like a decent enough uplift from the previous architecture. Certainly weird that the L3 slice is smaller than the L2.
Don't private caches have less effective capacity than a shared cache when cores are working on something that needs the same data? Since the data has to be replicated in each core's private L2, but in a shared cache there only has to be one instance of it? That might be a factor in why the L2 increased proportionally more than the L3 did between generations. Could be totally off base on this, though.
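That replication intuition can be put into rough numbers with a toy model (the core count, cache size, and sharing fraction here are made-up illustrations, not figures for any real chip):

```python
def effective_private_capacity(n_cores: int, l2_mb: float,
                               shared_frac: float) -> float:
    """Unique data held across N private caches when a fraction of each
    cache is filled with the same shared lines (duplicated per core)."""
    unique_per_core = l2_mb * (1 - shared_frac)
    shared_once = l2_mb * shared_frac  # only one copy's worth is unique
    return n_cores * unique_per_core + shared_once

# 8 cores x 2MB private L2, with 25% of each cache holding shared data:
print(effective_private_capacity(8, 2.0, 0.25))  # 12.5 (MB)
# versus a single 16MB shared cache, which holds 16MB of unique data
```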