Stupid question - L2 vs. L3 cache

Feb 25, 2011
So... I was running memtest on my older C2D laptop tonight, and noticed that it has 3MB of L2 cache, and no L3. My old C2Q rig had 6MB of L2 cache, also no L3.

My newer i5 laptop has 256KB of L2 and 3MB of L3. My i5 desktop has 6MB of L3, and 256KB of L2 per core.

So I consulted Wikipedia. Athlons and Phenoms followed a similar pattern - Athlon 64s had 1-2MB of L2 per core. The Phenom II X6 had 512KB per core and a big L3.

Why is the newer arrangement better? Is it a per-core vs. shared cache thing? Am I right in assuming that since it's all on-die, there's no speed hit from going down a level? (Or at least not much.)

Please use small words. I was a liberal arts major.

pantsaregood

Senior member
Feb 13, 2011
Core 2 shared L2 caches between pairs of cores. K8 and K10 both use per-core L2 caches, and K10 adds on a shared L3 cache.

Nehalem, Sandy Bridge, and Haswell follow the same approach as K10 does.

Higher cache levels tend to be significantly slower than lower cache levels, despite all being on-die.

sefsefsefsef

Senior member
Jun 21, 2007
The point of the last level cache (LLC), whether it is an L2 or L3, is to reduce the number of DRAM accesses (which are about 10x slower than cache hits). So having a bigger and bigger LLC will avoid a larger number of DRAM accesses (which is good), but as the size increases, the access latency also increases (which is bad, and defeats the whole point of the LLC in the first place). Sticking a medium-sized L2 in between the L1 and LLC is done to bridge the gap between the tradeoffs of long latency and more DRAM accesses.
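As a toy illustration of that tradeoff, here's a small Python sketch of the classic average-memory-access-time (AMAT) formula. All the latencies and miss rates below are invented round numbers for illustration, not measurements of any real chip:

```python
# Toy average-memory-access-time (AMAT) model. All numbers are
# invented round figures for illustration, not real measurements.

def amat(hit_latency, miss_rate, miss_penalty):
    """AMAT = hit latency + miss rate * miss penalty (all in cycles)."""
    return hit_latency + miss_rate * miss_penalty

DRAM_LATENCY = 200  # cycles: roughly an order of magnitude above a cache hit

# A bigger LLC hits more often, but takes longer to reach:
small_llc = amat(hit_latency=20, miss_rate=0.20, miss_penalty=DRAM_LATENCY)
large_llc = amat(hit_latency=40, miss_rate=0.10, miss_penalty=DRAM_LATENCY)

print(small_llc, large_llc)  # both work out to 60 cycles in this toy case
```

With these made-up numbers the two designs tie: doubling the LLC halves the DRAM traffic but also doubles the hit latency, which is exactly the tension that wedging a medium-sized L2 in between is meant to relieve.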

VirtualLarry

No Lifer
Aug 25, 2001
A large L2 is faster than a large L3, all other things being equal. Unfortunately, all other things are NOT equal when comparing the Core2Duo/Quad with newer architectures with L3 cache, so it's hard to compare them directly.

But some cache-heavy benchmarks actually favor the Q9650, with its 12MB of L2, over the 3770K with 8MB of L3. (Well, so I've heard.)

The minimalistic L2 with a large shared L3 is cheaper, that's for certain.

pelov

Diamond Member
Dec 6, 2011
sefsefsefsef said:
The point of the last level cache (LLC), whether it is an L2 or L3, is to reduce the number of DRAM accesses (which are about 10x slower than cache hits). So having a bigger and bigger LLC will avoid a larger number of DRAM accesses (which is good), but as the size increases, the access latency also increases (which is bad, and defeats the whole point of the LLC in the first place). Sticking a medium-sized L2 in between the L1 and LLC is done to bridge the gap between the tradeoffs of long latency and more DRAM accesses.

This ^.

Yes, there's a speed decrease in going up to the LLC, but how much of a speed hit depends on the architecture involved. For example, AMD's AM3+ chips have an L3 that runs at a separate speed, decoupled from the cores. This has something to do with HyperTransport (an FSB replacement), which allows for cache coherency and, if I'm not mistaken, determines the speed of the L3 as well(?). Intel has also decoupled the L3$ in Haswell and previously in Nehalem, but both Sandy and Ivy had a coupled L3$ that ran at the same speed as the cores. This explanation gets even messier when we consider an on-die GPU accessing that same L3, so let's just ignore that ;P
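To see why a decoupled cache clock matters, here's a hypothetical sketch converting a fixed L3 latency in cache-domain cycles into wall-clock time at different clock speeds. Both the cycle count and the clock frequencies are made up for illustration:

```python
# Hypothetical numbers: the same L3 latency measured in cache-domain
# cycles costs more wall-clock time when the cache runs on a slower clock.

def latency_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at a given clock speed."""
    return cycles / clock_ghz

L3_CYCLES = 30  # made-up L3 load-to-use latency, in L3-domain cycles

coupled = latency_ns(L3_CYCLES, 3.5)    # L3 clocked with the cores
decoupled = latency_ns(L3_CYCLES, 2.2)  # L3 on its own, slower clock

print(round(coupled, 2), round(decoupled, 2))  # 8.57 vs 13.64 ns
```

Same cache, same cycle count, noticeably different real-world latency once the clock domains diverge.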

A big L3 is generally a server feature and doesn't benefit client workloads all that much. It can help in gaming, but the benefits are in the single-digit percentages.

Tuna-Fish

Golden Member
Mar 4, 2011
Quoting the OP:
Please use small words. I was a liberal arts major.

Eh. Let's explain the entire point of multi-level caches.

SRAM cells, which caches are made of, are very fast to read. As in, you can do it in less than a tenth of a clock cycle. If it's right next to your ALU, you can treat it as instant. What is not fast is getting the read command to the cell and the data back. This means that as you make a cache bigger, it necessarily becomes slower: larger caches take more space on the die, so the signals in them must travel over a longer distance.

So it seems you have a choice between a fast small cache and a slow large one. This sucks, because you absolutely need a fast cache to keep the ALUs fed, but if your cache is very small, its hit rate will be abysmal and you will spend almost all your time waiting for memory.

So to solve this, we use a small fast cache *and* a large, slow one. This is the basic architecture used in most CPUs since the 80486 in '89.

However, as more transistors became available and the second cache level grew, it would eventually have gotten so slow that making it even bigger and slower would have hurt performance. So at that point they added a third cache level, between the large slow one and the fast small one, that's medium in both size and speed. This allowed the L3 to get bigger and slower (and be shared, for more efficient use) while still keeping a reasonably fast cache level for data that spills out of L1.
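Tuna-Fish's progression can be sketched with the same kind of toy arithmetic: fold each level's hit latency and local miss rate into one overall average access time, and compare a two-level hierarchy against a three-level one. Every number below is invented purely for illustration:

```python
# Toy model: fold a cache hierarchy into one average access time.
# All latencies (in cycles) and local miss rates are invented.

def amat(levels, dram_latency):
    """levels: list of (hit_latency, local_miss_rate), fastest level first."""
    penalty = dram_latency
    # Work backwards: each level's cost folds in the cost of missing it.
    for hit_latency, miss_rate in reversed(levels):
        penalty = hit_latency + miss_rate * penalty
    return penalty

DRAM = 200  # cycles

# Small fast L1 backed directly by a big slow cache (the pre-L3 world):
two_level = amat([(4, 0.10), (40, 0.20)], DRAM)
# Same L1 and big slow cache, with a medium level wedged in between:
three_level = amat([(4, 0.10), (12, 0.40), (40, 0.20)], DRAM)

print(two_level, three_level)  # about 12.0 vs 8.4 cycles
```

Even though the middle level misses often in this sketch, catching some L1 spills at medium latency beats sending all of them to the big slow cache, which is the whole argument for the three-level layout.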

podspi

Golden Member
Jan 11, 2011
And isn't Crystalwell technically a 4th-level cache now? I wonder if they will have Crystalwell Xeon SKUs...

Cerb

Elite Member
Aug 26, 2000
Quoting the OP:
Is it a per-core vs. shared cache thing?

Yes. In a Nehalem or newer Intel CPU, the combination of L1 and L2 for each core can be considered like the L1 from prior CPUs: a tightly-integrated part of each core.

On AMD's side, much of it is historical, and they have to be careful about what they spend money on. They were deriving their CPUs up to Llano from the K8, which owes a lot to the K7, so they preferred to tack things onto the K8, like the L3, vs. changing the cache hierarchy closer to the main logic (all models having some shared L3 instead of bigger exclusive L2s would probably have been better, but would have cost more to develop, too).

I think pretty much everything else has been covered.