Stupid question - L2 vs. L3 cache

Feb 25, 2011
So... I was running memtest on my older C2D laptop tonight, and noticed that it has 3MB of L2 cache, and no L3. My old C2Q rig had 6MB of L2 cache, also no L3.

My newer i5 laptop has 256KB of L2 and 3MB of L3. My i5 desktop has 6MB of L3, and 256KB of L2 per core.

So I consulted Wikipedia. Athlons and Phenoms followed a similar pattern - Athlon 64s had 1-2MB of L2 per core. The Phenom II X6 had 512KB per core and a big L3.

Why is the newer arrangement better? Is it a per-core vs. shared cache thing? Am I right in assuming that since it's all on-die, there's no speed hit from going down a level? (Or at least not much.)

Please use small words. I was a liberal arts major.

pantsaregood

Senior member
Feb 13, 2011
Core 2 shared L2 caches between pairs of cores. K8 and K10 both use per-core L2 caches, and K10 adds on a shared L3 cache.

Nehalem, Sandy Bridge, and Haswell follow the same approach as K10 does.

Higher cache levels tend to be significantly slower than lower cache levels, despite all being on-die.

sefsefsefsef

Senior member
Jun 21, 2007
The point of the last level cache (LLC), whether it is an L2 or L3, is to reduce the number of DRAM accesses (which are about 10x slower than cache hits). So having a bigger and bigger LLC will avoid a larger number of DRAM accesses (which is good), but as the size increases, the access latency also increases (which is bad, and defeats the whole point of the LLC in the first place). Sticking a medium-sized L2 in between the L1 and LLC is done to bridge the gap between the tradeoffs of long latency and more DRAM accesses.
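As a toy illustration of that tradeoff, here's a small Python sketch of the classic average-memory-access-time (AMAT) formula. All the latencies and miss rates below are invented round numbers for illustration, not measurements of any real chip:

```python
# Toy average-memory-access-time (AMAT) model. All numbers are
# invented round figures for illustration, not real measurements.

def amat(hit_latency, miss_rate, miss_penalty):
    """AMAT = hit latency + miss rate * miss penalty (all in cycles)."""
    return hit_latency + miss_rate * miss_penalty

DRAM_LATENCY = 200  # cycles: roughly an order of magnitude above a cache hit

# A bigger LLC hits more often, but takes longer to reach:
small_llc = amat(hit_latency=20, miss_rate=0.20, miss_penalty=DRAM_LATENCY)
large_llc = amat(hit_latency=40, miss_rate=0.10, miss_penalty=DRAM_LATENCY)

print(small_llc, large_llc)  # both work out to 60 cycles in this toy case
```

With these made-up numbers the two designs tie: doubling the LLC halves the DRAM traffic but also doubles the hit latency, which is exactly the tension that wedging a medium-sized L2 in between is meant to relieve.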

VirtualLarry

No Lifer
Aug 25, 2001
A large L2 is faster than a large L3, all other things being equal. Unfortunately, all other things are NOT equal when comparing the Core2Duo/Quad with newer architectures with L3 cache, so it's hard to compare them directly.

But some cache-heavy benchmarks actually favor the Q9650, with its 12MB of L2, over the 3770K with 8MB of L3. (Well, so I've heard.)

The minimalistic L2 with a large shared L3 is cheaper, that's for certain.

pelov

Diamond Member
Dec 6, 2011
sefsefsefsef said:
The point of the last level cache (LLC), whether it is an L2 or L3, is to reduce the number of DRAM accesses (which are about 10x slower than cache hits). So having a bigger and bigger LLC will avoid a larger number of DRAM accesses (which is good), but as the size increases, the access latency also increases (which is bad, and defeats the whole point of the LLC in the first place). Sticking a medium-sized L2 in between the L1 and LLC is done to bridge the gap between the tradeoffs of long latency and more DRAM accesses.

This ^.

Yes, there's a speed decrease in going up to the LLC, but how much of a speed hit depends on the architecture involved. For example, AMD's AM3+ chips have an L3 that runs at a separate speed, decoupled from the cores. This has something to do with HyperTransport (an FSB replacement), which allows for cache coherency and, if I'm not mistaken, determines the speed of the L3 as well(?). Intel has also decoupled the L3$ in Haswell and previously in Nehalem, but both Sandy and Ivy had a coupled L3$ that ran at the same speed as the cores. This explanation gets even messier when we consider an on-die GPU accessing that same L3, so let's just ignore that ;P
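To see why a decoupled cache clock matters, here's a hypothetical sketch converting a fixed L3 latency in cache-domain cycles into wall-clock time at different clock speeds. Both the cycle count and the clock frequencies are made up for illustration:

```python
# Hypothetical numbers: the same L3 latency measured in cache-domain
# cycles costs more wall-clock time when the cache runs on a slower clock.

def latency_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at a given clock speed."""
    return cycles / clock_ghz

L3_CYCLES = 30  # made-up L3 load-to-use latency, in L3-domain cycles

coupled = latency_ns(L3_CYCLES, 3.5)    # L3 clocked with the cores
decoupled = latency_ns(L3_CYCLES, 2.2)  # L3 on its own, slower clock

print(round(coupled, 2), round(decoupled, 2))  # 8.57 vs 13.64 ns
```

Same cache, same cycle count, noticeably different real-world latency once the clock domains diverge.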

A big L3 is generally a server feature and doesn't benefit client workloads all that much. It can help in gaming, but the benefits are in the single-digit percentages.

Tuna-Fish

Golden Member
Mar 4, 2011
Quoting the OP:
Please use small words. I was a liberal arts major.

Eh. Let's explain the entire point of multi-level caches.

SRAM cells, which caches are made of, are very fast to read. As in, you can do it in less than a tenth of a clock cycle. If it's right next to your ALU, you can treat it as instant. What is not fast is getting the read command to the cell and the data back. This means that as you make a cache bigger, it necessarily becomes slower: larger caches take more space on the die, so the signals in them must travel over a longer distance.

So it seems you have a choice between a fast small cache and a slow large one. This sucks, because you absolutely need a fast cache to keep the ALUs fed, but if your cache is very small, its hit rate will be abysmal and you will spend almost all your time waiting for memory.

So to solve this, we use a small fast cache *and* a large, slow one. This is the basic architecture used in most CPUs since the 80486 in '89.

However, as more transistors became available and the second cache level grew, it would eventually have gotten so slow that making it even bigger and slower would have hurt performance. So at that point they added a third cache level, between the large slow one and the fast small one, that's medium in both size and speed. This allowed the L3 to get bigger and slower (and be shared, for more efficient use) while still keeping a reasonably fast cache level for data that spills out of L1.
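Tuna-Fish's progression can be sketched with the same kind of toy arithmetic: fold each level's hit latency and local miss rate into one overall average access time, and compare a two-level hierarchy against a three-level one. Every number below is invented purely for illustration:

```python
# Toy model: fold a cache hierarchy into one average access time.
# All latencies (in cycles) and local miss rates are invented.

def amat(levels, dram_latency):
    """levels: list of (hit_latency, local_miss_rate), fastest level first."""
    penalty = dram_latency
    # Work backwards: each level's cost folds in the cost of missing it.
    for hit_latency, miss_rate in reversed(levels):
        penalty = hit_latency + miss_rate * penalty
    return penalty

DRAM = 200  # cycles

# Small fast L1 backed directly by a big slow cache (the pre-L3 world):
two_level = amat([(4, 0.10), (40, 0.20)], DRAM)
# Same L1 and big slow cache, with a medium level wedged in between:
three_level = amat([(4, 0.10), (12, 0.40), (40, 0.20)], DRAM)

print(two_level, three_level)  # about 12.0 vs 8.4 cycles
```

Even though the middle level misses often in this sketch, catching some L1 spills at medium latency beats sending all of them to the big slow cache, which is the whole argument for the three-level layout.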

podspi

Golden Member
Jan 11, 2011
And isn't Crystalwell technically a 4th-level cache now? I wonder if they will have Crystalwell Xeon SKUs...

Cerb

Elite Member
Aug 26, 2000
Quoting the OP:
Is it a per-core vs. shared cache thing?

Yes. In a Nehalem or newer Intel CPU, the combination of L1 and L2 for each core can be considered like the L1 from prior CPUs: a tightly-integrated part of each core.

On AMD's side, much of it is historical, and they have to be careful about what they spend money on. They were deriving their CPUs up to Llano from the K8, which owes a lot to the K7, so they preferred to tack things onto the K8, like the L3, vs. changing the cache hierarchy closer to the main logic (all models having some shared L3 instead of bigger exclusive L2s would probably have been better, but would have cost more to develop, too).

I think pretty much everything else has been covered.