Question regarding cache (AMD)

Thunder 57 · Apr 11, 2019

AMD finally seems to have a good cache structure all around with Zen, something they have not had in quite some time. Bulldozer was a mess all over, while Phenom had good L1 and L2 caches but an L3 that was never great.

So my question is, what was with the Phenom II's 48-way L3 cache, and BD's 64-way (!) L3? I've never seen anywhere near that many ways before or since. Did that contribute to their horrid latency? Was there some reasoning behind doing this?

Unrelated but fun trivia, since I was reading the CPU upgrade history thread: Duron had more L1 cache than L2 cache. Talk about odd!

EDIT: I just looked up the first K10 with the 2MB cache, and it was 32-way. That seems closer to normal, except it still seems like a lot of ways for such a small cache.
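For what it's worth, the set counts behind those way counts are quick to work out. This is only a sketch of the arithmetic (sets = capacity / (line size x ways)). The 64-byte line size and the 6 MB and 8 MB capacities for Phenom II and Bulldozer are assumptions, since the thread only gives the way counts and the 2MB K10 figure. `cache_geometry` is a made-up helper name.

```python
MIB = 1024 * 1024

def cache_geometry(size_bytes, ways, line_bytes=64):
    """Return (sets, sets_is_power_of_two) for a set-associative cache.

    line_bytes=64 is an assumption; the thread does not state the line size.
    """
    lines = size_bytes // line_bytes
    if lines % ways != 0:
        raise ValueError("ways must divide the number of lines evenly")
    sets = lines // ways
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    return sets, (sets & (sets - 1)) == 0

# 2 MB / 32-way comes from the thread. The 6 MB and 8 MB capacities are assumed.
configs = {
    "K10 2MB 32-way": (2 * MIB, 32),
    "Phenom II 6MB 48-way (capacity assumed)": (6 * MIB, 48),
    "Phenom II 6MB at 32-way (hypothetical)": (6 * MIB, 32),
    "Bulldozer 8MB 64-way (capacity assumed)": (8 * MIB, 64),
}
results = {name: cache_geometry(size, ways)
           for name, (size, ways) in configs.items()}

for name, (sets, pow2) in results.items():
    print(f"{name}: {sets} sets, power of two: {pow2}")
```

Under those assumptions, 48 ways is what keeps a 6 MB cache at a power-of-two number of sets (2048), whereas 32 ways would give 3072. That is one plausible reading of the odd number, not something the thread confirms as AMD's reasoning.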

VirtualLarry · Apr 11, 2019

I'm not a cache expert, but I think the number of "ways" is basically the number of cache lines that can be stored in the cache at once when they all share the SAME value in the address bits used to pick a location. So it's also kind of a function of how many address bits they want to use for the cache tags, etc. Fewer address bits means more aliasing, which means you need more "ways" for the cache to be effective.

Newer CPUs seem to use more address bits, and so need fewer "ways".

Then, there's the whole "Machine Intelligence" in Ryzen, which is used for Branch Prediction, I know that much. It might be used somehow for cache support as well, I don't know. (Maybe someone that is more expert with Zen architecture will chime in here.)

Nothingness · Apr 12, 2019

The ways are accessed in parallel when looking for an address in the cache (tag lookup). This means that the more ways you have the more power you burn (though that can be reduced with way prediction). But having more ways is a good way (pun intended) to reduce conflicts (and also aliasing for virtually indexed caches), so you need to make a trade off between cache efficiency and power.

The Wikipedia entry is a good read: https://en.wikipedia.org/wiki/CPU_cache
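The trade-off described above can be illustrated with a toy model. This is a minimal sketch, not how any real AMD or Intel cache is built: `SetAssocCache` and `run` are hypothetical names, LRU replacement and the 64-byte line size are assumptions, and `tag_compares` is only a crude stand-in for the power cost of checking every way in parallel.

```python
from collections import OrderedDict

class SetAssocCache:
    """Toy set-associative cache with LRU replacement (assumed policy)."""

    def __init__(self, num_sets, ways, line_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One ordered dict per set; the oldest entry is the LRU victim.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0
        self.tag_compares = 0  # crude proxy for lookup power

    def access(self, addr):
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways_in_set = self.sets[index]
        # Hardware checks every way of the selected set on each lookup.
        self.tag_compares += self.ways
        if tag in ways_in_set:
            ways_in_set.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(ways_in_set) >= self.ways:
            ways_in_set.popitem(last=False)  # evict least recently used
        ways_in_set[tag] = None
        return False

def run(num_sets, ways, rounds=10):
    """Repeatedly touch three lines that collide in a direct-mapped cache."""
    cache = SetAssocCache(num_sets, ways)
    addrs = [0 * 64, 8 * 64, 16 * 64]  # lines 0, 8, 16
    for _ in range(rounds):
        for addr in addrs:
            cache.access(addr)
    return cache

# Same total capacity (8 lines), different associativity.
direct = run(num_sets=8, ways=1)
four_way = run(num_sets=2, ways=4)
print("direct-mapped:", direct.hits, "hits,", direct.tag_compares, "compares")
print("4-way:        ", four_way.hits, "hits,", four_way.tag_compares, "compares")
```

In this toy run the direct-mapped cache thrashes on the three colliding lines, while the 4-way cache of the same size holds all of them after the first pass, at the price of four tag comparisons per lookup instead of one.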

Thunder 57 · Apr 12, 2019

Nothingness said:
The ways are accessed in parallel when looking for an address in the cache (tag lookup). This means that the more ways you have the more power you burn (though that can be reduced with way prediction). But having more ways is a good way (pun intended) to reduce conflicts (and also aliasing for virtually indexed caches), so you need to make a trade off between cache efficiency and power.

The Wikipedia entry is a good read: https://en.wikipedia.org/wiki/CPU_cache

OK, that makes sense. So from the sounds of it, the number of ways shouldn't affect latency, only power? I'm still curious as to why they went with so many ways when Intel never did, and with Zen they are back down to 16 for L3. Maybe a learning process? Or maybe it was deemed better for those architectures?

I still don't understand why Phenom/BD had such horrid L3 latency, but I guess no one really knows.

Nothingness · Apr 13, 2019

Latency will be slightly increased by the number of ways (more wires, and more comparators to select from to see if there's a hit), but by how much, I don't know enough to say.

As for the choice of the number of ways for L3, I'm afraid I don't know.

BigDaveX · Apr 13, 2019

Thunder 57 said:
I still don't understand why Phenom/BD had such horrid L3 latency, but I guess no one really knows.

Historically, AMD's cache architectures have generally been slower than Intel's. Athlon and Athlon XP both had strong L1 caches, but their L2 caches were pretty weaksauce. Athlon 64's L2, while faster, was still slower than those of its Intel rivals, but that was cancelled out by the stupidly efficient on-board memory controller.

Phenom suffered because its last-level cache was L3, not particularly big, and clocked a good deal slower than the cores; Core 2's LLC was L2, 2x-3x the size (not counting the split L2 cache on the Core 2 Quad), and clocked at the same speed as the core. Phenom II was actually a lot more comparable with the first-gen i7s in terms of cache structure, but the i7 had too many other advantages.

As for Bulldozer... well, what didn't AMD screw up there?

rvborgh · Apr 20, 2019

I heard that the L3 in Phenom was more optimized for servers. If you check out the A8-3870K, you can see that getting rid of the L3 didn't affect much. I also read a while ago that the L3 determined the upper bound of write performance for Phenom. Someone correct me if I am wrong.
