skewed associative caches

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Why aren't they more common? Based on the papers I read ("A case for two-way skewed-associative caches", Seznec; "Skewed associativity enhances performance predictability", Bodin/Seznec), it looks like you get a better hit rate with an n-way skewed-associative cache than with an n-way set-associative cache, with similar hardware requirements. The only real problem I see (which one of the papers mentions) has to do with virtual memory if you are using small page sizes, but a workaround was presented.
 

SuperTool

Lifer
Jan 25, 2000
14,000
2
0
Cliff notes, please: what's skewed-associative? Maybe there are issues with building multiprocessor systems and cache coherency.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: SuperTool
Cliff notes, please: what's skewed-associative? Maybe there are issues with building multiprocessor systems and cache coherency.

Know what a set-associative cache is? It's a similar idea. In a 2-way set-associative cache, if addresses a, b, and c all map to set s, you're not going to be able to hold all three. With skewed associativity, each "bank" (there would be 2 in this case) is indexed by a different function, so address a might map to set s in bank 1 and set t in bank 2. Address b might also map to set s in bank 1, but to set u in bank 2, and so on. Now a, b, and c can all be in cache at the same time.
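
Here's a toy simulation of that effect in C (my own sketch; the sizes, the XOR index function, and the replacement policy are made up for illustration, not taken from Seznec's papers). Three addresses that alias in a 2-way set-associative cache thrash it forever, while a 2-way skewed cache holds all three after warm-up:

#include <stdint.h>
#include <stdio.h>

#define SETS 256                        /* sets per bank/way */
#define LINE 6                          /* 64-byte lines */

/* Conventional index: low bits of the line address. */
static unsigned idx0(uint32_t a) { return (a >> LINE) & (SETS - 1); }

/* Skewed index for bank 1: fold in the next 8 address bits. */
static unsigned idx1(uint32_t a)
{
    return idx0(a) ^ ((a >> (LINE + 8)) & (SETS - 1));
}

static uint32_t sa[2][SETS];            /* 2-way set-associative tags */
static uint32_t sk[2][SETS];            /* 2-way skewed tags */

static int access_sa(uint32_t a)        /* returns 1 on a hit */
{
    unsigned s = idx0(a);
    uint32_t ln = a >> LINE;
    if (sa[0][s] == ln || sa[1][s] == ln)
        return 1;
    sa[1][s] = sa[0][s];                /* on a miss, way 0 gets the new line */
    sa[0][s] = ln;
    return 0;
}

static int access_sk(uint32_t a)
{
    static unsigned flip;               /* crude replacement; papers use better */
    uint32_t ln = a >> LINE;
    if (sk[0][idx0(a)] == ln || sk[1][idx1(a)] == ln)
        return 1;
    if (flip++ & 1)
        sk[1][idx1(a)] = ln;
    else
        sk[0][idx0(a)] = ln;
    return 0;
}

int main(void)
{
    /* All three alias to set 0 under idx0, but spread out under idx1. */
    const uint32_t addr[3] = { 0x4000, 0x8000, 0xC000 };
    int hits_sa = 0, hits_sk = 0, n = 3000;
    for (int i = 0; i < n; i++) {
        hits_sa += access_sa(addr[i % 3]);
        hits_sk += access_sk(addr[i % 3]);
    }
    printf("set-assoc: %d/%d hits, skewed: %d/%d hits\n",
           hits_sa, n, hits_sk, n);
    return 0;
}

On this made-up pattern the set-associative cache never hits, while the skewed one hits on all but the first few accesses.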
 

Lynx516

Senior member
Apr 20, 2003
272
0
0
Ah, OK, I see. It probably adds a clock of latency while it looks up the address locations in a table. Also, it could be hard to predict which sets will be the busiest.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: rjain
Lynx: no different from set-associative caches, tho.

Well, the indexing functions usually involve just one 2-input XOR gate delay (at least for the ones the papers considered). It's not much extra delay.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
No, it is different. With set-associative all the addresses are mapped with the same function. With skewed associative each bank (or whatever) is mapped differently. You'd have to either have an extra layer of abstraction (another lookup table, etc.) or build different lookup hardware into each bank, as you can no longer use the same translation for all addresses. It's probably just not worth the hassle -- if normal set-associative caches regularly get hit rates above 90%, then a whole lot of extra work and expense to push that to maybe 95% isn't worth it for most applications, especially if it limits the processor's maximum clock. That's how I see it, anyway.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Matthias99
No, it is different. With set-associative all the addresses are mapped with the same function. With skewed associative each bank (or whatever) is mapped differently. You'd have to either have an extra layer of abstraction (another lookup table, etc.) or build different lookup hardware into each bank, as you can no longer use the same translation for all addresses.

Right... for bank 0, you index it with, say, the last n bits. Bank 1 you index with the last n bits, but XOR some of them with higher-order bits. Bank 2 XORs different higher-order bits. It isn't complicated hardware.
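
In code, the index-function family would look something like this (a sketch; I picked the bit slices for illustration, and the real skewing functions in the papers are chosen more carefully):

/* Per-bank index functions as described above. Works for a handful of
   banks on 32-bit addresses. */
#include <stdint.h>

#define INDEX_BITS 8                    /* log2(sets per bank) */
#define LINE_BITS  6                    /* byte offset within a line */
#define MASK       ((1u << INDEX_BITS) - 1)

static unsigned bank_index(uint32_t addr, unsigned bank)
{
    unsigned lo = (addr >> LINE_BITS) & MASK;
    if (bank == 0)
        return lo;                      /* bank 0: plain indexing */
    /* bank k: XOR in the k-th slice of higher-order bits -- in hardware,
       one 2-input XOR gate per index bit, sitting in front of the row
       decoder, which is where the single-gate delay comes from */
    unsigned hi = (addr >> (LINE_BITS + bank * INDEX_BITS)) & MASK;
    return lo ^ hi;
}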
 

Lynx516

Senior member
Apr 20, 2003
272
0
0
Anyway, predicting which addresses will be used most regularly is not very easy on a PC.
 

MrScott81

Golden Member
Aug 31, 2001
1,891
0
76
Isn't the cache hit rate usually way over 99% already? Or am I mistaken? If it is, it's not going to help very much, if at all, once the extra logic is included.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: CTho9305
Originally posted by: Matthias99
No, it is different. With set-associative all the addresses are mapped with the same function. With skewed associative each bank (or whatever) is mapped differently. You'd have to either have an extra layer of abstraction (another lookup table, etc.) or build different lookup hardware into each bank, as you can no longer use the same translation for all addresses.

Right... for bank 0, you index it with, say, the last n bits. Bank 1 you index with the last n bits, but XOR some of them with higher-order bits. Bank 2 XORs different higher-order bits. It isn't complicated hardware.

I feel like it probably *does* get complicated when you're dealing with circuits on nanosecond-scale clocks. Most caches have to run at processor speed, which nowadays is ~2GHz -- and the simpler you make the cache logic, the faster you'll be able to run it. I know high-speed clock propagation is difficult -- maybe they had issues placing gates at that particular point in the architecture. A slight increase in clock speed will help a modern processor more than a slight improvement in cache hit ratio, for the most part. Basically, I'm sure Intel did the math and found it wasn't worth it. Who knows what you'll see in the next generation of processors, though...
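
To put rough numbers on that tradeoff (made-up figures, purely for illustration): average access time = hit time + miss rate * miss penalty. With a 2-cycle hit and a 100-cycle miss penalty, a 99% hit rate gives 2 + 0.01 * 100 = 3.0 cycles, and improving to 99.5% gives 2.5 cycles. But if the fancier indexing stretches the hit time to 3 cycles, you're at 3 + 0.005 * 100 = 3.5 cycles, and the 'better' cache is a net loss.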
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Matthias99
Originally posted by: CTho9305
Originally posted by: Matthias99
No, it is different. With set-associative all the addresses are mapped with the same function. With skewed associative each bank (or whatever) is mapped differently. You'd have to either have an extra layer of abstraction (another lookup table, etc.) or build different lookup hardware into each bank, as you can no longer use the same translation for all addresses.

Right... for bank 0, you index it with, say, the last n bits. Bank 1 you index with the last n bits, but XOR some of them with higher-order bits. Bank 2 XORs different higher-order bits. It isn't complicated hardware.

I feel like it probably *does* get complicated when you're dealing with circuits on nanosecond-scale clocks. Most caches have to run at processor speed, which nowadays is ~2GHz -- and the simpler you make the cache logic, the faster you'll be able to run it. I know high-speed clock propagation is difficult -- maybe they had issues placing gates at that particular point in the architecture. A slight increase in clock speed will help a modern processor more than a slight improvement in cache hit ratio, for the most part. Basically, I'm sure Intel did the math and found it wasn't worth it. Who knows what you'll see in the next generation of processors, though...

This is true, but if you read the papers, it looks like a 2-way skewed-associative cache is as good as a 4-way set-associative cache (and a more recent paper looked at ~16-way set-associative caches vs. less-associative skewed caches), and adding ways also adds complexity. If you currently had a 16-way set-associative cache, you might be able to use an 8-way skewed-associative cache instead, adding the penalty of an XOR while removing 8 comparators and shrinking a mux.
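
Rough structural count (my own back-of-the-envelope, not from the papers): a 16-way cache needs 16 tag comparators and a 16:1 way mux on every access, while the 8-way skewed version needs 8 comparators, an 8:1 mux, and one extra 2-input XOR per index bit in front of each skewed bank's row decoder.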
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Not impossible -- just too much overhead. An approximate solution works well enough in practice and saves significant time and circuit area. It's a very common tradeoff in computer science. :)
 

rjain

Golden Member
May 1, 2003
1,475
0
0
Yep, LRU is just an approximation of LSU (least-soon-to-be-used: evict the block whose next use is furthest in the future, which is the ideal but unimplementable eviction policy) anyway, so a slightly less perfect but much more efficient policy is usually preferable. The whole reason we have caches is to reduce latency; if we can reduce latency in the cache itself, we benefit. With hit ratios around 99%, the eviction methods are already very good. See also: Amdahl's law.
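
A tiny worked example (my own numbers): with two cache slots and the access string A B C A B C, LRU misses all six accesses, because each fill evicts exactly the block that's needed next. The ideal policy misses only four times: on the third access it evicts B instead of A, since B's next use is further away, and so it picks up hits on the fourth and sixth accesses.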

The same kinds of issues come up in process scheduling, too.