skewed associative caches

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Why aren't they more common? Based on the papers I read ("A case for two-way skewed-associative caches", Seznec; "Skewed associativity enhances performance predictability", Bodin/Seznec), it looks like you get a better hit rate with an n-way skewed-associative cache than with an n-way set-associative cache, with similar hardware requirements. The only real problem I see (which one of the papers mentions) has to do with virtual memory if you are using small page sizes, but a workaround was presented.
 

SuperTool

Lifer
Jan 25, 2000
14,000
2
0
Cliff notes, please: what's skewed-associative? Maybe there are issues with building multiprocessor systems and cache coherency.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: SuperTool
Cliff notes, please: what's skewed-associative? Maybe there are issues with building multiprocessor systems and cache coherency.

Know what a set-associative cache is? It's a similar idea. In a 2-way set-associative cache, if addresses a, b, and c all map to set s, you're not going to be able to hold all three. With skewed associativity, each "bank" (there would be 2 in this case) is indexed by a different function, so address a might map to set s in bank 1 and set t in bank 2. Address b might also map to set s in bank 1, but to set u in bank 2, and so on. Now a, b, and c can all be in cache at the same time.
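
Here's a toy simulation of that effect in C (my own sketch; the sizes, the XOR index function, and the replacement policy are made up for illustration, not taken from Seznec's papers). Three addresses that alias in a 2-way set-associative cache thrash it forever, while a 2-way skewed cache holds all three after warm-up:

#include <stdint.h>
#include <stdio.h>

#define SETS 256                        /* sets per bank/way */
#define LINE 6                          /* 64-byte lines */

/* Conventional index: low bits of the line address. */
static unsigned idx0(uint32_t a) { return (a >> LINE) & (SETS - 1); }

/* Skewed index for bank 1: fold in the next 8 address bits. */
static unsigned idx1(uint32_t a)
{
    return idx0(a) ^ ((a >> (LINE + 8)) & (SETS - 1));
}

static uint32_t sa[2][SETS];            /* 2-way set-associative tags */
static uint32_t sk[2][SETS];            /* 2-way skewed tags */

static int access_sa(uint32_t a)        /* returns 1 on a hit */
{
    unsigned s = idx0(a);
    uint32_t ln = a >> LINE;
    if (sa[0][s] == ln || sa[1][s] == ln)
        return 1;
    sa[1][s] = sa[0][s];                /* on a miss, way 0 gets the new line */
    sa[0][s] = ln;
    return 0;
}

static int access_sk(uint32_t a)
{
    static unsigned flip;               /* crude replacement; papers use better */
    uint32_t ln = a >> LINE;
    if (sk[0][idx0(a)] == ln || sk[1][idx1(a)] == ln)
        return 1;
    if (flip++ & 1)
        sk[1][idx1(a)] = ln;
    else
        sk[0][idx0(a)] = ln;
    return 0;
}

int main(void)
{
    /* All three alias to set 0 under idx0, but spread out under idx1. */
    const uint32_t addr[3] = { 0x4000, 0x8000, 0xC000 };
    int hits_sa = 0, hits_sk = 0, n = 3000;
    for (int i = 0; i < n; i++) {
        hits_sa += access_sa(addr[i % 3]);
        hits_sk += access_sk(addr[i % 3]);
    }
    printf("set-assoc: %d/%d hits, skewed: %d/%d hits\n",
           hits_sa, n, hits_sk, n);
    return 0;
}

On this made-up pattern the set-associative cache never hits, while the skewed one hits on all but the first few accesses.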
 

Lynx516

Senior member
Apr 20, 2003
272
0
0
Ah, OK, I see. It probably adds a clock of latency while it looks up the address locations in a table. Also, it could be hard to predict which sets will be the busiest.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: rjain
Lynx: no different from set-associative caches, tho.

Well, the indexing functions usually involve just one 2-input XOR gate delay (at least for the ones the papers considered). It's not much extra delay.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
No, it is different. With set-associative all the addresses are mapped with the same function. With skewed associative each bank (or whatever) is mapped differently. You'd have to either have an extra layer of abstraction (another lookup table, etc.) or build different lookup hardware into each bank, as you can no longer use the same translation for all addresses. It's probably just not worth the hassle -- if normal set-associative caches regularly get hit rates above 90%, then a whole lot of extra work and expense to push that to maybe 95% isn't worth it for most applications, especially if it limits the processor's maximum clock. That's how I see it, anyway.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Matthias99
No, it is different. With set-associative all the addresses are mapped with the same function. With skewed associative each bank (or whatever) is mapped differently. You'd have to either have an extra layer of abstraction (another lookup table, etc.) or build different lookup hardware into each bank, as you can no longer use the same translation for all addresses.

Right... for bank 0, you index it with, say, the last n bits. Bank 1 you index with the last n bits, but XOR some of them with higher-order bits. Bank 2 XORs different higher-order bits. It isn't complicated hardware.
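
In code, the index-function family would look something like this (a sketch; I picked the bit slices for illustration, and the real skewing functions in the papers are chosen more carefully):

/* Per-bank index functions as described above. Works for a handful of
   banks on 32-bit addresses. */
#include <stdint.h>

#define INDEX_BITS 8                    /* log2(sets per bank) */
#define LINE_BITS  6                    /* byte offset within a line */
#define MASK       ((1u << INDEX_BITS) - 1)

static unsigned bank_index(uint32_t addr, unsigned bank)
{
    unsigned lo = (addr >> LINE_BITS) & MASK;
    if (bank == 0)
        return lo;                      /* bank 0: plain indexing */
    /* bank k: XOR in the k-th slice of higher-order bits -- in hardware,
       one 2-input XOR gate per index bit, sitting in front of the row
       decoder, which is where the single-gate delay comes from */
    unsigned hi = (addr >> (LINE_BITS + bank * INDEX_BITS)) & MASK;
    return lo ^ hi;
}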
 

Lynx516

Senior member
Apr 20, 2003
272
0
0
Anyway, predicting which addresses will be used most regularly is not very easy on a PC.
 

MrScott81

Golden Member
Aug 31, 2001
1,891
0
76
Isn't the cache hit rate usually way over 99% already? Or am I mistaken? If it is, it's not going to help very much, if at all, once the extra logic is included.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: CTho9305
Originally posted by: Matthias99
No, it is different. With set-associative all the addresses are mapped with the same function. With skewed associative each bank (or whatever) is mapped differently. You'd have to either have an extra layer of abstraction (another lookup table, etc.) or build different lookup hardware into each bank, as you can no longer use the same translation for all addresses.

Right... for bank 0, you index it with, say, the last n bits. Bank 1 you index with the last n bits, but XOR some of them with higher-order bits. Bank 2 XORs different higher-order bits. It isn't complicated hardware.

I feel like it probably *does* get complicated when you're dealing with circuits on nanosecond-scale clocks. Most caches have to run at processor speed, which nowadays is ~2GHz -- and the simpler you make the cache logic, the faster you'll be able to run it. I know high-speed clock propagation is difficult -- maybe they had issues placing gates at that particular point in the architecture. A slight increase in clock speed will help a modern processor more than a slight improvement in cache hit ratio, for the most part. Basically, I'm sure Intel did the math and found it wasn't worth it. Who knows what you'll see in the next generation of processors, though...
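
To put rough numbers on that tradeoff (made-up figures, purely for illustration): average access time = hit time + miss rate * miss penalty. With a 2-cycle hit and a 100-cycle miss penalty, a 99% hit rate gives 2 + 0.01 * 100 = 3.0 cycles, and improving to 99.5% gives 2.5 cycles. But if the fancier indexing stretches the hit time to 3 cycles, you're at 3 + 0.005 * 100 = 3.5 cycles, and the 'better' cache is a net loss.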
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Matthias99
Originally posted by: CTho9305
Originally posted by: Matthias99
No, it is different. With set-associative all the addresses are mapped with the same function. With skewed associative each bank (or whatever) is mapped differently. You'd have to either have an extra layer of abstraction (another lookup table, etc.) or build different lookup hardware into each bank, as you can no longer use the same translation for all addresses.

Right... for bank 0, you index it with, say, the last n bits. Bank 1 you index with the last n bits, but XOR some of them with higher-order bits. Bank 2 XORs different higher-order bits. It isn't complicated hardware.

I feel like it probably *does* get complicated when you're dealing with circuits on nanosecond-scale clocks. Most caches have to run at processor speed, which nowadays is ~2GHz -- and the simpler you make the cache logic, the faster you'll be able to run it. I know high-speed clock propagation is difficult -- maybe they had issues placing gates at that particular point in the architecture. A slight increase in clock speed will help a modern processor more than a slight improvement in cache hit ratio, for the most part. Basically, I'm sure Intel did the math and found it wasn't worth it. Who knows what you'll see in the next generation of processors, though...

This is true, but if you read the papers, it looks like a 2-way skewed-associative cache is as good as a 4-way set-associative cache (and a more recent paper looked at ~16-way set-associative caches vs. less-associative skewed caches), and adding ways also adds complexity. If you currently had a 16-way set-associative cache, you might be able to use an 8-way skewed-associative cache instead, adding the penalty of an XOR while removing 8 comparators and shrinking a mux.
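
Rough structural count (my own back-of-the-envelope, not from the papers): a 16-way cache needs 16 tag comparators and a 16:1 way mux on every access, while the 8-way skewed version needs 8 comparators, an 8:1 mux, and one extra 2-input XOR per index bit in front of each skewed bank's row decoder.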
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Not impossible -- just too much overhead. An approximate solution works well enough in practice and saves significant time and circuit area. It's a very common tradeoff in computer science. :)
 

rjain

Golden Member
May 1, 2003
1,475
0
0
Yep, LRU is just an approximation of LSU (least-soon-to-be-used: evict the block whose next use is furthest in the future, which is the ideal but unimplementable eviction policy) anyway, so a slightly less perfect but much more efficient policy is usually preferable. The whole reason we have caches is to reduce latency; if we can reduce latency in the cache itself, we benefit. With hit ratios around 99%, the eviction methods are already very good. See also: Amdahl's law.
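
A tiny worked example (my own numbers): with two cache slots and the access string A B C A B C, LRU misses all six accesses, because each fill evicts exactly the block that's needed next. The ideal policy misses only four times: on the third access it evicts B instead of A, since B's next use is further away, and so it picks up hits on the fourth and sixth accesses.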

The same kinds of issues come up in process scheduling, too.