L2 cache associativity (CPU)

xenos500

Senior member
Jul 22, 2003
354
0
0
I have two P4 systems now: a P4 2.53/133/512 and a P4 Celeron 2.0/100/128.

Of course, I have the Celeron running at a 133MHz FSB... @2.66GHz.

The systems are actually very competitive, to my surprise (given the cost difference).

After some research I learned that the P4 has an 8-way associative L2 cache, and the Celeron has a 4-way associative cache. What is the real difference here? What is cache associativity?

I started looking at my other systems for specs... and it turns out my P3 Xeon server has an 8-way L2 cache, so now I'm really curious to hear some information about it.

thanks
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
For the sake of this explanation, let's use only 8 bits for addresses (a 256-byte address space). We'll cache 32 16-bit words (a 64-byte cache).

Let's start with the simplest cache: direct mapped. Given an address, we first drop the last bit (because we always work with 16 bits / 2 bytes at a time) and take the resulting word address mod 32 to decide which of the 32 cache lines it belongs in (so address 01001101 becomes word address 0100110, which goes in cache line 00110 = 6). Checking whether something is in the cache requires comparing only one tag against the address we want (very little hardware). Implementing this cache is simple: whenever two different memory locations share the same 5 index bits, only one can be stored at a time, and when we want to cache a new address, we have no choice but to overwrite the one entry where it can be placed. What we sacrifice: if we want to access location 01001101 and then 10001101, we have to drop 01001101 from the cache and replace it with 10001101. If our program tends to access locations spaced like that, cache performance will be awful.
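To make that concrete, here's a rough C sketch of the direct-mapped case using the same toy parameters (the names like access_addr are just made up for illustration; real hardware does this with wires, not loops):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define LINES 32

typedef struct {
    bool    valid;
    uint8_t tag;
} Line;

static Line cache[LINES];

/* Returns true on a hit, false on a miss (and fills the line). */
static bool access_addr(uint8_t addr) {
    uint8_t word  = addr >> 1;       /* drop the last bit: 2-byte words */
    uint8_t index = word % LINES;    /* word address mod 32 picks the line */
    uint8_t tag   = word >> 5;       /* remaining 2 bits identify the word */

    if (cache[index].valid && cache[index].tag == tag)
        return true;                 /* hit: exactly one tag compare */

    cache[index].valid = true;       /* miss: only one place it can go */
    cache[index].tag   = tag;
    return false;
}

int main(void) {
    /* 01001101 (0x4D) and 10001101 (0x8D) both map to line 6,
     * so they keep evicting each other - the bad pattern above. */
    uint8_t pattern[] = { 0x4D, 0x8D, 0x4D, 0x8D };
    for (int i = 0; i < 4; i++)
        printf("0x%02X -> %s\n", pattern[i],
               access_addr(pattern[i]) ? "hit" : "miss");
    return 0;
}
```

Run it and all four accesses miss, even though only two distinct words are being used.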

The other extreme is a fully associative cache. Here, we can put any address in any cache line. To check whether something is in the cache, we have to compare ALL 32 tags, which takes more time and more chip area. Also, when we want to cache a new memory location, we have to decide which of the existing entries to drop, since we are no longer restricted to a single possible location. Doing this well requires keeping track of when the various entries were accessed and finding the least-recently-used (LRU) entry, which also eats a lot of hardware. In exchange, we can cache ANY 32 words in memory under any access pattern, and the hit rate will be extremely high. Note that completely accurate LRU is too expensive, so approximations are usually used in practice.
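A fully associative version of the same toy cache might look roughly like this (again, just an illustrative sketch; real hardware does all 32 compares in parallel and only approximates LRU, while this tracks it exactly with a counter):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define LINES 32

typedef struct {
    bool     valid;
    uint8_t  tag;        /* full 7-bit word address, no index bits */
    unsigned last_used;  /* exact LRU bookkeeping, for simplicity */
} Line;

static Line     cache[LINES];
static unsigned now;

static bool access_addr(uint8_t addr) {
    uint8_t tag = addr >> 1;         /* whole word address is the tag */
    now++;

    for (int i = 0; i < LINES; i++)  /* must compare ALL 32 tags */
        if (cache[i].valid && cache[i].tag == tag) {
            cache[i].last_used = now;
            return true;
        }

    /* Miss: evict the least-recently-used line. Invalid lines have
     * last_used == 0, so they get picked first. */
    int victim = 0;
    for (int i = 1; i < LINES; i++)
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;

    cache[victim] = (Line){ true, tag, now };
    return false;
}

int main(void) {
    /* The pattern that thrashed the direct-mapped cache now hits
     * after the first pass, since any word can use any line. */
    uint8_t pattern[] = { 0x4D, 0x8D, 0x4D, 0x8D };
    for (int i = 0; i < 4; i++)
        printf("0x%02X -> %s\n", pattern[i],
               access_addr(pattern[i]) ? "hit" : "miss");
    return 0;
}
```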

Set associativity is the happy middle. In an n-way set associative cache, a given memory location can be put in any one of n locations within its set. The LRU hardware is simpler because you're only keeping track of a few entries per set. While you can't cache ANY 32 words, you can cache a lot of the possible subsets. You only need n tag compares per lookup, so the hardware there isn't too bad either. More associativity generally means a higher hit rate, but it costs more.
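And a set associative version of the same sketch, here with n = 4 (so the 32 lines become 8 sets of 4, purely illustrative):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define WAYS 4
#define SETS 8   /* 32 lines / 4 ways */

typedef struct {
    bool     valid;
    uint8_t  tag;
    unsigned last_used;
} Line;

static Line     cache[SETS][WAYS];
static unsigned now;

static bool access_addr(uint8_t addr) {
    uint8_t word = addr >> 1;
    uint8_t set  = word % SETS;      /* 3 index bits pick the set */
    uint8_t tag  = word >> 3;        /* remaining 4 bits are the tag */
    now++;

    for (int w = 0; w < WAYS; w++)   /* only n = 4 compares per lookup */
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            cache[set][w].last_used = now;
            return true;
        }

    /* Miss: LRU only has to rank the 4 ways of this one set. */
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (cache[set][w].last_used < cache[set][victim].last_used)
            victim = w;

    cache[set][victim] = (Line){ true, tag, now };
    return false;
}

int main(void) {
    /* 0x4D and 0x8D still land in the same set (6), but with 4 ways
     * they can coexist, so the second pass hits. */
    uint8_t pattern[] = { 0x4D, 0x8D, 0x4D, 0x8D };
    for (int i = 0; i < 4; i++)
        printf("0x%02X -> %s\n", pattern[i],
               access_addr(pattern[i]) ? "hit" : "miss");
    return 0;
}
```

That's the tradeoff in a nutshell: the P4's 8-way L2 can keep more conflicting addresses resident per set than the Celeron's 4-way, at the cost of extra tag compares and LRU hardware.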