Inclusive or Exclusive? Which cache architecture do you consider better?

gustavo

Senior member
Jul 25, 2001
Which cache architecture do you think is better? Exclusive caches allow a greater total amount of data to be stored, but they also add more latency. Of course, anything else you can say on the subject is welcome.

Gustavo.-
 

Lynx516

Senior member
Apr 20, 2003
With modern CPU architectures that feature intelligent prefetching, etc., the latency involved in accessing the L2 cache (about 3 cycles, if I remember correctly) is not really a hit at all, as it is fetching an instruction that will only be used a few instructions later. Also, inclusive caching means that the L2 and L1 caches hold some of the same information. Since the CPU checks the L1 cache first, the data replicated in the L2 cache is useless: if it were needed, it would already have been found in the L1 cache. Therefore Exclusive is ALWAYS better.
 

glugglug

Diamond Member
Jun 9, 2002
In the long run, inclusive is better. With exclusive, every time you evict something from your L1, that data has to be copied into the L2 before you can swap something in. Add an L3 and it gets even worse, because pushing the line out of the L1 will make the L2 overflow and have to evict to the L3, and so on.
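
To make the traffic argument concrete, here is a minimal Python sketch of a two-level hierarchy. It is a toy model under simplifying assumptions: both levels are fully associative with LRU replacement, all lines are clean (so dirty write-backs to memory are ignored), and the `LRU`/`simulate` names are made up for illustration. It counts how many lines get written into the L2 under each policy.

```python
from collections import OrderedDict


class LRU:
    """Fully associative cache of `capacity` lines with LRU replacement (a simplifying assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> None, least recently used first

    def __contains__(self, addr):
        return addr in self.lines

    def touch(self, addr):
        self.lines.move_to_end(addr)  # mark as most recently used

    def insert(self, addr):
        """Insert addr; return the evicted LRU address, or None if there was room."""
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[addr] = None
        return victim

    def remove(self, addr):
        del self.lines[addr]


def simulate(trace, l1_lines, l2_lines, policy):
    if policy not in ("inclusive", "exclusive"):
        raise ValueError(f"unknown policy: {policy}")
    l1, l2 = LRU(l1_lines), LRU(l2_lines)
    stats = {"l1_hits": 0, "l2_hits": 0, "misses": 0, "l2_writes": 0}
    for addr in trace:
        if addr in l1:
            l1.touch(addr)
            stats["l1_hits"] += 1
            continue
        if policy == "inclusive":
            if addr in l2:
                l2.touch(addr)
                stats["l2_hits"] += 1
            else:
                stats["misses"] += 1
                stats["l2_writes"] += 1  # every fill from memory also lands in the L2
                victim = l2.insert(addr)
                if victim is not None and victim in l1:
                    l1.remove(victim)  # back-invalidate so the L1 stays a subset of the L2
            l1.insert(addr)  # the L1 victim is already in the L2, so it is simply dropped
        else:  # exclusive: the L2 acts as a victim buffer for the L1
            if addr in l2:
                l2.remove(addr)  # the line moves up; it must not stay in both levels
                stats["l2_hits"] += 1
            else:
                stats["misses"] += 1
            victim = l1.insert(addr)
            if victim is not None:
                l2.insert(victim)  # the L1 victim is written into the L2 (may push out the L2's LRU line)
                stats["l2_writes"] += 1
    return stats
```

With the trace `[0, 1, 2] * 3`, a 2-line L1 and a 4-line L2, both policies take the same 3 misses, but the exclusive hierarchy writes 7 lines into the L2 against 3 for the inclusive one, because every L1 miss pushes a victim down. The flip side shows up with a trace that cycles over 6 lines (`list(range(6)) * 3`): the exclusive hierarchy holds all of them and misses only 6 times, while the inclusive one misses on all 18 accesses.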

L2 is of course cheaper than L1, otherwise we would just use L1. So adding a little extra L2 capacity to compensate for the data duplicated between L1 and L2 is no big deal.
 

Sohcan

Platinum Member
Oct 10, 1999
Originally posted by: Lynx516
...Therefore Exclusive is ALWAYS better.

The answer isn't so cut-and-dried...as is often the case in computer architecture, the answer is the cop-out "it depends."

The natural behavior of a multi-level cache is somewhere in between fully inclusive and fully exclusive; both require some effort to maintain. (Here "higher" means farther from the core, so the L2 is a higher level than the L1.) Full inclusivity typically means ensuring that when a cache line is evicted from a higher-level cache, it is also invalidated in the lower levels of cache, if they hold a copy. For full exclusivity, the higher levels of cache essentially act as a victim buffer for the lower levels, generating more traffic, as glugglug described.
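
The "in between" state can be illustrated with a small check on a snapshot of the two levels. This is a rough sketch in which caches are modeled as plain sets of line addresses and `classify`/`evict_from_l2` are hypothetical helpers: an L2 eviction without back-invalidation leaves a line in the L1 that the L2 no longer holds, so the hierarchy becomes neither inclusive nor exclusive (often called non-inclusive).

```python
def classify(l1, l2):
    """Label a snapshot of two cache levels by which property, if any, it satisfies."""
    l1, l2 = set(l1), set(l2)
    if l1 <= l2:
        return "inclusive"  # every L1 line is also present in the L2
    if l1.isdisjoint(l2):
        return "exclusive"  # no line is present in both levels
    return "neither"        # the "natural" in-between state


def evict_from_l2(l1, l2, addr, back_invalidate):
    """Evict addr from the L2; with back_invalidate=True, also drop it from the L1."""
    l2.discard(addr)
    if back_invalidate:
        l1.discard(addr)  # the extra work full inclusion requires


l1, l2 = {0, 1}, {0, 1, 2, 3}
print(classify(l1, l2))  # inclusive
evict_from_l2(l1, l2, 0, back_invalidate=False)
print(classify(l1, l2))  # neither: line 0 is still in the L1 but no longer in the L2
```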

Exclusivity is necessary if the L2 is less than 4 to 8 times the size of the L1 (just a general rule of thumb); otherwise the duplicated data begins to hurt the L2's local hit rate. If the L2 is at least around 8 times the size of the L1, then the extra effort to maintain exclusivity may not be worth it, given only a very small increase in L2 hit rate.
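
As a rough illustration of why the size ratio matters, the sketch below computes how much distinct data each policy can hold. It assumes the worst case for inclusion (every L1 line is duplicated in the L2) and uses a hypothetical 64 KB L1; capacity is only a proxy for hit rate, which also depends on the workload.

```python
def unique_capacity_kb(l1_kb, l2_kb, policy):
    """Distinct data the two levels can hold together, assuming full inclusion or exclusion."""
    if policy == "inclusive":
        return l2_kb          # the L1 contents are a copy of part of the L2
    if policy == "exclusive":
        return l1_kb + l2_kb  # no line is stored in both levels
    raise ValueError(f"unknown policy: {policy}")


L1_KB = 64  # hypothetical L1 size, for illustration only
for ratio in (1, 2, 4, 8, 16):
    l2_kb = L1_KB * ratio
    inc = unique_capacity_kb(L1_KB, l2_kb, "inclusive")
    exc = unique_capacity_kb(L1_KB, l2_kb, "exclusive")
    # The share of the L2 spent on duplicates equals the extra capacity exclusion buys.
    print(f"L2 = {ratio:2d}x L1: {inc:5d} KB vs {exc:5d} KB unique ({(exc - inc) / inc:.1%} more)")
```

At 2x the exclusive hierarchy holds 50% more distinct data; at 8x the difference shrinks to 12.5%, which fits the rule of thumb.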

The fact that in a fully inclusive hierarchy the L2 (or highest level of cache) contains all cache lines present at lower levels is actually a big advantage for snooping multiprocessors. Multiprocessors have to make sure that all data in caches are up-to-date (coherent)...if one processor writes to a value in its cache, all other copies of that value in other caches have to be invalidated. A snoopy multiprocessor system does this by broadcasting the event on the bus, on which other processors can "snoop" and invalidate data in their caches, if necessary.
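
A bare-bones model of the write-invalidate snooping just described might look like this. It is a deliberately simplified sketch, not MESI or any real protocol: caches are dictionaries, the bus is a plain Python object, and writes go straight through to memory (an assumption that sidesteps write-back ownership). `Bus` and `Cache` are illustrative names.

```python
class Bus:
    """Shared bus that every cache snoops; memory is a dict of address -> value."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def broadcast_write(self, writer, addr, value):
        self.memory[addr] = value  # write-through (assumption), so memory stays current
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)  # snoop hit: invalidate the stale copy


class Cache:
    """One processor's private cache: address -> value (tags and line sizes omitted)."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_write(self, addr, value)


bus = Bus({0x40: 1})
p0, p1 = Cache(bus), Cache(bus)
p0.read(0x40)
p1.read(0x40)
p0.write(0x40, 2)
print(0x40 in p1.lines)  # False: p1's copy was invalidated
print(p1.read(0x40))     # 2: the re-read misses and fetches the new value
```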

For a fully inclusive hierarchy, the processor only needs to check the tags of the highest level of cache (each line in a cache has a tag, which is the upper portion of the line's full address, so that the cache can identify it). A fully exclusive hierarchy would normally need to check the tags of all caches in the hierarchy. This is bad for the L1...a snoop would stall the L1 pipeline, which could stall the processor, since the L1 is so tightly coupled to the processor pipeline. The solution is to keep a duplicate copy of the lower-level caches' tags alongside the tags of the highest level, so that the processor can snoop without stalling the L1. Depending on the cache hierarchy, the extra overhead of the duplicated tags can be significant.
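
To compare the snoop cost, the sketch below counts how many snoops have to use the L1's own tags (each one a potential L1 stall). Again, this is a toy model with hypothetical names: caches are sets of line addresses, and the inclusive case conservatively assumes that any L2 hit may require invalidating an L1 copy (no per-line L1-presence bit).

```python
def l1_tag_probes(snoops, l1, l2, policy, duplicate_l1_tags=False):
    """Count snoops that must use the L1's own tag port (each may stall the L1 pipeline)."""
    count = 0
    for addr in snoops:
        if policy == "inclusive":
            # The L2 tags cover every L1 line, so only an L2 hit can require touching the L1.
            if addr in l2:
                count += 1
        elif duplicate_l1_tags:
            # Exclusive, but a duplicate copy of the L1 tags is checked next to the L2 tags;
            # the real L1 is touched only when the line is actually there.
            if addr in l1:
                count += 1
        else:
            count += 1  # exclusive without duplicate tags: every snoop probes the L1
    return count


snoops = [1, 3, 5, 7, 9]
print(l1_tag_probes(snoops, {1, 2}, {1, 2, 3, 4}, "inclusive"))  # 2
print(l1_tag_probes(snoops, {1, 2}, {3, 4, 5, 6}, "exclusive"))  # 5
print(l1_tag_probes(snoops, {1, 2}, {3, 4, 5, 6}, "exclusive", duplicate_l1_tags=True))  # 1
```

The duplicate tags bring the exclusive case back down to probing the L1 only on real hits, at the cost of storing a second copy of every L1 tag.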