Originally posted by: Martimus
I believe that AMD would use inclusive cache just like Intel if it had a better process and could just add the cache without using up much space, but they can't if they are going to compete with Intel.
...
Also, here is the link to the AnandTech article I suggested earlier:
TLB Errata Explanation
I'm inclined to believe that is the most straightforward reason from an Occam's razor standpoint: it comes down to cache density and total effective cache size. It is an interesting coincidence that Shanghai and Nehalem both end up with nearly identical total effective cache available to a given core.
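For what it's worth, here is a quick back-of-the-envelope run of the commonly quoted sizes (treat the exact figures as my assumption, not something from the article):

```python
# Back-of-the-envelope effective-capacity comparison for one quad-core die,
# sizes in KB. Assumed figures (mine, not from the article):
# Shanghai = 4 x 512 KB L2 + 6 MB exclusive L3,
# Nehalem  = 4 x 256 KB L2 + 8 MB inclusive L3.

shanghai_l2_total = 4 * 512                    # exclusive: L2 contents are not duplicated in L3
shanghai_effective = shanghai_l2_total + 6 * 1024

nehalem_effective = 8 * 1024                   # inclusive: unique on-die data is bounded by the L3 itself

print(shanghai_effective, nehalem_effective)   # 8192 KB vs 8192 KB
```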
And thanks for the link. I did google for the 9850 (as well as a bunch of cache articles) but was not having much luck finding a "dumb enough but not too dumb" explanation - probably because I am not asking for the definition of the cache styles but rather for the pros and cons from the suppliers themselves, and of course that is going to result in everything being a matter of opinion. Cache density, your suggestion, is a very believable one in the absence of a more compelling explanation.
Originally posted by: Lonyo
The benefit is that if the CPU looks for data in L3 and doesn't find it, it knows that the data doesn't exist in any core's L1 or L2 caches - thereby saving core snoop traffic, which not only improves performance but reduces power consumption as well.
An inclusive cache also prevents core snoop traffic from getting out of hand as you increase the number of cores, something that Nehalem has to worry about given its aspirations of extending beyond 4 cores.
Lonyo that is awesome :beer: Thanks for digging up the links and the relevant quotes. This explanation of the pros of Nehalem going with an inclusive L3$ is crisp and clear.
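Just to make sure I have the idea straight, here's a toy sketch of the filtering logic as I understand it (my own simplification, not anything from Intel's docs):

```python
# Toy model of why an inclusive L3 works as a snoop filter: inclusion
# guarantees every line held in any core's L1/L2 is also present in the L3,
# so an L3 miss proves that no core on the die has the line.

def lookup_inclusive(addr, l3_lines, owning_core):
    if addr not in l3_lines:
        return "go to memory"            # guaranteed not in any L1/L2 either: no snoops needed
    # L3 hit: the line is on-die; at worst, snoop the one core that may hold
    # a newer copy instead of broadcasting probes to every core.
    if owning_core is not None:
        return f"snoop core {owning_core}"
    return "serve from L3"
```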
Presumably the con would be that, since you are allocating xtors on the die to hold duplicate data, you increase manufacturing cost by way of a larger die size (or decrease performance by removing xtors from a budget that could have gone toward implementing another feature elsewhere in the logic block of the core).
Is that a reasonable presumption?
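To put a rough number on it (my own arithmetic, using the commonly quoted per-core sizes):

```python
# Rough cost of the duplication on a quad-core Nehalem (assumed sizes:
# 32 KB L1I + 32 KB L1D + 256 KB L2 per core, 8 MB L3 inclusive of both).
per_core_inner = 32 + 32 + 256            # KB of L1 + L2 per core
duplicated = 4 * per_core_inner           # worst case: every inner line also sits in the L3
l3_size = 8 * 1024
print(duplicated, round(100 * duplicated / l3_size))   # 1280 KB, ~16% of the L3
```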
Originally posted by: Lonyo
The hierarchy in Barcelona works like this:
This one was a little less pro/con and more an explanation of how exclusive cache works. Has AMD, or anyone with authority on the subject matter, discussed the pros and cons of exclusive cache on AMD's K10?
Smack me with a wet fish here if this makes no sense, but I vaguely remember reading, long ago, that AMD's K10 is set up so that core-to-core communication (cache snooping and the like) does not have to go through the L3$ as the closest point of contact to a core: one core can directly snoop another's L1$ or L2$ to query its contents, and if the data is found there, retrieving it that way is faster than waiting for the slow(er) L3$ to respond and send the data if it happened to be contained therein.
Is this true? If so, could this be a pro for exclusive cache? You get to use all the xtors in your SRAM to store potentially 100% unique data (no duplication), and you could get faster core-to-core cache transfers than with an inclusive cache hierarchy as done on Nehalem?
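If my memory of that is even roughly right, the trade-off might look something like this (again a toy model of my own, not AMD's actual protocol; probes are modelled sequentially here for simplicity, where real hardware would send them in parallel):

```python
# Toy model of a lookup in an exclusive hierarchy (K10-style, as I understand
# it): the L3 mostly holds victims evicted from the L2s, so an L3 lookup proves
# nothing about the peer cores, and their L1/L2 caches get probed as well.

def lookup_exclusive(addr, l3_victims, peer_caches):
    for core_id, cache in enumerate(peer_caches):
        if addr in cache:
            return f"cache-to-cache transfer from core {core_id}"  # can beat the slower L3
    if addr in l3_victims:
        return "serve from L3 victim data"
    return "go to memory"
```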
I really find this dichotomy between Intel and AMD quite intriguing. Cache density could simply be the Occam's razor here, though; it's a simple, beautiful proposition.
Another theory that cannot be ruled out a priori is that this is an artificial situation brought about by IP restrictions on one side or the other. There may be patents involved here that Intel has licensed and AMD has not, or vice versa.