Quad Core Cache Usage

tallman45

Golden Member
May 27, 2003
1,463
0
0
I have a question about the two separate caches in a quad-core processor.

What is to stop the same set of data from being stored in both caches, given that there is no preference for which pair of cores processes which instructions?

Is it likely that the 4MB (or soon to be 6MB) separate caches actually store duplicate data that will not be used, reducing their actual effectiveness?

OR

Are a process's instructions directed back to the cores that handled that process in a prior step?
 

tallman45

Golden Member
May 27, 2003
1,463
0
0
Found my answer

Quads handle cached data poorly between the two pairs of cores.

Quad Cache Usage


"Unfortunately, the integration between the QX6700's two chips is less than ideal...
The two chips must coordinate to ensure the sanity of the contents of their respective L2 caches via this bus. That will sometimes mean writing modified data out of one chip's cache into main memory and then reading it back into the other chip's cache, a positively eternal operation in CPU time."
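
To make the quoted behaviour concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions: two chips, one private L2 each, two line states ("S" for a clean shared copy, "M" for modified), and every transfer between chips going through main memory. The class name, the states and the counting are illustrative, not the real QX6700 protocol. It shows both halves of the original question: clean data can sit duplicated in both caches, and modified data has to make a round trip through memory before the other chip can read it.

```python
# Toy model only: two chips, each with a private L2, kept coherent via memory.
# States and costs are illustrative assumptions, not measured QX6700 behaviour.

class TwoChipToy:
    def __init__(self):
        # caches[chip] maps address -> "S" (shared, clean) or "M" (modified)
        self.caches = [{}, {}]
        self.memory_round_trips = 0  # write-backs plus fills from main memory

    def read(self, chip, addr):
        other = 1 - chip
        if addr in self.caches[chip]:
            return  # hit in the local L2, nothing to do
        if self.caches[other].get(addr) == "M":
            # The other chip holds the only up-to-date copy:
            # write it back to memory, where it stays cached as a clean copy.
            self.memory_round_trips += 1
            self.caches[other][addr] = "S"
        # Fill the local cache from main memory.
        self.memory_round_trips += 1
        self.caches[chip][addr] = "S"

    def write(self, chip, addr):
        other = 1 - chip
        if self.caches[chip].get(addr) == "M":
            return  # this chip already owns the line
        if self.caches[other].get(addr) == "M":
            self.memory_round_trips += 1  # write back the other chip's dirty copy
        # Invalidate any copy held by the other chip.
        self.caches[other].pop(addr, None)
        if addr not in self.caches[chip]:
            self.memory_round_trips += 1  # fetch the line before modifying it
        self.caches[chip][addr] = "M"


toy = TwoChipToy()
toy.read(0, 0x100)
toy.read(1, 0x100)  # both chips now hold a clean duplicate of the line
duplicated = 0x100 in toy.caches[0] and 0x100 in toy.caches[1]
trips_before = toy.memory_round_trips
toy.write(0, 0x100)  # chip 1's copy is invalidated
toy.read(1, 0x100)   # chip 0 writes back, then chip 1 refills from memory
extra_trips = toy.memory_round_trips - trips_before
print(duplicated, extra_trips)
```

In this toy, the duplicate read-only copies cost capacity but no extra traffic, while the write followed by a read on the other chip costs two memory trips. That second case is the "eternal operation" the quote is talking about.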
 

pcunite

Senior member
Nov 15, 2007
336
1
76
As a software developer myself (but not an expert on CPU design), I am aware of this limitation. However, the value of a quad core for me personally is in running many separate processes that each do their own separate task (multiple instances of an .exe, each working on its own copy of a file loaded into memory), plus a few executables that break up work across the four cores (a single .exe doing four things to one file loaded into memory).

A single process that breaks up its work (job) across four cores and then needs to manage the order or sequencing of that job is going to take a hit, as you discovered, but it is still faster than a single or dual core would be at the same task, assuming similar clock speeds. A single core would be faster if the job did not have many break-up points. However, with 100 or more break-up points, a quad becomes effective again, even at a slower clock speed.
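
A back-of-the-envelope way to look at that trade-off is Amdahl's law, which is a swapped-in textbook model here, not something measured in this thread. The sketch below assumes a fixed parallel fraction and ignores the cache-coordination hit discussed above, so it flatters the quad. The clock speeds (2.4 GHz quad, 3.0 GHz single) and the function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the job can be split."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def quad_beats_single(parallel_fraction, quad_clock_ghz, single_clock_ghz):
    """Compare a slower-clocked quad against a faster-clocked single core.

    Assumes throughput scales linearly with clock speed (a simplification).
    """
    quad_throughput = quad_clock_ghz * amdahl_speedup(parallel_fraction, 4)
    return quad_throughput > single_clock_ghz


# Hypothetical clocks: 2.4 GHz quad versus 3.0 GHz single core.
mostly_serial = quad_beats_single(0.10, 2.4, 3.0)    # few break-up points
mostly_parallel = quad_beats_single(0.90, 2.4, 3.0)  # many break-up points
print(mostly_serial, mostly_parallel)
```

With these made-up clocks the break-even lands a little above a 25% parallel fraction: below it the faster single core wins, above it the quad does.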

I am pulling that last sentence out of the air, I must admit, but I find it very plausible based on my understanding of operating system (Windows XP) thread switching, having Internet Explorer open, background processes working, etc.
 

DRavisher

Senior member
Aug 3, 2005
202
0
0
From the few reviews I've seen of AMD Barcelona/Phenom, it seems that AMD's native quad core scales quite a bit better than Intel's in Cinebench (I think I saw 3.5x versus 3.9x scaling for Intel and AMD respectively in one benchmark; sorry, I don't have a link at hand). But I don't know if the shared L3 cache of Barcelona is the only relevant difference in that benchmark.
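
For scale, those two figures can be turned into parallel efficiency (speedup divided by core count). The 3.5 and 3.9 inputs are only the recollected numbers from this post, not verified results, and the helper name is made up for illustration.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of the ideal linear speedup that was actually achieved."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup / cores


# Recollected four-core scaling figures from the post above (unverified).
intel_eff = parallel_efficiency(3.5, 4)
amd_eff = parallel_efficiency(3.9, 4)
print(f"Intel: {intel_eff:.1%}, AMD: {amd_eff:.1%}")
```

Put that way, the gap is roughly half a core's worth of throughput versus a tenth of a core, whatever the cause turns out to be.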
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
"Quake 4 supports only dual-core processors. Although overclocked Core 2 Duo E6850 works at a little higher frequency than the overclocked Core 2 Quad Q6600, it is the quad-core CPU that wins here. The determinative factor in this case is the two L2 caches with the total capacity of 8MB."
LINK

So although the cache isn't as optimally distributed on a non-native quad such as the Q6600, it still has double the cache of a dual core from the same family, which makes up for the inefficiency.
 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: RussianSensation
"Quake 4 supports only dual-core processors. Although overclocked Core 2 Duo E6850 works at a little higher frequency than the overclocked Core 2 Quad Q6600, it is the quad-core CPU that wins here. The determinative factor in this case is the two L2 caches with the total capacity of 8MB."
LINK

So although the cache isn't as optimally distributed on a non-native quad such as the Q6600, it still has double the cache of a dual core from the same family, which makes up for the inefficiency.

I had often theorized lately that the reason the Q6600 performed as well as, and occasionally better than, the faster E6750 was that some software might be using one core from each pair of E6600 cores that make up a Q6600, so each core in use would have a full L2 cache to itself, double what it would get when sharing. Thanks for the link.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: myocardia
I had often theorized lately that the reason the Q6600 performed as well as, and occasionally better than, the faster E6750 was that some software might be using one core from each pair of E6600 cores that make up a Q6600, so each core in use would have a full L2 cache to itself, double what it would get when sharing. Thanks for the link.

Interesting. Could one possibly "bench" this idea by setting the CPU affinity of the benchmark application to CPU0 and CPU2 (or any equivalent permutation) and comparing the results with those from setting the affinity to either CPU0/CPU1 or CPU2/CPU3?

This would force the application to run as if the system were dual-core (from the app's perspective; of course the OS is still distributing its overhead across all four cores during the benchmark), but you should be able to tease out from the data the relative impact of the two threads being forced to share the same 4MB of L2$ versus each having "its own" 4MB of L2$ to play in.
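
A minimal sketch of the three pairings, written in Python purely for illustration. It assumes CPU0/CPU1 sit on one die and CPU2/CPU3 on the other, which is only a guess about how the OS numbers the cores. The bitmasks are in the form that the Win32 `SetProcessAffinityMask` call takes (Task Manager's "Set Affinity" does the same thing interactively). The `pin_current_process` helper is hypothetical and only does anything where Python's `os.sched_setaffinity` exists, which in practice means Linux.

```python
import os


def affinity_mask(cpus):
    """Build a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


# Assumed layout: CPU0/CPU1 share one 4MB L2, CPU2/CPU3 share the other.
PAIRINGS = {
    "cross-die (CPU0+CPU2)": (0, 2),
    "same-die (CPU0+CPU1)": (0, 1),
    "same-die (CPU2+CPU3)": (2, 3),
}

MASKS = {name: hex(affinity_mask(cpus)) for name, cpus in PAIRINGS.items()}


def pin_current_process(cpus):
    """Pin this process to `cpus`; return True if the pinning was applied.

    Hypothetical helper: it is a no-op (returns False) on platforms without
    os.sched_setaffinity or when the requested CPUs are not available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    if not set(cpus) <= os.sched_getaffinity(0):
        return False
    os.sched_setaffinity(0, set(cpus))
    return True


for name, mask in MASKS.items():
    print(f"{name}: affinity mask {mask}")
```

The idea would be to run the same benchmark once per pairing and compare the cross-die result with the two same-die results. If the extra cache is what matters, the cross-die run should come out ahead.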
 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: Idontcare
Interesting. Could one possibly "bench" this idea by setting the CPU affinity of the benchmark application to CPU0 and CPU2 (or any equivalent permutation) and comparing the results with those from setting the affinity to either CPU0/CPU1 or CPU2/CPU3?

This would force the application to run as if the system were dual-core (from the app's perspective; of course the OS is still distributing its overhead across all four cores during the benchmark), but you should be able to tease out from the data the relative impact of the two threads being forced to share the same 4MB of L2$ versus each having "its own" 4MB of L2$ to play in.

I honestly don't know if it's possible to set the affinity that way. Since I have a Q6600 and Quake 4, I'll probably try it. Of course, according to that link, it's quite obvious that with Quake 4 at least, it's using CPU0 and CPU2, allowing each to have their own 4MB of L2.