Hi there,
I have a setup with two Intel Xeon E5405 (Harpertown, Core 2 quad-core) processors. I'm developing a demonstration of the false-sharing effect to be taught in the parallel processing course at my university.
My test involves two systems, each made up of two threads working as producer and consumer on a buffer whose data either does or does not share a cache line. Each system always runs on a pair of cores that share an L2 cache.
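A minimal sketch of the kind of kernel such a demonstration can use, assuming Linux with glibc (for `pthread_setaffinity_np`) and a 64-byte cache line. It simplifies the producer-consumer setup to two threads that each increment only their own counter, so any slowdown of the packed layout over the padded one comes from false sharing alone. `CACHE_LINE`, `ITERATIONS`, `run_pair` and the CPU numbers passed to it are hypothetical names chosen for illustration, not part of the original test.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define CACHE_LINE 64          /* assumed line size (64 bytes on Core 2-era Xeons) */
#define ITERATIONS 10000000UL  /* hypothetical per-thread iteration count */

/* Two counters side by side: with the alignment below they share one cache line. */
struct packed_counters {
    volatile uint64_t a;
    volatile uint64_t b;
};

/* The same two counters, with b pushed onto the next cache line. */
struct padded_counters {
    volatile uint64_t a;
    char pad[CACHE_LINE - sizeof(uint64_t)];
    volatile uint64_t b;
};
static_assert(offsetof(struct padded_counters, b) == CACHE_LINE,
              "b must start exactly one cache line after a");

static _Alignas(CACHE_LINE) struct packed_counters packed;
static _Alignas(CACHE_LINE) struct padded_counters padded;

struct worker_arg {
    volatile uint64_t *counter; /* counter written by this thread only */
    int cpu;                    /* logical CPU to pin to (numbering is OS-specific) */
};

static void *worker(void *p) {
    struct worker_arg *arg = p;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(arg->cpu, &set);
    /* Best effort: if pinning fails, the thread simply runs unpinned. */
    pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    for (unsigned long i = 0; i < ITERATIONS; i++)
        (*arg->counter)++;      /* each write needs exclusive ownership of the line */
    return NULL;
}

/* Runs one writer per counter on cpu_a / cpu_b; returns wall-clock seconds, or -1.0 on error. */
static double run_pair(volatile uint64_t *a, volatile uint64_t *b, int cpu_a, int cpu_b) {
    struct worker_arg args[2] = {{a, cpu_a}, {b, cpu_b}};
    pthread_t t[2];
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (pthread_create(&t[0], NULL, worker, &args[0]) != 0)
        return -1.0;
    if (pthread_create(&t[1], NULL, worker, &args[1]) != 0) {
        pthread_join(t[0], NULL);
        return -1.0;
    }
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Timing `run_pair(&packed.a, &packed.b, x, y)` against `run_pair(&padded.a, &padded.b, x, y)`, once for two cores that share an L2 and once for two cores on different packages, gives the with/without false-sharing cases the post compares. Which logical CPU numbers map to which die or socket is decided by the OS, so they need to be checked rather than assumed.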
I wanted to compare performance between running both systems on a single chip and running them on two separate chips. Since the two systems don't share any cache even when they are on the same chip, I expected the same performance in both cases. Surprisingly, performance in the multi-core configuration is worse than in the multi-processor setup, both with and without false sharing.
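Because "same chip" versus "separate chips" depends on how the OS numbers the cores, it can help to confirm the topology before pinning threads. A small sketch, assuming Linux and its sysfs topology and cache files; `read_sysfs_int`, `package_id` and `l2_shared_cpus` are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one integer from a sysfs file; returns -1 if the file is missing or unreadable. */
static int read_sysfs_int(const char *path) {
    FILE *f = fopen(path, "r");
    int value = -1;
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Physical package (socket) holding logical CPU `cpu`, or -1 if unknown. */
static int package_id(int cpu) {
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/topology/physical_package_id", cpu);
    return read_sysfs_int(path);
}

/* Copies the list of CPUs sharing `cpu`'s L2 (e.g. "0-1") into buf; returns 0 or -1. */
static int l2_shared_cpus(int cpu, char *buf, size_t len) {
    char path[160];
    assert(buf != NULL && len > 0);
    for (int idx = 0; idx < 16; idx++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cache/index%d/level", cpu, idx);
        int level = read_sysfs_int(path);
        if (level < 0)
            return -1;                  /* no more cache entries, or no sysfs */
        if (level != 2)
            continue;                   /* L1d/L1i entries: keep looking */
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cache/index%d/shared_cpu_list", cpu, idx);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            return -1;
        char *ok = fgets(buf, (int)len, f);
        fclose(f);
        if (ok == NULL)
            return -1;
        buf[strcspn(buf, "\n")] = '\0'; /* strip trailing newline */
        return 0;
    }
    return -1;
}
```

On Harpertown each package holds two dies, and the two cores of a die share one L2. So two CPUs with the same `package_id` but different `l2_shared_cpus` lists sit on the two dies of the same package, which is exactly the "multi-core" placement being compared here.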
Using the VTune Performance Analyzer, I discovered that the multi-core configuration generates far more snoop-related traffic. But isn't that protocol only used between separate processors?
I also noticed many more L2 cache misses in the multi-core configuration, but I suspect these are a consequence of the snoop requests.
To state it clearly: is there any communication protocol between the two dies of an Intel quad-core, likely related to cache coherency, that affects execution differently from the way coherency is handled between two processors?
If you have any suggestions about what could explain my results, please let me know.
Thanks,
Fred