Originally posted by: Triskain
AMD's MCM approach is certainly much more robust than Intel's. At least the dies are connected directly on the packaging with a high speed Hypertransport link. Not as good as having a monolithic processor but way better than using the FSB.
Originally posted by: ExarKun333
Originally posted by: Triskain
AMD's MCM approach is certainly much more robust than Intel's. At least the dies are connected directly on the packaging with a high speed Hypertransport link. Not as good as having a monolithic processor but way better than using the FSB.
So you're saying C2Q was much slower than PhI because of this? Oh wait...
The HT link will make a difference as you scale with more cores, but again, the MCM approach can work if you take the time to make it right.
Let me take a stab at explaining this.
The efficiency of a parallelized application is roughly characterized by two orthogonal aspects: coarse- vs. fine-grained code, and inter-process communication (IPC) bound vs. unbound (i.e. limited by contention or not).
In HPC, as well as enterprise server applications, a 2x2 matrix model suffices to communicate the high-level categorization of one's parallel application and choice of hardware.
Now as you can see in my hastily put together matrix, there are two axes: one is determined by the application of interest (coarse- vs. fine-grained) and the other by the communication topology of choice (since the topology determines latency and bandwidth).
For applications that are not excessive in their demands on the IPC topology (FSB or HT), it really won't matter which architecture one chooses when it comes down to scaling efficiency. As such, the architecture with the highest single-threaded performance will also end up being the one with the highest multi-threaded performance for apps with low IPC demands, regardless of whether they are coarse- or fine-grained.
Cinebench, POV-Ray, and a whole host of other desktop applications tend to be IPC unbound. Hence the choice of going MCM and using the FSB as the communication topology, versus going monolithic and using internal data buses on the CPU, does little to change how the CPUs handle most multithreaded desktop applications.
Conversely, we can have an application which is IPC sensitive: lots of data is transmitted to and required by each thread before it can get on with its next iteration of computation. In this case single-threaded performance can end up being a poor indicator of multithreaded performance, because the scaling efficiency is IPC bound.
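The bound/unbound split above can be sketched with a toy cost model. To be clear, every number here is invented for illustration, and the contention rule (shared-bus cost growing with core count, point-to-point cost staying flat) is an assumption, not a measurement of any real FSB or HT implementation:

```python
# Toy model of multithreaded scaling under a communication cost.
# 'comm_frac' is the fraction of single-threaded runtime spent on IPC.

def speedup(n, comm_frac, contended=True):
    """Speedup over one core when 'comm_frac' of the work is IPC.

    contended=True models a shared bus (FSB-like): the per-iteration
    communication cost grows with the number of cores fighting for it.
    contended=False models point-to-point links (HT-like): the cost
    stays roughly flat as cores are added. Both are simplifications.
    """
    compute = (1.0 - comm_frac) / n        # compute parallelizes perfectly
    comm = comm_frac * (n if contended else 1)
    return 1.0 / (compute + comm)

# IPC-unbound app (1% communication): topology barely matters.
# IPC-bound app (20% communication): the shared bus stops scaling cold,
# while point-to-point links still deliver a real speedup on 4 cores.
for f in (0.01, 0.20):
    print(f, round(speedup(4, f), 2), round(speedup(4, f, contended=False), 2))
# → 0.01 3.48 3.88
# → 0.2 1.0 2.5
```

Crude as it is, this reproduces the point of the matrix: when communication demand is low, both topologies land within about 10% of each other and single-threaded performance decides the winner; when it is high, the topology itself dominates the scaling.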
Here are some examples of the communication topology impacting the scaling efficiency depending on one's choice of architecture.
http://i272.photobucket.com/al...chBenchmarkScaling.gif
http://i272.photobucket.com/al...3DBenchmarkScaling.gif
Suffice it to say, if you happened to be running an IPC-bound application (common in the enterprise server and HPC markets), then you would have found yourself wanting to buy K10-based Opteron chips to extract a higher level of performance from your multicore system.
If you happened to be running an IPC-unbound application (common in the desktop consumer market), then you would have found yourself wanting an MCM Core 2-based chip to extract a higher level of performance from your system, regardless of the application's granularity.