Originally posted by: Fox5
First off, the Core 2 Duo is monolithic, the 2 cores are joined together at the L2 cache (an arguably superior solution to AMD's joining at the memory controller).
Right. C2D was Intel's first monolithic dual-core design. But if you recall, it certainly wasn't the first dual-core they produced. Pentium D had that honor, and it was a strapped-together affair (each core was distinct and had its own, non-shared L2 cache). This is what I was referring to when I said AMD had the first monolithic dual-core: the X2 debuted before C2D.
Secondly, the Pentium 4 dual cores were also hampered by slow memory. Even if they had been able to handle the higher FSB speeds necessary, with a chipset capable of doing so, fast memory did not exist at the time.
Um, no. The Pentium D chips used DDR-400 (or higher) and DDR2 (400-800) for over a year before AMD made the jump with the new AM2 socket. And not able to handle higher FSB speeds? Seriously? Those chips were all about high FSB and clockspeed. Remember the race toward 4GHz? It wasn't low-bandwidth memory or a slow FSB that hampered the P4D chips - it was their crappy NetBurst architecture.
The core 2 quads get a pass because:
1. Each pair is at least joined at the L2 cache, and most apps don't require a higher level of coherency than that.
Riiight...making it another strapped-together CPU. In this case, two dual-core dies packaged together on one chip. Each pair is connected internally, but not all four cores together, which IS the case for AMD's Phenom X4. Monolithic, remember? But this time being monolithic wasn't enough by itself to overcome Intel's superior architecture. Making it not so Phenomenal.
2. They have access to memory over twice as fast as what the Pentium D's ever had access to.
Granted - now, yes. But not at launch, when the fastest memory available (DDR2-800) was also available to the Pentium D chips.
Two more things to note here.
At C2D launch, in AT's extensive original benchmarking, the lowly e6300 (1.86GHz, 2MB shared cache) was able to convincingly beat the fastest P4D ever made, the Pentium D 965 EE (3.73GHz, 2x2MB cache). Now, consider those two chips: the e6300 has exactly half the clockspeed and cache of the P4D, and it beat it in nearly all benchmarks. The e6300 also managed to approach the power of the AMD flagship, the almighty FX-62 (although it did take the $316 e6600 [2.4GHz, 4MB cache] to truly dethrone the $999 FX-62).
Second, since the A64 days for AMD and the C2D days for Intel, memory bandwidth has had virtually no impact on performance. Really. Don't believe me? Take a look at what happened here when AT tested an e6300 with DDR-333, DDR-400, DDR2-533 and DDR2-667 memory in real-world applications. Guess what? DDR-400 won.
3. They start off with far better performance than the Pentium 4's did anyway.
Now that I won't argue.
You'll notice Intel went monolithic with Nehalem and onwards. There is a scaling advantage to having all cores on the same die, and it's too expensive/difficult to keep a large, fast shared L2 cache to make Core 2's method viable. Xeons didn't fare as well as the Core 2's because the workloads and requirements are different. Intel made the right choice with a shared L2 for dual core on the desktop, and lucked out (or planned correctly?) with the availability of high speed memory. Additionally, the easiest things to multithread don't require high coherency.
C2D was the first Intel monolithic chip, as discussed above (the first with cache shared between cores). Nehalem is Intel's first monolithic quad. And yes, they have to go monolithic going forward with a shared L3 cache, but that wouldn't work without an IMC to regulate the flow of data/work through the memory and cache to the cores. High-speed memory has already been addressed.
I just don't understand your last sentence there. Can you clarify?
You probably could come up with plenty of situations (theoretical or otherwise) where the Phenom compares more favorably. Phenom is a flawed architecture as you noted, though. Not only is the L3 cache too small, it also is slower than they would have liked. Nehalem has an L3 cache done right, Phenom's (even the faster L3 cache of the phenom 2) needs to double in speed.
I think Phenom was constrained by size and heat more than anything else (they simply couldn't pack enough onto the chip to make it work - they focused too much on making it "monolithic," apparently thinking that alone would improve performance). If AMD had been able to work at 45nm a year ago, they could have released PII then (remember, PII is nothing more than PI on 45nm with more L3 cache added) and we might have a somewhat different situation these days.
Put simply, AMD designed an architecture for the server/HPC market that just happened to come onto the desktop while memory tech was stagnating. Since then, AMD themselves stagnated and underperformed (until the launch of the Phenom 2's, they still didn't have anything terribly faster than their fastest stuff from maybe 2004/2005 in single- or dual-threaded performance).
Wrong and right. Memory "tech" had nothing to do with it. Thanks to their IMC, the X2 chips saw virtually no real-world benefit going from S939/DDR-400 to AM2/DDR2-800. And stagnating memory tech? In the last two years we've gone from DDR2-800 to DDR2-1200 and then to even faster DDR3 (up to 2000MHz the last time I looked, not that it makes any real difference in performance). You are right when you say AMD has "stagnated and underperformed," but it's been more like since 2006; they just haven't kept pace with Intel's relentless advances.
Now that both major manufacturers are using similar architectures on the desktop, and both the 360 and ps3 support 6 hardware threads, I expect we'll see more aggressively threaded apps. That's probably bad for AMD once again, as once we get beyond 4 threads, hyper threading should give Nehalem a very decisive advantage.
Dead right. GTA4 is a prime example in the gaming realm; it simply doesn't run well unless you have a quad available. I mean, a stock Q6600 can do as well as a 3.6GHz e8500. And that's an early title. I think as 2009 progresses we are going to see a major shift toward heavier and heavier multithreading in nearly all new software.
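For what it's worth, the kind of "aggressive threading" we're talking about is dead simple to write these days. Here's a toy Python sketch (my own example, nothing to do with the benchmarks above) that splits a CPU-bound job across however many hardware threads the OS reports - this is exactly the pattern that lets a quad like the Q6600, or a hyper-threaded Nehalem, pull away from a faster-clocked dual:

```python
# Toy example: scale a CPU-bound workload across all available hardware
# threads. Uses multiprocessing so each chunk runs on its own core.
import os
from concurrent.futures import ProcessPoolExecutor

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately CPU-heavy)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def parallel_prime_count(limit):
    workers = os.cpu_count() or 1            # hardware threads the OS reports
    step = limit // workers + 1
    chunks = [(i, min(i + step, limit)) for i in range(0, limit, step)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))

if __name__ == "__main__":
    print(parallel_prime_count(10_000))      # 1229 primes below 10,000
```

The point of the pattern: the work is divided into independent chunks with no shared state, so it needs almost no coherency between cores and scales with thread count - which is why a 2-thread chip hits a wall that a 4- or 8-thread chip doesn't.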