
Bulldozer may not provide dramatic performance increase

Man, Itanium chips are so delayed that even if Intel executes on the next-gen Poulson chips, which skip 45nm entirely, they would still be 2 years behind the first 32nm chips and 1 year behind the 32nm Xeon MP chip.

Still better than Tukwila though. First 65nm chips came early 2006 and Xeon MP 65nm came late 2006. That puts it 4 and 3 years behind respectively.

http://en.wikipedia.org/wiki/Tukwila_%28processor)

A dual core with a 130 watt TDP and a 1.6 GHz clock speed?

High IPC and targeted at HPC?
The dual core Tukwilas are clocked so low and have such a high TDP likely because they were neutered.

The original roadmap called for dual core 90nm Montecito at 2GHz with the same 100W TDP with full 24MB L3 cache and 667MHz FSB, in early 2006. In comparison the actual chips were 1.6GHz with 18MB L3 and 533MHz FSB in late 2006.

There were also plans for a 65nm shrink called Montvale that would have pushed clock speeds to 3GHz and allowed an 800MHz FSB with 2P versions. The final versions were 1.66GHz 90nm chips that had the same L3 and FSB as the original Montecito.
 
Current i7s have 2MB L3 per core, right?
Yes, 8MB of L3, which is shared among all cores. The large exclusive cache is 1MB per core (L2).

As for performance, more cache is great, but sometimes having just enough cache that is faster can be better. There's a sweet spot somewhere, and if I remember right, the L3 of the i7 hexcore is a little larger than that of the quads, but in some cases the performance actually went down a bit due to increased latency (bigger but slower cache).
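To put rough numbers on the "bigger but slower" trade-off, here is a minimal sketch using the textbook average memory access time (AMAT) formula. Every latency and miss rate below is a made-up illustrative value, not a measurement of any i7 or other real chip, and the names `amat` and `MEMORY_NS` are just for this example.

```python
def amat(hit_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: hit time plus the miss-weighted penalty."""
    return hit_ns + miss_rate * miss_penalty_ns

# Hypothetical values chosen only to illustrate the trade-off.
MEMORY_NS = 60.0  # assumed cost of going out to main memory

# Smaller, faster last-level cache.
small_fast = amat(hit_ns=10.0, miss_rate=0.20, miss_penalty_ns=MEMORY_NS)

# Larger cache with a slower hit time; the workload barely benefits from the
# extra capacity, so the miss rate only drops a little.
large_slow = amat(hit_ns=13.0, miss_rate=0.17, miss_penalty_ns=MEMORY_NS)

# Same larger cache, but on a workload whose working set now fits much better.
large_slow_good_fit = amat(hit_ns=13.0, miss_rate=0.10, miss_penalty_ns=MEMORY_NS)

print(f"small/fast:            {small_fast:.1f} ns")
print(f"large/slow, poor fit:  {large_slow:.1f} ns")
print(f"large/slow, good fit:  {large_slow_good_fit:.1f} ns")
```

With these assumed numbers the bigger cache loses when the miss rate barely moves and wins when the working set fits, which is the workload dependence described above.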

No L3 cache also shows worse drops, as seen in Phenom II vs Athlon II comparisons. It depends on the workloads, but I think AT found in their testing that clock for clock the Phenom II's have somewhere around 10% better performance in general. Sometimes it is less (3-4%), sometimes it is more significant (especially in games).

And the Celerons, if I understood them right, were crippled Pentiums, and this crippling was mostly a halving of the cache.

All of those suggest that more cache is generally better (assuming latency does not increase disproportionately to get it), since it saves the trip to main memory. I suppose that was the idea behind their server chip plans. Among other things, perhaps a full 24MB of L3 was impossible to implement without latency going up enough to make going to 24MB useless (no real performance gain), so they settled on 18MB.

I seem to remember Anand quoting some engineer at Intel (Ronak?) that he was not actually 100% satisfied with Nehalem's cache, and that he thinks it should be higher. I am not sure as this was just one detail from an article long ago, and I may be wrong. I also seem to remember from that same article that, based on the same engineer, Nehalem's cache is almost (or actually is) the bare minimum amount that the cache should be for the chip to perform acceptably.
 
No L3 cache also shows worse drops, as seen in Phenom II vs Athlon II comparisons. It depends on the workloads, but I think AT found in their testing that clock for clock the Phenom II's have somewhere around 10% better performance in general. Sometimes it is less (3-4%), sometimes it is more significant (especially in games).
2-10 percent isn't huge, and could easily be made up by increasing clock speed because of the smaller die; according to the wiki article, L3 cache makes up over 2/3 of the entire die. From what I gather it's a case of if you have enough, then it's enough; if this is so, and going from 0 to 6MB cache gives 2-10 percent increases, then going from 8 to 24 would do even less for performance.
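The diminishing-returns argument can be sketched with the common rule of thumb that miss rate falls roughly with the square root of cache size. That rule is an assumption brought in for illustration, not something measured in this thread, and the baseline miss rate, the sizes, and the `miss_rate` helper are all hypothetical.

```python
import math

# Assumed baseline: 20% miss rate at 2MB. Purely illustrative.
BASE_SIZE_MB = 2
BASE_MISS_RATE = 0.20

def miss_rate(size_mb):
    """Square-root rule of thumb: miss rate scales with 1/sqrt(cache size)."""
    return BASE_MISS_RATE * math.sqrt(BASE_SIZE_MB / size_mb)

sizes = [2, 4, 8, 16, 32]
rates = [miss_rate(s) for s in sizes]

# Absolute miss rate removed by each successive doubling of the cache.
drops = [before - after for before, after in zip(rates, rates[1:])]

for size, rate in zip(sizes, rates):
    print(f"{size:>2}MB: miss rate {rate:.3f}")
for (small, big), drop in zip(zip(sizes, sizes[1:]), drops):
    print(f"{small}MB -> {big}MB removes {drop:.3f} of misses")
```

Under this assumption each doubling removes fewer misses than the one before it, so the later megabytes buy less than the first ones did.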

On a related note, the difference between Athlon II and Phenom II is L3 cache; what about Phenom & Phenom II? Isn't that just a matter of L3 as well, as far as performance is concerned?
 
2-10 percent isn't huge, and could easily be made up by increasing clock speed because of the smaller die
Yes, not a huge difference. But I'd have to disagree about small die = increasing clock speed. Both Athlon II and Phenom II chips seem to clock similarly. Of course, this may be due to mobo limitations since Athlon II's are non-BE. But even Sargas chips don't seem to clock any higher than Denebs. I suppose my point is that adding the L3 didn't hurt clockspeeds, and that it does improve performance (as measured by AT) 10% in general, so it's a clear win as far as performance is concerned.

From what I gather it's a case of if you have enough, then it's enough; if this is so, and going from 0 to 6MB cache gives 2-10 percent increases, then going from 8 to 24 would do even less for performance.
You are right, I had similar thoughts in my mind. All I can think of is architecture differences may affect the importance of L3. Perhaps current gen of AMD CPUs only "suffer" by 10% in general when L3 is removed, but Intel CPUs gain more from larger caches. Just a guess.

On a related note, the difference between Athlon II and Phenom II is L3 cache; what about Phenom & Phenom II? Isn't that just a matter of L3 as well, as far as performance is concerned?
Clock for clock they perform similarly; the real difference is that Phenom II chips clock far higher than the original Phenoms (~1GHz higher).
 
Mostly a 5-10 percent boost. There were some core-level changes to Phenom II as well as more L3.
You are right, my mistake. Mostly 5-10 percent, the same thing that extra cache did for Phenom II versus Athlon II. Just shows that the original Phenoms were cache starved. The 2MB of L3 was practically worthless.
 