Originally posted by: Sylvanas
Whether it is the 6MB L3 or something else, there is a leap in performance.
I vote something else, lol. It's the 45nm refresh, isn't it? Certainly at 45nm you get much lower power consumption and can fit more cache, faster cache, and lower-latency cache (is this the core where they introduce the high-k gates they've been working on with IBM?). All three factors should improve bandwidth and performance. They might also have tweaked the architecture for some slight improvements.
Originally posted by: Idontcare
Unfortunately we can't arbitrarily change the latency, so we can't make statements like that and have any confidence in being right.
I hear ya. But there are many parallels to my point. We can't arbitrarily change the amount of cache on a processor either. When HyperTransport came about, you could have said it made no difference because there was no alternative to the FSB on Intel chips to compare against, but we all know it made a difference. Speed increases are always used to mask latencies, which is why there are so many crucial numbers to look at in computing. The cache on a hard disk hides the latency of reads and writes; the L2 cache hides the latency of fetching data across a bus from RAM. Quad-pumping the FSB masks its low base clock, and a higher overall FSB speed masks the latency of the bus itself.
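The "cache hides latency" idea maps onto the textbook average memory access time (AMAT) model. This is a standard model rather than anything stated in the thread, shown here for a two-level cache backed by RAM, where each t is a hit time, each m is that level's local miss rate, and t_mem is the main-memory latency:

```latex
\[
\text{AMAT} = t_{L1} + m_{L1}\left(t_{L2} + m_{L2}\, t_{\text{mem}}\right)
\]
```

A bigger or faster cache shrinks an m or a t; a shorter path to RAM (an on-die memory controller instead of FSB plus northbridge) shrinks t_mem, and the smaller t_mem is, the less a lower miss rate is worth.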
My point to the OP (which only comes across if you read past the first sentence and take the post as a whole) is that on a microarchitecture with an integrated memory controller (IMC), cache size matters less.
The only reason the L3 is present is that it is a quad-core chip. Adding an L3 to an Athlon X2 would gain very little relative to the power consumption and die area it would add. In the same way, adding more L3 to the Phenom wouldn't do all that much. Barcelona/Phenom was designed for the server market, where all the cores are loaded and cross-core traffic on the die is far more frequent than on a consumer desktop; it just so happens that having more cores also helps desktop multitasking. So for a consumer desktop, L2 matters more than L3 (there is always an exception). If more L3 were effective, you'd see a performance gain on the X3 chips compared to the X4s, since the same amount of L3 split three ways instead of four should help, but the gain just isn't strikingly there. To build the tiered cache system, AMD accepted higher cache latency than in the last generation. The L3 is inherently slower than the L2, just as the L2 is slower than the L1. And by adding an extra tier, AMD also (if I remember correctly) added latency to the existing L2.
To the OP: AMD Phenom and X2 performance won't increase all that much with more cache at the current node. Current Intel chips have a lot of cache, and Penryn chips get a performance boost from more of it, because a large cache, together with lower latencies and higher speeds (in Penryn's case), masks the delay (and the performance hit) that occurs when the processor has to fetch data from RAM over the FSB, which is slow. On current Intel parts, a request for data in RAM goes from the CPU to the northbridge to the RAM. Integrated memory controllers drastically reduce that latency. Reducing latency increases the speed at which a request is completed, just as increasing speed reduces latency; they are really two sides of the same coin. That is why Nehalem chips will have much smaller L2 caches: Nehalem will have an integrated memory controller, so it will be quicker, and therefore 'okay', to ask RAM for data. However, if memory serves, Nehalem has such a large L3 because the L3 also holds a copy of the L2 contents. So the L3 on a Nehalem contains the contents of each of cores 1-4's individual L2s (again, not 100% sure, but I vaguely remember reading this).
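A rough sketch of the argument above using the average memory access time model (hit time plus miss rate times miss penalty) for one cache level backed by RAM. Every latency and miss rate here is a made-up, illustrative assumption, not a measurement of any Phenom, Penryn, or Nehalem part; the larger FSB figure simply stands in for the longer CPU-to-northbridge-to-RAM path.

```python
# Illustrative sketch: why a bigger cache is worth more behind a slow FSB
# than behind an integrated memory controller (IMC).
# All constants are hypothetical assumptions picked for round arithmetic.

def amat_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: hit time plus miss rate times miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns

CACHE_HIT_NS = 5.0        # assumed cache hit time
FSB_MEMORY_NS = 100.0     # assumed RAM latency via CPU -> northbridge -> RAM
IMC_MEMORY_NS = 60.0      # assumed RAM latency with an on-die memory controller
SMALL_CACHE_MISS = 0.04   # assumed miss rate with a smaller cache
LARGE_CACHE_MISS = 0.02   # assumed miss rate with a larger cache

fsb_small = amat_ns(CACHE_HIT_NS, SMALL_CACHE_MISS, FSB_MEMORY_NS)
fsb_large = amat_ns(CACHE_HIT_NS, LARGE_CACHE_MISS, FSB_MEMORY_NS)
imc_small = amat_ns(CACHE_HIT_NS, SMALL_CACHE_MISS, IMC_MEMORY_NS)
imc_large = amat_ns(CACHE_HIT_NS, LARGE_CACHE_MISS, IMC_MEMORY_NS)

# Time saved per access by the larger cache on each platform.
gain_fsb = fsb_small - fsb_large
gain_imc = imc_small - imc_large

print(f"FSB: {fsb_small:.1f} ns -> {fsb_large:.1f} ns (saves {gain_fsb:.1f} ns)")
print(f"IMC: {imc_small:.1f} ns -> {imc_large:.1f} ns (saves {gain_imc:.1f} ns)")
```

With these assumed numbers, halving the miss rate saves 2.0 ns per access behind the FSB but only 1.2 ns behind the IMC: the bigger the miss penalty, the more extra cache is worth.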
The point is that going from an FSB to an IMC is a much greater performance improvement than, say, going from DDR2-667 to DDR2-800, or from DDR2 to DDR3. In both of those cases you gain speed at the cost of higher latency measured in clock cycles, so absolute latency in nanoseconds barely improves. A computer's memory subsystem (and the L2/L3 used to mask its latency) is affected more by latency than by speed. Otherwise, we'd see a better-than-linear improvement when the same processor runs at a higher clock: in most modern processors the L2 is synchronous with the CPU clock, so you'd get the gain of the higher core clock plus the gain of the higher L2 clock. But that just isn't so. Together they help somewhat as frequency rises, but the combined gain is not even linear. What we do see are JUMPS in speed across generations as nodes shrink and latencies come down. The same reasoning explains why research continues into using light as a transmission medium in future computing: not because light-based processors will automatically run at a billion GHz, but because signals carried by light travel faster, which means less latency.
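Two back-of-the-envelope calculations for the latency-versus-speed argument. The DDR2 timings (DDR2-667 at CL5, DDR2-800 at CL6) are assumed common retail values, and the workload in `runtime_s` (a hypothetical toy model, not a real benchmark) is made up for round numbers: core-clock work, including L2 hits, scales with frequency, while time stalled on RAM does not.

```python
# Back-of-the-envelope sketch for the latency-vs-speed argument.
# DDR2 timings and workload sizes are illustrative assumptions.

def cas_latency_ns(cas_cycles: int, data_rate_mts: float) -> float:
    """Absolute CAS latency in ns. DDR moves two transfers per I/O clock,
    so one clock period is 2000 / data_rate (MT/s) nanoseconds."""
    return cas_cycles * 2000.0 / data_rate_mts

def runtime_s(core_cycles: float, clock_ghz: float,
              dram_accesses: float, dram_ns: float) -> float:
    """Hypothetical toy model: work counted in core cycles (including L2
    hits, since the L2 runs at core clock) scales with frequency; time
    stalled waiting on RAM is fixed in nanoseconds and does not."""
    compute_s = core_cycles / (clock_ghz * 1e9)
    stall_s = dram_accesses * dram_ns * 1e-9
    return compute_s + stall_s

# Assumed timings: faster DIMMs usually ship with more CAS cycles.
ddr2_667 = cas_latency_ns(5, 667)
ddr2_800 = cas_latency_ns(6, 800)

# Assumed workload: 2e9 core cycles of work plus 1e7 trips to RAM.
base = runtime_s(2e9, 2.0, 1e7, 60.0)
faster_clock = runtime_s(2e9, 3.0, 1e7, 60.0)
lower_latency = runtime_s(2e9, 2.0, 1e7, 40.0)

clock_speedup = base / faster_clock

print(f"DDR2-667 CL5: {ddr2_667:.2f} ns, DDR2-800 CL6: {ddr2_800:.2f} ns")
print(f"2.0 GHz: {base:.3f} s, 3.0 GHz: {faster_clock:.3f} s "
      f"(speedup {clock_speedup:.2f}x for a 1.50x clock increase)")
print(f"2.0 GHz with 40 ns RAM: {lower_latency:.3f} s")
```

Under these assumptions both DIMMs land at about 15 ns of CAS latency, and a 50% clock increase buys only about a 26% speedup, because the RAM stall time does not shrink with the core clock; cutting memory latency attacks exactly the part that clock speed can't.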