Originally posted by: dmens
Originally posted by: Gary Key
Gary, are you saying that the part scales better than linearly? Interesting...
Not exactly. I cannot go into the details yet, but imagine the cache/memory pipeline as a supercharger on a car. You have an engine (same compression ratio and cubic inches in the naturally aspirated (NA) and supercharged (SC) versions) that performs the same until you hit a certain RPM and air/fuel mixture, where the supercharger comes online and the power curve changes dramatically compared to the NA engine. The same thing is happening here: all of the changes and enhancements made to the core, HT, cache, and memory controller are basically "idle" in some cases (the SC is flowing more air than the engine can take advantage of at low RPM, plus you have parasitic drag from powering the SC), if not a hindrance (low compression and mismatched gearing). An engine (CPU) is most efficient at its torque peak (wherever that happens to be, based on gearing, displacement, compression, efficiency, etc.), and in this case it starts nearing that peak around 2.4GHz, from all indications.
Except a CPU has static resources, whereas a car engine can burn more fuel because a supercharger feeds it compressed air. Increasing CPU core frequency can never yield a greater-than-linear performance gain.
I am not saying that increases in CPU core frequency yield a performance gain greater than 1:1. My example was crude and too simple: in this case the cubic inches, compression ratio, and gearing are static, and the supercharger simply lets the engine perform more efficiently (from a power output perspective) by taking greater advantage of the air/fuel mixture (plus aggressive timings) it has available as RPMs rise.
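To put the 1:1 point in numbers, here is a simple model (an illustrative assumption, not AMD data): a fixed workload on a fixed configuration needs $C$ core cycles plus a memory-stall time $M$ (in seconds) that does not shrink as the core clock $f$ rises. Going from $f_1$ to $f_2 > f_1$:

$$
T(f) = \frac{C}{f} + M, \qquad
S = \frac{T(f_1)}{T(f_2)} = \frac{C/f_1 + M}{C/f_2 + M} \le \frac{f_2}{f_1}
$$

The bound follows from cross-multiplying: $f_1\,(C/f_1 + M) = C + f_1 M \le C + f_2 M = f_2\,(C/f_2 + M)$, with equality only when $M = 0$. With hypothetical values $C = 2.0\times10^9$ cycles and $M = 0.5$ s, moving from 2.0GHz to 2.4GHz gives $T = 1.5$ s versus $T \approx 1.333$ s, so $S = 1.125$ against a clock ratio of $1.2$. A fixed configuration therefore cannot beat 1:1; a clock-for-clock gain at higher speed means $C$ or $M$ itself changed between the two runs, for example a cache, prefetcher, or HT setting that was gated on core speed.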
Probably a very bad example, but I was trying to make the point that the architectural changes in this processor and the new chipsets (HT 3.0, etc.) do not provide any performance advantage over current platforms (in most cases) until the core clock speed increases, and we start to notice that around 2.4GHz (see below for other reasons at this time). I think I have said this several times since Computex: AMD desperately needs to raise the core speeds on this processor architecture (above 2.4GHz or so, and privately a few people at AMD agree) for it to be really competitive and to take full advantage of its processor and platform improvements.
I do not think AMD ever intended, or even believed, that this CPU would launch at the speeds it will (1.8~2.0GHz, possibly 2.2GHz in Q4), as the processor simply does not perform as efficiently as the architecture changes suggest it is capable of. A lot of the early information we had was that Barcelona would launch in the 2.2~2.4GHz range and then scale quickly, with a potential of 4GHz in the end. The early performance expectations and claimed improvements over current platforms were based on simulations at 2.4~2.6GHz, scaling upwards from there. The CPU was designed with these speeds and above in mind; it is simply too slow right now, not to mention that several core improvements have been flipped on/off or just are not as efficient as they should be in early testing.
At least with the early samples we have seen, there are improvements against current processors on a clock-for-clock basis as the core speed increases. This does not mean a performance gain greater than 1:1; it simply means the chip operates more efficiently as the core speed increases. There could be a wide variety of reasons for this, as we have seen dramatic changes in platform performance almost week to week as new steppings, chipset revisions, and BIOS code arrived. We have seen HT not working, or set to the 1.0, 2.0, or 3.0 specification depending upon core speed and chipset; secondary caches turned off or even gated based upon core speed (L3 cache and L2 prefetchers as late as July); floating-point instructions flipped on or off; out-of-order load execution algorithms flipping from conservative to aggressive and back depending upon core speed; and even the translation lookaside buffers being tinkered with during this time, not to mention a dozen other changes.
Also remember that the DRAM controller is now split into two separate 64-bit controllers. Each controller can operate independently, and there can be significant efficiency improvements, especially where the individual cores are working on independent threads with their own memory access patterns; yet another area where core speed could create variable results. Added to this, the data prefetcher now brings data directly into the low-latency L1 data cache, as opposed to the L2 cache in K8. K10 also increased the ability of its L1 instruction cache prefetcher to handle two outstanding requests to any address. These two areas, plus the new DRAM prefetcher in the revised memory controller, are the control mechanisms we have noticed having the greatest impact on performance, especially as core speed increases. They are also the areas that we believe have been "tinkered" with the most during the prototype and pre-production phases. We have seen the processors go from needing only DDR2-667 in June to really responding to DDR2-1066 as core speeds have increased, along with the other improvements and additions to the processor, BIOS, and chipsets.
When I said that certain features were "idle" in some cases, this is what I was talking about. Until we see production-level silicon and final BIOS code, it is extremely difficult to determine what is and is not occurring inside Barcelona/Phenom on a clock-for-clock basis. Throw into the mix a whole new generation of chipsets (i.e., RD790) that take further advantage of these changes, and you have a very fluid situation, since the initial performance results will come from older HT 2.0 chipsets designed for the enterprise environment. There is not yet a consumer-level board tuned for this processor series; trying to use it on one is like running a QX6850 on a VIA PT880. Yes, it works, but look at the results.
That is why we do not want to guesstimate the performance or even provide tangible numbers until we have had a chance to test released product. For whatever reason, the processor operated more efficiently in the early tests as the core speed increased, and we will find out shortly why. I hope this helps; if I could speak in greater detail, I would, but September 10th is getting close. Like I said in my previous message, some people will be happy, some will not, and most will realize that certain hype does not directly translate into the expected performance improvements, not until we see some speed (counting on this).
