Been reading about Phenom and I just don't understand...all these upgraded features, where is the increased performance?

Dec 30, 2004
12,553
2
76
Specifically,

* A new floating point scheduler now supports 36 128-bit operations
* Support for full-width 128-bit SSE operations, an upgrade from the 64-bit SSE units of the previous architecture
* Two SSE operations and one SSE move can be processed per cycle
* Processor instruction fetch has been increased from 16 to 32 bytes
* Advanced branch prediction with a built-in 512-entry indirect branch predictor
* Data cache bandwidth has increased from 1 x 64-bit loads per cycle to 1 x 128-bit loads per cycle
* L2 cache / memory controller bandwidth has been increased from 64-bits per clock to 128-bits per clock
* HyperTransport 3.0 support for up to 20.8 GB/s of raw bandwidth

So why is it performing so much worse than anticipated? Those look like some serious improvements. Like with the SSE improvements, I would expect it to be able to encode much faster.
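
For what it's worth, the 20.8 GB/s HyperTransport figure is a peak number that can be reproduced with simple arithmetic. This is only a rough sketch: the 2.6 GHz link clock and 16-bit link width are assumptions taken from the HyperTransport 3.0 maximums, not from the list above, and a shipping part may run its link slower. The function name is made up for illustration.

```python
def ht_link_bandwidth_gb_s(clock_ghz: float, link_width_bits: int) -> tuple[float, float]:
    """Return (per-direction, aggregate) raw bandwidth in GB/s for one link.

    Assumes a double-data-rate link (two transfers per clock) that is
    full duplex, so the aggregate figure counts both directions.
    """
    giga_transfers_per_sec = clock_ghz * 2               # DDR: 2 transfers per clock
    per_direction = giga_transfers_per_sec * link_width_bits / 8  # bits -> bytes
    aggregate = per_direction * 2                         # both directions combined
    return per_direction, aggregate


# Assumed HT 3.0 maximums: 2.6 GHz clock, 16-bit link.
per_dir, total = ht_link_bandwidth_gb_s(2.6, 16)
print(f"{per_dir:.1f} GB/s each way, {total:.1f} GB/s aggregate")
```

So the headline number counts both directions of the link at once; a one-way transfer would see at most half of it under these assumptions.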
 

AlabamaCajun

Member
Mar 11, 2005
126
0
0
Performance is better than anything AMD has in the Athlon line and better than some of the older Intel lines, but I won't make comparisons to Intel.
I've run a 9500 2.4G against my 5600 X2 2.8, and there is a noticeable improvement: the Phenom knocks a few seconds off the time with both machines running two threads from one app. Running all four threads about doubles the throughput on the Phenom. The rendering was almost entirely computational, as the program was designed to run efficiently enough not to hit the caches or cause excess page faults. This was not a scientific benchmark, just an observation using a watch with the two machines running side by side. For an AMD chip it is an advancement that looks promising for the near future.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Are the media codecs highly optimized for K8? If so, I would expect them to need tweaking to take full advantage of Barcelona/Phenom, because the reduced execution latencies mean a code sequence that's optimal for K8 could leave units idle on Barcelona/Phenom.
 

BitByBit

Senior member
Jan 2, 2005
474
2
81
Originally posted by: soccerballtux
Specifically,

* A new floating point scheduler now supports 36 128-bit operations
* Support for full-width 128-bit SSE operations, an upgrade from the 64-bit SSE units of the previous architecture
* Two SSE operations and one SSE move can be processed per cycle
* Processor instruction fetch has been increased from 16 to 32 bytes
* Advanced branch prediction with a built-in 512-entry indirect branch predictor
* Data cache bandwidth has increased from 1 x 64-bit loads per cycle to 1 x 128-bit loads per cycle
* L2 cache / memory controller bandwidth has been increased from 64-bits per clock to 128-bits per clock
* HyperTransport 3.0 support for up to 20.8 GB/s of raw bandwidth

So why is it performing so much worse than anticipated? Those look like some serious improvements. Like with the SSE improvements, I would expect it to be able to encode much faster.

I find it baffling. In addition to the 32B instruction fetch and IMC enhancements, K10 essentially includes all the improvements Intel introduced with Core 2, which gave a 25-30% per-clock gain over Yonah. The architectural updates implemented in K8 resulted in around 25% over K7, yet here we have far more extensive changes resulting in around 15% at best. Xbit spoke of K10's narrow 64-bit store bus (compared to Core 2's 128-bit one) potentially causing a bottleneck, but I doubt AMD would have left it that way if doing so would result in severely capped performance. One thing I'm almost certain of, however, is that K10 was rushed, possibly leaving little time for refinement. Hence the 'Phenom bug'.
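
One way to see how a big unit-level upgrade can shrink to a modest overall gain is Amdahl's law. The sketch below is purely illustrative: the fractions of run time spent in SSE code are hypothetical, not measurements of any encoder, and the 2x local speedup simply assumes the move from 64-bit to 128-bit SSE doubles throughput for that part of the work and nothing else (such as the store path) gets in the way.

```python
def overall_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of the run time gets faster.

    accelerated_fraction: share of the original run time that benefits (0..1).
    local_speedup: how much faster that share becomes (must be > 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    unchanged = 1.0 - accelerated_fraction
    return 1.0 / (unchanged + accelerated_fraction / local_speedup)


# Hypothetical shares of run time spent in SSE-bound code.
for fraction in (0.2, 0.5, 0.8):
    gain = overall_speedup(fraction, 2.0)
    print(f"{fraction:.0%} of time in SSE code -> {gain:.2f}x overall")
```

Under these assumptions, even a perfect doubling of SSE throughput only approaches 2x overall when nearly all of the run time is SSE-bound, and any other limit on the same code path would pull the result down further.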