Specifically,
* A new floating-point scheduler that can hold 36 128-bit operations
* Support for 128-bit SSE operations, an upgrade from the previous 64-bit implementation
* Two SSE operations and one SSE move can be processed per cycle
* Processor instruction fetch has been increased from 16 to 32 bytes
* Advanced branch prediction with a built-in 512-entry indirect branch predictor
* Data cache bandwidth has increased from one 64-bit load per cycle to one 128-bit load per cycle
* L2 cache / memory controller bandwidth has been increased from 64 bits per clock to 128 bits per clock
* HyperTransport 3.0 support for up to 20.8 GB/s of raw bandwidth
So why is it performing so much worse than anticipated? Those look like serious improvements. With the SSE upgrades alone, I would expect it to encode much faster.