We recently learned a lot about "Prescott," Intel's next-generation Pentium 4. It will have twice the L1 data cache (16KB), twice the L2 cache (1MB), an 800MHz system bus, 13 new instructions that might be called SSE3, reduced latency on integer multiplies, more write-combining buffers, improved prefetching and branch prediction, plus "Hyperthreading 2."
Who cares, right? Nothing for AMD to worry about. It's just a P4 with more cache, SSE3, and a few other minor optimizations and improvements. Or is it?
Chiparchitect.com took a detailed look at the Prescott and Northwood core (die) photos, and they came up with some very interesting conclusions: it seems that Intel is still keeping many of Prescott's improvements under wraps.
First, they compared the size of a 256K L2 block to the size of the trace cache on both Northwood and Prescott. On Northwood, a 256K L2 block is 2.4 times the size of the trace cache, while on Prescott it is only 1.6 times the trace cache size. That puts Prescott's trace cache at roughly the size of ~160KB of L2 (256K / 1.6), and it appears to hold 16K uOps, roughly 33% more than the 12K-uOp trace cache on Northwood and Willamette. This change will almost certainly improve IPC. Indeed, from a design standpoint, it is even more significant than the extra L1 and L2 cache.
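For anyone who wants to check the die-photo arithmetic, here's a quick back-of-the-envelope sketch. It assumes a 256KB L2 block is a fair area yardstick on both dies, and it takes the trace-cache capacities as 16K and 12K uOps (12K is Northwood's documented figure; 16K is the estimate for Prescott). The function name is just something made up for illustration.

```python
# Back-of-the-envelope trace-cache estimate from die-photo area ratios.
# Assumption: a 256KB L2 block is a usable area yardstick on both dies,
# so dividing 256KB by the measured (L2 block / trace cache) ratio gives
# the trace cache's area in "KB of L2-equivalent" units.

L2_BLOCK_KB = 256  # size of the reference L2 block on both dies


def tcache_equiv_kb(l2_to_tcache_ratio: float) -> float:
    """Trace-cache area expressed as KB of L2 that would fit in the same area."""
    return L2_BLOCK_KB / l2_to_tcache_ratio


northwood_kb = tcache_equiv_kb(2.4)  # ratio measured on Northwood
prescott_kb = tcache_equiv_kb(1.6)   # ratio measured on Prescott

# Relative growth in area vs. relative growth in capacity (16K vs 12K uOps).
area_growth = prescott_kb / northwood_kb - 1.0
uop_growth = 16 / 12 - 1.0

print(f"Northwood trace cache ~ {northwood_kb:.1f}KB of L2 area")
print(f"Prescott trace cache  ~ {prescott_kb:.1f}KB of L2 area")
print(f"Area growth: {area_growth:.0%}, uOp capacity growth: {uop_growth:.0%}")
```

Note that the area ratios imply about 50% more trace-cache area, a bit more than the ~33% capacity gain from 16K vs 12K uOps. Die-photo measurements are rough, so treat both numbers as estimates rather than hard specs.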
Second, taking the larger trace cache into account and comparing the layout of the pipeline stages on Northwood and Prescott, they concluded that Prescott is actually a 4-way design; that is, it issues and retires 4 instructions per clock cycle, up from 3 on Willamette and Northwood. This would be a significant design change for the Pentium 4: a potential 33% (4/3) increase in the peak work performed every cycle. Of course, there wouldn't be much point to this change without additional execution resources...
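Here's the same 3-wide vs 4-wide arithmetic as a tiny sketch. These are peak numbers only; real IPC is limited by dependencies, cache misses, and branch mispredictions. The 3.06GHz clock for Prescott is purely hypothetical, chosen to match the Northwood mentioned in the post, and the helper names are made up for illustration.

```python
# Peak-throughput arithmetic for going from a 3-wide to a 4-wide core.
# These are ceilings: real IPC is limited by dependencies, cache misses,
# and branch mispredictions, so sustained gains will be smaller.


def peak_width_gain(old_width: int, new_width: int) -> float:
    """Relative increase in peak instructions issued/retired per clock."""
    return new_width / old_width - 1.0


def peak_rate_ginstr(width: int, clock_ghz: float) -> float:
    """Peak billions of instructions per second at a given clock."""
    return width * clock_ghz


gain = peak_width_gain(3, 4)
# Hypothetical same-clock comparison at 3.06GHz (Prescott's real clock is not known).
northwood_peak = peak_rate_ginstr(3, 3.06)
prescott_peak = peak_rate_ginstr(4, 3.06)

print(f"Peak per-clock gain: {gain:.0%}")
print(f"Peak at 3.06GHz: {northwood_peak:.2f} vs {prescott_peak:.2f} billion instr/s")
```

Holding the clock fixed isolates the width change; any clock-speed advantage Prescott has would stack on top of this.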
Perhaps the most significant revelation from Chiparchitect.com's analysis is the apparent change in the Rapid Execution Engine (Intel's term for its double-pumped ALUs running at twice the core clock). According to their observations, Prescott appears to DOUBLE the number of Rapid Execution Engines. That is, whereas Northwood and Willamette have a single [effective] 32-bit ALU running at twice the chip frequency, Prescott appears to mirror or "double up" this silicon. Put another way, Prescott appears to double the integer execution resources of the current P4, which is the kind of thing you would expect from a dual-core processor.
In summary: if their silicon observations hold up, Prescott will be more than deserving of the Pentium 5 name. Whereas the current 3.06GHz Northwood issues 3 instructions per cycle from 2 threads to 1 execution core, Prescott may issue 4 instructions per cycle from 2 threads to 2 execution cores. This would *significantly* increase IPC and Hyperthreading throughput on Prescott. Whereas the current P4 gets perhaps a 5% to 15% boost (on average) from Hyperthreading, Prescott could well see a 30% to 90% improvement.
With these changes, it's not inconceivable that Prescott could exceed the Athlon's IPC, and it could even come close to Hammer's. All while running at substantially higher clock speeds...
Comments? Thoughts?