- Oct 14, 1999
I see that the P4 will cache micro-ops, about 12,288 (12K) at any one time. Doesn't the K7 already do something similar, just in a different way?
So effectively the P4 will have a three-stage L1 cache: a 12K trace cache (i.e. a micro-op library) and an 8K data cache... for a 20K L1!?! Sure, the 20K design would be ultra low-latency, but there's no way it's true. There must be some limit to the number of unique micro-ops in 32-bit programming; does anybody know the number?
The P4 will feature a 256KB L2 cache running at the processor's core clock speed. This seems pretty reasonable considering the reported 256-bit pipeline (4x 64-bit), which is double the 128-bit pipeline (2x 64-bit) of the PIII.
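A quick back-of-the-envelope sketch of why the wider pipe matters, assuming the L2 moves one full bus width of data per core clock. The 1000 MHz clock below is a made-up number for comparison only, not a real P4 or PIII spec.

```python
def l2_bandwidth_gb_per_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak L2 bandwidth, assuming one full-width transfer per core clock."""
    bytes_per_clock = bus_width_bits // 8           # 256 bits -> 32 bytes
    return bytes_per_clock * clock_mhz * 1e6 / 1e9  # bytes/s -> GB/s

# Hypothetical: both chips at the same 1000 MHz core clock.
p4 = l2_bandwidth_gb_per_s(256, 1000)    # 4x 64-bit
piii = l2_bandwidth_gb_per_s(128, 1000)  # 2x 64-bit
print(p4, piii, p4 / piii)
```

At equal clocks the 256-bit path simply moves twice as many bytes per cycle (32 vs 16), so any clock advantage the P4 has stacks on top of that.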
Let me try to understand this: the 256-bit L2 cache will be faster than simply using a large L1 cache? The L1 cache must miss a lot, or else Intel wouldn't be taking this approach. I mean, Intel seems to be pushing for lower latency in the second level of cache rather than trying to improve the "efficiency" of the L1 cache. (Does the L2 cache hold micro-ops, too?)
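One way to reason about "small L1 + fast L2" vs. "big L1" is the textbook average memory access time (AMAT) formula. This is a sketch only: every latency and miss rate below is an invented, illustrative number (in clock cycles), not anything Intel has published. The point is that a bigger L1 that costs extra cycles on *every* hit can lose to a smaller L1 backed by a fast L2.

```python
def amat(l1_hit: float, l1_miss_rate: float,
         l2_hit: float, l2_miss_rate: float, mem_latency: float) -> float:
    """Average memory access time for a two-level cache, in cycles."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)

# Hypothetical numbers: small, very fast L1 that misses more often...
small_l1 = amat(l1_hit=2, l1_miss_rate=0.10, l2_hit=7, l2_miss_rate=0.2, mem_latency=100)
# ...vs. a bigger L1 that misses less but is slower on every hit.
big_l1 = amat(l1_hit=4, l1_miss_rate=0.05, l2_hit=7, l2_miss_rate=0.2, mem_latency=100)
print(small_l1, big_l1)
```

With these made-up figures the small L1 averages about 4.7 cycles and the big one about 5.35, so cutting L2 latency can pay off more than growing the L1.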
This makes me wonder if next-generation computers will simply move to caching micro-ops at each pipeline and then rely on 512-bit or 1024-bit superpipelines to the L2!?! It's a classic paradox: push for quality via complexity, or quantity via simplicity. Perhaps as die sizes get smaller, CPU makers can afford the extra cost of complex cores.
