Is Willamette's L1 cache a knockoff of AMD's design?

MadRat · Sep 3, 2000

I see that the P4 will cache micro-ops, about 12K (12,288) at any one time. Doesn't the K7 already do something similar to this, just in a different way?

So effectively the P4 will have a two-part L1: a 12k trace cache (i.e. a micro-op library) and an 8k data cache... for a 20k L1!?! Sure, a 20k design would be ultra low-latency, but there's no way that's true. There must be some limit to the number of unique micro-ops in 32-bit programming. Does anybody know the number?
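The basic idea of caching already-decoded micro-ops, so that hot code can skip the x86 decoders, can be illustrated with a toy model. This is only a sketch under stated assumptions: the decoder `decode_x86`, the fake micro-op names, and the LRU eviction policy are all hypothetical, and a real trace cache also builds its lines along predicted branch paths, which is not modeled here. The capacity is set to 12,288 (12K) and, as in Intel's description of the P4 trace cache, it is counted in micro-ops rather than bytes.

```python
from collections import OrderedDict
from typing import Callable, Sequence, Tuple


class MicroOpCache:
    """Toy decoded-micro-op cache with LRU eviction; capacity counted in micro-ops."""

    def __init__(self, capacity_uops: int) -> None:
        self.capacity = capacity_uops
        self.entries: "OrderedDict[int, Tuple[str, ...]]" = OrderedDict()  # addr -> micro-ops
        self.used = 0  # micro-ops currently stored
        self.hits = 0
        self.misses = 0

    def fetch(self, addr: int, decode: Callable[[int], Sequence[str]]) -> Tuple[str, ...]:
        if addr in self.entries:
            # Hit: hand back the stored micro-ops, no decoding needed.
            self.hits += 1
            self.entries.move_to_end(addr)  # mark as most recently used
            return self.entries[addr]
        # Miss: take the slow path through the (hypothetical) x86 decoder.
        self.misses += 1
        uops = tuple(decode(addr))
        # Evict least recently used entries until the new micro-ops fit.
        while self.entries and self.used + len(uops) > self.capacity:
            _, old = self.entries.popitem(last=False)
            self.used -= len(old)
        if len(uops) <= self.capacity:
            self.entries[addr] = uops
            self.used += len(uops)
        return uops


def decode_x86(addr: int) -> list:
    """Hypothetical decoder: address N 'decodes' into (N % 3) + 1 fake micro-ops."""
    return [f"uop{addr}.{i}" for i in range(addr % 3 + 1)]


cache = MicroOpCache(capacity_uops=12_288)
for _ in range(10):  # a hot loop over three instructions
    for addr in (0, 1, 2):
        cache.fetch(addr, decode_x86)
print(cache.hits, cache.misses)  # 27 3
```

Only the first pass through the loop pays for decoding. Because the capacity is counted in micro-ops rather than bytes, the 12K figure does not add directly to the 8KB data cache as a byte count.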

The P4 will feature a 256KB L2 cache running at the processor's core clock speed. This seems pretty reasonable considering the reported 256-bit L2 data path (4x 64-bit), which is double the 128-bit path (2x 64-bit) of the PIII.

Let me try to understand this: the 256-bit L2 cache will be faster than simply using a large L1 cache? The L1 cache must miss a lot, or else Intel wouldn't be taking this approach. I mean, Intel seems to be pushing for lower latency in the second level of cache rather than trying to improve the "efficiency" of the L1 cache. (Does the L2 cache hold micro-ops, too?)

This makes me wonder if next-generation CPUs will simply cache micro-ops at each pipeline and then rely on 512-bit or 1024-bit superpipelines to the L2!?! It's the classic dilemma: push for quality via complexity, or quantity via simplicity. Perhaps as die sizes shrink, CPU makers can afford the extra cost of complex cores.

DDad · Sep 3, 2000

Great, just what we need: give AMD a reason to sue Intel...

xtreme2k · Sep 3, 2000

Both the P4 and the P3/Coppermine have a 256-bit L2 cache.

The only difference:
P4's L2 transfers data EVERY clock
P3's L2 transfers data EVERY OTHER clock
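The difference xtreme2k describes is about bytes per clock: with the same 256-bit (32-byte) interface, transferring every clock gives twice the peak bandwidth of transferring every other clock at the same frequency. A quick calculation, assuming example core clocks of 1.5 GHz and 1.0 GHz chosen only for illustration; the result is a theoretical peak, not a measured figure.

```python
BUS_BYTES = 256 // 8  # a 256-bit L2 interface moves 32 bytes per transfer


def l2_peak_bandwidth_gbs(core_ghz: float, clocks_per_transfer: int) -> float:
    """Theoretical peak L2 bandwidth in GB/s (10^9 bytes/s) for one transfer every N clocks."""
    return BUS_BYTES * core_ghz / clocks_per_transfer


p4 = l2_peak_bandwidth_gbs(1.5, 1)  # example clock, transfer every clock
p3 = l2_peak_bandwidth_gbs(1.0, 2)  # example clock, transfer every other clock
print(f"P4 @ 1.5 GHz: {p4:.0f} GB/s")  # 48 GB/s
print(f"P3 @ 1.0 GHz: {p3:.0f} GB/s")  # 16 GB/s
```

At equal clocks the every-clock design moves 32 bytes per cycle versus 16, so the gap grows linearly with frequency.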

By using a smaller L1 cache, Intel decreased its latency to 2 cycles.
A 2-cycle-latency L1 (8K) should outperform a 3-cycle-latency L1 (32K): the hit rate of an 8K L1 is 92%, and the hit rate of a 32K L1 is 96%.

For Intel's P4, L1 latency is more important than L1 size.
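The latency-versus-hit-rate trade-off can be checked with the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. The sketch below plugs in xtreme2k's 2-cycle/92% and 3-cycle/96% figures; the miss penalties (cycles to get the data from L2) are assumed values, not P4 measurements, and the model ignores any overlap of misses by out-of-order execution.

```python
def amat(hit_cycles: float, hit_rate: float, miss_penalty: float) -> float:
    """Average access time in cycles: hit time + miss rate * miss penalty."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty


def break_even_penalty(h_small: float, r_small: float, h_big: float, r_big: float) -> float:
    """Miss penalty P at which h_small + (1-r_small)*P == h_big + (1-r_big)*P."""
    return (h_big - h_small) / ((1.0 - r_small) - (1.0 - r_big))


# xtreme2k's figures: 8K L1 = 2 cycles / 92% hits, 32K L1 = 3 cycles / 96% hits.
# The penalties below are assumed values chosen to show both sides of the crossover.
for penalty in (10, 40):
    small = amat(2, 0.92, penalty)
    big = amat(3, 0.96, penalty)
    print(f"penalty {penalty:>2}: 8K L1 {small:.2f} cycles, 32K L1 {big:.2f} cycles")

print(f"break-even penalty ~ {break_even_penalty(2, 0.92, 3, 0.96):.1f} cycles")
```

Worked by hand: with a 10-cycle penalty the 8K L1 averages 2.8 cycles versus 3.4, but at 40 cycles the order flips (5.2 versus 4.6). The break-even point is 1 / (0.08 − 0.04) = 25 cycles, so under these assumptions the small, fast L1 only wins as long as the L2 behind it is fast enough.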

MadRat · Sep 4, 2000

Can anyone answer my question on micro-ops... how many unique micro-ops does 32-bit programming have?

MaxFPS · Sep 4, 2000

MadRat,

<< According to the P6 architecture manager, the EFFECTIVE number of pipestages in P6 is 15-20 for integer uOPs and about 30 for FP uOPs. >>

http://www.aceshardware.com/cgi-bin/ace/tech.pl?read=7281

Is that the answer you're looking for?

I still haven't finished reading that long thread, which includes posts from Paul DeMone of RWT.

BTW, the Pentium 4 bottleneck thread begins here: http://www.aceshardware.com/cgi-bin/ace/tech.pl?read=7265

MadRat · Sep 4, 2000

No, actually I was wondering about the combinations of micro-ops possible from a decoded x86 instruction.

MadRat · Sep 4, 2000
