Ya I hang out @ Ace's BBS (along with BurntK.), but there is not too much activity, so I come here when I get bored.
BTW: That's a very nice article @ Ace's (P4 vs. Mustang), but it has a few technical errors. Here are a few quick ones:
link
"Yes, for today's games, which are the most demanding applications most desktop users run, sustainable memory bandwidth has become a serious bottleneck. No wonder, if you consider that the fastest x86 processor (Athlon 1200 MHz) today runs with a multiplier of 9x."
-It looks like the author has mixed up the PC-2100 DDR (266 MHz) Tbird platform with current platforms (SDR, 100 MHz). The current 1200 MHz Tbird has a 12.0x clock multiplier (1200 / 100), not 9.0x. A 9.0x multiplier only works out on a 133 MHz bus (1200 / 133).
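For anyone who wants to check the math, here is a rough sketch of the multiplier arithmetic. The function name is made up for illustration, and the bus clocks are my assumptions: a 100 MHz base clock for current boards and a 133.33 MHz base clock for the "266 MHz" DDR platform, since the DDR figure counts two transfers per clock.

```python
def clock_multiplier(core_mhz: float, bus_mhz: float) -> float:
    """Core clock divided by the base bus clock (not the DDR-effective rate)."""
    return core_mhz / bus_mhz

# Assumed base bus clocks, for illustration only.
SDR_BUS_MHZ = 100.0      # current platform
DDR_BUS_MHZ = 400.0 / 3  # 133.33 MHz base clock, "266 MHz" effective

# 1200 MHz Tbird on each platform, rounded to one decimal place.
current = round(clock_multiplier(1200, SDR_BUS_MHZ), 1)
ddr_platform = round(clock_multiplier(1200, DDR_BUS_MHZ), 1)

print(current, ddr_platform)
```

So the 9x figure in the article only fits if you assume the 133 MHz bus, which is not what a 1200 MHz Tbird runs on today.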
"The trace cache, the huge instruction buffers, and many other considerations have resulted in a Pentium 4 architecture with 20 stages after the trace cache and no less than 28 total. Notice that the branch check is at the 19th cycle, and therefore the branch misprediction penalty is no less than 19 cycles! If an instruction is not the in the Trace cache, the penalty could be even worse (context switches). Luckily, Intel has implemented an excellent branch predictor, which should be better than any existent branch predictor today to minimize the impact of branch misprediction. There is also a bypass between the decoder and the rename/allocate unit, which should lower the performance decrease caused by trace cache (L1- Instruction cache) misses."
-This is controversial, but it's an interesting topic. I have talked with a few reliable Intel sources, and they have leaned towards 24 final stages, not 28. It should be more than 20 any way you look at it. It looks like Intel will be doubling the P4's "Rename" stages, and there will most likely be more than one "Dispatch" stage. The rest is speculation.
My 2 cents.