P4 vs. P3 vs. Mustang vs. T-Bird

Soccerman · Oct 25, 2000

VladTrishkin, where did you get that 10 stage P3 (coppermine) spec? I never saw that before!

KarsinTheHutt · Oct 25, 2000

10 stage pipe on CuMine? I'm skeptical... but maybe that's why it can't get to 1.13 GHz

Sunner · Oct 25, 2000

Its pipe depth is 12, just like all other CPUs that are based on the P6 core.

VladTrishkin · Oct 25, 2000

link

Picture link

The basic P6 pipeline consists of 10 stages.....

I can't believe that Soc didn't know that.

Sunner · Oct 25, 2000

I stand corrected, though like previously stated, all the P6 based cores have equally long pipelines.

VladTrishkin · Oct 25, 2000

Everyone was shouting 12, not 10... The PIII katmai has a 12 stage pipeline, and then Intel reduced it to 10 with the release of the coppermine.

Sunner · Oct 25, 2000

Even on the pages you link to they refer to the "P6 Pipeline", not the Coppermine pipe, and I hardly think Intel would reduce the pipe for the Cumine, since that's quite a big redesign that would need to be done. It's not as simple as to "just reduce it".

VladTrishkin · Oct 25, 2000

I am at work now, but maybe I'll try to dig up some Intel spec sheets later... Anyway, it's 10, used to be 12. I am not sure how they reduced it, but I know it gave the P6 a boost. If I remember correctly it was in the ALU, and it has something to do with the reduction of L2 cache.

Sunner · Oct 26, 2000

Got a reply at Ace's (what better place to ask than the tech BBS there?).

<< Actually, it's both. The entire pipeline is about 12 stages long. However, that includes a 3 cycle retirement pipeline. The retirement pipeline is not in the critical path from branch mispredict to next execution, nor is it in the latency from operation to use of the result. As a result, the effective pipeline length is shorter, at around 9-10 pipe stages. This also corresponds to the way AMD draws their pipeline diagrams. AMD shows a 10 stage pipeline, but they never include a retirement pipe; they end the pipeline after execution and register file writeback. >>

So I guess we were both right sorta, assuming he knows what he's talking about, which they typically do over there
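The two numbers in the quoted reply reconcile with simple subtraction. A minimal sketch, assuming the stage counts are exactly as quoted (the reply itself says "about 12" and "around 9-10", so treat the result as approximate); the function name is made up for illustration:

```python
# Stage counts taken from the quoted Ace's reply; both are approximate.
TOTAL_STAGES = 12       # whole P6 pipeline, retirement included
RETIREMENT_STAGES = 3   # retirement pipe, not on the mispredict critical path


def effective_depth(total, retirement):
    """Stages that matter for mispredict recovery and result latency."""
    return total - retirement


print(effective_depth(TOTAL_STAGES, RETIREMENT_STAGES))
```

Whether you say "12" or "about 10" depends only on whether the retirement stages are counted.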

IBMer · Oct 26, 2000

I think SSE2 will be a big difference.

If you look at the GTS Ultra Benchmarks that Anandtech originally did, at 1600x1200x32, he compared a P3 550E and T-bird 1Ghz.

With every card including the Ultra, the P3 was 1-2fps faster than that Athlon. This is because Q3 is SSE optimized.

Try running 3dMark2000 without SSE. You can do this by trying the Software T&L vs. the P3 Optimizations.

Anyway, SSE2 is more than just more instructions: it can also do 128-bit calculations, but this really means four 32-bit calculations packed into a single instruction.

I think we will find out later whether it is good or not, but really nobody knows. I will bet that nobody here has even seen a Mustang or P4 to talk about what is faster and what is not. We will find out when they are available for sale.
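The "128-bit" point above can be pictured as one register holding four 32-bit lanes. A rough model in Python, assuming single-precision floats; the helper names are hypothetical, and the loop here runs the lanes one after another where real hardware would apply one packed instruction:

```python
import struct


def pack4(values):
    """Pack four single-precision floats into one 16-byte (128-bit) value."""
    return struct.pack("<4f", *values)


def packed_add(a, b):
    """Model of one packed add: four independent 32-bit lane additions."""
    lanes_a = struct.unpack("<4f", a)
    lanes_b = struct.unpack("<4f", b)
    return struct.pack("<4f", *(x + y for x, y in zip(lanes_a, lanes_b)))


a = pack4([1.0, 2.0, 3.0, 4.0])
b = pack4([0.5, 0.5, 0.5, 0.5])
result = struct.unpack("<4f", packed_add(a, b))
print(len(a) * 8, result)
```

The register is 128 bits wide, but no single number in it is wider than 32 bits.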

VladTrishkin · Oct 26, 2000

LOL Sunner, I saw your thread @ Ace's, didn't reply to make it fair

Anyway, there is a bit more to it, i remember reading (Intel spec sheets) that the Coppermine differs from the Katmai a bit when it comes to the pipeline, but that reply you got sounds pretty fair.

Sunner · Oct 26, 2000

I figured maybe you were a member(well not that it has members in the same sense as this BBS, but you know what I mean) at Ace's, most people who are somewhat knowledgeable tend to hang out there

VladTrishkin · Oct 26, 2000

Ya I hang out @ Ace's BBS (along with BurntK.), but there is not too much activity, so I come here when I get bored.

BTW: That's a very nice article @ Ace's (P4 vs. Mustang), but a few technical errors. Here are a few quick ones:

link

"Yes, for today's games, which are the most demanding applications most desktop users run, sustainable memory bandwidth has become a serious bottleneck. No wonder, if you consider that the fastest x86 processor (Athlon 1200 MHz) today runs with a multiplier of 9x."

-It looks like the author has mixed up the PC-2100 DDR (266MHz) Tbird platform with current platforms (SDR, 100MHz). The current 1200MHz Tbird has a 12.0x clock multiplier (not 9.0)...

"The trace cache, the huge instruction buffers, and many other considerations have resulted in a Pentium 4 architecture with 20 stages after the trace cache and no less than 28 total. Notice that the branch check is at the 19th cycle, and therefore the branch misprediction penalty is no less than 19 cycles! If an instruction is not the in the Trace cache, the penalty could be even worse (context switches). Luckily, Intel has implemented an excellent branch predictor, which should be better than any existent branch predictor today to minimize the impact of branch misprediction. There is also a bypass between the decoder and the rename/allocate unit, which should lower the performance decrease caused by trace cache (L1- Instruction cache) misses."

-This is controversial, but it's an interesting topic. I have talked with a few reliable Intel sources, and they have leaned towards 24 final stages, not 28. It should be more than 20 any way you look at it. It looks like Intel will be doubling P4's "Rename" stages, and there will most likely be more than one "Dispatch" stage. Rest is speculation.

My 0.02 cents.
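To see why the stage count matters, here is a back-of-the-envelope sketch of the average cost of mispredicted branches. The 19-cycle penalty comes from the quoted article and the 10-cycle figure from the P6 discussion earlier in the thread; the branch fraction and mispredict rate are made-up illustrative assumptions, not measurements, and the function name is hypothetical:

```python
def mispredict_cost(branch_fraction, mispredict_rate, penalty_cycles):
    """Average extra cycles per instruction lost to branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


# Illustrative assumptions only: 20% of instructions are branches,
# and 5% of those branches are mispredicted.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

p6_cost = mispredict_cost(BRANCH_FRACTION, MISPREDICT_RATE, 10)
p4_cost = mispredict_cost(BRANCH_FRACTION, MISPREDICT_RATE, 19)
print(f"P6-style pipe: {p6_cost:.2f} extra cycles per instruction")
print(f"P4-style pipe: {p4_cost:.2f} extra cycles per instruction")
```

With the same predictor accuracy, the longer pipe pays nearly twice as much per instruction, which is why the article stresses the improved branch predictor.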

Howard · Oct 26, 2000

Mustang launches at 1.4GHz, I believe.

Soccerman · Oct 26, 2000

VladTrishkin, contrary to popular belief (lol yeah right), I am not that knowledgeable. I've been corrected sooo much for the past little while, I don't even know why I try!

then again, the more I get corrected, the more I learn!

In any case, I suspect they did SOMETHING to the pipelines. Whatever it was, it sure helped them in the MHz race. However, most tests that we use to compare sheer clock speed are tipped in Intel's favor (Quake 3 is a perfect example of that: it SAYS 3DNow! support, but it really doesn't matter much).

VladTrishkin · Oct 27, 2000

Soccerman, yes that's what I suspect as well... but it looks like current cB0's are good up to 1.1GHz or so, we need cC0's (and .13 micron later on) for 1.2+GHz...

jpprod · Oct 27, 2000

VladTrishkin: Actually that's not a mixup, the writer refers to memory speed vs. CPU clock frequency, which on the KT133 Athlon platform is 1200MHz/133MHz = 9

VladTrishkin · Oct 27, 2000

jpprod, the default multiplier for a 1.2GHz Tbird is set to 12x. Yes, you can run it at 9x on a 133MHz bus for the same 1.2GHz, but this is not the default setting. This will be implemented in the near DDR-266 future.
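Both figures in this exchange come from the same division, just with a different clock in the denominator. A small sketch using the thread's own numbers (the function name is made up, and 133 is the nominal figure used in the thread):

```python
def multiplier(core_mhz, reference_mhz):
    """Ratio of the core clock to a reference clock (bus or memory)."""
    return core_mhz / reference_mhz


print(multiplier(1200, 100))            # against the 100MHz bus
print(round(multiplier(1200, 133), 1))  # against 133MHz memory
```

So "12x" and "9x" describe the same 1200MHz chip; the disagreement is over whether the article meant the bus clock or the memory clock.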

VladTrishkin · Oct 27, 2000

BTW: can't you guys call me "Vlad"?

Sunner · Oct 27, 2000

jp, Vlad's (sure we can) right about that; referring to the mem speed instead of bus speed would be sorta messy anyway, since some people run at 100 MHz mem (like me with my crappy noname PC100).

Rigoletto · Oct 27, 2000

http://www.aceshardware.com/Spades/read.php?article_id=28

With regard to SSE2, this article has something of interest on 3DNow!, e.g. 3DNow! is at least 2x as powerful as a well designed FPU. But with it the K6-2 only outperformed the PII by 30% at most.

jpprod · Oct 27, 2000

Sunner, Vlad: Read into the very essence of this quote

<< Yes, for today's games, which are the most demanding applications most desktop users run, sustainable memory bandwidth has become a serious bottleneck. No wonder, if you consider that the fastest x86 processor (Athlon 1200 MHz) today runs with a multiplier of 9x. >>

I'm sure Johan of Ace's Hardware doesn't think 1200MHz Athlon runs on a 133MHz FSB

VladTrishkin · Oct 27, 2000

<< I'm sure Johan of Ace's Hardware doesn't think 1200MHz Athlon runs on a 133MHz FSB >>

-this is not for me to judge, i just pointed out an error (not to be confused with a mistake). True, in a few weeks or so we WILL see 1.2Ghz Tbirds on the 133MHz FSB @ 9.0x
