P3 Faster than P4???

5LiterMustang

Senior member
Dec 8, 2002
531
0
0
Guys, I've read two articles, and from reading on here I gather the P3 is faster than the P4 MHz for MHz. What is the reason behind this? None of the articles I read did a good job explaining it.
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
31,742
31,694
146
The P3 had stronger IPC (instructions per clock), which found its way into the Banias, but the P4 ramps up mucho faster in MHz.
 

Snooper

Senior member
Oct 10, 1999
465
1
76
Clock for clock, yes, a P3 IS faster than a P4... But it REALLY doesn't matter, as the P3s topped out in the 1.2GHz range and the P4s are at 3.2GHz (oh, make that 3.06GHz). The reason for both the clock-for-clock speed difference and the absolute speed difference lies in the architecture of the two chips. Remember, they are both being built on the same process, if my memory is working (Intel has so many different processes these days it can get hard to keep up...). The VERY SHORT answer is that the P4 has a much longer instruction pipeline than the P3. Splitting the work into more, shorter stages is what lets the P4 run at higher clock speeds, but it adds latency that the P3 does not have: every time the pipeline has to be flushed and refilled, more cycles are lost.

So, the P4 is faster. The P3 is more efficient. It is really similar to the Athlon vs. P4 debate. Same thing. Heck, look at the Itanium 2 processor. It is only running at 1 GHz, yet it is a VERY powerful processor and a WHOLE lot faster as a server than the Athlon or the P4 will ever be.
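
To put the "faster vs. more efficient" point in numbers, here is a rough sketch of the usual rule of thumb that throughput is about IPC times clock speed. The clock speeds are the ones mentioned above; the IPC values are made up purely for illustration and are not measurements of either chip.

```python
def relative_performance(ipc: float, clock_ghz: float) -> float:
    """Billions of instructions per second, modeled as IPC * clock (GHz)."""
    return ipc * clock_ghz


# Hypothetical IPC figures, chosen only to show the trade-off.
p3 = relative_performance(ipc=1.0, clock_ghz=1.2)
p4 = relative_performance(ipc=0.7, clock_ghz=3.06)

print(f"P3 @ 1.2 GHz:  {p3:.2f}")
print(f"P4 @ 3.06 GHz: {p4:.2f}")
```

With these assumed values the chip with the lower IPC still comes out well ahead overall, because the clock speed gap is larger than the IPC gap.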
 

alkemyst

No Lifer
Feb 13, 2001
83,769
19
81
I haven't really looked into it myself, but it's true EXCEPT for video/audio manipulation (A/V editing mainly)... in those cases the P4 spanks all.

I am going to run a 1.4-S Tualatin for a while I think....my Radeon 8500 handles games smoothly and with enough eye candy they are cool.

I even stopped overclocking my Radeon and it was still nice.
 

fkloster

Diamond Member
Dec 16, 1999
4,171
0
0
Originally posted by: alkemyst
I haven't really looked into it myself, but it's true EXCEPT for video/audio manipulation (A/V Editting mainly)....in those cases the P4 spanks all.

FACTS:

1) the HIGHEST clock P4 (3.06) FLOGS the HIGHEST clock P3 (1.0) by an EXTREME margin in every test...

2) the LOWEST clock P4 (1.3) FLOGS the LOWEST clock P3 (.45) by an EXTREME margin in every test...

...any questions?
 

HokieESM

Senior member
Jun 10, 2002
798
0
0
DAPunisher has a good point. The IPC on the P3 is much better. It also handles integer calculations very well. Look at the performance of the Pentium M in the new notebooks--the 1.6 GHz Pentium M does VERY well competing with a 2.4GHz P4M... especially in office productivity. So is the P3 faster than the P4? A 1.3GHz P3 is much faster than a 1.3GHz P4 (actually, the P3-S 1.4 will compete very well with a P4 1.8). But the P4 scales better--they're coming out with a 3.2GHz soon, I believe.

This is exactly the same as the Athlon/P4 debate. The Athlon is more efficient per clock cycle--but unfortunately, they can't run at the same physical speeds (if they could... woo-wee). And keep in mind that you always have to buy a computer based on what you want to do with it--there are certain applications that benefit greatly from individual architectures.
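
For what it's worth, the two pairings above imply a rough per-clock advantage if you take "competes very well" to mean roughly equal performance. That equal-performance reading is an assumption for this sketch, not a benchmark result.

```python
def implied_per_clock_ratio(slow_clock_ghz: float, fast_clock_ghz: float) -> float:
    """If two chips perform about the same, the lower-clocked one
    does fast/slow times as much work per clock."""
    return fast_clock_ghz / slow_clock_ghz


pairs = {
    "Pentium M 1.6 vs P4M 2.4": (1.6, 2.4),
    "P3-S 1.4 vs P4 1.8": (1.4, 1.8),
}
for label, (slow, fast) in pairs.items():
    print(f"{label}: about {implied_per_clock_ratio(slow, fast):.2f}x per clock")
```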
 

5LiterMustang

Senior member
Dec 8, 2002
531
0
0
I figured it was something with the IPC but wasn't sure. Anyone know of any sites that list the IPC of various processors? I've seen a figure of 720 for the Athlon, but only in one article. Does anyone actually test for this stuff?
 

alkemyst

No Lifer
Feb 13, 2001
83,769
19
81
Originally posted by: fkloster
I haven't really looked into it myself, but it's true EXCEPT for video/audio manipulation (A/V Editting mainly)....in those cases the P4 spanks all.

FACTS:

1) the HIGHEST clock P4 (3.06) FLOGS the HIGHEST clock P3 (1.0) by an EXTREME margin in every test...

2) the LOWEST clock P4 (1.3) FLOGS the LOWEST clock P3 (.45) by an EXTREME margin in every test...

...any questions?

Yeah last I checked the HIGHEST P3 was a Tualatin, and it's a 1.4 with 512K cache :) (133FSB).

 

Diable

Senior member
Sep 28, 2001
753
0
0
A P3 is probably faster than a P4 of the same speed on old, non-SSE2-enhanced code. But if the app has any SSE2 code, the P4 will smack the P3 silly.
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,393
8,552
126
Originally posted by: fkloster
I haven't really looked into it myself, but it's true EXCEPT for video/audio manipulation (A/V Editting mainly)....in those cases the P4 spanks all.

FACTS:

1) the HIGHEST clock P4 (3.06) FLOGS the HIGHEST clock P3 (1.0) by an EXTREME margin in every test...

2) the LOWEST clock P4 (1.3) FLOGS the LOWEST clock P3 (.45) by an EXTREME margin in every test...

...any questions?

wow, that was loaded.
 

dullard

Elite Member
May 21, 2001
25,925
4,516
126
Originally posted by: Diable
A P3 is probably faster then a P4 of the same speed on old non-SSE2 enhanced code. But if the app has any SSE2 code the P4 will smack the P3 silly.
Correct. Initially the P3 was faster than the P4 clock for clock. But the P4 has improved dramatically since the 1.4 GHz days:
1) The long P4 pipeline meant the IPC was really bad due to branch mispredictions. This was the major reason that the P3 was faster clock for clock. However, the faster the clock speed, the less wall-clock time each misprediction costs, even though the penalty in cycles stays the same. And we all know the P4 clock speed has doubled since the early days. Thus this problem is greatly diminished.
2) The P4 now has double the cache. This also increases the P4 IPC.
3) The P4 has gone from 400 MHz to 533 MHz FSB. Soon it will be at 800 MHz FSB. All this means the P4's IPC keeps increasing.
4) Better memory, better motherboards, better optimizations, etc. have occurred since the early P4 days.
5) Hyperthreading, which will soon move down to the slower P4s, will give a good boost on average to the P4's IPC.
6) Software is now optimized for the P4 - including SSE2 code.

All these reasons now mean that for many programs the P4 is actually now faster than the P3 clock for clock. 5LiterMustang, I bet those articles you read were old and didn't include the things I listed above. The picture has changed dramatically.
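
Point 1 can be sketched with the textbook stall model, where the average cycles per instruction is a base CPI plus (branch fraction x mispredict rate x penalty in cycles). Every number below is hypothetical, and the 10 and 20 cycle penalties are just stand-ins for a shorter and a longer pipeline, not measured figures for the P3 or P4.

```python
def effective_ipc(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """IPC after adding the average misprediction stall to a base CPI."""
    cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
    return 1.0 / cpi


def penalty_ns(penalty_cycles, clock_ghz):
    """Wall-clock cost of one misprediction, in nanoseconds."""
    return penalty_cycles / clock_ghz


# Hypothetical workload: 20% branches, 5% of them mispredicted, base CPI of 1.0.
short_pipe_ipc = effective_ipc(1.0, 0.20, 0.05, 10)
long_pipe_ipc = effective_ipc(1.0, 0.20, 0.05, 20)
print(f"short pipeline IPC: {short_pipe_ipc:.3f}")
print(f"long pipeline IPC:  {long_pipe_ipc:.3f}")

# Doubling the clock halves the time lost per misprediction,
# but the cycle count (and so the IPC above) is unchanged.
for clock_ghz in (1.5, 3.0):
    print(f"{clock_ghz} GHz: {penalty_ns(20, clock_ghz):.2f} ns per mispredict")
```

In this simple model a higher clock shrinks the time lost per mispredict but does not change IPC by itself; the per-clock gains would have to come from the other items on the list, such as cache, FSB, and recompiled software.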
 

EdipisReks

Platinum Member
Sep 30, 2000
2,722
0
0
Hell, they improved the P4 clock for clock even recently. I just bought a 2.4B C1 chip. At 2.4/533 I get a higher 3DMark score than I did at 2.4/600 (my old 1.6A @ 2.4). The fact that it makes up for an FSB reduction and still increases the overall score speaks highly of the new optimizations.
 

dexvx

Diamond Member
Feb 2, 2000
3,899
0
0
Common myth. The P3 cannot scale very high, due to its low FSB bandwidth. If you scaled a P3 to 3GHz, I doubt it would perform as well as a 3.06HT P4. But then again, if you scaled a P4 down to ~1GHz, it would probably get beaten by a 1GHz P3, simply because it could not take advantage of its high-bandwidth FSB.

The initial reviews of the 1.5GHz P4 put it marginally faster than the 1GHz P3, and it left everyone with a very sour image of the P4's performance. Since that time there have been numerous optimizations for the P4, compiler-wise. Simply recompiling a program using the Intel SSE2 extensions can yield a substantial boost in app speed. Couple that with the 512KB cache, 533/800 FSB, and PC1066 RDRAM, and it's quite fast per clock.
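
To give a feel for the FSB gap, here is a quick sketch of theoretical peak bandwidth, computed as transfers per second times bytes per transfer. It assumes a 64-bit (8-byte) data bus on both chips and uses nominal rates; real-world throughput will be lower.

```python
BUS_WIDTH_BYTES = 8  # assumed: 64 bits of data per transfer on both buses


def peak_bandwidth_mb_s(transfers_mt_s, width_bytes=BUS_WIDTH_BYTES):
    """Theoretical peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return transfers_mt_s * width_bytes


buses = {
    "P3, 133 MHz FSB": 133,
    "P4, 400 MHz FSB (100 MHz quad pumped)": 400,
    "P4, 533 MHz FSB": 533,
    "P4, 800 MHz FSB": 800,
}
for name, rate in buses.items():
    print(f"{name}: {peak_bandwidth_mb_s(rate)} MB/s")
```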
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,393
8,552
126
Originally posted by: dexvx
Common myth. The p3 cannot scale very high, due to its low FSB bandwidth. If you scale P3's to 3Ghz, I doubt it will perform as well as a 3.06HT P4. But then again if you scale a P4 to ~1Ghz, it'll probably get beat by a p3 1Ghz simply because it cannot take advantage of its high bandwidth FSB.

The initial reviews of the p4 1.5Ghz put it marginally faster than the P3 1Ghz, and it left everyone with a very sour image to the P4's performance. Since that time there has been numerous optimizations for the P4, compiler wise. Simply recompiling the program using the SSE2 intel extensions can yield a substantial boost in app speed. Couple that with the 512KB cache, 533/800 FSB, PC1066 RDRam, and its quite fast per clock.

and it still gets pwned! by a pentium m
 

Macro2

Diamond Member
May 20, 2000
4,874
0
0
What the P4 doesn't make up for by clockspeed, Intel makes up with "fuzzy" benchmarks...<G>
 

Wingznut

Elite Member
Dec 28, 1999
16,968
2
0
Originally posted by: dullard
Originally posted by: Diable
A P3 is probably faster then a P4 of the same speed on old non-SSE2 enhanced code. But if the app has any SSE2 code the P4 will smack the P3 silly.
Correct. Initially the P3 was faster than the P4 clock for clock. But the P4 has improved dramatically since the 1.4 GHz days:
1) The long P4 pipleline meant the IPC was really bad due to these mispredictions. This was the major reason that the P3 was faster clock for clock. However the faster the clockspeed the less time penalty you have for the branch mispredictions. And we all know the P4 clockspeed has doubled since the early days. Thus this problem is greatly diminished.
2) The P4 now has double the cache. This also increases the P4 IPC.
3) The P4 has gone from 400 MHz to 533 MHz fsb. Soon it will be at 800 MHz fsb. All this means the P4's IPC keeps increasing.
4) Better memory, better motherboards, better optimizations, etc have occured since the early P4 days.
5) Hyperthreading which will soon move down to the slower P4s will give a good boost on average to the P4's IPC.
6) Software is now optimized for the P4 - including SSE2 code.

All these reasons now mean that for many programs the P4 is actually now faster than the P3 clock for clock. 5LiterMustang, I bet those articles you read were old and didn't include the things I listed above. The picture has changed dramatically.
Excellent post, dullard.

 

jam3

Member
Apr 9, 2003
90
0
0
Yeah, it's the same ol' IPC/clock speed relationship debate. And it's a big reason why people who think AMD is somehow being misleading by using an arbitrary number to represent their chips simply have zero understanding of how CPUs work. They think a half adder is someone who went out and cut a snake in half with an axe.
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
The P4 actually does have the necessary resources to achieve a higher IPC than the P3 does. It simply has a lot of quirks that cripple its clock-normalized performance. One, as others have mentioned, is the extremely long hyperpipelined design. This, however, accounts for only a small part of its reduced clock-normalized performance. The advanced branch prediction algorithms on the P4 are quite accurate, and only in very branch-heavy code (Office apps) do we see a dramatic drop in average IPC. There are, of course, many other reasons why the P4 initially achieved (and to some degree still achieves) lower clock-normalized performance: the lack of dedicated shifter units, the narrow decoding stage (while the trace cache is probably used the majority of the time to issue instructions, a great deal of code is still non-repeatable, and the narrow decoders on the P4 hurt in these instances), and probably a few others.
The architecture was significantly different and required a great deal of code adjustment. Today, the P4's clock-normalized performance has increased dramatically. In Tom's Hardware's latest roundup, we see the 1.5 GHz Willamette score an average of 40.25% better than the P3 CuMine 1.0 GHz. In the initial reviews of the P4 1.5 GHz that were done 3 years ago using the software of that day, the 1.5 Willamette pulled a very marginal performance increase compared to the 1.0 P3 CuMine. In Anand's review, the 1.5 GHz P4 pulled a measly 11.76% above the 1.0 GHz P3 CuMine. With SSE2, we can see the clock-normalized performance of the P4 significantly surpass the P3's. However, again, we need software to be adjusted, and this only applies to software that isn't too branch-heavy. In situations where a lot of hard-to-predict branches are used, we have a fundamental problem of branch mispredicts, and the very deep hyperpipelined design significantly kills clock-normalized performance.
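
As a quick sanity check on those two review figures, the clock-normalized ratio is just the performance ratio divided by the clock ratio. The percentages are the ones quoted above; the function name is only for illustration.

```python
def clock_normalized_ratio(perf_ratio: float, clock_ratio: float) -> float:
    """Per-clock performance of chip A relative to chip B."""
    return perf_ratio / clock_ratio


clock_ratio = 1.5 / 1.0  # 1.5 GHz Willamette vs 1.0 GHz CuMine

launch = clock_normalized_ratio(1.1176, clock_ratio)  # +11.76% in the initial review
recent = clock_normalized_ratio(1.4025, clock_ratio)  # +40.25% in the later roundup

print(f"launch-era software: {launch:.3f} of the P3's per-clock performance")
print(f"recent software:     {recent:.3f} of the P3's per-clock performance")
```

On those averages the Willamette goes from roughly three quarters of the P3's per-clock performance to a bit over nine tenths, so it is much closer but not yet at parity across the whole suite.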
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: alkemyst
I haven't really looked into it myself, but it's true EXCEPT for video/audio manipulation (A/V Editting mainly)....in those cases the P4 spanks all.

I am going to run a 1.4-S Tualatin for a while I think....my Radeon 8500 handles games smoothly and with enough eye candy they are cool.

I even stopped overclocking my Radeon and it was still nice.
I have a 1.2 GHz Tualatin Celeron (which is basically an "improved" PIII @ 100 FSB). However, it really doesn't perform too well until you raise the FSB to approach 133MHz. Mine is at 1.5GHz and is very comparable performance-wise to the (old) 1.8 GHz P4 (still no "need" to change my 8500 either). :)
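
The FSB behind that overclock can be worked out from core clock = multiplier x FSB. This sketch assumes the multiplier is locked, so the whole overclock comes from raising the FSB; the function names are just for illustration.

```python
def multiplier(core_mhz: float, fsb_mhz: float) -> float:
    """Clock multiplier implied by a stock core clock and FSB."""
    return core_mhz / fsb_mhz


def fsb_for_target(core_mhz: float, mult: float) -> float:
    """FSB needed to hit a target core clock with a fixed multiplier."""
    return core_mhz / mult


mult = multiplier(1200, 100)      # stock: 1.2 GHz on a 100 MHz FSB
fsb = fsb_for_target(1500, mult)  # overclocked to 1.5 GHz

print(f"multiplier: {mult:g}x")
print(f"FSB at 1.5 GHz: {fsb:g} MHz")
```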

 

dexvx

Diamond Member
Feb 2, 2000
3,899
0
0
Originally posted by: ElFenix
Originally posted by: dexvx
Common myth. The p3 cannot scale very high, due to its low FSB bandwidth. If you scale P3's to 3Ghz, I doubt it will perform as well as a 3.06HT P4. But then again if you scale a P4 to ~1Ghz, it'll probably get beat by a p3 1Ghz simply because it cannot take advantage of its high bandwidth FSB.

The initial reviews of the p4 1.5Ghz put it marginally faster than the P3 1Ghz, and it left everyone with a very sour image to the P4's performance. Since that time there has been numerous optimizations for the P4, compiler wise. Simply recompiling the program using the SSE2 intel extensions can yield a substantial boost in app speed. Couple that with the 512KB cache, 533/800 FSB, PC1066 RDRam, and its quite fast per clock.

and it still gets pwned! by a pentium m

You don't appear to have read my statement. I said that the P3 was FSB bound. The 512KB cache on the Tualatins helps, but as the core-to-FSB clock ratio gets higher, that benefit is nullified. The Pentium M enjoys the benefit of a quad-pumped P4 bus.
 

imgod2u

Senior member
Sep 16, 2000
993
0
0
That and the benefits of better branch prediction algorithms and a very large cache, plus SSE2.
 

merlocka

Platinum Member
Nov 24, 1999
2,832
0
0
Originally posted by: fkloster
I haven't really looked into it myself, but it's true EXCEPT for video/audio manipulation (A/V Editting mainly)....in those cases the P4 spanks all.

FACTS:

1) the HIGHEST clock P4 (3.06) FLOGS the HIGHEST clock P3 (1.0) by an EXTREME margin in every test...

2) the LOWEST clock P4 (1.3) FLOGS the LOWEST clock P3 (.45) by an EXTREME margin in every test...

...any questions?

Yes. I believe the question was "is the p3 faster than the p4 mhz for mhz".

Thanks for pointing out the obvious though. It was very helpful.

 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
102,393
8,552
126
Originally posted by: dexvx
Originally posted by: ElFenix
Originally posted by: dexvx
Common myth. The p3 cannot scale very high, due to its low FSB bandwidth. If you scale P3's to 3Ghz, I doubt it will perform as well as a 3.06HT P4. But then again if you scale a P4 to ~1Ghz, it'll probably get beat by a p3 1Ghz simply because it cannot take advantage of its high bandwidth FSB.

The initial reviews of the p4 1.5Ghz put it marginally faster than the P3 1Ghz, and it left everyone with a very sour image to the P4's performance. Since that time there has been numerous optimizations for the P4, compiler wise. Simply recompiling the program using the SSE2 intel extensions can yield a substantial boost in app speed. Couple that with the 512KB cache, 533/800 FSB, PC1066 RDRam, and its quite fast per clock.

and it still gets pwned! by a pentium m

You didnt appear to have read my statement. I reiterated that the P3 was FSB bound. The 512KB cache on the tualatins help, but when you're approaching a higher ratio, it is nullified. Pentium-M enjoys the benefit of a quad pumped P4 bus.

it completely depends on what you're doing, but ok, if you want to see it that way, i guess thats alright