Why is a 2 GHz Athlon faster than a 2 GHz P4?

Zebo

Elite Member
Jul 29, 2001
39,398
19
81
You know, I've read these reviews and benchmarks over the years and never gleaned from them why this holds true. Anyone?
 

Barnaby W. Füi

Elite Member
Aug 14, 2001
12,343
0
0
Same reason that one 2000lb car can be faster than another 2000lb car. MHz isn't a measurement of how fast it can process useful stuff, it's just a measurement of how fast stuff is moving around.
 

Jeff7

Lifer
Jan 4, 2001
41,596
20
81
I don't remember the technical terms (it's been a while since I read the magazine article on why), but the P4 has more stages in its processing pipeline. This was so that it could eventually scale to higher GHz ratings, but at present speeds it was a bit of a slowdown. It meant that the processor had to go through more clock cycles before it got a result from a calculation, so the P4 got less work done per clock tick than the Athlon. It was meant to make up for this by simply having more clock ticks per second: more GHz.
Intel introduced Hyperthreading too, which, from what I gather, is like trying to make a single processor work as two. It tries to do more than one action at a time, which usually translates to pretty good performance gains, especially in apps optimized to support Hyperthreading.
 

high

Banned
Sep 14, 2003
1,431
0
0
Athlons do a lot more work per clock cycle. For example, my 2.2 GHz 2500 overclocked to 3200 would be the equivalent of a 3.2 GHz P4 without HT. When HT is on, it's more like a 2.6-2.8 GHz P4.
 

Zebo

Elite Member
Jul 29, 2001
39,398
19
81
Originally posted by: Jeff7
I don't remember the technical terms (it's been a while since I read the magazine article on why), but the P4 has more stages in its processing pipeline. This was so that it could eventually scale to higher GHz ratings, but at present speeds it was a bit of a slowdown. It meant that the processor had to go through more clock cycles before it got a result from a calculation, so the P4 got less work done per clock tick than the Athlon. It was meant to make up for this by simply having more clock ticks per second: more GHz.
Intel introduced Hyperthreading too, which, from what I gather, is like trying to make a single processor work as two. It tries to do more than one action at a time, which usually translates to pretty good performance gains, especially in apps optimized to support Hyperthreading.

What do you mean by "more stages in its processing pipeline"? Is this like pushing water through a 1/2" pipe vs. a 3/4" pipe? More pressure but less volume?
 

JustAnAverageGuy

Diamond Member
Aug 1, 2003
9,057
0
76
20 pipeline stages in a P4

I think a Mac G4 is about 7

Athlons are about 10-14 or so.

Not a wider pipe.

A shorter pipe.

Assume "water" (data) travels at a rate of one pipeline stage every second. PSI is the same.

In an Athlon system it would take about 10-14 seconds for the water to completely empty.

In a P4 it would take 20 seconds.

However, now let's assume you want to push oil down that same pipe.

You have to wait for all of the water to get out before you can put new stuff in. Hence, wasted time.

The Athlon system has been pushing oil down that pipe for 6-10 seconds before the P4 pipe even starts getting filled. Meanwhile the Athlon system finishes faster.

To make up for that fact, the Intel P4 sends the data through faster (the GHz rating). That is why a 2 GHz P4 loses to a 2 GHz AMD.

Hyperthreading is just the same as adding another pipe to send the data to a different part of the CPU. Which also allows the data in the first pipe to clear out while the data in the second virtual pipe is being sent through at the same time as the AMD system.

That's my basic understanding of it, but you may wait for someone to confirm that :/
 

OddTSi

Senior member
Feb 14, 2003
371
0
0
I think everyone is confusing the guy by using technical terms and going into descriptions of pipelines.

It's all a matter of throughput, which is the true measure of a processor's performance, not IPC or frequency.

Imagine two escalators, A and B. A is only wide enough for two people to fit on one step, B on the other hand is wide enough for 4 people to fit on one step. A however moves twice as fast as B does. If you compare the escalators based only upon how many people can fit on each step, B is the better one. If you compare the escalators based upon how fast they go, then A is the better one. But which one can move more people for a given time period (the true measure of performance)? The answer, they both move the same. B fits twice as many people per step, but A moves twice as fast so it makes up for that.

This is analogous to processors. How many people can fit on each step is like the IPC (Instructions Per Cycle) of a processor, how much work it gets done per cycle. How fast the escalator moves is like the frequency of a processor, how many cycles in a second. So the true performance of a processor is based on the product of these two values (there's other things that come into play like memory etc, but we'll just keep it simple), how much work gets done in a second.

AMD processors do more work per cycle, but the P4 processors have a higher frequency that makes up for it. In the end each individual value (IPC and Frequency) is fairly useless, it's the product of the two values that matters.
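
A rough Python sketch of the escalator arithmetic above, in case it helps. The function name and the two chips at the bottom are made up for illustration; the IPC figures for them are hypothetical, not measured values for any real Athlon or P4.

```python
def throughput(work_per_cycle: float, cycles_per_second: float) -> float:
    """Work done per second = work per cycle * cycles per second."""
    return work_per_cycle * cycles_per_second

# Escalators from the post: A holds 2 people per step but runs twice as fast as B.
escalator_a = throughput(work_per_cycle=2, cycles_per_second=2.0)
escalator_b = throughput(work_per_cycle=4, cycles_per_second=1.0)
print(escalator_a, escalator_b)  # both move the same number of people

# Hypothetical chips at the same 2.0 GHz clock (IPC values are assumptions).
chip_x = throughput(work_per_cycle=1.5, cycles_per_second=2.0e9)
chip_y = throughput(work_per_cycle=1.0, cycles_per_second=2.0e9)

# Clock the lower-IPC chip would need in order to match the higher-IPC one.
clock_needed_hz = chip_x / 1.0
print(chip_x, chip_y, clock_needed_hz)
```

With these made-up numbers the lower-IPC chip has to run at 3.0 GHz to match the other one at 2.0 GHz, which is the same trade-off the escalators show.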
 

Jeff7181

Lifer
Aug 21, 2002
18,368
11
81
Originally posted by: OddTSi
I think everyone is confusing the guy by using technical terms and going into descriptions of pipelines.

It's all a matter of throughput, which is the true measure of a processor's performance, not IPC or frequency.

Imagine two escalators, A and B. A is only wide enough for two people to fit on one step, B on the other hand is wide enough for 4 people to fit on one step. A however moves twice as fast as B does. If you compare the escalators based only upon how many people can fit on each step, B is the better one. If you compare the escalators based upon how fast they go, then A is the better one. But which one can move more people for a given time period (the true measure of performance)? The answer, they both move the same. B fits twice as many people per step, but A moves twice as fast so it makes up for that.

This is analogous to processors. How many people can fit on each step is like the IPC (Instructions Per Cycle) of a processor, how much work it gets done per cycle. How fast the escalator moves is like the frequency of a processor, how many cycles in a second. So the true performance of a processor is based on the product of these two values (there's other things that come into play like memory etc, but we'll just keep it simple), how much work gets done in a second.

AMD processors do more work per cycle, but the P4 processors have a higher frequency that makes up for it. In the end each individual value (IPC and Frequency) is fairly useless, it's the product of the two values that matters.

You just said it doesn't have to do with IPC, then you said it does.

You're confusing Operations Per Clock Cycle with Instructions Per Cycle.

In your analogy, the number of people that fit on each step is like the OPC, the speed is the frequency (like you said), and the total number of people it moves in a given period of time is the IPC. And no, Intel's clock speed doesn't ALWAYS make up for its lower OPC. A 3.2 GHz P4 does 19,200 IPC without hyper threading... and an XP3200 does 19,800. But like you said, there are other factors like memory that affect the efficiency of the processors.
 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81
So how do the P4Ms, or whatever they are, that are supposed to be so much faster than a regular P4 fit into this? More efficient internally, I'm guessing, but how so?

 

Lynx516

Senior member
Apr 20, 2003
272
0
0
The average Instructions Per Clock is the answer. The Athlon can execute more instructions per clock than the P4. End of story; no need to go into pipelines etc., unless you want an explanation of why the Athlon executes more instructions per clock.
 

Ionizer86

Diamond Member
Jun 20, 2001
5,292
0
76
Athlon is 9 instructions per clock, while P4 is 6 instructions per clock (IPC). That's theoretical. While the Athlon at 2.0 doesn't really perform as fast as a P4 50% faster (the 9:6 ratio isn't exact), an Athlon at 2.0 does perform like a P4 at about 2.6-2.8GHz.

The longer pipeline of the P4 lets the architecture scale faster though. Intel will be hitting 4.0GHz long before AMD gets there with their chips. Intel's beaten AMD to 2Ghz and 3GHz for the same reason, though AMD's overall performance has been higher at numerous times during history.
 

imported_Phil

Diamond Member
Feb 10, 2001
9,837
0
0
Well I wanna go into pipelines, dammit! :p

With a 20-stage pipeline like the P4's, say you have a conditional branch; namely, the next piece of code executed depends on the output of the current piece of code. If the Branch Prediction Unit is doing its job properly, then chances are it has already queued the correct piece of code next in the pipeline. However, if it gets it wrong, then the entire pipeline has to be flushed and reloaded, as the code that is "wrong", and thus all code "before" it (i.e. loaded after it), is therefore wrong. With a 20-stage pipeline, this takes time.

The Athlon has an 8 (IIRC)-stage pipeline, possibly 12. It's late and I forget; plus I'm halfway through my cider.

Anyhoo, the penalty for a branch mis-prediction with an Athlon is lower, so the time it takes to flush & reload the pipeline is shorter.

Although that's not the main reason why the P4 has to run at higher clock frequencies, it plays its own part, and I fancied seeing if I could still remember how it all works @ 11pm while half drunk. :D

Feel free to correct if I've messed it all up.
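
Here is a toy Python model of the misprediction penalty described above. Everything numeric in it is an assumption chosen for illustration: a base of one cycle per instruction, 20% branches, a 5% mispredict rate, and a flush penalty equal to the pipeline depth. Real chips are far messier than this.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty_cycles: int) -> float:
    """Average cycles per instruction once branch-flush stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles

def instructions_per_second(clock_hz: float, cpi: float) -> float:
    """Clock rate divided by average cycles per instruction."""
    return clock_hz / cpi

# Assumed workload: 20% of instructions are branches, 5% of those mispredict.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Assumed penalty: one full pipeline refill, i.e. the pipeline depth in cycles.
deep_cpi = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 20)
short_cpi = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 10)

clock_hz = 2.0e9  # both hypothetical chips run at 2 GHz
deep_ips = instructions_per_second(clock_hz, deep_cpi)
short_ips = instructions_per_second(clock_hz, short_cpi)
print(deep_cpi, short_cpi, deep_ips, short_ips)
```

Under these assumptions the 20-stage design averages about 1.2 cycles per instruction against about 1.1 for the 10-stage one, so at the same clock the shorter pipeline gets more done. As noted above, this is only one contributor, not the whole story.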
 
Oct 31, 2003
112
0
0
The AMD Athlon has a 10-stage pipeline. The P4 has a 20-stage main pipeline. The Athlon does more work per clock cycle. The P4 can reach a higher clock speed.

Banias... the mobile chip from Intel... has between 10 and 20 stages... the last article I read on it had no exact number. This increased the efficiency of the chip when there was a failed branch prediction and still allowed fairly high clock rates. A big reason that Intel could not advertise the Banias as Banias is that they'd basically be saying that AMD's lower clock speeds can perform on par with Intel's chips. Intel wants the average consumer to think that the 3.2GHz number is better than the 2.2GHz number.

You can look at Jeff7181 and OddTSi's posts... I like the analogy and the fix up by Jeff
 

dnuggett

Diamond Member
Sep 13, 2003
6,703
0
76
Originally posted by: Ramses
So how do the P4Ms, or whatever they are, that are supposed to be so much faster than a regular P4 fit into this? More efficient internally, I'm guessing, but how so?



There is no P4M. There is an M and a P4-M. Which one are you talking about?
 

aka1nas

Diamond Member
Aug 30, 2001
4,335
1
0
Also, to get a little more technical, the P4 has a weak FPU but has SSE2. The Athlon has a very, very burly FPU(s). This is why Athlons tend to run unoptimized code so much more efficiently per clock cycle. This also helps explain why the P4 beats the pants off the Athlon in most video encoding benchmarks, as the encoder has been optimized to use SSE2 instructions. The longer pipeline vs. shorter pipeline also has tradeoffs, with certain types of applications favoring one or the other. I.e., applications that have many branches or greater penalties for missed branches ought to be the Achilles heel for the P4, as flushing the pipeline is an expensive operation performance-wise. Maybe wingnutPEZ or PM could enlighten us more on whether such a condition actually occurs very often or not.
 

Venomous

Golden Member
Oct 18, 1999
1,180
0
76
Easiest definition....

P4 = a front wheel drive car with turbo. Has the horsepower, but lacks the torque.

Athlon = rear wheel drive car with turbo. Has all the torque but lacks a little hp.

Torque moves you.;)
 

InlineFive

Diamond Member
Sep 20, 2003
9,599
2
0
Originally posted by: Venomous
Easiest definition....

P4 = a front wheel drive car with turbo. Has the horsepower, but lacks the torque.

Athlon = rear wheel drive car with turbo. Has all the torque but lacks a little hp.

Torque moves you.;)

That's how I always describe it. :)
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: Venomous
Easiest definition....

P4 = a front wheel drive car with turbo. Has the horsepower, but lacks the torque.

What does FWD have to do with torque output of a turboed engine?

P4 = F1 car. Supremely fast and the epitome of CPU technology, but requires a smooth track to run at top speed.

Athlon = rear wheel drive car with turbo. Has all the torque but lacks a little hp.

Athlon = World Rally cars, also fast in virtually any conditions. However, in ideal situations, will lose out to the P4.

Torque moves you.;)
But Horsepower wins races.

 

aka1nas

Diamond Member
Aug 30, 2001
4,335
1
0
The P4 is an SUV: Not energy efficient, but it has a lot of power.
The Athlon is a Honda Civic: cheap, gets the job done, and can be riced out easily and cheaply
The Xeon is an Escalade with bling bling rims
The C3 is a Geo Metro
 

randumb

Platinum Member
Mar 27, 2003
2,324
0
0
Originally posted by: FishTankX
Now here's the real question.

Why is a 1.6GHz Pentium-M faster than a 1.6GHz Athlon??

The Pentium M is based on the Tualatin (Pentium III) core, which had throughput comparable to the Athlons. However, the Pentium M was tweaked and has 1MB of L2 cache, which puts it over the Athlon.

 

drag

Elite Member
Jul 4, 2002
8,708
0
0
Originally posted by: Accord99
Originally posted by: Venomous
Easiest definition....

P4 = a front wheel drive car with turbo. Has the horsepower, but lacks the torque.

What does FWD have to do with torque output of a turboed engine?

P4 = F1 car. Supremely fast and the epitome of CPU technology, but requires a smooth track to run at top speed.

Athlon = rear wheel drive car with turbo. Has all the torque but lacks a little hp.

Athlon = World Rally cars, also fast in virtually any conditions. However, in ideal situations, will lose out to the P4.

Torque moves you.;)
But Horsepower wins races.

No it doesn't

Torque wins races. Horsepower is a measurement of torque over time.

Torque (in foot-pounds) * RPM / 5252 = Horsepower. That's the formula exactly.

A car with a motor that puts out 230HP at 6200 rpm, but has a torque peak of 195 at 4500, will get its @ss handed to it by a car that has only 215HP at 6000 but a torque peak of 225 foot-pounds at 5000 rpm. As long as all other factors are equal.

Much like a P4 vs an Opteron. :p
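
For anyone who wants to check the torque/horsepower arithmetic, a small Python sketch follows. The constant comes from one horsepower being 33,000 ft·lbf per minute, divided by 2π, which works out to roughly 5252. The function names are made up for illustration, and the inputs are the figures quoted in the post above.

```python
import math

# One horsepower is 33,000 ft*lbf per minute; dividing by 2*pi converts rev/min to rad/min.
HP_CONSTANT = 33000 / (2 * math.pi)  # roughly 5252

def horsepower(torque_ft_lb: float, rpm: float) -> float:
    """Horsepower produced by a given torque at a given engine speed."""
    return torque_ft_lb * rpm / HP_CONSTANT

def torque(hp: float, rpm: float) -> float:
    """Torque in foot-pounds needed to make a given horsepower at a given rpm."""
    return hp * HP_CONSTANT / rpm

# First car: 230 HP at 6200 rpm, so the torque at its power peak is:
car_one_torque_at_peak_hp = torque(230, 6200)

# Second car: 225 ft-lb torque peak at 5000 rpm, so the power at that point is:
car_two_hp_at_peak_torque = horsepower(225, 5000)

print(round(HP_CONSTANT, 1), round(car_one_torque_at_peak_hp, 1),
      round(car_two_hp_at_peak_torque, 1))
```

This only converts between the two quantities at a single rpm; it says nothing about which car is actually quicker, which depends on gearing, weight, and the whole torque curve.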