Why is a 2 GHz Athlon faster than a 2 GHz PIV?


Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: drag
No it doesn't

Torque wins races. Horsepower is a measurement of torque over time.

A car with a motor that puts out 230HP at 6200 rpm, but has a torque peak of 195 at 4500, will get its @ss handed to it by a car that has only 215HP at 6000 but a torque peak of 225 foot-pounds at 5000 rpm. As long as all other factors are equal.
But all things are not equal. A motor that puts out 195 lb-ft from 4500 rpm to 6200 rpm clearly has the breathing to go higher in RPM, generating much more power and taking advantage of gear multiplication, while the second motor is already starting to run out of breath. Even with the artificial rev limit, gearing can be adjusted to take better advantage of its power characteristics, since in a race situation a motor will spend most of its time near its rev limit anyway. And if the second car is faster, it's because it generates higher average HP than the first.

What about a car with a motor that produces 300HP with peak torque of 300 lb-ft versus a car with 250HP and peak torque of 450 lb-ft? Assuming equal weight, the 300HP car will out-accelerate the 250HP car.

Much like a P4 vs an Opteron. :p
Depends on what you do. The P4 is still superior for video encoding, MP3 encoding, 3D rendering and multitasking situations.

 

Ramses

Platinum Member
Apr 26, 2000
2,871
4
81
Originally posted by: Accord99
Originally posted by: drag
No it doesn't

Torque wins races. Horsepower is a measurement of torque over time.

A car with a motor that puts out 230HP at 6200 rpm, but has a torque peak of 195 at 4500, will get its @ss handed to it by a car that has only 215HP at 6000 but a torque peak of 225 foot-pounds at 5000 rpm. As long as all other factors are equal.
But all things are not equal. A motor that puts out 195 lb-ft from 4500 rpm to 6200 rpm clearly has the breathing to go higher in RPM, generating much more power and taking advantage of gear multiplication, while the second motor is already starting to run out of breath. Even with the artificial rev limit, gearing can be adjusted to take better advantage of its power characteristics, since in a race situation a motor will spend most of its time near its rev limit anyway. And if the second car is faster, it's because it generates higher average HP than the first.

What about a car with a motor that produces 300HP with peak torque of 300 lb-ft versus a car with 250HP and peak torque of 450 lb-ft? Assuming equal weight, the 300HP car will out-accelerate the 250HP car.

Much like a P4 vs an Opteron. :p
Depends on what you do. The P4 is still superior for video encoding, MP3 encoding, 3D rendering and multitasking situations.

Granted, this is way off topic, but the bit about the 250 and 300 horse cars isn't always true. You can gear the lower HP/higher torque vehicle to eat the other alive, and be much more drivable at the same time. There's also the issue of what kind of racing: quarter mile, flat-out running, etc. There are a lot of old big block cars with little HP and a massive amount of torque coupled with tall rear gears that will outright move. I had a '78 Lincoln with a mild 460, 2.50 rear gears and a C6 with a shift kit that, from a dead-even 50mph roll, would kill a 300ish horse 5L Mustang and pull to over 140mph. That's a 6,000 lb car there too.. :)

And got 12mpg..

Had to defend torque there, and I run a P4..

:D
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
But all things are not equal. A motor that puts out 195 lb-ft from 4500 rpm to 6200 rpm clearly has the breathing to go higher in RPM, generating much more power and taking advantage of gear multiplication, while the second motor is already starting to run out of breath.

You're assuming things. If you look at the high-torque car and calculate the torque and horsepower peaks, you'd realise that I gave it a fairly flat torque curve. It still has to produce 188 ft.lbs of torque to get 215HP at 6000, and at the same time you have 194.something ft.lbs for 230HP at 6200. At 5000-ish rpm you have a 30 ft.lbs advantage to the high-torque car vs a 6-8 ft.lbs advantage to the high-HP car at 6200 rpm. And it's torque that = force of acceleration.

We will assume that the high-torque motor redlines at 6500 and the HP motor redlines at 7200. The high-torque motor may still be putting out 140 ft.lbs of torque at 6500 (173hp), and the high-breathing motor still puts out 140 ft.lbs at 7200 (about 192hp).

So we will assume a transmission with the first 2 gears:
1st gear ratio 3.461
2nd gear ratio 1.750


A final drive ratio of 3.208 is stock for both motors, but we will increase it for the HP motor to compensate for the lack of torque. To use the gearing to give the HP car more torque (motivational power) at the wheels, we will raise its final drive to 3.849, which will give the HP car the power to pull away from the high-torque car a little bit at the starting line (assuming both hook up perfectly).

At its 7200 redline the HP car will be making (with 10% drivetrain losses) 1678 ft.lbs of torque at an axle rpm of 540. After that you shift to 2nd and the engine will drop to about 3637 rpm, where it may be putting out 180 ft.lbs of torque, or 1091 ft.lbs at the rear axle.

The torque'y car will shift to second at 6500 with 140 ft.lbs of torque, which is 1398 ft.lbs at the axle at an axle rpm of 585; at that same speed the HP car will only be putting out around 1150 ft.lbs. At an axle rpm of 540 the torque motor will still be turning at around 6000 rpm, where it has 194 ft.lbs of torque compared to the 140 ft.lbs the HP car would be putting out at the same speed. That's 1938 ft.lbs of torque at the rear axle!

So even with the gearing advantage giving the HP car a head start at the starting line, by the time the HP car hits redline it would only be putting out 1678 ft.lbs while the high-torque car would still be putting out 1938, and thus would start to catch up quickly.
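For anyone who wants to check the gearing arithmetic above, here is a minimal sketch. The function names are made up for illustration; the gear ratios, final drives, 10% drivetrain loss and engine torque figures are the ones assumed in the post.

```python
# Gear ratios and final drives assumed in the post.
FIRST_GEAR = 3.461
SECOND_GEAR = 1.750
FINAL_DRIVE_HP_CAR = 3.849
FINAL_DRIVE_TORQUE_CAR = 3.208


def axle_torque(engine_torque, gear_ratio, final_drive, drivetrain_loss=0.10):
    """Torque at the axle (ft.lbs) after gear multiplication and losses."""
    return engine_torque * gear_ratio * final_drive * (1.0 - drivetrain_loss)


def axle_rpm(engine_rpm, gear_ratio, final_drive):
    """Axle speed for a given engine speed in a given gear."""
    return engine_rpm / (gear_ratio * final_drive)


def engine_rpm_at(axle_speed, gear_ratio, final_drive):
    """Engine speed needed to turn the axle at axle_speed in a given gear."""
    return axle_speed * gear_ratio * final_drive


# HP car at its 7200 rpm redline in 1st gear (140 ft.lbs at the crank).
hp_redline_torque = axle_torque(140, FIRST_GEAR, FINAL_DRIVE_HP_CAR)
hp_redline_axle_rpm = axle_rpm(7200, FIRST_GEAR, FINAL_DRIVE_HP_CAR)

# Engine speed of the HP car right after the shift into 2nd.
hp_rpm_after_shift = engine_rpm_at(hp_redline_axle_rpm, SECOND_GEAR, FINAL_DRIVE_HP_CAR)

# Torque car at its 6500 rpm shift point in 1st gear.
tq_shift_torque = axle_torque(140, FIRST_GEAR, FINAL_DRIVE_TORQUE_CAR)
tq_shift_axle_rpm = axle_rpm(6500, FIRST_GEAR, FINAL_DRIVE_TORQUE_CAR)

print(round(hp_redline_torque), round(hp_redline_axle_rpm), round(hp_rpm_after_shift))
print(round(tq_shift_torque), round(tq_shift_axle_rpm))
```

The same three helpers can be pointed at any other gear or engine speed in the comparison, which makes it easy to see which figures include the drivetrain loss and which do not.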

Then when you switch into 2nd, it will be worse still, since the HP car would start out at only around 1100-1200 ft.lbs, vs the torque car's 1400 at its 2nd gear shift point.

Then when the torque car finally gets to 2nd gear it will drop down to 3060 rpm and maybe still have 200 ft.lbs of torque, for an axle torque of 1122.8. So it would look like the HP car could almost hang with the high-torque car until the torque'y car hits its torque peak and the HP car has to shift into 3rd. Then the torque car would just continually pull away, because as the speed increases the HP car would have to shift sooner each time it goes into the next gear. Its attempt to use gearing to compensate for lack of torque and take advantage of the higher RPM potential (assuming it exists in the first place) would only make the problem worse at high speeds.


What about a car with a motor that produces 300HP with peak torque of 300 lb-ft versus a car with 250HP and peak torque of 450 lb-ft? Assuming equal weight, the 300HP car will out-accelerate the 250HP car.

That doesn't make sense. In order for a car to have 300hp and a 300 ft.lbs torque peak, you'd have to have the peak HP at 5250 RPM, because 300 ft.lbs*5250/5250=300HP. Unless it had a perfectly flat torque line.

In order for a 450 ft.lbs motor to put out only 250HP, that 450 would have to come at about 2,900 RPM and then drop off rapidly from there. That may be a big truck diesel motor or something, but it doesn't sound like any racing car I know of.
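The relationship behind those two numbers can be sketched in a few lines. The helper names below are made up for illustration, and the exact constant is 33000/(2*pi), roughly 5252, which the post rounds to 5250.

```python
import math

# 1 HP = 33,000 ft.lbs per minute; dividing by 2*pi converts rpm to radians.
HP_CONSTANT = 33000 / (2 * math.pi)  # roughly 5252


def horsepower(torque_ftlb, rpm):
    """Horsepower produced by a given torque at a given engine speed."""
    return torque_ftlb * rpm / HP_CONSTANT


def rpm_for(hp, torque_ftlb):
    """Engine speed at which a given torque works out to a given horsepower."""
    return hp * HP_CONSTANT / torque_ftlb


print(round(rpm_for(300, 300)))       # where 300 ft.lbs equals 300 HP
print(round(rpm_for(250, 450)))       # where 450 ft.lbs equals 250 HP
print(round(horsepower(140, 6500)))   # 140 ft.lbs at the 6500 rpm redline
```

Torque and horsepower curves always cross at that constant, so any engine rated at equal peak HP and peak torque is either peaking there or holding a flat curve through it.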

Either way it would be a pretty silly race.

edit:

Of course, realise that I picked the HP/torque ratios with the GOAL of showing exactly how HP vs HP readings can be misleading.

Horsepower is only ONE of the many things you need to consider when comparing cars.

Saying a 230HP car will be faster than a 215HP car is not necessarily true.

Much like MHz is only ONE of the many factors to consider when comparing CPUs.

It's just like saying that a 3.2 GHz CPU will always be faster than a 2.0 GHz CPU.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Why is a 2 GHz Athlon faster than a 2 GHz PIV?
It is a design choice. The Athlon is designed to get more done per clock but has a lower maximum clock, the Pentium 4 is designed to do less per clock and is able to be clocked faster.

And, as others have said, it all comes down to pipelining. But what exactly is "pipelining"? It is splitting a task into multiple operations that can be completed serially. The more you split, the less you get done at each stage, but the less time you spend on that stage.

Take the example of a one-stage pipeline CPU. Making the numbers easy to work with (but unrealistically slow, so stick with me here), you might find that it takes 1s to complete the instruction decode, the add operation and then write the result back to memory. Since the clock needs to wait for the data to be ready, you would find that you could clock this theoretical CPU at 1Hz. Now, if we could chop the logic neatly in half, we would find that we can complete the instruction decode and one half of the add in 0.5s, and then finish the add and write the result back to memory in another 0.5s. Nothing has really changed - it still takes 1s to complete one add operation, but now we can clock the design at 2Hz. So, if you have back-to-back instructions filling up the pipeline, we can now complete them twice as fast. In theory, we have doubled the performance of this theoretical CPU. This is pipelining.
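A minimal sketch of that 1Hz vs 2Hz example, assuming an ideal pipeline where the logic splits perfectly evenly and nothing ever stalls. The function name and the 1,000-instruction stream are made up for illustration.

```python
def total_time(num_instructions, stages, logic_delay_s=1.0):
    """Seconds to finish a stream of back-to-back instructions.

    Assumes an ideal pipeline: the logic splits evenly across the stages
    and no instruction ever has to wait on another.
    """
    cycle_time = logic_delay_s / stages          # 1 stage -> 1s, 2 stages -> 0.5s
    # The first instruction needs `stages` cycles to get through the pipe,
    # then one more instruction finishes on every cycle after that.
    cycles = stages + (num_instructions - 1)
    return cycles * cycle_time


# A single add still takes 1s either way.
print(total_time(1, stages=1), total_time(1, stages=2))

# A long stream of adds finishes almost twice as fast at 2Hz.
print(total_time(1000, stages=1), total_time(1000, stages=2))
```

Latency per instruction does not change with the number of stages; only the rate at which finished instructions come out of the end does.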

You might think, "Well, what's the limit? Why can't we put 100+ pipeline stages into a CPU and make it 100x faster?" Aside from the obvious one, there are plenty of other reasons, but I'm not going to go into clock skew/uncertainty, CK -> Q vs. logic delays and other really in-depth stuff. But the obvious reason is branch prediction. Let's say we have all of these instructions in the pipeline and one of them is a branch instruction... say we are comparing two numbers and if they are equal we will execute one section of code, and if they aren't equal then we will execute another. We want to fill the pipeline, but we won't know the outcome of the branch until later. What do we do? We make an educated guess, which in CPU terms is "branch prediction". If we get it right, the pipeline stays full and everything continues on like before. If we get it wrong, then we need to dump all the instructions that we started after the branch, and then load in the other branch. This is the big downside of pipelining. There are others, but this is the biggie. Since we can't always get branches right, we will take a misprediction penalty when we get it wrong. So you definitely don't want a 60 stage pipeline, because then you may have to wait 59 cycles before everything is back to normal on a misprediction.
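A rough back-of-the-envelope model of that penalty is sketched below. The branch fraction (20%) and misprediction rate (5%) are made-up illustrative numbers, not measurements of any real CPU, and the model ignores everything else that limits deep pipelines (clock skew, latch delays and so on).

```python
def cycles_per_instruction(stages, branch_fraction, mispredict_rate):
    """Average cycles per instruction when each misprediction costs a refill."""
    penalty = stages - 1  # e.g. 59 wasted cycles on a 60 stage pipeline
    return 1.0 + branch_fraction * mispredict_rate * penalty


def relative_speed(stages, branch_fraction=0.2, mispredict_rate=0.05):
    """Speed relative to a one-stage design, assuming clock scales with stages."""
    return stages / cycles_per_instruction(stages, branch_fraction, mispredict_rate)


for stages in (1, 10, 20, 60):
    cpi = cycles_per_instruction(stages, 0.2, 0.05)
    print(stages, round(cpi, 2), round(relative_speed(stages), 1))
```

Under these assumptions the 60 stage design gets nowhere near 60x: a growing share of each added stage's clock gain is handed back as refill cycles, and that is before counting any of the other costs.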

How long a pipeline is too long has been a debate for a long time in the industry. It is called the Brainiac vs Speed Demon debate and you can find out more by typing those words into Google. Different design teams favor different pipeline lengths and each is convinced that they are maximizing performance by choosing the length that they have in their design.
 

Accord99

Platinum Member
Jul 2, 2001
2,259
172
106
Originally posted by: drag
Then when you switch into 2nd, it will be worse still, since the HP car would start out at only around 1100-1200 ft.lbs, vs the torque car's 1400 at its 2nd gear shift point.
I think you're making a mistake with the gear multiplier. Using your ratios, the HP car should have the edge in wheel torque in second gear. It would have been easier to set the final drive ratio to a value that would have caused both cars to shift at the same speed. In that case, and assuming a flat torque curve in both engines that declines linearly after the peak torque RPM is passed, the torquey car has a small advantage until it nears redline, when the HP car gains the advantage. The two vehicles' acceleration is close, but the high-HP one should have the higher acceleration.

Horsepower is only ONE of the many things you need to consider when comparing cars.
Yes, but in racing terms, horsepower is more important than simple peak torque, since torque means nothing without knowledge of gear ratios, torque curves, etc. HP already indirectly takes this into account.

It's just like saying that a 3.2 GHz CPU will always be faster than a 2.0 GHz CPU.

RPM is equivalent to CPU MHz.
Torque is equivalent to CPU performance per clock.
HP is equivalent to the performance of the CPU as measured by some industry-standard benchmark, such as SPEC.
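Taken literally, that analogy says performance is clock speed times work per clock, the same way HP is torque times RPM. A tiny sketch follows; the work-per-clock figures are invented purely for illustration and are not measurements of any real Athlon or P4.

```python
def relative_performance(clock_ghz, work_per_clock):
    """Work done per nanosecond: the CPU version of torque * rpm."""
    return clock_ghz * work_per_clock


# Hypothetical chips: one does more per clock, the other clocks higher.
brainiac = relative_performance(clock_ghz=2.0, work_per_clock=1.5)
speed_demon = relative_performance(clock_ghz=3.2, work_per_clock=0.9)

print(brainiac > speed_demon)  # the lower-clocked chip can still come out ahead
```

Neither factor says anything on its own; with different made-up numbers the higher-clocked chip wins instead.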
 

drag

Elite Member
Jul 4, 2002
8,708
0
0
HP already indirectly takes this into account.

No it doesn't. It's just torque measured over distance. HP doesn't mean nearly as much as torque ratings, because horsepower has no real meaning. People advertise horsepower ratings because it's easier to make peak horsepower than it is to make a car that is actually fast.

All HP is, is torque multiplied by RPM then divided by 5250. A torque curve or quarter mile timeslip means 100 times more concerning the potential of the car than just a max horsepower rating. Taken by itself, an HP rating is about as important as tire width. It's good for comparing similar engine types or similar cars in a racing class, but that's about it.

250hp out of a healthy Ford small block is not equal to 250hp out of a blown Civic motor. Either one can be much faster than the other; you can't tell just by the HP rating.
 

RaynorWolfcastle

Diamond Member
Feb 8, 2001
8,968
16
81
Originally posted by: pm
Why is a 2 GHz Athlon faster than a 2 GHz PIV?
It is a design choice. The Athlon is designed to get more done per clock but has a lower maximum clock, the Pentium 4 is designed to do less per clock and is able to be clocked faster.

And, as others have said, it all comes down to pipelining. But what exactly is "pipelining"? It is splitting a task into multiple operations that can be completed serially. The more you split, the less you get done at each stage, but the less time you spend on that stage.

Take the example of a one-stage pipeline CPU. Making the numbers easy to work with (but unrealistically slow, so stick with me here), you might find that it takes 1s to complete the instruction decode, the add operation and then write the result back to memory. Since the clock needs to wait for the data to be ready, you would find that you could clock this theoretical CPU at 1Hz. Now, if we could chop the logic neatly in half, we would find that we can complete the instruction decode and one half of the add in 0.5s, and then finish the add and write the result back to memory in another 0.5s. Nothing has really changed - it still takes 1s to complete one add operation, but now we can clock the design at 2Hz. So, if you have back-to-back instructions filling up the pipeline, we can now complete them twice as fast. In theory, we have doubled the performance of this theoretical CPU. This is pipelining.

You might think, "Well, what's the limit? Why can't we put 100+ pipeline stages into a CPU and make it 100x faster?" Aside from the obvious one, there are plenty of other reasons, but I'm not going to go into clock skew/uncertainty, CK -> Q vs. logic delays and other really in-depth stuff. But the obvious reason is branch prediction. Let's say we have all of these instructions in the pipeline and one of them is a branch instruction... say we are comparing two numbers and if they are equal we will execute one section of code, and if they aren't equal then we will execute another. We want to fill the pipeline, but we won't know the outcome of the branch until later. What do we do? We make an educated guess, which in CPU terms is "branch prediction". If we get it right, the pipeline stays full and everything continues on like before. If we get it wrong, then we need to dump all the instructions that we started after the branch, and then load in the other branch. This is the big downside of pipelining. There are others, but this is the biggie. Since we can't always get branches right, we will take a misprediction penalty when we get it wrong. So you definitely don't want a 60 stage pipeline, because then you may have to wait 59 cycles before everything is back to normal on a misprediction.


Out of curiosity (and I have no clue if this is technically feasible), would it be possible to implement something like the following to lessen the impact of the branch misprediction penalty:

Have 1 full pipeline and 1 "half pipeline"; once a branch instruction is decoded (but before it is carried out) in the main pipeline, the half pipeline starts to execute the second set of code pertaining to the branch: so in effect both results of the branch are being executed. Now once the branch comparison is carried out and you figure out what branch you really need, you have a mux-type structure transfer the "half" pipeline's content to the full pipeline to continue from there, without having to wait for the data to make its way all the way through the pipeline.

Now, there are some issues with this approach (the inefficiency of having a bunch of idle transistors most of the time, complicated logic design, etc.), but wouldn't this be worthwhile on CPUs with deep pipelining (such as the P4)? Or maybe something similar to this already exists? Any thoughts on my idea?
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Have 1 full pipeline and 1 "half pipeline"; once a branch instruction is decoded (but before it is carried out) in the main pipeline, the half pipeline starts to execute the second set of code pertaining to the branch: so in effect both results of the branch are being executed. Now once the branch comparison is carried out and you figure out what branch you really need, you have a mux-type structure transfer the "half" pipeline's content to the full pipeline to continue from there, without having to wait for the data to make its way all the way through the pipeline.
This is done on the Itanium instruction set.
 

Zebo

Elite Member
Jul 29, 2001
39,398
19
81
Thanks so much, pm. Finally a layman's explanation even I can understand :p
 

RaynorWolfcastle

Diamond Member
Feb 8, 2001
8,968
16
81
Originally posted by: pm
Have 1 full pipeline and 1 "half pipeline"; once a branch instruction is decoded (but before it is carried out) in the main pipeline, the half pipeline starts to execute the second set of code pertaining to the branch: so in effect both results of the branch are being executed. Now once the branch comparison is carried out and you figure out what branch you really need, you have a mux-type structure transfer the "half" pipeline's content to the full pipeline to continue from there, without having to wait for the data to make its way all the way through the pipeline.
This is done on the Itanium instruction set.

Do you have any references for that? I'd like to read up on it.

Also, I don't understand how this can be in the instruction set (unless the instruction set expressly calls for hardware support of such a function). Can't this be implemented on any old instruction set? You'd just need a piece of hardware that takes care of managing the pipelines. Again, I'm sure it's not as simple as this, but would it be possible to implement something like that for x86?

The reason I ask is that it seems to me that this would probably be more beneficial to the Netburst architecture given P4's deep pipeline than to a CPU with a shallow pipeline like the Itanium.
 

FishTankX

Platinum Member
Oct 6, 2001
2,738
0
0
.....The Itanium takes a very interesting approach to pipelines.

It just does both branches.

The Itanium, when faced with a branch, will execute both. This means wasted execution power, but it ensures 100% branch prediction.
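To make "just does both branches" concrete, here is a toy sketch of the idea: compute both sides of the compare, then keep one result with a select instead of a jump. This only illustrates the concept; it is not Itanium code and is not meant to describe how the hardware actually implements it. The function names and the a + 1 / b - 1 bodies are invented for the example.

```python
def branchy(a, b):
    """Ordinary version: the CPU has to guess which way the compare goes."""
    if a == b:
        return a + 1
    return b - 1


def both_paths(a, b):
    """Do the work for both outcomes, then pick one without jumping."""
    p = int(a == b)          # 1 if equal, 0 if not
    if_equal = a + 1         # work for the "equal" side (always done)
    if_not_equal = b - 1     # work for the "not equal" side (always done)
    # Arithmetic select: exactly one of the two terms survives.
    return p * if_equal + (1 - p) * if_not_equal


print(both_paths(4, 4), both_paths(4, 7))
```

One of the two computations is always thrown away, which is the wasted execution power mentioned above; in exchange there is nothing to guess and nothing to flush.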