A comparison of Intel IPC over the last 24 years

Hulk

Diamond Member
Oct 9, 1999
5,103
3,629
136
I've been testing my computers over the years using an old benchmark called CPUmark99. Using my scores and others', I have tried to arrive at an average value for each CPU core in the chart.

Keep in mind that this is only one old benchmark. It only tests integer performance, it fits in the L2 cache, and it only supports one core. These are good things in a way, since it means the benchmark is nearly platform agnostic and therefore does a decent job of isolating the core.

In order to take MHz out of the comparison, I am not showing raw CPUmark99 scores but rather the processor's MHz divided by its CPUmark99 score. In other words, the chart shows how many MHz a given processor needs to earn a score of 1 CPUmark99. Of course, lower values are better.
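A minimal sketch of that normalization, in case anyone wants to redo it with their own runs. The CPU names, clock speeds, and scores below are made up for illustration and are not taken from the chart; the function names are also just placeholders.

```python
# Hypothetical clock speeds and scores, for illustration only.
# None of these values come from the chart in this thread.
samples = {
    "cpu_a": {"mhz": 3000.0, "cpumark99": 250.0},
    "cpu_b": {"mhz": 2400.0, "cpumark99": 400.0},
}


def mhz_per_point(mhz, cpumark99):
    """MHz needed to earn one CPUmark99 point (lower is better)."""
    if cpumark99 <= 0:
        raise ValueError("score must be positive")
    return mhz / cpumark99


def points_per_mhz(mhz, cpumark99):
    """CPUmark99 points per MHz, the inverted form (higher is better)."""
    if mhz <= 0:
        raise ValueError("clock speed must be positive")
    return cpumark99 / mhz


def rank_by_mhz_per_point(data):
    """Return CPU names sorted from best (fewest MHz per point) to worst."""
    return sorted(
        data,
        key=lambda name: mhz_per_point(data[name]["mhz"], data[name]["cpumark99"]),
    )


if __name__ == "__main__":
    for name in rank_by_mhz_per_point(samples):
        entry = samples[name]
        print(name, mhz_per_point(entry["mhz"], entry["cpumark99"]))
```

Both forms carry the same information; the inverted one simply reads as "higher is better".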

A couple things are pretty interesting.
First, you can clearly see the wrong turn Intel makes with the P4. And in fact after making that wrong turn with Willamette they continue to dig a deeper hole with Northwood, Prescott, etc...

Second, there is a huge increase in IPC from P4 to Conroe and then it pretty much levels off. As we've been saying around here for quite some time now, there is only so much instruction level parallelism that can be exploited. That being said Haswell does make a nice little improvement from Ivy/Sandy.

It really seems like for significant performance increases we are going to need some combination of more clockspeed, more cores and better software to support them, and more specialized instructions and the software to support them.

So what do you think?

computeripc.jpg



I also have some AMD data, but it is not as well vetted. The later Athlons really were some great parts.

amdipc.jpg
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,731
155
106
These kinds of things interest me too.
I hung onto an old DOS app for measuring cache/mem latencies and ran it on many systems over the years. I don't have any info in a chart tho.
I really wish I had had access to the source code so I could have updated it.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,570
10,203
126
Nice graph. I would love to see something similar for AMD, and also for AMD and Intel's "little core" CPUs.
 

Centauri

Golden Member
Dec 10, 2002
1,631
56
91
Man, what a wild ride the Pentium 4 era was. Prescott on the same level as 486... yowza.
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
I'd imagine most of the reason things stay flat from Conroe until Haswell is because the back end stays largely the same up until Haswell.
 

Yuriman

Diamond Member
Jun 25, 2004
5,530
141
106
I'd like to see a graph of CPUMark99 per MHz (inverted).

Very interesting, I had no idea that Prescott's IPC was that incredibly poor - though most modern programs can use instructions that CPUMark99 doesn't have which would leave Prescott looking considerably better - the vast majority of its core wouldn't be idle.
 

Hulk

Diamond Member
Oct 9, 1999
5,103
3,629
136
I'd imagine most of the reason things stay flat from Conroe until Haswell is because the back end stays largely the same up until Haswell.

I was thinking the same thing.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
I'd like to see a graph of CPUMark99 per MHz (inverted).

Very interesting, I had no idea that Prescott's IPC was that incredibly poor - though most modern programs can use instructions that CPUMark99 doesn't have which would leave Prescott looking considerably better - the vast majority of its core wouldn't be idle.
Prescott's pipeline was longer than Willamette's, so performance did drop a little on an IPC basis. However I think this benchmark is particularly brutal, as the average performance hit wasn't as bad as it is in CPUMark.
 

JimmiG

Platinum Member
Feb 24, 2005
2,024
112
106
Interesting and amazing just how bad Netbust was when running software not optimized for the architecture, almost as bad as the 486.

Remember Intel originally promised that the Pentium 4 would reach 10 GHz before 2008 or something like that. Looking at the IPC difference between Conroe and Preschot, they pretty much delivered on that.
 

sefsefsefsef

Senior member
Jun 21, 2007
218
1
71
It would be interesting to see similar data for a benchmark that is not cache resident. We would see a lot more variation going from Core2 -> Haswell. The problem is that as soon as your benchmark is no longer cache resident, then IPC becomes a function of clockspeed*, so it would be harder to make the comparison, without over/underclocking some of the samples.

* this is a point I've battled before with some of you, but it is indeed a fact that for non-cache resident workloads, IPC goes down as clockspeed goes up.
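A toy model of that footnote, as a rough sketch only. It assumes memory latency is fixed in nanoseconds, so each miss costs more cycles as the clock rises. The function name and every parameter value (base CPI, miss rate, latency) are invented for illustration, not measured on any real CPU.

```python
def ipc_at_clock(freq_ghz, base_cpi=0.5, misses_per_instr=0.01, mem_latency_ns=80.0):
    """Estimate IPC for a core whose memory latency is fixed in wall-clock time.

    All default parameter values are hypothetical.
    """
    if freq_ghz <= 0:
        raise ValueError("clock speed must be positive")
    # A fixed latency in ns costs more cycles at a higher clock:
    # cycles = nanoseconds * (cycles per nanosecond) = ns * GHz.
    stall_cycles_per_miss = mem_latency_ns * freq_ghz
    cpi = base_cpi + misses_per_instr * stall_cycles_per_miss
    return 1.0 / cpi


if __name__ == "__main__":
    for ghz in (1.0, 2.0, 3.0, 4.0):
        # Cache-resident case (no misses) versus a workload that goes to memory.
        resident = ipc_at_clock(ghz, misses_per_instr=0.0)
        non_resident = ipc_at_clock(ghz)
        print(ghz, resident, non_resident)
```

With zero misses the IPC stays flat across clock speeds, which is the cache-resident case; with any misses it falls as the clock goes up, even though total performance (IPC times clock) still rises.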
 

Lorne

Senior member
Feb 5, 2001
873
1
76
Need the P-Pro in the list. It was what Intel looked back to when designing the C2.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
I'm kind of surprised to see so much variance between the K6, K6-2, and K6-3. I thought those were more or less all the same CPU core, just with differing amounts of L2, SIMD instructions, and maybe built on a different process. It'd be interesting to see where Phenom and FX fit into the mix, too.

Regarding Intel, Prescott scored somewhere slower than the original Pentium but faster than the 486. Pretty crazy!
 

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
A couple things are pretty interesting.
First, you can clearly see the wrong turn Intel makes with the P4. And in fact after making that wrong turn with Willamette they continue to dig a deeper hole with Northwood, Prescott, etc...

Second, there is a huge increase in IPC from P4 to Conroe and then it pretty much levels off. As we've been saying around here for quite some time now, there is only so much instruction level parallelism that can be exploited. That being said Haswell does make a nice little improvement from Ivy/Sandy.

They took a design decision. It didn't pay off, but IPC dropped because they expected C to increase by more than it did. That's why such a measure on its own doesn't mean all that much. IPC is all well and good when you design for IPC efficiency, but some were designed for clock scaling, and you deliberately sacrifice IPC in order to get clockspeed scaling that you hope will exceed the IPC loss. Of course, that didn't happen, but it wasn't a "wrong turn" to sacrifice IPC at that time, based on their expectations.

Northwood clocked higher than Will, and Prescott clocked higher than NW, no wrong turn there, they did exactly what they were supposed to do. They clocked higher but with lower IPC, and a net (eventual) performance increase when the clockspeed was high enough.
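The trade-off can be put into numbers with a small sketch, assuming the usual simplification that performance is roughly IPC times clock. The IPC and clock values are placeholders, not figures for any actual Pentium 4 or its predecessors.

```python
def performance(ipc, clock_mhz):
    """Relative performance under the simplification perf = IPC * clock."""
    return ipc * clock_mhz


def break_even_clock_mhz(old_ipc, old_clock_mhz, new_ipc):
    """Clock the lower-IPC design needs in order to match the older design."""
    if new_ipc <= 0:
        raise ValueError("IPC must be positive")
    return old_ipc * old_clock_mhz / new_ipc


if __name__ == "__main__":
    # Hypothetical example: the new design gives up 20% of its IPC.
    old_ipc, old_clock, new_ipc = 1.0, 1000.0, 0.8
    needed = break_even_clock_mhz(old_ipc, old_clock, new_ipc)
    print("break-even clock (MHz):", needed)
```

Anything above the break-even clock is a net gain and anything below it is a net loss, which is why IPC on its own does not settle whether the design decision paid off.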
 

Hulk

Diamond Member
Oct 9, 1999
5,103
3,629
136
I'm kind of surprised to see so much variance between the K6, K6-2, and K6-3. I thought those were more or less all the same CPU core, just with differing amounts of L2, SIMD instructions, and maybe built on a different process. It'd be interesting to see where Phenom and FX fit into the mix, too.

Regarding Intel, Prescott scored somewhere slower than the original Pentium but faster than the 486. Pretty crazy!

If I had to guess, the jump from K6 to K6-2 was the result of the execution engine being increased from 7 to 10 units and the FSB speed increasing from 66MHz to 100MHz. These CPUs generally had 512KB of L2 cache on the motherboard, and since CPUmark99 doesn't fit in the L1, going to the L2 produces a big hit in performance. Speed up L2 access and CPUmark99 performance increases.

For the K6-3, the L2 was 256KB, but it moved onto the chip and ran at full speed. Another boost in CPUmark99 performance, and in integer performance in general if you look at old Winstone benches.
 

Zodiark1593

Platinum Member
Oct 21, 2012
2,230
4
81
Seems my Westmere-based i5 460M compares pretty well in performance to the Bridge duo that succeeded it.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
If I had to guess, the jump from K6 to K6-2 was the result of the execution engine being increased from 7 to 10 units and the FSB speed increasing from 66MHz to 100MHz. These CPUs generally had 512KB of L2 cache on the motherboard, and since CPUmark99 doesn't fit in the L1, going to the L2 produces a big hit in performance. Speed up L2 access and CPUmark99 performance increases.

For the K6-3, the L2 was 256KB, but it moved onto the chip and ran at full speed. Another boost in CPUmark99 performance, and in integer performance in general if you look at old Winstone benches.


Ah, that makes sense. I'm so used to L3 being the cache that is sometimes there, sometimes not on AMD CPUs that I forgot that L2 was not on-CPU in K6 or K6-2 (and was also on die on the K6-2+).
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
These test results look heavily dependent on L2 latency and bandwidth normalized per clock, assuming it really is totally cache resident for every single CPU tested. That's the most likely way to explain the variation from PPro to P2 and P3 Katmai (both had off-die cache at half speed) to P3 Coppermine and Tualatin (on-chip cache). Likewise, Prescott showed a big increase in L2 latency vs Northwood (http://ixbtlabs.com/articles2/rmma/rmma-p4.html)

AMD shows the exact same trend with improvements from K6-2 to K6-3, where it incorporated on-die L2 cache, which even outperforms K7 (initially off-die L2), something that would almost never happen in real world tests. By the time you get to Athlon XP you're back to on-die L2 cache, but the clock speed is so much higher that you inevitably end up with higher cycle counts for latency.

This chart gives a little broader insight into L2 latency in clocks:

http://web.eece.maine.edu/~vweaver/cornell/fusion_machines/cache_summary.txt

Look how low it is for Pentium Pro.

As a rule of thumb I don't pay any attention to benchmark results at all if no one can even make a claim as to what the benchmark is doing.
 

Maximilian

Lifer
Feb 8, 2004
12,604
15
81
I never realised just how crap Prescott was... I mean I knew IPC was down due to the pipeline lengthening and it ran hotter but wow, pretty lousy overall!
 

Hulk

Diamond Member
Oct 9, 1999
5,103
3,629
136
They took a design decision. It didn't pay off, but IPC dropped because they expected C to increase by more than it did. That's why such a measure on its own doesn't mean all that much. IPC is all well and good when you design for IPC efficiency, but some were designed for clock scaling, and you deliberately sacrifice IPC in order to get clockspeed scaling that you hope will exceed the IPC loss. Of course, that didn't happen, but it wasn't a "wrong turn" to sacrifice IPC at that time, based on their expectations.

Northwood clocked higher than Will, and Prescott clocked higher than NW, no wrong turn there, they did exactly what they were supposed to do. They clocked higher but with lower IPC, and a net (eventual) performance increase when the clockspeed was high enough.


"Wrong turn" meaning an incorrect business decision.
 

Hulk

Diamond Member
Oct 9, 1999
5,103
3,629
136
IMHO using a 15 year old benchmark would skew the results. Many of the performance increases of modern processors are due to instructions that didn't exist 15 years ago.

We could go way back and use Landmark. http://dosbenchmark.wordpress.com/research/landmark/


Like I wrote in the first post: "only one old benchmark..."
It's hard to find a benchmark that is valid across 25 years of CPUs and has been run on all of them, not simulated or anything like that.
Just having some fun here ;)