A comparison of Intel IPC over the last 24 years

Hulk · Dec 4, 2013

I've been testing my computers over the years using an old benchmark called CPUmark99. Using my scores and others', I have tried to arrive at an average value for each CPU core in the chart.

Keep in mind that this is only one old benchmark. It only tests integer performance, it fits in the L2 cache, and it only supports one core. These are good things in a way, since they mean the benchmark is nearly platform agnostic and therefore does a decent job of isolating the core.

In order to take MHz out of the comparison I am not showing raw CPUmark99 scores but rather the processor's MHz divided by its CPUmark99 score. In other words, the chart shows how many MHz a given processor needs to earn a score of 1 CPUmark99. Of course lower scores are better.
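
For anyone who wants to normalize their own scores the same way, here is a minimal sketch of the metric. The function name and the sample numbers are made up for illustration; they are not values from the chart.

```python
def mhz_per_cpumark(clock_mhz: float, cpumark99: float) -> float:
    """Return how many MHz are needed per CPUmark99 point (lower is better)."""
    if cpumark99 <= 0:
        raise ValueError("CPUmark99 score must be positive")
    return clock_mhz / cpumark99


# Hypothetical (clock in MHz, CPUmark99 score) pairs, NOT measured results.
samples = {
    "cpu_a": (1000.0, 100.0),
    "cpu_b": (3000.0, 200.0),
}

for name, (clock_mhz, score) in samples.items():
    print(f"{name}: {mhz_per_cpumark(clock_mhz, score):.1f} MHz per CPUmark99 point")
```

In this made-up case cpu_b has the higher raw score but needs more MHz per point, so it would sit higher (worse) on a chart like this one.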

A couple things are pretty interesting.
First, you can clearly see the wrong turn Intel makes with the P4. And in fact after making that wrong turn with Willamette they continue to dig a deeper hole with Northwood, Prescott, etc...

Second, there is a huge increase in IPC from P4 to Conroe and then it pretty much levels off. As we've been saying around here for quite some time now, there is only so much instruction level parallelism that can be exploited. That being said Haswell does make a nice little improvement from Ivy/Sandy.

It really seems like for significant performance increases we are going to need some combination of more clockspeed, more cores and better software to support them, and more specialized instructions and the software to support them.

So what do you think?

I also have some AMD data, but it is not as well vetted. The later Athlons really were some great parts.

Soulkeeper · Dec 4, 2013

These kinds of things interest me too.
I hung onto an old DOS app for measuring cache/mem latencies and ran it on many systems over the years. I don't have any info in a chart, though.
I really wish I had had access to the source code so I could have updated it.

VirtualLarry · Dec 4, 2013

Nice graph. I would love to see something similar for AMD, and also for AMD and Intel's "little core" CPUs.

Centauri · Dec 4, 2013

Man, what a wild ride the Pentium 4 era was. Prescott on the same level as 486... yowza.

Hulk · Dec 4, 2013

VirtualLarry said:
Nice graph. I would love to see something similar for AMD, and also for AMD and Intel's "little core" CPUs.

I put up all the AMD data I have.

Homeles · Dec 4, 2013

I'd imagine most of the reason things stay flat from Conroe until Haswell is because the back end stays largely the same up until Haswell.

Yuriman · Dec 4, 2013

I'd like to see a graph of CPUMark99 per MHz (inverted).

Very interesting, I had no idea that Prescott's IPC was that incredibly poor - though most modern programs can use instructions that CPUMark99 doesn't have which would leave Prescott looking considerably better - the vast majority of its core wouldn't be idle.

Hulk · Dec 4, 2013

Homeles said:
I'd imagine most of the reason things stay flat from Conroe until Haswell is because the back end stays largely the same up until Haswell.

I was thinking the same thing.

ViRGE · Dec 5, 2013

Yuriman said:
I'd like to see a graph of CPUMark99 per MHz (inverted).

Very interesting, I had no idea that Prescott's IPC was that incredibly poor - though most modern programs can use instructions that CPUMark99 doesn't have which would leave Prescott looking considerably better - the vast majority of its core wouldn't be idle.

Prescott's pipeline was longer than Willamette's, so performance did drop a little on an IPC basis. However, I think this benchmark is particularly brutal, as the average performance hit wasn't as bad as it is in CPUMark.

JimmiG · Dec 5, 2013

Interesting and amazing just how bad Netbust was when running software not optimized for the architecture, almost as bad as the 486.

Remember Intel originally promised that the Pentium 4 would reach 10 GHz before 2008 or something like that. Looking at the IPC difference between Conroe and Preschot, they pretty much delivered on that.

sefsefsefsef · Dec 5, 2013

It would be interesting to see similar data for a benchmark that is not cache resident. We would see a lot more variation going from Core2 -> Haswell. The problem is that as soon as your benchmark is no longer cache resident, then IPC becomes a function of clockspeed*, so it would be harder to make the comparison, without over/underclocking some of the samples.

* this is a point I've battled before with some of you, but it is indeed a fact that for non-cache resident workloads, IPC goes down as clockspeed goes up.
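
A rough sketch of why that happens, assuming a deliberately simple model: memory latency is fixed in nanoseconds, so every miss costs more cycles as the clock goes up. The function name and all numbers are hypothetical, and the model ignores overlapping misses, prefetching, and everything else a real core does to hide latency.

```python
def effective_ipc(core_ipc: float, misses_per_instr: float,
                  mem_latency_ns: float, clock_ghz: float) -> float:
    """Estimate IPC when some instructions stall on main memory.

    Assumes each miss stalls the core for the full memory latency.
    """
    # A fixed latency in ns costs more cycles at a higher clock: cycles = ns * GHz.
    stall_cycles_per_instr = misses_per_instr * mem_latency_ns * clock_ghz
    cycles_per_instr = 1.0 / core_ipc + stall_cycles_per_instr
    return 1.0 / cycles_per_instr


# Hypothetical core: IPC of 2.0 when cache resident, 100 ns memory latency.
for clock_ghz in (1.0, 2.0, 3.0):
    resident = effective_ipc(2.0, 0.0, 100.0, clock_ghz)
    missing = effective_ipc(2.0, 0.01, 100.0, clock_ghz)
    print(f"{clock_ghz:.1f} GHz: cache resident IPC {resident:.2f}, "
          f"with misses IPC {missing:.2f}")
```

With zero misses the estimate is the same at every clock, which is the cache-resident case; with any misses at all, the estimated IPC falls as the clock rises.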

Lorne · Dec 5, 2013

Need the P-Pro in the list. It was what Intel looked back to when designing the C2.

Hulk · Dec 5, 2013

Lorne said:
Need the P-Pro in the list. It was what Intel looked back to when designing the C2.

Good point. Updated.

Ajay · Dec 5, 2013

Lorne said:
Need the P-Pro in the list. It was what Intel looked back to when designing the C2.

Really? I thought it was the PIII M :\

SlowSpyder · Dec 5, 2013

I'm kind of surprised to see so much variance between the K6, K6-2, and K6-3. I thought those were more or less all the same CPU core, just with differing amounts of L2, SIMD instructions, and maybe built on a different process. It'd be interesting to see where Phenom and FX fit into the mix, too.

Regarding Intel, Prescott scored somewhere slower than the original Pentium but faster than the 486. Pretty crazy!

Lonyo · Dec 5, 2013

Hulk said:
A couple things are pretty interesting.
First, you can clearly see the wrong turn Intel makes with the P4. And in fact after making that wrong turn with Willamette they continue to dig a deeper hole with Northwood, Prescott, etc...

Second, there is a huge increase in IPC from P4 to Conroe and then it pretty much levels off. As we've been saying around here for quite some time now, there is only so much instruction level parallelism that can be exploited. That being said Haswell does make a nice little improvement from Ivy/Sandy.

They took a design decision. It didn't pay off, but IPC dropped because they expected C to increase by more than it did. That's why such a measure on its own doesn't mean all that much. IPC is all well and good when you design for IPC efficiency, but some were designed for clock scaling, and you deliberately sacrifice IPC in order to get clockspeed scaling that you hope will exceed the IPC loss. Of course, that didn't happen, but it wasn't a "wrong turn" to sacrifice IPC at that time, based on their expectations.

Northwood clocked higher than Will, and Prescott clocked higher than NW, no wrong turn there, they did exactly what they were supposed to do. They clocked higher but with lower IPC, and a net (eventual) performance increase when the clockspeed was high enough.
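
To put numbers on that trade-off, here is a small sketch that treats performance as IPC times clock and works out the clock a lower-IPC design needs just to break even. The function names and figures are invented for illustration and are not measurements of any of these chips.

```python
def performance(ipc: float, clock_mhz: float) -> float:
    """Relative performance, modeled simply as IPC times clock."""
    return ipc * clock_mhz


def break_even_clock_mhz(old_ipc: float, old_clock_mhz: float, new_ipc: float) -> float:
    """Clock the new design needs to match the old design's performance."""
    return performance(old_ipc, old_clock_mhz) / new_ipc


# Hypothetical designs: the new one gives up 20% IPC to chase clock speed.
old_ipc, old_clock_mhz = 1.0, 2000.0
new_ipc = 0.8

needed = break_even_clock_mhz(old_ipc, old_clock_mhz, new_ipc)
print(f"New design breaks even at about {needed:.0f} MHz")

for clock_mhz in (2200.0, 2800.0, 3400.0):
    ratio = performance(new_ipc, clock_mhz) / performance(old_ipc, old_clock_mhz)
    print(f"{clock_mhz:.0f} MHz: {ratio:.2f}x the old design")
```

Below the break-even clock the lower-IPC design is a net loss; above it, the clock gain outweighs the IPC given up.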

Hulk · Dec 5, 2013

SlowSpyder said:
I'm kind of surprised to see so much variance between the K6, K6-2, and K6-3. I thought those were more or less all the same CPU core, just with differing amounts of L2, SIMD instructions, and maybe built on a different process. It'd be interesting to see where Phenom and FX fit into the mix, too.

Regarding Intel, Prescott scored somewhere slower than the original Pentium but faster than the 486. Pretty crazy!

If I had to guess, the jump from K6 to K6-2 was the result of the execution engine being increased from 7 to 10 units and the FSB speed increasing from 66MHz to 100MHz. These CPUs generally had 512KB of L2 cache on the motherboard, and since CPUmark99 doesn't fit in the L1, going to the L2 produces a big hit in performance. Speed up L2 access and CPUmark99 performance increases.

For the K6-3 the L2 was 256KB, but it moved onto the chip and ran at full speed. Another boost in CPUmark99 performance, and integer performance in general if you look at old Winstone benches.

Zodiark1593 · Dec 5, 2013

Seems my Westmere-based i5 460M compares pretty well in performance to the Bridge duo that succeeded it.

SlowSpyder · Dec 5, 2013

Hulk said:
If I had to guess, the jump from K6 to K6-2 was the result of the execution engine being increased from 7 to 10 units and the FSB speed increasing from 66MHz to 100MHz. These CPUs generally had 512KB of L2 cache on the motherboard, and since CPUmark99 doesn't fit in the L1, going to the L2 produces a big hit in performance. Speed up L2 access and CPUmark99 performance increases.

For the K6-3 the L2 was 256KB, but it moved onto the chip and ran at full speed. Another boost in CPUmark99 performance, and integer performance in general if you look at old Winstone benches.

Ah, that makes sense. I'm so used to L3 being the cache that is sometimes there, sometimes not on AMD CPUs that I forgot that L2 was not on-CPU in K6 or K6-2 (and was also on die on the K6-2+).

Phynaz · Dec 5, 2013

IMHO using a 15-year-old benchmark would skew the results. Many of the performance increases of modern processors are due to instructions that didn't exist 15 years ago.

We could go way back and use Landmark. http://dosbenchmark.wordpress.com/research/landmark/

Exophase · Dec 5, 2013

These test results look heavily dependent on L2 latency and bandwidth normalized per clock, assuming it really is totally cache resident for every single CPU tested. That's the most likely way to explain the variation from PPro to P2 and P3 Katmai (both had off-die cache at half speed) to P3 Coppermine and Tualatin (on-chip cache). Likewise, Prescott showed a big increase in L2 latency vs Northwood (http://ixbtlabs.com/articles2/rmma/rmma-p4.html)

AMD shows the exact same trend with improvements from K6-2 to K6-3 where it incorporated on-die L2 cache, which even outperforms K7 (initially off-die L2), something that would almost never happen in real world tests. By the time you get to Athlon XP you're back to on-die L2 cache, but the clock speed is so much higher that you inevitably end up with higher cycle counts for latency.

This chart gives a little broader insight into L2 latency in clocks:

http://web.eece.maine.edu/~vweaver/cornell/fusion_machines/cache_summary.txt

Look how low it is for Pentium Pro.

As a rule of thumb I don't pay any attention to benchmark results at all if no one can even make a claim as to what the benchmark is doing.

Maximilian · Dec 5, 2013

I never realised just how crap Prescott was... I mean I knew IPC was down due to the pipeline lengthening and it ran hotter but wow, pretty lousy overall!

Hulk · Dec 5, 2013

Lonyo said:
They took a design decision. It didn't pay off, but IPC dropped because they expected C to increase by more than it did. That's why such a measure on its own doesn't mean all that much. IPC is all well and good when you design for IPC efficiency, but some were designed for clock scaling, and you deliberately sacrifice IPC in order to get clockspeed scaling that you hope will exceed the IPC loss. Of course, that didn't happen, but it wasn't a "wrong turn" to sacrifice IPC at that time, based on their expectations.

Northwood clocked higher than Will, and Prescott clocked higher than NW, no wrong turn there, they did exactly what they were supposed to do. They clocked higher but with lower IPC, and a net (eventual) performance increase when the clockspeed was high enough.

"Wrong turn" meaning an incorrect business decision.

Hulk · Dec 5, 2013

Phynaz said:
IMHO using a 15-year-old benchmark would skew the results. Many of the performance increases of modern processors are due to instructions that didn't exist 15 years ago.

We could go way back and use Landmark. http://dosbenchmark.wordpress.com/research/landmark/

Like I wrote in the first post: "one old benchmark..."
It's hard to find a benchmark that is valid across 25 years of CPUs and has been run on all of them, not simulated or anything like that.
Just having some fun here 😉

