A benchmark - SeventeenorBust

VirtualLarry

No Lifer
Aug 25, 2001
56,587
10,225
126
I run a distributed computing program called SeventeenorBust. It computes prime numbers using background CPU time.

Well, I recently had the chance to compare performance between my E2140 @ 3.2GHz, a P4 Northy @ 3.2GHz, and an AMD64 S939 3800+ @ 2.4GHz.

The C2D had scores of 7.1M (per core), the P4 had scores of 4M (I configured the client for only one core, even though the CPU supports HT), and the AMD64 (running as an app, not configured as a service) had scores of 2.4-2.7M.

Note that the C2D has 1MB L2 shared between two cores, the P4 Northy has 512KB L2, and the AMD64 also has 512KB L2. So the performance differences are not likely to be due to differences in cache size.

Now, I thought that the AMD64 3800+ was supposed to be equivalent in performance to a theoretical P4 at 3800MHz. I also thought that the C2D was supposed to be twice as fast as a P4 at the same clock speed.

So in this case, neither of these is so. Discuss.
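For anyone who wants to check the arithmetic behind those two expectations, here is a rough sketch. It only uses the scores and clocks quoted above; taking the midpoint of the 2.4-2.7M range and scaling the P4 score linearly with clock are my own simplifying assumptions, not anything the client reports.

```python
# Scores (millions, per core) and clocks (GHz) as reported in this post.
# The AMD64 figure is the midpoint of the reported 2.4-2.7M range (assumption).
results = {
    "E2140 @ 3.2GHz": (7.1, 3.2),
    "P4 Northwood @ 3.2GHz": (4.0, 3.2),
    "AMD64 3800+ @ 2.4GHz": ((2.4 + 2.7) / 2, 2.4),
}


def per_ghz(score_m, clock_ghz):
    """Score in millions divided by clock in GHz."""
    return score_m / clock_ghz


for name, (score, clock) in results.items():
    print(f"{name}: {per_ghz(score, clock):.2f}M per GHz")

# Expectation 1: a Core 2 is about 2x a P4 at the same clock.
c2d_vs_p4 = results["E2140 @ 3.2GHz"][0] / results["P4 Northwood @ 3.2GHz"][0]
print(f"E2140 / P4, both at 3.2GHz: {c2d_vs_p4:.2f}x")

# Expectation 2: a 3800+ matches a hypothetical 3.8GHz P4.
# Assumes the P4 score scales linearly with clock, which is a simplification.
p4_at_3800 = results["P4 Northwood @ 3.2GHz"][0] * 3.8 / 3.2
print(f"Hypothetical P4 @ 3.8GHz: {p4_at_3800:.2f}M vs 2.4-2.7M observed")
```

On these numbers the E2140 lands somewhat short of 2x the P4, and the 3800+ lands well short of the extrapolated P4 figure.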
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Well, for prime searches you are strictly talking integer math...and the P4 was an integer monster with its dual-execution for integers.

Here is an in-depth analysis of the integer vs. floating point comparisons of P4 vs. A64: (perhaps too much data there to readily digest)
http://freespace.virgin.net/roy.longbottom/cpuspeed.htm

Here's perhaps an easier-on-the-eyes anandtech graph from the past to show what this means:
http://www.anandtech.com/showdoc.aspx?i=1884&p=12

There were always these kinds of "corner cases" where the P4's architecture could shine; those double-clocked ALUs were built that way for a reason. (Remember, the ALUs for a 3.2GHz P4 are operating at 6.4GHz.)

Oh, and in my book when it comes to processor performance 7.1 is ~2x more than 4, so I'd say the Core2 is spot on with expectation over a P4.
 

jones377

Senior member
May 2, 2004
466
68
91
It appears that the program uses SSE2 integer instructions. This would explain the large difference between Core2 and K8. Core2 supports about 3x the throughput of K8 in this particular case (3*128-bit vs 1*128-bit instructions/clock). I must say, this is the only real-world example I have seen of this effect. The only other test I know of is a SiSoft Sandra subtest, but that is just a synthetic benchmark. P4 also only does 1*128-bit instruction/clock, and I guess as long as the code does not include much in the way of branches, the higher clockspeed should win out. Linpack shows the same difference between P4 and K8 for floating point code.

edit: I should have been more clear. K8 and P4 do 2*64-bit per clock cycle for SSEx integer code since it's executed in the float pipes. Core2 has 3 128-bit integer execution units in addition to the 2 128-bit float units. For regular x86 integer code using the normal registers, K8 and Core2 should be much closer in performance (both having 3 64-bit (in 64-bit mode only) integer ALUs, but Core2 having the advantage of 4 vs 3 decoders among other things).

edit2: This shows you what I mean. Look at the benchmark at the bottom of that page. It is neither an error nor bias on the part of Tom's Hardware; it's simply perhaps the most striking difference between Core2 and K8 (and all other previous x86 processors for that matter, including Intel's own).
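To put numbers on the units-per-clock argument, here is a back-of-the-envelope sketch. The unit counts and widths are the ones quoted in this post, and the clocks are the ones from the first post in the thread; treating peak width times clock as a stand-in for real throughput is an assumption, since it ignores decode, branches and memory.

```python
# SSE2 integer execution resources as described in this post:
# Core2: 3 x 128-bit units; K8 and P4: 2 x 64-bit (i.e. 1 x 128-bit per clock).
units = {
    "Core2": (3, 128),
    "K8": (2, 64),
    "P4": (2, 64),
}

# Clocks (GHz) of the three machines from the opening post.
clocks_ghz = {"Core2": 3.2, "K8": 2.4, "P4": 3.2}


def bits_per_clock(count, width_bits):
    """Peak SIMD integer bits processed per clock cycle."""
    return count * width_bits


peak = {cpu: bits_per_clock(*u) for cpu, u in units.items()}

for cpu, bits in peak.items():
    gbits_per_s = bits * clocks_ghz[cpu]  # bits/clock * 10^9 clocks/s
    print(f"{cpu}: {bits} bits/clock, {gbits_per_s:.1f} Gbit/s peak "
          f"({bits / peak['K8']:.0f}x K8 per clock)")
```

Per clock this gives the 3x figure for Core2 over K8, and it shows the P4 and K8 tied per clock, so any P4 lead in this model comes from clock speed alone.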
 

nerp

Diamond Member
Dec 31, 2005
9,865
105
106
Makes sense. When all Intel had to offer was the P4, it was basically understood that the AMD 64 chips were best for gaming and general use and the P4 was what you'd want if you did mostly encoding.
 

VirtualLarry

I have another data point to report, and another conclusion. My friend's E5200 @ 3.75GHz reports 6.7M per core. It seems that SB really likes memory bandwidth. That would explain why the AMD64 scores were so low (DDR1-400), and why the P4 Northy scores were so high (DDR2-7xx). The E5200 has a 300MHz FSB and DDR2-600. My E2140s have a 400MHz FSB and DDR2-800.
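A quick sketch of the numbers behind that guess. The peak bandwidth formula assumes a single 64-bit memory channel, which is my assumption (the posts don't say whether any of these boxes run dual channel), and the P4's "DDR2-7xx" is left out because the exact speed isn't given.

```python
BUS_WIDTH_BYTES = 8  # assumption: one 64-bit memory channel


def peak_bandwidth_gbs(mega_transfers_per_s, channels=1):
    """Theoretical peak in GB/s (10^9 bytes/s) for a rating such as DDR2-800."""
    return mega_transfers_per_s * BUS_WIDTH_BYTES * channels / 1000


# Memory ratings named in this post.
ratings = {"DDR1-400": 400, "DDR2-600": 600, "DDR2-800": 800}
for label, rate in ratings.items():
    print(f"{label}: {peak_bandwidth_gbs(rate):.1f} GB/s per channel")

# Per-clock scores (M per GHz) for the two Core-based chips, from the
# figures reported in this thread.
per_ghz = {
    "E5200 (DDR2-600)": 6.7 / 3.75,
    "E2140 (DDR2-800)": 7.1 / 3.2,
}
for label, value in per_ghz.items():
    print(f"{label}: {value:.2f}M per GHz")
```

The E5200 comes out lower per GHz than the E2140 despite the higher clock, which is at least consistent with the slower FSB and memory holding it back.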
 

dbcooper1

Senior member
May 22, 2008
594
0
76
I just ran it for a few minutes on systems I have running:
P3 @ 1.4GHz (512k cache version): 0.365M (PC133)
Core2Duo E6420 @ 3.20GHz: 7.85M (DDR2 @ 800)
E8500 @ 4.00GHz: 9.61M (DDR2 @ 842)
 

jones377

I tested it at 2.67GHz/333FSB and 2GHz/333FSB on my E6400, and as near as I can tell the scaling is perfect with regard to clockspeed, which would indicate that it runs completely within the CPU cache, probably L1 cache (it's a very simple algorithm after all).

6.8M/s @ 2667MHz = 2.55M/GHz
5.1M/s @ 2000MHz = 2.55M/GHz
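The same check written out as a small sketch, using only the two runs above. The "efficiency" name is just a label I picked for score ratio divided by clock ratio; a value near 1.0 means the score tracked the core clock, with the FSB held at 333 in both runs.

```python
# (clock in MHz, score in M/s) for the two runs above.
runs = [(2667, 6.8), (2000, 5.1)]

(hi_mhz, hi_score), (lo_mhz, lo_score) = runs

clock_ratio = hi_mhz / lo_mhz        # how much faster the core clock was
score_ratio = hi_score / lo_score    # how much faster the score was
efficiency = score_ratio / clock_ratio  # 1.0 would be perfect clock scaling

for mhz, score in runs:
    print(f"{score}M/s @ {mhz}MHz = {score / (mhz / 1000):.2f}M/GHz")

print(f"clock ratio {clock_ratio:.3f}, score ratio {score_ratio:.3f}, "
      f"scaling efficiency {efficiency:.3f}")
```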