Thanks! Just finished it up. Epic, epic spreadsheet.
Affinity set 0,1 (2 threads): 21.0649 GFlops
Affinity set 0,1,2 (3 threads): 30.6025 GFlops
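If anyone wants to script the affinity locking instead of clicking through Task Manager every run, here's a rough sketch using Python's psutil. The process name below is just my guess at what the LinX worker binary is called, so adjust it to whatever shows up on your machine.

```python
# Sketch: pin a running Linpack worker to specific cores with psutil.
# The process name is an assumption about what your LinX build launches;
# check Task Manager and adjust.
import psutil

TARGET_NAME = "linpack_xeon64.exe"  # assumed LinX worker binary name
CORES = [0, 1]                      # affinity set 0,1 as in the post above

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] and proc.info["name"].lower() == TARGET_NAME:
        proc.cpu_affinity(CORES)    # psutil supports this on Windows and Linux
        print(f"Pinned PID {proc.pid} to cores {CORES}")
```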
Updated i7 920 info has been added to my previous Google doc.
I used problem size 16134 with HT off and locked the threads to one core each, 5 passes per test.
I've run it with varying thread counts but with a fixed problem size of 15000. LinX allocated 1729 MB of memory.
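Side note on where that memory figure comes from: Linpack solves a dense N x N system in double precision, so the matrix alone needs roughly N^2 * 8 bytes. A quick sanity check (rough sketch, ignoring LinX's extra workspace) lines up with the number reported above:

```python
# Rough memory estimate for a dense double-precision Linpack matrix.
# This ignores the solver's extra workspace, so the real allocation is a bit larger.

def linpack_matrix_mib(n: int) -> float:
    """Matrix storage for an n x n system of 8-byte doubles, in MiB."""
    return n * n * 8 / 2**20

for n in (15000, 16134):
    print(f"N = {n}: ~{linpack_matrix_mib(n):.0f} MiB for the matrix alone")

# N = 15000 works out to ~1717 MiB, close to the 1729 MB LinX reported;
# the small difference is presumably workspace/pivoting overhead.
```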
CPU: Phenom II 955 BE
Core Frequency: 3600 MHz
NB/Uncore Frequency: 2250 MHz
Memory Frequency: DDR2-900 (5-5-5-15)
Thread Count : GFlops
1 : 12.24 GFlops
2 : 23.96 GFlops
3 : 35.20 GFlops
4 : 45.73 GFlops
http://cid-17de86f1059defe0.skydriv...inpack Challenge/Perf^_per^_Thread^_955BE.jpg
(The 898 MHz reported by CPU-Z is due to Cool'n'Quiet.)
Gentlemen, I present you with our grand thread scaling overlay:
Thanks to all your data, I was able to zero in on a better estimate of the parallel code percentage in the LinX computations, which comes in at around 98.9% (revised upwards).
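For anyone curious how you back a number like that out of the scaling data: assuming the model behind the overlay is essentially Amdahl's law, speedup(n) = 1 / ((1 - p) + p/n), you can invert it per data point and average. Here's a rough sketch using the Phenom II 955 numbers posted above as example input; a single system will land a bit off from the pooled 98.9% figure, since that estimate combines everyone's runs.

```python
# Sketch: estimate the parallel fraction p from thread-scaling data via
# Amdahl's law, speedup(n) = 1 / ((1 - p) + p/n).
# Example input: the Phenom II 955 GFlops posted above.

gflops = {1: 12.24, 2: 23.96, 3: 35.20, 4: 45.73}
base = gflops[1]

def parallel_fraction(n: int, speedup: float) -> float:
    """Invert Amdahl's law for a single (n, speedup) measurement."""
    return (1 - 1 / speedup) / (1 - 1 / n)

estimates = [parallel_fraction(n, g / base) for n, g in gflops.items() if n > 1]
p = sum(estimates) / len(estimates)
print(f"Estimated parallel fraction: {p:.3%}")
```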
Interestingly, the L3$ on the Phenom II X4 does not appear to improve thread scaling in this benchmark versus the Athlon II X4. This isn't surprising at first glance, given that we are intentionally using matrix sizes that force the working set out to RAM on the other side of the memory controller, but it is interesting that the prefetch mechanisms feeding the Phenom II's L3$ do not yield a thread-scaling advantage per se (though they may help absolute performance on an IPC-per-thread metric).
Lopri, is there any chance we could convince you to drop your clock speed to 3.2 GHz to match Hyperlite's Athlon II clock speed and rerun the 1-4 thread tests?
dank69, I noticed in your spreadsheet that you ran both cases for 1 thread (affinity locked to one core versus no core locking) and the results were virtually identical. I noticed this too on my Kentsfield. It seems the Athlon II's performance gets hit hard by thread migration in this application, whereas the Intel architectures and the Phenom II are far less sensitive to it (completely insensitive, actually).
We also see that the IMC makes all the difference, with a clear delineation between Nehalem/Deneb and Kentsfield. The Nehalem architecture shows roughly half the interprocessor communication time of the Deneb architecture (or rather of its IMC intricacies).
Presumably this performance difference stems from the added bandwidth of the triple-channel IMC on Bloomfield. If one were curious to test this theory, we would repeat the 1-4 thread test but remove the RAM sticks from the third channel to intentionally reduce the bandwidth (and could do it again to cut the bandwidth even further, down to single-channel).
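For a back-of-the-envelope on what's at stake there: theoretical DRAM bandwidth scales linearly with channel count (64 bits per channel times the data rate), so triple/dual/single channel gives clean 3:2:1 steps to correlate against. A quick sketch, assuming DDR3-1333 on the Bloomfield board purely for illustration:

```python
# Back-of-the-envelope peak memory bandwidth per channel configuration.
# DDR3-1333 is assumed here just for illustration; plug in the actual
# memory speed of the Bloomfield setup.

MT_PER_S = 1333          # assumed DDR3-1333 data rate (MT/s)
BYTES_PER_TRANSFER = 8   # 64-bit channel

for channels in (3, 2, 1):
    gb_s = channels * MT_PER_S * BYTES_PER_TRANSFER / 1000
    print(f"{channels}-channel: ~{gb_s:.1f} GB/s theoretical peak")
```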
Thanks, guys, for all your time and effort in generating this data. Pretty cool results from a geeky computer-science perspective.