Linpack Challenge


Hyperlite

Diamond Member
May 25, 2004
5,664
2
76
Bingo! You got it. Your single-threaded performance jumped from 9.24 GFlops to 10.76 GFlops.

That makes the 40.34 GFlops value with four threads (which won't be affected by core affinity) represent a thread scaling of 3.75, right in line with lopri's results.

Super-linear speedup results explained: your single-threaded performance was sucking wind.

If you have the time/energy/desire would you mind re-running the 2 thread test with LinX affinity set to two cores, and again for the 3 thread test (affinity lock to 3 cores) for the sake of completion?

Cool. Yes i intend to do so, i'm working on a Toxicology final at the moment though so it may have to wait until tomorrow :D
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Take your time, your chip and my spreadsheet will still be here tomorrow after the exam. Good luck on that final!
 

Hyperlite

Diamond Member
May 25, 2004
5,664
2
76
Thanks! just finished it up. Epic, epic spreadsheet. :p

Affinity Set 0,1
2 Threads: 21.0649

0,1,2
3 Threads: 30.6025
 

dank69

Lifer
Oct 6, 2009
37,088
32,434
136
Updated i7 920 info added to my previous google doc.

I used problem size 16134, HT off and locked the threads to 1 core each. 5 passes each test.
 

Uppsala9496

Diamond Member
Nov 2, 2001
5,272
19
81
Anyone know how to install this on Ubuntu (9.1) for an AMD chip?
I attempted to install version 10.1something via the intel website, however it won't run on an AMD chip. I haven't had any luck figuring out how to install it for my X3.
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
Anyone know how to install this on Ubuntu (9.1) for an AMD chip?
I attempted to install version 10.1something via the intel website, however it won't run on an AMD chip. I haven't had any luck figuring out how to install it for my X3.
I see v10.2 here.

http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/

Or simply try the LinX which I linked in the first post. It's the same thing and you need not install it. (Just unzip in a folder and run the executable)
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Thanks! just finished it up. Epic, epic spreadsheet. :p

Affinity Set 0,1
2 Threads: 21.0649

0,1,2
3 Threads: 30.6025

Updated i7 920 info added to my previous google doc.

I used problem size 16134, HT off and locked the threads to 1 core each. 5 passes each test.

I've run it with varying thread counts, but with a fixed problem size of 15000. LinX assigned 1729MB of memory.

CPU: Phenom II 955 BE
Core Frequency: 3600 MHz
NB/Uncore Frequency: 2250 MHz
Memory Frequency: DDR2-900 (5-5-5-15)

Thread Count : GFlops

1 : 12.24 GFlops
2 : 23.96 GFlops
3 : 35.20 GFlops
4 : 45.73 GFlops

http://cid-17de86f1059defe0.skydriv...inpack Challenge/Perf^_per^_Thread^_955BE.jpg
(898MHz reported by cpu-z is due to C'nQ)

Gentlemen, I present you with our grand thread scaling overlay:

LinxScalingNehalemDenebKentsfield.png


Thanks to all your data I was all the better able to zero-in on a proper estimate of the parallel code percentage present in the LinX computations, coming in at around 98.9% (revised upwards).
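
For anyone who wants to play with the numbers themselves, here is a minimal sketch of the textbook Amdahl's law calculation, using only GFlops values already posted in this thread. It is an assumption that plain Amdahl is a fair stand-in here: it has no term for interprocessor communication, so it does not reproduce the fitted figure quoted above, which presumably comes from a fuller model. The function names are made up for illustration.

```python
# Measured GFlops by thread count, copied from posts in this thread.
phenom_ii_955 = {1: 12.24, 2: 23.96, 3: 35.20, 4: 45.73}      # lopri, 3.6 GHz
athlon_ii_x4 = {1: 10.76, 2: 21.0649, 3: 30.6025, 4: 40.34}   # Hyperlite, affinity locked


def speedups(gflops):
    """Return the speedup over the 1-thread result for each thread count."""
    base = gflops[1]
    return {n: g / base for n, g in gflops.items()}


def amdahl_parallel_fraction(speedup, threads):
    """Solve S = 1 / ((1 - p) + p / N) for the parallel fraction p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


for name, data in (("Phenom II 955 BE", phenom_ii_955),
                   ("Athlon II X4", athlon_ii_x4)):
    for n, s in sorted(speedups(data).items()):
        if n == 1:
            continue  # p is undefined for a single thread
        p = amdahl_parallel_fraction(s, n)
        print(f"{name}: {n} threads, speedup {s:.3f}, implied p = {p:.4f}")
```

With the 4-thread results this plain version implies a parallel fraction of roughly 0.976 to 0.978 for both AMD chips, lower than the fitted estimate, so treat it as a sanity check rather than a replacement for the spreadsheet.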

Interestingly, we see that the L3$ on Phenom II X4 does not appear to improve the thread scaling in this benchmark versus that of the Athlon II X4. This is not surprising at first glance, given that we are intentionally operating on matrix sizes that force paging to the RAM on the other side of the memory controller. Still, it is interesting to see that the pre-fetch mechanisms in play for the L3$ on Phenom II do not yield an advantage in thread scaling per se (though they may in terms of absolute performance on an IPC/thread metric).

Lopri is there any chance we could convince you to reduce your clockspeed to 3.2GHz to match Hyperlite's Athlon II clockspeed and rerun the 1-4 threads?

dank69, I noticed in your spreadsheet that you ran both cases for 1 thread (affinity locked to one core versus no core locking) and the results were virtually identical. I noticed this too on my Kentsfield. It seems the Athlon II's performance gets hit hard by thread migration in this application, whereas the Intel architectures and the Phenom II are far less sensitive to it (completely insensitive, actually).

Also, we see that the IMC makes all the difference: there is a clear delineation between Nehalem/Deneb and Kentsfield. The Nehalem architecture does result in about 1/2 the interprocessor communication time in comparison to the Deneb architecture (or rather its IMC intricacies).

Presumably this performance difference stems from the added bandwidth of the triple-channel IMC on Bloomfield. To test this theory, we could repeat the 1-4 thread test with the RAM sticks removed from the third channel to intentionally reduce the bandwidth (and could do it yet again to reduce the bandwidth even further, to single-channel).

Thanks guys for all your time and efforts in generating this data. Pretty cool results from a geeky computer-science perspective
 

Tsavo

Platinum Member
Sep 29, 2009
2,645
37
91
48.3073 Gigaflops on Core i5 750 @ 3.2GHz.
6.2726 Gigaflops on my Athlon X2 4800+ (2.4 Ghz). :eek:
 

kerr

Junior Member
Apr 1, 2009
23
0
0
I made a run yesterday, I think with some tweaking I can get close to 70, but still, there is no way to compete with the 6 core stuff..

13.png
 

Hyperlite

Diamond Member
May 25, 2004
5,664
2
76
Hey guys, i've been playing with my OC a little bit and am slightly intrigued by the results. I figured out this morning that i was FSB limited; my board doesn't like a tick over about 2470 FSB. I figured i was probably bumping up against ram limits at reasonable voltage as well @ 1646mhz, so i dropped my memory divider and FSB divider down to run mem @ 1360 and FSB @ ~2300mhz, then bumped bus speed to 255, which put the clock at 3315. It is stable here. I expected to take a big hit in linpack, but to my (albeit naive) surprise, i actually got about a 0.5 GFlop increase @ 4 threads. The moral of the story is that, at least on the athlon II platform, linpack is rather insensitive to bus and memory speeds. Up to a point, i'm sure, but i'm going to head on up to 3.4ghz and see what happens. I hear an aftermarket cooler calling to me in the distance though...
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Hyperlite, yeah, even my rig, which showed the largest scaling impact from ram bandwidth, wasn't so sensitive to the FSB; the ram bandwidth (and latency) itself was the big deal. I imagine if you tighten your timings or increase your ram speed you'll see more increases in GFlops.
 

Hyperlite

Diamond Member
May 25, 2004
5,664
2
76
Hyperlite, yeah, even my rig, which showed the largest scaling impact from ram bandwidth, wasn't so sensitive to the FSB; the ram bandwidth (and latency) itself was the big deal. I imagine if you tighten your timings or increase your ram speed you'll see more increases in GFlops.

Which is exactly what i'm seeing :)

8-8-8-24 peak: 40.90
8-8-8-22 peak: 41.14
8-8-8-20 peak: 40.83

(was 9-9-9 before)

out of 10 runs each. So for some reason it doesn't like 20 tRAS. going to give it another clock speed juice now, see if i can hold those timings.

The last time i seriously played with ram timings was on my barton...i think i was running 1-2-2-5 or something :D :D

edit:
3406_88822.jpg


this little athlon II is beginning to impress me. Not bad for a $450 system!
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
Lopri is there any chance we could convince you to reduce your clockspeed to 3.2GHz to match Hyperlite's Athlon II clockspeed and rerun the 1-4 threads?
I'm not sure if that's necessary. Looking at the Athlon II's results, actually I was thinking how useless the Phenom II's L3 is in this benchmark. I'm guessing Athlons will reach the same results as my 955 BE's @3.6 GHz. It's a stark contrast to Core 2 which seems to be hugely benefited by larger, faster L2. (E8400 vs. E5200, E6400)

There are some other theories that I'm thinking of, but I'm not in shape to spell them out at the moment, so I'll try to post later. Great job comparing the scaling results against Amdahl's law. I've learned more about my CPU as well as others' thanks to your analysis.
 

Hyperlite

Diamond Member
May 25, 2004
5,664
2
76
I've learned more about my CPU as well as others' thanks to your analysis.

seconded!

I think i'm staying here:
3406mhz 1.45 vcore (bios) 262mhz bus
8-8-8-22 @ 1.8v (ram is rated to 1.85) 1396mhz
2096mhz HT
2358mhz FSB

going to 7-7-7 causes some seriously screwy side effects...windows just refuses to load correctly, services don't start, hangups, strange stuff. even at 1.85v.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
I'm not sure if that's necessary. Looking at the Athlon II's results, actually I was thinking how useless the Phenom II's L3 is in this benchmark. I'm guessing Athlons will reach the same results as my 955 BE's @3.6 GHz. It's a stark contrast to Core 2 which seems to be hugely benefited by larger, faster L2. (E8400 vs. E5200, E6400)

There are some other theories that I'm thinking of, but I'm not in shape to spell them out at the moment, so I'll try to post later. Great job comparing the scaling results against Amdahl's law. I've learned more about my CPU as well as others' thanks to your analysis.

By reducing your clockspeed to 3.2GHz we will be able to compare IPC of Phenom II vs. Athlon II. They might both deliver 3.75x scaling at 4 threads but the L3$ might result in the actual GFlops being 10% higher for the Phenom II.

We need you to run that test so we can deduce the impact of L3$ on actual performance. It is one thing to conclude the L3$ doesn't improve scaling, quite another to say it doesn't help IPC.

The single-thread result will be particularly interesting to me personally. But I understand if you don't want to monkey around with your rig's OC settings.
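
For a rough first look before any rerun, the single-thread numbers already posted can be normalized by core clock. This is only a sketch under stated assumptions: it assumes Hyperlite's 10.76 GFlops result was run at 3.2 GHz and that GFlops scale linearly with core clock, neither of which is confirmed in this thread, and the two rigs' memory configurations likely differ as well. The helper name is made up for illustration.

```python
def gflops_per_ghz(gflops, clock_ghz):
    """Crude per-clock throughput; assumes linear scaling with core clock."""
    return gflops / clock_ghz


# Single-thread results posted in this thread.
phenom_ii = gflops_per_ghz(12.24, 3.6)  # lopri's 955 BE at 3.6 GHz
athlon_ii = gflops_per_ghz(10.76, 3.2)  # Hyperlite's Athlon II, assumed 3.2 GHz

print(f"Phenom II 955 BE: {phenom_ii:.4f} GFlops/GHz")
print(f"Athlon II X4:     {athlon_ii:.4f} GFlops/GHz")
print(f"Phenom II / Athlon II ratio: {phenom_ii / athlon_ii:.4f}")
```

Because memory and NB clocks do not scale along with the core clock, this estimate can easily be off, which is why an actual 3.2 GHz run on the 955 BE is still the proper comparison.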
 

Mir96TA

Golden Member
Oct 21, 2002
1,950
37
91
Man, my workhorse computer sucks arse compared to my HTPC AMD.
I understand the Intel is running fewer cores, with less cache and RAM.
At any rate, here it is:
Linx.jpg

I might come back with OC version of Intel :D
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,250
136
Damn that pumps some heat....But it did complete without errors 79*c on both cores

e5200 @ 4.25ghz vcore 1.412 in bios 1.39 idle in OCCT and 1.38 loaded I'll add my data

LinX_Maxram.png


Now if I pick the 16134 problem size with 1999mb memory my system will break 26.1GFlops.

But it sucks my vcore down into the 1.36 maybe cpu starts to throttle as all speedstep options are enabled in bios....Don't need system to idle at 4.25ghz now do we :)

I'd add my results to your google chart....But without everybody using the same settings on it I think it's kinda worthless....No offense implied :)
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
jeesh I guess posting a new high does not mean much here. lol
I didn't know what to say about your result. It's without a doubt the best performance I've ever seen from any quad-core CPU. System looks to be absolutely stable with fantastic temperatures. Considering there isn't much overclocking on 2S systems, I think your system may beat any 2S system commercially available today. You're literally running a next-gen CPU today!

P.S. I went ahead and entered your result to the data base. Hope you don't mind.
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
Here is 100 %Stock
Stock.jpg

And another look at how AMD screwed its NB with K10/K10.5.

CPU: Athlon X2 7750 BE
Core Frequency: 3200 MHz (1.325v, Stock Opteron HSF)
NB/Uncore Frequency: 2400 MHz (1.325V)
Memory Frequency: DDR2-1066 (5-5-5-15-2T)
Peak Performance: 20.24 GFlops



Edit: Didn't know I was running an older version of LinX. I don't think it'd make a difference but I guess I'll double check the change log of the latest version.

I'm guessing they started with a grand plan but just couldn't make it within the thermal budget. My 955 BE's NB tops out at 2250 MHz at its stock voltage (1.10V), and that's why I'm running it there even though I know the performance will improve with higher frequencies.

7750 BE's NB was running at the same voltage as its execution core (1.325V) for 2400 MHz and I almost felt like it's hotter than 955 BE @3.6 GHz. (Though to be fair 7750 was cooled by an Opteron 165 HSF)
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
Sorry for bumping this old thread but I ran into this and was impressed. 91 GFlops.