BOINC performance of Skylake-X / Skylake-EP?

IEC

Elite Member
Super Moderator
Jun 10, 2004
Anyone seen any results in the wild? I keep asking in the CPU forum here and a few other sites but have had zero takers on my request to (ab)use their hardware. I'm interested in both production efficiency and thermal efficiency.

I'm planning on expanding my CPU capabilities in the fall once my utility rates drop into "not summer/peak" pricing and I have more thermal headroom to work with due to lower ambients... my GPU farm (not in sig) kind of limits that currently.

I've got a few more months to make decisions/comparisons, but I'm currently leaning towards just adding more Ryzen 1700 systems, as the efficiency and perf/$ are both hard to beat...
 

crashtech

Lifer
Jan 4, 2013
I'd be interested in knowing as well, although it looks like SKL-X is somewhat out of my budget range. Coffee Lake will probably be the focus of my attention by the end of the year.
 

StefanR5R

Elite Member
Dec 10, 2016
I am not aware of any reports yet either, and haven't spotted a Skylake-X or -SP in any of the BOINC host lists.

Skylake-SP has already been out in the wild longer than Skylake-X (according to ServeTheHome, for example), but only at certain selected large customers, who have been running it in restricted use cases long before its actual launch. (Which hasn't happened yet, has it?)

From generic reviews of Skylake-X so far, I expect it to perform very similarly to Broadwell-E at the same clocks, with respect to both throughput and energy efficiency. Some applications may benefit from the larger L2 cache, but more will suffer from the smaller L3 cache with its higher latency and different caching policy, compared to Broadwell-E.
Edit: There are several more architectural changes from BDW to SKL, but their effects seem to be minuscule, judging from the BDW-E vs. SKL-X comparisons I have seen so far.

AFAIU, AVX-heavy applications may see a very slight performance uplift on Skylake-X compared to Broadwell-E, but they could gain more on the i9-7900X and above if specifically optimized to take advantage of the additional AVX-512 execution port. Somebody with better insight into AVX programming, correct me if I'm wrong.
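To illustrate why the extra AVX-512 port matters, here is a rough sketch of peak double-precision FLOP per cycle per core. It assumes each FMA unit retires one fused multiply-add per 64-bit lane per cycle (counted as 2 FLOPs) and deliberately ignores AVX clock offsets and memory bandwidth, so it is an upper bound, not a benchmark.

```python
# Back-of-the-envelope peak DP throughput per core per cycle.
# Ignores AVX/AVX-512 clock offsets and memory bandwidth limits.

def peak_dp_flops_per_cycle(fma_units: int, vector_bits: int) -> int:
    """fma_units x 64-bit lanes per vector x 2 FLOPs per fused multiply-add."""
    lanes = vector_bits // 64
    return fma_units * lanes * 2

configs = {
    "Broadwell-E (AVX2, 2x 256-bit FMA)": peak_dp_flops_per_cycle(2, 256),
    "Skylake-X with one 512-bit FMA unit": peak_dp_flops_per_cycle(1, 512),
    "Skylake-X with two 512-bit FMA units": peak_dp_flops_per_cycle(2, 512),
}

for name, flops in configs.items():
    print(f"{name}: {flops} DP FLOP/cycle/core")
```

With a single 512-bit unit, AVX-512 code only matches what AVX2 already achieves on two 256-bit units; the doubling on paper needs the second unit.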

AVX aside, my impression furthermore is that AMD's Zeppelin (currently only available as Ryzen) and Broadwell-E/EP have pretty much the same performance per watt when driven in the 3 to 3.5 GHz range, and my guess is that Skylake-X/SP is identical in this range and at low to medium core counts too. Efficiency at even lower clocks and/or with higher core counts may be another matter. I can hardly wait for independent reviews of Threadripper, Epyc, and Skylake-SP.

On the other hand, there is little reason to expect Skylake-SP to have anything but the unattractive price-performance ratio which we are accustomed to from current-generation Xeons. Worse, it seems there will be a lot less choice for single-socket Skylake-SP systems than for single-socket Broadwell-EP systems, which can be built with consumer-level mainboards.

Lastly, running Skylake-X at stock turbo clocks, or even overclocked, under the all-core loads typical of most Distributed Computing applications seems to me to be of little practical value, due to the cooling problems (either delid, or live with absurd core temperatures) and the quickly degrading performance-per-watt ratio at higher clocks. Fun and challenging, but impractical and ultimately wasteful.
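The falling perf/W at higher clocks can be sketched with the first-order CMOS model, dynamic power roughly proportional to C x V^2 x f. Assuming a compute-bound all-core load whose throughput scales with f, perf/W then scales with 1/V^2. The voltage/frequency pairs below are hypothetical placeholders chosen for illustration, not measurements of any real chip, and leakage is ignored (which flatters the high-clock points).

```python
# First-order model: P_dyn ~ C * V^2 * f and throughput ~ f,
# so perf/W ~ 1 / V^2 (capacitance C cancels out).
# Leakage and uncore power are ignored.

def relative_perf_per_watt(v_ref: float, v_new: float) -> float:
    """Perf/W at v_new relative to v_ref under the C*V^2*f model."""
    return (v_ref / v_new) ** 2

# Hypothetical (GHz, volts) operating points, for illustration only.
points = [(3.3, 0.95), (4.0, 1.10), (4.5, 1.25)]

f_ref, v_ref = points[0]
for f, v in points:
    ratio = relative_perf_per_watt(v_ref, v)
    print(f"{f:.1f} GHz @ {v:.2f} V: {f / f_ref:.2f}x perf, {ratio:.2f}x perf/W")
```

The point of the sketch is the shape: each extra step in clock needs a disproportionate voltage bump, and perf/W drops with the square of it.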
 

StefanR5R

Elite Member
Dec 10, 2016
Johan De Gelas and Ian Cutress pitted Sandy Bridge-EP, Broadwell-EP, Skylake-SP, and EPYC against each other. They didn't have much time for machine-specific optimization, they used a compiler that wasn't the very latest version, and their multithreaded integer and floating-point test cases scaled well with core count because of little to no synchronization overhead, if I understand correctly. These points are also true for most Distributed Computing applications.

On the other hand, a bunch of Ryzen 1700 PCs would beat all this shiny server stuff in Distributed Computing in performance-per-dollar, by far. Do AM4 BIOSes have good downclocking and undervolting support? If so, those Ryzen PCs could also match the energy efficiency of the server gear. Only density would be lacking.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
StefanR5R said:
On the other hand, a bunch of Ryzen 1700 PCs would beat all this shiny server stuff in Distributed Computing in performance-per-dollar, by far. Do AM4 BIOSes have good downclocking and undervolting support? If so, those Ryzen PCs could also match energy efficiency of the server gear. Only density would be lacking.

This is pretty much what I have concluded, and why I have a *third* Ryzen 1700 CPU en route.

It is hard to beat a $269 R7 1700 + $80 B350 motherboard combination for performance per dollar. As you mention though, density is lacking. Not to mention the need for case, PSU, networking, etc. per system, which negates some of the cost advantage.

But with the single-socket Epyc 7401P coming in at $1075 for 24 cores/48 threads, a 2.8 GHz all-core boost and a 155W/170W TDP, it is looking like I may pick one up in the fall for DC usage.

This should give me enough time to also compare versus Threadripper offerings.
 

crashtech

Lifer
Jan 4, 2013
My guess is that Threadripper is going to be the new DC champion, perhaps not as cost effective as Ryzen, but the added density will bring power efficiency gains and lower the TCO. I probably won't be able to get into a Threadripper system until 2018, since the Folding race will be coming up in December and I want to have some 1070s or better on hand for that.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
crashtech said:
My guess is that Threadripper is going to be the new DC champion, perhaps not as cost effective as Ryzen, but the added density will bring power efficiency gains and lower the TCO. I probably won't be able to get into a Threadripper system until 2018, since the Folding race will be coming up in December and I want to have some 1070s or better on hand for that.

My last Ryzen 1700 build cost me $641 after rebates, excluding the cost of the GPUs. This is with mobo, case, PSU, 16GB RAM, m.2 SSD, etc.

A 24c/48t Epyc 7401P has a suggested price of $1075. This is a little over 3x the suggested $329 price of the R7 1700, but it should offer at least 2.67x the performance at the stock 155W TDP (2.8 GHz all-core turbo versus 3.15 GHz with XFR), and up to 3x if there is some XFR headroom or if the 170W TDP mode means a 3 GHz all-core turbo (+XFR?).

So on a perf/W and density basis, Epyc will be superior, while likely not costing much more than building three separate R7 1700 systems. Or at least, I am hoping.
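The 2.67x figure follows from a naive cores x all-core-clock model. A minimal sketch, assuming the DC workload scales perfectly with cores and clock and that IPC cancels out (both chips are Zen), while ignoring SMT and memory bandwidth effects; prices are CPU-only, so they don't capture the per-system case/PSU/board costs mentioned above.

```python
# Naive throughput model: cores x all-core clock (GHz).
# Assumes perfect scaling; ignores SMT and memory bandwidth.

def core_ghz(cores: int, all_core_ghz: float) -> float:
    return cores * all_core_ghz

ryzen_1700 = {"price": 329, "throughput": core_ghz(8, 3.15)}   # 3.15 GHz with XFR
epyc_7401p = {"price": 1075, "throughput": core_ghz(24, 2.8)}  # 155 W mode

perf_ratio = epyc_7401p["throughput"] / ryzen_1700["throughput"]
price_ratio = epyc_7401p["price"] / ryzen_1700["price"]
print(f"Epyc / R7 1700: {perf_ratio:.2f}x throughput for {price_ratio:.2f}x price")

# The speculative 170 W mode at 3.0 GHz all-core:
print(f"At 3.0 GHz all-core: {core_ghz(24, 3.0) / ryzen_1700['throughput']:.2f}x")
```

Under this model 3.0 GHz all-core gives about 2.86x; a full 3x needs roughly 3.15 GHz all-core, i.e. some extra XFR headroom on top.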
 

StefanR5R

Elite Member
Dec 10, 2016
Re improved vector arithmetic support, from http://www.primegrid.com/forum_thread.php?id=7540&nowrap=true#109444 and follow-ups:
niktak11 said:
I saw that AVX512 should be on mainstream and server CPUs later this year. How much will that improve LLR runtime?
mackerel said:
AVX-512 is here now in Skylake-X. It will make exactly zero difference in the short term as we need software to be updated to use it. In theory it could offer doubling of performance per clock, but apparently the few things that actually use it really heats things up, so it might not be that simple. There is also talk that ram bandwidth limiting may be more of a problem than ever.
The theoretical upper bound of double the performance/clock is applicable only to the higher-end -X and -SP SKUs on which the second AVX-512 unit per core is enabled.
Iain Bethune said:
I understand George Woltman has access to a Xeon Phi KNL which also has AVX-512, so I think gwnum support (and by extension LLR) are on the way, but no idea when it will be ready.
 

Smoke

Distributed Computing Elite Member
Jan 3, 2001
Have you seen this article? http://wccftech.com/intel-launches-skylake-sp-purley-xeon/
For my GPU folding rigs I'll take an E5 over an i7 any day. I'm interested in what kind of motherboard implementations will come with AMD's upcoming Threadripper. Me wants MB with 64 PCIe 3.0 lanes as either four x16 3.0 slots or eight x8 3.0 slots!!!
http://wccftech.com/amd-ryzen-threadripper-1950x-cpu-performance-benchmarks-leak/

Welcome to the TeAm Forum! :)

If you are not already crunching one of our TeAm Projects, I'd like to formally invite you to join the friendliest DC TeAm in the universe. TeAm AnandTech :D

Let's hear it for Aurum! Hear! Hear! Aurum! Aurum! Aurum!

Pull up a comfortable chair and have your first beer on me! :beermug:
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
crashtech said:
My guess is that Threadripper is going to be the new DC champion, perhaps not as cost effective as Ryzen, but the added density will bring power efficiency gains and lower the TCO. I probably won't be able to get into a Threadripper system until 2018, since the Folding race will be coming up in December and I want to have some 1070s or better on hand for that.
Well, my TR is churning POGS right now, so you can compare it to other rigs. Let me know what you think. It's only using 29 of the 32 threads, though (two 1080 Tis are folding at the same time).
 

TennesseeTony

Elite Member
Aug 2, 2003
Threadripper and Xeon E5-2683 v3 compared, crunching POGS tasks that pay 99.06 points:
(all numbers are rounded up/down)

TR: 26.4 tasks per day per thread, times 32 threads, multiplied by 99.06 points per task = 83,686 PPD.
2683-v3: 17.2 tasks per day per thread, times 28 threads, multiplied by 99.06 points per task = 47,707 PPD.
So, Threadripper wins by an astounding 75% on performance (put the other way, the Xeon delivers about 43% less), but what about energy?
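The PPD figures above can be reproduced in a few lines. The sketch below also shows why the percentage depends on the direction of comparison: 83,686 is about 75% more than 47,707, while 47,707 is about 43% less than 83,686.

```python
POINTS_PER_TASK = 99.06  # the POGS task type used in this comparison

def ppd(tasks_per_day_per_thread: float, threads: int) -> float:
    """Points per day = tasks/day/thread x threads x points/task."""
    return tasks_per_day_per_thread * threads * POINTS_PER_TASK

tr = ppd(26.4, 32)
xeon = ppd(17.2, 28)
print(f"TR: {tr:,.0f} PPD, E5-2683 v3: {xeon:,.0f} PPD")
print(f"TR is {tr / xeon - 1:.0%} faster; the Xeon is {1 - xeon / tr:.0%} slower")
```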

TR advertises a 180 W TDP, but Mark is OC'ing, I think, so I'm not sure. My Xeons' TDP is 120 W, but currently the four of them are running at 86 W, 88 W, 93 W and 94 W according to CPUID HWMonitor, so let's average that to 90.25 W. Since the TR is overclocked, I'll have to assume it is pulling the full 180 watts (or more) for the calculation.

And... I can't think right now how to calculate that, lol. Lunch is making me sleepy, I suppose. But 90.25 is almost half of 180, so you are getting 75% more performance for about double the energy used. Hmm. It's only fair to say, though, that TR probably wins here too, because you have to factor in the power used by 16 sticks of RAM (instead of TR's 4-8 sticks), 4 motherboards/chipsets versus one, and the power losses of 4 PSUs instead of one: basically 4x more of everything (fans, SSD/HD, etc).
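The missing energy calculation is simply points per day per watt. This sketch uses the PPD numbers and CPU-only watts from the post, with the TR assumed to draw its full 180 W TDP; platform overhead (RAM, chipsets, PSU losses, fans, drives) is not included, and that is exactly the part that could flip the result.

```python
# CPU-only efficiency; platform overhead is NOT included.
tr_ppd, tr_watts = 83_686, 180.0        # assumption: full 180 W TDP
xeon_ppd = 47_707
xeon_watts = sum([86, 88, 93, 94]) / 4  # HWMonitor readings, 90.25 W average

tr_eff = tr_ppd / tr_watts
xeon_eff = xeon_ppd / xeon_watts
print(f"TR: {tr_eff:.0f} PPD/W, E5-2683 v3: {xeon_eff:.0f} PPD/W")
print(f"Xeon advantage at the CPU level: {xeon_eff / tr_eff - 1:.0%}")
```

At the CPU level the Xeon comes out roughly 14% ahead per watt under these assumptions; substituting a measured figure, such as the ~173 W idle-to-load wall delta Mark reports further down, or adding three extra platforms' worth of overhead on the Xeon side, changes the picture.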

I'm not ready to upgrade, but when the time comes, the current choice is clear to me: TR.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
570 watts at 7% CPU to 722 W at 99% usage. I am only at 1.2 V vcore, and stock is 1.15 V, so that's 152 watts more than essentially idle. 4 sticks of RAM. 3.9 GHz sustained speed. 62°C with an Arctic Cooling 360 AIO.

Edit: and did you see me beating a 7900X OC'ed to 6 GHz on LN2 in Cinebench R15? I was only at 4.1 GHz on that run, so 2/3rds the clock speed, but with 12 more threads. Any way you figure it, I am happy.

Edit 2: totally idle (F@H turned off), 0% CPU: 135 watts, but the 1080 Tis draw something at idle. Then with JUST POGS: 308 W, so 173 watts from total idle to full load on the CPU that way.
 

StefanR5R

Elite Member
Dec 10, 2016
@TennesseeTony, your idea of looking at the durations of POGS tasks from a single credit category sounded like a good method of measurement at first. But once again it turns out that POGS performance is just silly random noise which tells little about real CPU performance.

Taking the topmost 20 tasks of the 99.06-point type on my two clients with the currently highest validated points:

2696-v4: average 6491 s/task, with 44 threads that's 586 tasks/d = 58,000 PPD
2690-v4: average 3975 s/task, with 28 threads = 609 tasks/d = 60,000 PPD
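These per-host numbers come from average task duration rather than tasks per day per thread. A small sketch of that conversion, assuming every thread runs one task at a time, back to back, with no idle gaps:

```python
SECONDS_PER_DAY = 86_400
POINTS_PER_TASK = 99.06

def tasks_per_day(avg_seconds_per_task: float, threads: int) -> float:
    """Assumes each thread runs one task at a time, back to back."""
    return SECONDS_PER_DAY / avg_seconds_per_task * threads

for name, secs, threads in [("2696-v4", 6491, 44), ("2690-v4", 3975, 28)]:
    tpd = tasks_per_day(secs, threads)
    print(f"{name}: {tpd:.0f} tasks/d = {tpd * POINTS_PER_TASK:,.0f} PPD")
```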

In saner projects, the 2696-v4 is at least 1.3 times as fast as the 2690-v4, not slower. In projects which are not memory bandwidth constrained, the performance difference between the two follows core count times all-core turbo clock (non-AVX or AVX turbo, whichever applies) quite exactly.

Both are Linux hosts, same mainboard, same RAM, same OS. The only advantage that the 2690-v4 box has is a PCIe NVMe SSD as data disk, compared to a SATA AHCI SSD on the 2696-v4. But POGS is far from being disk I/O constrained; the disk LED hardly lights up at all.

Those 2x 20 tasks were downloaded on August 30:
2696-v4 at 23:32:26 UTC and 23:33:08 UTC,
2690-v4 at 22:36:23 UTC and 22:36:38 UTC.
A possible explanation for the performance differences is that the tasks handed out at 22:36 had lower CPU demand than those from 23:32.

Regarding power use:

I wouldn't be surprised if Threadripper's power usage on POGS is below TDP too, even when somewhat overclocked. POGS doesn't pull much power on my Xeons either, which hints at unoptimized code.

Alas, my Linux hosts lack a CPU power sensor application. The power meter in front of them (which also measures a switch and other gear) shows that POGS pulls less than many other projects.
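For what it's worth, one possible substitute on Linux is the kernel's powercap interface, which exposes Intel RAPL package energy counters in sysfs. A minimal sketch, assuming the `intel-rapl:0` zone exists on the host (a dual-socket box would have `intel-rapl:1` too) and that the script may read `energy_uj` (recent kernels restrict it to root). It reports package-level power, which is comparable to, but not necessarily identical with, what Windows tools show.

```python
from pathlib import Path
import time

# Package-0 RAPL zone from the Linux powercap driver (assumed present).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")

def average_power_w(e0_uj: int, e1_uj: int, dt_s: float, max_range_uj: int) -> float:
    """Average power in W between two energy samples, handling one wraparound."""
    delta = e1_uj - e0_uj
    if delta < 0:  # the counter wrapped past max_energy_range_uj
        delta += max_range_uj
    return delta / 1e6 / dt_s  # microjoules -> joules, then per second

def sample_package_power(interval_s: float = 5.0) -> float:
    """Read the energy counter twice, interval_s apart, and return watts."""
    max_range = int((RAPL_DIR / "max_energy_range_uj").read_text())
    e0 = int((RAPL_DIR / "energy_uj").read_text())
    t0 = time.monotonic()
    time.sleep(interval_s)
    e1 = int((RAPL_DIR / "energy_uj").read_text())
    return average_power_w(e0, e1, time.monotonic() - t0, max_range)
```

Usage: call `sample_package_power()` while POGS is running on all threads and compare against an idle sample.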

A Windows host with a 2696-v4 is showing circa 125 W CPU power with POGS on all threads.
 

StefanR5R

Elite Member
Dec 10, 2016
Or maybe POGS is heavily RAM bandwidth constrained.
Mark's TR: quad channel DDR4 3600 c16
My BDWs: quad channel DDR4 2400 c17
Tony's HSWs: t.b.d.
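To put numbers on the bandwidth hypothesis, here is a sketch of theoretical peak DRAM bandwidth (channels x transfer rate x 8-byte bus) and CAS latency in nanoseconds for the configurations listed. These are paper maxima, not measured throughput. Interestingly, the per-thread share ranks the hosts in the same order as the POGS results (2690-v4 ahead of 2696-v4), which is consistent with, but does not prove, a bandwidth bottleneck.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak: channels x transfers/s x 64-bit (8-byte) bus width."""
    return channels * mt_per_s * 1e6 * bus_bytes / 1e9

def cas_latency_ns(cl: int, mt_per_s: int) -> float:
    """CAS latency in ns; DDR's I/O clock runs at half the transfer rate."""
    return cl / (mt_per_s / 2) * 1e3

hosts = [
    ("Mark's TR (32 threads)", 4, 3600, 16, 32),
    ("2696-v4 (44 threads)", 4, 2400, 17, 44),
    ("2690-v4 (28 threads)", 4, 2400, 17, 28),
]
for name, ch, mts, cl, threads in hosts:
    bw = peak_bandwidth_gbs(ch, mts)
    print(f"{name}: {bw:.1f} GB/s peak, {bw / threads:.2f} GB/s per thread, "
          f"CL{cl} = {cas_latency_ns(cl, mts):.1f} ns")
```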
 

StefanR5R

Elite Member
Dec 10, 2016
YAFU's list of CPU models has an entry with "Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz [Family 6 Model 85 Stepping 4]".
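Since BOINC host lists report the CPUID family/model in decimal, Skylake-SP/X hosts can be picked out by model number regardless of the marketing name. A minimal sketch, assuming the "[Family F Model M Stepping S]" suffix format seen in the YAFU entry; the lookup table only covers the Intel HEDT/server dies discussed in this thread (note that later steppings of model 85 are Cascade Lake, not Skylake).

```python
import re

# Intel family-6 model numbers (decimal, as BOINC shows them).
INTEL_FAMILY6_MODELS = {
    45: "Sandy Bridge-E/EP",
    62: "Ivy Bridge-E/EP",
    63: "Haswell-E/EP",
    79: "Broadwell-E/EP",
    85: "Skylake-SP/X",  # later steppings of model 85 are Cascade Lake
}

CPU_RE = re.compile(r"\[Family (\d+) Model (\d+) Stepping (\d+)\]")

def classify(cpu_string: str) -> str:
    match = CPU_RE.search(cpu_string)
    if match is None:
        return "unknown format"
    family, model, _stepping = (int(g) for g in match.groups())
    if family != 6:
        return f"not an Intel family-6 part (family {family})"
    return INTEL_FAMILY6_MODELS.get(model, f"other family-6 model {model}")

print(classify("Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz "
               "[Family 6 Model 85 Stepping 4]"))
```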

The list says there are 34 dual-processor computers with this CPU participating, but this is certainly because of multiple client instances per host. Here are the corresponding host pages. At a quick glance, none of them have any current tasks listed. In any case, YAFU would be about the worst project for gauging actual CPU performance.
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=20983
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=20984
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=20986
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=20987
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=20994
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=20995
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21001
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21004
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21007
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21010
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21011
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21012
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21017
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21020
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21022
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21026
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21035
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21037
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21039
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21040
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21041
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21044
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21045
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21048
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21050
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21051
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21056
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21060
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21062
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21069
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21070
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21077
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21079
https://yafu.myfirewall.org/yafu/show_host_detail.php?hostid=21083