Discussion EPYC builders thread


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,483
14,434
136
So now that I have 2 boxes, I thought I would start a builders thread, and also pass along what I have learned.

First, they are cheap, but they are a different animal to build (see below).

OK, so first, I got these two 7551 ES chips for $300 each. I thought, well, that is cheap for a 32-core CPU, and with 8-channel memory they won't be handicapped like my 2990WXs.

Now to the bad... First, they were only compatible with ONE motherboard that was available to buy, and only with a BIOS that I had to fight with Gigabyte to provide me (not available for download online)!
So I tried several other motherboards and different BIOSes, since I could not even find this one online. FINALLY I found the board at $470. I updated the BIOS in an odd way: boot into the motherboard's built-in shell environment (or something like that) and do a DOS-style BIOS update. Well, it worked. Then, while trying to install Linux Mint 19.2: no internet?!? Oh, I did not read the fine print. No Ethernet that goes outside the local LAN; only SFP+ (whatever that is). So after some research, I found the adapter for $13 more.

OK, fast forward to when I have an install, fully updated, with working Ethernet. This 32-core chip only runs an all-core turbo of 1.6 GHz!!! So between almost $500 on the motherboard, another $530 for 128 GB of registered ECC (the cheapest I could find at 64 GB or more, spread over 8 sticks), and a $300 CPU, I have $1,330 invested in a 1.6 GHz 32-core box... Picture:
[photo of the first build]


And before I got this working, for one of the motherboards I tried (dual socket) that would not work with these ES chips, I found two used retail 7601 chips for $1,750 and a $650 motherboard. This one needed 16 sticks, so the memory was $1,100. So now I am into this one for $3,500 with no SSD, case, or PSU. Well, I got it running, and it only boosts all-core to 2.6 GHz. But at least that's a far cry from 1.6 GHz.

Here:
[photo of the dual-7601 build]


Bottom line? The new 3960X and 3970X are not overpriced. For the same core count, they are almost twice as fast. And don't venture into EPYC chips unless you have wads of cash to blow and time to learn. I bought another motherboard for my 2nd 7551 ES chip, so now I have to build that box too!

But I will soon have three server-grade boxes with ECC memory, 128 cores, and 256 threads in total for almost $6000. I can't even get video cards to fit in these, as the memory and heatsinks are in the way! I have to use the on-board video.
 

Trotador22

Junior Member
Dec 5, 2019
22
3
81
Thanks @StefanR5R for the data, very useful. I've read somewhere that AMD does not want to state all-core boost values for Zen 2 and has rewritten the max boost frequency notes to indicate it applies to just a single core.

@Markfw, your CPU is a little mystery. Which motherboard are you using, so we can see where to look for the cTDP setting?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,483
14,434
136
Thanks @StefanR5R for the data, very useful. I've read somewhere that AMD does not want to state all-core boost values for Zen 2 and has rewritten the max boost frequency notes to indicate it applies to just a single core.

@Markfw, your CPU is a little mystery. Which motherboard are you using, so we can see where to look for the cTDP setting?
EPYCD8-2T
 

Trotador22

Junior Member
Dec 5, 2019
22
3
81
In the BIOS: Advanced -> CPU Configuration -> cTDP Control

It should be set to "Auto"; you can try 240 and check whether it has any effect. Given the clock-stretching issue, verify it through some kind of benchmark or by crunching some WCG Cancer units.
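If you want a quick look at the effective all-core clocks while it crunches, something like this works (a minimal sketch, assuming Linux and a reasonably recent kernel where the "cpu MHz" field is derived from APERF/MPERF and therefore reflects clock stretching):

```python
# Sample per-core "cpu MHz" from /proc/cpuinfo while an all-core load runs,
# and print the spread. Compare the numbers before/after changing cTDP.
import re
import statistics
import time

def sample_mhz():
    with open("/proc/cpuinfo") as f:
        return [float(m) for m in re.findall(r"^cpu MHz\s*:\s*([\d.]+)",
                                             f.read(), re.MULTILINE)]

samples = []
for _ in range(10):          # ten one-second samples
    samples.extend(sample_mhz())
    time.sleep(1)

print(f"cores sampled          : {len(samples)}")
print(f"min / median / max MHz : {min(samples):.0f} / "
      f"{statistics.median(samples):.0f} / {max(samples):.0f}")
```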
 

zrav

Junior Member
Nov 11, 2017
20
21
51
FWIW: We received our 7702P server at work and put it to work as a build machine (we do Java development). Compared to our old 7501, builds take around 25% less time to complete, for the same number of threads used. And we have twice as many of those :)
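Back-of-the-envelope math on aggregate capacity, assuming our builds keep scaling with the extra threads (an ideal case, not a measurement):

```python
# Ideal-case capacity estimate: per-build speedup at the same thread count,
# times twice the available threads.
old_threads, new_threads = 64, 128     # EPYC 7501 (32c/64t) vs 7702P (64c/128t)
per_build_speedup = 1 / 0.75           # ~25% less wall time per build
aggregate = per_build_speedup * (new_threads / old_threads)
print(f"per-build speedup : {per_build_speedup:.2f}x")
print(f"ideal aggregate   : {aggregate:.2f}x the old machine")
# -> roughly 2.7x, assuming parallel builds (or a bigger -T) can use the extra threads
```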
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
Thanks @StefanR5R for the data, very useful. I've read somewhere that AMD does not want to state all-core boost values for Zen 2 and has rewritten the max boost frequency notes to indicate it applies to just a single core.

@Markfw, your CPU is a little mystery. Which motherboard are you using, so we can see where to look for the cTDP setting?
It's more that they can't. People really, really need to get away from treating the boost behavior of Zen 2 like any other CPU's. It will run at whatever level it thinks is safe within its power envelope. That said, much like Threadripper, I wouldn't assume all-core turbo is going to be much above base clock, and in fact on EPYC I wouldn't assume it is anything but base clock. Take the 3990X reviewed on this site as an example: ~90 W for the uncore, ~3.2 W per core, pushed to 280 W. Assume Mark, for example, has a 225 W CPU, and soften the uncore to 70 W (it might still be at 90 W, but let's give it some wiggle room). That gives Mark about 2.4 W per core, or 2.1 W if the uncore is still at 90 W. That can easily account for him running that low.
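The arithmetic behind that, as a quick sketch (the 225 W package limit and the 70-90 W uncore are the assumed figures above, not measurements):

```python
# Per-core power budget on a 64-core EPYC package:
# (package power limit - uncore/IO-die power) / core count.
def per_core_budget(package_w, uncore_w, cores=64):
    return (package_w - uncore_w) / cores

for uncore_w in (70, 90):
    print(f"225 W package, {uncore_w} W uncore -> "
          f"{per_core_budget(225, uncore_w):.1f} W per core")
# ~2.4 W and ~2.1 W per core, which is why all-core clocks hover near base.
```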
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
FWIW: We received our 7702P server at work and put it to work as a build machine (we do Java development). Compared to our old 7501, builds take around 25% less time to complete, for the same number of threads used. And we have twice as many of those :)

Dang bro, :eek:
mvn clean package -T 128

We also have a large CI/CD infrastructure which relies mainly on Docker images. It should be a straightforward move to EPYC.
Our CI/CD infra took big hits from all these security mitigations, because we use standardized Linux images to deploy across all our infra. Devs are complaining big time :D
One of our workflows uses OpenEmbedded, and a full build on a single CI job (usually scheduled on a full node based on a 16-core Intel Xeon) takes about 2.5-3 hours.
Our guys are requesting quotations from our preferred vendor, and they are evaluating EPYC for our next upgrade cycle for our CI/CD infra. But it will take a while until they finalize something, probably not until the end of the year. By then hopefully we can get some EPYC Milan stuff, fingers crossed.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,483
14,434
136
Dang bro, :eek:


We also have a large CI/CD infrastructure which relies mainly on Docker images. It should be a straightforward move to EPYC.
Our CI/CD infra took big hits from all these security mitigations, because we use standardized Linux images to deploy across all our infra. Devs are complaining big time :D
One of our workflows uses OpenEmbedded, and a full build on a single CI job (usually scheduled on a full node based on a 16-core Intel Xeon) takes about 2.5-3 hours.
Our guys are requesting quotations from our preferred vendor, and they are evaluating EPYC for our next upgrade cycle for our CI/CD infra. But it will take a while until they finalize something, probably not until the end of the year. By then hopefully we can get some EPYC Milan stuff, fingers crossed.
A 16-core Xeon? Wow... You really need them to step it up; a 64-core EPYC would take that time down to about 30-40 minutes.
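Rough math behind that guess, assuming the build parallelizes well and EPYC's per-core performance is at least on par (a best case that ignores serial steps and I/O):

```python
# Naive core-count scaling estimate for the OpenEmbedded build.
xeon_cores, epyc_cores = 16, 64
for hours in (2.5, 3.0):
    minutes = hours * 60 * xeon_cores / epyc_cores
    print(f"{hours} h on {xeon_cores} cores -> ~{minutes:.0f} min on {epyc_cores} cores")
# -> roughly 37-45 minutes, before counting any per-core clock/IPC advantage
```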
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Today I plugged together:
  • Supermicro H11DSi rev 2.00
  • 2x EPYC 7452 (32 cores each, 155 W default TDP)
  • 16x 16 GB DDR4-3200c22 reg/ecc
  • a spare 240 GB SanDisk SATA SSD; I need to replace it with something larger eventually
  • 750 W platinum-rated PSU from beQuiet
  • 2x Noctua NH-U14S TR4-SP3
  • a 40mm Noctua fan laid on top of the VRM heatsink
  • everything resting on the desk on a piece of cardboard --- not in a case yet :-)
[photo of the open-air build]

Observations:
  • I forgot to put a dedicated powermeter in front of it. I think it is drawing ~400 W, currently warming up with Cosmology@home on all threads.
  • My Xeon-based crunchers are cooled by NH-D15S dual-tower coolers, and I was afraid that the NH-U14S single-tower cooler would require higher fan speeds. So far it doesn't.
  • Unlike my other Supermicro mainboards (LGA1150 and LGA2011-3 based ones), the H11DSi features Supermicro's infamous problem with slow-spinning fans: The BIOS misdetects fan failure all the time and periodically spins the fans up, then down again.
    My workaround:
    Use a triple fan adapter to plug the small and faster spinning 40 mm fan and the two slow spinning 140 mm fans into a single fan port, and show only the RPM signal of the 40 mm fan to the board.
  • I expected to have to enter ADMIN:ADMIN at the iKVM login. I was mistaken.
    Solution: BMC Unique Password Security Feature, November 2019
    (Username is still ADMIN, but the password is printed on a sticker next to CPU2.)
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,483
14,434
136
Today I plugged together:
  • Supermicro H11DSi rev 2.00
  • 2x EPYC 7452 (32 cores each, 155 W default TDP)
  • 16x 16 GB DDR4-3200c22 reg/ecc
  • a spare 240 GB SanDisk SATA SSD; I need to replace it with something larger eventually
  • 750 W platinum-rated PSU from beQuiet
  • 2x Noctua NH-U14S TR4-SP3
  • a 40mm Noctua fan laid on top of the VRM heatsink
  • everything resting on the desk on a piece of cardboard --- not in a case yet :-)

Observations:
  • I forgot to put a dedicated powermeter in front of it. I think it is drawing ~400 W, currently warming up with Cosmology@home on all threads.
  • My Xeon-based crunchers are cooled by NH-D15S dual-tower coolers, and I was afraid that the NH-U14S single-tower cooler would require higher fan speeds. So far it doesn't.
  • Unlike my other Supermicro mainboards (LGA1150 and LGA2011-3 based ones), the H11DSi features Supermicro's infamous problem with slow-spinning fans: The BIOS misdetects fan failure all the time and periodically spins the fans up, then down again.
    My workaround:
    Use a triple fan adapter to plug the small and faster spinning 40 mm fan and the two slow spinning 140 mm fans into a single fan port, and show only the RPM signal of the 40 mm fan to the board.
  • I expected to have to enter ADMIN:ADMIN at the iKVM login. I was mistaken.
    Solution: BMC Unique Password Security Feature, November 2019
    (Username is still ADMIN, but the password is printed on a sticker next to CPU2.)
Cool! I have been thinking about those; much cheaper than the 7742, and they have a higher clock.
Where did you find the 3200 ECC RAM? I could only find 2666.

Edit: I just found these. That would be $1,400 for 256 GB, not bad.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
I got Kingston KSM32RS4/16MEI. I originally planned for 16x 8 GB, but that wasn't in stock at 3200 speed. Given that some DC applications tend toward increasing memory footprints (most recent example: Rosetta), I am glad I went for 16x 16 GB.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,483
14,434
136
I got Kingston KSM32RS4/16MEI. I originally planned for 16x 8 GB, but that wasn't in stock at 3200 speed. Given that some DC applications tend toward increasing memory footprints (most recent example: Rosetta), I am glad I went for 16x 16 GB.
Can you give me a link to where you got it?
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Current core clocks under all core load are around 2.95 GHz.

For the time being, I don't have workloads which require high clocks. A good ratio of throughput to energy use is what I am primarily after, combined with a certain limit to cost of purchase.

On another note, this system is definitely not designed to sit on a table without a case. With no airflow at all, the BMC, the little Gbit Ethernet chip, and the DIMMs easily get over 60 °C warm. But any component within the light airflow of a quiet fan placed at the side cools down into the mid-40s right away.

I plan to put this into a compact ATX tower case, but need to modify the motherboard tray of the case for SSI-EEB mounts first.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Oh, and something else which may be worth noting: the OS installation went through at the first attempt without any glitch. No need to specify arcane kernel command line parameters, install 3rd-party tools, or whatever. I chose openSUSE Leap 15.1, which was released in May 2019. I did install with internet access enabled though, i.e. whichever packages on the installation medium were outdated were skipped in favor of up-to-date packages from the internet during installation. This may have factored into the painless process.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Today I plugged together:
  • Supermicro H11DSi rev 2.00
  • 2x EPYC 7452 (32 cores each, 155 W default TDP)
  • 16x 16 GB DDR4-3200c22 reg/ecc
  • a spare 240 GB SandDisk SATA SSD; need to replace it with something larger eventually
  • 750 W platinum-rated PSU of beQuiet
Here are at-the-wall power consumption figures (taken with a comparatively cheap power meter which, according to reviews, is pretty precise: a Brennenstuhl PM 231 E):
  • computer powered down, PSU in standby, feeding the still operational BMC:
    ≈4 W
  • idle KDE Plasma desktop through the BMC's VGA:
    ≈90 W
  • very light load such as booting up, checking updates and such:
    ≈135 W with occasional ≈200 W during boot
  • AVX2/FMA3 load, PrimeGrid SGS-LLR (1 MByte FFT size), 128 simultaneous single-threaded tasks:
    =312 W
That is, total system power consumption, including losses in the power supply, merely matches the sum of TDPs of the two processors (with BIOS set to "auto" TDP, which is 155 W according to AMD's specification).
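A rough cross-check, taking ~92% as an assumed efficiency for a platinum-rated PSU at this load:

```python
# Wall power vs. the two CPUs' combined power budget.
wall_w          = 312        # measured at the wall, 128x single-threaded SGS-LLR
psu_efficiency  = 0.92       # assumed for an 80 Plus Platinum unit at ~40-55% load
combined_tdp_w  = 2 * 155    # two EPYC 7452 at default TDP/PPT

print(f"estimated DC-side draw: {wall_w * psu_efficiency:.0f} W")
print(f"combined CPU TDP/PPT  : {combined_tdp_w} W")
# ~287 W for CPUs + RAM + SSD + fans, i.e. the packages were not even pinned
# at their 155 W limit in this single-threaded configuration.
```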

I now replaced the beQuiet P11-750 power supply (which is rated for 25 A on the 12 V rail which feeds the ATX connector, and 25 A on the 12 V rail which feeds both P8 CPU connectors) by a beQuiet P11-550 (rated for 20 A and 20 A on the corresponding rails).

The change from the 750 W rated PSU to the 550 W rated one of the same series didn't change any of the above figures, except for circa 1 W lower consumption at idle desktop (estimated from looking at the instantaneous power meter readings; I did not take averages over a longer period for the idle state).

Edit,
  • AVX2/FMA3 load, PrimeGrid SGS-LLR (1 MByte FFT size), 64 simultaneous dual-threaded tasks:
    =340...347 W
So this config uses the CPU's power budget fully (but accomplishes less work per Joule). Note, unlike Ryzen/Matisse, EPYC/Rome's sustained power limit is the same as the TDP.
 

Trotador22

Junior Member
Dec 5, 2019
22
3
81
During the AVX2/FMA3 load test, did you check the CPU frequency and the CPUpwm temperature sensor?

I would like to see the same results with cTDP set to 180 W in the BIOS, if you have the spare time and want to dedicate it.

Do you intend to dedicate it to distributed computing applications, BOINC, Folding...?
 

moinmoin

Diamond Member
Jun 1, 2017
4,934
7,619
136
AVX2/FMA3 load, PrimeGrid SGS-LLR (1 MByte FFT size), 128 simultaneous single-threaded tasks:
=312 W
(...)
AVX2/FMA3 load, PrimeGrid SGS-LLR (1 MByte FFT size), 64 simultaneous dual-threaded tasks:
=340...347 W
So this config uses the CPU's power budget fully (but accomplishes less work per Joule).
That's really interesting that that makes a difference in power usage.
Looking it up, it seems PrimeGrid takes much longer when multithreading, as long as the task fits in L3 cache. The difference only disappears once the L3 cache is saturated. So maybe you can toy with bigger FFT sizes? It seems there is no job that could saturate the L3 cache on Zen 2, though?
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
That's really interesting that that makes a difference in power usage.
Yes, the difference surprised me too. It's repeatable though. Very similarly, 64 single-threaded tasks pull =297 W, 32 dual-threaded tasks ≈335 W.

Looking it up, it seems PrimeGrid takes much longer when multithreading, as long as the task fits in L3 cache. The difference only disappears once the L3 cache is saturated.
Correct; multithreaded LLR decreases throughput, and throughput per watt, compared to single-threaded LLR, as long as there is enough cache.

So maybe you can toy with bigger FFT sizes? It seems there is no job that could saturate the L3 cache on Zen 2, though?
Rome is a cluster of 4c/8t/16 MB L3$ core complexes, of course. It will be interesting to see how it fares once individual tasks exceed the cache of one complex.
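Rough cache math for the current setup, taking the ~1 MB per-task footprint at face value:

```python
# Cache-residency sketch for SGS-LLR on the dual 7452 (rough numbers).
l3_per_ccx_mb  = 16   # Zen 2: 16 MB L3 per 4c/8t core complex
ccx_per_socket = 8    # 32-core 7452 -> 8 CCXs per socket
task_mb        = 1    # assumed ~1 MB FFT footprint per SGS-LLR task

tasks_per_ccx = 128 // (2 * ccx_per_socket)   # 128 single-threaded tasks, 2 sockets
print(f"working set per CCX   : {tasks_per_ccx * task_mb} MB of {l3_per_ccx_mb} MB L3")
print(f"total L3, both sockets: {2 * ccx_per_socket * l3_per_ccx_mb} MB")
# 8 tasks x ~1 MB per CCX stays comfortably cache-resident; a single task
# would only spill once its FFT grows toward the 16 MB one CCX offers.
```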

A highly threaded application which makes good use of AVX2 vector hardware is Folding@home's FahCore_a7. This application is based on GROMACS. It has already been seen to show mediocre to poor performance on Naples and Rome compared with dual-processor Broadwell-EP. Spanning a single FahCore_a7 process across all hardware threads of both sockets works extremely well with this application on BDW-EP. I am in the process of trying different FahCore_a7 runs on Rome, but first impressions confirm what I already heard from Mark: it does not run as well on EPYC as it does on Intel-based servers.

(Edit: Folding@home performance is impossible to measure currently, due to extreme variations of TPF and PPD between work units.)

(Edit 2: Folding@home performance during the last several hours appeared comparable on dual-Rome and dual-Broadwell-EP, and more energy-efficient on dual-Rome; but it is not feasible to really measure this for now.)

(Edit 3: I am still not up to the difficult task of properly measuring FahCore_a7 performance. But from what I have seen so far, per-core, per-clock performance of Rome should easily match, and more probably exceed, BDW-EP's in this one, and scaling over all threads and both sockets works just as well as on my dual BDW-EP computers. Meaning, total FahCore_a7 performance on the dual EPYC is considerably higher because of its higher core count combined with very good scaling of this application, all while the dual 32c EPYC draws only about as much power as a dual 14c BDW-EP.)

During the AVX2/FMA3 load test, did you check the CPU frequency and the CPUpwm temperature sensor?
Alas not. With 64 single-threaded SGS-LLR jobs, the core clocks were around 2.4 GHz. I did not check temperatures.

At the moment, I have a single 128-threaded FahCore_a7 process running (avx_256 enabled, load average is indeed ≈128). Core clocks, from a single observation, are 2.47...2.82 GHz with a median of 2.49 GHz. Temperatures are
  • 25 °C ambient at the air intake,
  • 52 °C and 49 °C at the CPUs (no doubt because I applied thermal paste somewhat inconsistently),
  • 50...69 °C at VRMs.
[screenshot of sensor readings]

The BMC is showing 100 °C as critical VRM temperature threshold, and 105 °C as nonrecoverable VRM temperature threshold.
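For the CPU-side readings, the k10temp hwmon driver can also be queried from the OS (the VRM numbers above come from the BMC); a minimal sketch, assuming Linux with k10temp loaded:

```python
# Dump hwmon temperature sensors; k10temp exposes Tctl/Tdie per socket on Rome.
from pathlib import Path

for hwmon in sorted(Path("/sys/class/hwmon").glob("hwmon*")):
    name = (hwmon / "name").read_text().strip()
    for temp in sorted(hwmon.glob("temp*_input")):
        label_file = temp.with_name(temp.name.replace("_input", "_label"))
        label = label_file.read_text().strip() if label_file.exists() else temp.name
        print(f"{name:10s} {label:10s} {int(temp.read_text()) / 1000:.1f} °C")
```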

I would like to see the same results with cTDP set to 180 W in the BIOS, if you have the spare time and want to dedicate it.
In May, TeAm AnandTech will be taking part in the BOINC Pentathlon. This will probably be the moment for me to try a higher cTDP. ;-)

Do you intend to dedicate it to distributed computing applications, BOINC, Folding...?
Not exclusively, but so far I presume that this computer will spend more time at the Distributed Computing hobby than at work.
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Here are performance data of a bioinformatics application for Gene Regulatory Networks (GRN) expansion: "gene@home PC-IM 1.10 x86_64-pc-linux-gnu", currently hosted at TN-Grid's BOINC project:

host hardware             | core clock | per-thread performance | per-host performance | system power draw | power efficiency
dual 14c/28t Broadwell-EP | 3.2 GHz    | 4.88 GFLOPS            | 273 GFLOPS           | ≈350 W            | ≈0.78 GFLOPS/W
dual 22c/44t Broadwell-EP | 2.8 GHz    | 4.28 GFLOPS            | 377 GFLOPS           | ≈390 W            | ≈0.97 GFLOPS/W
dual 32c/64t Rome*        | ≈2.74 GHz  | 4.80 GFLOPS            | 614 GFLOPS           | ≈315 W            | ≈1.95 GFLOPS/W
dual 32c/64t Rome**       | ≈2.94 GHz  | 5.19 GFLOPS            | 664 GFLOPS           | ≈375 W            | ≈1.77 GFLOPS/W

*) 2x 7452 at default TDP and PPT of 155 W per socket
**) 2x 7452 at cTDP = PPT = 180 W per socket

GFLOPS are specific for this application and measured by the BOINC platform for each host, as a running average. Power draw is measured "at the wall". All three systems are compute hosts without any bells and whistles (CPUs + RAM + SSD + platinum PSU); the BDW-EPs are similar to the one in #58 in this regard.
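As a cross-check, the per-host and efficiency columns follow from the others (per-thread GFLOPS times thread count, divided by wall power):

```python
# Recompute the table's derived columns from per-thread GFLOPS, thread count,
# and measured wall power.
hosts = [
    ("dual 14c/28t BDW-EP", 4.88,  56, 350),
    ("dual 22c/44t BDW-EP", 4.28,  88, 390),
    ("dual 32c/64t Rome*",  4.80, 128, 315),
    ("dual 32c/64t Rome**", 5.19, 128, 375),
]
for name, per_thread, threads, watts in hosts:
    per_host = per_thread * threads
    print(f"{name}: {per_host:.0f} GFLOPS, {per_host / watts:.2f} GFLOPS/W")
```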
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
Some useful info:

This guide has two parts: first, a reference manual of many BIOS settings related to the CPUs, northbridge, and memory controllers, and second, a collection of tables of recommended settings for several use cases.
  • BIOS Options and Their Benefits
    • Infinity Fabric settings
    • NUMA and memory settings
    • power efficiency settings (and power limits)
    • processor core settings
    • I/O settings
  • BIOS Setting Selection by Workload
    • general-purpose workloads
    • virtualization and containers
    • database and analytics
    • HPC and Telco settings
 

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
So the IO die uses 90 watts? Wow, I never realized. Zen 3 is going to be something of a revolution for EPYC if they end up shrinking and optimizing the IO die. I imagine we'll see 4 GHz EPYC chips at that point.
 

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
Here are performance data of a bioinformatics application for Gene Regulatory Networks (GRN) expansion: "gene@home PC-IM 1.10 x86_64-pc-linux-gnu", currently hosted at TN-Grid's BOINC project:

host hardware             | core clock | per-thread performance | per-host performance | system power draw | power efficiency
dual 14c/28t Broadwell-EP | 3.2 GHz    | 4.88 GFLOPS            | 273 GFLOPS           | ≈350 W            | ≈0.78 GFLOPS/W
dual 22c/44t Broadwell-EP | 2.8 GHz    | 4.28 GFLOPS            | 377 GFLOPS           | ≈390 W            | ≈0.97 GFLOPS/W
dual 32c/64t Rome         | ≈2.74 GHz  | 4.80 GFLOPS            | 614 GFLOPS           | ≈315 W            | ≈1.95 GFLOPS/W

GFLOPS are specific for this application and measured by the BOINC platform for each host, as a running average. Power draw is measured "at the wall". All three systems are compute hosts without any bells and whistles (CPUs + RAM + SSD + platinum PSU); the BDW-EPs are similar to the one in #58 in this regard.

Absolutely solid performance. While people may try to claim Ryzen doesn't have good single-threaded performance (which I disagree with; Ryzen dominates most single-threaded tasks, but that is the subject of another thread), it has unbeatable same-system performance. You can put up to 128 cores/256 threads in a 2U, or possibly even a 1U, system. You cannot do that with Intel chips at all. I also have yet to see an ARM system with 128 cores/256 threads in a small-ish form factor (though I'd love to see one). I wish we had these amazing processors back when I worked in the field.