Discussion EPYC builders thread

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,483
14,434
136
So now that I have 2 boxes, I thought I would start a builders thread, and also pass along what I have learned.

First, they are cheap, and they are a different animal to build (see below).

OK, so first, I got these 2 7551 ES chips for $300 each. I thought, well, that is cheap for a 32-core CPU, and with 8-channel memory they won't be handicapped like my 2990WXs.

Now to the bad... First, they were only compatible with ONE motherboard that was available to buy, and with a BIOS that I had to fight with Gigabyte to provide me with! (Not available online.)
So I tried several other motherboards and different BIOSes, since I could not even find this one online. FINALLY I found the board at $470. I updated the BIOS the odd way: boot into the motherboard's default stream environment (or something like that) and do a DOS BIOS update. Well, it worked. Then while trying to install Linux Mint 19.2, no internet???? Oh, I did not read the fine print. No ethernet that goes outside the local LAN. Only SFP+ (whatever that is), so after some research, I found the adapter for $13 more.

OK, fast forward to when I now have an install, updated, with ethernet. This 32-core chip only runs all-core turbo at 1.6 GHz!!! So I spent almost $500 on the motherboard, another $530 for 128 gig of ECC registered (the cheapest I could find @ 64 gig on 8 sticks or more), and $300 on the CPU. I have $1330 invested in a 1600 MHz 32-core box... Picture:
JXvDun7.jpg


And before I got this working, for one of the motherboards I tried (dual socket) that would not work with these ES chips, I found 2 used retail 7601 chips for $1750, on top of the $650 motherboard. Now this one needed 16 sticks, so that was $1100. So now I am in this one for $3500 with no SSD, case or PSU. Well, I got it running, and it only boosts all-core to 2.6 GHz. But at least that's a far cry from 1600 MHz.

Here:
5Gr0BgB.jpg


Bottom line ? The new 3960x and 3970x are not overpriced. At the same speed they are almost twice as fast. And don't venture into EPYC chips unless you have wads of cash to blow and learn. I bought another motherboard for my 2nd 7551 ES chip, so now I have to build that box too !

But I will soon have 3 server grade ECC memory chip boxes with 128 cores and 256 threads for almost $6000. I can't even get video cards to fit in these, as the memory and heatsinks are in the way ! I have to use the on-board video.
 

sdifox

No Lifer
Sep 30, 2005
94,686
14,935
126
I have looked! I have only been deaf a year, so I have not worked out all the details. If you know of one, a link is appreciated!

Edit: looked and found one for $15! Coming Wed. But that's the day I need it.


How are you going to hear the delivery of the new doorbell :colbert:

Joke aside, ESP32 + power source + light and motor would make a cheap DIY WiFi doorbell. With an ESP32-CAM you can have a video doorbell.
 

Markfw

How are you going to hear the delivery of the new doorbell :colbert:

Joke aside, ESP32 + power source + light and motor would make a cheap DIY WiFi doorbell. With an ESP32-CAM you can have a video doorbell.
I got one at Home Depot for $45. I also have the one that was delivered (no doorbell used/required).
 

Markfw

Update... My Rome 7742 has a bad memory channel. 2 different motherboards will not see all 8 channels! So I will only have 112 gig of RAM.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
Update... My Rome 7742 has a bad memory channel. 2 different motherboards will not see all 8 channels! So I will only have 112 gig of RAM.
Well, that explains the deal. I doubt there is a chance as a non-OEM to get AMD to replace it? If not, are you going to keep it? Seems like getting a Ferrari with a bad stereo. Sucks, but you still have that engine.
 

Markfw

Well, that explains the deal. I doubt there is a chance as a non-OEM to get AMD to replace it? If not, are you going to keep it? Seems like getting a Ferrari with a bad stereo. Sucks, but you still have that engine.
Yup. For $1000, I got what I paid for: a defective ES chip that is close to retail on GHz and has 7 of 8 memory channels. One day I may get a retail 7742, but that's a LOT of $$$. Even on eBay they are going for $4000 or more...
 

Markfw

Having a discussion on speed, need Linux help here. I think all CPUs are running at 2 GHz. First, lscpu output:
lscpu says 2 GHz. I am unaware of any way to configure cTDP in the BIOS.
mark@EPYC-7742:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD Eng Sample: 100-000000053-04_32/20_N
Stepping: 0
CPU MHz: 2987.928
CPU max MHz: 2000.0000
CPU min MHz: 1500.0000
BogoMIPS: 3992.62
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-127
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
mark@EPYC-7742:~$

Then this command:

mark@EPYC-7742:~$ sudo cat /proc/cpuinfo | grep MHz
[sudo] password for mark:
cpu MHz : 2975.585
cpu MHz : 2975.728
cpu MHz : 2975.755
cpu MHz : 2975.734
cpu MHz : 2975.668
cpu MHz : 2975.556
cpu MHz : 2975.749
cpu MHz : 2975.633
cpu MHz : 2976.267
cpu MHz : 2976.572
cpu MHz : 2976.540
cpu MHz : 2976.595
cpu MHz : 2974.706
cpu MHz : 2974.745
cpu MHz : 2974.632
cpu MHz : 2974.981
cpu MHz : 2972.941
cpu MHz : 2973.021
cpu MHz : 2973.010
cpu MHz : 2973.038
cpu MHz : 2974.448
cpu MHz : 2522.847
cpu MHz : 2974.108
cpu MHz : 2974.311
cpu MHz : 2974.190
cpu MHz : 2974.256
cpu MHz : 2973.971
cpu MHz : 2974.401
cpu MHz : 2975.334
cpu MHz : 2975.256
cpu MHz : 2975.044
cpu MHz : 2975.284
cpu MHz : 2973.354
cpu MHz : 2972.899
cpu MHz : 2973.324
cpu MHz : 2973.434
cpu MHz : 2973.677
cpu MHz : 2973.914
cpu MHz : 2973.619
cpu MHz : 2973.878
cpu MHz : 2977.777
cpu MHz : 2978.290
cpu MHz : 2978.325
cpu MHz : 2978.359
cpu MHz : 2979.636
cpu MHz : 2979.764
cpu MHz : 2979.691
cpu MHz : 2979.865
cpu MHz : 2976.801
cpu MHz : 2976.856
cpu MHz : 2976.786
cpu MHz : 2976.871
cpu MHz : 2973.061
cpu MHz : 2972.881
cpu MHz : 2973.142
cpu MHz : 2973.004
cpu MHz : 2973.563
cpu MHz : 2973.734
cpu MHz : 2973.614
cpu MHz : 2973.720
cpu MHz : 2976.048
cpu MHz : 2976.278
cpu MHz : 2976.237
cpu MHz : 2976.137
cpu MHz : 2977.159
cpu MHz : 2977.359
cpu MHz : 2977.376
cpu MHz : 2977.381
cpu MHz : 2977.294
cpu MHz : 2977.190
cpu MHz : 2977.398
cpu MHz : 2977.306
cpu MHz : 2977.322
cpu MHz : 2977.645
cpu MHz : 2977.642
cpu MHz : 2977.718
cpu MHz : 2976.682
cpu MHz : 2976.574
cpu MHz : 2976.429
cpu MHz : 2976.722
cpu MHz : 2974.487
cpu MHz : 2974.557
cpu MHz : 2974.512
cpu MHz : 2974.570
cpu MHz : 2975.702
cpu MHz : 2641.614
cpu MHz : 2975.387
cpu MHz : 2975.604
cpu MHz : 2975.413
cpu MHz : 2975.506
cpu MHz : 2975.217
cpu MHz : 2975.598
cpu MHz : 2977.073
cpu MHz : 2977.002
cpu MHz : 2976.806
cpu MHz : 2977.029
cpu MHz : 2974.520
cpu MHz : 2974.053
cpu MHz : 2974.485
cpu MHz : 2974.371
cpu MHz : 2974.812
cpu MHz : 2975.128
cpu MHz : 2974.811
cpu MHz : 2975.072
cpu MHz : 2978.496
cpu MHz : 2979.049
cpu MHz : 2979.070
cpu MHz : 2979.074
cpu MHz : 2980.842
cpu MHz : 2981.002
cpu MHz : 2980.909
cpu MHz : 2981.076
cpu MHz : 2977.750
cpu MHz : 2977.821
cpu MHz : 2977.742
cpu MHz : 2977.829
cpu MHz : 2974.328
cpu MHz : 2974.126
cpu MHz : 2974.375
cpu MHz : 2974.242
cpu MHz : 2974.285
cpu MHz : 2974.434
cpu MHz : 2974.293
cpu MHz : 2974.419
cpu MHz : 2977.065
cpu MHz : 2977.241
cpu MHz : 2977.235
cpu MHz : 2977.139
mark@EPYC-7742:~$ ^C
mark@EPYC-7742:~$

So for the Linux gurus: is it running 2 GHz all-core @ 100% load?
 

StefanR5R

Elite Member
Dec 10, 2016
5,459
7,718
136
On my Xeon E5v4 computers, /proc/cpuinfo contains only the base clock, not the actual clock. I read actual clocks from the sysfs interface of the cpufreq driver.
Example:
Bash:
cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq | sort | sed -e 1b -e '$!d'
to show min and max of the current clocks.

However, 1.) I don't know whether the kernel uses the cpufreq driver with EPYCs too, 2.) your /proc/cpuinfo output has all these different values instead of precisely the same for all CPUs, which makes me think that your output shows actual clocks. If so, /proc/cpuinfo was showing 2.97...2.98 GHz on 126 of the 128 CPUs, and 2.52 and 2.64 GHz on the remaining two CPUs at that moment.
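Regarding point 1, whether a cpufreq driver is bound at all can be read from the same sysfs tree. A minimal sketch, assuming the standard cpufreq policy directory layout (scaling_driver, scaling_cur_freq, cpuinfo_max_freq); the function name cpufreq_summary is made up for illustration, and it takes the policy directory as an argument so any policy can be inspected:

```bash
# Summarise one cpufreq policy directory: driver name plus current and
# maximum frequency. sysfs reports frequencies in kHz; print them in MHz.
# cpufreq_summary is a hypothetical helper name, not a standard tool.
cpufreq_summary() {
    dir=${1:-/sys/devices/system/cpu/cpufreq/policy0}
    if [ ! -r "$dir/scaling_driver" ]; then
        echo "no cpufreq driver bound at $dir"
        return 1
    fi
    driver=$(cat "$dir/scaling_driver")
    cur_khz=$(cat "$dir/scaling_cur_freq")
    max_khz=$(cat "$dir/cpuinfo_max_freq")
    echo "driver=$driver cur=$((cur_khz / 1000))MHz max=$((max_khz / 1000))MHz"
}
```

Calling cpufreq_summary with no argument looks at policy0. If the directory does not exist, the kernel is not exposing cpufreq for that CPU and the scaling_cur_freq approach will not work there.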

PS, for your convenience:
Reducing the 128 lines of proc output to just the min and max:
Bash:
grep MHz /proc/cpuinfo | sort | sed -e 1b -e '$!d'
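Plain sort compares text, which is fine as long as all values have the same number of digits. A variant sketched with awk compares numerically and also prints how many CPUs were seen. The function name mhz_min_max is made up for this example; it reads /proc/cpuinfo-style text on standard input:

```bash
# Print the lowest and highest "cpu MHz" value found on stdin,
# plus the number of CPUs seen. mhz_min_max is a hypothetical helper.
mhz_min_max() {
    awk -F: '/^cpu MHz/ {
        v = $2 + 0                          # numeric value after the colon
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        n++
    }
    END { if (n) printf "min %.3f MHz, max %.3f MHz (%d CPUs)\n", min, max, n }'
}
```

Usage would be: mhz_min_max < /proc/cpuinfo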
 

Markfw

On my Xeon E5v4 computers, /proc/cpuinfo contains only the base clock, not the actual clock. I read actual clocks from the sysfs interface of the cpufreq driver.
Example:
Bash:
cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq | sort | sed -e 1b -e '$!d'
to show min and max of the current clocks.

However, 1.) I don't know whether the kernel uses the cpufreq driver with EPYCs too, 2.) your /proc/cpuinfo output has all these different values instead of precisely the same for all CPUs, which makes me think that your output shows actual clocks. If so, /proc/cpuinfo was showing 2.97...2.98 GHz on 126 of the 128 CPUs, and 2.52 and 2.64 GHz on the remaining two CPUs at that moment.

PS, for your convenience:
Reducing the 128 lines of proc output to just the min and max:
Bash:
grep MHz /proc/cpuinfo | sort | sed -e 1b -e '$!d'
I tried both of those, and it came back with essentially 3 GHz, but under Windows, I could swear that it never got above 2 GHz when all cores were loaded.

It looks like several tasks running Mapping Cancer Markers are taking 4 hours 50 minutes. Let me check my other boxes and see how that fares.

On my 4 GHz 3900X, it's taking 2 hours 40 minutes. So it probably is running 2 GHz. Half the 7742 time is 2:25, and that could be faster per GHz due to 8-channel memory.
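The run times above can be turned into a rough clock estimate. This is a back-of-the-envelope sketch that assumes the same kind of work unit, the same per-clock throughput on both chips, and run time inversely proportional to clock; none of that is guaranteed, and est_clock_ghz is a hypothetical helper:

```bash
# Estimate the clock of a test box from a reference box, ASSUMING run time
# scales inversely with clock speed (same architecture, same workload).
# Arguments: reference clock in GHz, reference run time, test run time (H:MM).
est_clock_ghz() {
    awk -v ghz="$1" -v ref="$2" -v tst="$3" 'BEGIN {
        split(ref, r, ":"); split(tst, t, ":")
        ref_min = r[1] * 60 + r[2]          # reference run time in minutes
        tst_min = t[1] * 60 + t[2]          # test run time in minutes
        printf "%.2f\n", ghz * ref_min / tst_min
    }'
}

# 3900X at 4 GHz takes 2:40, the 7742 ES takes 4:50
est_clock_ghz 4 2:40 4:50   # prints 2.21
```

Under those assumptions the estimate lands near 2.2 GHz, which is much closer to the 2 GHz seen under Windows than to the ~3 GHz shown in /proc/cpuinfo.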
 

Markfw

Looks like 3 GHz on all cores to me. Pretty good. Is this the same workload you were running under Windows?

Still only 7 channels, right?
Yup, 7 channels. But based on the WCG run times, I am pretty sure it's really only 2 GHz.
 

Trotador22

Junior Member
Dec 5, 2019
22
3
81
Is there any other explanation for this apparent contradiction? Could the WCG crunching times be due to some other reason? It seems unlikely that the Linux commands are giving a wrong value... or no?
 

StefanR5R

@Markfw, if you check the proc values twice, ~half a minute apart, do the values change, even just slightly?
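One way to do that comparison without eyeballing 128 lines, as a rough sketch: save two snapshots and count how many CPUs report a different value. The helper name count_changed_mhz and the /tmp file names in the usage comment are made up for illustration:

```bash
# Count how many CPUs report a different "cpu MHz" value between two
# /proc/cpuinfo-style snapshot files. count_changed_mhz is a hypothetical
# helper. It relies on both files listing the CPUs in the same order.
#
# Possible usage:
#   cat /proc/cpuinfo > /tmp/mhz1; sleep 30; cat /proc/cpuinfo > /tmp/mhz2
#   count_changed_mhz /tmp/mhz1 /tmp/mhz2
count_changed_mhz() {
    awk -F: '/^cpu MHz/ {
        if (FNR == NR) { first[++a] = $2 + 0 }         # lines of the first file
        else if (first[++b] != $2 + 0) { changed++ }   # same position, second file
    }
    END { printf "%d of %d CPUs changed\n", changed, a }' "$1" "$2"
}
```

If nothing changes between snapshots, the numbers are probably a fixed nominal value rather than a live reading.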
But based on the WCG run times, I am pretty sure it's really only 2 GHz.
Is there any other explanation for this apparent contradiction? Could the WCG crunching times be due to some other reason? It seems unlikely that the Linux commands are giving a wrong value... or no?
Perhaps there is so-called clock stretching in effect.
 

StefanR5R

Generally, it is a device stretching out its operating periods beyond the periods it should be observing according to an external clock.

Specifically, Matisse (Ryzen 3000) has been found to employ clock stretching when undervolted, to varying extent. Whether or not clock stretching is in effect on Matisse cannot be seen by monitoring the core clocks, but by monitoring certain performance counters. Indirectly, the ratio of application performance to core clocks may be an indicator for whether or not clock stretching is happening (if application performance is consistent and known at stock settings).

EPYC SKUs have a lower power budget per core than Ryzen models. So far I assumed that they achieve it by their firmware applying a voltage-frequency curve which reaches further into the low-voltage / low-clock range. (According to The Stilt, Matisse's stock V/F curve is level below 3.4 GHz.) Whether or not clock stretching happens on EPYCs regularly, on top of clock control according to the V/F curve, or at least on Mark's particular sample, is unknown to me.

Edit,
mark@EPYC-7742:~$ lscpu
[...]
Model name: AMD Eng Sample: 100-000000053-04_32/20_N
Stepping: 0
CPU MHz: 2987.928
CPU max MHz: 2000.0000
CPU min MHz: 1500.0000
Given this it is reasonable to assume that the processor will not clock higher than 2000 MHz.
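The mismatch between the two fields can be checked mechanically. A small sketch, assuming lscpu prints the "CPU MHz" and "CPU max MHz" field names exactly as in the output quoted above (other lscpu versions may label them differently); check_lscpu_clock is a made-up helper, not part of util-linux:

```bash
# Read lscpu-style text on stdin and compare the reported current clock
# ("CPU MHz") with the advertised ceiling ("CPU max MHz").
# check_lscpu_clock is a hypothetical helper name.
check_lscpu_clock() {
    awk -F: '
        $1 == "CPU MHz"     { cur = $2 + 0 }
        $1 == "CPU max MHz" { max = $2 + 0 }
        END {
            if (max == 0)       print "CPU MHz / CPU max MHz fields not found"
            else if (cur > max) printf "inconsistent: current %.0f MHz > max %.0f MHz\n", cur, max
            else                printf "ok: current %.0f MHz <= max %.0f MHz\n", cur, max
        }'
}
```

Usage would be: lscpu | check_lscpu_clock. A current value above the stated maximum is a hint that the displayed clock should not be taken at face value.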
 

Trotador22

Thanks for the explanation!

It has to be specific to this sample; as an ES, it probably does not have properly tuned firmware. Odd in any case, as it seems a close-to-production sample. Surely not the case for retail processors.

Could it also be partially related to the cTDP value in the BIOS, which could be limiting the thermal stress and thus the actual frequency?
 

Markfw

Thanks for the explanation!

It has to be specific to this sample; as an ES, it probably does not have properly tuned firmware. Odd in any case, as it seems a close-to-production sample. Surely not the case for retail processors.

Could it also be partially related to the cTDP value in the BIOS, which could be limiting the thermal stress and thus the actual frequency?
The all-core max speed for a retail 7742 is 2.25 GHz. Mine is doing 2 GHz, so I assumed it was close to retail in its stepping.
 

Trotador22

My understanding is that 2.25 GHz is the base clock, roughly the minimum clock at which it will perform any operation. On top of that, there will be a frequency boost based on thermal stress and cTDP (simplified). The 7742's maximum cTDP is 240 watts; the 3990X is 280 watts, not that much higher. So in my view the 7742 can surely reach a much higher all-core boost frequency.
 

Markfw

My understanding is that 2.25 GHz is the base clock, roughly the minimum clock at which it will perform any operation. On top of that, there will be a frequency boost based on thermal stress and cTDP (simplified). The 7742's maximum cTDP is 240 watts; the 3990X is 280 watts, not that much higher. So in my view the 7742 can surely reach a much higher all-core boost frequency.
When I was testing in Windows, before Linux, it was taking 275 watts from the wall with all cores loaded using WCG, speed was 2 GHz, and temps were 70C. And the 7742 is 225 W TDP. So 50 watts for loss in the PSU + motherboard, memory and NVMe drive. Pretty sure it was at 225 watts for the CPU. Linux should be no different in any of those areas.
 

Soulkeeper

Diamond Member
Nov 23, 2001
6,712
142
106
I was noticing some PCIe adapters that add external PCIe cable support.
You can link two systems at PCIe x16... insane stuff.
Installing a video card shouldn't be too hard with an adapter; they make all kinds.

You're well on your way to building a supercomputer cluster :)
 

StefanR5R

@Trotador22, correct, 2.25 GHz is the base clock of retail 7742.

I don't think there is any all-core turbo specification, is there? AMD's product page doesn't give one, which is only natural if you consider how boost works in Zen2 (and in Zen, for that matter); there is just a max boost clock spec. However, there is a site which claims an all-core turbo of 2.8 GHz.

Here is ServeTheHome's current list of EPYC Rome SKUs, along with default TDP, configurable TDP range, base clock and max boost clock.
AMD-EPYC-7002-Series-SKU-List-Comparison-Feb-2020-Edition.jpg


And here is a previous version of the list, missing the more recently added 7H12, 7762, and 7532, but showing core count × clock / price. Should be helpful to all you EPYC builders out there. :-)
AMD-EPYC-7002-SKU-List-and-Value-Comparison-Full.jpg


Special models:
  • 7H12 was reported to be available to special HPC customers only.
  • The 7xxxP models are limited to single-socket machines, but otherwise identical in features with the non-P models.
  • 7282, 7272, 7252, and 7232P are specified for only 85.3 GB/s memory bandwidth per socket, compared to 204.8 GB/s of all other models. They have all 8 RAM channels enabled, but don't carry as many CCDs as needed to saturate the memory controllers.
  • 7262 is a large-cache SKU with just one core enabled per core complex, geared towards software with per-core licensing.
    7532 is a large-cache SKU with 2 cores enabled per CCX.
  • 7662 is a new "entry-level" 64-core SKU for 2P systems, at a 1,000-unit list price of $6,150 (compared to $6,450 for the 7702 and $4,425 for the 7702P).
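The two bandwidth figures in the list follow from the usual peak-bandwidth arithmetic: channels times transfer rate times 8 bytes per transfer. A small sketch; peak_bw_gbs is a made-up helper, and reading 85.3 GB/s as four channels' worth of DDR4-2667 is an assumption that happens to reproduce the number, not something stated in the list:

```bash
# Theoretical peak DRAM bandwidth in GB/s: channels x MT/s x 8 bytes
# (each DDR4 channel is 64 bits wide). peak_bw_gbs is a hypothetical helper.
peak_bw_gbs() {
    awk -v ch="$1" -v mts="$2" 'BEGIN { printf "%.1f\n", ch * mts * 8 / 1000 }'
}

peak_bw_gbs 8 3200   # 8-channel DDR4-3200, prints 204.8
peak_bw_gbs 4 2667   # ASSUMED reading of the 85.3 GB/s SKUs, prints 85.3
peak_bw_gbs 7 3200   # a part with one dead channel, prints 179.2
```

The last line shows what losing one of eight channels costs in theory, which is relevant to the 7-channel ES chip discussed earlier in the thread.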
 