Just got a 9654 Genoa ** Motherboard /ram/heatsink/CPU are here !!


Skillz

Senior member
Feb 14, 2014
905
918
136
If the new Zen 4 Threadripper uses the same socket, named differently, but physically the same size then we'll definitely see a cooler from them that will also support SP5.

It's already speculated that the Threadripper 7000 series (Zen 4) will use a new socket, most likely named TR5. If it matches SP5 in size, just like sTRX4 matched SP3, the coolers would again be interchangeable.
 

vityan666

Member
Apr 12, 2023
42
31
51
While I would like to hope that they have a secret internal project for a new SP5 cooler which they can't talk about, I'm just being realistic here...
When have companies released PC coolers for server sockets? Almost never. The only recent times we saw it happen was because the workstation socket was the same size (so they made TR4 coolers and added ", SP3" to the description, genius).

Thus, it seems our only hope is the new TRs from AMD, if they release them in September. Aaaand, let's hope they make them, or at least the TR Pro, for a socket based on SP5 and not the smaller SP6, otherwise we'll be left with the strange options we have now, or EKWB/Alpha custom loops.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,439
14,409
136
The difference, the way I see it: SP3 had the CPU mount pressure defined by the torque on the 3 socket screws. SP5 uses the 6 screws on the heatsink to create the pressure that mounts all 6096 pins, so that's a critical piece, and dependent on the HSF, unlike before.
 
  • Like
Reactions: H T C

vityan666

Member
Apr 12, 2023
42
31
51
Maybe they decided to get rid of the extra screws since the integrator/builder has to mount the heatsink anyway, and its pressure in the end is what always defined the cooling efficiency?
Dunno. The locking mechanism looks very similar, except for one main screw instead of 3.
The memory of the Pentium II Slot 1 CPU just popped into my head. Dunno why. Maybe because it was a CPU we could install in 2 seconds, and it came with all the cooling it needed... :)


Meanwhile I've just ordered a PCI-E height extender; it can be used instead of a riser cable to put the VGA card in the bottom slot with breathing space for the various mobo connectors below.
Hoping it will sustain the signal integrity of PCI-E 4.0. One HDMI port will become inaccessible, and 2 DPs + 1 HDMI will remain accessible, which is more than enough. And I will need to find a spacer for the PCI-E card's tower locking bolt to change the locking height...

 

jonperron09

Junior Member
Apr 14, 2023
6
10
36
@Markfw I'm building a setup with the same motherboard and came upon this thread while also looking at cooling solutions. I was about to buy the same thing as you when I saw @vityan666's post and chose the one from Alibaba. Thanks for this btw :)

Regarding the POST problem, you got me worried, so I decided to reply and follow this thread while waiting for the rest of my parts. Hope you fix it soon and that it won't happen to me.

I will post my build in your new thread.
 

StefanR5R

Elite Member
Dec 10, 2016
5,441
7,681
136
As far as I have seen, all of the available SP5 coolers are mounted with spring bolts. I.e., it's basically the stiffness of the springs and the length of the bolts which define the cooler's contact pressure. So I am idly wondering how hard it might be to build one's own SP5 mounting kit for an off-the-shelf tower cooler, using a) springs similar to — or, to be sure, taken from — an SP5 cooler, b) a self-made metal plate which is put on top of the cooler's cold plate, c) bolts of the necessary length, which depends on how high the mounting plate sits over the socket compared to the respective measure on an original SP5 cooler.
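Back-of-the-envelope, just to illustrate what such a DIY kit has to get right (all numbers below are made-up placeholders, not SP5 specifications; the real target load would have to come from AMD's socket documentation or be copied from an original cooler):

```python
# Rough estimate of total clamping force from spring-loaded mounting bolts.
# Every value here is an illustrative placeholder, NOT an SP5 specification.

def clamping_force_newton(n_bolts: int, spring_rate_n_per_mm: float,
                          compression_mm: float) -> float:
    """Hooke's law per spring, summed over all bolts."""
    return n_bolts * spring_rate_n_per_mm * compression_mm

# Example: 6 bolts, 40 N/mm springs, each compressed 3 mm when torqued down.
force = clamping_force_newton(n_bolts=6, spring_rate_n_per_mm=40, compression_mm=3)
print(f"total clamping force: {force:.0f} N (~{force / 9.81:.0f} kgf)")
```

The point being: get the spring rate or the bolt length wrong and the contact pressure is off proportionally, which is exactly the critical piece Markfw mentioned above.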

Noctua commented on the video from post #39 that the NH-U14S DX-4677 is good for 700 W sustained at a temperature delta of 80 K between intake air and CPU die. (And, I guess, at about maximum fan speed.) That looks promising, although a larger radiator would be nice to have. Furthermore, their NH-U14S DX-4677 product page mentions a 70 mm × 56 mm contact area between the heat spreader and their cold plate. I am of course in the dark about how the contact area should be sized for the various Genoa SKUs.
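To put that 700 W / 80 K figure into perspective, here is the implied thermal resistance and what it would mean at Genoa-class package powers (a simple steady-state estimate; it ignores fan speed, contact area and the gradient inside the package):

```python
# Implied cooler performance from Noctua's "700 W sustained at 80 K delta" statement.
# Simple steady-state estimate: delta_T = power * R_th, nothing more.

r_th = 80 / 700                      # ~0.114 K/W, intake air to CPU die

for power_w in (360, 400, 500):      # example sustained package powers in watts
    print(f"{power_w} W -> ~{power_w * r_th:.0f} K above intake air")
```

Which would leave a fair bit of headroom at a 360 to 400 W PPT, at least on paper.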

Then again, it would obviously be better to have ready-made coolers available for CPUs as expensive as these ones. (Which for now means high air speed, or a water loop, or in case of certain server products even both.)
 

StefanR5R

Elite Member
Dec 10, 2016
5,441
7,681
136
In my case, I chose to go with the EPYC 9274F due to its great base frequency [...]
Also, one Micron RDIMM 2R DDR5-4800 memory module arrived (from one seller), 5 more are on the way (from another), and today I got a deal on eBay for 2 extra used modules at only $120 for both of them, so instead of the originally planned 6 I will go for 8 modules, thus 8 channels.
Servethehome posted a few benchmark results with EPYC 9654 on 8 channels RAM vs. 12 channels RAM. (Scroll down to the graph "ASRock Rack GENOAD8UD 2T X550 Performance 8ch To 12ch Reference". This was measured on different mainboards, but the difference in RAM channel count should be the biggest influence.) There is not a lot of a performance drop in most of these benchmarks. However, EPYC 9654 has got 12 CCXs. I wonder if SKUs with 8 CCXs (EPYC 9274F is one of those) have any performance drop at all if going with 8 instead of 12 RAM channels.
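A rough way to frame the question, using theoretical peak DRAM bandwidth only (this ignores memory-controller efficiency and any GMI/Infinity Fabric limits, so take it as a sketch, not a prediction):

```python
# Theoretical peak DDR5-4800 bandwidth, total and per CCD.
# Ignores controller efficiency and GMI/Infinity-Fabric limits.

gb_s_per_channel = 4800 * 8 / 1000          # 38.4 GB/s per DDR5 channel

for channels in (8, 12):
    total = channels * gb_s_per_channel
    print(f"{channels:2d} channels: {total:5.1f} GB/s total | "
          f"{total / 12:4.1f} GB/s per CCD (12-CCD 9654) | "
          f"{total / 8:4.1f} GB/s per CCD (8-CCD 9274F)")
```

By this naive measure, an 8-CCD SKU on 8 channels already gets as much DRAM bandwidth per CCD as a 12-CCD SKU does on all 12 channels, which is essentially the intuition behind my question.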
 
  • Like
Reactions: igor_kavinski
Jul 27, 2020
15,446
9,556
106
They used a liquid cooler and it gave them extra performance. Looks like a regular air cooler is gonna have some trouble dissipating the heat of so many closely packed cores.
 

vityan666

Member
Apr 12, 2023
42
31
51
StefanR5R
Servethehome posted a few benchmark results with EPYC 9654 on 8 channels RAM vs. 12 channels RAM. (Scroll down to the graph "ASRock Rack GENOAD8UD 2T X550 Performance 8ch To 12ch Reference". This was measured on different mainboards, but the difference in RAM channel count should be the biggest influence.) There is not a lot of a performance drop in most of these benchmarks.
If we were running something almost purely memory bandwidth bound like STREAM Triad, we would of course have larger variances. There are some applications that are effectively not memory bandwidth sensitive after a minimum threshold is reached, and those applications performed well.
They mentioned that their test wasn't quite bandwidth bound, and that they expect larger differences for more bandwidth-intensive tests.

I wonder if SKUs with 8 CCXs (EPYC 9274F is one of those) have any performance drop at all if going with 8 instead of 12 RAM channels.
That's quite an interesting question. I think there are debates at every EPYC/TR release about how the number of CCDs balances against the memory controllers.
I suspect it will have some negative impact, but I can't be sure. As I don't plan to feed it with 12 DIMMs, I won't be able to test it anyway, and a 6-channel vs. 8-channel test is useless for this scenario.
For my build, the jump from dual-channel DDR4 to 6- or 8-channel DDR5 is already impressive, and it makes up for the individual modules' limited bandwidth at JEDEC frequencies.
 

StefanR5R

Elite Member
Dec 10, 2016
5,441
7,681
136
As a legacy note, and a little off-topic: EPYC 7002 'Rome' performs terribly if the number of populated channels differs from 4 or 8, even in applications which are not particularly memory bandwidth sensitive. EPYC 7003 'Milan' was improved in this regard and works pretty well with 2, 4, and 6 (but of course best with 8) populated channels. AMD published the order in which channels should be populated for optimum performance, but I don't have a bookmark.
 
  • Like
Reactions: igor_kavinski

StefanR5R

Elite Member
Dec 10, 2016
5,441
7,681
136
Noctua's product roadmap currently shows "Next-gen AMD Threadripper coolers" for Q3 2023. Perhaps this cooler will be compatible with EPYC Genoa as well. Or maybe not, if (non-pro) Threadrippers happen to be released on a smaller socket. (Caveat: Noctua are infamous for repeated delays of their products, if not canceling announced products altogether.)

I've asked 'em on 24.03 about some love for our small and cute SP5 socket, and got this response on 28.03:
Do you plan to manufacture/introduce a socket SP5 (LGA 6096) compatible cooler for AMD Epyc Genoa CPUs? [...] will SP5 socket get some love as well, before TR-7000 release?
Response:
[...] Unfortunately, no such mounting kit is currently planned.
Hah! You asked about a cooler, they responded about a mounting kit...

The above mentioned roadmap update was made public in early May.
 
Last edited:
  • Like
Reactions: cellarnoise

StefanR5R

Elite Member
Dec 10, 2016
5,441
7,681
136
As a legacy note, and a little off-topic: EPYC 7002 'Rome' performs terribly if the number of populated channels differs from 4 or 8, even in applications which are not particularly memory bandwidth sensitive. EPYC 7003 'Milan' was improved in this regard and works pretty well with 2, 4, and 6 (but of course best with 8) populated channels. AMD published the order in which channels should be populated for optimum performance, but I don't have a bookmark.
According to the publicly available Overview of AMD EPYC™ 7003 Series Processors Microarchitecture, access to AMD's Milan memory population guide requires a login. However, there is, for example, a publicly available guide for memory population of Milan-based Dell servers: Memory Population Rules for 3rd Generation AMD EPYC™ CPUs on PowerEdge Servers

Back to the thread's subject:
AMD's BIOS & Workload Tuning Guide for AMD EPYC™ 9004 Series Processors has got a bunch of documentation on Genoa CPU performance-related BIOS settings, their default values, and recommended values for several common workloads. Most scientific/distributed-computing workloads should fit somewhere between the "General Purpose: CPU Intensive", "General Purpose: Power Efficiency", and "HPC" classes of workloads. Interestingly, the guide recommends setting TDP and PPT to the OPN maximum for each one of these three workload classes, even for "Power Efficiency" (but not for a variety of other workloads which are less relevant for scientific/distributed computing).

The guide also shows a high-level diagram of the I/O die's inner topology. According to this, Genoa's I/O die is organized into four quadrants, much like Rome's and Milan's I/O die. It's just that there is now effectively a triple-channel memory controller per quadrant instead of a dual-channel one. This is a hint (but not proof) that not only Genoa SKUs with 12 CCDs but also SKUs with 8 or 4 CCDs profit from full population of all twelve memory channels.
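To make that concrete, here is a toy model of the quadrant view (the even spread of populated channels across quadrants is my assumption; the real channel-to-quadrant mapping is in AMD's population guidelines):

```python
# Toy model of the Genoa I/O die: 4 quadrants, 3 DDR5 channels each (12 total).
# Assumes populated DIMMs are spread evenly across quadrants, per typical guidelines.

GB_S_PER_CHANNEL = 4800 * 8 / 1000        # DDR5-4800, 38.4 GB/s theoretical peak

for populated in (8, 12):
    per_quadrant = populated / 4 * GB_S_PER_CHANNEL
    print(f"{populated} DIMMs -> ~{per_quadrant:.0f} GB/s of local DRAM bandwidth per quadrant")
```

Every quadrant loses a third of its local bandwidth when going from 12 to 8 DIMMs, no matter how many CCDs hang off that quadrant; that's the whole argument.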

More EPYC server tuning guides, for 9004, 7003, 7002, and 7001 EPYCs:
https://www.amd.com/en/processors/tuning-guides-server
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,439
14,409
136
Update for those who have not followed my build thread in CPUs: I now have a 9654 96-core Genoa and two 9554 64-core Genoas. One of the three is on Windows (one of the 9554s), the other two are on Linux. Currently only the Win 10 9554 is powered up due to budget, but when I have paid off the $14,000 I will be back.
 

StefanR5R

Elite Member
Dec 10, 2016
5,441
7,681
136
I almost finished building a 64-core Genoa computer too now. It is watercooled.

Running 64 SGS-LLR tasks at once...
... on dual EPYC 7452 (2x 32 Zen 2 cores/ 2x 180 W cTDP):
723 s average elapsed time, ~370 W at the wall (305 kPPD, ~820 PPD/W)
Cores are running at about 2.6 GHz.

... on EPYC 9554P (64 Zen 4 cores/ 360 W TDP):
408 s average elapsed time, ~420 W at the wall (540 kPPD, ~1,290 PPD/W)
Cores are running at about 3.3 GHz.

I.e., +77 % computer performance, +14 % higher power draw, +56 % power efficiency from 2P air-cooled Rome to 1P water-cooled Genoa in this vector arithmetic centric workload.
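(For transparency, those percentages follow directly from the raw figures above:)

```python
# Re-deriving the quoted percentages from the measured values above.

rome  = {"elapsed_s": 723, "watts": 370, "kppd": 305}   # 2x EPYC 7452, air
genoa = {"elapsed_s": 408, "watts": 420, "kppd": 540}   # 1x EPYC 9554P, water

throughput = rome["elapsed_s"] / genoa["elapsed_s"] - 1                             # ~ +77 %
power      = genoa["watts"] / rome["watts"] - 1                                     # ~ +14 %
efficiency = (genoa["kppd"] / genoa["watts"]) / (rome["kppd"] / rome["watts"]) - 1  # ~ +56 %

print(f"throughput {throughput:+.0%}, power {power:+.0%}, PPD/W {efficiency:+.0%}")
```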

Concentrating this much power in a single socket unfortunately complicates cooling if you want to avoid extreme noise, and increases power draw of the cooling subsystem somewhat.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,439
14,409
136
I almost finished building a 64-core Genoa computer too now. It is watercooled.

Running 64 SGS-LLR tasks at once...
... on dual EPYC 7452 (2x 32 Zen 2 cores/ 2x 180 W cTDP):
723 s average elapsed time, ~370 W at the wall (305 kPPD, ~820 PPD/W)
Cores are running at about 2.6 GHz.

... on EPYC 9554P (64 Zen 4 cores/ 360 W TDP):
408 s average elapsed time, ~420 W at the wall (540 kPPD, ~1,290 PPD/W)
Cores are running at about 3.3 GHz.

I.e., +77 % computer performance, +14 % higher power draw, +56 % power efficiency from 2P air-cooled Rome to 1P water-cooled Genoa in this vector arithmetic centric workload.

Concentrating this much power in a single socket unfortunately complicates cooling if you want to avoid extreme noise, and increases power draw of the cooling subsystem somewhat.
Welcome to the Genoa family! I got one more 9554 since my last post, so 3 9554s and a 9654, all air-cooled. You are the first with a water-cooled Genoa!
 

gsrcrxsi

Member
Aug 27, 2022
46
26
51
If the new Zen 4 Threadripper uses the same socket, named differently, but physically the same size then we'll definitely see a cooler from them that will also support SP5.

It's already speculated that the Threadripper 7000 series (Zen 4) will use a new socket, most likely named TR5. If it matches SP5 in size, just like sTRX4 matched SP3, the coolers would again be interchangeable.
looks like they built the new TR on SP6 (sTR5), which is more similar to SP3, and SP3 coolers will probably work on it based on the pics (it looks identical in size to SP3 to me)

I'd love to see a head to head of the TR Pro 7995WX vs the EPYC 9654.

same zen4 cores
same L3 cache
8ch DDR5-5200 vs 12ch DDR5-4800 memory (EPYC still ends up with ~38% more bandwidth; its 50% channel advantage is partly offset by the slower DIMM speed; quick math below)
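quick check of that ~38% figure (theoretical peak only, 8 bytes per transfer per channel, ignoring controller efficiency):

```python
# Theoretical peak memory bandwidth: TR Pro 7995WX vs EPYC 9654.
tr_bw   = 8  * 5200 * 8 / 1000     # 8  channels of DDR5-5200 -> 332.8 GB/s
epyc_bw = 12 * 4800 * 8 / 1000     # 12 channels of DDR5-4800 -> 460.8 GB/s

print(f"TR Pro {tr_bw:.0f} GB/s vs EPYC {epyc_bw:.0f} GB/s "
      f"({epyc_bw / tr_bw - 1:+.0%} for EPYC)")
```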

interesting that they got all those cores in the smaller package. the larger SP5 socket takes up too much real estate IMO and impinges too much on PCIe slot expansion, even when they cut down to 8ch on some boards. things are just getting too big for the ATX standard of 7x PCIe slots.

I'd imagine that the TR will probably be faster overall, depending on how high a power limit can be maintained for higher clocks, and the speed difference will probably scale with that clock difference since the core architecture is the same. it'll probably take the crown for absolute single-socket compute, with higher power draw and price. (for scenarios not bound by memory bandwidth of course)

personally I don't trust AMD with Threadripper. they never really had a multi-gen platform like AM4, and early adopters got raked over the coals with every new generation, having to buy another new expensive platform each time.
 

StefanR5R

Elite Member
Dec 10, 2016
5,441
7,681
136
looks like they built the new TR on SP6 (sTR5), which is more similar to SP3, and SP3 coolers will probably work on it based on the pics (it looks identical in size to SP3 to me)
SP6 and sTR5 apparently have the same mounting frame as SP3 and TR4, as some cooler makers have claimed compatibility of one and the same cooler model of theirs with all of these sockets. Noctua, on the other hand, produced new retention mechanisms for SP6 and sTR5 with increased mounting pressure (source: press release). Makes some sense, given the higher pin count. I suspect that this is more of a theoretical than a practical issue though.

interesting that they got all those cores in the smaller package.
In the CPU & overclocking subforum here, somebody spotted that AMD had to make cutouts in the heatspreader for the 96c Threadripper.

I'd imagine that the TR will probably be faster overall, depending on how high a power limit can be maintained for higher clocks, and the speed difference will probably scale with that clock difference since the core architecture is the same.
You'd have to increase the power limit above stock, obviously. In case of EPYC, the PPT can probably only go up to the OPN's hardwired max cTDP, whereas in case of Threadripper and Threadripper Pro, it's evidently open-ended.

Therefore I expect Threadripper Pro prices to handily exceed those of corresponding EPYC 9004P models.

personally I don't trust AMD with Threadripper. they never really had a multi-gen platform like AM4, and early adopters got raked over the coals with every new generation, having to buy another new expensive platform each time.
Early Ryzen generations: Great CPU price-per-performance + great platform longevity (although with transient compatibility issues)
Early Threadripper generations: Great CPU price-per-performance as well, but one-off platforms against customer expectations

While AMD made claims about Ryzen's socket support timelines, and went through with them, I believe they never said anything about Threadripper socket support roadmaps. As we know now, when they don't publish a roadmap, we should assume that this road is short and a dead end.

As for prices: Now that AMD beat Intel in performance several times in a row and made big progress in server market share, the times of great price-per-performance are (arguably) over.

----------------

Apropos SP6. Personally, I was curious if the EPYC 8004 series (Siena) had items which could be attractive for Distributed Computing. But two properties turned me off: The halved level 3 cache (bad for PrimeGrid), and the smaller possible power density per host compared to Genoa/ Milan/ Rome.
 

gsrcrxsi

Member
Aug 27, 2022
46
26
51
My focus is always on multi-GPU more than raw CPU performance, and I personally don't really care for PG or other number theory type projects. so that's why EPYC Rome/Milan is such a great fit for me. 7x GPUs without bifurcation, no problem :) the CPU projects I contribute to (mostly universe now) don't seem to scale with anything but clocks and core count and don't really see much IPC, memory, or cache based gains. Milan got a boost in production over Rome purely from more power and proportionally higher clocks, which made the gains a wash from an efficiency POV.

I do have an H11DSi with dual 7642 and no GPUs, but really only as a novelty/experiment and because I have it on custom watercooling with that "greasy chicken" custom Monoblock. I could upgrade to 7742, but it's a hassle to remount the monoblock so I've been putting it off lol.

Siena looked interesting to me, but it seems like they are limiting the PCIe lanes on those too :( I'd love to see what Bergamo can do on Universe though with 128c/256t even with the lower clocks and half the cache :)
 

StefanR5R

Elite Member
Dec 10, 2016
5,441
7,681
136
I was curious if the EPYC 8004 series (Siena) had items which could be attractive for Distributed Computing. But two properties turned me off: The halved level 3 cache (bad for PrimeGrid), and the smaller possible power density per host compared to Genoa/ Milan/ Rome.
According to rumors, the Bergamo successor in the 5th generation of EPYCs should have 16-core complexes with 32 MB of L3$. That's still half the L3$ normalized to core count, but at least recent PrimeGrid applications scale well to larger thread counts per application instance, such that they should work nicely on these CCXs. And many other DC projects should do well on such a configuration too. It seems reasonable to assume that these same CCXs will appear in a Siena successor.
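(The "half the L3$ normalized to core count" works out like this, if the rumor holds:)

```python
# L3 per core: standard Zen 4 CCX vs. the rumored dense CCX of the Bergamo successor.
ccxs = {
    "Zen 4 CCX (Genoa)":               {"cores": 8,  "l3_mb": 32},
    "rumored dense CCX (unconfirmed)": {"cores": 16, "l3_mb": 32},
}
for name, c in ccxs.items():
    print(f"{name}: {c['l3_mb'] / c['cores']:.0f} MB L3 per core")
```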

On the other hand, it is IMO possible that Bergamo successors and Siena successors may miss out on upgrades of the FP/vector pipelines which Ryzen and Turin may receive.