Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Jul 27, 2020
28,032
19,131
146
Our laptops can handle most analyses, just not the nonlinear time history earthquake sims that we use the servers for.
Hyper-V runs on laptops too. Leave the Zen4 laptops plugged in overnight while they crunch and munch on the sims in their VMs. They would have a higher ST throughput than servers.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
Haha, that's not exactly viable because it can take a few days for the analysis to complete. Our laptops can handle most analyses, just not the nonlinear time history earthquake sims that we use the servers for.
why'd i assume you worked for nasa?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,247
16,107
136
Hyper-V runs on laptops too. Leave the Zen4 laptops plugged in overnight while they crunch and munch on the sims in their VMs. They would have a higher ST throughput than servers.
Or buy a 7950x. But even one 9554 could do in hours what their current farm may take days to do.
 

Saylick

Diamond Member
Sep 10, 2012
4,036
9,456
136
Hyper-V runs on laptops too. Leave the Zen4 laptops plugged in overnight while they crunch and munch on the sims in their VMs. They would have a higher ST throughput than servers.
I see. Not to sound argumentative, but the other caveats are: 1) No one keeps their laptop continuously plugged in for multiple days straight, 2) the laptop processor will throttle under that kind of workload, making it hot and uncomfortable for the user, 3) the result files for these analyses run to hundreds of GB and our laptop drives are only 1 TB, and 4) the software requires an internet connection to the license server or else the analysis self-terminates. I know, a-hole design feature by the software vendor.

Trust me. There's a reason why we specifically run these kinds of analyses on a server. It's far easier to remote into the server VM than it is to run it locally.
 

Saylick

Diamond Member
Sep 10, 2012
4,036
9,456
136
Or buy a 7950x. But even one 9554 could do in hours what their current farm may take days to do.
Not entirely true, but I agree that having a lot of fast cores helps. The workload is 100% single-threaded, but as I mentioned in an earlier post, we run multiple analyses in parallel. Again, each analysis is single-threaded. One design iteration requires simulating each earthquake, and there are 11 or 14 earthquakes to simulate, depending on the project. Strong ST performance matters on a per-earthquake basis, while the total MT throughput of the processor lets us run more iterations overall.
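To picture the structure, one design iteration basically looks like this (hypothetical script, solver name, and file names, not our actual software):

```python
# Hypothetical sketch of one design iteration: each ground-motion record is an
# independent, single-threaded solver run, so the iteration just fans the records
# out across available cores. Solver command and file names are placeholders.
from concurrent.futures import ThreadPoolExecutor
import subprocess

EARTHQUAKES = [f"record_{i:02d}.acc" for i in range(1, 12)]  # 11 (or 14) records per project

def run_analysis(record: str) -> str:
    # Each call launches one single-threaded solver process; ST speed sets its runtime.
    subprocess.run(["solver.exe", "--model", "building.mdl", "--motion", record], check=True)
    return record

if __name__ == "__main__":
    # The threads just babysit the solver processes; core count (MT throughput)
    # sets how many records, and therefore iterations, run at once.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for done in pool.map(run_analysis, EARTHQUAKES):
            print(f"finished {done}")
```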
 
  • Like
Reactions: Tlh97 and A///

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,247
16,107
136
Not entirely true, but I agree that having a lot of fast cores helps. The workload is 100% single-threaded, but as I mentioned in an earlier post, we run multiple analyses in parallel. Again, each analysis is single-threaded. One design iteration requires simulating each earthquake, and there are 11 or 14 earthquakes to simulate, depending on the project. Strong ST performance matters on a per-earthquake basis, while the total MT throughput of the processor lets us run more iterations overall.
A 9174F may be better for you. 16 cores @ 4.4 GHz could be really fast. What are the specs of what you are running?
 
  • Like
Reactions: Tlh97 and A///

Saylick

Diamond Member
Sep 10, 2012
4,036
9,456
136
A 9174F may be better for you. 16 cores @ 4.4 GHz could be really fast. What are the specs of what you are running?
You're on the right track. I am eyeing the Genoa F SKUs because they clock high and have a ton of cores. The EPYC 9474F would be really sweet (48c, 3.6 base, 4.1 boost). We currently use blades with dual Xeon Gold 6154 (3.0 GHz base, 18c).
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
Not entirely true, but I agree that having a lot of fast cores helps. The workload is 100% single-threaded, but as I mentioned in an earlier post, we run multiple analyses in parallel. Again, each analysis is single-threaded. One design iteration requires simulating each earthquake, and there are 11 or 14 earthquakes to simulate, depending on the project. Strong ST performance matters on a per-earthquake basis, while the total MT throughput of the processor lets us run more iterations overall.
very interesting. this post of yours reminds me of how zen 5 is allegedly going to have "hybrid" cores, but not the clusterf*** intel has designed, which leaves them without 2 threads per core and without avx512, so they had to come up with avx10. also let's not forget their newest security nightmare cripples older processors. I need to watch the frizzy haired guy's video when I have free time on my hands and I'm not focused on finishing up a bottle of wine, but if the zen 5 c cores still come threaded and support avx512, I think intel's next gen hardware is still going to be in trouble afaik because arrow lake is still going to be stuck with this contrived design decision.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
You're on the right track. I am eyeing the Genoa F SKUs because they clock high and have a ton of cores. The EPYC 9474F would be really sweet (48c, 3.6 base, 4.1 boost). We currently use blades with dual Xeon Gold 6154 (3.0 GHz base, 18c).
if xeons were ever considered crimes against humanity it would be that series of xeons and ones since. any intel xeon in the last decade.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,650
5,189
136
CES 2024 is in 4 months. They are going to somehow present Strix there since they can't afford breaking the sacred AMD Execution™.
CES might be too optimistic, unless AMD is now going to adopt the recent practice of announcing at CES and shipping after Computex.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,247
16,107
136
You're on the right track. I am eyeing the Genoa F SKUs because they clock high and have a ton of cores. The EPYC 9474F would be really sweet (48c, 3.6 base, 4.1 boost). We currently use blades with dual Xeon Gold 6154 (3.0 GHz base, 18c).
I will look for benchmarks, but as old as those are, the Genoa could be 3-4 times as fast per core.
 
  • Like
Reactions: MangoX

Saylick

Diamond Member
Sep 10, 2012
4,036
9,456
136
I will look for benchmarks, but as old as those are, the Genoa could be 3-4 times as fast per core.
That would be really sweet if true, but I think between Skylake and Zen 4, there's about a 25% IPC gain. Going from 3.0 GHz to 3.6 GHz nets another 20%, so my guess is that I'd get a 50% speed boost per core from the Xeon Gold 6154 to the EPYC 9474F. Of course, with a gajillion more cores, I could run 33% more iterations with one of the EPYC processors vs. the 2P configuration of the Xeon.
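Back-of-envelope with those same assumptions (25% IPC, 3.0 → 3.6 GHz base, 36 cores in the 2P Xeon box vs. 48 in one 9474F), purely illustrative:

```python
# Rough per-core and throughput estimate; the IPC uplift is an assumption, not a measurement.
ipc_gain   = 1.25          # assumed Skylake -> Zen 4 IPC gain
clock_gain = 3.6 / 3.0     # base clocks, Xeon Gold 6154 -> EPYC 9474F
per_core   = ipc_gain * clock_gain
core_gain  = 48 / (2 * 18) # one 9474F vs. dual Xeon Gold 6154

print(f"per-core speedup ~{per_core:.2f}x")               # ~1.50x
print(f"extra parallel iterations ~{core_gain - 1:.0%}")  # ~33% more
print(f"total throughput ~{per_core * core_gain:.1f}x")   # ~2.0x
```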
 

Joe NYC

Diamond Member
Jun 26, 2021
3,650
5,189
136
That would be really sweet if true, but I think between Skylake and Zen 4, there's about a 25% IPC gain. Going from 3.0 GHz to 3.6 GHz nets another 20%, so my guess is that I'd get a 50% speed boost per core from the Xeon Gold 6154 to the EPYC 9474F. Of course, with a gajillion more cores, I could run 33% more iterations with one of the EPYC processors vs. the 2P configuration of the Xeon.
You might want to test how it runs with V-Cache.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
That would be really sweet if true, but I think between Skylake and Zen 4, there's about a 25% IPC gain.
33% compared to a 10900K in ST according to Computerbase, and that includes one bench out of 4 where Zen 4 is underestimated by 18%, so the number is more like 35-36%.
 
  • Wow
Reactions: Tlh97 and Saylick
Jul 27, 2020
28,032
19,131
146
Of course, with a gajillion more cores, I could run 33% more iterations with one of the EPYC processors vs. the 2P configuration of the Xeon.
Is the simulation completely CPU limited or does the storage or memory speed also factor in its performance? VMs generally suck when it comes to storage and memory bandwidth. IOPS takes a pretty serious hit inside a VM. If you are running multiple parallel instances of your sim software inside a VM, why not just run 36 parallel instances directly on the host OS without using VMs? Or is eight parallel instances the max that software allows running on a machine?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,247
16,107
136
33% compared to a 10900K in ST according to Computerbase, and that includes one bench out of 4 where Zen 4 is underestimated by 18%, so the number is more like 35-36%.
And a 50% freq gain, so like possibly twice as fast overall. (3.0 - 4.4 for the 9474F I think)
 

Saylick

Diamond Member
Sep 10, 2012
4,036
9,456
136
Is the simulation completely CPU limited or does the storage or memory speed also factor in its performance? VMs generally suck when it comes to storage and memory bandwidth. IOPS takes a pretty serious hit inside a VM. If you are running multiple parallel instances of your sim software inside a VM, why not just run 36 parallel instances directly on the host OS without using VMs? Or is eight parallel instances the max that software allows running on a machine?
Astute observations and questions. Let me see how many I can answer.
Is the simulation completely CPU limited or does the storage or memory speed also factor in its performance? VMs generally suck when it comes to storage and memory bandwidth. IOPS takes a pretty serious hit inside a VM.
I suspect it is primarily CPU limited, although having a fast storage solution helps because of how much data is being written. We used to run this software on HDDs and we saw an improvement to analysis runtimes when SSDs became commonplace. Currently, our servers use PCIe SSDs.

I would be curious to know where the bottlenecks lie, but I don't know how to run performance monitoring on software to determine which parts of the computer are getting hit hardest. Are there free tools that can do this? Do you have recommendations?
If you are running multiple parallel instances of your sim software inside a VM, why not just run 36 parallel instances directly on the host OS without using VMs?
Good point. We run Windows Server on the VMs, which are limited to two concurrently active remote desktop connections. Engineers like to remote into the server and make changes to the model in the software, post-process the data directly, etc.

If there wasn't a limitation to the number of concurrently active remote desktop connections, we would likely eliminate VMs entirely.
Or is eight parallel instances the max that software allows running on a machine?
Actually, each instance of the software can run 8 parallel analyses, and three instances of the software consume one license. Therefore, having a core count that is a multiple of 24 is ideal from a license usage standpoint.
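Spelled out, just to illustrate that math (the numbers are from above; the script itself is hypothetical):

```python
# Sketch of the licensing math described above: 8 parallel analyses per software
# instance, 3 instances per license, so one license saturates 24 cores' worth of work.
ANALYSES_PER_INSTANCE = 8
INSTANCES_PER_LICENSE = 3
CORES_PER_LICENSE = ANALYSES_PER_INSTANCE * INSTANCES_PER_LICENSE  # 24

def licenses_needed(cores: int) -> int:
    # Ceiling division: any core count that's a multiple of 24 wastes no license capacity.
    return -(-cores // CORES_PER_LICENSE)

for cores in (36, 48, 96):
    print(cores, "cores ->", licenses_needed(cores), "license(s)")
# 36 cores -> 2, 48 -> 2, 96 -> 4; only the multiples of 24 use every slot they pay for.
```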
 
Jul 27, 2020
28,032
19,131
146
If there wasn't a limitation to the number of concurrently active remote desktop connections, we would likely eliminate VMs entirely.

The licensing cost for that seems reasonable. It could help you do away with VMs entirely. One license for 5 users per server doesn't seem that expensive considering that there might be gains to be had from running your software on bare metal.
 
Jul 27, 2020
28,032
19,131
146
We used to run this software on HDDs and we saw an improvement to analysis runtimes when SSDs became commonplace. Currently, our servers use PCIe SSDs.
From my testing, PCIe SSDs suffer pretty badly under a VM (though my only test subject was a Samsung 980 Pro in an Ivy Bridge-E server). Can't say for sure if enterprise PCIe SSDs on a newer server would be held back by the virtualization overhead. The easiest way to test would be to run the software directly on the server without a VM and see if the computation times improve.
I would be curious to know where the bottlenecks lie, but I don't know how to run performance monitoring on software to determine which parts of the computer are getting hit hardest. Are there free tools that can do this? Do you have recommendations?
I wish I had the tools that real engineers use to pinpoint bottlenecks, and knew how to use them. My approach is just to give the application the maximum resources I can and see if it scales. With your application, supposing it generates about 10 GB of data for a typical simulation, I would first run the simulation inside the VM and note down the time. Then I would run the application on the server without a VM and note that data point. Finally, I would create a 10 GB RAM drive, let the simulation run while writing to it, and take another data point. Then I would compare those three data points to see how much the application benefits. If it sees a good speed-up on the RAM drive, I would either try to run it that way or create a RAID 0 of the PCIe SSDs to double the write throughput. Or I might decide to get an Optane P5800X, or even two of those in RAID 0.
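Something along these lines would be enough to collect those data points: run the same script once inside the VM, once on bare metal against the PCIe SSD, and once with the output pointed at a RAM drive, then compare the wall-clock times. The solver command, model file, and paths are placeholders, not your software's real CLI.

```python
# Hypothetical timing harness: launch it in each environment (VM, bare metal,
# RAM drive) with the appropriate working directory and compare the printed times.
import subprocess
import sys
import time

def time_run(workdir: str) -> float:
    start = time.perf_counter()
    # Placeholder invocation; point the solver's scratch/output directory at workdir.
    subprocess.run(["solver.exe", "--model", "building.mdl", "--outdir", workdir],
                   check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    workdir = sys.argv[1] if len(sys.argv) > 1 else r"D:\sims\work"
    print(f"run in {workdir}: {time_run(workdir):.0f} s")
```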
 
  • Like
Reactions: Tlh97 and Saylick

Saylick

Diamond Member
Sep 10, 2012
4,036
9,456
136
From my testing, PCIe SSDs suffer pretty badly under a VM (though my only test subject was a Samsung 980 Pro in an Ivy Bridge-E server). Can't say for sure if enterprise PCIe SSDs on a newer server would be held back by the virtualization overhead. The easiest way to test would be to run the software directly on the server without a VM and see if the computation times improve.

I wish I had the tools that real engineers use to pinpoint bottlenecks, and knew how to use them. My approach is just to give the application the maximum resources I can and see if it scales. With your application, supposing it generates about 10 GB of data for a typical simulation, I would first run the simulation inside the VM and note down the time. Then I would run the application on the server without a VM and note that data point. Finally, I would create a 10 GB RAM drive, let the simulation run while writing to it, and take another data point. Then I would compare those three data points to see how much the application benefits. If it sees a good speed-up on the RAM drive, I would either try to run it that way or create a RAID 0 of the PCIe SSDs to double the write throughput. Or I might decide to get an Optane P5800X, or even two of those in RAID 0.
Funny you say that, I have an intern this summer who will be doing some benchmarking for me on these servers just to understand where we currently stand. One of the parameters we wanted to study was, as you mentioned, the impact of storage speed on the analysis runtime. The IT Department doesn't believe that the VMs introduce a slow-down so this is one way to verify that...