Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

DisEnchantment · Sep 29, 2022

Speculate at will

igor_kavinski · Aug 10, 2023

Saylick said:
Our laptops can handle most analyses, just not the nonlinear time history earthquake sims that we use the servers for.

Hyper-V runs on laptops too. Leave the Zen4 laptops plugged in overnight while they crunch and munch on the sims in their VMs. They would have a higher ST throughput than servers.

A/// · Aug 10, 2023

Saylick said:
Haha, that's not exactly viable because it can take a few days for the analysis to complete. Our laptops can handle most analyses, just not the nonlinear time history earthquake sims that we use the servers for.

why'd i assumed you worked for nasa?

Markfw · Aug 10, 2023

igor_kavinski said:
Hyper-V runs on laptops too. Leave the Zen4 laptops plugged in overnight while they crunch and munch on the sims in their VMs. They would have a higher ST throughput than servers.

Or buy a 7950x. But even one 9554 could do in hours what their current farm may take days to do.

Saylick · Aug 10, 2023

igor_kavinski said:
Hyper-V runs on laptops too. Leave the Zen4 laptops plugged in overnight while they crunch and munch on the sims in their VMs. They would have a higher ST throughput than servers.

I see. Not to sound argumentative, but the other caveats are: 1) No one keeps their laptops continuously plugged in for multiple days straight, 2) the laptop processor will throttle under said workload, thus making it hot and uncomfortable for the user, 3) the size of the files for results of the analyses are in the hundreds of GB and our laptop hard drives are only 1 TB, and 4) the software requires an internet connection to the license server or else the analysis self-terminates. I know, a-hole design feature by the software vendor.

Trust me. There's a reason why we specifically run these kinds of analyses on a server. It's far easier to remote into the server VM than it is to run it locally.

Saylick · Aug 10, 2023

A/// said:
why'd i assumed you worked for nasa?

That's flattering. I wish my line of work was as interesting or sophisticated as NASA.

A/// · Aug 10, 2023

Saylick said:
That's flattering. I wish my line of work was as interesting or sophisticated as NASA.

I would say earthquake monitoring and analysis is of greater importance in the immediate time frame over things millions of light years away.

Saylick · Aug 10, 2023

Markfw said:
Or buy a 7950x. But even one 9554 could do in hours what their current farm may take days to do.

Not entirely true but I agree that having a lot of fast cores helps. The workload is 100% single threaded, but as I mentioned in an earlier post, we run multiple analyses in parallel. But again, each analysis is single threaded. One design iteration requires simulating each earthquake, and there's 11 or 14 earthquakes to simulate, depending on the project. Heavy emphasis on ST performance is crucial on a per-earthquake basis, but total MT throughput of the processor lets us run more iterations in general.

Markfw · Aug 10, 2023

Saylick said:
Not entirely true but I agree that having a lot of fast cores helps. The workload is 100% single threaded, but as I mentioned in an earlier post, we run multiple analyses in parallel. But again, each analysis is single threaded. One design iteration requires simulating each earthquake, and there's 11 or 14 earthquakes to simulate, depending on the project. Heavy emphasis on ST performance is crucial on a per-earthquake basis, but total MT throughput of the processor lets us run more iterations in general.

A 9174F may be better for you. 16 cores @ 4.4 ghz could be really fast. What are the specs of what you are running ?

Saylick · Aug 10, 2023

Markfw said:
A 9174F may be better for you. 16 cores @ 4.4 ghz could be really fast. What are the specs of what you are running ?

You're on the right track. I am eyeing the Genoa F SKUs because they clock high and have a ton of cores. The EPYC 9474F would be really sweet (48c, 3.6 base, 4.1 boost). We currently use blades with dual Xeon Gold 6154 (3.0 GHz base, 18c).

A/// · Aug 10, 2023

Saylick said:
Not entirely true but I agree that having a lot of fast cores helps. The workload is 100% single threaded, but as I mentioned in an earlier post, we run multiple analyses in parallel. But again, each analysis is single threaded. One design iteration requires simulating each earthquake, and there's 11 or 14 earthquakes to simulate, depending on the project. Heavy emphasis on ST performance is crucial on a per-earthquake basis, but total MT throughput of the processor lets us run more iterations in general.

very interesting. this post of yours reminds me of how zen 5 is allegedly going to have "hybrid" cores but not the clusterf*** intel has designed that leaves them without 2 threads per core and no avx512 that they had to come up with avx10, also let's not forget thier newest security nightmare cripples older processors. I need to watch the frizzy haired guy's video when I have free time on my hands and not focused on finishing up a bottle of wine but if the zen 5 stuff c cores still come threaded and support avx512 I think intel's next gen hardware is still going to be in trouble afaik because arrow lake is still going to be on this contrived design decision.

A/// · Aug 10, 2023

Saylick said:
You're on the right track. I am eyeing the Genoa F SKUs because they clock high and have a ton of cores. The EPYC 9474F would be really sweet (48c, 3.6 base, 4.1 boost). We currently use blades with dual Xeon Gold 6154 (3.0 GHz base, 18c).

if xeons were ever considered crimes against humanity it would be that series of xeons and ones since. any intel xeon in the last decade.

Joe NYC · Aug 10, 2023

yuri69 said:
CES 2024 is in 4 months. They are going to somehow present Strix there since they can't afford breaking the sacred AMD ExecutionTM.

CES might be too optimistic, unless AMD is now going to accept the recent practice of announce at CES, ship after Computex.

Markfw · Aug 10, 2023

Saylick said:
You're on the right track. I am eyeing the Genoa F SKUs because they clock high and have a ton of cores. The EPYC 9474F would be really sweet (48c, 3.6 base, 4.1 boost). We currently use blades with dual Xeon Gold 6154 (3.0 GHz base, 18c).

I will look for benchmarks, but as old as those are, the Genoa could be 3-4 times as fast per core.

Saylick · Aug 10, 2023

Markfw said:
I will look for benchmarks, but as old as those are, the Genoa could be 3-4 times as fast per core.

That would be really sweet if true, but I think between Skylake and Zen 4, there's about a 25% IPC gain. Going from 3.0 GHz to 3.6 GHz nets another 20%, so my guess is that I'd get a 50% speed boost per core from the Xeon Gold 6154 to the EPYC 9474F. Of course, with a gajillion more cores, I could run 33% iterations with one of the EPYC processors vs. the 2P configuration of the Xeon.

Joe NYC · Aug 10, 2023

Saylick said:
That would be really sweet if true, but I think between Skylake and Zen 4, there's about a 25% IPC gain. Going from 3.0 GHz to 3.6 GHz nets another 20%, so my guess is that I'd get a 50% speed boost per core from the Xeon Gold 6154 to the EPYC 9474F. Of course, with a gajillion more cores, I could run 33% iterations with one of the EPYC processors vs. the 2P configuration of the Xeon.

You might want to test how it runs with V-Cache.

Saylick · Aug 10, 2023

Joe NYC said:
You might want to test how it runs with V-Cache.

Good point. I'll see if we can evaluate Genoa-X.

Abwx · Aug 10, 2023

Saylick said:
That would be really sweet if true, but I think between Skylake and Zen 4, there's about a 25% IPC gain.

33% comparatively to a 10900K in ST according to Computerbase, and that include one bench out of 4 that is underestimated by 18% for Zen 4, so the number is rather 35-36%.

igor_kavinski · Aug 10, 2023

Saylick said:
Of course, with a gajillion more cores, I could run 33% iterations with one of the EPYC processors vs. the 2P configuration of the Xeon.

Is the simulation completely CPU limited or does the storage or memory speed also factor in its performance? VMs generally suck when it comes to storage and memory bandwidth. IOPS takes a pretty serious hit inside a VM. If you are running multiple parallel instances of your sim software inside a VM, why not just run 36 parallel instances directly on the host OS without using VMs? Or is eight parallel instances the max that software allows running on a machine?

Markfw · Aug 10, 2023

Abwx said:
33% comparatively to a 10900K in ST according to Computerbase, and that include one bench out of 4 that is underestimated by 18% for Zen 4, so the number is rather 35-36%.

And a 50% freq gain, so like possibly twice as fast overall. (3.0 - 4.4 for the 9474F I think)

Saylick · Aug 10, 2023

igor_kavinski said:
Is the simulation completely CPU limited or does the storage or memory speed also factor in its performance? VMs generally suck when it comes to storage and memory bandwidth. IOPS takes a pretty serious hit inside a VM. If you are running multiple parallel instances of your sim software inside a VM, why not just run 36 parallel instances directly on the host OS without using VMs? Or is eight parallel instances the max that software allows running on a machine?

Astute observations and questions. Let me see how many I can answer.

Is the simulation completely CPU limited or does the storage or memory speed also factor in its performance? VMs generally suck when it comes to storage and memory bandwidth. IOPS takes a pretty serious hit inside a VM.

I suspect it is primarily CPU limited, although having a fast storage solution helps because of how much data is being written. We used to run this software on HDDs and we saw an improvement to analysis runtimes when SSDs became common place. Currently, our servers use PCIe SSDs.

I would be curious to know where the bottlenecks lie, but I don't know how to run performance monitoring on software to determine which parts of the computer are getting hit hardest. Are there free tools that can do this? Do you have recommendations?

If you are running multiple parallel instances of your sim software inside a VM, why not just run 36 parallel instances directly on the host OS without using VMs?

Good point. We run Windows Server on the VMs, which are limited to two concurrently active remote desktop connections. Engineers like to remote into the server and make changes to the model in the software, post-process the data directly, etc.

If there wasn't a limitation to the number of concurrently active remote desktop connections, we would likely eliminate VMs entirely.

Or is eight parallel instances the max that software allows running on a machine?

Actually, each instance of the software can run 8 parallel analyses, and three instances of the software consumes one license. Therefore, having a core count that is a multiple of 24 is ideal from a license usage standpoint.

igor_kavinski · Aug 10, 2023

Saylick said:
If there wasn't a limitation to the number of concurrently active remote desktop connections, we would likely eliminate VMs entirely.

Thinstuff License Management - My Account

www.thinstuff.com

The licensing cost for that seems reasonable. It could help you do away with VMs entirely. One license of 5 users per server doesn't seem that expensive considering that there might be gains to be had from running your software on baremetal.

eek2121 · Aug 10, 2023

adroc_thurston said:
Tech debts are hard to get rid of.

You are cut-and-dry evidence of that.

adroc_thurston · Aug 10, 2023

eek2121 said:
You are cut-and-dry evidence of that.

Funny, but not funny enough to make it.

igor_kavinski · Aug 10, 2023

Saylick said:
We used to run this software on HDDs and we saw an improvement to analysis runtimes when SSDs became common place. Currently, our servers use PCIe SSDs.

From my testing, PCIe SSDs suffer pretty badly under a VM (though my only test subject was a Samsung 980 Pro in a Ivy Bridge-E server). Can't say for sure if enterprise PCIe SSDs working on a newer server would be held back by the virtualization overhead. The easiest way to test would be to run the software directly on the server without a VM and see if the computation times improve.

I would be curious to know where the bottlenecks lie, but I don't know how to run performance monitoring on software to determine which parts of the computer are getting hit hardest. Are there free tools that can do this? Do you have recommendations?

I wish I had tools that real engineers use to pinpoint bottlenecks and I knew how to use them. My approach is just to give the application maximum resources that I can and see if it scales. With your application, suppose if it generates about 10GB of data to finish a typical simulation, I would first run the simulation inside the VM and note down the time. Then I would run the application on the server without a VM and note that data point. Finally, I would create a 10GB RAMdrive and let the simulation run while writing to that RAMdrive and take another data point. Then I would analyze those three data points to see how much the application benefits. If it sees a good speed up on the RAMdrive, I would either try to run it as such or I would create a RAID 0 of the PCIe SSDs to double the writing throughput. Or I might decide to get an Optane P5800X or even two of those in RAID 0.

Saylick · Aug 10, 2023

igor_kavinski said:
From my testing, PCIe SSDs suffer pretty badly under a VM (though my only test subject was a Samsung 980 Pro in a Ivy Bridge-E server). Can't say for sure if enterprise PCIe SSDs working on a newer server would be held back by the virtualization overhead. The easiest way to test would be to run the software directly on the server without a VM and see if the computation times improve.

I wish I had tools that real engineers use to pinpoint bottlenecks and I knew how to use them. My approach is just to give the application maximum resources that I can and see if it scales. With your application, suppose if it generates about 10GB of data to finish a typical simulation, I would first run the simulation inside the VM and note down the time. Then I would run the application on the server without a VM and note that data point. Finally, I would create a 10GB RAMdrive and let the simulation run while writing to that RAMdrive and take another data point. Then I would analyze those three data points to see how much the application benefits. If it sees a good speed up on the RAMdrive, I would either try to run it as such or I would create a RAID 0 of the PCIe SSDs to double the writing throughput. Or I might decide to get an Optane P5800X or even two of those in RAID 0.

Funny you say that, I have an intern this summer who will be doing some benchmarking for me on these servers just to understand where we currently stand. One of the parameters we wanted to study was, as you mentioned, the impact of storage speed on the analysis runtime. The IT Department doesn't believe that the VMs introduce a slow-down so this is one way to verify that...

Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)

Golden Member

Lifer

Diamond Member

Moderator Emeritus, Elite Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Moderator Emeritus, Elite Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Moderator Emeritus, Elite Member

Diamond Member

Diamond Member

Diamond Member

Lifer

Lifer

Moderator Emeritus, Elite Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member