> Our laptops can handle most analyses, just not the nonlinear time history earthquake sims that we use the servers for.

Hyper-V runs on laptops too. Leave the Zen4 laptops plugged in overnight while they crunch and munch on the sims in their VMs. They would have a higher ST throughput than servers.
> Haha, that's not exactly viable because it can take a few days for the analysis to complete. Our laptops can handle most analyses, just not the nonlinear time history earthquake sims that we use the servers for.

why'd i assume you worked for nasa?
> Hyper-V runs on laptops too. Leave the Zen4 laptops plugged in overnight while they crunch and munch on the sims in their VMs. They would have a higher ST throughput than servers.

Or buy a 7950X. But even one 9554 could do in hours what their current farm may take days to do.
> Hyper-V runs on laptops too. Leave the Zen4 laptops plugged in overnight while they crunch and munch on the sims in their VMs. They would have a higher ST throughput than servers.

I see. Not to sound argumentative, but the other caveats are: 1) no one keeps their laptop continuously plugged in for multiple days straight, 2) the laptop processor will throttle under that workload, making it hot and uncomfortable for the user, 3) the result files from the analyses run to hundreds of GB and our laptop drives are only 1 TB, and 4) the software requires an internet connection to the license server or else the analysis self-terminates. I know, a-hole design feature by the software vendor.
> why'd i assume you worked for nasa?

That's flattering. I wish my line of work was as interesting or sophisticated as NASA's.
> That's flattering. I wish my line of work was as interesting or sophisticated as NASA's.

I would say earthquake monitoring and analysis is of greater immediate importance than things millions of light-years away.
> Or buy a 7950X. But even one 9554 could do in hours what their current farm may take days to do.

Not entirely true, but I agree that having a lot of fast cores helps. The workload is 100% single threaded, but as I mentioned in an earlier post, we run multiple analyses in parallel; each individual analysis is still single threaded. One design iteration requires simulating each earthquake, and there are 11 or 14 earthquakes to simulate, depending on the project. Heavy emphasis on ST performance is crucial on a per-earthquake basis, but total MT throughput of the processor lets us run more iterations overall.
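As a rough illustration of the pattern described above (many independent single-threaded solver runs filling up a many-core CPU), here is a minimal sketch; the solver name, its arguments, and the file naming are hypothetical, since the actual software isn't identified in the thread:

```python
# Minimal sketch: run one single-threaded solver process per earthquake record,
# keeping as many runs in flight as there are physical cores.
# "eq_solver" and its arguments are placeholders for the real analysis program.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CORES = 48                                            # e.g. an EPYC 9474F
RECORDS = [f"eq_{i:02d}.acc" for i in range(1, 15)]   # 11-14 ground motions per iteration

def run_one(record: str) -> int:
    out_dir = Path("results") / Path(record).stem
    out_dir.mkdir(parents=True, exist_ok=True)
    # Each solver invocation is single threaded; parallelism comes from
    # launching many of them at once.
    return subprocess.call(["eq_solver", "--input", record, "--out", str(out_dir)])

with ThreadPoolExecutor(max_workers=CORES) as pool:
    exit_codes = list(pool.map(run_one, RECORDS))

print("failed runs:", sum(code != 0 for code in exit_codes))
```

Whether the real software exposes a command-line runner like this is an assumption; the point is only that per-run speed depends on ST performance while total iterations per day scale with core count.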
> Not entirely true, but I agree that having a lot of fast cores helps. The workload is 100% single threaded, but as I mentioned in an earlier post, we run multiple analyses in parallel; each individual analysis is still single threaded. One design iteration requires simulating each earthquake, and there are 11 or 14 earthquakes to simulate, depending on the project. Heavy emphasis on ST performance is crucial on a per-earthquake basis, but total MT throughput of the processor lets us run more iterations overall.

A 9174F may be better for you. 16 cores @ 4.4 GHz could be really fast. What are the specs of what you are running?
> A 9174F may be better for you. 16 cores @ 4.4 GHz could be really fast. What are the specs of what you are running?

You're on the right track. I am eyeing the Genoa F SKUs because they clock high and have a ton of cores. The EPYC 9474F would be really sweet (48c, 3.6 GHz base, 4.1 GHz boost). We currently use blades with dual Xeon Gold 6154 (3.0 GHz base, 18c).
> Not entirely true, but I agree that having a lot of fast cores helps. The workload is 100% single threaded, but as I mentioned in an earlier post, we run multiple analyses in parallel; each individual analysis is still single threaded. One design iteration requires simulating each earthquake, and there are 11 or 14 earthquakes to simulate, depending on the project. Heavy emphasis on ST performance is crucial on a per-earthquake basis, but total MT throughput of the processor lets us run more iterations overall.

Very interesting. This post of yours reminds me of how Zen 5 is allegedly going to have "hybrid" cores, but not the clusterf*** Intel has designed, which leaves them without 2 threads per core and without AVX-512, so they had to come up with AVX10. Also, let's not forget their newest security nightmare cripples older processors. I need to watch the frizzy-haired guy's video when I have free time and I'm not focused on finishing up a bottle of wine, but if the Zen 5 c cores still come with SMT and support AVX-512, I think Intel's next-gen hardware is still going to be in trouble, because AFAIK Arrow Lake is still built on this contrived design decision.
> You're on the right track. I am eyeing the Genoa F SKUs because they clock high and have a ton of cores. The EPYC 9474F would be really sweet (48c, 3.6 GHz base, 4.1 GHz boost). We currently use blades with dual Xeon Gold 6154 (3.0 GHz base, 18c).

If Xeons were ever considered crimes against humanity, it would be that series of Xeons and the ones since. Any Intel Xeon in the last decade.
> CES 2024 is in 4 months. They are going to somehow present Strix there since they can't afford breaking the sacred AMD Execution™.

CES might be too optimistic, unless AMD is now going to accept the recent practice of announcing at CES and shipping after Computex.
> You're on the right track. I am eyeing the Genoa F SKUs because they clock high and have a ton of cores. The EPYC 9474F would be really sweet (48c, 3.6 GHz base, 4.1 GHz boost). We currently use blades with dual Xeon Gold 6154 (3.0 GHz base, 18c).

I will look for benchmarks, but as old as those are, the Genoa could be 3-4 times as fast per core.
> I will look for benchmarks, but as old as those are, the Genoa could be 3-4 times as fast per core.

That would be really sweet if true, but I think between Skylake and Zen 4 there's about a 25% IPC gain. Going from 3.0 GHz to 3.6 GHz nets another 20%, so my guess is that I'd get about a 50% speed boost per core going from the Xeon Gold 6154 to the EPYC 9474F. Of course, with a gajillion more cores, I could run 33% more iterations with one of the EPYC processors vs. the 2P configuration of the Xeons.
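For anyone who wants the back-of-envelope math above spelled out, here is the same estimate as a quick sketch; the 25% IPC figure is the poster's rough guess, not a measured number:

```python
# Rough per-core and throughput estimate for 2x Xeon Gold 6154 -> 1x EPYC 9474F,
# using the ballpark figures from the post above.
ipc_gain   = 1.25          # assumed Skylake -> Zen 4 IPC uplift
clock_gain = 3.6 / 3.0     # base clock: 3.0 GHz -> 3.6 GHz
per_core   = ipc_gain * clock_gain   # ~1.50x per earthquake run
core_ratio = 48 / (2 * 18)           # 48 cores vs. dual 18-core = ~1.33x concurrent runs
print(f"per-core speedup ~{per_core:.2f}x, "
      f"concurrent analyses ~{core_ratio:.2f}x, "
      f"total throughput ~{per_core * core_ratio:.2f}x")
```

Under these assumptions the single-socket Genoa box would come out at roughly 1.5x per earthquake and about 2x total iteration throughput versus the dual-Xeon blade.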
> That would be really sweet if true, but I think between Skylake and Zen 4 there's about a 25% IPC gain. Going from 3.0 GHz to 3.6 GHz nets another 20%, so my guess is that I'd get about a 50% speed boost per core going from the Xeon Gold 6154 to the EPYC 9474F. Of course, with a gajillion more cores, I could run 33% more iterations with one of the EPYC processors vs. the 2P configuration of the Xeons.

You might want to test how it runs with V-Cache.
> You might want to test how it runs with V-Cache.

Good point. I'll see if we can evaluate Genoa-X.
> That would be really sweet if true, but I think between Skylake and Zen 4 there's about a 25% IPC gain.

It's 33% compared to a 10900K in ST according to ComputerBase, and that includes one bench out of 4 that underestimates Zen 4 by 18%, so the number is more like 35-36%.
> Of course, with a gajillion more cores, I could run 33% more iterations with one of the EPYC processors vs. the 2P configuration of the Xeons.

Is the simulation completely CPU limited, or do storage and memory speed also factor into its performance? VMs generally suck when it comes to storage and memory bandwidth; IOPS takes a pretty serious hit inside a VM. If you are running multiple parallel instances of your sim software inside a VM, why not just run 36 parallel instances directly on the host OS without using VMs? Or is eight parallel instances the max the software allows on one machine?
> It's 33% compared to a 10900K in ST according to ComputerBase, and that includes one bench out of 4 that underestimates Zen 4 by 18%, so the number is more like 35-36%.

And a 50% frequency gain, so possibly twice as fast overall (3.0 to 4.4 GHz for the 9474F, I think).
> Is the simulation completely CPU limited, or do storage and memory speed also factor into its performance? VMs generally suck when it comes to storage and memory bandwidth; IOPS takes a pretty serious hit inside a VM. If you are running multiple parallel instances of your sim software inside a VM, why not just run 36 parallel instances directly on the host OS without using VMs? Or is eight parallel instances the max the software allows on one machine?

Astute observations and questions. Let me see how many I can answer.
> Is the simulation completely CPU limited, or do storage and memory speed also factor into its performance? VMs generally suck when it comes to storage and memory bandwidth; IOPS takes a pretty serious hit inside a VM.

I suspect it is primarily CPU limited, although having a fast storage solution helps because of how much data is being written. We used to run this software on HDDs, and we saw an improvement in analysis runtimes when SSDs became commonplace. Currently, our servers use PCIe SSDs.
> If you are running multiple parallel instances of your sim software inside a VM, why not just run 36 parallel instances directly on the host OS without using VMs?

Good point. We run Windows Server on the VMs, which are limited to two concurrently active remote desktop connections. Engineers like to remote into the server and make changes to the model in the software, post-process the data directly, etc.
> Or is eight parallel instances the max the software allows on one machine?

Actually, each instance of the software can run 8 parallel analyses, and three instances of the software consume one license. Therefore, having a core count that is a multiple of 24 is ideal from a license usage standpoint (quick arithmetic sketched below).
If there wasn't a limitation on the number of concurrently active remote desktop connections, we would likely eliminate VMs entirely.
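A quick sketch of the licensing arithmetic mentioned above; it only assumes one analysis per core, with the 8-analyses-per-instance and 3-instances-per-license figures taken straight from the post:

```python
# How many licenses a given core count consumes, assuming one analysis pins one core,
# 8 parallel analyses per software instance, and 3 instances per license (per the post above).
import math

ANALYSES_PER_INSTANCE = 8
INSTANCES_PER_LICENSE = 3
ANALYSES_PER_LICENSE  = ANALYSES_PER_INSTANCE * INSTANCES_PER_LICENSE  # 24

def licenses_needed(cores: int) -> int:
    return math.ceil(cores / ANALYSES_PER_LICENSE)

for cores in (32, 36, 48, 96):
    waste = -cores % ANALYSES_PER_LICENSE   # idle analysis slots on the last license
    print(f"{cores} cores -> {licenses_needed(cores)} license(s), {waste} unused slots")
```

Core counts that are multiples of 24 (48, 96, ...) leave no idle slots on the last license, which is why a 48-core part like the 9474F lines up nicely.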
> Tech debts are hard to get rid of.

You are cut-and-dry evidence of that.
> You are cut-and-dry evidence of that.

Funny, but not funny enough to make it.
> We used to run this software on HDDs, and we saw an improvement in analysis runtimes when SSDs became commonplace. Currently, our servers use PCIe SSDs.

From my testing, PCIe SSDs suffer pretty badly under a VM (though my only test subject was a Samsung 980 Pro in an Ivy Bridge-E server). I can't say for sure whether enterprise PCIe SSDs on a newer server would be held back by the virtualization overhead. The easiest way to test would be to run the software directly on the server without a VM and see if the computation times improve.
> I would be curious to know where the bottlenecks lie, but I don't know how to run performance monitoring on software to determine which parts of the computer are getting hit hardest. Are there free tools that can do this? Do you have recommendations?

I wish I had the tools that real engineers use to pinpoint bottlenecks, and knew how to use them. My approach is just to give the application the maximum resources that I can and see if it scales. With your application, supposing it generates about 10 GB of data for a typical simulation, I would first run the simulation inside the VM and note down the time. Then I would run the application on the server without a VM and note that data point. Finally, I would create a 10 GB RAMdrive, let the simulation run while writing to that RAMdrive, and take another data point. Then I would analyze those three data points to see how much the application benefits. If it sees a good speed-up on the RAMdrive, I would either try to run it that way, or create a RAID 0 of the PCIe SSDs to double the write throughput. Or I might decide to get an Optane P5800X, or even two of those in RAID 0.
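A minimal sketch of the three-data-point test described above, assuming the simulation can be launched from the command line; "eq_solver", the input file, and the paths are hypothetical placeholders:

```python
# Minimal sketch of the timing comparison described above: run the same simulation
# once per storage target and record wall-clock time. Run this script once inside
# the VM and once on the bare-metal host to get the VM-vs-host data points as well.
# "eq_solver", the input file, and the paths are hypothetical placeholders.
import subprocess
import time

TARGETS = {
    "pcie_ssd": r"D:\sims\run",   # normal working directory on the PCIe SSD
    "ramdrive": r"R:\sims\run",   # e.g. a 10 GB RAM disk mounted as R:
}

def timed_run(workdir: str) -> float:
    """Run one analysis writing its results to workdir and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(["eq_solver", "--input", "eq_01.acc", "--out", workdir], check=True)
    return time.perf_counter() - start

for name, path in TARGETS.items():
    print(f"{name}: {timed_run(path):.1f} s")
```

Running the same script inside the VM and then directly on the host isolates the virtualization overhead, while the RAMdrive row shows how storage-bound the run actually is.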
> From my testing, PCIe SSDs suffer pretty badly under a VM (though my only test subject was a Samsung 980 Pro in an Ivy Bridge-E server). I can't say for sure whether enterprise PCIe SSDs on a newer server would be held back by the virtualization overhead. The easiest way to test would be to run the software directly on the server without a VM and see if the computation times improve.

Funny you say that: I have an intern this summer who will be doing some benchmarking for me on these servers, just to understand where we currently stand. One of the parameters we wanted to study was, as you mentioned, the impact of storage speed on the analysis runtime. The IT department doesn't believe that the VMs introduce a slowdown, so this is one way to verify that...