[Image comparison: Vanilla Zen4 latency vs. X3D Zen4 latency]
All this extra gaming performance in Zen4 X3D comes from this little red square
"Instructions Per Cycle" means instructions per cycle. Is that so hard to memorize?
Edit, as an example, when one processor spins on a lock for 0.2 ms, and the other for 0.3 ms, which of the two processors got the higher Instructions Per Cycle count?
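A minimal Python sketch of the point in that question, with made-up numbers (every constant is an assumption, not a measurement): IPC is retired instructions divided by core cycles, so instructions retired while spinning on a lock count the same as useful ones. If the spin loop's own IPC is higher than that of the useful code, the CPU that spins longer reports the higher IPC.

```python
CLOCK_MHZ = 5000  # assumed 5 GHz for both CPUs, i.e. 5000 cycles per microsecond

# Hypothetical useful work, identical on both CPUs.
USEFUL_INSTRUCTIONS = 1_000_000
USEFUL_CYCLES = 800_000  # IPC 1.25 for the useful part

SPIN_IPC = 3  # assumed IPC of the spin loop itself; loops using PAUSE may retire far fewer


def ipc(instructions: int, cycles: int) -> float:
    """Instructions Per Cycle: retired instructions divided by core clock cycles."""
    return instructions / cycles


def total_ipc(spin_us: int) -> float:
    """Measured IPC over the useful work plus spin_us microseconds of lock spinning."""
    spin_cycles = spin_us * CLOCK_MHZ
    spin_instructions = spin_cycles * SPIN_IPC
    return ipc(USEFUL_INSTRUCTIONS + spin_instructions, USEFUL_CYCLES + spin_cycles)


ipc_a = total_ipc(200)  # spins for 0.2 ms
ipc_b = total_ipc(300)  # spins for 0.3 ms
print(f"CPU A: {ipc_a:.2f}, CPU B: {ipc_b:.2f}")  # CPU A: 2.22, CPU B: 2.39
```

Under these assumptions CPU B wastes 0.1 ms more than CPU A yet shows the higher IPC, which is why IPC alone says little about how much useful work got done.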
"Are those caches clocking higher?"
Yea.
"lower latency"
And you gotta pay 4 cycles for V$ latching.
Why does the Zen4 show higher bandwidth / lower latency below 32MB? Are those caches clocking higher?
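To put the quoted 4-cycle V$ penalty into wall-clock terms: a latency that is fixed in core cycles costs more nanoseconds at a lower clock, so a part that clocks lower also shows higher cache latency in ns. A minimal sketch; the clock values are illustrative assumptions rather than specific SKUs, and it assumes the cache is clocked together with the cores.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


EXTRA_V_CACHE_CYCLES = 4  # penalty quoted in the thread

# Illustrative clocks only; assumes the cache runs at core clock.
for ghz in (4.5, 5.0, 5.7):
    extra_ns = cycles_to_ns(EXTRA_V_CACHE_CYCLES, ghz)
    print(f"{ghz} GHz: +{extra_ns:.2f} ns")
```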
"IPC can be calculated for games too"
By looking at CPU performance counters?
Or by looking at Frames Per Second on the display output?
That was the point of #10,817, basically.
"Instructions Per Frame are not a constant."
Old single-threaded game engines were synced to fps. In such a case, instructions per frame are pretty much constant. Multithreaded engines run the game/physics engines asynchronously to the rendering engine, so instructions aren't totally tied to fps, but for the visual part, which is what fps measures, they still pretty much are, at least if there are enough threads not to stall the visual side.
Prove me wrong. :-)
(Or don't. Somebody requested this loop to end already a while ago.)
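For the single-threaded, frame-synced case described above, the relationship is fps = IPC × clock / instructions per frame. A minimal Python sketch with hypothetical numbers; it shows that inferring IPC from fps only works if instructions per frame stay constant, which is exactly what is disputed in this exchange.

```python
def fps_from_ipc(ipc: float, clock_hz: float, instructions_per_frame: float) -> float:
    """Frames per second of a CPU-bound, frame-synchronous engine.

    Assumes every frame retires the same number of instructions and that the
    CPU is the only bottleneck (no GPU, memory-bandwidth, or vsync limit).
    """
    return ipc * clock_hz / instructions_per_frame


def ipc_from_fps(fps: float, clock_hz: float, instructions_per_frame: float) -> float:
    """Inverse of fps_from_ipc; only meaningful under the same assumptions."""
    return fps * instructions_per_frame / clock_hz


# Hypothetical: 5 GHz, 25 million instructions per frame, IPC 1.0.
print(fps_from_ipc(1.0, 5e9, 25e6))
```

If one CPU retires extra instructions per frame (spin-waiting, or a multithreaded engine doing more simulation work), the fps it delivers no longer maps back to its IPC.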
So we are down to throwing as much spaghetti (G-Rated) as possible against the wall to see what sticks, then claiming 100% accuracy in prediction. Seriously, how bent does one have to be?
AMD Ryzen 9000 CPUs rumored to only have 10% IPC boost - but let's not panic over Zen 5 yet
We'll need more seasoning than usual with this claim from out of left-field that Zen 5 processors will only usher in a 10% generational IPC increase.
www.tweaktown.com
AMD Zen 5 CPUs Rumored To Feature Around 10% IPC Increase, Slightly More In Cinebench R23 Single-Thread Test
AMD's Zen 5 CPUs are rumored to feature an IPC increase of around 10% with the latest core architecture powering next-gen Ryzen & EPYC chips.
wccftech.com
So we now have:
- 9800X: 8 cores, 170 W TDP
- Clock regression: ~100 MHz
- IPC: ~10% over Zen 4 (new)
OMG. I strongly recommend that Mike Clark not wake up anytime soon and keep sleeping until Zen 6.
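Taken at face value, the rumored ~10% IPC gain and ~100 MHz clock regression roughly multiply. A minimal sketch of that arithmetic; the 5.4 GHz baseline boost clock is an assumed value, and so is the premise that performance scales linearly with clock.

```python
def net_single_thread_gain(ipc_gain: float, base_clock_ghz: float, clock_delta_ghz: float) -> float:
    """Relative single-thread performance, assuming performance = IPC x clock.

    Linear clock scaling is an assumption; memory-bound code scales less.
    """
    return (1.0 + ipc_gain) * (base_clock_ghz + clock_delta_ghz) / base_clock_ghz


# Rumored figures from the list above; the 5.4 GHz baseline is an assumption.
gain = net_single_thread_gain(0.10, 5.4, -0.1)
print(f"net change: {(gain - 1) * 100:+.1f}%")  # net change: +8.0%
```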
My take I guess is just my own, but it's close to yours. I think 2 points are shared. Efficiency is key for us, and AVX-512 helps a lot.
No, something must have gotten mixed up.
- Zen 1 -> Zen 2: circa double the FP throughput per core, circa double the throughput/Watt
- Zen 2 -> Zen 3: some throughput increase but barely any throughput/Watt increase in most cases; big benefit to special multithreaded workloads which have a cache footprint larger than 16 MB
- Zen 3 -> Zen 4: notably higher throughput and throughput/Watt; additional performance increase in vectorized FP workloads

All of the above is in various Distributed Computing applications. (These are highly parallel, almost entirely compute-bound, power-limited workloads with an FP focus. One could conclude that the manufacturing node updates are all that counts in this set of workloads. But really, microarchitecture updates <edit: and SOC updates> and node updates go hand in hand, as they enable and leverage each other.)
[I don't have Zen 1/Naples (but I do have Broadwell-EP, which has similar throughput/Watt), nor do I have Zen 3 myself. I do have Zen 2/Rome and Zen 4/Genoa in machines which are configured with the same core counts and similar power budgets. My conclusions relative to Zen 1 and Zen 3 rely on what I have seen from others' computers.]
Zen 5 in Distributed Computing? I trust that AMD carves out a decent perf/W update once again, despite only a minor manufacturing node update. But how much? Various hints earlier in this thread sounded promising to me. So far, though, 1T and/or iso-clock and/or integer performance characteristics have been more of a focus in this thread than nT iso-power FP.
Actually, SMT does measurably improve throughput in PrimeGrid on Zen 4, desktop and server, and does improve perf/W slightly. In contrast, on Zen 2 and Zen 3, SMT usage in PrimeGrid provides no host throughput advantage, or sometimes a small one, but always reduces perf/W. (PrimeGrid is vectorized FP with a large cache footprint, but not too large on Zen 3 and 4 if the user gives hints to the OS's process scheduler. Zen 2's cache is too small for many, but not all, of PrimeGrid's currently active projects.)
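One way to give the scheduler such hints on Linux is to restrict each process to the logical CPUs that share one L3 cache (one CCX on Zen), so a task's working set is not split across caches. A minimal Python sketch, not necessarily what the poster does: it reads the L3 grouping from sysfs (`/sys/devices/system/cpu/cpu*/cache/index*/shared_cpu_list`) and applies it with `os.sched_setaffinity`. The helper names and the PID in the usage comment are hypothetical; `taskset -c` does the same from a shell.

```python
import os
from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux CPU list such as '0-7,16-23' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def l3_groups(sysfs: Path = Path("/sys/devices/system/cpu")) -> list[frozenset[int]]:
    """Return the distinct sets of logical CPUs that share an L3 cache (Linux only)."""
    groups: set[frozenset[int]] = set()
    for cache_dir in sysfs.glob("cpu[0-9]*/cache/index[0-9]*"):
        level_file = cache_dir / "level"
        if not level_file.exists() or level_file.read_text().strip() != "3":
            continue
        shared = (cache_dir / "shared_cpu_list").read_text()
        groups.add(frozenset(parse_cpu_list(shared)))
    return sorted(groups, key=min)


def pin_to_l3_group(pid: int, group_index: int) -> None:
    """Restrict a process to the logical CPUs of one L3 group (one CCX on Zen)."""
    os.sched_setaffinity(pid, l3_groups()[group_index])


# Usage (hypothetical PID): keep one worker on the first L3 domain.
# pin_to_l3_group(12345, 0)
```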
"My take I guess is just my own, but it's close to yours."
Well, not owning Zen 1 and Zen 3 myself, I don't ultimately trust my own assessments of them. Though back in the day, the Zen 1 --> 2 step evidently was a big one in perf/host and perf/W thanks to the GloFo 14nm --> TSMC 7nm switch, but not only due to that, as the Zen 2 core and SOC update was far from a straightforward shrink.
"Efficiency is key for us,"
Yep, as the aggregate core count in the household reaches certain above-average levels, and many of these cores are actually used 24/7 (be it for Citizen Science or for engineering jobs etc.), small things like the electric bill, the heat load in the home, or which computer to attach to which power circuit do become more of a concern. I find myself thinking more often in terms of perf/host and perf/W than perf/core. So, while the (alas rather circular) iso-clock performance discussions here in this thread (vulgo: IPC) are surely interesting, what I am looking forward to more is to eventually get to see perf/W figures.
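The per-host, per-core, and per-Watt views can disagree for the same machine. A minimal sketch with placeholder numbers (not measurements from this thread), shaped like the Zen 4 SMT case described earlier: a small throughput gain at slightly higher wall power can still improve perf/W. The `HostRun` type is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class HostRun:
    """One configuration of a compute host; all numbers are user-supplied."""
    name: str
    tasks_per_day: float  # host throughput, e.g. completed work units per day
    cores: int
    wall_watts: float     # average power at the wall while running

    @property
    def perf_per_core(self) -> float:
        return self.tasks_per_day / self.cores

    @property
    def perf_per_watt(self) -> float:
        # Tasks per day per Watt; proportional to tasks per kWh.
        return self.tasks_per_day / self.wall_watts


# Placeholder numbers: the same 16-core host with SMT off and on.
smt_off = HostRun("SMT off", tasks_per_day=100.0, cores=16, wall_watts=250.0)
smt_on = HostRun("SMT on", tasks_per_day=108.0, cores=16, wall_watts=265.0)

for run in (smt_off, smt_on):
    print(f"{run.name}: {run.tasks_per_day:.1f}/host, "
          f"{run.perf_per_core:.2f}/core, {run.perf_per_watt:.3f}/W")
```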
"what I am looking forward to more is to eventually get to see perf/W figures."
For MT workloads primarily? Ideally there should have been models with E cores for that, if perf/W was important. But it looks like we won't get that for Zen5 on DT at least.
"For MT workloads primarily? Ideally there should have been models with E cores for that, if perf/W was important. But it looks like we won't get that for Zen5 on DT at least."
Says who?
"Ideally there should have been models with E cores for that"
You can't do HMP in DC.
"otherwise, stop being poor"
Dang gone it... I've got to learn that trick!
"Any idea if Zen5 has (Intel) AMX?"
Why would it?
"you can't do HMP in DC. otherwise, stop being poor and buy Turin-D then?"
What's the max cores in Turin-D?
"What's the max cores in Turin-D?"
192c.
I guess it depends on what workloads you are running. Most people with DT systems do not have them mounted in racks, so space is not really a concern.
For compute nodes,
– CPUs with cores of uneven per-core performance,
– area-optimized cores
are not attractive. You'd want
+ CPUs with homogeneous cores,
+ cores and SOCs which are optimized towards a certain point between the three targets performance, performance efficiency, and performance density.
The particular location of the optimization sweet spot depends on your cost structure (e.g. whether or not there are software licensing costs involved; whether or not rack space is at a premium to you…).
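The dependence on cost structure can be made concrete with a simple linear cost-per-work model. A minimal sketch; `NodeOption`, `cost_per_unit`, and every number in it are hypothetical placeholders, not real products or measurements, and the model ignores non-linear effects such as scaling limits. With these placeholders, the denser node wins on cost per unit of work until per-core software licensing is added, at which point the node with fewer, faster cores wins.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 8760


@dataclass
class NodeOption:
    name: str
    cores: int
    units_per_hour: float  # throughput on your own workload
    watts: float           # average wall power under load
    hardware_cost: float
    rack_units: int


def cost_per_unit(node: NodeOption, *, years: float, price_per_kwh: float,
                  license_per_core_year: float, cost_per_ru_year: float) -> float:
    """Annual cost of ownership divided by annual work done (simple linear model)."""
    hardware = node.hardware_cost / years
    energy = node.watts * HOURS_PER_YEAR / 1000 * price_per_kwh
    licenses = license_per_core_year * node.cores
    rack = cost_per_ru_year * node.rack_units
    return (hardware + energy + licenses + rack) / (node.units_per_hour * HOURS_PER_YEAR)


# Placeholder options, not real products: fewer fast cores vs. more dense cores.
fast = NodeOption("fewer fast cores", 64, 100.0, 700.0, 20_000.0, 1)
dense = NodeOption("more dense cores", 128, 130.0, 700.0, 22_000.0, 1)

for license_cost in (0.0, 200.0):
    costs = {n.name: cost_per_unit(n, years=4, price_per_kwh=0.30,
                                   license_per_core_year=license_cost,
                                   cost_per_ru_year=1_000.0)
             for n in (fast, dense)}
    print(f"license {license_cost}/core/year: cheapest is {min(costs, key=costs.get)}")
```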
Edit: that's also true for home computers, if used for computing in the narrower sense, "HPC at home" if you will. E.g. when I built my first two dual-socket computers a while back, I needed not just plain perf/dollar (which would have been much better with desktop computers) but also perf/node (due to synchronization overhead in my application, which was too high over Ethernet for my purpose) and perf/core (due to scaling difficulties in this application). If CPUs with "e cores" had been available back then, they would not have been what I needed, due to the latter aspect. Edit 2: nowadays I have accumulated enough computers that "rack space" (shelf space, actually) is definitely a criterion for me too. (Energy consumption more so, though.)
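The post does not name a scaling model; Amdahl's law is swapped in here as a standard illustration of why perf/core matters when an application scales poorly. A minimal sketch with hypothetical core counts, per-core speeds, and serial fractions: with a 10% serial fraction, 16 fast cores beat 32 cores that are each 40% slower, while at 1% the many-core option edges ahead.

```python
def amdahl_speedup(serial_fraction: float, cores: int, per_core_speed: float = 1.0) -> float:
    """Speedup vs. one reference core under Amdahl's law.

    per_core_speed scales both the serial and the parallel part (1.0 = reference core).
    """
    return per_core_speed / (serial_fraction + (1.0 - serial_fraction) / cores)


# Hypothetical: 16 fast cores vs. 32 cores at 60% of the per-core speed.
for s in (0.10, 0.01):
    fast = amdahl_speedup(s, 16, 1.0)
    many = amdahl_speedup(s, 32, 0.6)
    print(f"serial {s:.0%}: 16 fast cores {fast:.1f}x, 32 slower cores {many:.1f}x")
```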
"For use cases with above ~8C, you want max MT perf, max perf/watt, and max E core count for lowest price."
Nope, that's only ever useful for cinememe.