Discussion AWS Graviton2 64 vCPU Arm CPU Heightens War of Intel Betrayal

exquisitechar · Dec 3, 2019

https://www.servethehome.com/aws-graviton2-64-core-arm-cpu-heightens-war-of-intel-betrayal/

Pretty big deal for ARM in servers. Interested in seeing a comparison between this and Rome.

DrMrLordX · Mar 30, 2020

CHADBOGA said:
It is always good to have an Internet Strong Man around.

As long as the Strong Man isn't too strong. Like that one, you know . . . guy.

amrnuke · Mar 30, 2020

Richie Rich said:
Imagine ARM competition for AMD Renoir 8c/16t for laptops. They can use 16x core A77 (ARM core has half area so resulting in same total area) running it at 2.5GHz and still get performance similar to Renoir clocked at 5GHz (and that's not even possible) while having 4x lower TDP. And that's the ARM power for which x86 has no answer.

Care to share your data supporting this situation?

ksec · Mar 30, 2020

Wait a min, I am not following, are we in discussion that N1 from Graviton is not good enough to complete with x86 single thread / core?

Or do we simply have a disagreement whether ARM is so good that it will take over the ( Server and PC ) world ?

And cant we agree that while N1 could in theory compete with x86 in single core, it doesn't necessarily mean it will take over the world? That is a far too simplistic view.

Richie Rich · Mar 30, 2020

amrnuke said:
Care to share your data supporting this situation?

ARM's Neoverse N1 is base on A76 (1.2mm2 512kb L2$, 1.4mm2 1MB L2$). You can read AnandTech article about that. https://www.anandtech.com/show/13959/arm-announces-neoverse-n1-platform/2
A77 is 17% more transistors so 1.4 and 1.6mm2 for different L2$ size.
Zen2 core was measured about 3.6mm2 (512kB L2$).

03_Infra%20Tech%20Day%202019_Filippo%20Neoverse%20N1%20FINAL%20WM15.jpg

Elfear · Mar 30, 2020

Richie Rich said:
ARM's Neoverse N1 is base on A76 (1.2mm2 512kb L2$, 1.4mm2 1MB L2$). You can read AnandTech article about that. https://www.anandtech.com/show/13959/arm-announces-neoverse-n1-platform/2
A77 is 17% more transistors so 1.4 and 1.6mm2 for different L2$ size.
Zen2 core was measured about 3.6mm2 (512kB L2$).

Get back to me when I can go out and buy an ARM competitor for my desktop.

rainy · Mar 30, 2020

DrMrLordX said:
As long as the Strong Man isn't too strong. Like that one, you know . . . guy.

He's pretty similar to juanrga - I see exactly the same fixation on ARM architecture.

NTMBK · Mar 30, 2020

Elfear said:
Get back to me when I can go out and buy an ARM competitor for my desktop.

Why would the ARM vendors bother? It's not a terribly lucrative niche. There's a reason the only desktop chips today are either overclocked laptop chips or cut down server chips.

amrnuke · Mar 30, 2020

Richie Rich said:
ARM's Neoverse N1 is base on A76 (1.2mm2 512kb L2$, 1.4mm2 1MB L2$). You can read AnandTech article about that. https://www.anandtech.com/show/13959/arm-announces-neoverse-n1-platform/2
A77 is 17% more transistors so 1.4 and 1.6mm2 for different L2$ size.
Zen2 core was measured about 3.6mm2 (512kB L2$).

This does not at all support your conclusion that a 16 core A77 clocked at 2.5 GHz would have the same "performance" (whatever you mean by that - gaming? photo work? rendering? browsing?) as Renoir 8c/16t clocked at 5 GHz while still having 4 times lower TDP.

I asked you to support your statement about performance, and you gave me information on die size and transistors and cache.

Again, I ask you to support your statement with some kind of hard data, some benchmark, some comparative evaluation of actual performance.

I'm not saying you're wrong (how could I prove that? There are no 5 GHz Renoir chips nor any 16 core A77-based CPUs). But I'm saying you need to produce some data before you just start throwing claims around.

amrnuke · Mar 30, 2020

NTMBK said:
Why would the ARM vendors bother? It's not a terribly lucrative niche. There's a reason the only desktop chips today are either overclocked laptop chips or cut down server chips.

Fair point. One could even argue that Renoir is also just a cut down server CPU uarch (ie, Zen2 CCXs) taped onto silicon next to some 7nm Vega CUs.

Richie Rich · Mar 30, 2020

amrnuke said:
This does not at all support your conclusion that a 16 core A77 clocked at 2.5 GHz would have the same "performance" (whatever you mean by that - gaming? photo work? rendering? browsing?) as Renoir clocked at 5 GHz while still having 4 times lower TDP.

I asked you to support your statement about performance, and you gave me information on die size and transistors.

Again, I ask you to support your statement with some kind of hard data, some benchmark, some comparative evaluation of actual performance.

Ok, let me explain it in more details.

Assuming same IPC A77 vs. Zen2 (A77 is a bit faster in SPECint2006 but lets forget that now, it depends on SW).
Renoir 8c at 5 GHz vs. 16c A77 at 2.5 GHz..... identical MT performance for both.
SMT lowers performance per thread to about half so it's now equal to A77 at half frequency. Performance per thread is identical for both.

Half frequency is about 4x lower power consumption. (double A77 cores neutralize ARM's double energy efficiency)

Summary: Same performance per thread while 4x lower power consumption.
But in reality Renoir cannot be clocked at 5GHz physically, not speaking about TDP in laptop, while A77 can go to 2,8Ghz easily. Unfortunately there is no such a 16 core A77 CPU while Renoir is on market. That's the little catch

coercitiv · Mar 30, 2020

You heard it here first folks, 16c Ryzen @ 2.5Ghz is equivalent to 8c Ryzen @ 5Ghz in consumer workloads.

lobz · Mar 30, 2020

coercitiv said:
You heard it here first folks, 16c Ryzen @ 2.5Ghz is equivalent to 8c Ryzen @ 5Ghz in consumer workloads.

Not to mention the discovery of @name99 - every monkey can make CPUs with a lot of cores, it's not that hard.

Markfw · Mar 30, 2020

coercitiv said:
You heard it here first folks, 16c Ryzen @ 2.5Ghz is equivalent to 8c Ryzen @ 5Ghz in consumer workloads.

Yea, I am really getting sick of the ARM advocates, as many of them have no clue about reality.

amrnuke · Mar 30, 2020

Richie Rich said:
Ok, let me explain it in more details.

Assuming same IPC A77 vs. Zen2 (A77 is a bit faster in SPECint2006 but lets forget that now, it depends on SW).

SPECint2006
A77 at 2.6 GHz is estimated to be about 31.66 (link)
3900X at 4.6 GHz is 52.12 (link)
EPYC 7742 at 3.4 GHz is 41.9 (link)
= A77 IPC per GHz is 12.18
= 3900X IPC per GHz is 11.33
= EPYC 7742 IPC per GHz is 12.32
Let's call it even.

Richie Rich said:
Renoir 8c at 5 GHz vs. 16c A77 at 2.5 GHz..... identical MT performance for both.

That's a huge jump. Assuming one can just build out an A77 to perform in real, actual multithreaded applications in the laptop/desktop/HEDT market, which has never, not once, ever been done.

Richie Rich said:
SMT lowers performance per thread to about half so it's now equal to A77 at half frequency. Performance per thread is identical for both.

That's just entirely untrue.

1) Disabling SMT has a very small performance difference in single-threaded applications

2) SMT vs non-SMT tests show that SMT cores achieve somewhere between 54% and 82% of what would be expected from the addition of a true extra core (heavily threaded apps including wPrime, CBR20, Blender, Corona, Keyshot, MySQL, 7z-decompression are all used in this calculation). On average, an SMT core is worth about 66% of a real core.

3900X	SMT off	SMT on	12c/12t per-thread score	12c/24t per-thread score	Expected 24c/24t time/per-thread score	% of expected for added threads
wPrime	82.39	56.59	n/a	n/a	41.195	62.63
CBR20	5553.2	7260.3	462.767	302.512	462.767	65.37
Blender	229.79	156.92	n/a	n/a	114.895	63.42
Corona	176.8	129	n/a	n/a	88.4	54.07
Keyshot	208.4	303.6	17.367	12.650	17.367	72.84
MySQL	220741	277754	18395.083	11573.083	18395.083	62.91
7z-decomp	53722	88438	4476.833	3684.917	4476.833	82.31
					Average	66.22

So we can derive expected performance of an virtual cores when compared to real cores, as (# of real cores) + (# of virtual cores * 0.5 * 0.66).

E.g. if we are comparing 3900X with SMT disabled to a 3600 with SMT on, assuming both are clocked the same:

3600 (6c/12t) relative performance to 3900X (12c/12t) = 6 + (12 * 0.5 * 0.66) = 9.66 "real" cores vs 12 real cores

So it's not a 50% performance hit. It's more like a 1 - ( 9.66 / 12 ) = 19.5% performance hit compared to using real cores.

Richie Rich said:
Half frequency is about 4x lower power consumption. (double A77 cores neutralize ARM's double energy efficiency)

You're shoving Zen2 WAAAAAY up the voltage-frequency curve from 4.2 GHz (4800H, for example) to 5 GHz while pushing A77 DOWN the voltage-frequency curve from 2.6 to 2.5 GHz. That's unfair when you're wanting to compare power consumption.

If we just take them as they are, and assume the A77 scales up to 16 cores perfectly:

Zen2 = ~11.8 (12.3 + 11.3 / 2) IPC / GHz score
A77 = ~12.2 IPC / GHz score

Renoir 4800 at 4.2 GHz = 11.8 * 4.2 = 49.56 for each core
A77 i at 2.6 GHz = 12.18 * 2.6 = 31.66 for each core (as per the above)

Since we know from my above calculations that enabling SMT only results in a 20% performance hit compared to using a real core, we can easily extrapolate this out.

Renoir 4800 = 49.56 * (8 real cores) + (16 virtual cores * 0.5 * 0.66) = 658.16
A77 i = 31.66 * 16 real cores = 506.56
Renoir performance per thread lead would be 30% even though half of those threads aren't even "real" cores!

Richie Rich said:
Summary: Same performance per thread while 4x lower power consumption.

Summary: You have overestimated the penalty for using SMT, and then misapplied it.

As for power consumption, 4800 has a 10-45W TDP. I don't know what a 16 core A77 i would have in power consumption. But if we match it up against a 4800U at 10 watts TDP, I doubt a 16 core A77 would have the 2 watt TDP it would need to quadruple efficiency.

ksec · Mar 30, 2020

Markfw said:
Yea, I am really getting sick of the ARM advocates, as many of them have no clue about reality.

Let's be fair, it really isn't ARM's advocates that is the problem, it is the lack of understanding, gap in knowledge and unwillingness to learn. You see that with Intel and AMD advocates as well.

I think we need a term for these people.

( Edit: I am still waiting for TSMC to ship 5nm to prove my point. )

DrMrLordX · Mar 30, 2020

rainy said:
He's pretty similar to REDACTED - I see exactly the same fixation on ARM architecture.

Do not speak the naaaame!

Unlike our resident "Strong Man", he-who-must-not-be-named can smite an entire forum through sheer force of will. We would surely perish in his presence. Our resident "Strong Man"/troll is persistently foolish. That is all.

NTMBK said:
Why would the ARM vendors bother? It's not a terribly lucrative niche. There's a reason the only desktop chips today are either overclocked laptop chips or cut down server chips.

Interesting question! I think Linus Torvalds wrote a good piece on the importance of having consumer ARM hardware available for developers:

Real World Technologies - Forums - Thread: ARM announces Ares

www.realworldtech.com

We will see if that has any bearing on the future of ARM. But he has a point.

name99 · Mar 30, 2020

amrnuke said:
SPECint2006
A77 at 2.6 GHz is estimated to be about 31.66 (link)
3900X at 4.6 GHz is 52.12 (link)
EPYC 7742 at 3.4 GHz is 41.9 (link)
= A77 IPC per GHz is 12.18
= 3900X IPC per GHz is 11.33
= EPYC 7742 IPC per GHz is 12.32
Let's call it even.

That's a huge jump. Assuming one can just build out an A77 to perform in real, actual multithreaded applications in the laptop/desktop/HEDT market, which has never, not once, ever been done.

That's just entirely untrue.

1) Disabling SMT has a very small performance difference in single-threaded applications

2) SMT vs non-SMT tests show that SMT cores achieve somewhere between 54% and 82% of what would be expected from the addition of a true extra core (heavily threaded apps including wPrime, CBR20, Blender, Corona, Keyshot, MySQL, 7z-decompression are all used in this calculation). On average, an SMT core is worth about 66% of a real core.

3900X SMT off SMT on 12c/12t per-thread score 12c/24t per-thread score Expected 24c/24t time/per-thread score % of expected for added threads
wPrime 82.39 56.59 n/a n/a 41.195 62.63
CBR20 5553.2 7260.3 462.767 302.512 462.767 65.37
Blender 229.79 156.92 n/a n/a 114.895 63.42
Corona 176.8 129 n/a n/a 88.4 54.07
Keyshot 208.4 303.6 17.367 12.650 17.367 72.84
MySQL 220741 277754 18395.083 11573.083 18395.083 62.91
7z-decomp 53722 88438 4476.833 3684.917 4476.833 82.31
Average 66.22

So we can derive expected performance of an virtual cores when compared to real cores, as (# of real cores) + (# of virtual cores * 0.5 * 0.66).

E.g. if we are comparing 3900X with SMT disabled to a 3600 with SMT on, assuming both are clocked the same:

3600 (6c/12t) relative performance to 3900X (12c/12t) = 6 + (12 * 0.5 * 0.66) = 9.66 "real" cores vs 12 real cores

So it's not a 50% performance hit. It's more like a 1 - ( 9.66 / 12 ) = 19.5% performance hit compared to using real cores.

You're shoving Zen2 WAAAAAY up the voltage-frequency curve from 4.2 GHz (4800H, for example) to 5 GHz while pushing A77 DOWN the voltage-frequency curve from 2.6 to 2.5 GHz. That's unfair when you're wanting to compare power consumption.

If we just take them as they are, and assume the A77 scales up to 16 cores perfectly:

Zen2 = ~11.8 (12.3 + 11.3 / 2) IPC / GHz score
A77 = ~12.2 IPC / GHz score

Renoir 4800 at 4.2 GHz = 11.8 * 4.2 = 49.56 for each core
A77 i at 2.6 GHz = 12.18 * 2.6 = 31.66 for each core (as per the above)

Since we know from my above calculations that enabling SMT only results in a 20% performance hit compared to using a real core, we can easily extrapolate this out.

Renoir 4800 = 49.56 * (8 real cores) + (16 virtual cores * 0.5 * 0.66) = 658.16
A77 i = 31.66 * 16 real cores = 506.56
Renoir performance per thread lead would be 30% even though half of those threads aren't even "real" cores!

Summary: You have overestimated the penalty for using SMT, and then misapplied it.

As for power consumption, 4800 has a 10-45W TDP. I don't know what a 16 core A77 i would have in power consumption. But if we match it up against a 4800U at 10 watts TDP, I doubt a 16 core A77 would have the 2 watt TDP it would need to quadruple efficiency.

There's a much simpler way to say it. Essentially what you are asserting is that for Intel SMT2 a second thread is equivalent to about 1/3 of a core. (At least I think that's what you're saying, I'm not interested enough to validate your arithmetic and try to reverse engineer what you're calculating).

Now is this true? I'd say that it's way too optimistic in general.
Over a wider range of benchmarks, I'd say SMT worth 25% of a core is a better approximation.
To do better than this requires code that
- doesn't spend all its time in a single execution unit (usually SIMD).
- doesn't utilize memory in "normal" ways.

The first is obvious -- if thread A is using the SIMD unit(s) on 90% of cycles, and thread B wants to do the same, then both are going to run at close to half speed.
The second is less obvious (and a prime reason why SMT just never gets much better, no matter how much proponents try t push it). For most code, the single biggest bottleneck is the L1 cache. But with SMT under most conditions you're now halving the effective size of that cache :-( (and other "cache like" structures like branch tables). What you lose from that takes away much of the win you might hope for from a naive analysis of SMT.

I don't know many of the benchmarks listed. But I expect most of them take the form of extreme computation (with little reference to memory) while not using much AVX.
What happens when you get rid of those assumptions? Well then you get something like this:

AMD Ryzen 9 3900X, SMT on vs SMT off, vs Intel 9900K

By community request, we present our findings on how the AMD Ryzen 9 3900X performs with SMT disabled. This approach has potential, especially for gaming, because it ensures more physical hardware units are available for each thread, and could also benefit the processor's power management.

www.techpowerup.com

Sometimes great -- and sometimes basically nothing...

If you want an extra 25% throughput, you can add SMT. Or you can add an ARM small core (which generally has about the same performance compared to the big core). The ARM cores small are small enough they're basically lost in the chip area noise (even the Apple ones are small.)
You lose the SYMMETRY of SMT, sure. But you also lose the INSECURITY of SMT. And you gain the optionality of lower power.

SMT isn't some superpower that x86 has and ARM does not. It's ONE way of increasing throughput at low area. Not the only way, not a great solution for some purposes. And of course should a company think it does make sense for their products (Broadcom, now Marvell) it can be added easily enough.
(Personally I think this was a dumb decision by Marvell, both generically and wrt details of how they did it. But it's their company not mine, we'll see the consequences soon enough. Anyone else in the ARM space can also add it -- and maybe they will, hopefully done right rather than done dumb.)

coercitiv · Mar 31, 2020

name99 said:
If you want an extra 25% throughput, you can add SMT. Or you can add an ARM small core (which generally has about the same performance compared to the big core). The ARM cores small are small enough they're basically lost in the chip area noise (even the Apple ones are small.)
You lose the SYMMETRY of SMT, sure. But you also lose the INSECURITY of SMT. And you gain the optionality of lower power.

While this may look true and completely valid at first sight, one immediately asks the next logical question: why stop at 25% throughput increase when the small cores are basically "lost in the chip area noise"? Why not go for 50% or even 75%?

Sounds to me like that SYMMETRY may be much more important than your think.

Nothingness · Mar 31, 2020

Markfw said:
Yea, I am really getting sick of the ARM advocates, as many of them have no clue about reality.

I have the same issue with AMD fanatics you know. Or Intel ones.

Just ignore ARM threads like many ignore threads where AMD advocates are making claims that prove they have no clue about reality.

Nothingness · Mar 31, 2020

ksec said:
Let's be fair, it really isn't ARM's advocates that is the problem, it is the lack of understanding, gap in knowledge and unwillingness to learn. You see that with Intel and AMD advocates as well.

I think we need a term for these people.

I call them fanbois but it's considered as an insult on this forum

Someone used fanatics and I like it. Would that be OK with forum rules?

Richie Rich · Mar 31, 2020

amrnuke said:
On average, an SMT core is worth about 66% of a real core.
3600 (6c/12t) relative performance to 3900X (12c/12t) = 6 + (12 * 0.5 * 0.66) = 9.66 "real" cores vs 12 real cores
So it's not a 50% performance hit. It's more like a 1 - ( 9.66 / 12 ) = 19.5% performance hit compared to using real cores.

I do not see such a huge SMT benefit on my 3700X. Coercitiv and Markfw experience average 66% SMT benefit too?

I like any calculation because it's hard to bend numbers. But if you do math with incorrect input data such as assumption that SMT brings between 66-80% more performance, then no wonder Renoir wins everywhere. I think if you "tune" the input data even further you may get Renoir into TOP500 supercomputer rank too

DrMrLordX · Mar 31, 2020

SMT is generally not going to add +66% throughput. AMD's implementation is very good, so you might see +40% in some benchmarks, especially where AVX2 is not in use.

Nothingness · Mar 31, 2020

amrnuke said:
1) Disabling SMT has a very small performance difference in single-threaded applications

That's correct. That used to be true in early Intel chips where some HW resources were statically partitioned rather than dynamically. But this has been fixed long ago.

2) SMT vs non-SMT tests show that SMT cores achieve somewhere between 54% and 82% of what would be expected from the addition of a true extra core (heavily threaded apps including wPrime, CBR20, Blender, Corona, Keyshot, MySQL, 7z-decompression are all used in this calculation). On average, an SMT core is worth about 66% of a real core.

3900X SMT off SMT on 12c/12t per-thread score 12c/24t per-thread score Expected 24c/24t time/per-thread score % of expected for added threads
wPrime 82.39 56.59 n/a n/a 41.195 62.63
CBR20 5553.2 7260.3 462.767 302.512 462.767 65.37
Blender 229.79 156.92 n/a n/a 114.895 63.42
Corona 176.8 129 n/a n/a 88.4 54.07
Keyshot 208.4 303.6 17.367 12.650 17.367 72.84
MySQL 220741 277754 18395.083 11573.083 18395.083 62.91
7z-decomp 53722 88438 4476.833 3684.917 4476.833 82.31
Average 66.22

I hate it when people don't cite their source: https://www.techpowerup.com/review/amd-ryzen-9-3900x-smt-off-vs-intel-9900k/3.html

You excluded many of the tests. Some are explicitly stated as being singlethreaded so I can understand. Are you sure the others are not?

Any workload that has SMT on faster than SMT off has to be multithreaded right? So I would have included all such tests.

Anyway a good study would be one that examines the scaling of apps varying the number of physical cores enabled and turning SMT on/off. I failed to find one

Summary: You have overestimated the penalty for using SMT, and then misapplied it.

IMHO you're making it look better than what it is. The "truth" likely lies in between.

EDIT: Forgot to say, that I find the results you showed excellent (better than what I was expecting). Thanks for sharing

That's convincing me even more my next desktop will be AMD-based.

amrnuke · Mar 31, 2020

name99 said:
There's a much simpler way to say it. Essentially what you are asserting is that for Intel SMT2 a second thread is equivalent to about 1/3 of a core. (At least I think that's what you're saying, I'm not interested enough to validate your arithmetic and try to reverse engineer what you're calculating).
[...]

AMD Ryzen 9 3900X, SMT on vs SMT off, vs Intel 9900K

By community request, we present our findings on how the AMD Ryzen 9 3900X performs with SMT disabled. This approach has potential, especially for gaming, because it ensures more physical hardware units are available for each thread, and could also benefit the processor's power management.

www.techpowerup.com

No, based on the above calculations I did, SMT provides about a 67% benefit for each additional thread added by SMT2. That is, if you compare 3900X SMT off (12 cores/12 threads) vs SMT on (12 cores/24 threads) those additional 12 threads add about 8 "cores" of performance instead of 12.

About my protocol: I didn't pick the benchmarks willy-nilly or to make things look better. I picked the based on benchmarks that scale well with cores. I did not include several of the benchmarks because they clearly do not scale well with increasing core count or because they were specifically single or very lightly threaded. To vet the benchmarks, I compared 3600X vs 3700X scores (both boost to 4.4 GHz) - the 3700X has 33% more cores, and should see something approximating 33% better performance than the 3600X if the benchmark scales well with cores. There are some limitations, in that both have the same amount of L3$ and I/O bandwidth available despite the 3700X having 33% more cores, so I gave a lot of wiggle room. In most cases I also confirmed by ensuring 3700X vs 3900X scaled up somewhere around 50% (since 3900X has 50% more cores and a 5% boost freq benefit so I actually expected more than that, but gave some wiggle room). There are a lot more details that could explain why cores don't scale well, but when comparing 3600X vs 3700X the benchmark scores were not even scaling 67% of expected (e.g. difference less than 22% for 33% core count increase) then I don't think it's a test that scales well or should be used to verify.

Doesn't scale well with cores (doesn't even come close to 33% score improvement when comparing 3600X vs 3700X):
Unreal Engine 4 - 6% difference
VS C++ - 6.5% difference
Tensorflow - 12.7% difference
Euler3D - 13.8% difference
DigiCortex - 9.4% difference
Tesseract OCR - 14.2% difference
WinRAR compress - 11.5% difference
x265 - 18.5% difference

Specifically single-threaded or VERY lightly threaded
SuperPi
CBR20 Single
Octane
Kraken
WebXPRT
Tensorflow
Office
Photoshop CC
Premiere Pro CC
3dF Zephyr
VMWare Workstation 15
LAME

Special exclusions // what I could have included but didn't for reasons I'll explain:

Java SE 8 is not included because it is a conglomeration of single-threaded and multi-threaded and thus I couldn't justify including it in a comparison that seeks to focus on just multi-threaded questions.

7-Zip compress is not included because the results are inconsistent - a 3900X has 50% more cores and threads than a 3700X, and only saw a 30% boost in compress performance, of which 5% was higher boost clock. This just is an odd test.

VeraCrypt - 3600X actually faster throughput than 3900X and 3700X.

x264 - 27% scaling vs 33% expected on 3600X -> 3700X jump but only 23.5% scaling vs 50%+ expected on 3700X -> 3900X jump.

And if you look at my original post, the worst of the "heavily threaded" apps was Corona, which BARELY made the 22% cutoff (23% scaling with 33% expected 3600X vs 3700X).

In the end, the benefit of SMT in benchmarks largely depends on the benefit of adding cores in that benchmark. So I picked benchmarks that benefit from adding cores, then saw how they do when adding virtual cores.

What I may do over the next couple of days:
For all of the application benchmarks, take the 3600X vs 3700X vs 3900X and compare the per-core benefit to the benefit of SMT-on and SMT-off for the 3900X. That should normalize for apps that don't scale well. I'll still exclude the single-threaded/lightly threaded apps since those make little sense to include, full-stop.

Nothingness · Mar 31, 2020

@amrnuke Thanks for clarifying, much appreciated.

Discussion AWS Graviton2 64 vCPU Arm CPU Heightens War of Intel Betrayal

Senior member

Lifer

Golden Member

Senior member

Senior member

Diamond Member

Senior member

Lifer

Golden Member

Golden Member

Senior member

Diamond Member

Platinum Member

Moderator Emeritus, Elite Member

Golden Member

Senior member

Lifer

Senior member

Diamond Member

Platinum Member

Platinum Member

Senior member

Lifer

Platinum Member

Golden Member

Platinum Member