Discussion Rudi_Float_Bench v0.02a

Jul 27, 2020
25,795
17,865
146
v0.02 with an additional AVX-512 specific binary (it will either crash or exit unexpectedly if the CPU is lacking the necessary ISA extensions): https://drive.google.com/file/d/12RuZsWdhNueu7th2HCuzblBA8CUGFu9u/view?usp=sharing

If it complains about a missing vcruntime140 DLL, install the Visual C++ Redistributable: https://aka.ms/vs/17/release/vc_redist.x64.exe

Older version (not much point in testing it; it is only here for historical purposes): https://drive.google.com/file/d/1l7PU3W0u82iJovpbmJ9FTnhGMHJoVJGw/view?usp=sharing

Thanks to MS_AT and Hail for identifying the compiler issue.

And now for some scores!

1741376894671.jpeg



1741377533902.jpeg

1741377770472.jpeg

1741377112863.png

1741378383546.png

1741378431596.jpeg

1741379685572.png
 
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,092
16,012
136
Well, what about your 64-core AMD Rome?
 

MS_AT

Senior member
Jul 15, 2024
723
1,463
96
So what is this benchmark measuring? Surely not what it claims: those intrinsics are AVX-512 specific, so there is no way it could run on anything but a Xeon.
 
Jul 27, 2020
25,795
17,865
146
So what is this benchmark measuring? Surely not what it claims: those intrinsics are AVX-512 specific, so there is no way it could run on anything but a Xeon.
Visual Studio 2022 is apparently putting in an alternate AVX2 codepath. Seems they wised up and did the right thing, for once.

The actual hot loop code:

1741381581323.png
 

MS_AT

Senior member
Jul 15, 2024
723
1,463
96
Have you verified that AVX-512 ops are actually in the compiled binary? ;) I ask because any sensible optimizing compiler should just discard them at higher optimization levels (O2, O3, maybe even O1). And if you want a benchmark, you should build with at least the O2 equivalent.

Unfortunately godbolt is unusable on mobile so I cannot check.
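
To illustrate the concern about work being discarded, here is a minimal sketch. It is not the benchmark's code (the real hot loop is only visible in the screenshot), and `fma_kernel`, `g_sink` and `run_and_keep` are made-up names. A loop whose result is never observed can legally be removed at O2; storing the result to a `volatile` object makes the value observable, so the computation feeding it has to survive.

```cpp
#include <cstdint>

// Hypothetical stand-in for the benchmark's hot loop (assumption, not the real code).
float fma_kernel(std::uint64_t iterations) {
    float acc = 1.0f;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc = acc * 0.5f + 1.0f;  // multiply-add chain that converges to 2.0
    }
    return acc;
}

// A store to a volatile object is an observable side effect, so the
// optimizer cannot drop the computation that produces the stored value.
volatile float g_sink = 0.0f;

float run_and_keep(std::uint64_t iterations) {
    const float result = fma_kernel(iterations);
    g_sink = result;
    return result;
}
```

This only guarantees that the value is produced. It says nothing about which instructions the compiler picks, so checking the disassembly is still the way to confirm that AVX-512 ops are present.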
 
Jul 27, 2020
25,795
17,865
146
Well, what about your 64-core AMD Rome?
Unfortunately, Ice Lake wins :(

Because:

The benchmark has some bug. It won't use more than 64 threads. I'll have to check the code tomorrow to see if there is a hard limit in there.
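
One possible cause of a 64-thread ceiling, offered purely as an assumption since the source has not been checked yet: Windows splits machines with more than 64 logical processors into processor groups, and a count taken within a single group tops out at 64. A sketch of counting across all groups follows. `logical_processor_count` is a made-up helper, and on non-Windows builds it simply falls back to the standard library.

```cpp
#include <thread>

#ifdef _WIN32
#ifndef NOMINMAX
#define NOMINMAX
#endif
#include <windows.h>
#endif

// Hypothetical helper: number of logical processors a benchmark could target.
unsigned logical_processor_count() {
#ifdef _WIN32
    // Counts active logical processors in every processor group, not just
    // the group the calling thread happens to run in.
    const DWORD n = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    if (n != 0) return static_cast<unsigned>(n);
#endif
    // hardware_concurrency() may return 0 when the count is unknown.
    const unsigned n_std = std::thread::hardware_concurrency();
    return n_std != 0 ? n_std : 1u;
}
```

Getting the count right would only be half of it. On Windows versions that do not spread a process across groups automatically, the worker threads would also have to be placed in the other groups, for example with `SetThreadGroupAffinity`.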

If Ice Lake is really using AVX-512, that might explain how 48 threads are able to beat 64 physical Zen 2 cores, though the Epyc puts up a fierce fight armed only with AVX2 and gets pretty close. Considering that the Ice Lake server cost $7000 and my Epyc cost not more than $1500, I would say the Epyc wins fair and square :p

However, if the compiler has "cheated" and replaced the AVX-512 code with AVX2 code, the only possible explanation for the Epyc's loss is higher frequency on the Ice Lake CPU (3.95 GHz vs. 2.91 GHz on the Epyc). But even then, with a deficit of about 1 GHz, the Epyc comes really close (screenshots will be posted soon!).
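
A rough sanity check of the frequency argument, using only the numbers quoted in this post (48 threads at 3.95 GHz against 64 threads at 2.91 GHz). It assumes, crudely, that throughput scales with threads times clock, and it ignores SMT sharing, clocks dropping under load and vector width. `thread_ghz` is a made-up helper.

```cpp
// Crude throughput proxy: thread count multiplied by clock speed in GHz.
// Assumes every thread sustains the quoted clock, which real CPUs may not.
constexpr double thread_ghz(int threads, double ghz) {
    return threads * ghz;
}

constexpr double ice_lake    = thread_ghz(48, 3.95);  // about 189.6
constexpr double epyc        = thread_ghz(64, 2.91);  // about 186.24
constexpr double clock_ratio = ice_lake / epyc;       // roughly 1.02
```

On that crude measure the two machines are within about 2% of each other, which would fit a close result if both were running the same AVX2 code.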

Also waiting for MS_AT's confirmation on whether the binary contains AVX-512 instructions to further understand the Epyc's performance.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,092
16,012
136
Rome has no AVX-512 support. Zen 4 has quite a bit. Zen 5 kicks butt! If I get time to load up my Turin, I will run it. Even at 2.3 GHz, it beats Zen 4s at 3.5 GHz in AVX-512 stuff.
 
Jul 27, 2020
25,795
17,865
146
9800X3D. Seems kinda low?
View attachment 119246
Can't say unless someone posts their 9800X3D score for comparison.

BUT, you may lose up to 9% score if you don't run it with admin rights.

7-Max may help a bit too but doesn't always work.

It's a pure floating point bench. It only taxes the FPUs themselves.

The current champ Ice Lake Xeon (until more users test) does 42 Mops/s per thread. Yours is doing 55.25 :)
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,780
3,098
146
Disabling AVX512 made no difference to scores, which would seem to corroborate the assertion that the AVX512 instructions are getting compiled away.

Administrator did nothing for me.

7-Max didn't crash the application, at least. No change to scores either.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,092
16,012
136
1741395967735.png

Kinda low. It barely beats the Ice Lake Xeon!
 
Jul 27, 2020
25,795
17,865
146
Disabling AVX512 made no difference to scores, which would seem to corroborate the assertion that the AVX512 instructions are getting compiled away.
Crap!

Back to the drawing board :(

I have to agree that the compiler stripped out the AVX-512 instructions. Guess I need to push out v0.02a with a fix.

May need to have a dedicated AVX-512 binary, because I can't afford to spend time on proper CPU feature detection.
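
For reference, a basic AVX-512F check is fairly small. This is a sketch, not the benchmark's code: `has_avx512f` and `pick_binary` are made-up helpers and both file names are placeholders. The MSVC path checks that the CPU reports AVX-512F and that the OS saves the mask and ZMM registers; the GCC/Clang path relies on the compiler's built-in detection.

```cpp
#include <string>

#if defined(_MSC_VER) && !defined(__clang__) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#include <immintrin.h>
#endif

// Hypothetical helper: true if AVX-512 Foundation instructions are usable.
bool has_avx512f() {
#if defined(_MSC_VER) && !defined(__clang__) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};                    // EAX, EBX, ECX, EDX
    __cpuid(regs, 0);
    if (regs[0] < 7) return false;                 // CPUID leaf 7 not available
    __cpuid(regs, 1);
    if (((regs[2] >> 27) & 1) == 0) return false;  // OSXSAVE clear: XGETBV unusable
    // XCR0 bits 1,2 (SSE, AVX) and 5,6,7 (opmask, ZMM state) must all be set.
    if ((_xgetbv(0) & 0xE6) != 0xE6) return false;
    __cpuidex(regs, 7, 0);
    return ((regs[1] >> 16) & 1) != 0;             // CPUID.(EAX=7,ECX=0):EBX bit 16
#elif (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;                                  // non-x86 target
#endif
}

// Placeholder file names, not the names used in the actual download.
std::string pick_binary(bool avx512) {
    return avx512 ? "bench_avx512.exe" : "bench.exe";
}
```

A small launcher could call `pick_binary(has_avx512f())` and start the matching executable, so the AVX-512 build is never run on a CPU that would crash on it.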
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,780
3,098
146
A benchmark that uses different instruction sets based on what the CPU has available is also not a very good benchmark IMO. Consistency and comparability go right out the window.

Edit: Before someone bites my head off, I mean a benchmark like this where the intention is to test how quickly the CPU can do a specific operation. If different CPUs are doing different operations, what are you even trying to compare then? Since no real work is being done, it's not measuring how fast different CPUs can accomplish a greater task.
 
Jul 27, 2020
25,795
17,865
146
Edit: Before someone bites my head off, I mean a benchmark like this where the intention is to test how quickly the CPU can do a specific operation. If different CPUs are doing different operations, what are you even trying to compare then? Since no real work is being done, it's not measuring how fast different CPUs can accomplish a greater task.
You are right, in a way :)
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,092
16,012
136
This is a little better with SMT disabled

Well, after damn MS did an update on last boot, it won't paste!!!

But it's 2308!

1741398576592.png
 
Jul 27, 2020
25,795
17,865
146
A benchmark that uses different instruction sets based on what the CPU has available is also not a very good benchmark IMO. Consistency and comparability go right out the window.
Well, yeah I agree that for proper comparison, both CPUs should use the same ISA extensions so the score tells us which chip is designed better.