
Discussion Intel Binary Optimization Tool thread

they avoided it cleverly.
They are not going to feel very clever if after seeing how much vectorization is possible, John gets to work and releases GB 6.8 that is massively vectorized and helps AMD even more. He will want to do that to discourage Intel because he does not want them optimizing every version and him having extra work to do before every GB release. He does not want AMD or review sites to stop using his benchmark because it can be gamed.
 
They are not going to feel very clever if after seeing how much vectorization is possible, John gets to work and releases GB 6.8 that is massively vectorized and helps AMD even more. He will want to do that to discourage Intel because he does not want them optimizing every version and him having extra work to do before every GB release.
Well yeah, but I doubt anyone can outdo Intel at x86-64 software optimization. They raised the score this much with iBOT alone; imagine how much more they could gain if the software were written properly in the first place.
 
They are not going to feel very clever if after seeing how much vectorization is possible, John gets to work and releases GB 6.8 that is massively vectorized and helps AMD even more.
He won't do that, or shouldn't. It defeats the point of a cross-platform benchmark. We know that hand-tuned code can get very nice results on wide vector engines. We've known this since Cray. But GB, to be a useful consumer benchmark, should try to reflect real world applications and the optimizations they use. Autovec is reasonable, maybe, but real world applications seldom use it because it often provides small increases while limiting who can run your binary.

But if Intel is distributing a program to replace binaries in place, why won't AMD follow the next time they're behind? And since AMD has a very nice (possibly the best?) FPU in consumer parts they actually stand to gain more doing this.
 
He won't do that, or shouldn't. It defeats the point of a cross-platform benchmark.
He could change the scoring method and split the scores into INT and FP ones while noting that the FP ones could be massively optimized by a vendor and may not reflect real world optimizations in applications. But seriously, Intel has opened a whole can of worms. They need to make iBOT vendor agnostic and apply the optimizations indiscriminately to whatever CPU if they want the worms to disappear.
 
They suggest it's converting scalar code into vector code. I assume Intel didn't have someone do that by hand (what a waste that would be).

I'm willing to bet there is some hand tuning involved in some cases, depending on your definition of "tuning". Because if I had to guess it is probably AI doing it, but there's no way they'd let that run wild without some human oversight - and occasional human "correction", so there 100% are humans in the loop.

That's the reason it has to hit the cloud. The AI model they're using can't be run on a typical PC, and even if it could they wouldn't trust it unattended like that.

What I don't understand is the two-second lag. You need some lag the first time you run, so it can checksum the binary, check whether Intel has an optimized version, and download it if it's there. But once you've stored the checksum, you can just verify that the stored entry is newer than the executable's modification time and know the binary hasn't changed. Having the service enabled could handle "push" situations where they didn't have an optimized version the first time you ran it but develop one later: it gets pushed to your PC and used on the next run. There's no reason for a two-second, or even two-millisecond, delay; that's just stupidity on their part.
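To make the point concrete, here is a minimal sketch of the cache-then-verify idea described above. This is purely illustrative, not anything Intel has published: the function name and the in-memory `cache` dict are hypothetical, but it shows why a repeat launch should cost essentially nothing once the checksum is stored.

```python
import hashlib
import os

def binary_fingerprint(path: str, cache: dict) -> tuple[str, bool]:
    """Return (sha256_hexdigest, was_cached).

    Re-hash the file only when its modification time is newer than the
    cached entry's, so repeat launches skip the expensive full read.
    """
    mtime = os.path.getmtime(path)
    entry = cache.get(path)
    if entry is not None and entry["mtime"] >= mtime:
        # Cache hit: the binary hasn't changed, no stall needed.
        return entry["sha256"], True
    # First run (or the binary was rebuilt): hash it in 1 MiB chunks.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    cache[path] = {"mtime": mtime, "sha256": digest}
    return digest, False
```

A real client would persist the cache to disk instead of a dict, but the mtime comparison is the whole trick: the slow path runs once per binary version.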
 
I think the steps are actually something like this:
1. Lift x64 binaries to LLVM IR with something like remill.
2. Use llc to recompile this using PGO, autovectorization, and -march for the target architecture. Store the checksum and the corresponding new binary on Intel's benchmark-busting servers for that chip family.
3. The iBOT client has a whitelist of known binary checksums for which it should fetch replacement binaries at startup on a supported chip family.
4. Replace the original binary. It's actually hard for me to understand the additional startup time even after the initial download of new binaries. It shouldn't take two seconds to load an alternate GB binary, but maybe they're doing something clever, like applying this entire process only to certain functions.
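Steps 3 and 4 of the speculated pipeline (the client-side part) could be sketched roughly like this. To be clear, this is a guess at the mechanism, not Intel's actual code: `maybe_swap_binary`, the `whitelist` set, and the `fetch_replacement` callback are all hypothetical names standing in for whatever the real client does.

```python
import hashlib
import os
import shutil

def maybe_swap_binary(path, whitelist, fetch_replacement, backup_dir):
    """Sketch of speculated steps 3-4: if the binary's checksum is on
    the per-chip-family whitelist, fetch the tuned build from the
    vendor server and swap it in, keeping the original as a backup.

    Returns True if a replacement was installed, False otherwise.
    """
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest not in whitelist:
        return False  # unknown binary: run the original untouched
    # Keep the original so the swap is reversible.
    os.makedirs(backup_dir, exist_ok=True)
    shutil.copy2(path, os.path.join(backup_dir, digest))
    # fetch_replacement stands in for the download from Intel's server.
    replacement = fetch_replacement(digest)
    with open(path, "wb") as f:
        f.write(replacement)
    return True
```

Nothing here is expensive, which is the poster's point: once the whitelist lookup misses (or the replacement is already in place), launch overhead should be near zero.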

But Intel doesn't want to say what they're actually doing. I wonder why. If I'm right, all the tools to do this are open source and very general, just combined in a new way here. This would work on any chip not running natively tuned builds, as long as llvm has reasonable architectural optimizations for it. But you would probably need to review the results of step 1 by hand, and step 2 takes a long time, which explains the whitelist and server concept.
 
thoughts?
Currently the lack of transparency makes it hard to say how clever this is. Is it a breakthrough that will let us speed up legacy software en masse, or a cheap trick that will fade away once they get a CPU that can compete without this hassle?

From a pragmatic point of view, I will take whatever makes my production binaries faster without breaking them (faster compile times, more fps in games, whatever).

But for benchmarks it's basically cheating at this stage. It makes comparing CPUs harder. We also don't know how fast they will add support beyond these initial apps, and if support is not widespread, it only skews the perception that the CPU is better than it actually is.
 
This is outside of my wheelhouse, but would an analogy for iBOT be GPU drivers, where a vendor releases a new set of drivers optimized for their latest generation of GPUs? If so, then I don't think there's an issue with it. All GPU vendors support DirectX, but each vendor is allowed to make tweaks and optimizations for their specific architectures rather than using a generic GPU driver. Does it mean everyone now needs to make custom drivers to give their product the best competitive edge? Yes. But is it unfair? Likely not?
 
I think the steps are actually something like this:
1. Lift x64 binaries to LLVM IR with something like remill.
2. Use llc to recompile this using PGO, autovectorization, and -march for the target architecture. Store the checksum and the corresponding new binary on Intel's benchmark-busting servers for that chip family.
3. The iBOT client has a whitelist of known binary checksums for which it should fetch replacement binaries at startup on a supported chip family.
4. Replace the original binary. It's actually hard for me to understand the additional startup time even after the initial download of new binaries. It shouldn't take two seconds to load an alternate GB binary, but maybe they're doing something clever, like applying this entire process only to certain functions.

But Intel doesn't want to say what they're actually doing. I wonder why. If I'm right, all the tools to do this are open source and very general, just combined in a new way here. This would work on any chip not running natively tuned builds, as long as llvm has reasonable architectural optimizations for it. But you would probably need to review the results of steps 1 and 2, which explains the whitelist.

If that's all they're doing there's no reason it couldn't be done locally since it only has to be done once. It has to be something more than that.
 
In real world apps, not unfair. It’s good.

In benchmarks, bad.
So using the GPU driver analogy, if a GPU driver makes a GPU look better in a gaming benchmark, is that bad?

Again, this is not my wheelhouse whatsoever, so if the GPU driver analogy is not anywhere near representative, I stand corrected, but I would like to understand how iBOT is different.
 