
Discussion Intel Binary Optimization Tool thread

they avoided it cleverly.
They are not going to feel very clever if after seeing how much vectorization is possible, John gets to work and releases GB 6.8 that is massively vectorized and helps AMD even more. He will want to do that to discourage Intel because he does not want them optimizing every version and him having extra work to do before every GB release. He does not want AMD or review sites to stop using his benchmark because it can be gamed.
 
They are not going to feel very clever if after seeing how much vectorization is possible, John gets to work and releases GB 6.8 that is massively vectorized and helps AMD even more. He will want to do that to discourage Intel because he does not want them optimizing every version and him having extra work to do before every GB release.
Well yeah, but I doubt anyone can outdo Intel at x86-64 software optimization. They raised the score this much with iBOT alone; imagine how much more they could gain if the software were written properly in the first place.
 
They are not going to feel very clever if after seeing how much vectorization is possible, John gets to work and releases GB 6.8 that is massively vectorized and helps AMD even more.
He won't do that, or shouldn't. It defeats the point of a cross-platform benchmark. We know that hand-tuned code can get very nice results on wide vector engines. We've known this since Cray. But GB, to be a useful consumer benchmark, should try to reflect real world applications and the optimizations they use. Autovec is reasonable, maybe, but real world applications seldom use it because it often provides small increases while limiting who can run your binary.

But if Intel is distributing a program to replace binaries in place, why won't AMD follow the next time they're behind? And since AMD has a very nice (possibly the best?) FPU in consumer parts they actually stand to gain more doing this.
 
He won't do that, or shouldn't. It defeats the point of a cross-platform benchmark.
He could change the scoring method and split the scores into INT and FP ones while noting that the FP ones could be massively optimized by a vendor and may not reflect real world optimizations in applications. But seriously, Intel has opened a whole can of worms. They need to make iBOT vendor agnostic and apply the optimizations indiscriminately to whatever CPU if they want the worms to disappear.
 
They suggest it's converting scalar code into vector code. I assume Intel didn't have someone do that by hand (what a waste that would be).

I'm willing to bet there is some hand tuning involved in some cases, depending on your definition of "tuning". Because if I had to guess it is probably AI doing it, but there's no way they'd let that run wild without some human oversight - and occasional human "correction", so there 100% are humans in the loop.

That's the reason it has to hit the cloud. The AI model they're using can't be run on a typical PC, and even if it could they wouldn't trust it unattended like that.

What I don't understand is the two-second lag. You need some lag the first time you run, so it can checksum the binary, check whether Intel has an optimized version, and download it if it's there. But once you've stored the checksum, you can just verify that the stored entry is newer than the executable's modification time and know the binary hasn't changed. Having the service enabled could handle "push" situations where they didn't have an optimized version the first time you ran it but develop one later: it gets pushed to your PC and used on the next run. There's no reason for a two-second, or even two-millisecond, delay; that's just stupidity on their part.
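To make the point concrete, here is a minimal sketch of the cache-then-verify idea described above. This is purely illustrative, not anything Intel has published: the function name and the in-memory `cache` dict are hypothetical, but it shows why a repeat launch should cost essentially nothing once the checksum is stored.

```python
import hashlib
import os

def binary_fingerprint(path: str, cache: dict) -> tuple[str, bool]:
    """Return (sha256_hexdigest, was_cached).

    Re-hash the file only when its modification time is newer than the
    cached entry's, so repeat launches skip the expensive full read.
    """
    mtime = os.path.getmtime(path)
    entry = cache.get(path)
    if entry is not None and entry["mtime"] >= mtime:
        # Cache hit: the binary hasn't changed, no stall needed.
        return entry["sha256"], True
    # First run (or the binary was rebuilt): hash it in 1 MiB chunks.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    cache[path] = {"mtime": mtime, "sha256": digest}
    return digest, False
```

A real client would persist the cache to disk instead of a dict, but the mtime comparison is the whole trick: the slow path runs once per binary version.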
 
I think the steps are actually something like this:
1. Lift x64 binaries to LLVM IR with something like remill.
2. Use llc to recompile this using PGO, autovectorization, and -march for the target architecture. Store the checksum and the corresponding new binary on Intel's benchmark-busting servers for that chip family.
3. The iBOT client has a whitelist of known binary checksums for which it should fetch replacement binaries at startup on a supported chip family.
4. Replace the original binary. It's actually hard for me to understand the additional startup time even after the initial download of new binaries. It shouldn't take two seconds to load an alternate GB binary, but maybe they're doing something clever, like applying this entire process only to certain functions.
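Steps 3 and 4 of the speculated pipeline (the client-side part) could be sketched roughly like this. To be clear, this is a guess at the mechanism, not Intel's actual code: `maybe_swap_binary`, the `whitelist` set, and the `fetch_replacement` callback are all hypothetical names standing in for whatever the real client does.

```python
import hashlib
import os
import shutil

def maybe_swap_binary(path, whitelist, fetch_replacement, backup_dir):
    """Sketch of speculated steps 3-4: if the binary's checksum is on
    the per-chip-family whitelist, fetch the tuned build from the
    vendor server and swap it in, keeping the original as a backup.

    Returns True if a replacement was installed, False otherwise.
    """
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest not in whitelist:
        return False  # unknown binary: run the original untouched
    # Keep the original so the swap is reversible.
    os.makedirs(backup_dir, exist_ok=True)
    shutil.copy2(path, os.path.join(backup_dir, digest))
    # fetch_replacement stands in for the download from Intel's server.
    replacement = fetch_replacement(digest)
    with open(path, "wb") as f:
        f.write(replacement)
    return True
```

Nothing here is expensive, which is the poster's point: once the whitelist lookup misses (or the replacement is already in place), launch overhead should be near zero.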

But Intel doesn't want to say what they're actually doing. I wonder why. If I'm right, all the tools to do this are open source and very general, just combined in a new way here. This would work on any chip not running natively tuned builds, as long as llvm has reasonable architectural optimizations for it. But you would probably need to review the results of step 1 by hand, and step 2 takes a long time, which explains the whitelist and server concept.
 
thoughts?
Currently the lack of transparency makes it hard to say how clever this is. Is it a breakthrough that will let us speed up legacy software en masse, or a cheap trick that will fade away once they get a CPU that can compete without this hassle?

From a pragmatic point of view, I will take whatever makes my production binaries faster without breaking them (faster compile times, more fps in games, whatever).

But for benchmarks it's basically cheating at this stage. It makes comparing CPUs harder. We also don't know how fast they will add support beyond these initial apps, and if support is not widespread, it only skews the perception that the CPU is better than it actually is.
 
This is outside of my wheelhouse, but would an analogy for iBOT be GPU drivers, where a vendor releases a new set of drivers optimized for their latest generation of GPUs? If so, then I don't think there's an issue with it. All GPU vendors support DirectX, but each vendor is allowed to make tweaks and optimizations for their specific architectures rather than using a generic GPU driver. Does it mean everyone now needs to make custom drivers to give their product the best competitive edge? Yes. But is it unfair? Likely not?
 
I think the steps are actually something like this:
1. Lift x64 binaries to LLVM IR with something like remill.
2. Use llc to recompile this using PGO, autovectorization, and -march for the target architecture. Store the checksum and the corresponding new binary on Intel's benchmark-busting servers for that chip family.
3. The iBOT client has a whitelist of known binary checksums for which it should fetch replacement binaries at startup on a supported chip family.
4. Replace the original binary. It's actually hard for me to understand the additional startup time even after the initial download of new binaries. It shouldn't take two seconds to load an alternate GB binary, but maybe they're doing something clever, like applying this entire process only to certain functions.

But Intel doesn't want to say what they're actually doing. I wonder why. If I'm right, all the tools to do this are open source and very general, just combined in a new way here. This would work on any chip not running natively tuned builds, as long as llvm has reasonable architectural optimizations for it. But you would probably need to review the results of steps 1 and 2, which explains the whitelist.

If that's all they're doing there's no reason it couldn't be done locally since it only has to be done once. It has to be something more than that.
 
In real world apps, not unfair. It’s good.

In benchmarks, bad.
So using the GPU driver analogy, if a GPU driver makes a GPU look better in a gaming benchmark, is that bad?

Again, this is not my wheelhouse whatsoever, so if the GPU driver analogy is not anywhere near representative, I stand corrected, but I would like to understand how iBOT is different.
 