> as long as they don’t touch benchmarking software, i don’t care what Intel does to get themselves a win.

I'm pretty excited about Intel violating the trust of other prominent software developers.
> Sure and in that manner quite unlike Zen 2. Yet it chews through code that Intel marketing says is "console optimized". A curious line they spin.

RPL clocks sky-high, and the uncore in Raptor Lake is really fast, not to mention lower latency than any other modern processor.
> Sure and in that manner quite unlike Zen 2. Yet it chews through code that Intel marketing says is "console optimized". A curious line they spin.

That's only true for a handful of games 🤣 better not to take marketing at face value. Also, from a uArch perspective Lion Cove is quite different from Skylake: the ports have changed in LNC (they're no longer unified), and there's a new L0/L1/L2/L3 cache hierarchy. Golden Cove, by contrast, is not much changed in terms of cache hierarchy and ports (I don't mean the capacities, obviously).
> But it’s a messy way of gaining performance.

What I don't like is that they have probably made LLVM's BOLT (which is open source) work on Windows but don't want to share the results with the rest of the community (I'm guessing here; their marketing doesn't want to give us real answers, so take it with a grain of salt). Their current compiler being downstream of LLVM just makes that even more likely.
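For context, the usual open-source BOLT workflow on Linux looks roughly like the following. This is a sketch: the binary name and workload are placeholders, the exact flags vary between BOLT versions, and whatever Intel built for Windows (if that's what this is) would necessarily differ:

```shell
# 1. Collect an LBR-based branch profile while the binary runs a workload.
perf record -e cycles:u -j any,u -o perf.data -- ./app benchmark-input

# 2. Convert the perf profile into BOLT's profile format.
perf2bolt -p perf.data -o app.fdata ./app

# 3. Rewrite the binary with a profile-guided code layout.
llvm-bolt ./app -o app.bolt -data=app.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions
```

The whole point is layout — basic-block ordering, function ordering, hot/cold splitting — which is why it tends to help large, instruction-cache-bound code bases the most.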


> here have this

Too bad it's missing the results of the Geekbench subtests.
> here have this

Thanks a lot! So it's not only large code bases that benefit; interesting.
In general, in many games there are things the E-cores can be doing that don't require extremely low-latency response. Offloading those to the E-cores can reduce the context switches needed on the P-cores, making them more efficient for throughput. The performance difference in the above charts is almost exactly in line with the P-core max boost frequency difference between the two SKUs, meaning that having only 6 P-cores isn't really a holdup if there are sufficient E-cores to carry the background load. Games typically have only 1-2 threads/processes that are latency-critical with respect to performance. They are starting to get more secondary threads with a lot of general work to do, which need a good core to complete it in a timely manner so their output is available to the latency-critical threads, so there is a need for performant secondary cores as well. Then there are the various housekeeping and pre-work threads that can all be handled efficiently by the E-cores. It looks like the software is doing its job with respect to getting threads where they are supposed to be.
I have to wonder if the uncore fixes/improvements/improved timings are playing a crucial part in getting this software to work right. If the uncore were much slower, moving those other threads and their data around would probably be more painful.
> In general, in many games, there are things that the E cores can be doing that don't require extreme low latency response. […]

It's the work of APO.
> Offloading to those cores can reduce the context switches needed on the Pcores, making them more efficient for throughput.

That's a good point, and it could be tested by launching a game at higher-than-normal process priority on the 250K and seeing whether that improves performance, since the higher priority will minimize context switches on the P-cores.
This one is good. The others are a little dated. IMO
> Analyzing Geekbench 6 under Intel's BOLT - Geekbench Blog (www.geekbench.com)

Wow, I didn't think they'd include autovectorization.
> Wow, I didn't think they'd include autovectorization.

It's interesting whether they hand-tune the code or have some sort of algorithmic solution to the problem. Usually auto-vectorization leaves a lot of performance on the table.
If AMD follows along in these shenanigans, autovectorization may not even be a net win for Intel.
> Analyzing Geekbench 6 under Intel's BOLT - Geekbench Blog (www.geekbench.com)

A pity they focused only on the most shocking example; I was really curious what's behind the clang improvements 😉
> I haven't dug into the details yet. I wonder if it's even taking older pre-AVX2 code and vectorizing as much of that as it can to AVX2?

They suggest it's converting scalar code into vector code. I assume Intel didn't have someone do that by hand (what a waste that would be).