Apple's GPU IP currently has a 2-3x perf/watt lead over Nvidia and AMD. They deliver a GPU with 1024 ALUs and 2.6GFLOPS of theoretical peak throughput for 10 watts of power. This is not the kind of lead you can get from a process advantage alone.
I have a feeling that your posts will not age well. It's quite interesting to me how folks have been consistently underestimating and dismissing Apple's hardware efforts. A couple of years ago, the sentiment was all "Geekbench is useless and those ARM CPUs will never be as fast as x86 in real desktop workloads". Now that Apple has demonstrated a 4x perf-per-watt advantage at the same peak performance, it is "sure, it's ok for an ultrabook SoC, but it's not gonna scale".
Hm? Only if you compare with crusty old Vega IP. You know, that IP that wasn't efficient even when it launched? Also, you're pretty significantly off with the peak throughput, because it's TFLOPs you should be talking about, not GFLOPs.
Van Gogh in the Steam Deck is about 1.6TFLOPs for about 10W on the GPU alone, and that's with half the ALUs. Rembrandt (the next-gen APU) has 768 ALUs, and if my estimation's right it should hold about 1.3GHz with GPU power locked at 10W, which gives you about 2.0TFLOPs.
And that's assuming it has identical V/f properties to my 6700XT; realistically it should be safe to assume an extra 200-300MHz worth of clocks from additional optimisations and actual binning, something that currently happens with neither Van Gogh nor Navi22 silicon.
By which point you're looking at over 2.2TFLOPs, which is now actually in line with the 10-15% process improvement from N6 -> N5. Oh, and let me again remind you that Apple still holds an ALU advantage here, meaning they can clock their iGPU lower and get the same performance, which by nature brings an efficiency improvement on its own.
So then, where does your 2-3x efficiency advantage come from?
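The peak-throughput arithmetic traded back and forth in this thread can be sanity-checked with a quick sketch. This assumes the usual marketing convention that each ALU retires one FMA per cycle (counted as 2 FLOPs); the specific clocks plugged in below are the estimates from the comments, not measured values:

```python
def peak_tflops(alus, clock_ghz, flops_per_cycle=2):
    """Theoretical peak = ALUs x clock x FLOPs/cycle (FMA counted as 2 FLOPs)."""
    return alus * clock_ghz * flops_per_cycle / 1000.0

# Van Gogh: 512 ALUs at ~1.56 GHz -> ~1.6 TFLOPs, matching the Steam Deck figure
print(round(peak_tflops(512, 1.5625), 2))  # 1.6

# Rembrandt estimate: 768 ALUs held at 1.3 GHz under a 10W cap -> ~2.0 TFLOPs
print(round(peak_tflops(768, 1.3), 2))     # 2.0

# With an assumed extra ~300 MHz from binning/optimisation -> ~2.46 TFLOPs
print(round(peak_tflops(768, 1.6), 2))     # 2.46

# Working backwards: 1024 ALUs delivering 2.6 TFLOPs implies only ~1.27 GHz,
# i.e. the wider, lower-clocked design the thread describes
apple_clock = 2.6 * 1000 / (1024 * 2)
print(round(apple_clock, 2))               # 1.27
```

Under these assumptions the gap between ~2.2-2.5 and 2.6 TFLOPs at the same power is in the tens of percent, not 2-3x, which is the point the question above is driving at.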