1. How do we know macOS has better power management? For all we know, macOS has worse power management than Windows and Linux but the SoC carries the OS. How does OS power management lower total SoC power when running ST loads? We seem to be making a ton of assumptions without any proof.
You don't know for sure, but that's just what Apple does. We know their scheduler and power management were reworked specifically for Apple Silicon, and we know from their engineers that there is an almost obsessive focus in that direction across the company. I mean, they changed their preferred memory management technique when Swift was introduced in 2014 - automatic reference counting instead of a tracing garbage collector - so that they could design more performant silicon around it in the A and M series; they forced every developer to do a slightly harder thing in their code because it would let the whole system be faster. It's why I'm skeptical of people demanding a Linux OS on Apple Silicon for performance testing: I seriously doubt it would have a lighter footprint, and the Linux devs aren't even going to be aware of where Apple is chasing power savings because Apple doesn't disclose that.
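For context, the memory-management change being referenced is almost certainly ARC (automatic reference counting): instead of a tracing garbage collector, an object is freed deterministically the moment its reference count hits zero, which gives the chip team a very predictable retain/release pattern to optimize for. The "slightly harder thing" is that developers have to break reference cycles themselves. A minimal sketch of what that looks like in Swift, with hypothetical class names:

```swift
final class Engine {
    // Hypothetical names, purely illustrative.
    var monitor: Monitor?
    deinit { print("Engine deallocated") }   // runs deterministically under ARC
}

final class Monitor {
    // The "slightly harder thing": this back-reference must be `weak`,
    // otherwise Engine -> Monitor -> Engine is a retain cycle that ARC
    // (unlike a tracing GC) would never reclaim.
    weak var engine: Engine?
    deinit { print("Monitor deallocated") }
}

var engine: Engine? = Engine()
engine?.monitor = Monitor()
engine?.monitor?.engine = engine
engine = nil   // both deinits fire immediately - no GC pause, no background collector
```

The point is that the deallocation work happens inline and on a predictable schedule, which is exactly the kind of behavior a silicon team can design around.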
"How does OS power management lower total SoC power when running ST loads?"
We know that Apple's scheduler puts all of the service threads onto the E cores, and they've been doing that since they first shipped asymmetric cores. It's not a traditional scheduler that simply round-robins any given thread onto whichever core is open - the OS housekeeping goes on the E cores even when they're congested, so that user applications can have 100% of the P cores all the time. And wherever it makes sense, heavier compute gets pushed onto the ANE or the GPU. There really is no lag between when Apple puts silicon out and when their software makes maximal use of it; in-house, their adoption curve is a step function - it effectively goes from 0% to 100% the day it ships.
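To make that concrete: on macOS the developer-facing knob is not core affinity but QoS classes - you tag work with a quality-of-service level, and the scheduler steers background/utility work toward the E cores and user-interactive work toward the P cores (the exact placement is the scheduler's decision, not a guarantee). A minimal sketch using GCD; the workload inside each block is purely illustrative:

```swift
import Dispatch

let group = DispatchGroup()

// Housekeeping-style work: tagged .utility, which the scheduler
// prefers to place on the efficiency cores.
DispatchQueue.global(qos: .utility).async(group: group) {
    let checksum = (0..<1_000_000).reduce(0, &+)   // stand-in for indexing, sync, etc.
    print("utility work done:", checksum)
}

// Latency-sensitive work: tagged .userInteractive, which the scheduler
// prefers to place on the performance cores.
DispatchQueue.global(qos: .userInteractive).async(group: group) {
    print("user-facing work runs with P-core priority")
}

group.wait()   // block until both items finish (script-style usage)
```

Swift concurrency exposes the same idea through Task(priority:); either way you describe how important the work is and the OS decides which core class it lands on.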
If you look at the Linux contributor space, there is a 'payoff' function to adopting something, and it's not immediate. They're not necessarily going to ifdef some part of the kernel around a new feature in the 9000 series on day one, because there is an opportunity cost in doing so, a cost in code stability, and so on. But Apple is in the business of selling Apple Silicon, and part of how they do that is to use every inch of it immediately, so that when you buy your M4 MBA your reaction is 'good lord this is fast' - they will take every opportunity for even marginal gains and spend the resources to rewrite that code. In a lot of cases the new silicon was created in consultation with the OS developers in the first place: they said 'hey, we can cut power if that cache is bigger' or whatever, that gets evaluated against their other priorities, and if they need the power savings and have the silicon, it gets implemented in hardware and software at the same time. They aren't doing the 'oh, that's marginal, nobody will use it in their code, so let's not burn silicon on it' thing - everyone reports to the same guy at Apple, so there is a 100% chance it will be exploited at every possible opportunity, even if Apple needs to create a new API and shove developers through it against their will (which they do often). And that's not unique to Apple. The big data-center operators all have their own custom server designs because they know exactly which marginal improvements will pay off for them. A generic company isn't going to do that because they don't know whether a 1% improvement is worth the effort, but Meta knows that a 1% improvement from doing x, y, z is something like $10M a year in operating expenses. That is part of why vertical businesses pursue verticality.
So I just can't imagine a scenario where Apple is not maximally exploiting any silicon advantage they might have. That said, Apple has other priorities, like a GUI whose overhead is easier to bypass on Linux - but you can bypass it on macOS too if performance testing is what you're trying to do. You can strip macOS back surprisingly far.