And even if we ignore the "performance" domain completely, there are plenty of other things for the scheduler to consider, like power policies, CPU parking decisions and so on. And all of them feed back into performance as well.
What if you have a limited power budget, 1B+4S cores and 4 threads of workload: which placement gives the best MT performance when one big core is 25% faster than one small core, but only half as efficient?
What if there are 4B+8S cores and 5, or 8, threads of workload, still with limited power?
What if the load changes dynamically and goes from what is optimal for running on big cores to losing potential MT performance because of the power ceiling?
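To put rough numbers on the first two questions, here is a toy model, not how any real scheduler works. Everything in it is my assumption: a small core is 1.0 unit of performance at 1.0 unit of power, a big core is 25% faster at half the perf-per-watt (so 1.25 perf at 2.5 power), one thread per core, no frequency scaling, no SMT. The function name `best_mix` and the budget values are made up for illustration.

```python
def best_mix(n_big, n_small, threads, budget,
             small_perf=1.0, small_power=1.0,
             big_speedup=1.25, big_eff_ratio=0.5):
    """Brute-force the best (big, small) core mix under a power budget.

    Toy model: one thread per core, fixed per-core perf and power.
    Returns (big_used, small_used, total_perf, total_power), or None
    if not even one core fits into the budget.
    """
    big_perf = small_perf * big_speedup
    # The big core's perf-per-watt is big_eff_ratio times the small core's.
    big_power = big_perf / ((small_perf / small_power) * big_eff_ratio)

    best = None
    for b in range(min(n_big, threads) + 1):
        for s in range(min(n_small, threads - b) + 1):
            if b + s == 0:
                continue  # nothing running is not a useful answer
            power = b * big_power + s * small_power
            if power > budget:
                continue  # over the power ceiling
            perf = b * big_perf + s * small_perf
            # Prefer more perf, then less power on a tie.
            key = (perf, -power)
            if best is None or key > best[0]:
                best = (key, (b, s, perf, power))
    return None if best is None else best[1]


if __name__ == "__main__":
    for budget in (3.0, 4.0, 5.5):
        print("1B+4S, 4 threads, budget", budget, "->",
              best_mix(1, 4, 4, budget))
    print("4B+8S, 8 threads, budget 8.0 ->", best_mix(4, 8, 8, 8.0))
```

In this model the big core only gets used once the budget is generous enough to afford it on top of the small cores. With a tight ceiling the best MT throughput comes from small cores alone, which is exactly the kind of trade-off the scheduler has to weigh, and real hardware adds frequency scaling and changing load on top of it.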
Even the obvious stuff matters: scheduling decisions have to run on an actual CPU, burn cycles and evict the good guys from CPU caches. Make the scheduler too complex and you start losing performance.
"Hardware" cannot solve any of those by itself. It can make clock transitions and core wake-ups faster, or use clever tricks like Speed Shift that take decisions away from software. It can also give hints about the state of the hybrid system's load and power characteristics, which can then guide the scheduler and the power subsystem of the OS to make hopefully better decisions. Or a bug moves your critical rendering thread to an Atom core and cuts your FPS in half.