As I have written elsewhere, Cortex A76 has IPC somewhere between Zen and Skylake but is about a third of the size (normalized to the same process node).
Apple made some very good choices, and their execution is stellar. Shorter pipelines with wider cores may be inevitable for everyone, though.
Why?
Consider that the air-cooled limit is ~5GHz. Reaching that frequency takes absolute bleeding-edge optimization techniques, and it increases leakage, which worsens the localized heating already caused by the high frequency.
So even if next-generation processes were able to offer faster clocks, you won't get past 5GHz anyway. If (and it's a big if) you are able to create a CPU that clocks at 3GHz but performs 50% better per clock, you might lose 10% of performance today, but you gain room for future scaling until you reach 5GHz again.
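To make the arithmetic concrete, here is a rough sketch assuming performance is simply IPC times clock speed, which ignores memory, power and everything else. The function name and the normalized IPC of 1.0 for the 5GHz baseline are my own assumptions for illustration, not measurements of any real core.

```python
def relative_performance(ipc: float, clock_ghz: float) -> float:
    """First-order model (an assumption): work per clock times clock rate."""
    return ipc * clock_ghz


# Hypothetical baseline: a core at the ~5GHz air-cooled limit, IPC normalized to 1.0.
fast_narrow = relative_performance(ipc=1.0, clock_ghz=5.0)

# Hypothetical wide core: 50% more work per clock, but only 3GHz.
slow_wide = relative_performance(ipc=1.5, clock_ghz=3.0)

# Fraction of performance given up today by choosing the wide core.
loss_today = 1.0 - slow_wide / fast_narrow

# How far ahead the wide core ends up if it ever scales back up to 5GHz.
headroom = relative_performance(ipc=1.5, clock_ghz=5.0) / fast_narrow

print(f"wide core today: {slow_wide / fast_narrow:.0%} of baseline")
print(f"wide core at 5GHz: {headroom:.1f}x baseline")
```

Under these assumptions the 3GHz design starts about 10% behind and ends up 50% ahead once it reaches the same 5GHz ceiling.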
The Skylake core seems exceptionally large. That could be deliberate, to reduce the heating effect above and reach the high frequency. If Intel were to retarget it to the 3-4GHz range, the core could be much denser.
We know Intel can make denser cores, because their Atom cores are very dense.
As possible improvements in performance from fabrication advances dry up, maybe we will see a reappearance of ternary computers, which are potentially faster and more efficient than our simple binary machines.
The problem is that we are asking for traditional computers to be faster. I think this is the heart of the problem. If you include smartphones, they've been improving rapidly. So when scaling slowed in desktops, the answer was not "faster traditional computers", but smartphones and tablets.
There are no near-term replacements because there don't need to be. Machines dedicated to the task, fitted with accelerators, are the future. Even quantum computers (a long-term thing) look to be the same thing: more accelerators.