I agree with everything you said except... The nice thing about GPUs is they scale very well, at least up to some fraction of the number of pixels in the display. So if you lower the voltage and frequency enough, I think a 3D GPU die could work.
I don't think it's anywhere near that simple. Even if it works, the thermal problem isn't solved. I suppose they could hack together a dual-sided cooling solution: make it a 2-stack with one heatsink on the bottom die and another on the top die. But what happens with 4 stacks? 8 stacks? Would you have to cut clocks and voltages to 1/8th? Even the dual-die solution has worse thermals, simply because the dies are closer together.
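To make the stacking worry concrete, here is a back-of-envelope sketch. With dual-sided cooling you get at most two heatsink surfaces no matter how many dies you stack, so the per-die power budget shrinks as the stack grows. The 250 W per-surface figure is an assumed illustrative number, not from any real part.

```python
# Assumed dissipation capacity per heatsink surface (illustrative, not a real spec).
HEATSINK_BUDGET_W = 250
SURFACES = 2  # dual-sided cooling: one heatsink on top, one on bottom

# Per-die power budget for various stack depths.
budget = {stacks: SURFACES * HEATSINK_BUDGET_W / stacks for stacks in (2, 4, 8)}

for stacks, per_die in budget.items():
    print(f"{stacks}-stack: {per_die:.0f} W budget per die")
```

The budget per die falls linearly with stack depth while the cooled area stays fixed, which is the core of the objection: deeper stacks force proportionally lower power per die.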
The transistors don't run below ~0.7V, and going that low seriously impacts frequency. People will shudder at the clock speeds you get at 0.8V. If you halve the clock and use two dies, you get maybe the same performance but worse thermals. From 0.7V up to 1.1V on GPUs and 1.3V on CPUs, clock speeds increase superlinearly: you aren't going from 1GHz @ 0.7V to 2GHz @ 1.3V, you're going from 800MHz @ 0.7V to 4GHz @ 1.3V.
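The trade-off above can be sketched with the standard dynamic-power approximation P ∝ C·V²·f (leakage ignored). The voltage/frequency pairs are the ones from the comment (800MHz @ 0.7V vs 4GHz @ 1.3V); they are illustrative figures, not measurements of any real chip.

```python
def dynamic_power(v, f_ghz, c=1.0):
    """Relative dynamic power, P ∝ C·V²·f. Leakage is ignored."""
    return c * v**2 * f_ghz

# One die near peak voltage vs. two near-threshold dies stacked together.
fast = dynamic_power(1.3, 4.0)       # 1 die @ 1.3 V, 4 GHz
slow = 2 * dynamic_power(0.7, 0.8)   # 2 dies @ 0.7 V, 800 MHz each

print(f"one fast die: {fast:.2f} (relative power), 4.0 GHz-dies of throughput")
print(f"two slow dies: {slow:.2f} (relative power), 1.6 GHz-dies of throughput")
```

The near-threshold configuration wins on energy per operation, but because frequency collapses superlinearly as voltage drops, aggregate throughput craters too, which is exactly the problem for high-performance parts.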
It's true that you can undervolt GPUs a fair bit below manufacturer specs. But the manufacturer sets the spec so that, across tens of millions of GPUs, RMAs from too low a voltage on a particularly bad chip are minimized. Say they implement fancy circuitry to correct/minimize those errors and lower the voltage to roughly what you or I could reach by undervolting at the card level. How low can the voltage go? 10%? That's only a ~20% reduction in power. Say they get to 0.8x the voltage: that's a 36% reduction, or 0.64x the power. And that's a one-time benefit.
That's why the death of Moore's Law isn't just about the loss of density scaling and its economic benefits. Voltage scaling has stopped too. We went very quickly from 2.xV down to 1.3V on CPUs, and we've been stuck at 1.3V for a decade now.
The Near-Threshold Voltage (NTV for short) circuits Intel showed at IDF run at really low frequencies. Yeah, sure, you may run it at 0.7V, but how does that help high-performance CPUs/GPUs when it runs at 400MHz?
This is the reason people like Ray Kurzweil are insane. He based his Singularity-by-2050 prediction on extrapolating rapid scaling from past trends; he couldn't have known scaling would slow to a crawl. Exascale systems slipping from the original 2018 target to 2021+ is down to the same reason.