I think AMD's "fine wine" was partly a product of GCN sticking around for so long, which made it easy to maintain support for older cards without much of a work commitment. I suspect it also has a bit to do with Nvidia being willing to stop spending as much effort on its older cards, which reinforces the perception of AMD aging so much better.
It was really with Kepler (rather than Maxwell or Pascal) that Nvidia moved away from putting as much focus on compute in its gaming cards, just as AMD was launching GCN, which incorporated more compute in response to what Nvidia had done in previous generations. Nvidia's architectures, being much more gaming-focused and not held back by constraints meant to enable better compute performance, generally fared better than their AMD counterparts of the time. While some of that could be attributed to AMD growing more and more cash-starved, I don't think that's the main reason.
Now we see a similar situation, where Nvidia has created one architecture designed to combine compute and gaming, while AMD has separated them. I don't think it's too surprising that we're seeing the same result, just as AMD failed with Bulldozer when it attempted a design similar to Intel's NetBurst, with a longer pipeline and higher clock speeds, and fared just as poorly.
Yeah, GCN's FineWine, or any GPU architecture's potential to be FineWine, isn't a function of how compute-focused the architecture is; it's a product of optimization over time. The reason Kepler did so badly in the long run is the same reason AMD ditched Terascale: both of those architectures threw a ton (and I mean a ton) of execution units at the problem while relying primarily on instruction-level parallelism (ILP), extracted by a software compiler, to keep the units fed. If the compiler was unoptimized or out of date, performance fell off a cliff. AMD and Nvidia tackled this problem in different ways in the architectures that followed Terascale and Kepler.
AMD ditched the software scheduler entirely and went with something closer to Fermi, adding back a hardware scheduler and using thread-level parallelism (TLP) to keep the units fed. For compute tasks, which are what GCN and Fermi were designed to tackle, it's harder to extract ILP ahead of time via a software scheduler/compiler because compute workloads are typically heavy with dependent instructions, so it's better to use a CPU-esque approach and keep GPU utilization high by simply switching threads when one bogs down.
Nvidia, on the other hand, tackled Kepler's crummy utilization from the other side of the spectrum. Instead of trying to fix the problem on the compiler/scheduling side, they tackled it in silicon. An SMX with 4 warp schedulers presiding over a shared bank of 192 ALUs was simply hard to keep fed: a warp is 32 threads, and multiplying the number of warp schedulers by 32 gives only 128 threads per clock. Keeping all 192 ALUs constantly fed by only 4 warp schedulers was intuitively going to be an issue, because there were simply too many mouths to feed relative to the number of hands feeding them. Kepler was also an "all or nothing" ordeal in the sense that the entire SMX was one cohesive block; there was no granularity smaller than the SMX itself. If you only needed a quarter of the execution units, the entire SMX and all of its logic had to be powered on, leaving the remainder more or less idle while burning energy for no reason.

Maxwell addressed this by partitioning the SM into 4 smaller blocks, each containing only 1 warp scheduler and 32 ALUs, which meant the SMM had 128 ALUs to the Kepler SMX's 192. In theory, all else being equal, this reduction in ALUs per SM would mean a degradation in performance, and on a per-SM basis it did, but an SMM with 128 ALUs could still provide about 90% of the performance of an SMX with 192 ALUs. That's a testament to how underutilized Kepler's SMX was whenever the compiler wasn't sharp enough to work around it. Nvidia then just threw more SMMs into the GPU along with more advanced memory compression, and voila: roughly 35% IPC gains and higher clocks with Maxwell, thanks to the SMM's better energy efficiency.
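The mouths-vs-hands arithmetic above can be sketched in a few lines. This is deliberately simplified: it uses the published unit counts (4 schedulers, 32-thread warps, 192 vs 128 ALUs) but ignores Kepler's dual-issue dispatch, which is exactly the ILP the compiler had to find to close the gap.

```python
# Back-of-envelope utilization ceiling: what fraction of an SM's ALUs
# can be covered by thread-level issue alone (schedulers x warp size),
# ignoring any ILP the compiler manages to extract.

def scheduling_ceiling(schedulers: int, warp_size: int, alus: int) -> float:
    """Fraction of ALUs coverable per clock by warp issue alone."""
    return min(1.0, schedulers * warp_size / alus)

kepler_smx = scheduling_ceiling(schedulers=4, warp_size=32, alus=192)
maxwell_smm = scheduling_ceiling(schedulers=4, warp_size=32, alus=128)

print(f"Kepler SMX ceiling without ILP: {kepler_smx:.0%}")   # 67%
print(f"Maxwell SMM ceiling without ILP: {maxwell_smm:.0%}") # 100%
```

That idle third of the SMX is the slack Kepler's compiler had to fill, and the slack Maxwell's partitioning designed away.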
Going back to explaining FineWine: as history showed, GCN, while great for compute, had its problems with gaming workloads, since it executed a full wavefront (64 threads) over 4 cycles, which had detrimental effects in latency-sensitive workloads like gaming. It took years of AMD sticking with GCN for the driver optimizations to mature enough for GCN to really start to shine. Kepler, on the other hand, had fundamental architectural issues and got left in the dust once Maxwell came out. It's almost as if Kepler swung too far to the Terascale side of the spectrum and Maxwell reeled it back. From a scheduling standpoint, I don't believe Maxwell, Pascal, or any modern Nvidia architecture uses a hardware scheduler, but the underlying ratio of 1 warp scheduler per group of 32 ALUs pretty much hasn't changed since Maxwell, so in a way they've already had years' worth of "FineWining" built in, even as the architecture itself has changed. If someone knows whether any modern Nvidia architecture still uses pure software scheduling, I would love to be informed.
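The 4-cycle point comes straight from the widths involved: a GCN SIMD is 16 lanes wide, so pushing a 64-thread wavefront through it takes 64 / 16 = 4 cycles per instruction, where a Maxwell-style 32-wide partition retires a 32-thread warp's instruction every cycle. A tiny sketch of that arithmetic (pure back-of-envelope; real latency hiding also depends on occupancy and instruction mix):

```python
# Cycles to issue one instruction for a full thread group, given the
# group size and the width of the SIMD it executes on.

def cycles_per_instruction(group_size: int, simd_width: int) -> int:
    """Ceiling of group_size / simd_width."""
    return -(-group_size // simd_width)  # ceiling division

gcn = cycles_per_instruction(group_size=64, simd_width=16)      # 4 cycles
maxwell = cycles_per_instruction(group_size=32, simd_width=32)  # 1 cycle

print(f"GCN wavefront: {gcn} cycles per instruction")
print(f"Maxwell warp: {maxwell} cycle per instruction")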