If they were going with +20% shaders over the 290X, they should have just dropped HBM completely. GB for GB, HBM is probably about twice as expensive as GDDR5 at this point, and you also have the cost of the interposer (at $1 per 100 mm^2, that comes to about $10), though you can save a bit on the PCB.
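The interposer arithmetic can be sketched in a few lines of Python. The $1 per 100 mm^2 rate is the rough figure above. The ~1000 mm^2 area is an assumption, back-calculated from the "about $10" total rather than taken from a spec sheet, and the function name is just illustrative.

```python
# Rough interposer cost estimate. The $1 per 100 mm^2 rate is the poster's
# ballpark figure; the 1000 mm^2 area below is an assumption implied by the
# "about $10" total, not a measured value.
COST_PER_100_MM2 = 1.00  # USD, rough rate quoted in the post


def interposer_cost(area_mm2: float, rate_per_100_mm2: float = COST_PER_100_MM2) -> float:
    """Return the estimated silicon interposer cost in USD for a given area."""
    return area_mm2 / 100.0 * rate_per_100_mm2


print(f"${interposer_cost(1000):.2f}")  # prints "$10.00"
```

Because the cost scales linearly with area, a smaller GPU die that needs a smaller interposer saves proportionally on this line item too.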
One problem with the Fury X is that they had to add the expense of a closed-loop cooler (CLC) and a high-quality fan (the Gentle Typhoon can't be that cheap, even in quantity). Also, since the board has a very high power limit (over 430W) and was apparently designed for overclocking (even though no voltage control utility is available yet - go figure), it had to be built with the same beefy power-delivery components found on Hawaii and other high-TDP cards.
If we assume that AMD needed something as a test bed for HBM, and that it had to be a shipping product because AMD can't afford to waste any money, then a chip just a little bigger than Hawaii starts to look like the better option. AMD could have built a chip with 3200 shaders (50 GCN 1.2 compute units) and the same 64 ROPs that Hawaii and Fiji got. Using HBM would mean that Hawaii's big 512-bit GDDR5 memory controllers could be cut out, which would save quite a bit of die space. Even with updated UVD/VCE blocks, you could probably keep the die size under 450 mm^2, which would be far cheaper to manufacture than Fiji. The interposer would be correspondingly smaller, too. A chip like this would be designed to compete with GM204 on price, performance, and perf/watt.

Even the R9 390X, which is just an overclocked Hawaii card, is pretty close to a GTX 980 in raw performance; its drawbacks are its outdated feature set and enormous power consumption. HBM cuts way back on power usage, and with AMD targeting the GTX 980 instead of the GTX 980 Ti and Titan X, there would be no need to overclock the balls off the chip just to eke out a bit of extra performance.

You'd wind up with something closer to what the Fury Nano looks like it might be, except it would be purpose-built (rather than detuned) and could be manufactured much more cheaply: one 8-pin connector, a much more modest power stage, and a small PCB with a less expensive cooler. It could debut at $499 and still make some money, and at that price point the 4 GB memory limit wouldn't be a problem either.
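To put a rough number on "far cheaper to manufacture," here is a minimal Python sketch using the common first-order dies-per-wafer approximation on a 300 mm wafer. The 450 mm^2 figure is the estimate above; Fiji's ~596 mm^2 is its commonly reported die size, not something from this post. The sketch ignores defect yield, scribe lines, and edge exclusion, so treat it as a rough comparison rather than a cost model.

```python
import math

WAFER_DIAMETER_MM = 300.0  # standard 300 mm wafer
SHADERS_PER_CU = 64        # each GCN compute unit has 64 shaders


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = WAFER_DIAMETER_MM) -> int:
    """Gross candidate dies per wafer via the usual approximation:
    wafer area / die area, minus a correction for partial dies at the edge.
    Ignores scribe lines, edge exclusion, and defect yield."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)


# 50 compute units * 64 shaders each = 3200 shaders for the hypothetical chip.
hypothetical_shaders = 50 * SHADERS_PER_CU

for name, area in [("hypothetical 50-CU HBM chip", 450.0), ("Fiji (~596 mm^2)", 596.0)]:
    print(f"{name}: ~{dies_per_wafer(area)} candidate dies per wafer")
```

With these assumptions it gives about 125 candidate dies for the 450 mm^2 chip versus about 91 for Fiji. Larger dies also lose a bigger fraction to defects, so the real gap in good dies per wafer would likely be wider still.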
Fury should never have made it off the drawing board. 4 GB of VRAM prevents it from being sold in any large numbers in the professional market (and double-precision (DP) throughput was cut down anyway). AMD should have realized that the costs of Fury were too high with HBM and an interposer, and that the break-even cost was prohibitive. After the disaster that was Tonga, AMD should have been more careful about releasing new cards.
I think both Tonga and Fiji were tech demos that wound up being released because AMD is financially strapped and had to squeeze whatever money they could out of every product they have. I'm sure that Nvidia has worked on HBM tech demos as well and they may even have taped out, but they didn't bother to ship them because it wouldn't have been worth it and it might have damaged their brand image.
It's interesting to note that, according to a leaked slide Maddie posted here a while back, both Tonga and Fiji look to have been produced on "28HPM", a TSMC mobile process. Maybe they were originally designed for TSMC's 20nm process and had to be backported when that turned out to be a nonstarter, and it was easier/cheaper to port to 28HPM than to a GloFo process or a more performance-focused TSMC process. Or maybe AMD realized they would have to be working with a mobile-focused process on 14nm FinFET and decided to get a head start learning how to handle it.