I have only seen up to 4 layers of cache, but that may be a Milan-X limitation rather than a Zen 4 limitation. I believe TSMC supports up to 12 layers, though. I don’t know if they can test the cache dies before stacking them. With HBM, they start with “known good die stacks.” Something can still go wrong in the bonding process, though.
The CPU die with cache chips might be much more expensive, since you are adding another set of steps where something can go wrong and reduce yield. I think only the very high-end HPC products will get 4 layers. Those that end up in the consumer market as Ryzen parts might actually be salvaged and/or single-layer parts. If something goes wrong with a 4-layer stack, you might still be able to use it as a 2-layer part, a single-layer part, or a part with no stacked cache if all of the cache dies are unusable. They might have a specific single-layer part for high-end, but not ridiculously expensive, Epyc processors. If something goes wrong with the single-layer part, then they can probably still sell it as an Epyc or Ryzen without the stacked cache enabled. There should be lots of opportunities for salvage, but there is still a huge amount of silicon going into these things.
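To put rough numbers on the salvage idea, here is a toy model, not real yield data. It assumes each cache layer bonds successfully with the same independent probability, and that a stack is only usable up to the first failed layer (my guess, since the layers above a bad one would connect through it). The function name and the 95% figure are made up for illustration.

```python
# Toy yield model for a stacked-cache CPU (illustrative numbers only).
# Assumptions: each cache layer bonds successfully with independent
# probability p_layer, and a stack is only usable up to the first failed
# layer, because layers above a bad one connect through it.

def usable_layer_distribution(n_layers: int, p_layer: float) -> list[float]:
    """Return probs where probs[k] = P(exactly k layers usable), k = 0..n_layers."""
    probs = []
    for k in range(n_layers + 1):
        if k < n_layers:
            # k good layers in a row, then layer k+1 fails
            probs.append(p_layer ** k * (1 - p_layer))
        else:
            # every layer bonded successfully
            probs.append(p_layer ** n_layers)
    return probs


if __name__ == "__main__":
    p = 0.95  # hypothetical per-layer bonding success rate, not real data
    dist = usable_layer_distribution(4, p)
    for k, prob in enumerate(dist):
        print(f"{k} usable layers: {prob:.3f}")
```

With a hypothetical 95% per-layer success rate, only about 81% of 4-high stacks (0.95^4) keep all four layers; the rest land in the 0 to 3 layer salvage bins, which is why I’d expect a lot of binning.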
If something like a GB (or 2) of SRAM on one package actually ships, I doubt most people will consider it “affordable.” For certain HPC applications, and maybe big database servers and other workloads with expensive per-core licensing, it may be well worth the cost, but it will probably still not be “affordable.” It looks like we will get some Threadrippers and Ryzen parts with a single layer, but those aren’t going to be cheap. I don’t think Intel will have anything close; this isn’t doable on a monolithic die. Even if Intel has HBM and AMD doesn’t, HBM is good for bandwidth, but it is still DRAM with DRAM-like latencies.
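For a sense of where “a GB (or 2)” comes from, here is the back-of-the-envelope arithmetic. The baseline numbers are Milan-X: 8 CCDs per package, 32 MB of L3 on each CCD, and 64 MB per stacked cache die. Anything above one layer is hypothetical, and a future part could have a different CCD count, so the constants are just assumptions you can change.

```python
# Rough L3 capacity estimate for a stacked-cache EPYC package.
# Baseline figures are Milan-X (Zen 3): 8 CCDs, 32 MB on-die L3 per CCD,
# 64 MB per stacked cache die. Stacks taller than one layer are hypothetical.

BASE_L3_MB_PER_CCD = 32
STACKED_MB_PER_LAYER = 64
CCDS_PER_PACKAGE = 8


def package_l3_mb(layers: int, ccds: int = CCDS_PER_PACKAGE) -> int:
    """Total L3 (MB) on a package with `layers` cache dies stacked on every CCD."""
    per_ccd = BASE_L3_MB_PER_CCD + layers * STACKED_MB_PER_LAYER
    return per_ccd * ccds


if __name__ == "__main__":
    for layers in range(5):
        print(f"{layers} layer(s): {package_l3_mb(layers)} MB")
```

One layer gives 768 MB, which matches shipping Milan-X; four layers would give 2304 MB, about 2.25 GB of SRAM on one package.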