I don't know much about CPU development, but I thought there were simulation tools available to avoid re-spinning the silicon again and again.
Where does the information that it is still not ready come from? Would it be possible to get some intel about what is currently wrong with it?
Learning about the past problems and the process of fixing them would be most interesting to many, I believe. Intel should publish a study about it to improve general knowledge of processor development.
So, a few comments on how hardware validation is done. First of all, the tools and methodologies to do this kind of testing absolutely exist, but it's not automatic. You need to create test benches, simulation environments, etc. But there's no theoretical reason that A0 silicon cannot be completely free of RTL bugs. Analog can be somewhat trickier, but an industry-standard way of mitigating that is to build test chips for high-risk IP (think high-speed PHYs, power delivery, etc.). Those are basically a collection of key circuits and some full IPs that you actually manufacture, so you can test them post-Si without taping out the full design. You can do test chips for digital as well, and they're also very useful for process learnings.
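To make the test bench idea concrete, here's a toy sketch in Python of the basic structure: a device-under-test model checked against a golden reference with directed corner cases plus random stimulus. Real flows use SystemVerilog/UVM or cocotb against actual RTL, and the adder, function names, and the injected bug here are all invented for illustration.

```python
import random

def dut_add8(a: int, b: int) -> int:
    """Toy stand-in for the RTL under test: an 8-bit adder with a
    deliberately injected corner-case bug (carry dropped at 0xFF + 0xFF)."""
    if a == 0xFF and b == 0xFF:
        return (a + b) & 0xFF  # bug: result wraps, carry bit is lost
    return (a + b) & 0x1FF     # correct 9-bit result (sum + carry)

def golden_add8(a: int, b: int) -> int:
    """Reference model the DUT is checked against."""
    return (a + b) & 0x1FF

def run_testbench(n_random: int = 1000, seed: int = 0):
    """Drive directed corner cases plus constrained-random stimulus,
    compare DUT output to the golden model, and collect mismatches."""
    rng = random.Random(seed)
    corners = [(0, 0), (0xFF, 0), (0, 0xFF), (0xFF, 0xFF)]
    stimulus = corners + [(rng.randrange(256), rng.randrange(256))
                          for _ in range(n_random)]
    return [(a, b) for a, b in stimulus if dut_add8(a, b) != golden_add8(a, b)]

failures = run_testbench()
print(failures)  # the directed corner (0xFF, 0xFF) exposes the bug
```

The point of the directed corner list is exactly the "not automatic" part: someone has to think of 0xFF + 0xFF, or constrain the random stimulus so it gets hit, before tape-out.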
But back to debug: there are tradeoffs to consider. Ideally, you want to catch bugs as early in the process as possible (i.e., IP- or sub-IP-level simulation), because that's where the cost of fixing them is lowest and you have the most observability into the issue. Larger (e.g., SoC-level) simulation is quite slow but useful for flushing out more bugs, and as you get into FPGAs and emulation, it's much more expensive and lower-observability, but helpful for finding yet more bugs. Post-Si is a whole other ballgame, however. A single minute in post-Si can run more full-chip cycles than all of pre-Si validation combined, so it's the most powerful tool for identifying whether a bug exists. However, you have very poor observability into where the bug is coming from, and obviously the lead time to fix it and the cost of doing so are brutal.
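A back-of-envelope calculation shows why the post-Si cycle count dwarfs everything else. The throughput figures below are rough assumptions for illustration (actual numbers vary enormously by design size and tooling), not measured data:

```python
# Assumed throughputs (orders of magnitude, not measurements):
POST_SI_HZ = 3e9      # silicon running at ~3 GHz
RTL_SIM_HZ = 10       # full-chip RTL simulation, ~10 cycles/s
EMULATION_HZ = 1e6    # hardware emulation, ~1 MHz

SECONDS_PER_YEAR = 3600 * 24 * 365

post_si_minute = POST_SI_HZ * 60                    # cycles in 1 min on silicon
sim_year = RTL_SIM_HZ * SECONDS_PER_YEAR            # cycles in 1 yr of RTL sim
emu_year = EMULATION_HZ * SECONDS_PER_YEAR          # cycles in 1 yr of emulation

print(f"1 minute post-Si:  {post_si_minute:.1e} cycles")
print(f"1 year RTL sim:    {sim_year:.1e} cycles")
print(f"1 year emulation:  {emu_year:.1e} cycles")
print(f"post-Si minute vs. sim year: {post_si_minute / sim_year:.0f}x")
```

Under these assumptions, one minute on silicon covers hundreds of years' worth of full-chip RTL simulation cycles, which is why post-Si is unbeatable at *detecting* bugs even though it's terrible at localizing them.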
The story I heard was that one of Keller's big goals at Intel was rebuilding their pre-Si validation so they wouldn't be stuck in this sort of situation. But I also heard that he took a sort of "damage is done" approach to SPR, and basically had them spam steppings in an attempt to get it to market as fast as possible. Supposedly the original A0 stepping was basically a glorified test chip.
Edit: Also, process issues can easily drive more steppings, even if nothing is wrong with the RTL. But that's clearly insufficient to explain the continued issues with SPR.