When Sun folks get together and bullshit about their theories of why Sun died, the one that comes up most often is another one of these supplier disasters. Towards the end of the DotCom bubble, we introduced the UltraSPARC-II. Total killer product for large datacenters. We sold lots. But then reports started coming in of odd failures. Systems would crash strangely. We'd get crashes in applications. All applications. Crashes in the kernel. Not very often, but often enough to be problems for customers. Sun customers were used to uptimes of years. The US-II was giving uptimes of weeks. We couldn't even figure out if it was a hardware problem or a software problem - Solaris had to be updated for the new machine, so it could have been a kernel problem. But nothing was reproducible. We'd get core dumps and spend hours poring over them. Some were just crazy, showing values in registers that were simply impossible given the preceding instructions. We tried everything. Replacing processor boards. Replacing backplanes. It was deeply random. Its very randomness suggested that maybe it was a physics problem: maybe it was alpha particles or cosmic rays. Maybe it was machines close to nuclear power plants. One site experiencing problems was near Fermilab. We actually mapped out failures geographically to see if they correlated to such particle sources. Nope. In desperation, a bright hardware engineer decided to measure the radioactivity of the systems themselves. Bingo! Particles! But from where? Much detailed scanning and it turned out that the packaging of the cache RAM chips we were using was noticeably radioactive. We switched suppliers and the problem totally went away. After two years of tearing our hair out, we had a solution.
But it was too late. We had spent billions of dollars keeping our customers running. Swapping out all of that hardware was cripplingly expensive. But even worse, it severely damaged our customers' trust in our products. Our biggest customers had been burned and were reluctant to buy again. It took quite a few years to rebuild that trust. At about the time that it felt like we had rebuilt trust and put the debacle behind us, the Financial Crisis hit...