Yeah, it happens. We tend to think that only us "end-users" get hit by hardware failures, troubleshooting until we're blue in the face. But in a services environment, when a router goes fubar, a PSU dies, a critical mobo component fails, or a cable just inexplicably fries, thousands or even millions of people can be affected.
Usually, there are redundant backup contingencies in place. But when the contingency itself depends on a critical shared resource (a power grid, for example), even redundancy sometimes isn't enough.
The good thing is that MS was able to isolate the exact problem within hours, then get the system repaired and back up again. You can bet a lot of people immediately dropped whatever they were doing and focused on the problem. Beers were probably on the house once the situation was resolved...