As far as failure modes go, I've posted at length on HardOCP; references available upon request (if I find time to dig them up).
For the record, I've been researching SSDs for almost 2 years now, and recently got an 80GB X25-M G2.
---
I've looked at this topic somewhat in my spare time.
Basically, there are 3 failure modes:
1. Fail on erase - happens when the drive runs out of spare blocks and every block is in use. The SSD turns into a "read-only" device (like ROM), unlike HDDs, which typically fail on read (hence bad sectors, etc.). The key reason this is NOT a concern for data reliability is that the failure is deterministic -- the SSD knows EXACTLY when it has run out of spare reallocation blocks, and the number of reallocated blocks can be viewed via SMART. At the very end of the drive's lifetime the reallocated-block count increases exponentially, so one could, for example, use a conservative policy of replacing the SSD once 10% of the reallocation blocks are used.
2. Unrecoverable bit errors - occur on the order of 1 in 10^15 bits read, per Intel's claim (similar to typical hard drives), and result in a "silent" error on that read (a 1 gets accidentally read as a 0). This comes from inherent noise in depositing electrons into each flash cell: an electron distribution "in between" two states (e.g. 10 and 11) can be read as either one. MLC therefore has an order-of-magnitude worse raw error rate than SLC and needs more ECC bits to reach the 1-in-10^15 UBE figure. A solution would be to use more parity/error correction, or to retry the read. (A quick back-of-the-envelope on what that rate means in practice is sketched a few lines down.)
3. Controller failure - this is generally poorly documented, but it refers to any failure outside of the NAND itself. HDDs have an analogue. It could be caused by extreme temperature/humidity ranges, severe shock, impaling the drive with a screwdriver, etc. There's little info on how to guard against this.
Because of 3, I would recommend backups regardless. And that's why RAID can be so dangerous for data safety - controller failure is non-obvious, underappreciated, and very difficult to recover from.
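To put the 1-in-10^15 figure from #2 in perspective, here's a quick back-of-the-envelope sketch (the workload numbers are my own assumptions, not Intel's):

```python
# Rough expected-value math for unrecoverable bit errors (UBE) at the quoted rate.
UBER = 1e-15                    # spec: ~1 unrecoverable error per 10^15 bits read
bytes_read_per_day = 20e9       # assumption: ~20 GB of host reads per day
bits_read_per_year = bytes_read_per_day * 8 * 365

errors_per_year = bits_read_per_year * UBER          # ~0.06 silent errors per year
years_per_error = 1 / errors_per_year                # roughly one error every ~17 years
print(f"Expected silent errors per year: {errors_per_year:.3f}")
print(f"Roughly one unrecoverable read error every {years_per_error:.0f} years at this workload")
```

In other words, at desktop-class read volumes you'd expect to hit one of these maybe once every couple of decades.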
---
1. See this from the AnandTech article:
Intel actually includes additional space on the drive, on the order of 7.5 - 8% more (6 - 6.4GB on an 80GB drive) specifically for reliability purposes. If you start running out of good blocks to write to (nearing the end of your drive's lifespan), the SSD will write to this additional space on the drive. One interesting sidenote, you can actually increase the amount of reserved space on your drive to increase its lifespan. First secure erase the drive and using the ATA SetMaxAddress command just shrink the user capacity, giving you more spare area.
...
Intel's SSDs are designed so that when they fail, they attempt to fail on the next erase - so you don't lose data. If the drive can't fail on the next erase, it'll fail on the next program - again, so you don't lose existing data. You'll try and save a file and you'll get an error from the OS saying that the write couldn't be completed.
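If you want to actually try the spare-area trick from the quote, here's a minimal sketch of the sector math (my own, not from the article; it assumes Linux with hdparm, whose -N option drives the ATA max-address mechanism, and the extra 10% is an arbitrary choice -- read the hdparm man page before setting anything, and secure erase first as the quote says):

```python
import re
import subprocess

DEVICE = "/dev/sdX"          # placeholder -- your SSD's device node
EXTRA_SPARE_PCT = 10         # assumption: give up an extra 10% of user capacity as spare

# Read the current/native max sector count (non-destructive query).
# Typical output: " max sectors   = 156301488/156301488, HPA is disabled"
out = subprocess.run(["hdparm", "-N", DEVICE], capture_output=True, text=True, check=True)
m = re.search(r"max sectors\s*=\s*(\d+)/(\d+)", out.stdout)
native_max = int(m.group(2))

new_max = int(native_max * (100 - EXTRA_SPARE_PCT) / 100)
print(f"Native max sectors: {native_max}")
print(f"Suggested new max:  {new_max}")
print(f"Apply with something like: hdparm -N p{new_max} {DEVICE}  (check man hdparm first)")
```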
---
I can't speak for the other manufacturers, but to be competitive they'd have to have similar failure modes.
P.S. Block reallocation on SSDs is silent, automatic, and done in hardware. Each reallocated block increments SMART attribute ID 5 (the reallocated sector/block count) by 1.
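If you want to keep an eye on that attribute yourself, here's a minimal monitoring sketch (it assumes smartctl is installed and that the drive reports the attribute as Reallocated_Sector_Ct, which varies by vendor; the total spare-block count is a made-up number, and the 10% budget is the conservative threshold from #1):

```python
import re
import subprocess

DEVICE = "/dev/sda"             # placeholder device node
SPARE_BLOCKS_TOTAL = 10_000     # assumption -- the real spare pool size is vendor-specific
REPLACE_AT_FRACTION = 0.10      # conservative 10% threshold from point #1

# Pull the SMART attribute table; the attribute 5 line looks something like:
#   5 Reallocated_Sector_Ct  0x0032  100  100  000  Old_age  Always  -  0
out = subprocess.run(["smartctl", "-A", DEVICE], capture_output=True, text=True)
m = re.search(r"^\s*5\s+Reallocated_Sector_Ct\s+.*?(\d+)\s*$", out.stdout, re.MULTILINE)
reallocated = int(m.group(1)) if m else 0

if reallocated >= REPLACE_AT_FRACTION * SPARE_BLOCKS_TOTAL:
    print(f"{DEVICE}: {reallocated} reallocated blocks -- time to retire this SSD")
else:
    print(f"{DEVICE}: {reallocated} reallocated blocks -- still within budget")
```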
2. This is particularly interesting for an SSD because unrecoverable bit errors are generally not correlated across adjacent cells (low spatial autocorrelation), while for HDDs, and especially CDs, they typically are (high spatial autocorrelation). You can appreciate this when you think of how large a scratch or a single dust particle is relative to the bits on a hard drive platter (it spans thousands or millions of them). For example (greatly exaggerated):
V = valid cell
F = failed cell
HDD : VVVVVVVFFFFFFFFFFFVVVVVVVVVVVVVVVFFFFFFFVVVVV
SSD : VVVVFVVVVVVFVFVVVVVFVVVFVFFVVVVVVFVVFVVVFVVV
By the math alone, a parity scheme with small blocks has a smaller probability of failing (i.e., of seeing more errors than it can correct in any one block) for an SSD.
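Here's a toy simulation of that claim (entirely my own sketch; the cell count, failure rate, run length, and block size are made up, and "parity block fails" is modeled as 2+ bad cells in a block, i.e. a code that can only correct a single error per block):

```python
import random

N_CELLS = 1_000_000
FAIL_RATE = 0.001      # assumed overall fraction of bad cells (same in both cases)
BLOCK = 16             # cells per parity block ("small blocks")
CLUSTER = 32           # assumed run length of a scratch/dust-style defect

def uncorrectable_blocks(failed):
    """Count parity blocks with 2+ failed cells (beyond single-error correction)."""
    return sum(
        1
        for start in range(0, N_CELLS, BLOCK)
        if sum(failed[start:start + BLOCK]) >= 2
    )

# SSD-like: every cell fails independently (low spatial autocorrelation).
ssd = [random.random() < FAIL_RATE for _ in range(N_CELLS)]

# HDD/CD-like: same expected number of bad cells, but laid down in contiguous
# runs of CLUSTER cells (high spatial autocorrelation).
hdd = [False] * N_CELLS
for _ in range(int(N_CELLS * FAIL_RATE / CLUSTER)):
    start = random.randrange(N_CELLS - CLUSTER)
    for i in range(start, start + CLUSTER):
        hdd[i] = True

print("Uncorrectable blocks, scattered (SSD-like):", uncorrectable_blocks(ssd))
print("Uncorrectable blocks, clustered (HDD-like):", uncorrectable_blocks(hdd))
```

With the same total number of bad cells, the clustered case typically produces roughly an order of magnitude more uncorrectable blocks in this toy setup, which is the point of the V/F picture above.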
--