How do SSDs fail?


wonderflu

Junior Member
Nov 23, 2007
Whenever discussions of SSDs come up, people always bring up the case where a particular part of the drive is no longer writable and the controller will supposedly return an error but leave the unmodified data readable. How does the controller determine that a particular set of bits is no longer writable? Can a write only half finish? Are there any whitepapers describing graceful failure modes for SSDs? I am curious about the actual implementation in the drive's firmware.
 

Modelworks

Lifer
Feb 22, 2007
A very easy method a controller can use, and it has been around for decades, is to write a bit to memory and then read it back. If the two don't match, for whatever reason, it tries the write again; if that also fails, the location is marked bad. Some controllers will retry a location once, some ten times; it just depends on the firmware.

You can still read the data that was in that location; it is just in a state where it is stuck and cannot be changed. Sort of like a stuck pixel on a monitor: the pixel is still lit, you just can't change the color :)
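To make that a bit more concrete, here is a minimal Python sketch of that write-verify-retry idea. The list standing in for the flash, the retry count, and the "stuck" set are all invented for illustration; real firmware operates on flash pages through the controller, not on a Python array.

Code:
MAX_RETRIES = 3  # firmware-specific: could be 1, could be 10

class ToyFlash:
    def __init__(self, size):
        self.cells = [0] * size
        self.stuck = set()   # locations that silently ignore writes (worn out)
        self.bad = set()     # locations the controller has marked bad

    def raw_write(self, addr, bit):
        if addr not in self.stuck:   # a stuck cell just keeps its old value
            self.cells[addr] = bit

    def write_verified(self, addr, bit):
        """Write a bit, read it back, retry a few times, mark the location bad on failure."""
        for _ in range(MAX_RETRIES):
            self.raw_write(addr, bit)
            if self.cells[addr] == bit:   # read-back matches -> success
                return True
        self.bad.add(addr)                # give up: mark the location bad
        return False                      # whatever is there stays readable, just unwritable

flash = ToyFlash(8)
flash.stuck.add(3)                        # simulate a worn-out cell stuck at 0
print(flash.write_verified(3, 1))         # False -> location 3 gets marked bad
print(flash.cells[3])                     # 0 -> still readable, like the stuck pixel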
 

Red Squirrel

No Lifer
May 24, 2003

Actually, what happens if the bit is already 1 and the controller tries to write a 1, then checks whether the write worked? It will read back as 1 either way. Does it check what the bit is first and then try to write the opposite?
 

Modelworks

Lifer
Feb 22, 2007

It writes a 1 and then checks for a 1; at this point it is only concerned with recording the data properly, so it can take some time before it determines that a bit is bad. Some drives use a speed optimization here and check the bit before writing: reading a bit is faster than writing, so the controller skips bits that are already correct for the current write operation. If a bad bit is found, it marks the entire block bad and moves on to the next free block. Some controllers will write an entire block and then CRC that block to see whether the write was successful. When no activity is taking place, the controller runs a background routine that clears a bit and then writes a 1, looking for the bit that was failing. If the bit passes that test, the block is marked good again; if not, the entire block stays marked bad.

Memory in flash-type storage is organized as bits grouped into blocks, and blocks grouped into larger segments that fill out the space. A block could be 4KB in size, so losing a block isn't losing much.


edit:
Forgot to add that some controllers only allow writing in whole blocks, not individual bits. In that case the controller writes the entire block in one operation and then reads it back to confirm the write.
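Rough sketch of that block-level variant, assuming nothing about any particular controller: write a whole block, verify it with a checksum (zlib's CRC-32 standing in for whatever CRC the firmware uses), and remap to the next free block if verification fails. The block size, the free list, and the failure model are all made up.

Code:
import zlib

BLOCK_SIZE = 4096                      # 4 KB, matching the example above

class ToyBlockDevice:
    def __init__(self, n_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(n_blocks)]
        self.failing = set()           # physical blocks that corrupt what is written
        self.bad = set()               # blocks the controller has retired
        self.free = list(range(n_blocks))
        self.mapping = {}              # logical block -> physical block

    def _program(self, phys, data):
        # A failing block stores garbage instead of the intended data.
        if phys in self.failing:
            data = bytes(b ^ 0xFF for b in data)
        self.blocks[phys] = data

    def write_block(self, logical, data):
        """Program a block, CRC-check it, and move to the next free block on failure."""
        assert len(data) == BLOCK_SIZE
        while self.free:
            phys = self.free.pop(0)
            self._program(phys, data)
            if zlib.crc32(self.blocks[phys]) == zlib.crc32(data):
                self.mapping[logical] = phys    # verified: keep this mapping
                return True
            self.bad.add(phys)                  # CRC mismatch: retire the whole block
        return False                            # out of spare blocks

dev = ToyBlockDevice(4)
dev.failing.add(0)                     # physical block 0 corrupts writes
ok = dev.write_block(7, b"\xab" * BLOCK_SIZE)
print(ok, dev.mapping[7], sorted(dev.bad))   # True 1 [0]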
 
Jul 18, 2009
It depends on the failure mode. For instance, when set on fire, hard drives and SSDs both provide equally fantastic recovery rates of 0%.

For certain failure modes, it is believed that SSDs will still allow you to recover whatever data was previously on the drive; that said, all the SSD failures I've ever heard of have been non-recoverable firmware issues.

Anyway, let me try to explain:

When you write data to a hard drive, the write head in your drive moves to a particular sector on your drive's platter that corresponds to the data you want to write, and then it writes the data on that sector. This means that when you want to overwrite data that's already there, your old data gets erased immediately, even before your drive is finished writing the new data. If the write request is interrupted, there's a very real chance your data will be corrupted.

SSDs have a different write process. Because flash cells will wear out if you write to them over and over again, SSDs use "wear-leveling" algorithms to remap their individual sectors; this way, even if you write to the same file repeatedly, the wear gets spread out across the entire drive.*
(*That's assuming wear-leveling algorithms actually work as advertised, which some people question.)

This means that unlike hard drives, SSDs can't overwrite data in place the moment they receive a write request; instead, they maintain a pool of available flash memory, and they can only remap and erase the old data AFTER the new copy has been successfully written. If the write process gets interrupted, this SHOULD leave the old data intact.
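A minimal sketch of that out-of-place write path, with a completely made-up mapping table and a fake "interrupted" flag: the logical-to-physical mapping only moves to the new page after the new copy is in place, so an interrupted write leaves the old copy reachable.

Code:
class ToySSD:
    """Out-of-place writes: old data stays mapped until the new copy is committed."""
    def __init__(self, n_pages):
        self.pages = [None] * n_pages      # physical flash pages
        self.free = list(range(n_pages))   # pool of erased pages
        self.map = {}                      # logical page -> physical page

    def write(self, logical, data, interrupted=False):
        phys = self.free.pop(0)            # always write to a fresh, erased page
        if interrupted:
            return False                   # power lost mid-write: mapping untouched
        self.pages[phys] = data
        old = self.map.get(logical)
        self.map[logical] = phys           # commit the new location...
        if old is not None:
            self.pages[old] = None         # ...only then can the old page be reclaimed
            self.free.append(old)
        return True

    def read(self, logical):
        return self.pages[self.map[logical]]

ssd = ToySSD(8)
ssd.write(0, b"old data")
ssd.write(0, b"new data", interrupted=True)   # simulated power loss mid-write
print(ssd.read(0))                            # b'old data' -- the old copy survives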

This should also mean that if an SSD dies because it's used up all its write/erase cycles, it will still have a (mostly) intact, read-only copy of all the data that was previously written to it. But like I said, this is mostly theoretical at this point, because all the SSD failures I've heard of so far were just garden variety brickings.

(I don't have a SATA SSD myself, although I'd love one and I've been researching them.)
 

jimhsu

Senior member
Mar 22, 2009
As far as failure modes go, I've posted at length on HardOCP. References available upon request (if I find time to dig them up).
For the record, I've been researching SSDs for almost 2 years now, and recently got an 80GB X25-M G2.

---

I've looked at this topic somewhat in my spare time.

Basically, there are 3 failure modes:

1. Fail on erase - this happens when you run out of spare blocks and every block is used: the SSD turns into a "read-only" device (like a ROM), unlike HDDs, which typically fail on read (hence bad sectors, etc.). The key reason this is NOT a concern for data reliability is that the failure is deterministic -- the SSD knows EXACTLY when it has run out of spare reallocation blocks, and the reallocated blocks can be viewed via SMART. At the very end of the drive's lifetime the number of reallocated blocks increases exponentially; one could, for example, use a conservative policy where the SSD is replaced once 10% of the reallocation blocks are used (a toy version of that check is sketched further down).

2. Unrecoverable bit errors - occur in roughly 1 out of every 10^15 bits read, as Intel claims (similar to typical hard drives), and result in a "silent" error on that read (a 1 accidentally gets read as a 0). This comes from inherent noise in depositing electrons into each flash cell: an electron distribution that sits "in between" two states (e.g. 10 and 11) can be read as either one. MLC therefore has an order of magnitude worse error rate than SLC and requires more ECC bits to reach the 1-in-10^15 UBE figure. A solution would be to use more parity/error correction, or to retry the read (a rough calculation of what that rate means in practice is sketched a few paragraphs below).

3. Controller failure - this is generally poorly documented, but it refers to any failure outside of the NAND itself; HDDs have an analogous failure mode. It could be caused by extreme temperature or humidity, extreme shock, impaling the drive with a screwdriver, etc. There is little information on how to guard against it.

Because of 3, I would recommend backups regardless. And that's why RAID can be so dangerous for data stability - controller failure is unobvious, underappreciated, and very difficult to recover from.
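To put the 1-in-10^15 figure from failure mode 2 into perspective, here is a quick back-of-the-envelope Python calculation of the chance of hitting at least one unrecoverable bit error while reading a given amount of data, assuming the errors are independent (which, per the spatial-autocorrelation point further down, is a more reasonable assumption for SSDs than for HDDs). The read volumes are arbitrary examples.

Code:
# P(at least one unrecoverable error) = 1 - (1 - UBER)^bits_read
UBER = 1e-15                    # quoted rate: one error per 10^15 bits read

def p_error(bytes_read):
    bits = bytes_read * 8
    return 1 - (1 - UBER) ** bits

for label, size in [("1 TB", 1e12), ("10 TB", 1e13), ("100 TB", 1e14)]:
    print(f"{label} read: {p_error(size):.2%} chance of an unrecoverable bit error")
# roughly 0.8% for 1 TB, 7.7% for 10 TB, 55% for 100 TB of total reads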

---

1. See this from the anandtech article:

Intel actually includes additional space on the drive, on the order of 7.5 - 8% more (6 - 6.4GB on an 80GB drive) specifically for reliability purposes. If you start running out of good blocks to write to (nearing the end of your drive's lifespan), the SSD will write to this additional space on the drive. One interesting sidenote, you can actually increase the amount of reserved space on your drive to increase its lifespan. First secure erase the drive and using the ATA SetMaxAddress command just shrink the user capacity, giving you more spare area.

...

Intel's SSDs are designed so that when they fail, they attempt to fail on the next erase - so you don't lose data. If the drive can't fail on the next erase, it'll fail on the next program - again, so you don't lose existing data. You'll try and save a file and you'll get an error from the OS saying that the write couldn't be completed.
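As a side note on the numbers in that quote, here is a back-of-the-envelope sketch of where the 6 - 6.4GB / 7.5 - 8% spare area could come from, assuming (and this is only an assumption) that the raw NAND on an "80GB" drive is 80 GiB while the advertised capacity is 80 x 10^9 bytes:

Code:
raw_nand_bytes   = 80 * 2**30     # assumed raw flash: 80 GiB
advertised_bytes = 80 * 10**9     # capacity exposed to the user: 80 GB

spare = raw_nand_bytes - advertised_bytes
print(f"spare area: {spare / 10**9:.1f} GB ({spare / advertised_bytes:.1%} of user capacity)")
# -> spare area: 5.9 GB (7.4% of user capacity), in the ballpark of the figures quoted

# Shrinking the user capacity (the SetMaxAddress trick above) grows the spare pool:
smaller_user = 70 * 10**9
print(f"{(raw_nand_bytes - smaller_user) / 10**9:.1f} GB spare if limited to 70GB")
# -> 15.9 GB spare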

---

I can't speak for the other manufacturers, but to be competitive they have to have similar failure modes.

PS Block reallocation on SSDs is silent, automatic, and done by hardware. Each reallocated block increments SMART attribute ID 5 (the reallocated count) by 1.
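As a toy version of the conservative replacement policy mentioned under failure mode 1, here is a sketch that flags a drive once a chosen fraction of its spare blocks has been consumed. The spare-block total and the 10% threshold are assumptions for the example; in practice you would read the reallocated count (SMART attribute 5) with a tool like smartctl.

Code:
TOTAL_SPARE_BLOCKS = 12_288     # hypothetical spare pool size for this example
REPLACE_THRESHOLD = 0.10        # replace the drive once 10% of spares are used

def should_replace(reallocated_blocks):
    """Flag the drive when the reallocated-block count eats into the spare pool."""
    return reallocated_blocks / TOTAL_SPARE_BLOCKS >= REPLACE_THRESHOLD

for count in (100, 1_200, 1_300):
    print(count, should_replace(count))   # 100 False, 1200 False, 1300 True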

2. This is particularly interesting for an SSD because unrecoverable bit errors are generally not correlated across adjacent cells (low spatial autocorrelation), while for HDDs, and especially CDs, they typically are (high spatial autocorrelation). This becomes obvious when you think of how large a scratch or a single dust particle is relative to the bits on a hard drive platter (thousands or millions of bits). For example (greatly exaggerated):

V = valid cell
F = failed cell

HDD : VVVVVVVFFFFFFFFFFFVVVVVVVVVVVVVVVFFFFFFFVVVVV
SSD : VVVVFVVVVVVFVFVVVVVFVVVFVFFVVVVVVFVVFVVVFVVV

By mathematics alone, a parity scheme with small blocks has a smaller probability of failing on an SSD: scattered errors mostly land alone in a block, where they can be corrected, while long bursts overwhelm whole blocks, as the little simulation below shows.
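Here's a quick simulation of that intuition with made-up numbers: the same total count of bad cells, either scattered at random (SSD-like) or packed into long runs (HDD/CD scratch-like), against a simple code that is assumed to correct one error per small block. The block size, error counts, and run lengths are all illustrative.

Code:
import random

N_BITS = 100_000
BLOCK = 64            # parity/ECC block size; assume it corrects 1 error per block

def uncorrected_errors(error_positions, t=1):
    """Count bit errors left over after a code that fixes up to t errors per block."""
    per_block = {}
    for pos in error_positions:
        per_block[pos // BLOCK] = per_block.get(pos // BLOCK, 0) + 1
    return sum(n for n in per_block.values() if n > t)

random.seed(0)
scattered = random.sample(range(N_BITS), 200)       # SSD-like: independent errors

clustered = []                                      # HDD/CD-like: 4 runs of 50 errors
for start in (1_000, 26_000, 51_000, 76_000):
    clustered.extend(range(start, start + 50))

print("scattered:", uncorrected_errors(scattered), "of 200 errors uncorrectable")
print("clustered:", uncorrected_errors(clustered), "of 200 errors uncorrectable")
# Scattered errors mostly land alone in a block and get fixed;
# clustered errors overwhelm their blocks and nearly all survive correction.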

 