Question What exactly is the purpose of power loss protection caps?

bgstcola

Member
Aug 30, 2010
Hey, during my last few SSD purchases I've wondered whether I actually need power loss protection caps.

I know there are two ways to protect an SSD against power loss: in hardware with caps, or in software (journaling). I keep reading the following two explanations and don't know which one is true:
  1. Caps are necessary in certain applications like databases, RAID setups, etc. They are not needed for normal desktop usage because journaling is just as effective in that scenario.
  2. Caps are superior to software protection even in normal desktop usage, and the only reason they are missing from consumer drives (except a few) is that software protection is cheaper.
Which one is it? Maybe no one really knows because it depends on the implementation as well?

Until now I have always gone with drives that have caps (Intel 750, Crucial MX200), but now I wonder whether my reasoning has been wrong and whether SSDs without caps are just as reliable for desktop use, as long as they are quality drives (like a Samsung EVO or PRO)?
 

Billy Tallis

Senior member
Aug 4, 2015
Only enterprise drives have power loss protection caps. Intel occasionally rebadges an enterprise drive for enthusiasts, but lately that's been their Optane 900P/905P drives, not flash-based SSDs. You should generally avoid enterprise SSDs for consumer usage, because you'll be paying extra and not getting the benefits of an SLC cache (and sometimes not getting a manufacturer's warranty, depending on where you buy the enterprise drive).

The Crucial MX series SATA drives touted partial power loss protection. Instead of guaranteeing that all writes which have been sent to the drive will be completed even in the event of a power failure, the MX series only guarantees that data already on the drive will not be corrupted by incomplete writes. Writing to MLC, TLC and QLC is often a multi-step process where the most significant bit is written in the first pass to make the largest voltage adjustment, then another pass or two is done to fine-tune the voltage to the right level for representing a full 2, 3 or 4 bits with each memory cell.
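
To make that failure window concrete, here's a toy Python model of a two-pass program sequence. The Gray coding and the "first pass commits the lower-page bit" layout are illustrative assumptions, not any vendor's actual cell mapping:

```python
# Toy model of two-pass TLC programming. The Gray code below is an
# illustrative assumption, not any vendor's real voltage-to-bits map.
GRAY = ['111', '110', '100', '101', '001', '000', '010', '011']

def read_cell(level):
    """Decode a cell voltage level (0-7) back into three bits."""
    return GRAY[level]

def first_pass(lower_bit):
    """Coarse pass: park the cell in a low or high voltage region."""
    return 0 if lower_bit == '1' else 4

def second_pass(all_bits):
    """Fine pass: move the cell to the exact level for all three bits."""
    return GRAY.index(all_bits)

level = first_pass('0')            # lower page committed: reads back '0'
assert read_cell(level)[0] == '0'

# Power loss partway through the fine pass can leave the voltage between
# target states, flipping the bit that the first pass already committed:
interrupted = 3                    # drifted one state below level 4
print(read_cell(interrupted)[0])   # prints '1' -- old data corrupted
```

The point is that the fine-tuning pass temporarily moves the cell through voltage regions that decode to the wrong lower-page bit, so an ill-timed power cut can corrupt data that was committed well before the cut.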

The Crucial MX series style partial power loss protection isn't enough to guarantee the full mission-critical transactional consistency that servers often require. It does provide some hypothetical improvement to data resiliency, but it's probably not enough to care about these days. For starters, data being written to a drive's SLC cache isn't in danger of corrupting data already on the drive, because writing to SLC is always a single-pass process done to empty memory cells. A lot of TLC memory can also support a single-pass program sequence, which is how we get consumer drives with 1.5GB/s and higher sustained write speeds after the SLC cache is full. That means that many TLC drives don't need capacitors to avoid the failure mode that the Crucial MX series partial power loss protection defends against.

Desktop software often does not instruct the operating system to force data to be written to the drive immediately. An ordinary write() system call or equivalent leaves it up to the OS to decide when the data is actually sent to the SSD. The OS is free to cache written data in RAM for a while and may batch that write up with other write operations, and may end up sending data to the drive in a different order from when the write commands were issued by the application. This means a lot of ordinary desktop software is vulnerable to data loss regardless of whether the drive has power loss protection.
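
For what it's worth, here's a minimal sketch of that gap using standard POSIX calls from Python (the filename and payload are made up):

```python
import os

# Minimal sketch of the durability gap described above.
fd = os.open("app.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

os.write(fd, b"important record\n")  # lands in the OS page cache,
                                     # not necessarily on the SSD yet
# <-- power loss here can lose the record no matter what caps the drive has

os.fsync(fd)  # forces the cached data to be flushed to the drive;
              # only past this point does drive-level PLP even matter
os.close(fd)
```

Databases call fsync() (or equivalent) at every transaction commit, which is exactly why they benefit from full power loss protection; most desktop software never bothers.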
 

VirtualLarry

No Lifer
Aug 25, 2001
A lot of TLC memory can also support a single-pass program sequence, which is how we get consumer drives with 1.5GB/s and higher sustained write speeds after the SLC cache is full. That means that many TLC drives don't need capacitors to avoid the failure mode that the Crucial MX series partial power loss protection defends against.
Hey, that's good to know! Thanks!
 

bgstcola

Member
Aug 30, 2010
The Crucial MX series style partial power loss protection isn't enough to guarantee the full mission-critical transactional consistency that servers often require. It does provide some hypothetical improvement to data resiliency, but it's probably not enough to care about these days. For starters, data being written to a drive's SLC cache isn't in danger of corrupting data already on the drive, because writing to SLC is always a single-pass process done to empty memory cells. A lot of TLC memory can also support a single-pass program sequence, which is how we get consumer drives with 1.5GB/s and higher sustained write speeds after the SLC cache is full. That means that many TLC drives don't need capacitors to avoid the failure mode that the Crucial MX series partial power loss protection defends against.

Thanks! Corrupting data that is already on the drive is what worries me, not losing data in flight. If I buy a top brand-name SSD like a Samsung EVO, would it be just as well protected as the Crucial MX or an enterprise SSD?
 

Billy Tallis

Senior member
Aug 4, 2015
Corrupting data that is already on the drive is what worries me - not losing data in flight.

Realistically, the impact will probably be the same. The data at rest that some (but not all) TLC and QLC drives may be at risk of corrupting would in most cases be whatever was written just tens to a few hundred kilobytes earlier in the same file. It's not going to affect a random file that has been sitting on the drive for a while.
 

fzabkar

Member
Jun 14, 2013
I would think that some kind of power loss protection would be needed to protect the integrity of the SSD's own internal firmware components. For example, what would happen if power were lost while the SSD was updating its Flash Translation Layer (which is modified by wear levelling)?
 

Billy Tallis

Senior member
Aug 4, 2015
For example, what would happen if power were lost while the SSD was updating its Flash Translation Layer (which is modified by wear levelling)?

Just like regular user data, the FTL cannot generally be modified in-place, because that's not how flash memory works. There will usually be an old copy of whatever FTL data was being updated, and that old metadata will point to the old version of user data (possibly several versions will hang around before being garbage collected). So it's possible to still lose some data if the user data was successfully written to flash but the corresponding FTL update transaction didn't complete. And when restoring from power loss, the drive might have to re-scan to check if there are pages that should be free according to the outdated FTL but were written to after the last successful FTL update was completed.
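
Here's a toy sketch of that situation in Python; the data structures are purely illustrative, nothing like how real firmware lays things out:

```python
# Toy out-of-place FTL. Flash pages are write-once, and the mapping
# table is itself journaled out-of-place, so an old copy survives.
flash = {}      # physical page number -> data
maps = []       # append-only history of {logical -> physical} tables
next_page = 0

def program(logical, data, commit_map=True):
    """Write data to a fresh page, then (maybe) journal the new mapping."""
    global next_page
    flash[next_page] = data
    if commit_map:
        table = dict(maps[-1]) if maps else {}
        table[logical] = next_page
        maps.append(table)          # old table survives until GC
    next_page += 1

program(0, b"old version")                    # normal write, map committed
program(0, b"new version", commit_map=False)  # power lost before map update

# After power-up, the last committed map still points at the old data...
table = maps[-1]
print(flash[table[0]])                        # b'old version'

# ...and the drive must re-scan for pages its stale map believes are free.
orphans = set(flash) - set(table.values())
print(orphans)                                # {1}: written but unmapped
```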

These are pretty much the same challenges that every journaling filesystem deals with. There's a ton of redundancy and similarity between modern filesystems and the FTL in an SSD, which is why there's so much interest and potential benefit in ideas like open-channel SSDs or key-value SSDs instead of SSDs that still pretend to work like hard drives.