As always, the actual cost is negligible. Just like ECC and other 'enterprise' features, these are only expensive because the companies use political pricing. They know their enterprise users can pay much more, so they need to find a trick to justify that. This usually means separating product lines and forcing those who can pay more to actually pay more.
The array of capacitors would add perhaps a dollar to the total price. I think it's well worth it. Just like ECC would be worth paying 2 dollars more for as a consumer. But if they gave everyone ECC, there would be no more opportunity to rip off enterprise customers. They still want their wealthy customers to pay thousands of dollars for a few dollars of additional cost. That is how the system works, mate.
I understand - and that's why they'd be cutting off their own leg by offering these features cheaply. Unless they see themselves offsetting those losses with massively increased sales and profit in the consumer sector.
I'm not sure you understand the whole meaning of bit-level redundancy? The whole idea here is to avoid 'bad sectors'. I also don't see how spare space is relevant in this context. The actual overhead is 1:16 (both in available space and in amplified writes).
The redundancy is crucial (pun intended) to protect the Flash Translation Layer (FTL), which stores the mapping between logical (LBA) and physical NAND addresses.
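To make the FTL point concrete, here is a toy sketch of a logical-to-physical mapping table. It is not how any real controller works; `ToyFTL` and its fields are made-up names, and garbage collection and wear leveling are left out entirely.

```python
# Toy flash translation layer: maps logical block addresses (LBAs) to
# physical NAND pages. All names here are hypothetical.
class ToyFTL:
    def __init__(self, num_pages):
        self.mapping = {}                  # LBA -> physical page index
        self.pages = [None] * num_pages    # simulated NAND pages
        self.next_free = 0                 # naive append-only allocator

    def write(self, lba, data):
        # NAND cannot be overwritten in place, so every write goes to a
        # fresh page and the mapping is updated to point at it.
        if self.next_free >= len(self.pages):
            raise RuntimeError("no free pages (garbage collection not modelled)")
        self.pages[self.next_free] = data
        self.mapping[lba] = self.next_free
        self.next_free += 1

    def read(self, lba):
        return self.pages[self.mapping[lba]]


ftl = ToyFTL(num_pages=8)
ftl.write(0, b"old")
ftl.write(0, b"new")       # same LBA, different physical page
current = ftl.read(0)      # follows the mapping to the newest copy

ftl.mapping.clear()        # simulate losing the mapping table
# The bytes are still in ftl.pages, but nothing records which LBA they belong to.
```

Once the mapping is gone, the data is physically still there but can no longer be addressed, which is why a corrupted FTL can take the whole drive down with it.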
So it's actually redundancy in the "file-system"-like layer of the SSD, and not the actual data? Much like a real file system, the organizational information and the data are likely stored separately? Or are they actually using 16/8 dies and spreading parity across dies/half-dies? Spare space is relevant, as it is what you use when flash starts dying. With parity spread throughout, you lose spare space, but you can continue using existing data (slowly) even if one cell fails.
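If parity really were spread across dies RAID-5 style, recovery would look roughly like the sketch below. The 15 data dies plus 1 parity die layout and the 4-byte pages are assumptions for illustration only; nothing in the thread confirms the actual layout.

```python
from functools import reduce

def xor_pages(pages):
    """Byte-wise XOR of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))

# Hypothetical layout: 15 data dies plus 1 parity die, 4-byte pages.
data = [bytes([die] * 4) for die in range(1, 16)]
parity = xor_pages(data)

# Suppose the page on one die becomes unreadable: rebuild it from the
# 14 surviving data pages plus the parity page.
lost = 6
survivors = data[:lost] + data[lost + 1:]
rebuilt = xor_pages(survivors + [parity])
assert rebuilt == data[lost]
```

Rebuilding one page means reading the other 15, which is where the "slowly" comes from.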
Corruption on an SSD is much worse than corruption on a HDD. The corruption can cause the entire SSD to fail. Once the FTL is corrupted, your problems start. Many people who are having problems with their SSD encounter exactly this issue.
Again, the FTL is a tiny subsection (similar to a FAT, if I had to guess) of the SSD's memory. Corruption in the organisational layer is always worse than actual data corruption, but usually this corruption comes from firmware bugs, memory and transmission errors... rarely from flash failure. Also, it's just as important to protect the FS-section of the data as it is to protect the FTL. Either of those failing means all your data becomes close to impossible to recover.
Now you are telling me this protection is irrelevant? We consumers should be happy with the crappy SSDs that we got today?
I'm saying that the problems lie elsewhere than in the flash layer. And that 1:16 redundancy isn't a whole lot if your flash is indeed suddenly dying.
It is? Did I miss something? I always thought modern flash had to spend more than half of its raw capacity on error correction, otherwise no sector could be retrieved without corruption. If you believe this is incorrect, I'd like to hear your understanding of it.
According to Wikipedia, ECC is about 1/32 of the raw capacity for NAND flash. It is possible that, by using TLC, they are forced to use more ECC than that, and thus increase it to 1/16 (and use that as the basis for their "RAID-5" claims). That will probably allow them to offer a competitive warranty and lifetime.
They surely won't be using more ECC than absolutely necessary, as this means less capacity per die, and thus makes the device more expensive per GB.
More than half would be pretty disastrous.
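For a sense of scale, a quick back-of-the-envelope comparison of the three ratios mentioned above. The 256 GB raw capacity is a made-up figure, and `usable_gb` is just a helper for this example.

```python
from fractions import Fraction

def usable_gb(raw_gb, ecc_fraction):
    """Capacity left for user data if ecc_fraction of raw capacity holds ECC."""
    return raw_gb * (1 - ecc_fraction)

raw = 256  # hypothetical raw NAND capacity in GB
for frac in (Fraction(1, 32), Fraction(1, 16), Fraction(1, 2)):
    print(f"ECC {frac}: {float(usable_gb(raw, frac)):.0f} GB usable of {raw} GB raw")
```

Going from 1/32 to 1/16 costs 8 GB on such a drive; spending half on ECC would cost 128 GB.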
All in all, I suspect that what they're actually doing is spreading ECC across dies instead of keeping it in-die, and increasing the ECC ratio to 1:16 to deal with cheaper (per GB) TLC dies. It does improve wear leveling, and it will cover more bit errors, but won't help with an entire page dying at once.
The problem with using actual RAID 5 is that write amplification goes way higher, because your minimal writable units grow massively (factor 15). This means a read-erase-write now takes 16 times the writes/erases it used to take. This doesn't sound good for longevity.
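The arithmetic behind that, as a rough sketch. It assumes the worst case described above, where the whole 15+1 stripe is the minimum writable unit and updates are stripe-aligned; a controller that can update a data page and its parity separately would do better. `pages_written` is a hypothetical helper.

```python
def pages_written(updated_pages, data_per_stripe=15, parity_per_stripe=1):
    """Pages physically written when every touched stripe is rewritten whole."""
    stripes_touched = -(-updated_pages // data_per_stripe)  # ceiling division
    return stripes_touched * (data_per_stripe + parity_per_stripe)

single = pages_written(1)    # physical pages written for a 1-page update
full = pages_written(15)     # physical pages written for a full-stripe update

print(f"1-page update:  {single} pages written, amplification {single / 1:.2f}x")
print(f"15-page update: {full} pages written, amplification {full / 15:.2f}x")
```

Small random writes are the painful case: one logical page costs a full stripe of 16 physical pages, while a full-stripe write only pays for the one extra parity page.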