You are assuming one write transaction is sufficient when you have a power loss event.
I am talking about the case where you have lots of data (say an archive) and you are not on battery: the last write transaction before the power-loss event will still leave the archive corrupted if it did not finish writing all of it.
Alright, but this is application-level corruption/inconsistency, not filesystem or storage device (LBA) level inconsistency. The application is responsible for its own consistency. The filesystem only guarantees that synchronous application writes are completed before the I/O request returns. Asynchronous writes are acknowledged immediately if the buffer allows, but unlike sync writes they carry no such guarantee.
So we are talking about different things here.
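To make the sync/async distinction concrete, here is a minimal Python sketch (the filename is just a placeholder): only the explicit fsync() forces the data to stable storage before the call returns, while the buffered write may still be sitting in the page cache when power is lost.

```python
import os

# Placeholder filename, purely for illustration.
fd = os.open("archive.dat", os.O_WRONLY | os.O_CREAT, 0o644)

# Asynchronous (buffered) write: returns as soon as the data is in the
# page cache. If power is lost right now, the data may never reach the disk.
os.write(fd, b"payload that is not yet durable")

# Synchronous completion: fsync() does not return until the kernel has
# pushed the data for this file down to the storage device.
os.fsync(fd)

os.close(fd)
```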
Having either a UPS or a laptop (which has a battery) is far over your 0.1%; I am thinking over 60% if not more.
You cannot be serious about that? We are talking about ordinary consumers? You are saying 60% of consumers have a UPS? Seriously? Well, if you count the laptop battery as a UPS, that would indeed increase the number significantly, from 0.001% to maybe 20%. But still, that is far below the number you suggest.
And just to repeat: having a guaranteed power supply does not mean your SSD will never see an unexpected power loss. That has more to do with clean shutdowns than anything else.
All my SSDs show 0 for that stat. As to what triggers an "unexpected power-loss" event, it can vary from one SSD maker to another.
Not really; as far as I know the STANDBY IMMEDIATE command is responsible for that. If you have more information, I'm all ears.
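For what it's worth, on Linux `hdparm -y` is, as far as I know, what issues STANDBY IMMEDIATE (ATA opcode 0xE0). A rough sketch of sending it directly through the legacy HDIO_DRIVE_CMD ioctl could look like this - the device node is a placeholder and you need root, so treat it as an illustration rather than a recommendation:

```python
import fcntl, os

HDIO_DRIVE_CMD = 0x031f        # legacy ATA command ioctl (linux/hdreg.h)
ATA_STANDBY_IMMEDIATE = 0xE0   # ATA opcode for STANDBY IMMEDIATE

fd = os.open("/dev/sdX", os.O_RDONLY)  # placeholder device node
# HDIO_DRIVE_CMD argument layout: [command, sector_number, feature, nsector]
args = bytearray([ATA_STANDBY_IMMEDIATE, 0, 0, 0])
fcntl.ioctl(fd, HDIO_DRIVE_CMD, args)
os.close(fd)
```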
If you read the article I linked, you would know that the capacitors in all Crucial/Micron client drives only protect lower pages from corruption.
Which is not what I disputed, is it?
The capacitors provide absolutely zero protection for any data in the DRAM
Also not what I disputed.
including the FTL, meaning that Crucial is also resorting to journaling to recover the FTL in case of a sudden power loss.
Now this is a good point you make. And a very crucial one. Because even if what I asserted is true - that Crucial's capacitors allow the pending write to succeed so it does not corrupt the MLC 'lower pages' as you describe - the FTL/mapping tables cached in DRAM might still be newer than what is written to NAND. As such, losing the DRAM contents does not only lose recent async (buffered) writes, but also loses the newer version of the FTL. And as you say, the only valid defence against such an occurrence would be to journal the mapping tables. If this is true, then the protection is not what I anticipated and asserted, and you would be very right to set this straight. So I guess I owe you an apology - it seems I did not fully understand the situation properly.
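To spell out what I now understand the journalling to be doing, here is a toy sketch in Python - entirely my own simplification, not how any particular controller actually implements it: every update to the DRAM-cached mapping table is also appended as a small log record on NAND, so after a power loss the last persisted snapshot plus the replayed log rebuild the current FTL state.

```python
# Toy model of FTL journalling (my own simplification, not vendor code).
class ToyFTL:
    def __init__(self):
        self.mapping = {}    # LBA -> physical page, cached in DRAM
        self.snapshot = {}   # last full mapping table written to NAND
        self.journal = []    # per-update log records persisted to NAND

    def write(self, lba, phys_page):
        # Update the fast DRAM copy...
        self.mapping[lba] = phys_page
        # ...and persist a small journal record, so the update survives a
        # power loss even though the full table was not rewritten to NAND.
        self.journal.append((lba, phys_page))

    def recover_after_power_loss(self):
        # DRAM contents are gone: start from the last NAND snapshot and
        # replay the journal to rebuild the newest mapping table.
        recovered = dict(self.snapshot)
        for lba, phys_page in self.journal:
            recovered[lba] = phys_page
        self.mapping = recovered
        return recovered
```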
I think you are exaggerating the need for full power loss protection in client usage. If data corruption was a significant issue, then we would have people telling their stories and complaining about it in this very forum.
Well, aren't there quite a few stories about strange things happening with SSDs? We all know the stories about FTL corruption where you need to keep the SSD powered for half an hour or so to allow it to make its FTL consistent with the data again. I think we can all understand that this is due to an FTL inconsistency - where the FTL does not match the data stored on NAND - very likely due to unexpected power loss.
If this is true, then I would assert that the lack of proper protection against power loss introduces a 'window of opportunity' in which the SSD can corrupt itself.
Even worse, this kind of corruption is hard to detect, and to many people - especially consumers - it would not be at all apparent that the SSD is to blame. Strange error messages, hanging applications, blue screens or boot problems - would we suspect the SSD right away? Because the SSD 'fixes' itself and continues to function even with LBA-level corruption, the consumer might not suspect it is to blame, since it appears to be working properly again.
So in this era where technology can 'sometimes work, sometimes not', the failure modes are much more subtle than a simple 'works' or 'failed' - which would be far easier to diagnose.
My own argument would be that whenever you have a computer problem with weird symptoms, you do not want to have to suspect your SSD because it is an unprotected one that still has a window of opportunity in which it can corrupt itself. We simply want reliable technology in 2015 - not something that works 99.9% of the time. Sadly, this appears to be what consumers get: something that mostly works, while business users who require near-100% reliability are expected to pay for it. This means the manufacturer has an interest in preventing consumer-grade products from becoming too reliable, forcing business-oriented users to buy the more expensive enterprise-grade products which have decent protections.

The lack of ECC RAM for consumer systems only confirms this behaviour, since there is no valid reason why all systems should not have ECC protection. The added cost is marginal, and ECC is already used in plenty of other areas of a computer system. So why not spend that fraction of added cost to have the RAM protected as well? The only reason can be that the big companies want their more expensive enterprise-grade products kept separate from the consumer-grade stuff. And it is this practice in particular that I oppose. I think all consumers should have access to reliable computing technology in the year 2015. There is no valid technical reason why this would not be feasible.
It is true that client SSDs are vulnerable to power losses to some extent, but there are proper mechanisms in place to protect the FTL and data-at-rest.
Journalling or full power-loss protection - I do not know of any other ways to protect against this risk. Do you?
And while I have no detailed knowledge of every protection SSDs utilise, I strongly suspect that budget SSDs have no proper journalling feature to protect the FTL from corruption/inconsistency. If this is true, it would confirm my distinction between protected and unprotected SSDs, and I would assert that consumers should buy only SSDs with proper protection against FTL corruption.

Same with RAID5-style bit correction (RAISE/RAIN or whatever companies like to call it). Every SSD should have it. Even RAID6 (double parity) - why not? The M500 uses 1:16 parity, while the MX100/MX200 use 1:128. Why not increase this to 2:128? It would only cost 2/128 of the storage capacity, or 1/64th. That is not all that much to bring the uBER to enterprise levels. It is so cheap to do - pure software, no added physical cost to produce - so why not? From a technical point of view this only makes sense. But from a marketing strategy point of view it would be stupid to let SOHO/business users get by with cheap consumer-grade products which have lower profit margins.
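Just to show the capacity arithmetic I am relying on (the ratios are the ones quoted above; counting parity pages as a fraction of the stripe is my own reckoning):

```python
# Capacity cost of NAND-level parity (RAIN/RAISE style): parity pages
# expressed as a fraction of the stripe they protect.
def parity_cost(parity_pages, stripe_pages):
    return parity_pages / stripe_pages

print(parity_cost(1, 16))    # M500-style  1:16  -> 0.0625   (6.25%)
print(parity_cost(1, 128))   # MX100-style 1:128 -> ~0.0078  (~0.8%)
print(parity_cost(2, 128))   # proposed    2:128 -> 0.015625 (1/64, ~1.6%)
```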
Throughput and IOPS are inverses of latency, so any decrease in latency will result in higher performance. That's why NVMe is more than just a marketing gimmick because it actually improves 4KB QD1 random performance substantially [..] RAID-0 doesn't help with that because it only increases parallelism - it doesn't improve the minimum latency at all.
Alright, now I understand what you mean. But AHCI does not have any direct link with RAID0. You only meant to say that while RAID0 increases most performance specs, it cannot increase blocking random reads (commonly known as 4K QD=1) - fair enough!
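To put the QD=1 point into numbers (illustrative figures only, not measurements from any particular drive):

```python
# At queue depth 1, each I/O must complete before the next one is issued,
# so IOPS is simply 1 / latency. RAID0 adds parallelism (which helps at
# higher queue depths) but does not shorten the latency of a single request.
def qd1_iops(latency_seconds):
    return 1.0 / latency_seconds

print(qd1_iops(100e-6))  # 100 us per 4K read -> 10,000 IOPS
print(qd1_iops(20e-6))   #  20 us per 4K read -> 50,000 IOPS
```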