Will the data in an SSD's cache get lost when the power is shut down by accident?

wangli1147

Junior Member
Jul 14, 2010
2
0
0
I am a postgraduate, and I'm working on a project on "Flash-based Database Management Systems".
Many SSDs have a cache on board to improve performance. The cache is volatile, so the data in it will be lost if power fails by accident.

My question is:
Is there a mechanism in SSDs that helps protect the data in the cache when power is cut by accident?

I look forward to your answers.
Thanks..:rolleyes:
 

sub.mesa

Senior member
Feb 16, 2010
611
0
0
Lost writes are not that bad; what is dangerous is I/O being executed on disk in a different order than it was issued. That can happen with large write-back caches like those on Areca hardware controllers, which is why such a cache begs for a BBU to protect it.

But you should know that HDDs already buffer a lot of writes, and all of them are lost when power fails. That's why we have journaling filesystems: if the journal does its job properly, we just lose the last few seconds of writes and the filesystem 'resets' to a slightly earlier state, without becoming inconsistent.

The Intel X25-M has only a 256KiB buffer, embedded in the controller chip. I.e., it does NOT use the DRAM chip as a write buffer, as HDDs and other SSDs do. It is a real controller chip with internal buffers; a very advanced design that explains the low latency and some of the strong points of Intel's SSD product lineup.

A 'supercapacitor' can allow the use of a bigger write-back cache, such as is required on some SSDs, but really, if your filesystem is properly designed it won't mind a few lost writes - as long as the drive obeys cache flush commands.
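To show what 'obeying cache flush commands' means in practice, here is a minimal sketch assuming POSIX I/O; the file name is made up. fsync() makes the OS flush its buffers and send a cache-flush command (e.g. ATA FLUSH CACHE) to the drive, which only protects you if the drive honors it instead of acknowledging early:

```c
/* Minimal sketch: forcing a write through the volatile caches.
 * Assumes a POSIX system; "record.dat" is a made-up file name. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "critical record\n";
    int fd = open("record.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1)) {
        perror("write");        /* data may still be stuck in caches */
        close(fd);
        return 1;
    }
    if (fsync(fd) != 0) {       /* flush OS buffers AND the drive cache */
        perror("fsync");
        close(fd);
        return 1;
    }
    close(fd);                  /* only now is the record on stable media */
    return 0;
}
```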
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
As sub.mesa said, losing one or more writes isn't a problem. Well-designed file systems and other software expect to lose writes on a system crash and have recovery methods. E.g. NTFS has a journal.

A database log (or a file system journal) protects a transaction in the event of a crash or power loss. The list of changes to be made is written to the journal before the changes are made; then the changes are made; then the journal is updated to say that the changes have been made. If the power goes out at any stage, one of two things can happen. If power fails before the journal entry is complete, the entry either doesn't exist or is corrupted; the corrupted/incomplete entry is deleted during recovery, and no trace of the changes remains. Alternatively, if the journal entry is complete, the entry is used to ensure all the data changes are made and are correct.
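To make that ordering concrete, here is a minimal sketch in C, assuming POSIX I/O; the file names, record layout, and the journaled_update helper are all invented for illustration, and error checks are trimmed for brevity:

```c
/* Sketch of the journal -> data -> confirmation ordering. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

static int journaled_update(const char *journal_path, const char *data_path,
                            const void *change, size_t len)
{
    int jfd = open(journal_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    int dfd = open(data_path, O_WRONLY | O_CREAT, 0644);
    if (jfd < 0 || dfd < 0)
        return -1;

    /* 1. Describe the change in the journal, then flush it to media.
     *    If power fails before this completes, recovery finds a missing
     *    or corrupt entry, discards it, and no trace of the change remains. */
    write(jfd, change, len);
    fsync(jfd);

    /* 2. Apply the change to the actual data, then flush. */
    write(dfd, change, len);
    fsync(dfd);

    /* 3. Confirm in the journal that the change is complete.  If power
     *    fails before this, recovery replays the change from step 1. */
    const unsigned char commit = 1;
    write(jfd, &commit, 1);
    fsync(jfd);

    close(jfd);
    close(dfd);
    return 0;
}

int main(void)
{
    const char record[] = "balance += 100";
    return journaled_update("journal.log", "table.dat",
                            record, sizeof record - 1);
}
```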

It's therefore important that the order of writes is maintained - though not necessarily perfectly. What matters most is that the journal entry is fully completed before any actual data changes take place. The journal updates can occur in any order among themselves, as can the data updates, as long as the journal -> data -> confirmation order is strictly maintained.

Most hard drives will maintain this order correctly - the OS will request a cache flush at critical times - and the drive will correctly perform the flush operations. So, even if the drive uses write caching and loses the data, the correct ordering ensures that nothing critical is ever lost.

Some SSDs - e.g. Intel's - don't have a write cache at all. They have a tiny RAM buffer that is capable of handling one write command at a time. Any sensible file system or database should have no trouble recovering from this.

Other SSDs have a big write cache. Assuming that they correctly handle write barriers and cache flush requests, they also should have virtually no risk of data corruption. (I don't know of anyone who has tested consumer-level SSDs - this may be difficult, as the caches may fail to flush only in obscure cases. There are some HDDs around that don't correctly handle OS write barriers and may write order-critical data out of order, necessitating OS patches to ensure data integrity.)

Enterprise-level SSDs with a cache feature a supercapacitor. When power is lost, the capacitor holds enough charge to operate the drive for a few seconds, giving it enough time to dump the contents of the RAM buffer to flash before the drive shuts down. In this case, there should be no risk of data loss due to power failure.

The big danger, as sub.mesa points out, is high-end RAID cards with a write cache. These dramatically accelerate writes by breaking the write-order requirement that normally limits hard drive performance. This should be safe, because the cache RAM is protected with a battery pack (or, on the latest cards, a supercapacitor and flash memory). But without a battery pack, catastrophic and potentially irreparable file-system corruption can easily occur on power failure.
 
Last edited:

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
without a battery pack, any reasonable raid card will refuse to do write-back (case in point: any HP Smart Array card). also the memory is ECC with OTP; if it ever detects an ECC error, it will mark itself faulty and never function again - kinda like your car's ECU, or the eFuse on the Droid.

the supercapacitor design uses the same ram, just more of it; when the controller detects a loss of power it copies the ram to flash, which completes within the time the capacitor holds its charge. it's actually brilliantly simple. the first time i saw this design was in the Dot Hill SANs we used (they OEM these out to many top manufacturers).
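to make that copy-on-power-loss step concrete, here's a toy sketch in C - the 'DRAM cache' is just a buffer and the 'flash' is a file, purely for illustration; real controller firmware is vendor-specific:

```c
/* Toy model of a flash-backed write cache: on power loss, dump the
 * volatile cache to flash before the supercapacitor drains.  The cache
 * size, file name, and power-loss hook are all invented for illustration. */
#include <stdio.h>
#include <string.h>

#define CACHE_BYTES 4096            /* toy cache; real FBWC is 512MB-1GB */

static unsigned char dram_cache[CACHE_BYTES];   /* volatile write cache */

/* Called when power loss is detected, while the supercapacitor still
 * holds charge; must finish before the capacitor drains. */
static void on_power_loss(void)
{
    FILE *flash = fopen("cache_image.bin", "wb");   /* stand-in for NAND */
    if (!flash)
        return;
    fwrite(dram_cache, 1, CACHE_BYTES, flash);      /* dump the cache image */
    fclose(flash);
    /* on the next boot, the controller finds the saved image and flushes
     * it out to the array before the volume goes online. */
}

int main(void)
{
    memset(dram_cache, 0xAB, CACHE_BYTES);  /* pretend there are dirty writes */
    on_power_loss();                        /* simulate the power-fail event */
    return 0;
}
```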

battery-backed write cache is EOL this fall; you will notice all systems ship with FBWC (flash-backed write cache) now instead of BBWC, because the battery's hold-up time (72 hours when new) degrades about 33% per year until the controller decides it is no longer viable, at which point a good controller (ahem, Smart Array) will disable write-back instantly and notify you through its various agents for every o/s (CIM/ACU/on boot).

the reason you want this is that in the event of a failure that damages the system, you can in fact take the cache module and the drives to another system, and it will indeed boot up and flush the cache.

Why is this important? I've seen a lot of bugs in my day where systems shut down improperly (too fast) for the controller to flush [on battery]; 72 hours later your raid is not consistent. A hard lock and reset, or an ASR (say a cpu goes kaput), can also cause issues where only the battery- or flash-backed write cache will get the data out.

nowadays 512MB to 1GB of cache is pretty standard. i've seen some raid cards using ginormous amounts of ram, and software-assisted arrays use main ram for various stages of cache.

I have two systems, one without a battery and one with; the performance difference when tuned (alignment, stripe size, read/write cache ratio) for your application (say, sql) is monumental.

What's funny is that modern raid controllers are too slow to handle SSDs; it's not the ram that's the bottleneck - it's the controller's cpu and bus ;)
 

wangli1147

Junior Member
Jul 14, 2010
2
0
0
:) Emulex, Mark R, and sub.mesa, my friends!
Thank you very much!
Your help is very important and valuable to me!
 

NP Complete

Member
Jul 16, 2010
57
0
0
Just a point on the X25-M/X18-M - I believe that neither of them has a capacitor to prevent data loss during sudden power loss. Further, according to Anand's teardown (http://www.anandtech.com/show/2614/10), the controllers do use a DRAM chip (256 KiB would make it very difficult to fit the firmware plus a read/write buffer).

Further, the initial Intel TRIM firmware was recalled due to some sporadic data corruption, which I believe was due to the drive's data mapping tables not being flushed back to flash when power was removed too quickly (i.e. when the system shut down too fast). Enterprise-class drives fix this issue with a capacitor that allows the drive to flush its buffers on power loss, and mechanical hard drives are actually able to use some of the spindle's own momentum to finish flushing their buffers in a power-loss event.

For any mission-critical data storage on the X25-M/X18-M series, I would recommend using a UPS to prevent data loss.
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
A word of caution about using a UPS. Some folks will say a UPS is a suitable replacement for a battery. It is NOT - just as RAID is NOT a replacement for backup! A UPS may protect you in the event of a power failure, but other things do happen, such as power supply failures, system failures (STOP events, etc.), even accidentally tripping over the power cord between the UPS and the PSU! :D

Some hosts positively will NOT allow their cache to be configured in full write-back (sometimes called delayed-write) mode without a battery pack installed. Some will. (Areca does.) It was rumored that Areca hosts allow even longer flush intervals with a BBU installed. I cannot confirm this, as it seems the same either way, but obviously the risk is real without a BBU (especially with a 4GB cache!).

Additionally, I can affirm that Intel SSDs are quite tolerant of "unsafe" shutdowns. I have one with hundreds logged in SMART, and the worst thing that ever happened was that a CHKDSK was required. Never a real problem. :)
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
what's even more trippy is that when the cache fails (it just happens sometimes), it will usually take out a drive or two (mark them bad). personally, once a drive is marked bad i don't want it any more. i might do the ole pull/push to get the raid back to a safe state with a low-priority rebuild (high priority stresses the rest of the drives in exactly the situation where you do not want stress - a rebuild) until a new replacement can be done.

you ever see that areca raid controller with a single (!!) fan that looks like it came off an old geforce mx card? WTF. talk about a massive fail. when the fan bearings go, it will be a bad day for that storage system.

areca is the go-bots of raid. i mean, look at the user interface: Engrish and bad html. i can understand wanting to save some bucks - find an ole P400 (if compatible), throw a new battery on it, and bust out some ebay fan-out cables.

Good raid controllers also proactively sweep your drives for issues. You don't really want a loss of power in the middle of a cleanup - you do want the raid controller to handle remaps; that is its job.

A lot of the new NAS/RAID devices i see now don't even listen to TLER/CER any more. they ship AV drives, which have TLER=none (0 seconds to recover a sector), and the controllers just do their business. pretty slick way of dealing with the drive compatibility issues, but honestly their forums are still full of people with massive failures.
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
I haven't seen these issues with any of my SAS hosts.
I'm not a big fan of the fan (sorry for the pun!), but it works, and I've yet to have one fail - three years out being the longest so far. Also, the fan's speed is monitored, and if it fails it will sound an alarm. Everything is SNMP-trappable with these configs, so no problem here. (But where can one get a spare fan?)

There is an option to turn OFF fan monitoring and not use a fan, but you MUST have good - and I mean good - airflow, or the CPU will hit 70C and raise an alarm...

I've had the occasional 1-bit DRAM ECC error in the host's event log, but I replaced the cache module with a 4GB Micron ECC DIMM and it has been rock solid. The previous 2GB Hynix module would raise the "1-bit ECC" alarm about once a month under heavy sustained I/O. They say this is hardly critical - just ECC picking up a DRAM error - but memory is cheap, so I tossed it and got a larger module at the same time!

As for dropping arrays - it would seem the biggest complaint is with SATA drives. I ran into some issues with firmwares not liking the WDC 640GB drives when they came out, and such. Luckily my SAS arrays have been rock solid. I had a disk die (Fujitsu MBA 15K SAS), replaced it, and the array (RAID6) started rebuilding immediately. It was optimal in under two hours, which wasn't bad for a sixteen-disk array of 300GB drives. :)