
Will 3rd gen SSDs support DRAT?

Mark R

Diamond Member
Has anyone heard anything about possible support for DRAT on new SSDs?

DRAT is a modification of the TRIM command that should make it practical to support TRIM in RAID.

One of the problems with TRIM is that when a sector is TRIMmed, if the host tries to read the sector, the drive is permitted to return anything (zeros, garbage, etc. - and this does depend on the drive - e.g. some drives encrypt the data in flash, so all zeros may be 'decrypted' as garbage; indeed, reading a TRIMmed sector could theoretically give different data at different times). In other words, TRIM is 'indeterminate'.

This is a huge problem with RAID, as if you TRIM a sector on a redundant array - you lose the redundancy, as the redundancy only works if the host can be certain what data is stored in a particular sector. It would be possible to TRIM a whole stripe - but, it then becomes impossible to check the RAID array for data integrity, unless you keep track of which stripes you have TRIMmed (which is disastrous for performance, not to mention complex with potential for funny bugs, etc.).
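The parity problem above can be sketched in a few lines of Python (a toy model, not any real controller - the drive contents and sector size are made up). Parity is the XOR of the data sectors, and it only verifies if every read returns exactly what the controller last wrote:

```python
# Toy XOR-parity stripe: two data sectors plus one parity sector.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

d0 = b"hello world!"           # sector on drive 0 (12 bytes)
d1 = b"more data..."           # sector on drive 1 (12 bytes)
parity = xor(d0, d1)           # written to the parity drive

# Host TRIMs drive 0's sector; an indeterminate drive may now return
# anything at all - modeled here as all zeros (stale garbage works too).
trimmed_read = bytes(len(d0))

assert xor(d0, d1) == parity            # before TRIM: the stripe verifies
assert xor(trimmed_read, d1) != parity  # after TRIM: the parity check fails
```

Once the read no longer matches what was used to compute the parity, the redundancy is silently gone.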

DRAT stands for 'Deterministic Read After Trim'. A drive that supports DRAT guarantees that reading a TRIMmed sector always returns the same data (in practice, typically all zeros). This allows a RAID engine to preserve consistent parity/mirroring without complex, performance-crippling workarounds.
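A minimal sketch of the difference (class and sector size are my own invention, not from the spec): a DRAT drive always answers a read of a TRIMmed LBA with the same deterministic data, while a non-DRAT drive may hand back whatever is still in the flash:

```python
# Toy SSD model: DRAT makes reads of TRIMmed LBAs deterministic.
class ToySSD:
    SECTOR = 16                       # bytes per sector (arbitrary)

    def __init__(self, drat: bool):
        self.drat = drat
        self.store = {}               # LBA -> last written bytes
        self.trimmed = set()          # LBAs marked unused by TRIM

    def write(self, lba: int, data: bytes) -> None:
        self.store[lba] = data
        self.trimmed.discard(lba)

    def trim(self, lba: int) -> None:
        self.trimmed.add(lba)         # just bookkeeping; erase happens later

    def read(self, lba: int) -> bytes:
        if lba in self.trimmed:
            if self.drat:
                return bytes(self.SECTOR)      # deterministic: all zeros
            return self.store.get(lba, b"")    # indeterminate: stale data
        return self.store[lba]

ssd = ToySSD(drat=True)
ssd.write(12345, b"hello world!....")
ssd.trim(12345)
assert ssd.read(12345) == bytes(16)   # a RAID engine can rely on this
```

With `drat=False`, the same read could return the stale `b"hello world!...."` - exactly the indeterminacy that breaks RAID.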
 
A very high traffic forum I visit is not using RAID with its SSDs, but rather uses software to write out dual MySQL (and other data) backups. Everything else about SSDs has made it worth it for them. Good to hear issues are being worked on. I guess proper and tested RAID support is still a few years away?
 
That doesn't make any sense... writing 0s to an SSD whenever it is TRIMmed would be MUCH worse than not TRIMming it at all!
Besides which, the host CANNOT read a sector directly; it asks to read a sector, and the drive uses a lookup table to determine where it is really located (which can and DOES change thanks to wear leveling) and then returns that result...
However, what would make sense is standardization of the response of SSD controllers when they are asked to read a sector that is marked as "empty"... simply return all 0s without even reading it.
And no, tracking that would not be "disastrous" to performance, because the drive ALREADY tracks that very same info as part of wear leveling AND TRIM.

AFAIK the problem with passing TRIM through to RAID has nothing to do with being determinate, but with merely supporting it at all. Remember that for a long time TRIM could not even be passed to non-member drives when RAID mode was on. And with RAID 1, RAID 0, and RAID 10, the issue of being non-determinate doesn't exist.

I will grant that consistency is important for RAID 5 and RAID 6 though (although you really shouldn't be using those because of their myriad issues (including the write hole)... use a more advanced equivalent (Unraid, RAID-Z, etc.))... I can see DRAT being added as a standard feature; it should be really simple to codify that a TRIMmed sector returns 0s or 1s (1s would be better for SSDs).
 
I don't know if "all zeros" is such a wise idea. I thought that when a sector was erased, reading it back returned all "FF"s. At least that was the whole theory behind "Tony Trim", used on the Indilinx drives under XP, and the basis of the AS SSD wiper, with the "write FF" checkbox.

It would seem questionable to me if writing "FF"s to the sector caused the controller to erase the sector and mark it in the page table as empty, but then returned all zeros when read back.

This requires further thought on the part of the drive/controller mfgs, I think.
 
Interesting info, VirtualLarry, but how would it return FF in binary?
In hex, F is 15, so FF is 255 in decimal. A simple hex-to-binary conversion shows that FF = 11111111.
In other words, it is just writing all 1s to the drive. This is logical because in an SSD an ERASED cell is a 1, and you want to erase cells as part of TRIM.
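The conversion can be checked in a couple of lines of Python:

```python
# Hex FF is 255 decimal, i.e. eight 1-bits - matching the erased
# (all-1s) state of NAND cells.
value = int("FF", 16)
assert value == 255
assert format(value, "08b") == "11111111"
```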

However, TRIM does NOT write 1s to the location immediately... what it does is inform the controller that the area contains erased data. The controller then marks it as such in its lookup table, so that the next time it erases the group of sectors (it erases 128 sectors at a time) it will leave said sector blank (all 1s) and ready to write. It can prematurely erase it if needed, or wait until all of the data is marked as unused... and all unused space is treated as spare space by the controller... in other words, TRIM is a lot more than just erasing data. And it is only made possible because the controller fully tracks the condition of every sector of the SSD. So when you ask for a sector that has been TRIMmed, the controller already knows it was TRIMmed and has no issue with simply returning all 1s, or all 0s, or actually reading the last sector it pointed at (which is rather silly when you think about it)... Codifying that behavior would be useful.
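The bookkeeping described above can be sketched like this (names and the 128-sector figure are illustrative, taken from the post, not any particular controller): TRIM only flags sectors in the lookup table, and the flash is erased later, one whole erase block at a time:

```python
# TRIM as pure bookkeeping: flag sectors now, erase the block later.
BLOCK = 128                      # sectors per erase block, per the post

live = {sector: True for sector in range(BLOCK)}  # sector -> holds live data?

def trim(sector: int) -> None:
    live[sector] = False         # just update the table; no flash write yet

def block_erasable() -> bool:
    # The whole block can be recycled only once nothing in it is live.
    return not any(live.values())

trim(0)
assert not block_erasable()      # 127 sectors still hold data
for s in range(1, BLOCK):
    trim(s)
assert block_erasable()          # now the block is ready to be erased
```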

I personally hate it when rough workarounds are made for backwards compatibility... they always break things later...

It would make more sense to have the drive return a code that states that the area contains no data (aka has been TRIMmed) and modify RAID controllers to intelligently handle such a thing, rather than have the drive lie by returning 0s. This makes sense because a controller that hasn't been specifically modified to handle TRIM cannot pass it to the member drives anyway.
That being said, it is probably not a big deal.
 
This is a huge problem with RAID, as if you TRIM a sector on a redundant array - you lose the redundancy, as the redundancy only works if the host can be certain what data is stored in a particular sector. It would be possible to TRIM a whole stripe - but, it then becomes impossible to check the RAID array for data integrity, unless you keep track of which stripes you have TRIMmed (which is disastrous for performance, not to mention complex with potential for funny bugs, etc.).
Ok, I don't understand that. I mean the disk itself only distinguishes between valid and invalid data. Even if the RAID controller stores redundant data on it, the disk doesn't know and doesn't care - only the RAID controller has to know; for the SSD it's data like any other kind. So it isn't allowed to TRIM it anyhow - otherwise you could overwrite the redundant data.
You don't even access physical blocks; you just tell the drive the LBA value you're interested in, so you don't even know WHICH physical sector you're reading/writing to.

Someone has an explanation for me?
 
However, TRIM does NOT write 1s to the location immediately... what it does is inform the controller that the area contains erased data. The controller then marks it as such in its lookup table, so that the next time it erases the group of sectors (it erases 128 sectors at a time) it will leave said sector blank (all 1s) and ready to write. It can prematurely erase it if needed, or wait until all of the data is marked as unused... and all unused space is treated as spare space by the controller... in other words, TRIM is a lot more than just erasing data. And it is only made possible because the controller fully tracks the condition of every sector of the SSD. So when you ask for a sector that has been TRIMmed, the controller already knows it was TRIMmed and has no issue with simply returning all 1s, or all 0s, or actually reading the last sector it pointed at (which is rather silly when you think about it)... Codifying that behavior would be useful.

The behavior was codified originally. When a TRIMmed sector is read, it is allowed to return any data, because the sector is 'unused' and therefore could contain anything. However, there was the strong recommendation that a TRIMmed sector should never reveal data that was previously written to another sector (i.e. if the drive remaps sectors for wear-levelling, stale data from a different sector should not be revealed).

The new recommendation is that when a TRIMmed sector is read, the SSD controller will recognise that the sector is blank and contains no data - and should therefore return dummy data of 'all zeros'. Drives that follow this behavior are able to communicate this to the host by stating that they support DRAT when queried by the drivers/RAID card.

Ok, I don't understand that. I mean the disk itself only distinguishes between valid and invalid data. Even if the RAID controller stores redundant data on it, the disk doesn't know and doesn't care - only the RAID controller has to know; for the SSD it's data like any other kind. So it isn't allowed to TRIM it anyhow - otherwise you could overwrite the redundant data.
You don't even access physical blocks; you just tell the drive the LBA value you're interested in, so you don't even know WHICH physical sector you're reading/writing to.

You issue a TRIM command on an LBA - this tells the SSD to recycle the LBA. This may (or may not) result in the data on the flash being blanked.

So let's say that LBA 12345 contains the data 'hello world!'. If you send 'TRIM 12345' and then read 12345 back, you may get 'hello world!', all zeros, all FF, garbage, etc. back - depending on how the SSD controller is programmed and whether the flash block has been erased or not. It is theoretically possible that the first read might come back as 'hello world!' but 24 hours later, after the drive has done wear levelling and garbage collection, a second read might come back as all zeros.

Let's say the drive is part of a RAID5 array. You write 'hello world!' to the sector. The RAID controller reads the remainder of the stripe, calculates the parity and writes the parity to the parity drive. You then issue a TRIM command. A check read comes back as 'hello world!'. The RAID controller concludes that the parity is correct.

Some time later, the SSD controller blanks the flash. Now, reading the sector gives FFs. If another drive in the array dies, there is a problem. The data will be reconstructed using out-of-date parity, leading to data corruption when the array is rebuilt.

The advantage of DRAT is that the RAID controller knows that once it sends a TRIM, the data in that LBA will immediately appear to be all zeros, so it can immediately recalculate the parity. This way, there is no surprise data corruption from out-of-date parity.
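The scenario above can be walked through with XOR parity (a sketch with made-up sector contents, not a real RAID engine). Because the controller knows the TRIMmed LBA now reads as zeros, it folds that into the parity straight away, and a later rebuild comes out correct:

```python
# DRAT lets the controller refresh parity immediately after a TRIM.
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

d0 = b"hello world!"             # sector on drive 0
d1 = b"other data.."             # sector on drive 1
parity = xor(d0, d1)             # initial parity

# TRIM d0 on a DRAT drive: reads of that LBA now return zeros,
# so the controller recalculates the parity to match.
d0_after_trim = bytes(len(d0))
parity = xor(d0_after_trim, d1)

# Drive 1 dies; rebuild its data from the surviving drive plus parity.
rebuilt = xor(d0_after_trim, parity)
assert rebuilt == d1             # no stale-parity corruption
```

Without the immediate recalculation, the rebuild would have used the old parity and produced garbage, as in the 'hello world!' example above.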

One workaround is to TRIM enough data to cover all the drives in the array. E.g. if you have 3 drives in RAID 5 with 64k stripes and you TRIM 128k, there is an entire RAID stripe that contains no data. The parity doesn't matter, because there's nothing useful stored in that stripe. The problem is that if you need to check the RAID array, there is no way to know whether that stripe is in use or not. Because there is no data in that stripe, the parity is likely to be invalid - and the RAID controller will warn you that the RAID array is corrupted, even though it isn't. Some people have proposed upgrading RAID controllers to keep track of which stripes have been completely TRIMmed, so that during an array check, any errors on those stripes are ignored.
 
I'd love to see that OpenSolaris RAID-5-like protocol redone with leveling (Drobo, anyone?) using an SSD to contain the ECC for a crapton of drives in a dataset.

7 2TB drives + a 128GB ECC SSD
 
Let's say the drive is part of a RAID5 array. You write 'hello world!' to the sector. The RAID controller reads the remainder of the stripe, calculates the parity and writes the parity to the parity drive. You then issue a TRIM command. A check read comes back as 'hello world!'. The RAID controller concludes that the parity is correct.
Ah yeah, makes sense - since you don't compute parity on a per-sector basis, that will indeed be a problem.
 
The behavior was codified originally. When a TRIMmed sector is read, it is allowed to return any data, because the sector is 'unused' and therefore could contain anything. However, there was the strong recommendation that a TRIMmed sector should never reveal data that was previously written to another sector (i.e. if the drive remaps sectors for wear-levelling, stale data from a different sector should not be revealed).

The new recommendation is that when a TRIMmed sector is read, the SSD controller will recognise that the sector is blank and contains no data - and should therefore return dummy data of 'all zeros'. Drives that follow this behavior are able to communicate this to the host by stating that they support DRAT when queried by the drivers/RAID card.

Maybe we have different definitions of "codify"... if the behavior has been codified as "do whatever you want", it's not really codified, IMAO. If you now modify it to be "do something specific" (aka return dummy data of all 0s), that's codified.
Regardless of my understanding and belief (which is the above), the way I actually used the word was: "regardless of how current behavior is codified, I agree we need to codify a new behavior called DRAT".
 