Why does TRIM require a new SATA command...

glugglug

Diamond Member
Jun 9, 2002
In most (ALL?) modern OSes, memory pages newly assigned to a process are zero pages, both for performance (can use COW from a single reserved zero page) and for security (don't want to reuse a physical page containing data another process deallocated).

Newly created files (if created with non-zero size) are by default filled with zeroes just like newly allocated memory.

A full format writes zeros to the bulk of the drive.

IMHO a good SSD controller should be checking the data in each cluster before writing to see if it contains all zeros. If it does, it should be treated like a TRIM, and just remove the logical cluster from the logical -> physical mapping; don't really do the write. And to keep compatibility, whenever reading a cluster that has been trimmed in this manner, just return all zeros instead of an error.
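
A rough sketch of what that check might look like in the write path of a hypothetical flash translation layer. None of this is from real controller firmware; the names (ftl_write_sector, program_new_flash_page, LBA_UNMAPPED, l2p) are invented for illustration only:

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE  512
#define MAX_LBA      (1u << 20)
#define LBA_UNMAPPED UINT32_MAX      /* sentinel: no physical page backs this LBA */

static uint32_t l2p[MAX_LBA];        /* hypothetical logical -> physical map */
static uint32_t next_free_page = 1;  /* stand-in allocator for the sketch    */

/* Bail out on the first non-zero byte, so a normal (non-zero) sector costs
   only a few byte reads before the check gives up. */
static bool all_zeros(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

/* Stand-in for the real flash program path (purely illustrative). */
static uint32_t program_new_flash_page(const uint8_t *data)
{
    (void)data;
    return next_free_page++;
}

/* Hypothetical write entry point: an all-zero sector is treated like a TRIM.
   The mapping entry is simply dropped and nothing is programmed to flash;
   a later read of this LBA just has to hand back zeros. */
void ftl_write_sector(uint32_t lba, const uint8_t *data)
{
    if (all_zeros(data, SECTOR_SIZE)) {
        l2p[lba] = LBA_UNMAPPED;     /* implicit trim: forget the mapping */
        return;
    }
    l2p[lba] = program_new_flash_page(data);  /* normal path */
}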

I realize this would use more SATA bandwidth than an explicit TRIM command, and add a little extra overhead for the controller to do the check, although the overhead really is tiny: if the cluster isn't all zeroes, that will usually be apparent within the first few bytes. But there would be several advantages:

1. Blocks that don't need to be written would get detected naturally in situations where the OS would not normally issue a TRIM command, e.g. a new memory-mapped file being created, or file pre-allocation turned on in Azureus.

2. It becomes trivial to write an app to TRIM all available space on an OS without explicit TRIM support, just by filling the drive with a file full of zeros. On NT-based OSes you don't even need to fill the whole drive at once to do this; just use the NtFsControlFile API with a control code of FSCTL_MOVE_FILE to move a medium-sized file full of zeros through each range of free clusters.
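
For the simpler "fill the drive with a zero file" variant (not the FSCTL_MOVE_FILE trick), a minimal sketch in portable C might look like this; the file name and chunk size are arbitrary choices, and you'd want the volume otherwise idle while it runs:

#include <stdio.h>

/* Fill all free space with zeros so a zero-detecting drive would treat those
   clusters as trimmed, then delete the file to give the space back to the
   filesystem. */
int main(void)
{
    enum { CHUNK = 4096 };
    static const unsigned char zeros[CHUNK];   /* static -> zero-initialized */

    FILE *f = fopen("zerofill.tmp", "wb");
    if (!f)
        return 1;

    /* Keep writing zero-filled clusters until the volume is full and the
       write fails. */
    while (fwrite(zeros, 1, CHUNK, f) == CHUNK)
        ;

    fclose(f);
    remove("zerofill.tmp");   /* free the space again once everything is zeroed */
    return 0;
}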

Moved to appropriate forum - Moderator Rubycon
 

bsobel

Moderator Emeritus
Elite Member
Dec 9, 2001
Originally posted by: glugglug
IMHO a good SSD controller should be checking the data in each cluster before writing to see if it contains all zeros. If it does, it should be treated like a TRIM, and just remove the logical cluster from the logical -> physical mapping; don't really do the write. And to keep compatibility, whenever reading a cluster that has been trimmed in this manner, just return all zeros instead of an error.

The point of TRIM is to ensure that the SSD has a chance to free the block long before an actual write needs to occur, to avoid the read/modify/write overhead otherwise required.

1. Blocks that don't need to be written would get detected naturally in situations where the OS would not normally issue a TRIM command, e.g. a new memory-mapped file being created, or file pre-allocation turned on in Azureus.

Since the drive is not an object store, you then run into allocation issues, because the number of blocks in use on the drive doesn't match the OS's view of available space. You're basically creating sparse storage without any way of informing the OS of it; this would break in many scenarios.
 

glugglug

Diamond Member
Jun 9, 2002
Originally posted by: bsobel
IMHO a good SSD controller should be checking the data in each cluster before writing to see if it contains all zeros. If it does, it should be treated like a TRIM, and just remove the logical cluster from the logical -> physical mapping; don't really do the write. And to keep compatibility, whenever reading a cluster that has been trimmed in this manner, just return all zeros instead of an error.

The point of TRIM is to ensure that the SSD has a chance to free the block long before an actual write needs to occur, to avoid the read/modify/write overhead otherwise required.

1. Blocks that don't need to be written would get detected naturally in situations where the OS would not normally issue a TRIM command, e.g. a new memory-mapped file being created, or file pre-allocation turned on in Azureus.

Since the drive is not an object store, you then run into allocation issues, because the number of blocks in use on the drive doesn't match the OS's view of available space. You're basically creating sparse storage without any way of informing the OS of it; this would break in many scenarios.

What situation would this break, as long as the drive returns a block of zeroes for any implicitly trimmed clusters?
 

bsobel

Moderator Emeritus
Elite Member
Dec 9, 2001
What situation would this break, as long as the drive returns a block of zeroes for any implicitly trimmed clusters?

Well, in the pre-allocation case you'd have the drive allocating fewer clusters than are required to back the file. The OS wouldn't be aware of the difference, so (if you had multiple writers doing this) you could run out of space on the device with a file the OS thought had backing store.

In your scenario, when a file is deleted, how do you see that backing store being released, the OS writing 0's to the entire file?
 

glugglug

Diamond Member
Jun 9, 2002
Originally posted by: bsobel
What situation would this break, as long as the drive returns a block of zeroes for any implicitly trimmed clusters?

Well, in the pre-allocation case you'd have the drive allocating fewer clusters than are required to back the file. The OS wouldn't be aware of the difference, so (if you had multiple writers doing this) you could run out of space on the device with a file the OS thought had backing store.
On an SSD, not actually allocating physical clusters is preferred behavior. The number of logical clusters the drive tells the OS it has is less than the number of physical clusters that actually exist, and the logical cluster availability visible to the OS is based on what the volume bitmap says, not on whether those logical clusters have been assigned physical ones. On a conventional drive there is a 1:1 mapping of logical clusters to physical sectors (well, 1:8 really, since a sector is usually 512 bytes and a cluster 4K...). On an SSD this is not the case.
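
The read side of that mapping could be as simple as the sketch below; as before, the names (l2p, LBA_UNMAPPED, read_flash_page, ftl_read_sector) are hypothetical and just mirror the write-path sketch earlier in the thread:

#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512
#define MAX_LBA      (1u << 20)
#define LBA_UNMAPPED UINT32_MAX       /* no physical page backs this LBA */

static uint32_t l2p[MAX_LBA];         /* hypothetical logical -> physical map */

/* Stand-in for the real flash read path (purely illustrative). */
static void read_flash_page(uint32_t phys, uint8_t *out)
{
    (void)phys;
    memset(out, 0xAA, SECTOR_SIZE);   /* pretend we fetched real data */
}

/* A logical sector with no physical backing simply reads back as zeros, so
   the OS never notices that the drive quietly dropped the mapping. */
void ftl_read_sector(uint32_t lba, uint8_t *out)
{
    if (l2p[lba] == LBA_UNMAPPED) {
        memset(out, 0, SECTOR_SIZE);
        return;
    }
    read_flash_page(l2p[lba], out);
}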

In your scenario, when a file is deleted, how do you see that backing store being released, the OS writing 0's to the entire file?
Yes. The idea is that it doesn't actually require OS or even controller support, since a third-party application can easily write zeros over the file before deleting it (or a separate utility, run much like a defrag, could do it to all free space). It could be a while before RAID controllers propagate TRIM commands. But if the drive treats zero blocks in this manner, the controller doesn't need to know about it as anything special.

The explicit TRIM would still be the preferred method of marking the data in the logical clusters disposable, if it is supported at all layers. But this would make it possible to trim without OS and/or controller support, and it would avoid some writes that an explicit trim wouldn't realize are unnecessary. I think you might be surprised how many files on your drive contain clusters of all zeros. The drive could even detect this at the 512-byte sector level instead of the cluster level. Most filesystems use 4K clusters nowadays, so if you have a 1.5K file on your drive, the last 2.5K of the 4K cluster occupied by that file is zeroes. The 4K logical cluster is backed by 512-byte logical sectors on the disk, and a smart controller could treat the five sectors in that last 2.5K as trimmed.
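
A sketch of that per-sector check, assuming 4K clusters and 512-byte sectors as in the example; zero_sector_bitmap is an invented name, not anything a real controller exposes:

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE         512
#define CLUSTER_SIZE        4096
#define SECTORS_PER_CLUSTER (CLUSTER_SIZE / SECTOR_SIZE)   /* 8 */

static bool all_zeros(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0)
            return false;
    return true;
}

/* Return a bitmap of the sectors within one 4K cluster that are entirely
   zero. For the 1.5K-file example above, sectors 3..7 (the last 2.5K) would
   come back set, and a controller doing per-sector zero detection could
   leave those five sectors unmapped. */
uint8_t zero_sector_bitmap(const uint8_t cluster[CLUSTER_SIZE])
{
    uint8_t bitmap = 0;
    for (int s = 0; s < SECTORS_PER_CLUSTER; s++)
        if (all_zeros(cluster + s * SECTOR_SIZE, SECTOR_SIZE))
            bitmap |= (uint8_t)(1u << s);
    return bitmap;
}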


 

imagoon

Diamond Member
Feb 19, 2003
TRIM is just a way for the OS to tell the SSD that a certain set of blocks has been deleted. Since most SSDs today can only erase entire 512KB blocks of pages, rewriting even a single 512-byte sector means the whole 512KB block has to be read, purged, and rewritten. Most OSes do not clear the contents of a cluster when it is deleted; it is simply marked as unallocated. Most disks operating at the block level don't actually know (or really care) that certain sectors are "unallocated", since the OS handles that. TRIM lets the OS tell the SSD controller at delete time that a sector is no longer in use, which allows the controller to read the entire 512KB block into cache, erase the block, and write it back with the empty sectors ready for reuse.

The other option is for the OS to "clear" that sector during a write operation, which, depending on how fragmented the disk is, can drastically increase response time. Take this hypothetical: the OS says "write this file of 100 512-byte sectors to disk". The disk has 100 blocks, each with one free 512-byte sector. A trimmed disk would write all 100 sectors at the normal write speed. An untrimmed disk would need to read the entire 512KB block for each of the 100 sector writes, clear the whole 512KB block, and then send it back to the disk. Since none of this is necessary on a magnetic disk, the 100 writes there would occur at pretty close to native speed. On an SSD, however, the controller now has to read 51,200KB and then write 51,200KB just to store that 51,200-byte file.
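
Working through that hypothetical in code, with the numbers taken straight from the example above (512-byte sectors, 512KB erase blocks, one sector landing in each of 100 blocks):

#include <stdio.h>

/* Compute the read/modify/write cost of the untrimmed case described above. */
int main(void)
{
    const long sector_bytes      = 512;
    const long erase_block_bytes = 512L * 1024;   /* 512KB erase block */
    const long sectors           = 100;

    long payload   = sectors * sector_bytes;        /* 51,200 bytes of real data */
    long rmw_read  = sectors * erase_block_bytes;   /* 51,200KB read back        */
    long rmw_write = sectors * erase_block_bytes;   /* 51,200KB reprogrammed     */

    printf("payload written by the OS: %ld bytes\n", payload);
    printf("untrimmed SSD: reads %ldKB, writes %ldKB\n",
           rmw_read / 1024, rmw_write / 1024);
    printf("write amplification: %ldx\n", rmw_write / payload);
    return 0;
}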

Why would you want the OS handling low-level hardware access to the SSD? TRIM instructs the controller to clear out "the junk" so the OS doesn't have to deal with it later. TRIM also means the OS doesn't need to handle zeroing the disk itself (and thus incur the performance penalty of either a) reading the block, or b) actually clearing it and then reading it again to verify it is empty). TRIM, by the way it is defined, also copes with other drive configurations. What if the next SSD uses 2048KB blocks? With the zeroing approach, the OS would need to be reconfigured for the new block size, or it would miss sections of blocks and force the controller back to the old clear-before-write method.

Your method of writing zeros over deleted clusters to erase them would negatively impact write speeds for the reasons above. NAND flash can only be programmed in one direction (1s to 0s) in small sections; flipping bits back the other way requires erasing an entire block.

Edit:

Ton of info here: http://www.anandtech.com/stora...howdoc.aspx?i=3531&p=1