SSD defragmentation?


ashetos

Senior member
Jul 23, 2013
254
14
76
This is disappointing. I put time and effort into this thread, and it was not worth it. I hoped the technical forums would be more serious and less juvenile. I give up; you can run your GUI benchmarks and parrot common misconceptions. I'm out of here.
 

mrpiggy

Member
Apr 19, 2012
196
12
81
The problem with your test is that it is device-dependent. All I/O to the NAND goes through the SSD's controller (there is no way around that). What the OS sees externally is NOT what happens at the controller-to-NAND level, and every SSD model/brand optimizes differently between the controller firmware and its NAND mapping tables. This means that, depending on the device's firmware algorithms, your test will vary in results between different SSDs.

This internal logic runs regardless of what the "direct I/O" commands request, and the output (speed, in your case) depends on how the firmware translates external commands into internal actions. In fact, forcing particular "direct I/O" read/write/location patterns on the controller can push it into less-than-optimal internal algorithms, because firmware engineers prioritize the "most common" expected filesystem I/O patterns, not the particular ones your test demands. So there might be some conflict in that regard.

The internal read/write algorithms are also always balanced against wear-leveling algorithms, which tend to have a constantly high priority in most SSDs. So while your test assumes nothing should be going on between or during external direct I/O commands, the SSD's firmware may very well be performing additional internal operations.

Simply put, the controller is the valet to the parking spaces. No matter how you (the direct I/O commands) demand your car be parked, once the valet (the controller) gets the cars (data), he will park them how and where he sees fit, despite whatever you tell him (he might even swing by the basement bathroom to relieve himself on the way). His efficiency or speed at parking cars is limited by the rules (firmware) his employer (the manufacturer) makes him work by, and those rules differ between garages (SSD models/brands).
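To make the valet concrete, here is a toy page-mapped translation layer. Everything in it (the names, the append-only allocator) is invented for illustration; real controller firmware is far more involved:

```python
# Toy page-mapped FTL: the host addresses LBAs, but the controller
# decides physical placement and just records the mapping, so where the
# host writes says nothing about where the data lands in NAND.
class ToyFTL:
    def __init__(self, num_pages):
        self.nand = [None] * num_pages  # pretend NAND pages
        self.mapping = {}               # LBA -> physical page
        self.next_free = 0              # naive append-only allocator

    def write(self, lba, data):
        phys = self.next_free           # the controller picks the spot...
        self.next_free += 1
        self.nand[phys] = data
        self.mapping[lba] = phys        # ...and remembers it

    def read(self, lba):
        return self.nand[self.mapping[lba]]  # every read is a table lookup

ftl = ToyFTL(16)
for lba in (9, 2, 14, 3):    # scattered "random" host writes...
    ftl.write(lba, f"data-{lba}")
print(ftl.mapping)           # ...land on consecutive physical pages 0-3
```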

You basically need to run your test across different SSDs. If the percentage difference between sequential and random speed stays constant, then perhaps you have a case. If not, then it is more likely just controller firmware optimization, such as NAND mapping-table handling.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
The only time I have seen fragmentation be a problem so far is with large databases (Exchange mail is an example), and only in certain cases. The SSD part is minimal, but NTFS fragmentation can start to slow down a table walk in SQL because NTFS.sys has to do more work to return the data. Exchange can do the same during a mass database search. However, we are talking 100,000+ fragments before we found that defragging with something like contig made a noticeable difference.
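To see why the fragment count itself costs something at the filesystem layer, here is a toy model (the numbers are arbitrary): every extent is one more (offset, length) pair the FS driver has to resolve and issue before the data comes back.

```python
# Same bytes, wildly different bookkeeping: 1 extent vs. 100,000.
device = bytes(1 << 21)  # pretend 2 MiB "disk"

def read_file(extents):
    # one resolve-and-read step per fragment
    return b"".join(device[start:start + length] for start, length in extents)

contiguous = [(0, 100_000)]                         # 1 extent
fragmented = [(i * 20, 1) for i in range(100_000)]  # 100,000 extents

assert read_file(contiguous) == read_file(fragmented)  # identical data
```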
 

Lorne

Senior member
Feb 5, 2001
874
1
76
Quote:
look up trim

I am talking about access time on an SSD being the same. Transfer time is different. Your post is all over the place. Once again, regardless of the location of the data on the NAND, access time is the same. Transfer time may differ due to TRIM and the work taking place within the SSD controller. Regardless, fragmentation on an SSD is not comparable to an HDD; access time is directly affected by fragmentation there, and that is what you should be chasing.

Still not getting it. Take TRIM and comparisons to HDDs out of the picture completely.

mrpiggy: close, but do the algorithms have an effect in a read-only environment?

Here's a better example, and it's a read-only test, so it does not matter whether it is a direct-I/O read or an FS read.
Said drive (SSD) has a large file laid out in a perfect pattern across all the NAND it uses, all nice and tidy. A test shows read speed = X.
Now (jump back in time) the same drive has the same large file, but its blocks are located all over the place (randomly) on the drive. A test shows read speed = Y.
The results show X > Y by 10-40 MB/s.
Why?
Drives will not show this problem when new, and it takes a long time of reads and writes to fragment files enough to make a difference (small files even less so), but ashetos's test shows it is there nonetheless.
Over time this latency adds up.
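If anyone wants to reproduce the X vs. Y comparison crudely, a sketch like this would do. The path and block size are placeholders, and the OS page cache must be dropped between runs (e.g. via /proc/sys/vm/drop_caches on Linux) for the timings to mean anything. Strictly speaking, it shuffles the read order rather than the on-NAND layout, so it only approximates the comparison:

```python
# Time the same file read back in order vs. in a shuffled block order.
import os, random, time

PATH, BLOCK = "/mnt/ssd/testfile", 1 << 20  # placeholder path, 1 MiB blocks

def timed_read(offsets):
    start = time.perf_counter()
    with open(PATH, "rb", buffering=0) as f:  # unbuffered at Python's level
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

n = os.path.getsize(PATH) // BLOCK
seq = [i * BLOCK for i in range(n)]
rnd = seq[:]
random.shuffle(rnd)
print("in-order read:", timed_read(seq), "s")  # ~ X above
print("shuffled read:", timed_read(rnd), "s")  # ~ Y above
```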

imagoon:
We see the same thing with our production software that uses SQL. I remove old jobs and compact it often, or we get huge latency, and it really shows on older POS computer systems.

ashetos:
You have something here, and I don't think you should give up. Just try to get Anand to test this on their bench and use up the wear level on their drives.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Without TRIM, or with the drive mostly full even with TRIM, fragmentation within the NAND happens as the drive gets used, and it gets slower. Slower because the likelihood that it needs to read from another die on the current channel, or another block on the current die, increases as soon as it runs down to some limit of erased NAND (pretty much all drives try to keep some amount of the spare area unprogrammed for fast small writes). However, once that reaches whatever critical limit it needs to become steady, it will then stay around that same speed. For instance, the Samsung 840 is well known for being able to drop down to ~80 MB/s writes, such that without making lots of free space and TRIMming, it's going to stay that way. That could be a problem when upgrading, for some people (no TRIM, or manual TRIM), and could be a problem if you get a 120GB drive for 100GB of data (even with TRIM).

After it has been filled enough, it won't matter whether the writes from the host are sequential or random, because the controller has to treat them as random, to a fair degree. It may optimize the writes for performance and not perform so badly, or optimize them for WA (write amplification) and perform somewhat badly, or go in between. Either way, unless you give it extra OP space, or use TRIM and keep lots of free FS space, fragmentation of the NAND, and probably of the controller's mapping data (depending on how that's done), is simply part of using an SSD.
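As a crude rule of thumb for the mostly-full case (a back-of-envelope model, not how any particular controller behaves): with uniform random writes and greedy garbage collection, write amplification grows roughly like 1/(1 - u), where u is the fraction of NAND holding valid data.

```python
# Back-of-envelope: why a nearly full drive works so much harder.
def write_amp(utilization):
    # rough upper-bound model for greedy GC under uniform random writes
    return 1.0 / (1.0 - utilization)

for u in (0.50, 0.80, 0.90, 0.95):
    print(f"{u:.0%} full -> ~{write_amp(u):.0f}x write amplification")
```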

X-bit Labs' reviews test this very thing, in a much simpler manner than AT's consistency tests (which amount to the same thing, but with at least 2 more dimensions of presented data). Example:
http://www.xbitlabs.com/articles/storage/display/sandisk-ultra-plus_5.html#sect0

Now, sequential vs. random on a fresh drive is simply too complicated, given what little we generally know about the firmware. Some of it can come down to the NAND itself, and some to the software running on the controller, and most vendors don't spell out what they do.
 
Last edited:

mrpiggy

Member
Apr 19, 2012
196
12
81
Lorne said:
Still not getting it. Take TRIM and comparisons to HDDs out of the picture completely.

mrpiggy: close, but do the algorithms have an effect in a read-only environment?

Here's a better example, and it's a read-only test, so it does not matter whether it is a direct-I/O read or an FS read.
Said drive (SSD) has a large file laid out in a perfect pattern across all the NAND it uses, all nice and tidy. A test shows read speed = X.
Now (jump back in time) the same drive has the same large file, but its blocks are located all over the place (randomly) on the drive. A test shows read speed = Y.
The results show X > Y by 10-40 MB/s.
Why?
Drives will not show this problem when new, and it takes a long time of reads and writes to fragment files enough to make a difference (small files even less so), but ashetos's test shows it is there nonetheless.
Over time this latency adds up.

The controller still bases all actions on internal NAND tracking tables for reading and writing. Those tables, though they are manipulated in faster RAM, are organized and reorganized constantly, and I assume the controller spends more time interacting with and manipulating the tables for large amounts of random, fragmented data than for large amounts of sequential data. Since the tables in RAM are also periodically written to permanent memory (in case of power outages; I don't know how often), perhaps it is doing more in the background during large random reads, since there is a lot more location tracking and table manipulation involved simply because the access is random (not talking about the actual NAND data locations, but the tracking/mapping tables for the NAND).

I do imagine that some of the table-organization algorithms optimize the tables every so often, but the read/write hits might be bigger if that has not happened yet. Maybe see if your random read-back speed increases after some period of time?
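To make that guess concrete (and it is only a guess): if the mapping can compress runs of consecutive pages into single records, a sequential history leaves almost nothing to persist, while a random one leaves roughly one record per page. A toy count, with all numbers invented:

```python
import random

def dirty_entries(pages):
    """Mapping records to persist if consecutive runs compress to one."""
    runs = 1
    for prev, cur in zip(pages, pages[1:]):
        if cur != prev + 1:
            runs += 1
    return runs

seq = list(range(10_000))
rnd = seq[:]
random.shuffle(rnd)
print(dirty_entries(seq))  # 1: a single run, trivial to flush
print(dirty_entries(rnd))  # ~10,000: nearly one record per page
```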
 

hot120

Member
Sep 11, 2009
43
1
71
So, did we agree that an SSD does NOT need to be defragmented, and that the OP's questions have been put to rest? Or is this still a black hole that cannot be escaped from? It's very interesting, but meaningless: since none of us knows what is going on inside the controller, speculation of this kind will remain just that, speculation.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
hot120 said:
So, did we agree that an SSD does NOT need to be defragmented, and that the OP's questions have been put to rest? Or is this still a black hole that cannot be escaped from? It's very interesting, but meaningless: since none of us knows what is going on inside the controller, speculation of this kind will remain just that, speculation.
Neither, because of that last sentence. SSDs can surely use plenty of tricks to make sequential access a little faster, but fragmentation, with SSDs (and probably future HDDs, for that matter), is a three-layered problem:

1. FS fragmentation, which needs modification of the FS to handle well (basically, defragment while writing when a file's new contents would become overly fragmented, or otherwise prevent it from getting too bad with some files).

2. Mapping structure complexity. Different SSDs do it in different ways, and there's no one true method of arranging the mappings. Sorted trees can often be compacted to be fast, but may get larger/deeper, and thus slower, when many small writes occur, as one hypothetical I can think of. Random-access B-trees would have the same kinds of issues, in that the device could very well slow down while being used if the trees are always kept balanced. But smaller structures would mean more of the mapping data fitting inside the controller rather than out in DRAM. If that could be done, it might be a net win compared to having to wait on DRAM, since even at only several hundred MHz, SRAM accesses and ALU cycles are generally cheap compared to DRAM addressing. There's no perfect solution here, and different methods will influence (but not truly decide) the drive's random access performance.

As an aside, I wouldn't be surprised if some drives start using the available pseudo-SLC modes as a way to avoid DRAM, going from SRAM straight to pseudo-SLC space to save cost and power (or using that to get by with less DRAM and power it off when idle, without sacrificing as much performance as doing so with plain MLC or TLC would).

3. NAND vs. LBA mapping (NAND fragmentation). Repeated accesses to different blocks and/or dies on the same flash channel are going to be slower than accesses spread across channels. But simple RAID-like, or effectively random, mapping of channels to LBAs is a must for performance with larger or sparse accesses. Ideally, one access goes to one channel and the next to another, so they can overlap inside the SSD itself. A deterministic pseudo-random layout, for instance, could prevent the problem Lorne talks about, at the cost of lower plain sequential performance. A sketch of the idea follows.
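A minimal sketch of that striping, with made-up channel count and stripe size:

```python
# RAID-like channel striping: consecutive LBA stripes rotate across
# channels, so sequential or sparse reads overlap nicely, but an
# unlucky stride can land every access on the same channel.
NUM_CHANNELS = 8  # made-up
STRIPE = 16       # pages per stripe, made-up

def channel_of(page):
    return (page // STRIPE) % NUM_CHANNELS

print([channel_of(p) for p in range(0, 8 * STRIPE, STRIPE)])
# -> [0, 1, 2, 3, 4, 5, 6, 7]: work spread over every channel
bad_stride = NUM_CHANNELS * STRIPE
print([channel_of(p) for p in range(0, 8 * bad_stride, bad_stride)])
# -> [0, 0, 0, 0, 0, 0, 0, 0]: every access queued on channel 0
```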
 

Lorne

Senior member
Feb 5, 2001
874
1
76
Just out of curiosity (it's been a long time since I got into the technicals of HD sector mapping): does the data block still keep the address of the next data block at the end of its data, or has that moved into the FS?
 

Fred B

Member
Sep 4, 2013
103
0
0
I have done a simple test to see the effect of fragmentation on an SSD. It is not huge, but there is a difference. I put a 182 MB ping file in 167 fragments, and the same ping file with no fragments, on an SSD (XM-25 40GB), and measured the time it takes to load the file. The non-fragmented file reads with fewer I/Os, in 64K chunks; the fragmented one is a mix of sizes. I did this test with Bootvis by putting the ping file in the autostart. :)

Non-fragmented: 1.06 seconds
Fragmented: 1.25 seconds
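(That works out to roughly 182/1.06 ≈ 172 MB/s versus 182/1.25 ≈ 146 MB/s, about an 18% read-speed penalty for the fragmented copy.)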
 

hot120

Member
Sep 11, 2009
43
1
71
Fred B said:
I have done a simple test to see the effect of fragmentation on an SSD. It is not huge, but there is a difference. I put a 182 MB ping file in 167 fragments, and the same ping file with no fragments, on an SSD (XM-25 40GB), and measured the time it takes to load the file. The non-fragmented file reads with fewer I/Os, in 64K chunks; the fragmented one is a mix of sizes. I did this test with Bootvis by putting the ping file in the autostart. :)

Non-fragmented: 1.06 seconds
Fragmented: 1.25 seconds

Nice! But is that a reason to defragment the SSD, for a 0.19-second difference? And what would the effect of defragmenting be? Would that speed things up? Is it possible for you to test that? Very interested to see what you come up with.
 

Fred B

Member
Sep 4, 2013
103
0
0
I would not advise running a defrag program on an SSD just like that; there are a lot of things that can be done to remove a big part of the fragmentation without any tools.
But I have done a defrag on the SSD, which has 50 percent free space on it. It is a dual-boot PC with multiple drives, so it is easy to get rid of fragmentation by copying the file to an HD and back to the SSD.

Non-fragmented: boot in 11.40 s - disk I/O count 7426 - disk service time 1118.81 ms - disk I/O time 1363.51 ms - CPU samples 22800

Fragmented: boot in 11.48 s - disk I/O count 7961 - disk service time 1165.16 ms - disk I/O time 1562.24 ms - CPU samples 22988
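(From those numbers: about 7% more disk I/O requests and about 15% more disk I/O time when fragmented, for a boot that is only 0.08 s longer.)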

The workload looks better; in the screenshots, the top one is fragmented.

[attached image: Bootvis workload comparison]
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
ashetos said:
By alternating rw between randwrite and write, I conducted sequential and random write experiments.

How does a write-speed experiment relate, in any way, shape, or form, to defragmenting already-written data in order to accelerate reads of those files (and to your follow-up claim of improved SSD longevity when defragmented)?
I would think the proper experiment to prove or disprove your hypothesis would be a READ-speed experiment on defragmented vs. fragmented files.
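Something along these lines, for example. It assumes fio is installed, and the two file paths (a defragmented copy and a fragmented copy of the same data) are placeholders:

```python
# Read-only fio comparison of a defragmented vs. fragmented copy.
import subprocess

for name, path in [("defragged",  "/mnt/ssd/contig.bin"),
                   ("fragmented", "/mnt/ssd/frag.bin")]:
    subprocess.run(
        ["fio", f"--name={name}", f"--filename={path}",
         "--rw=read", "--bs=1M", "--direct=1", "--readonly"],
        check=True,
    )
```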

I am not saying you are lying about the data, because honestly I haven't even bothered reading it... the whole PREMISE of the experiment is completely off.
 
Last edited:

Fred B

Member
Sep 4, 2013
103
0
0
I think you can say there is some difference, but not much, unlike with an HD, and the faster the drive and system, the less fragmentation takes place. So it is a double bonus with SSDs: they are so fast that it is not a big deal.
For comparison, I put the same fragmented ping file on a SATA-1 HD, and it takes 18 seconds to load.
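(That is roughly 182 MB / 18 s ≈ 10 MB/s, versus about 146 MB/s for the same fragmented file on the SSD.)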