Go Back   AnandTech Forums > Hardware and Technology > Memory and Storage

Old 08-24-2013, 05:29 AM   #1
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default SSD defragmentation?

So I've been thinking: we all know that defragmenting an SSD is so write-heavy that it should be avoided at all costs, due to the reduction in SSD lifetime;

but...

I keep wondering what the impact of a fragmented NTFS file-system is on:
1) the persistent re-mapping data structures
2) the clean-up algorithms

My guess is that, for all controllers out there, it is always better to have large sequentially allocated files (logical LBAs, not physical NAND placement but file-system placement). This should help because re-mapping meta-data would be easily coalesced by the garbage collection routines, which would be otherwise impossible if the file is fragmented.

So, if a perfectly de-fragmented file-system indeed reduces the size of the re-mapping meta-data, this would potentially have at least 3 other benefits:
1) Faster look-ups for accessing data blocks, easier read-ahead for reads and buffering for writes
2) More efficient TRIM handling as most unallocated areas should also be de-fragmented
3) Faster wear-leveling/garbage collection thanks to less meta-data

To conclude, I'm arguing that a perfectly defragmented file-system on an SSD would be consistently faster for all operations and possibly provide better NAND longevity in the long run.

So, maybe it is worth it to erase the SSD once in a while, and copy back all the files from a back-up to the SSD for perfect file placement?
ashetos is offline   Reply With Quote
Old 08-24-2013, 06:14 AM   #2
Hellhammer
AnandTech SSD Editor
 
Hellhammer's Avatar
 
Join Date: Apr 2011
Location: Helsinki, Finland
Posts: 505
Default

The beauty of an SSD is that it doesn't care about file placement. When you write to an SSD, the controller will fragment the data in order to write it as fast as possible by utilizing multiple NAND dies. For example, if you write a 100MB file, the controller won't write 100MB to the first die; instead it will break the file into pieces and write to multiple dies simultaneously to take advantage of parallelism.
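A toy sketch of that striping, assuming (purely for illustration) eight dies and 16KiB pages rather than any particular controller's geometry:

```python
# Toy model: split one logical write across NAND dies round-robin,
# so all dies can program pages in parallel.

PAGE_SIZE = 16 * 1024   # assumed NAND page size
NUM_DIES = 8            # assumed number of dies

def stripe_write(data: bytes):
    """Return a list of (die, page_chunk) pairs for a single host write."""
    chunks = [data[i:i + PAGE_SIZE] for i in range(0, len(data), PAGE_SIZE)]
    return [(i % NUM_DIES, chunk) for i, chunk in enumerate(chunks)]

# A 100MB file becomes 6400 page programs spread evenly over 8 dies,
# so each die handles 800 programs that can proceed concurrently.
placement = stripe_write(b"\x00" * (100 * 1024 * 1024))
dies_used = {die for die, _ in placement}
```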
__________________
SSD Editor for AnandTech
Hellhammer is offline   Reply With Quote
Old 08-24-2013, 06:18 AM   #3
Insert_Nickname
Golden Member
 
Join Date: May 2012
Posts: 1,731
Default

Quote:
Originally Posted by ashetos View Post
To conclude, I'm arguing that a perfectly defragmented file-system on an SSD would be consistently faster for all operations and possibly provide better NAND longevity in the long run.
Why? Besides every block in the NAND having the same access time, the LBA the OS sees has absolutely nothing to do with what's happening inside the SSD...

I think this is already covered by the internal garbage collection and controller management of writes.

But please correct me if I'm wrong...
Insert_Nickname is offline   Reply With Quote
Old 08-24-2013, 06:55 AM   #4
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

I believe we cannot look past logical LBAs. For example, let's say we have a huge file: 40GiB on an 80GiB SSD.

With TRIM support this means half the SSD space is allocated and half of it is free.

Now, depending on file-system placement, this could end up being LBAs 0-40GiB with perfect placement, or LBAs 0,2,4,6,8...80GiB, which is the worst possible placement.

In the first case, the SSD firmware can coalesce and optimize the look-up meta-data to be as coarse grain as it makes sense, for instance every 2MiB blocks.

In the second case, the SSD firmware needs meta-data for every 4KiB of data, and it cannot coalesce, ever.

Since the amount of meta-data is typically larger than the SSD DRAM, accesses are impacted by the additional flash look-ups. Also, since meta-data need to be written to flash for recovery purposes, more meta-data means more synchronous flash writes. Both these overheads should already be significant.

And if we add workload patterns in the mix, we will notice that applications are bound to do frequent sequential accesses or large I/O requests (which are equivalent to sequential accesses with smaller I/O requests). Then the pattern mismatch between OS access and flash meta-data will have additional performance impact.
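To put a number on the metadata argument above, here is a simplified extent counter (a sketch only; real FTL bookkeeping is more involved):

```python
def count_extents(lbas):
    """Number of (start, length) runs needed to describe an LBA list."""
    if not lbas:
        return 0
    extents = 1
    for prev, cur in zip(lbas, lbas[1:]):
        if cur != prev + 1:   # run breaks wherever LBAs stop being adjacent
            extents += 1
    return extents

# Contiguous file: one extent record describes everything.
sequential = list(range(0, 1000))
# Worst case from the post: every other LBA, so no two blocks ever coalesce.
interleaved = list(range(0, 2000, 2))

count_extents(sequential)   # 1
count_extents(interleaved)  # 1000
```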
ashetos is offline   Reply With Quote
Old 08-24-2013, 07:22 AM   #5
Hellhammer
AnandTech SSD Editor
 
Hellhammer's Avatar
 
Join Date: Apr 2011
Location: Helsinki, Finland
Posts: 505
Default

Logical LBAs have nothing to do with the physical location of the data. If the OS requests a write to logical LBA 1, that doesn't mean the write will go to physical (i.e. NAND) LBA 1. SSDs have a NAND mapping table (also called indirection table) and its purpose is to keep logical and physical LBAs in sync. For example, the write to logical LBA 1 may be mapped to physical LBA 2.

Most SSDs do 1:1 mapping, which means every single page is tracked, even if it's empty. It requires more DRAM than more efficient designs but it's fast and simple (and yes, the whole table is usually cached to DRAM, not just parts of it).

There's absolutely no use in defragmenting an SSD, because only the logical LBAs will be defragmented; there may not be any change in the physical LBAs, as the data is fragmented anyway (and it has to be, for performance reasons).
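A minimal sketch of such a flat 1:1 indirection table (tiny sizes and names are illustrative, not any vendor's design):

```python
# Flat 1:1 mapping: one table entry per logical page, tracked even if empty.
# A lookup is a single array index, which is why it's fast and simple.

TOTAL_PAGES = 16        # tiny drive, for illustration only
UNMAPPED = None

mapping = [UNMAPPED] * TOTAL_PAGES   # logical page -> physical page

def write(logical_page: int, next_free_physical: int):
    """Redirect a logical page to whatever physical page happens to be free."""
    mapping[logical_page] = next_free_physical

# A write to logical page 1 may land on physical page 2, as described above.
write(1, 2)
mapping[1]   # 2
```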
__________________
SSD Editor for AnandTech

Last edited by Hellhammer; 08-24-2013 at 07:25 AM.
Hellhammer is offline   Reply With Quote
Old 08-24-2013, 07:33 AM   #6
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

Quote:
Originally Posted by Hellhammer View Post
Logical LBAs have nothing to do with the physical location of the data. If the OS requests a write to logical LBA 1, that doesn't mean the write will go to physical (i.e. NAND) LBA 1. SSDs have a NAND mapping table (also called indirection table) and its purpose is to keep logical and physical LBAs in sync. For example, the write to logical LBA 1 may be mapped to physical LBA 2.

Most SSDs do 1:1 mapping, which means every single page is tracked, even if it's empty. It requires more DRAM than more efficient designs but it's fast and simple (and yes, the whole table is usually cached to DRAM, not just parts of it).

There's absolutely no use to defragment an SSD because only logical LBAs will be defragmented, there may not be any change in the physical LBAs as the data is fragmented anyway (and it has to be for performance purposes).
I understand that logical LBAs have nothing to do with the physical location of the data; that was not my point, though. My point was that a sequentially allocated logical address range can be mapped to a sequentially allocated physical address range with a minimal amount of mapping metadata.

As for 1:1 mapping, that would indeed make things simpler, but you need a huge amount of SSD DRAM, and I really can't take your word that most SSDs have a 1:1 mapping. The only one I'm aware of is the Intel enterprise model, which is very expensive.
ashetos is offline   Reply With Quote
Old 08-24-2013, 07:49 AM   #7
Hellhammer
AnandTech SSD Editor
 
Hellhammer's Avatar
 
Join Date: Apr 2011
Location: Helsinki, Finland
Posts: 505
Default

Quote:
Originally Posted by ashetos View Post
As for 1:1 mapping, that would indeed make things simpler, but you need a huge amount of SSD DRAM, and I really can't take your word that most SSDs have a 1:1 mapping. The only one I'm aware of is the Intel enterprise model, which is very expensive.
Intel is the only one that has publicly said they do 1:1 mapping. However, it's not too hard to recognize SSDs that use 1:1 mapping because it needs a ton of DRAM and the amount needs to scale up with the NAND capacity. Usually it's 1MB of DRAM per 1GB of NAND or more.

With a different mapping scheme you can get by with 1MB of DRAM per 10GB of NAND (e.g. Intel X-25M) or even less (SandForce stores the table in the controller's SRAM).

Even with other mapping schemes you don't need defragmenting, because the controller will do it on its own.
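The 1MB-of-DRAM-per-1GB-of-NAND figure quoted above is consistent with one 4-byte table entry per 4KiB page (both sizes are assumptions, used here only to show the arithmetic):

```python
# Why page-level 1:1 mapping needs roughly 1MB of DRAM per 1GB of NAND:
# one table entry per 4KiB logical page, each entry a 4-byte physical address.

PAGE_SIZE = 4 * 1024        # assumed logical page size
ENTRY_SIZE = 4              # assumed bytes per mapping entry

def table_size_bytes(nand_bytes: int) -> int:
    return (nand_bytes // PAGE_SIZE) * ENTRY_SIZE

GB = 1024 ** 3
table_size_bytes(1 * GB) // (1024 * 1024)    # 1MB of table per GB of NAND
table_size_bytes(256 * GB) // (1024 * 1024)  # a 256GB drive needs ~256MB
```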
__________________
SSD Editor for AnandTech
Hellhammer is offline   Reply With Quote
Old 08-24-2013, 08:50 AM   #8
Cerb
Elite Member
 
Cerb's Avatar
 
Join Date: Aug 2000
Posts: 14,979
Default

FS fragmentation can still slow things down, so don't fill the drive to 90% and expect great results over time. But the NAND isn't the issue there: handling many fragments takes more CPU/RAM time and more requests over the SATA interface, and fewer requests generally means faster access. As far as the LBAs go, you should leave free space (get a bigger SSD than is required to store your data); then larger writes will mostly end up as simple, sequential writes.

The SSD's NAND is going to get fragmented anyway, and its degree of fragmentation is handled by the drive itself.

Unfortunately, as far as the FS goes, there aren't minimal-write options right now, to my knowledge. For instance, if badly-fragmented files could be copied but others left alone, and the drive were not used as its own temp space, defragging wouldn't be so bad (with multi-GB in-RAM buffers, it ought to be doable). As it is, you're gaining very little by doing it, yet it could wear your SSD out by an amount that might well equal years of regular use. On the bright side, random access is typically pretty good, and most real-world access is either fairly random or fairly low-bandwidth.
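The request-count overhead described above can be sketched like this (assuming, for illustration, one read command per contiguous fragment and 512-byte sectors):

```python
def read_requests(fragments):
    """Each contiguous fragment costs one read command over the SATA link."""
    return len(fragments)

# The same 8MiB file stored two ways; fragments are (start_lba, sector_count).
contiguous = [(1000, 16384)]                       # one 8MiB run
shattered = [(i * 64, 16) for i in range(1024)]    # 1024 scattered 8KiB runs

read_requests(contiguous)  # 1 command
read_requests(shattered)   # 1024 commands: more CPU time, more link overhead
```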
__________________
"The computer can't tell you the emotional story. It can give you the exact mathematical design, but what's missing is the eyebrows." - Frank Zappa
Cerb is offline   Reply With Quote
Old 08-24-2013, 09:46 AM   #9
postmortemIA
Diamond Member
 
postmortemIA's Avatar
 
Join Date: Jul 2006
Location: Midwest USA
Posts: 6,368
Default

I think the controller's top priority is wear leveling, in order to maximize longevity: ensuring that the same blocks are not rewritten over and over during write operations. So you'd get a very fragmented drive as a result.
__________________
D1. Win7 x64 i7-3770 on Z77, HD7850, 2707WFP, 840, X-Fi D2. Win7 x64 E8400 on P35
L1. OSX 10.9 rMBP 13 L2. Vista x86 E1505
M. Galaxy S4

postmortemIA is online now   Reply With Quote
Old 08-24-2013, 09:59 AM   #10
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

I think that the performance difference between sequential writes and random writes for same size write requests proves that FS fragmentation has performance impact.

Now, it's true that internal SSD fragmentation will occur anyway due to wear-leveling. But there is potential for cells in the same wear-leveling group to be mapped to the same LBA group and thus achieve sequential-class performance instead of random-class performance.
ashetos is offline   Reply With Quote
Old 08-24-2013, 01:09 PM   #11
hot120
Member
 
Join Date: Sep 2009
Posts: 39
Default

You can't look at fragmentation of an SSD the same way you look at fragmentation of a HDD. I think that is where you're getting lost.
__________________
HAF 932
Intel I7 970
Antec Kuhler 920
Intel 520 180GB SSD
Corsair 6 X 2GB 1600
EVGA FTW SLI3 MB
hot120 is offline   Reply With Quote
Old 08-24-2013, 01:36 PM   #12
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

Quote:
Originally Posted by hot120 View Post
You can't look at fragmentation of an SSD the same way you look at fragmentation of a HDD. I think that is where you're getting lost.
Come on, I'm not looking at it the same way!
ashetos is offline   Reply With Quote
Old 08-24-2013, 01:48 PM   #13
DrPizza
Administrator
Elite Member
Goat Whisperer
 
DrPizza's Avatar
 
Join Date: Mar 2001
Location: Western NY
Posts: 43,882
Default

I'm far from an SSD expert, but let me take a stab at this:
An analogy:
Suppose you have a warehouse with 16 rows. You're arguing that if a particular product consisted of 32 boxes, it would be more efficient to put all 32 boxes in the same row.
But in the SSD, it is more efficient to split up those boxes, because there are 16 guys operating forklifts, and each forklift goes down one of the rows. To retrieve your entire product from a single row, one forklift would have to go back and forth 32 times. But if those boxes were split up, the other 15 forklifts could be retrieving boxes simultaneously - each forklift makes its two trips in the same time every other forklift is making its two trips. I hope this analogy is correct (SSD experts, feel free to let me know if I'm way off), and if it is, it should help you understand why what you're proposing isn't more efficient.


Oh, fragmentation on an HDD : there's one forklift.
__________________
Fainting Goats
DrPizza is online now   Reply With Quote
Old 08-24-2013, 01:55 PM   #14
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

Quote:
Originally Posted by DrPizza View Post
I'm far from an SSD expert, but let me take a stab at this:
An analogy:
Suppose you have a warehouse with 16 rows. You're arguing that if a particular product consisted of 32 boxes, it would be more efficient to put all 32 boxes in the same row.
But in the SSD, it is more efficient to split up those boxes, because there are 16 guys operating forklifts, and each forklift goes down one of the rows. To retrieve your entire product from a single row, one forklift would have to go back and forth 32 times. But if those boxes were split up, the other 15 forklifts could be retrieving boxes simultaneously - each forklift makes its two trips in the same time every other forklift is making its two trips. I hope this analogy is correct (SSD experts, feel free to let me know if I'm way off), and if it is, it should help you understand why what you're proposing isn't more efficient.


Oh, fragmentation on an HDD : there's one forklift.
No, I'm not suggesting that. I'm aware of NAND parallelism, blocks, pages, planes and what not. So in effect I'm talking about let's say 512K boxes, and arguing whether different 16-box combinations perform differently.
ashetos is offline   Reply With Quote
Old 08-24-2013, 02:07 PM   #15
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

For additional clarification, data interleaving across NAND channels can be algorithmic and thus stateless and orthogonal to the discussion.
ashetos is offline   Reply With Quote
Old 08-24-2013, 02:09 PM   #16
Emulex
Diamond Member
 
Join Date: Jan 2001
Location: ATL
Posts: 9,540
Default

It is not just unused disk space that contributes to wear leveling: if a block gets worn quickly and the controller can swap in a block that doesn't appear to be changing, that is a logical "preservation" move.

The entire SSD is used for wear leveling; blocks that do not change are just as good as overprovisioned blocks (maybe better), since a huge portion of your drive is static OS content.

Does that make sense? Defragmenting always has advantages (recovery of 1 contiguous block is far easier than of 6 million blocks), but I'd suggest you ask the folks who make your controller what their design thoughts were.

TRIM was not the #1 consideration when they designed the SSD. It is unreliable, and until SATA 3.1 it can't even be queued - so I'm guessing the controller is happier with hourly/daily sweeps of TRIM while idle than with TRIM commands fired off without NCQ during busy activity.
__________________
-------------------------
NAS: Dell 530 Q6600 8gb 4tb headless VHP
KID PC1: Mac Pro Dual nehalem - 6gb - GF120 - HP ZR30W
Browser: Dell 530 Q6600 4GB - Kingston 96gb -gt240- hp LP3065 IPS - 7ult
Tabs: IPAD 1,2,3 IPOD3,HTC flyer, Galaxy Tab - all rooted/jb
Couch1: Macbook Air/Macbook White
Couch2: Macbook Pro 17 2.66 Matte screen - 8GB - SSD
HTPC: Asus C2Q8300/X25-V - Geforce 430- 7ult - Antec MicroFusion 350
Emulex is offline   Reply With Quote
Old 08-24-2013, 02:21 PM   #17
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

Quote:
Originally Posted by Emulex View Post
It is not just unused disk space that contributes to wear leveling: if a block gets worn quickly and the controller can swap in a block that doesn't appear to be changing, that is a logical "preservation" move.

The entire SSD is used for wear leveling; blocks that do not change are just as good as overprovisioned blocks (maybe better), since a huge portion of your drive is static OS content.

Does that make sense? Defragmenting always has advantages (recovery of 1 contiguous block is far easier than of 6 million blocks), but I'd suggest you ask the folks who make your controller what their design thoughts were.

TRIM was not the #1 consideration when they designed the SSD. It is unreliable, and until SATA 3.1 it can't even be queued - so I'm guessing the controller is happier with hourly/daily sweeps of TRIM while idle than with TRIM commands fired off without NCQ during busy activity.
That makes perfect sense. You are also probably right that TRIM is not the priority of the firmware implementation.

I am very interested in finding out how much degradation FS fragmentation causes. My guesses for the root cause are harder garbage collection, and bigger look-up data structures.

I have 2 reasons to believe wear leveling across static and dynamic data does not make fragmentation irrelevant:
1) Otherwise, random write performance would be identical to sequential write performance, which is not the case
2) Wear leveling algorithms group NAND cells together, in classes, because finer granularity would be too much overhead. Thus, large ranges of LBAs and NAND cells can be associated with minimal meta-data.
ashetos is offline   Reply With Quote
Old 08-24-2013, 05:53 PM   #18
Emulex
Diamond Member
 
Join Date: Jan 2001
Location: ATL
Posts: 9,540
Default

The LBA-to-flash mapping is what Intel and Samsung primarily use the RAM for. It is also why an SSD without a tantalum capacitor/supercap can really get screwed up: the LBA-to-flash table sits in RAM and has to be written back to the SSD.

So, assuming it is a 1:1 map (why would it be?), fragmentation would not matter; if it is a b-tree, then more active mapping could require more work and RAM (on older SSDs).
__________________
-------------------------
NAS: Dell 530 Q6600 8gb 4tb headless VHP
KID PC1: Mac Pro Dual nehalem - 6gb - GF120 - HP ZR30W
Browser: Dell 530 Q6600 4GB - Kingston 96gb -gt240- hp LP3065 IPS - 7ult
Tabs: IPAD 1,2,3 IPOD3,HTC flyer, Galaxy Tab - all rooted/jb
Couch1: Macbook Air/Macbook White
Couch2: Macbook Pro 17 2.66 Matte screen - 8GB - SSD
HTPC: Asus C2Q8300/X25-V - Geforce 430- 7ult - Antec MicroFusion 350
Emulex is offline   Reply With Quote
Old 08-24-2013, 05:59 PM   #19
hot120
Member
 
Join Date: Sep 2009
Posts: 39
Default

Quote:
Originally Posted by ashetos View Post
Come on, I'm not looking at it the same way!
Yes, you are. Access times for an SSD are the same (all electronic), regardless of the location of the data (physical NAND). For a HDD, that is not the case. The read/write head would have to access different PHYSICAL locations on the platter (mechanical arm, spinning platter). That is why defragmentation on a SSD is pointless. Can you understand that?
__________________
HAF 932
Intel I7 970
Antec Kuhler 920
Intel 520 180GB SSD
Corsair 6 X 2GB 1600
EVGA FTW SLI3 MB
hot120 is offline   Reply With Quote
Old 08-25-2013, 04:38 AM   #20
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

Quote:
Originally Posted by Emulex View Post
The LBA-to-flash mapping is what Intel and Samsung primarily use the RAM for. It is also why an SSD without a tantalum capacitor/supercap can really get screwed up: the LBA-to-flash table sits in RAM and has to be written back to the SSD.

So, assuming it is a 1:1 map (why would it be?), fragmentation would not matter; if it is a b-tree, then more active mapping could require more work and RAM (on older SSDs).
Yes, if the manufacturer uses a supercapacitor, it gets the luxury of keeping the mapping in SSD RAM without updating the flash until the last moment (power failure).

With a 1:1 mapping, something like a supercapacitor is almost mandatory, because otherwise you would need to flush the metadata after each individual re-map (every 4K, ouch).

The b-tree is actually more difficult to implement, especially if you put in the effort to store extents instead of pages. That is, a range of sequential pages is treated as a single tree node instead of one tree node per page. Extent-based data structures could be responsible for the disparity between sequential and random write performance.

I also wonder if vendors use something other than a b-tree - who knows, hash tables or something more complicated.
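A sketch of the extent coalescing described above, using a plain list as a stand-in for a real b-tree:

```python
# Extent-based mapping: adjacent logical pages that are also adjacent
# physically merge into one (logical_start, physical_start, length) record.

def insert_extent(extents, logical, physical):
    """Append a one-page mapping, merging with the previous extent when
    both the logical and physical addresses continue the run."""
    if extents:
        l, p, n = extents[-1]
        if logical == l + n and physical == p + n:
            extents[-1] = (l, p, n + 1)   # coalesce: no new metadata
            return
    extents.append((logical, physical, 1))

seq, rnd = [], []
for i in range(1000):
    insert_extent(seq, i, 5000 + i)       # sequential run: stays one extent
    insert_extent(rnd, i * 2, 5000 + i)   # every-other-LBA: never merges

len(seq)  # 1
len(rnd)  # 1000
```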
ashetos is offline   Reply With Quote
Old 08-25-2013, 04:39 AM   #21
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

Quote:
Originally Posted by hot120 View Post
Yes, you are. Access times for an SSD are the same (all electronic), regardless of the location of the data (physical NAND). For a HDD, that is not the case. The read/write head would have to access different PHYSICAL locations on the platter (mechanical arm, spinning platter). That is why defragmentation on a SSD is pointless. Can you understand that?
I don't like your tone. Can I understand that? You can't even follow the discussion.
ashetos is offline   Reply With Quote
Old 08-25-2013, 09:12 AM   #22
Cerb
Elite Member
 
Cerb's Avatar
 
Join Date: Aug 2000
Posts: 14,979
Default

Quote:
Originally Posted by ashetos View Post
Yes, if the manufacturer uses a supercapacitor, it gets the luxury of keeping the mapping in SSD RAM without updating the flash until the last moment (power failure).

With a 1:1 mapping, something like a supercapacitor is almost mandatory, because otherwise you would need to flush the metadata after each individual re-map (every 4K, ouch).
You need something like that anyway. The SSD needs some way to be sure that when the power goes out, any currently-pending writes to NAND can complete successfully (if they haven't started hitting the NAND, they can be ignored). Just making the mapping data smaller doesn't remove the risk. It needs to be able to detect the falling voltage, act so as not to leave its state corrupted, and then go down with the system.
__________________
"The computer can't tell you the emotional story. It can give you the exact mathematical design, but what's missing is the eyebrows." - Frank Zappa
Cerb is offline   Reply With Quote
Old 08-25-2013, 10:47 AM   #23
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

Quote:
Originally Posted by Cerb View Post
You need something like that anyway. The SSD needs some way to be sure that when the power goes out, any currently-pending writes to NAND can complete successfully (if they haven't started hitting the NAND, they can be ignored). Just making the mapping data smaller doesn't remove the risk. It needs to be able to detect the falling voltage, act so as not to leave its state corrupted, and then go down with the system.
Of course, you are right. All SSDs should have something like a super capacitor. I am pretty sure though that most don't, and this explains discussions about data corruption after power outage with certain SSDs.

You can keep the SSD state consistent with synchronous flash writes, for the cheap models, but at the expense of performance. It is possible, though, and you can achieve relatively high performance if you tolerate torn writes, which don't break block-device semantics and don't corrupt file-systems.
ashetos is offline   Reply With Quote
Old 08-25-2013, 02:13 PM   #24
Puffnstuff
Platinum Member
 
Join Date: Mar 2005
Posts: 2,293
Default

Wear leveling controllers make fragmentation a non issue.
__________________
Loop 1: EVGA x58 E760 i7 930, HK 3.0 cpu block, EK chipset block & dual bay res, swiftech mcp655 pump, feser x360 rad, scythe fan controller & panaflow fans, bp & feser fittings, distilled water w/primochill liquid utopia, samsung 840 pro 256gb
Loop 2: Dead and buried.
PNY 780 ti gpu, Powered by Enermax Platimax 1350 watt

Last edited by Puffnstuff; 08-25-2013 at 10:09 PM. Reason: spelling
Puffnstuff is offline   Reply With Quote
Old 08-25-2013, 02:58 PM   #25
ashetos
Member
 
Join Date: Jul 2013
Posts: 128
Default

Quote:
Originally Posted by Puffnstuff View Post
BOT wear leveling controllers make fragmentation a non issue.
Sorry but what is BOT?
ashetos is offline   Reply With Quote