Defrag programs


Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Do we really need to bring back a month dead thread?

Windows might not load the whole of a file or program, but what it does load is done so sequentially.

But the chunks that it loads sequentially are so small that it's irrelevant. A page is 4K on a 32-bit system so it does demand paging in 4K chunks, there is some read-ahead on the assumption that if you want some of those pages you'll probably want more in the near future but IIRC Windows will only do up to 64K read-ahead. Any drive ever produced can read 64K fast enough for you not to notice any latency in the read.
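On a Linux box you can at least eyeball the two numbers in play here, the page size and the device read-ahead, straight from the shell (assuming the disk shows up as /dev/sda; adjust as needed):

getconf PAGESIZE           # page size used for demand paging, typically 4096 on 32-bit x86
blockdev --getra /dev/sda  # kernel read-ahead for the device, reported in 512-byte sectors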

Having the files stored contiguously on the drive will still make the system more responsive when the file is needed, thus reducing the amount of seeks needed overall.

Not at all. It might if the files were read in one large chunk (i.e. like 'cat filename' will do) but that's not how it works. Windows reads from dozens of files at a time all in very small chunks so the fact that they're contiguous doesn't mean anything because it's seeking around into the other files all of the time too. XP even goes so far as to intentionally fragment certain boot files so that it can lay out the chunks that it needs on bootup in the proper order to minimize seeking and speed up the boot process.

There are corner cases where large contiguous files are beneficial like in the cases of audio and video editing, but in the general use case seeking is the dominant factor by a large margin and making all of your files contiguous doesn't help that at all.
 

EricMartello

Senior member
Apr 17, 2003
910
0
0
Originally posted by: Nothinman
Do we really need to bring back a month dead thread?

Absolutely - it's a forum; its purpose is discussion.

But the chunks that it loads sequentially are so small that it's irrelevant. A page is 4K on a 32-bit system so it does demand paging in 4K chunks, there is some read-ahead on the assumption that if you want some of those pages you'll probably want more in the near future but IIRC Windows will only do up to 64K read-ahead. Any drive ever produced can read 64K fast enough for you not to notice any latency in the read.

There is more to it than simply reading and writing. First off, the read-ahead cache is based on approximation and "guessing". Windows attempts to load what it predicts will be required next, and this works well, but it has little to do with the whole picture. This function was added to increase responsiveness - remember how clunky XP feels when compared to something like Windows 95, or even Windows 3.1.

Let's use a real-world example - a database server. While an HD's raw sequential transfer rate may be stellar - let's say 70 MB/s peak sustained for an average server-class drive - the real-world transfer rate is significantly less, perhaps more like 10 MB/s peak.

Why is that?

Because the drive needs to access data in a certain order for it to be usable by the OS, and in doing so a queue is formed in the HD's cache, which lets the drive determine the best way to access the data. The speed at which it can execute this queue depends on what the OS is requesting, as well as how the data is physically laid out on the drive itself. In the case of a database server, the less fragmented the data is, the better, meaning the drive's real-world IO/s will increase, or come closer to its maximum rated IO.
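If you want to actually look at the queueing the OS and the drive are doing, a Linux box exposes the knobs in sysfs; a rough peek, assuming the drive is sda:

cat /sys/block/sda/device/queue_depth  # command queue depth the drive/driver is using (NCQ/TCQ)
cat /sys/block/sda/queue/nr_requests   # how many requests the kernel will keep queued for reordering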

Not at all. It might if the files were read in one large chunk (i.e. like 'cat filename' will do) but that's not how it works. Windows reads from dozens of files at a time all in very small chunks so the fact that they're contiguous doesn't mean anything because it's seeking around into the other files all of the time too. XP even goes so far as to intentionally fragment certain boot files so that it can lay out the chunks that it needs on bootup in the proper order to minimize seeking and speed up the boot process.

There are corner cases where large contiguous files are beneficial like in the cases of audio and video editing, but in the general use case seeking is the dominant factor by a large margin and making all of your files contiguous doesn't help that at all.

XP doesn't "fragment" anything, although with a tool like Bootvis you can rearrange the files on the drive to help speed booting up. Why do you think that files are "seeked" for in the first place? Because they're not all in one place! The purpose of defragmenting is to reduce the amount of seeks necessary by grouping related data together physically on the drive.

If you are going to claim the contrary, please provide some sort of reference that would support your statement. There is no point in going against commonly accepted practices if you aren't going to provide any sort of supporting information to your statement.

 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Let's use a real-world example - database server.

Totally ignoring that the thread was almost completely about desktop usage, the example is useless because any database server worth using will bypass the filesystem cache: it knows its own data layout better, and the OS's "approximation and guessing" will usually hurt performance more than it helps.

Why do you think that files are "seeked" for in the first place? Because they're not all in one place!

Or because a different part of that file is now needed. If you're at file start + 10 pages and you need data located 200 pages in you'll want to seek there instead of reading all of the pages in between. The physical location of the data on the disk is irrelevant because unless page 210 just happens to be located directly after page 10 (and thus fragmented) you'll have to seek.
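A dumb way to see that: reading one 4K page at an offset 200 pages into a file doesn't touch anything in between, contiguous or not (bigfile.bin is just a stand-in here):

dd if=bigfile.bin of=/dev/null bs=4k skip=200 count=1  # skip 200 input blocks of 4K, then read a single page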

The purpose of defragmenting is to reduce the amount of seeks necessary by grouping related data together physically on the drive.

And it fails miserably because you can't programmatically determine which files are related. And even in the cases where you can (say binaries and their required libraries) you can't always group them all together; for example, just about every Windows binary depends on MFC.dll, so how are you going to group all of those binaries near that library? This is also a convoluted example because MFC.dll will almost certainly be paged into memory and most processes won't need to touch the disk to get access to it, but that's just another point as to why fragmentation has so little effect on performance.
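The same thing is easy to see on Linux: pick any running process and look at which shared libraries are already mapped into its address space, so opening them again never goes near the disk (firefox is just an example process):

cat /proc/$(pidof -s firefox)/maps | grep '\.so'  # shared objects already mapped into the process's address space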

If you are going to claim the contrary, please provide some sort of reference that would support your statement. There is no point in going against commonly accepted practices if you aren't going to provide any sort of supporting information to your statement.

I would say that the burden of proof is on the defragmenters because they're the ones claiming to increase filesystem performance X number of times for a fee. Can you prove beyond a shadow of a doubt that it does what they say it does?
 

Hadsus

Golden Member
Aug 14, 2003
1,135
0
76
The biggest problem I have with Diskeeper is that they are major spammers and their support is extremely lacking. For instance, I emailed them regarding my new Seagate 7200.10 and the perpendicular recording technology....asking them if there were any defrag issues connected with that (I really didn't expect any but thought I'd ask just to make sure). They never responded. If you DL their defrag program, use an addy you don't mind getting spammed. I had to ask them multiple times to stop and then they finally did.
 

Auric

Diamond Member
Oct 11, 1999
9,591
2
71
I'll throw in some love for O&O.

Diskeeper (not to be confused with Cartman's Trapper Keeper) sports an interface which can best be described as an interfarce. Otherwise it seems technically good, especially for the technically challenged - ergo for those who want things done automagically as best as possible within such limitations.

PerfectDisk 7 was a turd. Dunno aboot 8.

O&O is relatively lean and mean and I maintain is best for those who know how they use various volumes and thus can choose the optimum defrag method rather than relying upon automagic schtuff which can only be a compromise.
 

EricMartello

Senior member
Apr 17, 2003
910
0
0
Originally posted by: Nothinman
If you are going to claim the contrary, please provide some sort of reference that would support your statement. There is no point in going against commonly accepted practices if you aren't going to provide any sort of supporting information to your statement.

I would say that the burden of proof is on the defragmenters because they're the ones claiming to increase filesystem performance X number of times for a fee. Can you prove beyond a shadow of a doubt that it does what they say it does?

You can set up a database server in such a way that the filesystem is bypassed, but that doesn't change the fragmentation issue. What do you think the MySQL "optimize table" function does? It essentially performs a defrag on the data file for the selected table. Good thing MySQL is chock full of useless functions, right?
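You can watch that from the mysql client, too; the Data_free column from SHOW TABLE STATUS is roughly the allocated-but-unused space in the data file, and OPTIMIZE TABLE rebuilds it (mydb and mytable are just placeholders):

mysql -e "SHOW TABLE STATUS LIKE 'mytable'\G" mydb  # Data_free = allocated but unused bytes
mysql -e "OPTIMIZE TABLE mytable" mydb              # rewrites the data file to reclaim that space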

As for defrag being a failure in achieving its goal, believe what you want...but that won't change the fact that it does work quite well, especially when you're dealing with games and such.

http://technet2.microsoft.com/WindowsSe...9b-a43d-ddebb5ec33981033.mspx?mfr=true

That's a pretty clear-cut overview of how defragging works, and it's quite obvious to someone with even a rudimentary understanding of how computers work that using a defrag program will improve system responsiveness by reducing disk seeks. It certainly doesn't seem like the defragmenting programs are hustling snake oil as you imply.

If you want a more consumer-oriented example, let's look at World of Warcraft. If you have this game, and you look in the folder that contains the game files, you will see many files > 4KB (which is the NTFS default cluster size). Now let's say you are talking about one of the texture files, which is easily over 400MB in size. The game loads a good portion of this into memory for each level, and for each part of that file that is non-contiguous, ADDITIONAL SEEKS need to be performed to find the pieces and load them. Each piece of the file is stored in a 4KB cluster, and as you may or may not know, even the fastest drive can be brought to a crawl when seeking such small chunks of data scattered across the drive, not to mention the processing overhead that it entails. In severely fragmented file systems, this can make for a very slow system, so your game would be skipping frames or having broken audio, etc.
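Both halves of that are checkable from a command prompt: fsutil will show the cluster size for the volume, and Sysinternals' Contig can report how many fragments a single file is in (the game path here is just an example):

rem "Bytes Per Cluster" in the output is the cluster size for the volume
fsutil fsinfo ntfsinfo c:
rem Contig with -a analyzes only and reports the number of fragments
contig -a "C:\Games\World of Warcraft\Data\texture.MPQ"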

Now it's your turn, show me something other than your own claims that defragging is a farce. I doubt you can.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
You can set up a database server in such a way that the filesystem is bypassed, but that doesn't change the fragmentation issue. What do you think the MySQL "optimize table" function does? It essentially performs a defrag on the data file for the selected table. Good thing MySQL is chock full of useless functions, right?

You can't compare the two: a database has a very limited set of data types and the admin specifically tells the database how all of that data relates to each other. So the database understands the layout and relationships of the data in its tables, but the same can't be said for a filesystem. Essentially a filesystem is a database with every column set to the largest blob or text type available so that you can cram any kind of data you want in any column; I doubt any DBA in their right mind would set up a database like that.

That's a pretty clear-cut overview of how defragging works, and it's quite obvious to someone with even a rudimentary understanding of how computers work that using a defrag program will improve system responsiveness by reducing disk seeks.

MS can't even keep their use of the term virtual memory straight in their own docs so I would take them all with a grain of salt. I understand the process but unless you have some numbers to prove that it has actually improved performance that paper means nothing.

Each piece of the file is stored in a 4KB cluster, and as you may or may not know, even the fastest drive can be brought to a crawl when seeking such small chunks of data scattered across the drive, not to mention the processing overhead that it entails.

The processing overhead is minimal; issuing 10 reads instead of 2 isn't going to cost anything in CPU time, and most modern drives can read much more than 4K in one request. The OS I/O scheduler and the drive will both reorder and combine the read requests as best they can, so it won't be nearly as bad as you're saying.
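On Linux you can see the piece doing that merging and reordering right in sysfs (sda is just the first disk here):

cat /sys/block/sda/queue/scheduler      # the active I/O scheduler, which merges and reorders queued requests
cat /sys/block/sda/queue/read_ahead_kb  # how far the kernel reads ahead for sequential access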

I'm not saying that fragmentation has absolutely 0 effect, I'm saying that in the large majority of the cases the effects are small enough that it doesn't matter and even in some cases the fragmentation can actually improve performance.

Now it's your turn, show me something other than your own claims that defragging is a farce. I doubt you can.

Just because a bunch of companies selling software tell you that their software is good and will make things better for you doesn't automatically make it so; if anything it should make the subject more suspect. You don't own any of those memory defragmentation tools, do you?

I still haven't seen any numbers from you; all you've posted so far is a lot of speculation and one link that is also completely devoid of any numbers. If you can come up with a test case that will give some real data on how fragmentation is affecting your system's performance, I'll be more than happy to run it locally to verify the findings for you, but so far the only proof I need is that I never defragment any of my volumes and my system never slows down.

I ran a quick test myself just to see if I could get any variation; sorry the file isn't that big, but it was the most fragmented one in that directory so I chose it. I forced my system to drop its filesystem cache between each operation so that the times would be from disk and not from memory; this was the best I could do without doing a full reboot between each command. As you can see the average read time after defragmentation is only .020s quicker than the fragmented read time. I suppose I should have done 3 fragmented reads too so that I could have averaged them, but I didn't think of that until after I had defrag'd the file and I don't know of a way to artificially fragment a file quickly.
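The closest thing I can think of, untested, would be to grow two files in the same directory in alternating small chunks so neither can grab one contiguous run, then check the damage with filefrag:

# append 64K to each file in turn; delayed allocation on some filesystems may partly defeat this
for i in $(seq 1 2000); do
    dd if=/dev/zero bs=64k count=1 >> frag_a.bin 2>/dev/null
    dd if=/dev/zero bs=64k count=1 >> frag_b.bin 2>/dev/null
done
filefrag frag_a.bin frag_b.bin   # report how many extents each file ended up with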
 

EricMartello

Senior member
Apr 17, 2003
910
0
0
Originally posted by: Nothinman
As you can see the average read time after defragmentation is only .020s quicker than the fragmented read time.

Uh-huh, I rest my case.

And that's just 1 file in a synthetic test. You don't seem to be getting the point about reading data either. 4KB in itself isn't a lot, but a large file is composed of thousands of these clusters, and each delay, even if it is only .02s, will add up.

You wanted numbers, thanks for saving me. I really don't care what you think; I and many others will continue to defrag regularly and enjoy our minuscule performance gains. :D
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Uh-huh, I rest my case.

Then you didn't read the whole post or you don't understand statistics. Out of the 3 defragmented runs, 1 of them was nearly the same as the fragmented run, and the average is what I would consider to be within the margin of error, i.e. there was no significant change in the read speed, and that was with one large contiguous read. If the file had been read sporadically, like normally happens in a system, I would wager that the times would be even closer.

And that's just 1 file in a synthetic test. You don't seem to be getting the point about reading data either. 4KB in itself isn't a lot, but a large file is comprised of thousands of these clusters, and each delay, even if it is only .02s, will add up.

No, you're the one missing the point. That .02s could have been caused by any number of other factors because the system is doing more than just that one cat command. Any other process on the system could have submitted an I/O that delayed the cat command by that amount of time. That .02s easily falls into the noise when you look at the system as a whole. Sure, if everything was exclusive to your one process without any outside influences defragmentation might make a noticeable difference, but it's like driving a sports car in the city: the top speed is irrelevant because all of the traffic lights, cars, people, etc. will keep you driving as slow as everyone else.

You wanted numbers, thanks for saving me. I really don't care what you think, I and many others will continue to defrag regularly and enjoy our miniscule performance gains.

You mean you'll continue to enjoy your placebo pills, but whatever, it's your time and money to waste.

I'll try one more time to get you to understand. I found a bigger and more fragmented file. Defragging it only took it down to 5 extents and not 1, but that's still a significant drop from 3239 extents. If you look at this case you'll see that the worst-case run after defrag is worse than the worst case before defrag by 0.377s. The average of the after-defrag runs was 2m48.103s and before defrag was 2m52.554s, so while the average was better by a whole 4.451s, the worst defragged case was still worse than one of the fragmented cases. Unless your machine is extremely compartmentalized and only lets one process touch memory, CPU, disk, etc. at a time, you're not going to get any tangible improvements from defragging your filesystem. Hell, if you're running Windows, Explorer's constant polling of CD/DVD drives is probably enough to cause delays equal to or greater than what I saw in these tests, and an antivirus is definitely going to cause much larger delays as it has to intercept every file open() and scan the files.

I still stand by my original comment: "I'm not saying that fragmentation has absolutely 0 effect, I'm saying that in the large majority of the cases the effects are small enough that it doesn't matter and even in some cases the fragmentation can actually improve performance." It's not a black and white, win or lose, gain or loss, etc. You can't be sure of the effect of reorganizing your data on disk unless you know all of the access patterns of all of the processes on your machine and I'm willing to bet that you don't have that information.

Edit: the forum ate my attached code so here it is in a quote block:

# filefrag scsi0.vmdk
scsi0.vmdk: 3239 extents found
# echo 3 > /proc/sys/vm/drop_caches
# du -hs scsi0.vmdk
9.1G scsi0.vmdk
# time cat scsi0.vmdk >/dev/null

real 2m53.805s
user 0m0.260s
sys 0m10.277s
# time cat scsi0.vmdk >/dev/null

real 2m50.079s
user 0m0.272s
sys 0m9.917s
# time cat scsi0.vmdk >/dev/null

real 2m53.780s
user 0m0.228s
sys 0m10.313s
# xfs_fsr -v scsi0.vmdk
scsi0.vmdk
extents before:3239 after:5 scsi0.vmdk
# echo 3 > /proc/sys/vm/drop_caches
# time cat scsi0.vmdk >/dev/null

real 2m45.814s
user 0m0.256s
sys 0m9.385s
# time cat scsi0.vmdk >/dev/null

real 2m54.182s
user 0m0.296s
sys 0m10.253s
# echo 3 > /proc/sys/vm/drop_caches
# time cat scsi0.vmdk >/dev/null

real 2m44.315s
user 0m0.240s
sys 0m10.161s
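(If anyone wants to double-check those averages, paste the three "real" lines from each set into before.txt and after.txt and run something like this; the filenames are obviously just placeholders:)

# convert 2mXX.XXXs into seconds and average the runs in each file
awk '/^real/ { gsub(/[ms]/, " ", $2); split($2, a, " "); t += a[1]*60 + a[2]; n++ }
     END { printf "%.3f s average over %d runs\n", t/n, n }' before.txt
awk '/^real/ { gsub(/[ms]/, " ", $2); split($2, a, " "); t += a[1]*60 + a[2]; n++ }
     END { printf "%.3f s average over %d runs\n", t/n, n }' after.txt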