NTFS compression on HDD - speed, fragmentation.. worth it?

CakeMonster

Golden Member
Nov 22, 2012
I'm sure most AT'ers remember the article on Tom's from two years ago about using NTFS compression on SSDs in order to free up space and gain read/write speed.

http://www.tomshardware.com/reviews/ssd-ntfs-compression,3073-4.html

What I'm curious about is using NTFS compression on regular HDDs. It would seem like a no-brainer with regard to speed, since HDDs are much slower than SSDs. But I was thinking about fragmentation: could that be a problem, both in terms of potentially causing more fragmentation and in terms of support from various 3rd-party disk defragmenters?

The Tom's article is two years old, so average PCs today have more and faster cores, and I imagine CPU load would be even less of a drawback.

(Yes, I know the speed gains depend on how compressible each file is, so there's no need to derail the topic by pointing that out.)

I'm asking this question in general, in order to learn as much as possible about it. I'm also curious about the future of file system compression: will it at some point become so "cheap" in terms of CPU load that it becomes standard in file systems? If you don't think so, why not, given the speed gains and space saved?

(If you're curious about my specific use case, I intend to use it mainly for my Steam folder and my "My Documents" folder, both of which contain frequently accessed data and currently live on my HDD, not my SSD.)
 

glugglug

Diamond Member
Jun 9, 2002
Extremely unlikely to be worth it on a HDD.

It is going to be extremely rare to fill a significant portion of a modern HDD with compressible files -- keep in mind that any already compressed audio and video files are not going to compress further from NTFS compression.

Also, every derivative of Windows NT maps the disk cache to MMU pages -- the file I/O and paging systems are extremely interwoven. The page table can only contain entries in multiples of 4KB. But each 4KB page is going to be an image of a section of the uncompressed version of the file, which is no longer going to be aligned with 4KB disk clusters.

NTFS allows compressed files to do things like share clusters (parts of a single cluster can be used by two files), and the defragging logic gets a lot more complicated. The built-in safe file-section-moving APIs used by defraggers in Windows ensure they won't break on this, but they still won't do as good a job as they would with uncompressed files.

The main bottleneck with HDDs has always been random access, not sequential. Compression doesn't help with this; in fact it makes it worse, because an app attempting to read the middle of a file first needs the OS to work out what offset that corresponds to in the compressed version of the file, which will likely turn a read that would have been cluster-aligned into one that isn't.

The only reason there is a potential benefit on SSDs is because freeing up space with TRIM is basically the same as increasing the spare area.

CPU load is negligible compared to any HDD operation, even with decompression involved, unless you are dealing with something like an ARM processor decompressing data from a desktop HDD. But the compression will increase your disk load.
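If you want a quick way to see whether a given file actually benefits, the Win32 API exposes both the logical size and the allocated (compressed) size on disk. Rough sketch below -- the path is just a placeholder, point it at one of your own files:

Code:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical path -- substitute a real file on your system. */
    const wchar_t *path = L"D:\\Steam\\example.dat";

    WIN32_FILE_ATTRIBUTE_DATA info;
    if (!GetFileAttributesExW(path, GetFileExInfoStandard, &info)) {
        fprintf(stderr, "GetFileAttributesExW failed: %lu\n", GetLastError());
        return 1;
    }
    ULONGLONG logical = ((ULONGLONG)info.nFileSizeHigh << 32) | info.nFileSizeLow;

    /* For NTFS-compressed (or sparse) files this is the allocated size on disk. */
    DWORD high = 0;
    DWORD low = GetCompressedFileSizeW(path, &high);
    if (low == INVALID_FILE_SIZE && GetLastError() != NO_ERROR) {
        fprintf(stderr, "GetCompressedFileSizeW failed: %lu\n", GetLastError());
        return 1;
    }
    ULONGLONG ondisk = ((ULONGLONG)high << 32) | low;

    printf("Compressed attribute: %s\n",
           (info.dwFileAttributes & FILE_ATTRIBUTE_COMPRESSED) ? "yes" : "no");
    printf("Logical size : %llu bytes\n", logical);
    printf("On-disk size : %llu bytes\n", ondisk);
    return 0;
}

If the two numbers come out basically the same on an MP3 or a video file, compression is doing nothing for it and just adding overhead.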
 

Cerb

Elite Member
Aug 26, 2000
Given how little additional time it would have taken them, I don't get why Tom's didn't add in an HDD to their comparison. Would have solved this problem of not knowing :).

Given how big HDDs are now, and seeing as seek times haven't improved, I don't bother, except on folders that show a big difference between their logical size and their on-disk size, where that's quite a bit of space you can save.

To give you an example, turning NTFS compression on, and waiting for it to apply, saved over 9GB on my Dwarf Fortress folder. Not shabby, IMO. It also saves some space on saved games and my Bethesda games. So yes, I use it, even here in 2013.
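For reference, the same flag Explorer toggles can be set programmatically with the documented FSCTL_SET_COMPRESSION ioctl. Rough sketch (the directory path is just an example); note that on a directory this call only marks it so newly created files inherit compression -- Explorer or compact.exe is what walks the existing contents, which is the "waiting for it to apply" part:

Code:
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical path -- substitute the folder you actually want to compress. */
    const wchar_t *dir = L"D:\\Games\\DwarfFortress";

    /* FILE_FLAG_BACKUP_SEMANTICS is required to open a directory handle. */
    HANDLE h = CreateFileW(dir, GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }

    USHORT format = COMPRESSION_FORMAT_DEFAULT;  /* the default, i.e. LZNT1 on NTFS */
    DWORD returned = 0;
    BOOL ok = DeviceIoControl(h, FSCTL_SET_COMPRESSION,
                              &format, sizeof(format), NULL, 0, &returned, NULL);
    if (ok)
        printf("Compression flag set on %ls; new files there will inherit it.\n", dir);
    else
        fprintf(stderr, "FSCTL_SET_COMPRESSION failed: %lu\n", GetLastError());

    CloseHandle(h);
    return ok ? 0 : 1;
}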

But being able to map 4K here to 4K there ought to be enough, by itself, to not do it for every file and directory on the drive. One of the things that makes most I/O caching as efficient as it is is that a page can be mapped identically between RAM and the disk. So whether a page needs flushing out at all (read-only data can simply be discarded), and where it gets flushed to if it does, is very simple to work out, thanks to the hardware (with lots of CAM) handling some of the more difficult bookkeeping for you.
 

CakeMonster

Golden Member
Nov 22, 2012
Thanks for the replies; I forgot about this thread for a while. Cerb, your last paragraph flew right over my head, but from what I understand from both replies... NTFS's weaknesses in fragmentation or sector size mean reading/writing isn't necessarily better? Do I understand correctly that if the file system had been designed better, compression could have given us "free" benefits with practically no downsides, considering today's CPU power?
 

glugglug

Diamond Member
Jun 9, 2002
Thanks for the replies; I forgot about this thread for a while. Cerb, your last paragraph flew right over my head, but from what I understand from both replies... NTFS's weaknesses in fragmentation or sector size mean reading/writing isn't necessarily better? Do I understand correctly that if the file system had been designed better, compression could have given us "free" benefits with practically no downsides, considering today's CPU power?

Sort of......

It's not really a filesystem-specific issue. Modern versions of Windows tie file I/O closely to paging for efficiency. I believe (though I'm not certain) that Linux does the same thing. The CPU only supports MMU pages in multiples of 4KB, and will always want that amount of data fetched on a page fault.
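You can see that page size directly -- GetSystemInfo reports the granularity the memory manager (and therefore the file cache) works in:

Code:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    /* dwPageSize is 4096 on x86/x64; dwAllocationGranularity is usually 65536. */
    printf("MMU page size         : %lu bytes\n", si.dwPageSize);
    printf("Allocation granularity: %lu bytes\n", si.dwAllocationGranularity);
    return 0;
}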

What they could do to overcome this is add memory compression similar to OS X Mavericks alongside filesystem compression, with both RAM and the filesystem using the same compression scheme. Then compressed files could be paged into compressed RAM: page faults in the compressed RAM would trigger paging in from disk, while page faults in uncompressed RAM would mostly trigger decompression from whatever compressed RAM area the data is cached in, before anything has to be swapped out to the real disk.
 

CakeMonster

Golden Member
Nov 22, 2012
Thanks. Is there anything in development, and how does the future look for compression in both desktop and server systems? I would imagine there could be cost savings in the amount of storage needed, even though many files already contain compressed data, if you look at the large-scale picture. There is also the performance aspect. I guess it's my engineering mindset that makes me think there's untapped potential here, if only there were file systems that did it transparently, without hassle or noticeable overhead.
 

Cerb

Elite Member
Aug 26, 2000
Some FSes support compression with larger buffer sizes (I think ZFS does), but that's about it. The core problem with compression is that it adds CPU use to the system, and may involve additional system calls, or even context switches. Also, the only way to get much better compression is to increase the amount of data being compressed together, which also means increasing the amount of data that has to be read back to decompress any part of it. Compression can be nice if you know you won't have to care about random small I/O performance, but there's no way around having significant added overhead.