Will somebody benchmark WHOLE DISK COMPRESSION (HD,SSD,RAMDISK) please!!


ElenaP

Member
Dec 25, 2009
88
0
0
www.ReclaiMe.com
Nothinman,

In practice, the thing just gets damn slow if there are 10,000 fragments in the file and you happen to need a full read. This does not affect overall performance significantly, but certain operations (like a full-text search in a large email database) make you wonder "what is it doing now". If you need a copy of such a file, the read speed can be like one megabyte per second, sometimes even less. In general, I'd agree that fragmentation is overblown, but this specific scenario is really bad.

Consider that reading 10,000 fragments costs roughly 10,000 × (1/5400 min)/2 ≈ 0.93 minutes, about a minute, on a 5400 RPM drive in rotational delay alone (each fragment costs on average half a revolution of rotational latency).
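
For reference, a minimal sketch of that back-of-the-envelope estimate, assuming half a revolution of average rotational latency per fragment and ignoring seek and transfer time:

```python
# Back-of-the-envelope rotational-delay cost of reading a heavily fragmented
# file; assumes ~half a revolution of average rotational latency per fragment
# and ignores seek and transfer time.
rpm = 5400
fragments = 10_000
avg_rotational_delay_s = (60.0 / rpm) / 2          # half a revolution, in seconds
total_s = fragments * avg_rotational_delay_s
print(f"{total_s:.1f} s  (~{total_s / 60:.2f} min)")   # ~55.6 s, about a minute
```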

taltamir,
NTFS never writes a compression block larger than 64 KB (16 clusters). If compression cannot gain at least one cluster, the block is written uncompressed. A compressed file may have alternating compressed and plain blocks physically stored.
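
One quick way to see whether those compressed blocks actually saved anything on a given file is to compare its logical size with its on-disk size; a minimal sketch, assuming Windows and a placeholder path:

```python
# Compare a file's logical size with its on-disk (allocated) size on NTFS.
# Sketch using ctypes and the Win32 GetCompressedFileSizeW call; the path
# below is a placeholder.
import ctypes
import os
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.GetCompressedFileSizeW.argtypes = [wintypes.LPCWSTR,
                                            ctypes.POINTER(wintypes.DWORD)]
kernel32.GetCompressedFileSizeW.restype = wintypes.DWORD

path = r"C:\temp\sample.dat"                      # placeholder path
high = wintypes.DWORD(0)
low = kernel32.GetCompressedFileSizeW(path, ctypes.byref(high))
on_disk = (high.value << 32) | low                # bytes actually allocated
logical = os.path.getsize(path)                   # bytes as seen by programs
print(f"logical: {logical:,} bytes, on disk: {on_disk:,} bytes")
```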
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
taltamir,
NTFS never writes a compression block larger than 64 KB (16 clusters). If compression cannot gain at least one cluster, the block is written uncompressed. A compressed file may have alternating compressed and plain blocks physically stored.

Thank you for clarifying your post. This, however, does not change any of my points.
10,000 fragments is obscene, yet you call it "not uncommon". Fragmentation is as irrelevant as ever. And the performance issue is still because of having to read, modify, then write (not fragmentation).
 

ElenaP

Member
Dec 25, 2009
88
0
0
www.ReclaiMe.com
Just checked and I happen to have two disk image files (VMWare) about 2GB each with 8,000+ disjoint fragments each on this machine. Not as obscene as 10,000, but still quite good. Plus more than 10 files with 1,000+ fragments, including an Outlook PST database. All of these are compressed. The most fragmented uncompressed file features about 100 fragments.

Also, the minimum size for 10,000 fragments is about 5MB (megabyte), not 5TB (terabyte).
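
If anyone wants to survey their own files, here is a sketch of a quick fragment check, assuming the Sysinternals Contig tool is installed and on PATH (the file paths are placeholders):

```python
# Print the fragmentation analysis for a few files using Sysinternals Contig.
# "contig -a" only analyzes; it does not move any data. Assumes contig.exe is
# on PATH; the file paths below are placeholders.
import subprocess

for path in [r"C:\vms\disk1.vmdk", r"C:\mail\outlook.pst"]:   # placeholders
    result = subprocess.run(["contig", "-a", path],
                            capture_output=True, text=True)
    print(result.stdout.strip())
```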
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Also, the minimum size for 10,000 fragments is about 5MB (megabyte), not 5TB (terabyte).

That was a slip, I meant to write GB not TB, but GB is wrong too, it is indeed MB. I didn't pay attention to my own arithmetic, sorry.

Just checked and I happen to have two disk image files (VMWare) about 2GB each with 8,000+ disjoint fragments each on this machine. Not as obscene as 10,000, but still quite good. Plus more than 10 files with 1,000+ fragments, including an Outlook PST database. All of these are compressed. The most fragmented uncompressed file features about 100 fragments.

Which is still totally and completely irrelevant to their performance degradation issue.
Basically, I am saying:
1. Compression causes fragmentation.
2. Compression causes performance degradation.

While you are saying:
1. Compression causes fragmentation.
2. Fragmentation causes performance degradation.

Fragmentation itself causes a tiny insignificant portion of the performance degradation that is experienced with compressed files.
 

ElenaP

Member
Dec 25, 2009
88
0
0
www.ReclaiMe.com
We need a benchmark then. That would be interesting to develop.

Initial condition - an unfragmented file compressed, say, 16:14. This is easy to create.

Test 1 - write the data with exactly the same compression ratio over the original file, say, random 4K writes. The file will be recompressed but no fragmentation should result (although we'd need to verify that).

Test 2 - write the data with a compression ratio of 16:15. This should cause a file to fragment.

Test 3 - write a different data set with the same compression ratio of 16:15 to the already-fragmented file. The overhead of the unpack-change-pack-write loop would be similar across all cases (the difference being 14 vs. 15 clusters per compression unit).

If there is any dramatic change in write speed between (1) and (3), that would be due to fragmentation. Does that sound like a proper benchmark?
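
One way to generate test data with a roughly controlled compression ratio for such a run; only a sketch, since the exact on-disk ratio depends on NTFS's LZNT1 compressor (the 14-of-16-clusters target, the path, and the sizes are assumptions):

```python
# Build a test file from 64 KB compression-unit-sized blocks whose ratio is
# steered by mixing incompressible (random) and compressible (zero) bytes.
# Only an approximation: the real on-disk ratio depends on the LZNT1 codec.
import os

def make_block(target_clusters=14, unit_size=64 * 1024, cluster=4096):
    """One 64 KB block aimed at compressing to roughly target_clusters/16."""
    incompressible = target_clusters * cluster        # random data barely shrinks
    compressible = unit_size - incompressible         # zeros shrink almost away
    return os.urandom(incompressible) + b"\x00" * compressible

data = make_block(target_clusters=14) * 256           # ~16 MB aimed at 16:14
with open(r"C:\temp\ratio_16_14.bin", "wb") as f:     # placeholder path
    f.write(data)
```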
 

ElenaP

Member
Dec 25, 2009
88
0
0
www.ReclaiMe.com
Scrap it, we need something less sinister.

Say, create a regular file filled with a compressible pattern.
Write a similar pattern to that file, measure speed. This is an uncompressed sample.
Now, compress the file. It will inevitably fragment.
Write a similar pattern again; that is a compressed+fragmented sample.
Finally, defragment the file and re-measure. That gives a compressed, not fragmented, sample.

That even sounds simple.
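
A rough sketch of that three-sample procedure on Windows, assuming the built-in compact command, the Sysinternals contig tool for the defragmentation step, and placeholder paths and sizes (timings are wall-clock and ignore cache effects):

```python
# Sketch of the three-sample benchmark: uncompressed, compressed+fragmented,
# and compressed+defragmented writes of a compressible pattern to one file.
# Assumes Windows with "compact" (built in) and Sysinternals "contig" on PATH.
import os, subprocess, time

PATH = r"C:\temp\bench.dat"                            # placeholder path
PATTERN = (b"benchmark-pattern-" * 1000)[:16 * 1024]   # compressible 16 KB chunk
CHUNKS = 4096                                          # ~64 MB file

def write_pattern():
    """Overwrite the whole file with the pattern and return elapsed seconds."""
    mode = "r+b" if os.path.exists(PATH) else "wb"     # overwrite in place later
    t0 = time.perf_counter()
    with open(PATH, mode) as f:
        for _ in range(CHUNKS):
            f.write(PATTERN)
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - t0

print("uncompressed write:       ", write_pattern(), "s")

subprocess.run(["compact", "/c", PATH], check=True)    # compress -> fragments
print("compressed, fragmented:   ", write_pattern(), "s")

subprocess.run(["contig", PATH], check=True)           # defragment the file
print("compressed, defragmented: ", write_pattern(), "s")
```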
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
Wouldn't you rather spend 4x the effort compressing/deduping everything that is not likely to change, and disable compression on files that are likely to change? While you can't (can you?) change the compressor to be more efficient (the COMPACT DOS command), I guess, unless you reverse-engineer it, you could use a last-modified-date scheme (much like modern defraggers) to locate the most likely candidates for compression.
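
A sketch of that batch approach, assuming the built-in compact command, a placeholder directory, and an arbitrary 90-day "hasn't changed lately" threshold:

```python
# Batch-compress files that have not been modified recently, leaving
# frequently changing files alone. Uses the built-in "compact" command;
# the root directory and the 90-day threshold are placeholders.
import os, subprocess, time

ROOT = r"D:\archive"                 # placeholder directory
THRESHOLD_S = 90 * 24 * 3600         # "not modified in 90 days"
now = time.time()

for dirpath, _dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            if now - os.path.getmtime(path) > THRESHOLD_S:
                subprocess.run(["compact", "/c", path],
                               stdout=subprocess.DEVNULL, check=False)
        except OSError:
            pass                     # skip files we cannot stat
```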

It's not a global on/off switch on NTFS; it's applied per FILE at the time you select that file (or directory/drive). If you turn off compression, that doesn't uncompress a file, does it?

Maybe an NTFS expert can chime in here. It would seem logical that NTFS turns into a dedupe storage system eventually (VMFS has this? does Hyper-V CSV have this?) to save on disk storage. Likewise, if you could use spare CPU cycles to crunch down files that don't change very often, you'd see the performance gain.

Makes you wonder if the PST file in modern Outlook is some sort of hybrid file that is already compressed using the built-in compression engine of NTFS.
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
Nobody wants to try the compact command on their drive to see how much it speeds up load times?
 

hal2kilo

Lifer
Feb 24, 2009
26,262
12,427
136
SQL Server 2008 / 2008 R2 (10) / Denali (11) has built-in table-level compression. IT IS AWESOME.

Imagine you have your databases split over iSCSI - data files on one LUN, tempdb on one LUN, logs on one LUN.

Peak speed is gigabit, right? But everyone stores stuff in databases that is highly compressible. You can even turn compression on and off (mixed) or run it in batches.

e.g. always compress table A,
and/or toggle compression off when you determine load is too high,
and/or compress every night (see the sketch below).
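
A minimal sketch of driving that per-table switch from a script, assuming the pyodbc package, a reachable SQL Server 2008+ instance, and a placeholder table dbo.TableA (the connection string is illustrative only):

```python
# Estimate and then enable SQL Server page compression for one table.
# Assumes pyodbc and a SQL Server 2008+ instance; names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=mydb;Trusted_Connection=yes", autocommit=True)
cur = conn.cursor()

# Estimate savings before committing to compression.
cur.execute("EXEC sp_estimate_data_compression_savings "
            "'dbo', 'TableA', NULL, NULL, 'PAGE'")
for row in cur.fetchall():
    print(row)

# Rebuild the table with page-level compression (can be run off-hours).
cur.execute("ALTER TABLE dbo.TableA REBUILD WITH (DATA_COMPRESSION = PAGE)")
```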

Compress backups (yay!) and compress TCP/IP connects (YAY!) as well!

-------

I'd rather have a table fit in a page than split too.

Keep in mind I consider SQL Server an OS. It sits on top of Windows but has NUMA awareness, a resource governor, and its own I/O control (log writes are less lazy than main table writes).

It has become so smart that the old adage that you must separate the log from the core storage is not really a must anymore. Light-years more advanced than MySQL.

anyhoo.


My idea was to batch-compress items that aren't updated frequently using the COMPACT command (DOS command). Things that are updated a lot - just don't.

Skip things that do not compress.

NTFS allows you to set compression on/off at the file, directory, or whole-drive level. You can also turn it off, but already-compressed objects remain compressed.

Anyone care to try that?

Surprised compression hasn't been built into database software for years, considering most have zero-filled fields.
 

0roo0roo

No Lifer
Sep 21, 2002
64,795
84
91
http://ntfscompressor.sourceforge.net/
Found a tool that lets you compress based on compressibility / days since last access, and excludes certain file types.

It still can't work right on Win7 and such because of all the junction-point weirdness (or whatever it is), but you can use it on other, non-system directories.
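
For the "compress based on compressibility" part, a rough sketch of the idea, assuming zlib as a stand-in estimator and arbitrary cutoff and sample-size values:

```python
# Decide whether a file is worth NTFS-compressing by test-compressing a
# sample of it with zlib (a rough stand-in for what such tools do; the
# 0.9 cutoff, 1 MB sample size, and extension list are arbitrary assumptions).
import zlib

SKIP_EXTS = {".zip", ".jpg", ".mp3", ".7z"}      # already-compressed formats

def worth_compressing(path, sample_bytes=1 << 20, cutoff=0.9):
    if any(path.lower().endswith(e) for e in SKIP_EXTS):
        return False
    with open(path, "rb") as f:
        sample = f.read(sample_bytes)
    if not sample:
        return False
    ratio = len(zlib.compress(sample, 1)) / len(sample)
    return ratio < cutoff                        # only bother if it shrinks

print(worth_compressing(r"C:\logs\app.log"))     # placeholder path
```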
 

installaware

Junior Member
Dec 3, 2012
2
0
0
I have the evidence here, actually. Did an IOmeter run without compression and with compression. The results are... interesting, to say the least.

Test conditions:
1. Small test file
2. 4 Outstanding IOs
3. Compression enabled using Win7 x64 "right click > properties"
4. Processor - E8400
5. Test interval - 1 sec warm-up time, 5 sec test. I found that longer test times actually introduce more drift (e.g. background disk accesses, TRIM, etc.)

Run                             Normal MB/s   Normal IOPS   Compression MB/s   Compression IOPS
1MB; 100% Read; 0% random (1)   153.458158    153.458158    4686.168947        4686.168947
1MB; 0% Read; 0% random         81.301776     81.301776     22.910449          22.910449
4K; 100% Read; 100% random      74.333424     19029.35662   496.191356         127024.987
4K; 0% Read; 100% random        72.17019      18475.56867   1.460037           373.769533

CPU utilization for the compression cases peaked at 56% and stayed there almost constantly (on a dual-core E8400, one saturated core shows up as ~50%). This supports my theory that compression is single-threaded (at least while reading/writing a single file).

Note on graph below, the y-axis (MB/s) is in LOG scale (on a semilog graph). My 1MB sequential read run for uncompressed was a little screwed up (warm up time) so ignore that.

Also realize IOmeter uses highly compressible data for benchmarks.

[Attached graph: graphak.png (y-axis MB/s, log scale)]
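
For anyone wanting to reproduce the worst case above, a minimal sketch of a 4K random-write comparison on one file with NTFS compression toggled via compact; assumptions: Windows, a placeholder path, and arbitrary file size and operation counts (not IOmeter, so absolute numbers will differ):

```python
# 4K random-write test on one file, run uncompressed and then NTFS-compressed.
# Assumes Windows and the built-in "compact" command; path/sizes are placeholders.
import os, random, subprocess, time

PATH = r"C:\temp\iotest.dat"           # placeholder path
FILE_SIZE = 256 * 1024 * 1024          # 256 MB test file
BLOCK = 4096
OPS = 20_000

def prepare(compressed):
    chunk = b"\x00" * (1024 * 1024)    # highly compressible contents
    with open(PATH, "wb") as f:
        for _ in range(FILE_SIZE // len(chunk)):
            f.write(chunk)
    subprocess.run(["compact", "/c" if compressed else "/u", PATH],
                   stdout=subprocess.DEVNULL, check=True)

def random_writes():
    buf = b"A" * BLOCK                 # compressible payload, like IOmeter's
    t0 = time.perf_counter()
    with open(PATH, "r+b") as f:
        for _ in range(OPS):
            f.seek(random.randrange(FILE_SIZE // BLOCK) * BLOCK)
            f.write(buf)
        os.fsync(f.fileno())
    return OPS / (time.perf_counter() - t0)   # IOPS

for compressed in (False, True):
    prepare(compressed)
    label = "compressed" if compressed else "normal"
    print(label, f"{random_writes():.0f} IOPS")
```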

Did you use Drive Press to convert the drive? The Windows NTFS conversion process misses a significant portion of files on the drive (as illustrated at magicrar.com/drive-press.html) which would definitely impact your benchmarks and statistics.

Even running a conversion process on an external drive (while the OS hosted on the drive is inactive) does not work around this issue according to my own tests (just as a user of the Drive Press utility).

To give you an idea of the significant impact this tool makes on NTFS drive compression: My average space savings are 20%-25% with Drive Press (when they are only 5%-10% with built-in Windows compression).

TRIM is not an issue within Windows itself, but I employ the best practice of keeping the root of the drive clear so the temp file Intel's SSD Toolbox creates during manual TRIMs is not affected by drive compression.

I think this tool also improves overall SSD performance, because there is more spare area available on the drive - thus assisting the drive's built-in wear leveling efforts.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Did you use Drive Press to convert the drive? The Windows NTFS conversion process misses a significant portion of files on the drive (as illustrated at magicrar.com/drive-press.html) which would definitely impact your benchmarks and statistics.

Even running a conversion process on an external drive (while the OS hosted on the drive is inactive) does not work around this issue according to my own tests (just as a user of the Drive Press utility).

To give you an idea of the significant impact this tool makes on NTFS drive compression: My average space savings are 20%-25% with Drive Press (when they are only 5%-10% with built-in Windows compression).

TRIM is not an issue within Windows itself, but I employ the best practice of keeping the root of the drive clear so the temp file Intel's SSD Toolbox creates during manual TRIMs is not affected by drive compression.

I think this tool also improves overall SSD performance, because there is more spare area available on the drive - thus assisting the drive's built-in wear leveling efforts.

Why would anyone trust their data to a black box piece of software that sells itself as magic?
 

installaware

Junior Member
Dec 3, 2012
2
0
0
It's not a black box; it is based on NTFS compression, but it fixes a bug present in all Windows NT 6.x+ versions (that includes Vista, 7, and 8, as well as the server editions). That is how it manages to beat Windows's own compression so spectacularly.