Will somebody benchmark WHOLE DISK COMPRESSION (HD,SSD,RAMDISK) please!!

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
Please use the Windows 7 or Server 2008 R2 "compress the whole darn drive" option, then benchmark.


I'd love to see bottleneck scenarios on a modern multi-core (4-core or 2C/4T) CPU.

You'd have to create a disk image first.

Loop:
Benchmark.
Compress the whole drive.
Benchmark again.
Restore the image, since disabling compression doesn't decompress existing files.

Switch from HDD to SSD to RAID (HDD/SSD) and maybe even a ramdisk. (A rough sketch of one pass is below.)

Maybe Anand is too old for this kind of fun, but it makes me wonder: I have a dual 6-core Westmere box with ESXi, 6Gb/s SATA and SAS, and I don't mind carving off a couple of cores for compression. I already do for SQL Server 2008 (compress *.*) at the application level, and yes, it is faster when used correctly (!!).

Also note: NTFS lets you leave certain objects uncompressed on a per-directory basis. So if you have a swap/pagefile or JPEGs/video files, you could add that as another dimension to the test.
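Something like this per pass; winsat is just a stand-in benchmark (swap in Iometer/ATTO/whatever), and the paths are only examples, not a recommendation:

rem one pass of the proposed test, from an elevated prompt
winsat disk -drive c > %TEMP%\baseline.txt

rem compress the whole volume; /i skips locked files instead of stopping
compact /c /s:C:\ /i

rem optionally leave already-compressed media and hot data alone (example path)
compact /u /s:C:\Users\Public\Videos /i

winsat disk -drive c > %TEMP%\compressed.txt

rem then restore the saved disk image before the next configuration,
rem since unchecking the compression box does NOT decompress existing files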

I think it would make a worthy AnandTech article: today's compression.

1. Since 4.1, ESXi compresses memory pages (RAM Doubler!) before it has to swap them out.
2. Many modern SANs will compress and deduplicate objects. Heck, VMware already does RAM (RAM Doubler?) and VMFS (its clustered storage filesystem) deduplication at a low level.
3. Backup Exec System Restore 2010 can send images to a datastore for dedupe, so those 50 PCs at work that were imaged all at once and contain 90% of the same code can be deduplicated.
4. SQL Server since 2008 has TCP/table/backup compression, because we have more CPU than disk I/O. SQL Server is really an OS; some old-timers even call its core SQLOS (soft-NUMA, etc.), so it must have been a good idea. It's one product Microsoft can proudly call their own (best acquisition, lol).
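For the SQL side, a minimal sketch of what I mean (database and table names are made up, and IIRC page compression needs Enterprise Edition):

rem hypothetical names; runs against a local instance with Windows auth
sqlcmd -S localhost -E -d MyDb -Q "ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);"
sqlcmd -S localhost -E -d MyDb -Q "BACKUP DATABASE MyDb TO DISK = 'D:\Backups\MyDb.bak' WITH COMPRESSION;"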

Anyone think it's worth the time?

CPU usage would go way up, but on I/O-constrained devices (a laptop with a 4200 RPM drive) load times might improve enough to yield more usability. Or not?

What about dedupe? RAM Doubler? Is it time we can afford to use those again in consumer land? I'd rather have a GC-style process compress stale pages before paging them out, to reduce pagefile wear or prevent paging completely. I can't afford more than 8GB for a personal PC just yet.

Inquiring minds would like to know..
 

razel

Platinum Member
May 14, 2002
2,337
90
101
You've thought about this quite a bit; you've got a bit of a plan and have all the questions laid out. You've probably got enough knowledge and resources as well. It's always the last 10% that's the hard part. You're 90% of the way to your answer, so why not do the last 10% yourself? :)
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
Man, I've got a kid; I have to go read some stories and do bedtime. I was hoping someone like Anand would analyze compressing specific things to increase effective disk I/O.
 

DominionSeraph

Diamond Member
Jul 22, 2009
8,391
31
91
Just for fun I'm compressing my Program Files directory under XP to see what kind of compression level we're talking about. 15GB mostly in Resident Evil 4 and Demigod. Not gonna benchmark anything, though.
Earlier I did my GIMP directory and found 18 seconds for the program to load uncompressed vs. 17 seconds compressed, with the directory as a whole compressed to 0.65 of its original size; but I only did one run, and my timing method wasn't the most precise, either.
Chrome compresses to 0.68 of original, which is more compression than I would've thought.

I'd like to see some benchmarks, but it's going to be so highly variable depending on the compressibility of what's on the disk and the layout compared to usage, with SuperFetch throwing a huge curve into things.
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
Using the compact command? Google it. You can tell it to find all the DLLs and EXEs on your drive in one swoop.
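Something along these lines does the whole drive in one pass (/s recurses, /i keeps it from stopping on locked files):

compact /c /s:C:\ /i *.exe *.dll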
 

kmmatney

Diamond Member
Jun 19, 2000
4,363
1
81
For people with SSDs in laptops, compression can be an important factor to gain more space. I have an 80GB SSD in my laptop, and have compressed a lot of folders that I don't use often, but still might need. I haven't compressed my program files yet, though.

Edit: I wonder if turning on drive or folder compression negatively affects SandForce SSDs?
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
You can turn on compression (compress the items) and then turn it off, but the stuff remains compressed, IIRC from some old training I did.

The question is: why do binaries compress at all? I thought most DLLs were encrypted and compressed already. Why wouldn't the companies (Microsoft, etc.) run mad zip compression at level 11 on every bit of their files? (Compression is a form of encryption.)
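Quick way to check, by the way: compact with no switches just lists compression state and ratio per file, something like:

rem shows "compressed size : actual size = ratio" and a C flag for compressed files
compact C:\Windows\System32\*.dll

Stock system files generally come back 1.0 to 1, i.e. not NTFS-compressed on disk.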
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
You can turn on compression (compress the items) and then turn it off, but the stuff remains compressed, IIRC from some old training I did.

The question is: why do binaries compress at all? I thought most DLLs were encrypted and compressed already. Why wouldn't the companies (Microsoft, etc.) run mad zip compression at level 11 on every bit of their files? (Compression is a form of encryption.)

No, AFAIK MS has never encrypted or compressed their binaries.

But in theory compression should help with load times, since there will be less I/O to do and CPUs should be fast enough to decompress the data in memory with little to no noticeable latency. But the real-world effects of that are very hard to benchmark.
 

jimhsu

Senior member
Mar 22, 2009
705
0
76
From what I can tell, compression will most likely increase latency (slightly) and increase bandwidth (slightly to dramatically).

However, some people have shown that 4KB random writes are impacted quite dramatically by compression (we're talking a drop from 50 to 5 MB/s). Can someone validate?
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
From what I can tell, compression will most likely increase latency (slightly) and increase bandwidth (slightly to dramatically).

However, some people have shown that 4KB random writes are impacted quite dramatically by compression (we're talking a drop from 50 to 5 MB/s). Can someone validate?

just to be clear, you are saying 4k random writes drop an order of magnitude when you turn on compression?
 

jimhsu

Senior member
Mar 22, 2009
705
0
76
I have the evidence here actually. Did an iometer run without compression and with compression. The results are ... interesting to say the least.

Test conditions:
1. Small test file
2. 4 Outstanding IOs
3. Compression enabled using Win7 x64 "right click > properties"
4. Processor - E8400
5. Test interval - 1 sec warm up time, 5 sec test. I found out that longer test times actually introduce more drift (i.e. disk accesses, TRIM, etc)

Run                              Normal MB/s    Normal IOPS    Compressed MB/s    Compressed IOPS
1MB; 100% Read; 0% random (1)    153.458158     153.458158     4686.168947        4686.168947
1MB; 0% Read; 0% random          81.301776      81.301776      22.910449          22.910449
4K; 100% Read; 100% random       74.333424      19029.35662    496.191356         127024.987
4K; 0% Read; 100% random         72.17019       18475.56867    1.460037           373.769533

CPU utilization for the compression cases peaked at 56% and stayed there essentially constantly. This supports my theory that compression is single-threaded (at least while reading/writing a single file).

Note on the graph below: the y-axis (MB/s) is on a log scale (it's a semilog graph). My 1MB sequential read run for the uncompressed case was a little screwed up (warm-up time), so ignore that.

Also realize IOmeter uses highly compressible data for benchmarks.

(Attached graph: graphak.png)
 

jimhsu

Senior member
Mar 22, 2009
705
0
76
What I can conclude:

Compression is good for read-heavy databases that are sparse (i.e. easily compressible). You can experience substantial to extreme boosts, provided CPU resources are not taxed.
Compression is HORRIBLE for write-heavy databases, especially with random writes.
For everything in between, it depends on your application. I'd say for databases with a 90/10 read/write ratio or greater, you might consider compression.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
What I can conclude:

Compression is good for read-heavy databases that are sparse (i.e. easily compressible). You can experience substantial to extreme boosts, provided CPU resources are not taxed.
Compression is HORRIBLE for write-heavy databases, especially with random writes.
For everything in between, it depends on your application. I'd say for databases with a 90/10 read/write ratio or greater, you might consider compression.

Well SQL Server won't let you mount a compressed mdf/ldf so that's not an option there. I doubt other databases like MySQL or PostgreSQL would care though.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
jimhsu, is this mislabeled, or did you make it a semilog-scale graph? If so, then it's an extreme difference that isn't quite conveyed to someone not used to reading semilog graphs.

I assume "100% read" is read and "0% read" is write?
You clearly show huge improvements in read performance and huge losses in write performance. Thank you for sharing the data.
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
SQL Server 2008 / 2008 R2 (10) / Denali (11) has built-in table-level compression. IT IS AWESOME.

Imagine you have your databases split over iSCSI: data files on one LUN, tempdb on one LUN, logs on one LUN.

Peak speed is gigabit, right? But everyone stores stuff in databases that is highly compressible. You can even turn compression on and off (mixed) or run it as a batch job.

e.g. always compress table A,
and/or toggle compression on and off: when you determine load is too high, disable it,
and/or compress every night.

Compress backups (yay!) and compress TCP/IP connections (YAY!) as well!
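A rough sketch of the kind of batch job I mean (database and table names are made up; hang it off Task Scheduler for the nightly run):

rem nightly: compress the cold table
sqlcmd -S localhost -E -d MyDb -Q "ALTER TABLE dbo.TableA REBUILD WITH (DATA_COMPRESSION = PAGE);"
rem and to back it out when load gets too high:
rem sqlcmd -S localhost -E -d MyDb -Q "ALTER TABLE dbo.TableA REBUILD WITH (DATA_COMPRESSION = NONE);"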

-------

I'd also rather have a table fit in a page than split across pages.

Keep in mind I consider SQL Server an OS. It sits on top of Windows but has NUMA awareness, a resource governor, and its own I/O control (log writes are less lazy than main table writes).

It's become so smart that the old adage that you must separate the log from core storage isn't really a must any more. Light-years more advanced than MySQL.

anyhoo.


My idea was to batch-compress items that aren't updated frequently using the COMPACT command (a command-prompt command). Things that are updated a lot: just don't.

Things that don't compress well: skip them.

NTFS allows you to set compression on/off at the file, directory, or whole-drive level. You can also turn it off afterwards, but the objects remain compressed.

Anyone care to try that?
 

jimhsu

Senior member
Mar 22, 2009
705
0
76
Yes, it's semilog. The data is unreadable on a linear-scale graph.

Using highly compressible data (Iometer), you get at most a 30x performance increase and, at worst, a 49x performance DECREASE. You can see why your data workload matters so much: even a 1-2% increase in writes can affect performance drastically.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
There are articles on Phoronix.com benchmarking the BTRFS filesystem using compression vs not using it. It's under Linux, but the results may give you some insight into the performance boost that is possible. AFAIK BTRFS was faster across the board when compression was enabled. From what I remember, the boost was around 20%.
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
1. Bring up the command prompt: Start => Accessories => Command Prompt. You'll have a small black screen as in the MS-DOS days.

2. In Vista and 7, change directory to Users with the command CD \Users. In XP, change directory to Documents and Settings with the command CD \Documents and Settings.

3. Enter the compression command as follows: compact /c /s /i. Your system will go through and compress all the files in this directory and its subdirectories. It could take 10 to 20 minutes depending on the amount of data.

4. When complete, do the same with Program Files. Issue the command CD \Program Files (for 64-bit systems, also do the same with \Program Files (x86)).

5. Enter the compression command again: compact /c /s /i. It could take 10 to 20 minutes depending on the amount of data.

6. When complete, do the same with the Windows files. Issue the command CD \Windows.

7. Enter the compression command again: compact /c /s /i. It could take 20 minutes or so.

8. When complete, we are going to compress all other .exe and .dll files across the whole drive. Issue the command CD \. You should only have the C:\> prompt showing.

9. Enter the compression command as follows: compact /c /s /i *.exe. This will compress all other .exe files across your drive. When complete, issue the command compact /c /s /i *.dll. This will compress all .dll files across the entire drive.

10. When you complete steps 1 through 9, restart your system into Safe Mode. To do this, restart your system and, as it is starting, tap the F8 key a few times. When the Safe Mode menu comes up, select the top option, Safe Mode, and your system will boot into Safe Mode.

11. Repeat steps 1 to 9. This will compress some of the files that were locked in normal mode.

12. When you have completed steps 1 to 9 again, restart your system normally.

What you have just done is apply NTFS compression to all Program Files, user files and Windows files. These now occupy about 2/3 of the space on your hard drive that they normally did. Your hard drive will spend 1/3 less time reading these files as your computer accesses them, because there is 1/3 less data to read from the hard drive. You won't see much of a performance increase at this stage, since many of the files will be fragmented. In a moment we'll move on to defragmentation, optimal file placement and confinement of those files to the outer tracks of your hard drive, where transfer performance for those files will be increased by an average of 50%.

from UltimateDefrag (defrag pointless on an SSD, I s'pose?)
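FWIW, the whole sequence above fits in one .bat (Vista/7 paths from the guide; run from an elevated prompt, then once more in Safe Mode for the locked files):

rem consolidated version of the UltimateDefrag steps above
compact /c /s:"C:\Users" /i
compact /c /s:"C:\Program Files" /i
compact /c /s:"C:\Program Files (x86)" /i
compact /c /s:"C:\Windows" /i
compact /c /s:C:\ /i *.exe *.dll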
 

jimhsu

Senior member
Mar 22, 2009
705
0
76
Can someone explain though why write performance (specifically RANDOM write) drops so dramatically with compression enabled? Are writes not being parallelized correctly in this scenario? Is my CPU (3.6 GHz E8400) holding me back?
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
Depends: is the app writing with threads? Are your drives thread-safe? Do you actually have NCQ?

I see all 4 of my cores (Q6600) pushing up to 30-40%, so I'd have to disagree.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Can someone explain though why write performance (specifically RANDOM write) drops so dramatically with compression enabled? Are writes not being parallelized correctly in this scenario? Is my CPU (3.6 GHz E8400) holding me back?

Because the filesystem compresses data in chunks larger than the cluster size, whenever a write is issued the system has to read in the whole chunk, decompress it, make the update, recompress it, and write it back.
 

ElenaP

Member
Dec 25, 2009
88
0
0
www.ReclaiMe.com
Typically, a compression unit on NTFS is 16 clusters, i.e. 64KB with 4KB clusters, so the system has to recompress 64KB even if you write one byte. There is another, arguably more important, problem: compression coupled with random writes induces bad fragmentation.
The compression unit is 16 clusters. If the original data was compressed into 10 clusters and after the write it now needs 11 clusters, there is no way to store that block contiguously. It is not uncommon for a compressed file to have upwards of 10,000 fragments after some use. In practice this affects, for example, email software databases.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Typically, a compression unit on NTFS is 16 clusters, i.e. 64KB with 4KB clusters, so the system has to recompress 64KB even if you write one byte. There is another, arguably more important, problem: compression coupled with random writes induces bad fragmentation.
The compression unit is 16 clusters. If the original data was compressed into 10 clusters and after the write it now needs 11 clusters, there is no way to store that block contiguously. It is not uncommon for a compressed file to have upwards of 10,000 fragments after some use. In practice this affects, for example, email software databases.

I wouldn't call that more important; the effect of file fragmentation is hugely overblown by the developers of the software that fixes it.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Fragmentation isn't that significant.
But if a compression unit is 16 clusters, that means random writes suffer from read-modify-write cycles, just like an SSD without TRIM (though for different reasons, of course).

So to write 4KB you need to read 64KB, decompress it, modify the data, recompress it, then write it back as 64KB or more (more if it doesn't compress as well).

The smallest 10,000-fragment file you can have is a 5,120,000-byte file (4.88MB using base-1024 conversion) where every single one of its 10,000 fragments is non-contiguous... or a SIGNIFICANTLY larger file which is still extremely fragmented. Both are patently ridiculous...

A 64KB file has a MAX of 128 fragments (because it takes up exactly 128 sectors, since each sector is 512 bytes).
 