Will somebody benchmark WHOLE DISK COMPRESSION (HD,SSD,RAMDISK) please!!

Discussion in 'Memory and Storage' started by Emulex, Nov 13, 2010.

  1. Emulex

    Emulex Diamond Member

    Joined:
    Jan 28, 2001
    Messages:
    9,759
    Likes Received:
    0
    Please enable the Windows 7 / Server 2008 R2 "compress the whole darn drive" option, then benchmark.


    I'd love to see bottleneck scenarios on a modern multi-core (4 core or 2c4t) cpu.

    You'd have to create a disk image first.

    Then loop:
    1. benchmark
    2. compress the whole drive
    3. benchmark again
    4. restore the image (disabling compression doesn't decompress objects that are already compressed)
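
    Something like this rough batch sketch of one pass, with the built-in winsat disk standing in for whatever benchmark you prefer (run it elevated; the restore step is whatever imaging tool you use):

    Code:
    @echo off
    rem One pass of the loop above (a sketch, not a polished harness).
    rem winsat disk is just a stand-in; swap in IOmeter/CrystalDiskMark runs as preferred.

    rem 1. baseline benchmark
    winsat disk -drive c > baseline.txt

    rem 2. compress the whole drive (/i keeps going past files it can't touch)
    compact /c /s:C:\ /i

    rem 3. benchmark again
    winsat disk -drive c > compressed.txt

    rem 4. restore the original disk image with your imaging tool of choice
    rem    before moving to the next device (HD, SSD, RAID, RAM disk).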

    Switch from HD to SSD to RAID (HD/SSD), and maybe even a RAM disk.

    Maybe Anand is too old for this kind of fun, but it makes me wonder: I have a dual 6-core Westmere box with ESXi, 6Gb/s SATA and SAS, and I don't mind carving off a couple of cores for compression. I already do that for SQL Server 2008 (compress *.*) at the application level, and yes, it is faster when used correctly (!!).

    Also note: certain objects can't or shouldn't be compressed (swap/pagefile, and already-compressed JPEGs/videos won't shrink), and NTFS lets you control compression per directory, so you could add that dimension to the test as well.

    I think it would be a worthy ANAND article - today's compression.

    1. ESXi compresses pages (RAM Doubler!) before it has to swap them out, and has since 4.1.
    2. Many modern SANs will compress objects and deduplicate them. Heck, VMware already does RAM (RAM Doubler?) and VMFS (its clustered storage filesystem) deduplication at a low level.
    3. Backup Exec System Restore 2010 can send to a datastore for dedupe, so those 50 PCs at work that were imaged all at once and contain 90% of the same code can be deduped.
    4. SQL Server has had TCP/table/backup compression since 2008, because we have more CPU than disk I/O. SQL Server is really an OS - some old-timers call it SQLOS - soft-NUMA, etc. It must have been a good idea. This is one product Microsoft can proudly call their own (best acquisition, lol).

    Anyone think it's worth the time?

    CPU usage would go way up, but load times on I/O-constrained devices (a 4200 RPM laptop drive) might improve enough to yield better usability. Or not?

    What about dedupe? RAM Doubler? Is it time we can afford to use those again in consumer land? I'd rather have a GC process compress stale objects than page them, to reduce pagefile wear or prevent paging completely. I can't afford more than 8GB for a personal PC just yet.

    Inquiring minds would like to know..
     

  3. razel

    razel Golden Member

    Joined:
    May 14, 2002
    Messages:
    1,880
    Likes Received:
    8
    You've thought about this quite a bit: you've got a plan and all the questions laid out. You probably have enough knowledge and resources as well. It's always the last 10% that's the hard part. You're 90% of the way to your answer - why not do that last 10% yourself? :)
     
  4. Emulex

    Emulex Diamond Member

    Joined:
    Jan 28, 2001
    Messages:
    9,759
    Likes Received:
    0
    Man, I've got a kid - I have to go read some stories and do bedtime. I was hoping someone like Anand would analyze compression-specific scenarios for increasing disk I/O.
     
  5. DominionSeraph

    DominionSeraph Diamond Member

    Joined:
    Jul 22, 2009
    Messages:
    8,280
    Likes Received:
    4
    Just for fun I'm compressing my Program Files directory under XP to see what kind of compression level we're talking about. 15GB mostly in Resident Evil 4 and Demigod. Not gonna benchmark anything, though.
    Earlier I did my GIMP directory and found the program took 18 seconds to load uncompressed vs. 17 compressed, with the directory as a whole compressed to 0.65 of its original size; but I only did one run, and my timing method wasn't the most precise, either.
    Chrome compresses to 0.68 of original, which is more compression than I would've thought.

    I'd like to see some benchmarks, but it's gonna be highly variable depending on how compressible the data on the disk is and how the layout compares to actual usage, with SuperFetch throwing a huge curve into things.
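
    For reference, running compact without /c just reports the compression state and the overall ratio, so you can get numbers like those without any extra tools (the path below is only an example):

    Code:
    rem Report per-file compression state and the overall ratio for a tree
    rem (path is just a placeholder; point it at whatever you compressed).
    compact /s:"C:\Program Files\SomeApp"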
     
  6. Emulex

    Emulex Diamond Member

    Joined:
    Jan 28, 2001
    Messages:
    9,759
    Likes Received:
    0
    Using the compact command? Google it. You can tell it to find all the DLLs and EXEs on your drive in one swoop.
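
    Something along these lines should do it, off the top of my head (/s recurses from the root, /i skips files it can't touch):

    Code:
    rem Compress every .exe and .dll under C:\ in one pass (run elevated).
    compact /c /s:C:\ /i *.exe *.dll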
     
  7. kmmatney

    kmmatney Diamond Member

    Joined:
    Jun 19, 2000
    Messages:
    4,360
    Likes Received:
    1
    For people with SSDs in laptops, compression can be an important factor to gain more space. I have an 80GB SSD in my laptop, and have compressed a lot of folders that I don't use often, but still might need. I haven't compressed my program files yet, though.

    Edit: I wonder if turning on drive or folder compression negatively affects SandForce SSDs?
     
  8. Emulex

    Emulex Diamond Member

    Joined:
    Jan 28, 2001
    Messages:
    9,759
    Likes Received:
    0
    You can turn on compression (compress items) and then turn it off (but the stuff remains compressed), IIRC from some old training I did.

    The question is: why do binaries compress at all? I thought most DLLs were encrypted and compressed already. Why wouldn't the companies (Microsoft, etc.) run max zip compression on every bit of their files (compression is a form of encryption)?
     
  9. Nothinman

    Nothinman Elite Member

    Joined:
    Sep 14, 2001
    Messages:
    30,672
    Likes Received:
    0
    No, AFAIK MS has never encrypted or compressed their binaries.

    But in theory compression should help with load times, since there will be less I/O to do and CPUs should be fast enough to decompress the data in memory with little to no noticeable latency. The real-world effects of that are very hard to benchmark, though.
     
  10. jimhsu

    jimhsu Senior member

    Joined:
    Mar 22, 2009
    Messages:
    703
    Likes Received:
    0
    From what I can tell, compression will most likely increase latency (slightly) and increase bandwidth (slightly to dramatically).

    However, some people have shown that 4KB random writes are impacted quite dramatically by compression (we're talking a drop from 50 to 5 MB/s). Can someone validate?
     
  11. taltamir

    taltamir Lifer

    Joined:
    Mar 21, 2004
    Messages:
    13,578
    Likes Received:
    0
    Just to be clear: you're saying 4K random writes drop by an order of magnitude when you turn on compression?
     
  12. jimhsu

    jimhsu Senior member

    Joined:
    Mar 22, 2009
    Messages:
    703
    Likes Received:
    0
    I have the evidence here, actually. I did an IOmeter run without compression and with compression. The results are... interesting, to say the least.

    Test conditions:
    1. Small test file
    2. 4 outstanding I/Os
    3. Compression enabled using the Win7 x64 "right click > Properties" option
    4. Processor - E8400
    5. Test interval - 1 sec warm-up time, 5 sec test. I found that longer test times actually introduce more drift (i.e. background disk accesses, TRIM, etc.)

    Run                           | Normal MB/s | Normal IOPS | Compressed MB/s | Compressed IOPS
    1MB; 100% read; 0% random (1) | 153.458158  | 153.458158  | 4686.168947     | 4686.168947
    1MB; 0% read; 0% random       | 81.301776   | 81.301776   | 22.910449       | 22.910449
    4K; 100% read; 100% random    | 74.333424   | 19029.35662 | 496.191356      | 127024.987
    4K; 0% read; 100% random      | 72.17019    | 18475.56867 | 1.460037        | 373.769533

    CPU utilization for the compressed cases peaked at 56% and stayed there basically constantly. This supports my theory that compression is single-threaded (at least while reading/writing a single file).

    Note: on the graph below the y-axis (MB/s) is on a LOG scale (it's a semilog graph). My 1MB sequential read run for uncompressed was a little screwed up (warm-up time), so ignore that.

    Also, be aware that IOmeter uses highly compressible data for its benchmarks.

    [Image: semilog chart of the throughput results in the table above]
     
    #11 jimhsu, Nov 17, 2010
    Last edited: Nov 17, 2010
  13. jimhsu

    jimhsu Senior member

    Joined:
    Mar 22, 2009
    Messages:
    703
    Likes Received:
    0
    What I can conclude:

    Compression is good for read-heavy databases that are sparse (i.e. easily compressible). You can see substantial to extreme boosts, provided CPU resources aren't taxed.
    Compression is HORRIBLE for write-heavy databases, especially with random writes.
    For everything in between, it depends on your application. I'd say for databases with a 90/10 read/write ratio or better, you might consider compression.
     
  14. Nothinman

    Nothinman Elite Member

    Joined:
    Sep 14, 2001
    Messages:
    30,672
    Likes Received:
    0
    Well SQL Server won't let you mount a compressed mdf/ldf so that's not an option there. I doubt other databases like MySQL or PostgreSQL would care though.
     
  15. taltamir

    taltamir Lifer

    Joined:
    Mar 21, 2004
    Messages:
    13,578
    Likes Received:
    0
    jimhsu, is this mislabeled, or did you deliberately make it a semilog graph? If it's the latter, the difference is so extreme that it isn't quite conveyed to someone not used to reading semilog graphs.

    I assume "100% read" is read and "0% read" is write?
    You clearly show huge improvements in reads and huge losses in write performance. Thank you for sharing the data.
     
    #14 taltamir, Nov 17, 2010
    Last edited: Nov 17, 2010
  16. Emulex

    Emulex Diamond Member

    Joined:
    Jan 28, 2001
    Messages:
    9,759
    Likes Received:
    0
    SQL Server 2008 / 2008 R2 (10) / Denali (11) has built-in table-level compression. IT IS AWESOME.

    Imagine you have your databases spliced over iSCSI - data files on one LUN, tempdb on one LUN, logs on one LUN.

    Peak speed is gigabit, right? But everyone stores stuff in databases that is highly compressible. You can even turn it on and off (mixed) or run it in batches.

    i.e. always compress table A,
    and/or toggle compression on/off - when you determine load is too high, disable it,
    and/or compress every night.

    Compress backups (yay!) and compress TCP/IP connections (YAY!) as well!
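
    For anyone curious what flipping these switches looks like, a minimal sketch via sqlcmd - instance, database, table and backup path are all placeholders:

    Code:
    rem Enable page-level compression on one table, then take a compressed backup.
    rem Instance (.), database (MyDb), table (dbo.TableA) and path are placeholders.
    sqlcmd -S . -d MyDb -Q "ALTER TABLE dbo.TableA REBUILD WITH (DATA_COMPRESSION = PAGE);"
    sqlcmd -S . -Q "BACKUP DATABASE MyDb TO DISK = 'C:\Backups\MyDb.bak' WITH COMPRESSION;"
    rem ...and turn it back off later if load gets too high:
    sqlcmd -S . -d MyDb -Q "ALTER TABLE dbo.TableA REBUILD WITH (DATA_COMPRESSION = NONE);"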

    -------

    I'd rather have a table fit in a page than split, too.

    Keep in mind I consider SQL Server an OS. It sits on top of Windows but has NUMA awareness, a resource governor, and its own I/O controller (log writes are less lazy than main table writes).

    It's become so smart that the old adage that you must separate logs from core storage isn't really a must any more. Light years more advanced than MySQL.

    anyhoo.


    My idea was to compress items that aren't updated frequently, in batch, using the COMPACT command (a DOS command). Things that are updated a lot - just don't.

    Things that don't compress, skip.

    NTFS allows you to set compression on/off at the file, directory, or whole-drive level. You can also turn it off afterwards, but the objects remain compressed.

    Anyone care to try that?
     
  17. jimhsu

    jimhsu Senior member

    Joined:
    Mar 22, 2009
    Messages:
    703
    Likes Received:
    0
    Yes, it's semilog. The data is unreadable on a linear scale.

    Using highly compressible data (IOmeter's), you get at most about a 30x performance increase (1MB sequential reads: 4686 / 153 ≈ 30x) and at least a 49x performance DECREASE (4K random writes: 72.2 / 1.46 ≈ 49x). You can see why your data workload matters so much - even a 1-2% increase in writes can affect performance drastically.
     
  18. Blain

    Blain Lifer

    Joined:
    Oct 9, 1999
    Messages:
    23,643
    Likes Received:
    2
    Tell tweakboy to do it... He'll benchmark anything.
     
  19. SickBeast

    SickBeast Lifer

    Joined:
    Jul 21, 2000
    Messages:
    14,334
    Likes Received:
    0
    There are articles on Phoronix.com benchmarking the BTRFS filesystem using compression vs not using it. It's under Linux, but the results may give you some insight into the performance boost that is possible. AFAIK BTRFS was faster across the board when compression was enabled. From what I remember, the boost was around 20%.
     
  20. Emulex

    Emulex Diamond Member

    Joined:
    Jan 28, 2001
    Messages:
    9,759
    Likes Received:
    0
    From the UltimateDefrag guide (defrag is pointless on an SSD, I suppose?):

    1. Bring up the command prompt: Start => Accessories => Command Prompt. You'll have a small black screen as per the MS-DOS days.
    2. In Vista and 7, change directory to Users with the command CD \Users. In XP, change directory to Documents and Settings with the command CD \Documents and Settings.
    3. Enter the compression command: compact /c /s /i. Your system will go through and compress all the files in this directory and its subdirectories. It could take 10 to 20 minutes depending on the amount of data.
    4. When complete, do the same with Program Files. Issue the command CD \Program Files (for 64-bit systems, also do the same with Program Files (x86)).
    5. Enter the compression command again: compact /c /s /i.
    6. When complete, do the same with the Windows files. Issue the command CD \Windows.
    7. Enter the compression command again: compact /c /s /i. It could take 20 minutes or so.
    8. When complete, we are going to compress all other .exe and .dll files across the whole drive. Issue the command CD \. You should have only the C:\> prompt showing.
    9. Enter the compression command compact /c /s /i *.exe. This will compress all other .exe files across your drive. When complete, issue the command compact /c /s /i *.dll. This will compress all .dll files across the entire drive.

    When you complete steps 1 through 9, restart your system into safe mode. To do this, restart your system and tap the F8 key a few times as it starts. When the boot menu comes up, select the top option, Safe Mode, and your system will boot into safe mode.

    Repeat steps 1 to 9. This will compress some of the files that were locked in normal mode.

    When you have completed steps 1 to 9 again, restart your system normally.

    What you have just done is apply NTFS compression to all Program Files, User files and Windows files. These now occupy about 2/3 of the space on your hard drive that they normally did. Your hard drive will spend 1/3 less time reading these files as your computer accesses them, because there is 1/3 less data to read from the drive. You won't see much of a performance increase at this stage, since many of the files will be fragmented. In a moment we'll move on to defragmentation, optimal file placement and confinement of those files to the outer tracks of your hard drive, where transfer performance for those files will be increased by an average of 50%.
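
    If anyone wants to try it, the steps above boil down to roughly this batch sketch (Vista/7 paths; use \Documents and Settings instead of \Users on XP, run it elevated, and rerun once from Safe Mode for locked files):

    Code:
    @echo off
    rem Rough consolidation of the UltimateDefrag steps above (a sketch, not gospel).

    compact /c /s:"C:\Users" /i
    compact /c /s:"C:\Program Files" /i
    if exist "C:\Program Files (x86)" compact /c /s:"C:\Program Files (x86)" /i
    compact /c /s:"C:\Windows" /i

    rem Everything else: just the binaries across the whole drive.
    compact /c /s:C:\ /i *.exe
    compact /c /s:C:\ /i *.dll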
     
  21. jimhsu

    jimhsu Senior member

    Joined:
    Mar 22, 2009
    Messages:
    703
    Likes Received:
    0
    Can someone explain, though, why write performance (specifically RANDOM writes) drops so dramatically with compression enabled? Are writes not being parallelized correctly in this scenario? Is my CPU (3.6 GHz E8400) holding me back?
     
  22. Emulex

    Emulex Diamond Member

    Joined:
    Jan 28, 2001
    Messages:
    9,759
    Likes Received:
    0
    It depends - is the app writing with multiple threads? Are your drives thread-safe? Do you actually have NCQ?

    I see all 4 of my cores (Q6600) pushing up to 30-40%, so I'd have to disagree.
     
  23. Nothinman

    Nothinman Elite Member

    Joined:
    Sep 14, 2001
    Messages:
    30,672
    Likes Received:
    0
    Because the filesystem compresses data in chunks larger than the cluster size, so whenever a write is issued the system has to read in that chunk, decompress it, make the update, recompress it and write it back.
     
  24. ElenaP

    ElenaP Member

    Joined:
    Dec 25, 2009
    Messages:
    88
    Likes Received:
    0
    Typically, a compression unit on NTFS is 16 clusters, i.e. 64KB, so the system has to recompress 64KB even if you write one byte. There is another, more important problem: compression coupled with random writes induces bad fragmentation.
    The compression unit is 16 clusters. If the original data compressed into 10 clusters and after the write it now needs 11, there is no way to store that block contiguously. It is not uncommon for a compressed file to have upwards of 10,000 fragments after some use. In practice this affects, for example, email software databases.
     
  25. Nothinman

    Nothinman Elite Member

    Joined:
    Sep 14, 2001
    Messages:
    30,672
    Likes Received:
    0
    I wouldn't call that more important; the effects of file fragmentation are hugely overblown by the developers of the software that fixes it.
     
  26. taltamir

    taltamir Lifer

    Joined:
    Mar 21, 2004
    Messages:
    13,578
    Likes Received:
    0
    Fragmentation isn't that significant.
    But if a compression unit is 16 clusters, that means random writes suffer from read-modify-write cycles, just like an SSD without TRIM (for different reasons, of course).

    So to write 4KB you need to read 64KB, decompress it, modify the data, recompress it, then write it back down as 64KB or more (more if it doesn't compress as well).

    The smallest 10,000-fragment file you can have is a 5,120,000-byte file (4.88MB using base-1024 conversion) where every single one of its 10,000 fragments is non-contiguous... or a SIGNIFICANTLY larger file which is still extremely fragmented. Both are patently ridiculous.

    A 64KB file has a MAX of 128 fragments (because it takes up exactly 128 sectors, since each sector is 512 bytes).
     
    #25 taltamir, Nov 21, 2010
    Last edited: Nov 21, 2010