ZFS SAN Performance, Part 2

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
I'm starting a new thread on this as my last one got completely derailed. I got the latency issue figured out: one bad port on the SAN's HBA. With that out of the way, everything is running rock solid, so now I'm on a quest to find ways to improve performance without breaking the bank. The boot drive was cloned over to a new SSD, and I added a second SSD as a ZIL (both 128GB Crucial MX100s). Here's the latest disk benchmark from a VM running off the SAN via FC. Note, this is with 3 other VMs running on the SAN and a large file copy in progress:

Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.3 x64 (C) 2007-2013 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   302.183 MB/s
          Sequential Write :   307.771 MB/s
         Random Read 512KB :   280.154 MB/s
        Random Write 512KB :   281.427 MB/s
    Random Read 4KB (QD=1) :    18.600 MB/s [  4541.1 IOPS]
   Random Write 4KB (QD=1) :    13.446 MB/s [  3282.6 IOPS]
   Random Read 4KB (QD=32) :   239.555 MB/s [ 58485.1 IOPS]
  Random Write 4KB (QD=32) :   200.424 MB/s [ 48931.6 IOPS]

  Test : 1000 MB [C: 23.8% (14.2/59.7 GB)] (x5)
  Date : 2014/10/18 21:20:35
    OS : Windows Server 2012 Datacenter (Full installation) [6.2 Build 9200] (x64)

Based on the above, I believe I've reached the limit of my HP Smart Array P410, as it runs SATA drives at 3Gb/s speeds. With that in mind, do those numbers look right? They seem low to me, but that might just be unrealistic expectations of the current controller. I'll be replacing it with a pair of LSI 9211-8i's shortly; do you guys think that should bump it up a bit? Refresher on the storage config:

Solaris 11.2 with napp-it
2x Xeon X5550
48GB RAM
128GB SSD boot
128GB SSD ZIL
HP P410 in HBA mode
8x 2TB Seagate 7200rpm SATA drives
4x 4Gb/s FC paths to host

What else do you guys see that I can do to boost performance without breaking the bank? $1k cache SSDs need not apply. Spindle count is obviously a factor as well; that's the reason for the second controller, so I can add another set of disks down the road.
 

Jovec

Senior member
Feb 24, 2008
579
2
81
Without speaking to the hardware side, you can look here to see tests with various RAID levels with up to 24 drives, and then target a drive count and RAID level that suits your needs (read vs write vs balanced).
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
My first question would be: what is the write:read ratio? Typical best case for 7200 RPM SATA is 75-100 IOPS per disk (as a general rule of thumb). Since you are seeing way more than that, I can see the ZIL is having a pretty decent effect. Can you disable the SSD cache and rerun the test? If you are getting 600-800 IOPS at the higher QD then you are disk limited and not controller limited. A 3Gb/s SATA channel carries roughly 300MB/s, which only the SSD would have a prayer of saturating.

I also consider the SSD boot disk a complete waste of an SSD. If anything, add it to the zpool as a second ZIL device.

Your numbers don't really convince me yet that you have overloaded the controller, since the sequential read and write are barely at the ~300MB/s a single 3Gb/s channel can carry. And since you are running in HBA mode, the onboard chip is likely barely coming out of nap mode; it doesn't have any work to do.
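If you want to try benchmarking without the SSD log device, ZFS can remove and re-add a log vdev on a live pool. A minimal sketch; "tank" and the device name are placeholders for your actual pool and SSD:

```shell
# Identify the log device (listed under "logs" in the pool status).
zpool status tank

# Detach the SSD log vdev so sync writes go straight to the spindles...
zpool remove tank c4t1d0

# ...rerun the benchmark, then put the log device back.
zpool add tank log c4t1d0
```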
 

gea

Senior member
Aug 3, 2014
244
18
81
Your performance values are not too bad under load.
The question now is whether you want to tune your setup for a slight performance boost, or whether you need much better overall performance.

In the first case you need to find the bottleneck:
- disable sync and recheck (optionally buy a better ZIL device such as an Intel S3700)
- compare against an SSD pool (tells you whether you need a faster pool or pool layout)
- compare local benchmarks (tells you whether you need a faster network)

A faster HBA can help, as can more RAM or an L2ARC, but I would not expect a huge difference. If you need a much faster pool, use SSD-only pools.
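The first check above can be sketched like this; "tank" is a placeholder pool name, and note that sync=disabled risks losing in-flight writes on power loss, so it is for testing only:

```shell
# Note the current setting so it can be restored.
zfs get sync tank

# Disable sync writes for the test window (TESTING ONLY --
# a power loss during the test can lose in-flight writes).
zfs set sync=disabled tank

# ... rerun the benchmark from the VM here ...

# Restore the default behaviour afterwards.
zfs set sync=standard tank
```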
 

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
Without speaking to the hardware side, you can look here to see tests with various RAID levels with up to 24 drives, and then target a drive count and RAID level that suits your needs (read vs write vs balanced).

Yeah, I had seen that. Unfortunately for RAIDZ2 setups they skip from 6 to 10 drives, and my read/write speeds are below those.

My first question would be: what is the write:read ratio? Typical best case for 7200 RPM SATA is 75-100 IOPS per disk (as a general rule of thumb). Since you are seeing way more than that, I can see the ZIL is having a pretty decent effect. Can you disable the SSD cache and rerun the test? If you are getting 600-800 IOPS at the higher QD then you are disk limited and not controller limited. A 3Gb/s SATA channel carries roughly 300MB/s, which only the SSD would have a prayer of saturating.

I also consider the SSD boot disk a complete waste of an SSD. If anything, add it to the zpool as a second ZIL device.

Your numbers don't really convince me yet that you have overloaded the controller, since the sequential read and write are barely at the ~300MB/s a single 3Gb/s channel can carry. And since you are running in HBA mode, the onboard chip is likely barely coming out of nap mode; it doesn't have any work to do.

Pretty heavy on the read side.

I agree on the IOPS, but I guess I was expecting more than 300MB/s from an 8-disk setup. Based on the stats in the link Jovec posted, that doesn't seem like an unreasonable expectation. I realize that from the VM side IOPS are the bigger concern, but one of the VMs is my file server and I wouldn't mind somewhat better read/write speeds.

I didn't want to waste a 3.5" bay and a port on the SAS backplane for the boot drive, and I had zero luck getting Solaris to install on a USB thumb drive. So it was either a cheap, crappy 2.5" spindle or a cheap SSD. For $70 I just went with the SSD.

Your performance values are not too bad under load.
The question now is whether you want to tune your setup for a slight performance boost, or whether you need much better overall performance.

In the first case you need to find the bottleneck:
- disable sync and recheck (optionally buy a better ZIL device such as an Intel S3700)
- compare against an SSD pool (tells you whether you need a faster pool or pool layout)
- compare local benchmarks (tells you whether you need a faster network)

A faster HBA can help, as can more RAM or an L2ARC, but I would not expect a huge difference. If you need a much faster pool, use SSD-only pools.

Realistically, would a 100GB S3700 make a NOTICEABLE difference in performance over the MX100? An SSD pool is way out of budget; this is just a home lab setup. The SAN's got 16Gb/s worth of FC connectivity, so I don't think that's the problem. I'm at 48GB of RAM with 16TB of storage; would more RAM actually make a noticeable change?

It's not that I'm necessarily unsatisfied with the performance I'm getting; I just want to make sure I'm getting the performance I should be, and to find any cost-effective (meaning cheap) changes I could make for a noticeable improvement, mostly on the throughput side. I know IOPS are limited by rotation speed and spindle count, and more spindles are on the to-do list.
 

gea

Senior member
Aug 3, 2014
244
18
81
Realistically, would a 100GB S3700 make a NOTICEABLE difference in performance over the MX100?

Performance-wise: yes
https://b3n.org/ssd-zfs-zil-slog-be...500-seagate-600-pro-crucial-mx100-comparison/

But you do not need to speculate: disable sync and redo the benchmark
(check the small random writes especially).

Regarding SSD pools:
One option is to use your 128GB SSDs as a mirror for critical VMs
and the spindle pool for the rest, plus any 32GB+ boot disk.

Regarding RAM:
Check arcstat. More RAM can help if your ARC hit rate is below, say, 80%.
With 48GB RAM I would give the Solaris storage VM 16-32GB (depends on the other VMs).
 

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
Performance-wise: yes
https://b3n.org/ssd-zfs-zil-slog-be...500-seagate-600-pro-crucial-mx100-comparison/

But you do not need to speculate: disable sync and redo the benchmark
(check the small random writes especially).

Regarding SSD pools:
One option is to use your 128GB SSDs as a mirror for critical VMs
and the spindle pool for the rest, plus any 32GB+ boot disk.

Regarding RAM:
Check arcstat. More RAM can help if your ARC hit rate is below, say, 80%.
With 48GB RAM I would give the Solaris storage VM 16-32GB (depends on the other VMs).

Yes, I'm aware of that benchmark, but what does that actually look like in the real world? Am I actually going to notice the difference? I could say I can increase your video card's performance by 20% by overclocking, but if that just takes your framerate from 80fps to 96fps, odds are you won't notice the difference in real-world usage.

Disabling sync and re-benchmarking would show me the performance difference between no ZIL and my current one, but that's not what we are looking at.

I will check arcstat, but Solaris isn't a VM. This is a dedicated physical box doing nothing but storage duty, so it has access to the whole 48GB. I have two separate ESXi hosts.
 

imagoon

Diamond Member
Feb 19, 2003
5,199
0
0
Disabling sync and re-benchmarking would show me the performance difference between no ZIL and my current one, but that's not what we are looking at.

No. It will tell you whether the ZIL is the performance hang-up for the large sequential tests. ZFS has syncing rules under which the entire disk group will be slowed to make sure the ZIL is synced out to the spinning disks. The ZIL can also add CPU time to reads, as the file system may need to "build" your read request from the ZIL and the magnetic disks at the same time.
 

gea

Senior member
Aug 3, 2014
244
18
81
Yes, I'm aware of that benchmark, but what does that actually look like in the real world? Am I actually going to notice the difference? I could say I can increase your video card's performance by 20% by overclocking, but if that just takes your framerate from 80fps to 96fps, odds are you won't notice the difference in real-world usage.

Disabling sync and re-benchmarking would show me the performance difference between no ZIL and my current one, but that's not what we are looking at.

I will check arcstat, but Solaris isn't a VM. This is a dedicated physical box doing nothing but storage duty, so it has access to the whole 48GB. I have two separate ESXi hosts.

If you use sync writes (for example ESXi over NFS), your write performance is limited by your ZIL. If you compare with sync=disabled, you will know the potential of your pool. With a good ZIL (it needs low latency, and its write performance should stay high under load), the two values can be quite similar (at least over your network). Without a ZIL, or with a slow one, the difference can be huge (sync writes can drop to 10-20% of the unsynced figure).

Arcstat is not related to virtualisation. It gives information about the ARC and L2ARC cache hit/miss rates. If the hit rate is high (> 90%), additional RAM won't increase performance. If it is quite low, additional RAM as read cache can help (this depends on workload). With 48GB RAM on a dedicated box, I would not expect this to be a bottleneck.
 

XavierMace

Diamond Member
Apr 20, 2013
4,307
450
126
No. It will tell you whether the ZIL is the performance hang-up for the large sequential tests. ZFS has syncing rules under which the entire disk group will be slowed to make sure the ZIL is synced out to the spinning disks. The ZIL can also add CPU time to reads, as the file system may need to "build" your read request from the ZIL and the magnetic disks at the same time.

Ok, misunderstood what he was getting at.

If you use sync writes (for example ESXi over NFS), your write performance is limited by your ZIL. If you compare with sync=disabled, you will know the potential of your pool. With a good ZIL (it needs low latency, and its write performance should stay high under load), the two values can be quite similar (at least over your network). Without a ZIL, or with a slow one, the difference can be huge (sync writes can drop to 10-20% of the unsynced figure).

I'm presenting raw LUNs over FC, no NFS involved.

Arcstat is not related to virtualisation. It gives information about the ARC and L2ARC cache hit/miss rates. If the hit rate is high (> 90%), additional RAM won't increase performance. If it is quite low, additional RAM as read cache can help (this depends on workload). With 48GB RAM on a dedicated box, I would not expect this to be a bottleneck.

I know arcstat is not related to virtualization; I was just clarifying that the SAN isn't virtualized. I will check it, however.
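For what it's worth, since the LUNs are zvols presented over FC via COMSTAR rather than NFS, sync behaviour depends on both the zvol's sync property and the logical unit's write-back cache setting. A sketch for checking both; the dataset name is a placeholder:

```shell
# List COMSTAR logical units with details; the output includes a
# "Writeback Cache Disabled" field for each LU.
stmfadm list-lu -v

# Check the ZFS sync policy on the zvol backing a LUN
# ("tank/vm-lun1" is a placeholder name).
zfs get sync tank/vm-lun1
```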