ESXi slower upload speed...

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
Currently running ESXi on an AMD 880 board with an X3 740 (unlocked to X4).

The loads are extremely light, with not a lot of data going in and out, so I would figure throughput should be roughly wire speed.

Problem is, I did some checking last night to get ideas and found a few things:

1) My RAID 5 array has a TERRIBLE write speed (roughly 10MB/s). I installed a simple XP VM and ran the ATTO benchmark tool on it, and the best it got was about 10MB/s. I then attached a single 2TB disk and ran the tool against that and got normal disk speed numbers (roughly 110MB/s).

2) My network upload speed seems to max out at 25MB/s. 1Gbps should get me close to 120MB/s (quick math at the end of this post). Downloads were at 70MB/s or so, which I figured was fair enough.

At this point, I have tried both NICs (one Realtek and one Intel) and both achieve the same speed. This leads me to believe I either have an issue with the cable (which I am hesitant to believe) or a problem with the host software configuration.

As for #1, I need to figure out how to get my RAID controller to let me reconfigure the array. At this point, I want to configure RAID 10 to get away from a parity-based array on SATA disks. I still don't think the performance should be this bad on RAID 5, but that's what I aim to test.

Any ideas? My next thought is to reinstall Hyper-V 2012 R2 and try from there. I'll do this after trying another cable or two, but I am pretty close to doing it anyway just to see.
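
For reference, the wire-speed math I'm working from: 1Gbps is 1,000Mbit/s divided by 8 = 125MB/s raw, and after Ethernet/IP/TCP framing overhead the practical ceiling on a single gigabit link is roughly 110-118MB/s. So 25MB/s up is nowhere near line rate, and even 70MB/s down leaves a fair bit on the table.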
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
1) You don't mention the RAID controller you used. What is it?
 

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
1) You don't mention the RAID controller you used. What is it?

Good point. I can't remember specifically. It's on the VMware HCL. It's a four-port PCI-x4 something or other.

It was an LSI MegaRAID something or other, maybe. I dunno, I've had it a while and I don't get any boot prompts for it when the system boots.
 

Dahak

Diamond Member
Mar 2, 2000
3,752
25
91
Sounds like it might be a RAID issue: either the controller, the controller firmware, or a failing drive in the array (I suspect this more), since everything outside the RAID seems OK.

It was an LSI MegaRAID something or other, maybe. I dunno, I've had it a while and I don't get any boot prompts for it when the system boots.

You may need to enable an option that allows viewing optional BIOS screens.

Or you can try just pressing the RAID setup key, which is usually Ctrl+M, I believe.
 

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
I will say that will likely help with the RAID array, but the network upload would appear to be limited by something else.

I created a share on a drive not associated with the array and the upload still maxed out at about 25MB/s. The download from the single drive was 70MB/s or so, which I figure is OK given any other overhead on the NIC (though I don't think 30-40MB/s is being consumed elsewhere, as most of the VMs just sit idle).

At this point, if I can get into the array, I might be able to get somewhere on that, but the bigger problem for me right now is network throughput.
 

Dahak

Diamond Member
Mar 2, 2000
3,752
25
91
Is that speed through a VM or direct from ESXi?

If it's the VM, check the NIC that the VM uses. I know you mentioned you tried an Intel and a Realtek NIC; try it on the Intel NIC for the physical side, and try the E1000 as the virtual NIC instead of the VMXNET one.
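
If it helps, the virtual NIC type is just a line in the VM's .vmx file, so you can confirm what the guest is actually getting (ethernet0 is the usual first adapter; the exact line below is just illustrative):

  ethernet0.virtualDev = "e1000"      <- emulated Intel E1000
  ethernet0.virtualDev = "vmxnet3"    <- paravirtual VMXNET3 (needs VMware Tools in the guest)

Change it with the VM powered off, or remove and re-add the adapter with the other type in the vSphere client.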
 

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
Is that speed through a VM or direct from ESXi?

If it's the VM, check the NIC that the VM uses. I know you mentioned you tried an Intel and a Realtek NIC; try it on the Intel NIC for the physical side, and try the E1000 as the virtual NIC instead of the VMXNET one.

Yeah, I ran through all the virtual NIC types there. Flexible, E1000, VMXNET, etc. All the same.

I have also experienced similar issues with a FreeNAS VM on there. Same bad upload, download is ok but not great.

Found an article (http://community.spiceworks.com/top...ansfers-to-esxi-guest-windows-server-2008r2):
To recap, I determined it wasn't an ESXi-specific issue as I had the same behavior in both ESXi 5 and Hyper-V. I was able to do some benchmarks with the drive subsystem to determine it wasn't the bottleneck.

Here is what fixed it for me:
- Disable SMB2 (49MB/s vs 80+MB/s)
- Disabled Chimney
- Turned NetDMA on in the BIOS
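
(For reference, the Chimney item is a setting inside the Windows guest (TCP Chimney Offload), and as far as I know it gets checked and flipped from an elevated command prompt, something like:

  netsh int tcp show global
  netsh int tcp set global chimney=disabled

The SMB2 item is a registry change on the Windows side as well.)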

Might be giving some of this a go to see if it helps. Only thing is, I am guessing I might have to configure some of these settings on the host via SSH, as I don't recall seeing them in the client. And if this is strictly at the VM level, I am not certain whether it will be the answer, given that my other VMs show the same behavior.

Another guy mentions IRQ conflicts that resolved his issue, so I may be looking into that as well. Should be fun I am sure.
 
Last edited:

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
RAID Controller is this: MegaRAID SAS 8344ELP

Firmware is way out of date, so updating that now.
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
What software are you using to test the network throughput?

As for the RAID card, what block size are you using on your RAID 5 array? What write caching setting are you using? Are you using adaptive read-ahead? Do you have the read cache enabled? Do you have a battery backup unit?
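
If you can't get into the card's option ROM, and assuming you can get LSI's MegaCli utility installed somewhere it can see the controller, something along these lines will dump stripe size and cache policy for every logical drive (exact flags vary a little between MegaCli versions, so treat this as a sketch):

  MegaCli -LDInfo -Lall -aAll
  MegaCli -AdpAllInfo -aAll

That would answer most of the questions above without a reboot.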
 

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
At this point, I am gauging disk speed with ATTO and network speed with the Windows 7 transfer dialog (i.e., drag and drop from local to a share and read the numbers).

I've done this from Win7 to the NAS and to the XP machine, and the results are identical.

As for the controller, no idea. I hadn't been able to get into its interface, as it seemed to be in disagreement with the board prior to the firmware upgrade. I just now got access to it.

I'm out of gas, so I'll either check in the morning or afternoon tomorrow to see if the RAID firmware helped at all.
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
Network file-system tests are a really bad way to measure network bandwidth because there are too many extra variables that can impact performance. You should use a dedicated network testing tool like iperf that just pushes plain old TCP, without touching the disk or any higher-level application stack.
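
For example, with iperf running in the VM and on your Win7 box, a quick sanity check looks something like this (iperf 2 syntax; the IP is a placeholder):

  iperf -s                              (on the receiving machine)
  iperf -c 192.168.1.50 -t 30 -P 4      (on the sending machine; 30-second test, 4 parallel streams)

Run it once in each direction. If you see ~900Mbit/s both ways, the NICs and vSwitch are fine and the bottleneck is in the file-copy path; if the VM-to-client direction is stuck around 200Mbit/s (your 25MB/s), then it really is a host NIC/driver/vSwitch problem.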

Try to get into the RAID card's BIOS and look at the array details. MegaRAID cards can perform very well under ESXi, but must be properly configured.
 

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
Well, I got it all reconfigured. I may be redoing it again, but for now I have Hyper-V 2012 R2 running. Only thing is, managing it in a workgroup is a complete PAIN. I've not yet gotten the Hyper-V Manager tool to connect at all (I know I'll be missing certain functions due to Win7), so I briefly tried five9, which seemed to work.

As for the RAID, I configured two RAID 0s and spanned them for RAID 10. Trying to build a VM to benchmark.

I'll probably go back to VMware after more testing. Just wanted to try a different platform and see what kind of throughput I get.
 

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
Well, I reconfigured the RAID and had installed Hyper-V 2012 R2, but my remote management options for that purely sucked. So I loaded up ESXi this morning.

Write speeds are now sitting at around 150MB/s, but reads are stuck below 100MB/s. I can definitely feel the improvement in write speed, but would like to see more even numbers. I need to reinstall ESXi due to the Realtek NIC, so I'll rebuild the RAID one more time and make sure I consistently select 1MB. The initial config was weird, as I had to build two separate arrays and then span them; I am used to just selecting RAID 10 and going from there. Meh, I am seeing improvement, so it's at least going in the right direction.
 

mvbighead

Diamond Member
Apr 20, 2009
3,793
1
81
So no details on the RAID settings I asked about?

The allocation units were small. No battery, RAID cache wasn't enabled, etc.

This time around, I rebuilt the RAID array (likely for the last time). For RAID 10, I get no options other than 64k. The subarrays gave me options, but this time I set it all to 64k so everything was the same. I enabled the RAID cache despite having no battery, as I have the server on battery backup, and if I lose this array I won't be all that upset about it.

I did intend to get details on the first array, but was just trying to get some progress going.

At this point, with RAID 10 and cache enabled, I am getting 160MB/s-ish read/write, which is good enough for me. I imagine I'd get 80MB/s-ish write and roughly 240MB/s-ish read if I went back to RAID 5, but I don't need the space and would prefer not to use a parity-based array with SATA.
 

mfenn

Elite Member
Jan 17, 2010
22,400
5
71
www.mfenn.com
The allocation units were small. No battery, RAID cache wasn't enabled, etc.

This time around, I rebuilt the RAID array (likely for the last time). For RAID 10, I get no options other than 64k. The subarrays gave me options, but this time I set it all to 64k so everything was the same. I enabled the RAID cache despite having no battery, as I have the server on battery backup, and if I lose this array I won't be all that upset about it.

I did intend to get details on the first array, but was just trying to get some progress going.

At this point, with RAID 10 and cache enabled, I am getting 160MB/s-ish read/write, which is good enough for me. I imagine I'd get 80MB/s-ish write and roughly 240MB/s-ish read if I went back to RAID 5, but I don't need the space and would prefer not to use a parity-based array with SATA.

So if I'm reading between the lines correctly, your original array write cache was set to write-through? If so, that is probably why your array was slow. ESXi needs write-back caching mode in order to perform well.
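
For what it's worth, you don't have to rebuild the array just to change this. Assuming MegaCli is available, the write policy can be flipped on an existing logical drive with something roughly like the following (flags vary a bit between versions, so treat it as a sketch):

  MegaCli -LDSetProp WB -Lall -aAll             (write-back)
  MegaCli -LDSetProp CachedBadBBU -Lall -aAll   (keep write-back even with no/failed BBU)

The second one is the fast-but-risky option you've effectively chosen; reasonable on a UPS as long as you accept that a hard crash can still lose whatever was in the controller's cache.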

EDIT: Also, parity-based RAID modes are safer than mirror-based modes on SATA drives. With parity-based RAID, the RAID controller can repair a bad sector using the parity information (or recompute the parity if the parity went bad). With a mirror-based RAID, it has no idea which side of the mirror has the correct data, so it has to just guess, resulting in corruption 50% of the time a read error is encountered.
 
Last edited:

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
EDIT: Also, parity-based RAID modes are safer than mirror-based modes on SATA drives. With parity-based RAID, the RAID controller can repair a bad sector using the parity information (or recompute the parity if the parity went bad). With a mirror-based RAID, it has no idea which side of the mirror has the correct data, so it has to just guess, resulting in corruption 50% of the time a read error is encountered.
If the array encounters a read error on drive A, it reads from drive B, and if that read succeeds, that data replaces what was in that location on drive A and is correct (as far as can be known without adding checksums). Barring drive firmware errors, disk controller errors, or software errors higher up the chain (which are what the checksumming would be for), the chance of such corruption should be 1 in 156TB, I'm pretty sure, with two-drive mirrors, and should not increase with more mirrors.

There would only be a 50% chance of corruption in the case where you were already in the rare circumstance of one drive holding silently corrupted data and the other not, because each drive wrote what it was told just fine and read it back just fine (the corruption chance for that sector, or set of sectors, is the same whether or not a read error is encountered).

If the data is wrong but a read error is not encountered, RAID 6 might be able to recover it, since one set of stripes could come out to 0...but only if the parity is verified on every read. With RAID 5, also a parity RAID, identifying the incorrect stripe requires either an implementation with CRCs or other non-parity checking and correction codes, or for the drive to encounter a read error to identify the bad data, which is equally recoverable with a mirror.
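
To make that concrete, here's the single-parity case in miniature (the byte values are made up for illustration). RAID 5 parity is just XOR across the data stripes:

  D1 = 0x3A, D2 = 0x5C, D3 = 0x0F
  P  = D1 xor D2 xor D3 = 0x69

If the drive holding D2 reports a read error, the controller rebuilds it as D2 = P xor D1 xor D3 = 0x5C, exactly as a mirror would rebuild from its partner. But if D2 silently returns a wrong value with no error, the XOR check only says that something in the stripe is inconsistent, not which member, which is why identifying the bad stripe needs extra checks (CRCs, or the second syndrome in RAID 6).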
 
Last edited: