LUN/Datastore Issue

XavierMace

Diamond Member
Apr 20, 2013
Ran into an issue with my home lab over the weekend that has me a bit perplexed. My storage admin experience is extremely limited so feel free to point out if I missed something obvious.

Setup: two ESXi 5.5 hosts connected to a Solaris/napp-it based SAN running ZFS, via fiber. One host was powered off at the time of the event.

Situation: the ESXi hosts show massive datastore latency (as high as 15 SECONDS).

Details:

The setup had been running fine for some time. On Saturday the UPS randomly powered off for about 20 seconds and then powered back on; the attached devices did not kick over to battery power. The UPS had passed a self-test the night before.

Once the UPS came back up I powered up the SAN. It appeared to boot normally; the disk pool looked fine and the LUNs looked fine.

Then I powered the host back up. It boots off local disks and came up without issue, so I started powering VMs back on. Some VMs booted but were basically unresponsive; others powered themselves off during the boot process. The vmkernel log shows connections to the VMDKs timing out, and the performance tab showed the massive latency spikes mentioned above. So I began troubleshooting as best I could with my limited storage experience.

1) A dd benchmark on the SAN itself shows normal throughput, so the hardware appears to be working properly. (Rough versions of the commands for this check, the scrub in 2), and the VMDK check in 6) are sketched after this list.)

2) Manually kicked off a ZFS scrub, which completed and reported no errors, so the ZFS pool seems to be intact.

3) Recreated one of the VMs from backup on the host's local storage. That VM runs perfectly with no latency, so the host appears to be fine.

4) Powered on the second host, which had been shut down for the summer and therefore should have been unaffected by all of this. It sees the same latency issues as the first one, so again the hosts don't seem to be the problem.

5) Created a new LUN on the SAN and presented it to the hosts, then copied a VM off the old datastore onto the new datastore on the new LUN. No latency, and the VM runs normally. So again the hardware seems fine, and this would seem to indicate the data is at least partially intact.

6) Attempted to run a VMDK check via SSH; it timed out.

7) Dismounted and remounted the datastores; no change.

8) Used a VM recovery tool to browse the contents of the VMs and downloaded files off the file server VM. Slow going, but again the data all appears to be intact.
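
For reference, the checks were along these lines (pool, datastore, and VM names here are placeholders, and the exact commands are from memory):

# 1) raw throughput straight on the SAN
dd if=/dev/zero of=/tank/ddtest bs=1024k count=10000
dd if=/tank/ddtest of=/dev/null bs=1024k

# 2) scrub the pool and check the result
zpool scrub tank
zpool status -v tank

# 6) per-VMDK consistency check from the ESXi shell
vmkfstools -x check /vmfs/volumes/datastore1/someVM/someVM.vmdk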

So, I'm about out of ideas. In addition to the steps above, I did the standard reboots and port enable/disable on the fiber switch, all of which had no effect. Any ideas? I've got backups, so I can nuke the LUNs and recreate them if that's where I'm at; I just don't like running into a problem I can't explain.
 

imagoon

Diamond Member
Feb 19, 2003
Are you exporting NFS or iSCSI to the hosts? If NFS, I would assume there is silent corruption someplace. I don't use ZFS much, but it seems prone to these types of issues when power is lost: data that was in RAM and not yet committed is gone, yet ZFS itself still checks out OK. If you have ZFS snapshots you might be able to roll back and lose only whatever was done since that snap.
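
Something like this, assuming the pool is called tank and the LUN's backing dataset is tank/vmstore with a snapshot from before the outage (all names hypothetical):

zfs list -t snapshot -r tank              # see what snapshots exist
zfs rollback tank/vmstore@before-outage   # discards everything written after that snap (add -r if newer snaps exist)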

If iSCSI

--edit--
Actually, start with vmkernel.log and see what the system is complaining about. You should see errors in the log that detail what the timeouts are, in much more detail than the VMware clients show.


--edit off--
Start with VOMA:

http://pubs.vmware.com/vsphere-51/i...UID-6F991DB5-9AF0-4F9F-809C-B82D3EED7DAF.html

Perform a scan on the datastores themselves. They may be corrupted.

Your problem sounds a lot like the hosts are looking for something (a cluster for a VMDK or whatever), failing, and then timing out. I highly suggest looking at vmkernel.log to see what it is complaining about.

--edit--

VOMA is read-only. It mainly tells you "yup, you're boned, restore from backups" and doesn't attempt any recovery.
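
Something along these lines from the ESXi shell (the device name is a placeholder):

tail -f /var/log/vmkernel.log      # watch the storage errors as they happen
esxcli storage vmfs extent list    # find the device backing the datastore
voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxxxxxxxxxxxxxxx:1    # VOMA metadata check; power the VMs off first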
 

XavierMace

Diamond Member
Apr 20, 2013
Neither; I'm presenting LUNs directly to the hosts over the fiber.

I had assumed there was data loss, but so far all the data I've pulled off the existing datastores and put on the new one has been intact. Admittedly, it's not like I've gone file by file, but I've mounted a few ISOs that were on the affected LUNs, re-registered one VM, and edited a couple of text files. They've all been fine. I'm by no means done moving data, but I'd think I should have seen some evidence of loss by now.

Like I said, the vmkernel log shows it timed out reading the VMDK file.

VOMA also timed out trying to read the VMDK file.
 

imagoon

Diamond Member
Feb 19, 2003
Sorry, I completely glossed over "fiber". I assume you mean Fibre Channel; remember that Ethernet can be fiber also, and 10-gig fiber to the hosts is pretty common.

So if the LUNs are being presented and VOMA is timing out, I would next inspect the SAN's logs. Timeouts should be logged, at least on everything I have worked on. I am not sure where ZFS itself logs errors, but maybe there is info there also.
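
On a Solaris box I would start with something like this (just a sketch):

tail -100 /var/adm/messages    # general system and driver errors
fmdump -eV | tail -50          # fault manager error telemetry (disk/transport faults)
zpool status -v                # pool, device, and data error counters
iostat -xn 5                   # watch asvc_t (service time) per device while the latency is happening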
 

XavierMace

Diamond Member
Apr 20, 2013
No worries. Fair point though; yes, we are talking Fibre Channel: 4x 4Gb connections on the hosts (QLogic HBAs) and the SAN, connected through a Brocade Silkworm 4100.

I can't find any indication on the SAN of an issue. And like I said, I created a new LUN on the same physical storage (same ZFS pool) and it's fine, which would seem to indicate that the ZFS side is good. Or am I missing something here? I'm pretty new to this.
 

imagoon

Diamond Member
Feb 19, 2003

Well, my understanding of ZFS is that it keeps blocks in RAM as a cache, may do delayed writes, etc. It is possible that all the blocks check good but data was lost because it wasn't flushed to disk, and that can cause VMFS issues with block chains and the like. Like I said, check the log. Maybe the host is requesting a block from the LUN and a block error is coming back.
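
It might also be worth checking how writes to that LUN are being acknowledged; roughly, with placeholder names (property names from memory):

zfs get sync tank      # standard honours sync requests; disabled acks even sync writes before they are on stable storage
stmfadm list-lu -v     # COMSTAR LUs; check the "Writeback Cache Disabled" property for the LU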
 

XavierMace

Diamond Member
Apr 20, 2013
So my plan was to spend the week getting all the data off the LUN, then blow it away and recreate it on the weekend. However, as I've been copying the data off all week I've been watching the latency stabilize. The average and idle latency are now back to just about normal, and the high spikes are less frequent and less severe.

Is either ESXi or Solaris doing a background repair, or does this make any sense to you guys?
 

gea

Senior member
Aug 3, 2014
If this is a napp-in-one with a virtualized SAN under ESXi, you need to know:
- ESXi 5.5 is buggy with the e1000 vnic (it can be fixed with a NIC setting; best: use vmxnet3)
- ESXi 5.5 U1 fixes this issue but introduces an NFS stability bug (there is a fix from VMware)

In general:
use NFS (it auto-reconnects, which is not the case with iSCSI)
use ESXi 5.5 U1 with the NFS bugfix

and
for secure sync writes (no data loss on a power loss), enable sync write
and add a fast ZIL log device to your pool, such as an Intel S3700

see
http://napp-it.org/downloads/index_en.html
 

imagoon

Diamond Member
Feb 19, 2003
gea said:
If this is a napp-in-one with a virtualized SAN under ESXi, you need to know:
- ESXi 5.5 is buggy with the e1000 vnic (it can be fixed with a NIC setting; best: use vmxnet3)

This is fixed; patch up the host and VMware Tools.

gea said:
- ESXi 5.5 U1 fixes this issue but introduces an NFS stability bug (there is a fix from VMware)

In general:
use NFS (it auto-reconnects, which is not the case with iSCSI)

Huh? All of my iSCSI devices auto-reconnect. I can reboot a switch and all the connections will reestablish without touching the ESXi hosts.

gea said:
use ESXi 5.5 U1 with the NFS bugfix
and
for secure sync writes (no data loss on a power loss), enable sync write
and add a fast ZIL log device to your pool, such as an Intel S3700

see
http://napp-it.org/downloads/index_en.html

FYI, we use iSCSI because we see nearly a 10% increase in disk IO from the hosts to the same devices. I am a lot less experienced with NFS and ZFS in general because neither really provided us with a benefit. Syncing writes seems logical; it would just kill performance in a lot of cases. I'll keep my 16GB of nonvolatile write cache.
 

XavierMace

Diamond Member
Apr 20, 2013

It's a separate physical SAN, no iSCSI.

Would you happen to have a document on adding a ZIL log device? I've got a spare SSD lying around.

Edit: Latest update. OK, confirmed there's at least some data corruption: the vCenter VM won't boot even after migrating to the new LUN (file system errors). I moved one of the DCs over and it's up and running, but now I'm seeing higher-than-normal latency spikes on the new LUN. Ran HD Tune and the results were abysmal (170MB/s, 15ms average latency). Just for giggles, I ran chkdsk on the DC; no errors found. So it's looking like it's back to doing this the hard way and recreating VMs. At least it's usable on the new LUN for the time being. Just to be safe, I'm creating a new DC from scratch on the new LUN to confirm there are no latency spikes. I have one system already created from scratch on the new LUN, but its disk usage is almost non-existent, so that may not be an accurate check.

If nothing else, it's been a learning experience. Now I just need to get a new UPS.

Edit 2: New DC on the new LUN: 785MB/s, 0.1ms latency. That's more like it. I'll do a DC promo, make it the new primary, and then I can nuke and recreate the other one.
 

gea

Senior member
Aug 3, 2014
Would you happen to have a document on adding a ZIL log device? I've got a spare SSD lying around.

Adding a log device: napp-it menu Pools > Extend
Removing a log device: napp-it menu Disks > Remove

A ZIL log device is used in Solaris when you enable sync write on a filesystem or when you disable the writeback cache on an iSCSI LU. It logs the last ~5 s of writes that are not yet committed to your pool and replays them after a crash to keep filesystem contents valid, for example a VMFS filesystem (ZFS itself is always consistent).

If you decide to use a dedicated ZIL device, you need an SSD or DRAM device with a supercap, very low latency, and high write IOPS. (Among the best on the market are the ZeusRAM, the Intel S3700, or a very modern SLC SSD.)

Do not use any old SSD without a supercap; it's not worth the effort.

Why you should use a dedicated ZIL with ESXi:
it keeps write performance high while offering secure, crash-safe write behaviour. Without one, your write performance can drop to around 10% of what you get when you disable sync / enable writeback, which carries the danger of a corrupted guest filesystem in ESXi after a crash.
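
From the command line the equivalent is roughly this (pool, filesystem, and disk names are only examples):

zpool add tank log c2t1d0          # add the SSD as a dedicated ZIL/slog device
zpool remove tank c2t1d0           # remove it again if needed
zfs set sync=always tank/vmstore   # force all writes on this filesystem through the ZIL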
 

gea

Senior member
Aug 3, 2014
How much space does it need? I've got a spare Agility 4 256GB SSD.

A ZIL needs to hold less than 10 s of whatever write load is delivered over your network. Even on a 10 GbE network, about 8 GB is enough.

This is the reason why "the best of them all", the ZeusRAM, has only 8 GB.
To check whether an SSD is a good ZIL, do a benchmark with sync=always vs. sync=disabled and compare. Some benches that I have done: http://napp-it.org/doc/manuals/benchmarks.pdf
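
A quick and dirty version of such a benchmark (filesystem name is only an example):

zfs set sync=disabled tank/bench
dd if=/dev/zero of=/tank/bench/t1 bs=128k count=40000   # about 5 GB, all writes async
zfs set sync=always tank/bench
dd if=/dev/zero of=/tank/bench/t2 bs=128k count=40000   # every write goes through the ZIL
zfs inherit sync tank/bench                             # restore the default afterwards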
 

XavierMace

Diamond Member
Apr 20, 2013
Good info, I'll check out those benchmarks.

That said, while it's unlikely I'll saturate the fibre connections, I do theoretically have 16Gb/s of bandwidth, so to be safe I'd probably want more than 8GB.
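
Rough math using gea's 10-second rule: 16Gbit/s is about 2GB/s, and ten seconds of that is roughly 20GB, so a 20-30GB slice of the spare SSD should cover even the theoretical worst case.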
 

imagoon

Diamond Member
Feb 19, 2003
XavierMace, overall how well is napp-it working? I personally tend to stay away from anything that is build-it-on-your-own, but I would consider it for UAT/test environments.
 

XavierMace

Diamond Member
Apr 20, 2013
I was running FreeNAS before, but I decided I wanted Fibre Channel, which FreeNAS doesn't support. That left me with basically one free option: napp-it/Solaris.

napp-it itself has been great. The interface isn't the greatest, but they made sure everything is there, and it's primarily just an interface for Solaris' built-in features, so I'm not sure I would call it a build-it-on-your-own thing. Most of my issues have been with Solaris itself. Oracle has their head so far up their ass that if I didn't already have the fibre hardware, I would have just said screw it and kept using FreeNAS. If you want to make a USB install disk, you need to have an existing Solaris install; I never did get it to actually install off USB and ended up having to track down a DVD burner and burn a DVD. Setting up a static IP address is the most ridiculously convoluted process I've ever seen.
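
For the record, the static IP dance ended up being roughly this (from memory; the interface name and addresses are just examples):

netadm enable -p ncp DefaultFixed                        # turn off the automatic network configuration
ipadm create-ip net0
ipadm create-addr -T static -a 192.168.1.50/24 net0/v4
route -p add default 192.168.1.1                         # persistent default route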

If you are in the unlikely situation that you need Fibre Channel support, it certainly gets the job done. If you don't, I'd go with something else. However, that's no fault of napp-it; that's Solaris' fault.
 

imagoon

Diamond Member
Feb 19, 2003

Ah ok. Solaris has always been "its own beast."

What I meant by "build it on your own" was also the hardware. It just sounds like you plugged all these drives in and then used the OS to make ZFS pools.

Even ZFS is something I am a bit wary of. The idea is sound and it generally operates well, but it seems the groups that were responsible for it (Sun and the open-source side) have started to splinter a bit, and I think there are also some Oracle rights issues out there now. It sucks, because if one group had picked it up and locked it in, I could see it being a major player. Sadly, it seems to be a curiosity that is losing ground as the innovations it brought get rolled more and more into other systems.
 

gea

Senior member
Aug 3, 2014
Besides the closed-source ZFS development in Oracle Solaris 11, there is a very active community around the free, open-source fork Illumos, where a great number of former Sun engineers work at companies like Joyent, Delphix, or Nexenta, or are gathered under the roof of openzfs.org.

They all aim to build a common ZFS feature set, not only on Solaris and its derivatives but also on BSD, OS X, and Linux.

If you are looking at a Solaris-based storage server (for me it's hard to find a better option besides expensive systems like NetApp and co.), you may look at OmniOS, as it offers a free Solaris derivative with the newest ZFS features together with a commercial support option. For me (the napp-it developer), it is one of the best current ZFS platforms.
 

XavierMace

Diamond Member
Apr 20, 2013

Correct. I installed Solaris 11 onto an SSD, installed napp-it, then used napp-it for most everything else. I can't fault ZFS or Solaris for this issue since, based on gea's posts, I don't have a log device that would have allowed it to recover, nor do I have snapshots (no disk space); I'm relying on file-system-level backups.

Going with Solaris 11 as the base meant I can run a newer ZFS pool version (34), which supports encryption.

Again, I think napp-it itself is great, other than not being the most user-friendly to get set up and running. Admittedly, with my setup that's like asking for a "How to Build Your Own Car from Scratch for Dummies" book, but no question FreeNAS was easier to set up, and Solaris is probably partly the cause of that. That said, the Solaris/napp-it combo is definitely more robust, and with the FC setup it's blazing fast for a lower-cost setup.
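
For what it's worth, creating an encrypted filesystem there is just something like this (dataset name is an example):

zfs create -o encryption=on tank/secure   # prompts for a passphrase by default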
 

gea

Senior member
Aug 3, 2014
Sun developed ZFS 10 years ago for datacenter and big-data use as an alternative to "use Windows NT everywhere". In that area it is now perfect if you are looking for a free option.

If you are coming from the SOHO/home area you are overwhelmed by the options, even with a web GUI that helps manage some of these features (e.g. the embedded CIFS/NFS server, COMSTAR, Crossbow, snaps, scrubs).

No other OS besides Solaris and its derivatives offers a similar storage feature set "out of the box, managed by the makers of the core OS".
 

imagoon

Diamond Member
Feb 19, 2003
I wouldn't go so far as to call it perfect. I am also pretty sure Solaris 11 isn't free. The issue to me is that Sun released it, so now there are forks, and then Oracle buttoned it all back up. What will tend to happen now is that all the forks will fight to "be the best", like most such projects do, and then there will be Oracle ZFS, which will basically be the industry standard. The forks will likely drift, making support between the projects a pain, etc.

Meanwhile, the ZFS benefits are disappearing because other OSes have most of them out of the box now.
 

XavierMace

Diamond Member
Apr 20, 2013

It's free.

http://www.oracle.com/technetwork/s...11/downloads/index.html?ssSourceSiteId=ocomen

Server 2012's ReFS has come a long way, to be sure, but it's still far inferior to ZFS IMO. I'm not saying ZFS is the be-all-end-all, right-for-everybody option; I'd still stick with old-fashioned RAID with BBWC for anything business-critical.

But I completely agree on the issue with the forks.
 

imagoon

Diamond Member
Feb 19, 2003

Can you patch it? RHEL gives the installs away free also; it just sucks if you ever need to patch it.

If it is free and patchable, I may need to mess with it again. The last time I messed with Solaris was for BMC Server Automation and BMC Remedy stuff.
 

XavierMace

Diamond Member
Apr 20, 2013

I haven't had to do a major revision update, but I recall running through some updates when I first installed it.