
God I hate Solaris

sourceninja

Diamond Member
So can anyone explain this to me?

We are migrating SANs this weekend. Our first step is to move to our new fiber switch fabric: no card swapping or SAN switching, just unplugging from the old switches and plugging into the new ones. Same zones and everything.

ESX handled this fine, Ubuntu Linux handled this fine, even Novell handled this fine. But not Solaris 9 and 10.

So our Solaris machines decided that they were going to change targets, so instead of /dev/dsk/c4t1d3s7 it's now /dev/dsk/c4t2d3s7. That's not a big deal; we just update the vfstab and reboot to make sure everything mounts.
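A sketch of the vfstab fix, run against a scratch copy rather than the real /etc/vfstab. The entry, the mount point /export/data, and the canonical cNtNdNsN device ordering are assumptions for illustration; on the real box you would back up and edit /etc/vfstab itself.

```shell
# Scratch file standing in for /etc/vfstab; entry is hypothetical.
cat > /tmp/vfstab.demo <<'EOF'
/dev/dsk/c4t1d3s7	/dev/rdsk/c4t1d3s7	/export/data	ufs	2	yes	-
EOF

# Rewrite the old target (t1) to the new one (t2) in both the
# block (/dev/dsk) and raw (/dev/rdsk) device paths in one pass.
sed 's/c4t1d3s7/c4t2d3s7/g' /tmp/vfstab.demo > /tmp/vfstab.new
cat /tmp/vfstab.new
```

On the real system you'd follow this with the reboot described above to confirm everything mounts.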

Only it doesn't. Instead fsck fails on one of the volumes, saying it can't stat the device /dev/rdsk/c4t2d3s7. So I run fsck manually. Sure enough it finds a single superblock error and fixes it. I mount the partition manually, all the data is there, and everything works fine. So I reboot.
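The manual recovery above can be sketched as a dry-run script. DRYRUN=echo just prints the commands so the sketch is harmless anywhere; clear it on a real Solaris box. The mount point /export/data is a hypothetical stand-in.

```shell
# Dry-run sketch of the manual recovery steps; set DRYRUN= (empty) for real.
DRYRUN=echo
RAW=/dev/rdsk/c4t2d3s7   # raw device that fsck complained about
BLK=/dev/dsk/c4t2d3s7    # matching block device for mounting

$DRYRUN fsck -y "$RAW"            # repair the superblock error non-interactively
$DRYRUN mount "$BLK" /export/data # mount by hand to confirm the data is intact
```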

Same problem, only this time fsck finds no errors. I try tons of things, but nothing fixes it; the volume simply will not mount on boot and errors out with the same fsck error.

Now the weird part is the solution. I can't explain this. Another admin decides to test something while I'm getting coffee. He comments out the line in the vfstab and reboots. The server starts fine because the drive is not being mounted. He then uncomments the same exact line and reboots again. This time everything mounts and works just fine.

I can't explain it. I rebooted 2 more times to be sure and it is fixed. Can anyone explain why this worked?
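For the record, the comment-out/uncomment cycle the other admin did can be sketched against a scratch copy of the vfstab (the entry is hypothetical; on the real machine each step was followed by a reboot):

```shell
# Scratch copy of the vfstab entry in question.
cat > /tmp/vfstab.demo2 <<'EOF'
/dev/dsk/c4t2d3s7	/dev/rdsk/c4t2d3s7	/export/data	ufs	2	yes	-
EOF

# Step 1: comment the line out (then reboot -> system comes up, volume unmounted).
sed 's|^/dev/dsk/c4t2d3s7|#&|' /tmp/vfstab.demo2 > /tmp/vfstab.commented

# Step 2: uncomment the exact same line (then reboot -> this time it mounted).
sed 's|^#||' /tmp/vfstab.commented > /tmp/vfstab.restored

# Confirm the restored line is byte-for-byte identical to the original.
diff /tmp/vfstab.demo2 /tmp/vfstab.restored
```

Nothing in the file changes between the two reboots, which is what makes the fix so hard to explain.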
 
I've migrated a ton of hosts to new switches and never seen anything like this, although I added the new switches to the existing zones and then, when I finished moving, removed the old switches from the zone. Performed while everything was up... All devices remained the same...

I don't understand how the target changed, unless you're not using any form of multipathing, like MPxIO or Veritas's vxdmp.

If it did have to change to a new target or SAN controller, how did you find them? reboot -- -r, touch /reconfigure, or devfsadm?

Also, why are your targets in c#t#d#s# just numbers? They should be 5000 #### ... as the target: the WWID of the controller, or of the primary controller it's attached to.
 
Well, the machine in question only has a single fiber card and no redundancy, so we didn't think multipathing was required. I'm going to reevaluate that. That's why our targets are just numbers.

According to the guy from Xiotech who helped set up our switches, the zones were copied over and it should have been transparent. In fact, all the other machines that were not Solaris had no trouble and booted right up. So I never ran any kind of command to find the new target. The system was off, we copied the zones and moved the fiber connection, then booted it up and got the errors described above.
 
devfsadm should be able to find any new disks attached to a system without needing a reconfigure reboot. I have seen issues with Fibre Channel cards depending on which drivers you are using. Sun has a webpage discussing exactly which driver you should be using for which card (sorry, I have it bookmarked at work, not at home).
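For reference, the usual Solaris rediscovery options mentioned in this thread, as a dry-run sketch (DRYRUN=echo prints the commands instead of running them; clear it on a real box, and pick one approach rather than all three):

```shell
DRYRUN=echo
$DRYRUN devfsadm -C -v    # rebuild /dev links online; -C also cleans up stale ones
$DRYRUN touch /reconfigure # flag a device reconfiguration pass on the next boot
$DRYRUN reboot -- -r       # or force a reconfiguration reboot right away
```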
 