
God I hate Solaris

sourceninja

Diamond Member
So can anyone explain this to me?

We are migrating SANs this weekend. Our first step is to move to our new fiber switch fabric: no card swapping or SAN switching, just unplugging from the old switches and plugging into the new ones. Same zones and everything.

ESX handled this fine, Ubuntu Linux handled this fine, even Novell handled this fine. But not Solaris 9 and 10.

So our Solaris machines decided that they were going to change targets, so instead of /dev/dsk/c4t1d3s7 it's now /dev/dsk/c4t2d3s7. That's not a big deal; we just update the vfstab and reboot to make sure everything mounts.
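A sketch of the vfstab fix, run against a scratch copy rather than the real /etc/vfstab. The entry, the mount point /export/data, and the canonical cNtNdNsN device ordering are assumptions for illustration; on the real box you would back up and edit /etc/vfstab itself.

```shell
# Scratch file standing in for /etc/vfstab; entry is hypothetical.
cat > /tmp/vfstab.demo <<'EOF'
/dev/dsk/c4t1d3s7	/dev/rdsk/c4t1d3s7	/export/data	ufs	2	yes	-
EOF

# Rewrite the old target (t1) to the new one (t2) in both the
# block (/dev/dsk) and raw (/dev/rdsk) device paths in one pass.
sed 's/c4t1d3s7/c4t2d3s7/g' /tmp/vfstab.demo > /tmp/vfstab.new
cat /tmp/vfstab.new
```

On the real system you'd follow this with the reboot described above to confirm everything mounts.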

Only it doesn't. Instead fsck fails on one of the volumes, saying it can't stat the device /dev/rdsk/c4t2d3s7. So I run fsck manually. Sure enough it finds a single superblock error and fixes it. I mount the partition manually, all the data is there, and everything works fine. So I reboot.
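The manual recovery above can be sketched as a dry-run script. DRYRUN=echo just prints the commands so the sketch is harmless anywhere; clear it on a real Solaris box. The mount point /export/data is a hypothetical stand-in.

```shell
# Dry-run sketch of the manual recovery steps; set DRYRUN= (empty) for real.
DRYRUN=echo
RAW=/dev/rdsk/c4t2d3s7   # raw device that fsck complained about
BLK=/dev/dsk/c4t2d3s7    # matching block device for mounting

$DRYRUN fsck -y "$RAW"            # repair the superblock error non-interactively
$DRYRUN mount "$BLK" /export/data # mount by hand to confirm the data is intact
```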

Same problem, only this time fsck finds no errors. I try tons of things, but nothing fixes it; the volume simply will not mount on boot and errors out with the same fsck error.

Now the weird part is the solution. I can't explain this. Another admin decides to test something while I'm getting coffee. He comments out the line in the vfstab and reboots. The server starts fine because the drive is not being mounted. He then uncomments the same exact line and reboots again. This time everything mounts and works just fine.

I can't explain it. I rebooted 2 more times to be sure and it is fixed. Can anyone explain why this worked?
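For the record, the comment-out/uncomment cycle the other admin did can be sketched against a scratch copy of the vfstab (the entry is hypothetical; on the real machine each step was followed by a reboot):

```shell
# Scratch copy of the vfstab entry in question.
cat > /tmp/vfstab.demo2 <<'EOF'
/dev/dsk/c4t2d3s7	/dev/rdsk/c4t2d3s7	/export/data	ufs	2	yes	-
EOF

# Step 1: comment the line out (then reboot -> system comes up, volume unmounted).
sed 's|^/dev/dsk/c4t2d3s7|#&|' /tmp/vfstab.demo2 > /tmp/vfstab.commented

# Step 2: uncomment the exact same line (then reboot -> this time it mounted).
sed 's|^#||' /tmp/vfstab.commented > /tmp/vfstab.restored

# Confirm the restored line is byte-for-byte identical to the original.
diff /tmp/vfstab.demo2 /tmp/vfstab.restored
```

Nothing in the file changes between the two reboots, which is what makes the fix so hard to explain.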
 
I've migrated a ton of hosts to new switches and never seen anything like this, although I added the new switches to the existing zones and then, when I finished moving, removed the old switches from the zone. Performed while everything was up... All devices remained the same...

I don't understand how the target changed, unless you're not using any form of multipathing, like MPxIO or Veritas's vxdmp.

If it did have to change to a new target or SAN controller, how did you find them? reboot -- -r, touch /reconfigure, or devfsadm?

Also, why are your targets in c#t#d#s# just numbers? They should be 5000 #### ... as the target: the WWID of the controller, or of the primary controller it's attached to.
 
Well, the machine in question only has a single fiber card and no redundancy, so we didn't think multipathing was required. I'm going to reevaluate that. That's why our targets are just numbers.

According to the guy from Xiotech who helped set up our switches, the zones were copied over and it should have been transparent. In fact, all the other machines that were not Solaris had no trouble and booted right up. So I never ran any kind of command to find the new target. The system was off, we copied the zones and moved the fiber connection, then booted it up and got the errors described above.
 
devfsadm should be able to find any new disks attached to a system without needing a reconfigure reboot. I have seen issues with Fibre Channel cards depending on which drivers you are using. Sun has a webpage discussing exactly which driver you should be using for which card (sorry, I have it bookmarked at work, not at home).
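For reference, the usual Solaris rediscovery options mentioned in this thread, as a dry-run sketch (DRYRUN=echo prints the commands instead of running them; clear it on a real box, and pick one approach rather than all three):

```shell
DRYRUN=echo
$DRYRUN devfsadm -C -v    # rebuild /dev links online; -C also cleans up stale ones
$DRYRUN touch /reconfigure # flag a device reconfiguration pass on the next boot
$DRYRUN reboot -- -r       # or force a reconfiguration reboot right away
```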
 