- Apr 20, 2013
- 4,307
- 450
- 126
This week I added a second small SAN to my lab for backups/snapshots and rebuilt my primary SAN. I also swapped out my Fibre Channel switch. I learned a few lessons in the process and thought I would share them on the off chance anybody else runs into the same issues.
LUN IDs on Solaris-based storage systems... My primary SAN has been running Solaris 11.2 with napp-it for some time now. It's been great. However, I don't need a GUI, so the smaller footprint of OmniOS appealed to me. Since it's also Solaris-derived (via illumos), I wasn't expecting any surprises. Got everything installed, got Fibre Channel set up. Ran a rescan on my ESXi hosts: no LUNs. I screwed around with it for over 2 hours. Triple-checked my host groups, target groups, WWNs, etc. Completely nuked the LUN and started over twice. I knew the fiber connectivity was working because I could reboot into the HBA's BIOS, do a fiber scan, and see the other SAN. So I sat there digging through manuals trying to find something to explain it. Finally found a little note in one of VMware's manuals that ESXi won't see LUNs with an ID number outside a certain range (by default it only scans LUN IDs 0-255, per the Disk.MaxLUN advanced setting). Compared my Solaris SAN to my OmniOS SAN. Lo and behold, LUN IDs are handled differently. I left the ID on auto on both systems. The Solaris 11.2 box auto-assigns starting at 0: the first LUN gets ID 0, the second gets ID 1, and so on. The OmniOS box uses a random number, 7962 in my case, which is outside the range ESXi will scan for LUNs. Recreated the LUN and manually assigned ID 10 to it. Ran a rescan and BAM, all is right with the world.
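For anyone hitting the same thing, here's a rough sketch of the checks and the fix. The COMSTAR commands use a made-up host group, target group, and LU GUID; substitute your own values:

```shell
# On the ESXi host: confirm the highest LUN ID the host will scan
# (default is 256, i.e. it probes LUN IDs 0-255).
esxcli system settings advanced list -o /Disk/MaxLUN

# On the OmniOS box: drop the auto-numbered view and recreate it with an
# explicit LUN ID inside ESXi's scan range. "esxi-hosts", "fc-targets",
# and the GUID below are placeholders for your own values.
stmfadm remove-view -l 600144F0AABBCCDD0000000000000001 -a
stmfadm add-view -n 10 -h esxi-hosts -t fc-targets 600144F0AABBCCDD0000000000000001
stmfadm list-view -l 600144F0AABBCCDD0000000000000001

# Back on ESXi: rescan all adapters and the LUN should show up.
esxcli storage core adapter rescan --all
```

These commands talk to live SAN hardware, so treat them as a sketch rather than a copy-paste recipe.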
Reflashing LSI (or rebranded) controllers to IT mode... This is a well-known process and very well documented by the guys at ServeTheHome.com. However, I ran into an issue doing it on my system. The motherboard in my primary SAN has an LSI SAS controller onboard. Assuming that might cause issues, I disabled it in the BIOS before doing the reflash. Followed the guide; no errors or any indication of problems. But the controller was still showing as an IBM M1015. I thought maybe I'd missed something and did the reflash again. No joy. Swapped in the second controller, same result. I couldn't find any mention of a write-protect jumper on the card (and logically that would have caused the flash to error out), so I decided to try it in a different board. Worked perfectly. Reinstalled the now-flashed M1015s into the original case, launched the option ROM, and to my surprise it showed 3x 9211s in the system. So, despite the onboard controller being disabled (as far as I could tell), the flash utility had been flashing the onboard controller instead of the add-on card. I don't have anything hooked up to the onboard controller, so no idea if it's still functional. Something to be aware of though.
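In hindsight, the flash utility lets you pick which controller it writes to. A sketch, assuming the DOS sas2flsh utility and the stock IT firmware/BIOS filenames from the ServeTheHome guide (2118it.bin, mptsas2.rom); your filenames and controller index will differ:

```shell
# Enumerate every SAS2 controller the utility can see, with its index.
sas2flsh -listall

# Flash a specific controller by index (-c) instead of letting the utility
# default to the first one it finds -- on boards with an onboard LSI chip,
# "first" may be the onboard controller rather than the add-on card.
sas2flsh -o -c 1 -f 2118it.bin -b mptsas2.rom
```

Running -listall before and after the flash would have made it obvious which chip actually got written.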
Cisco MDS Fibre Channel switches... I was already running a Brocade Silkworm 4100, but due to the complete lack of economical proper rack mounts I decided to replace it with a Cisco MDS 9134. Moved all my SFPs over. No communication between devices. Turns out Cisco configures the MDS line to only work with Cisco-branded SFPs, so you can either buy a tool to reflash your existing SFPs with the Cisco ID or buy Cisco-branded ones. I went with the latter. Replaced all the SFPs and verified the switch now recognizes them and shows them connected at 4Gb/s. Still no communication. Turns out on a Cisco MDS you have to create at least one VSAN and assign your ports to it. By default it tosses them all into the default VSAN, which blocked all connectivity in my case.
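For reference, the fix on the MDS side looks roughly like this from the NX-OS CLI. The VSAN number, name, and port range are hypothetical; adjust for your fabric:

```text
conf t
 vsan database
  vsan 10 name HOMELAB
  vsan 10 interface fc1/1
  vsan 10 interface fc1/2
 interface fc1/1-2
  no shutdown
end
show vsan membership
show interface fc1/1 brief
```

The show commands at the end confirm which VSAN each port landed in and whether the link actually came up.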
Hopefully those tips will help somebody else in the future.
