K12 Public High School switch issues

Lexxon

Member
Nov 22, 2004
Hi all,

Probably not the most professional or even the best place to ask this, but I know there's some good minds on these forums.

I work part-time alongside one full-time employee, and together we manage an entire public school district. I'm still in college, and neither of us is fully trained, but we manage with what we've got.

However, the most recent issue has us stumped.

At the high school, we've always had an intermittent issue with some of the network switches. These are old Bay Networks BayStack switches, ranging from 350s to 450s (maybe even some 310s). For the most part they still work fine.

Recently, however, after a day or especially a weekend, any number of these switches will simply stop working until they are power cycled. After a power cycle the switches work perfectly fine, but some of them do not last even a full 24 hours before they stop forwarding traffic. The lights on the front remain lit but show no activity. We use a program called "The Dude" to see which switches are down, and use the network map to work out which switches to power cycle. After that everything works for the rest of the day.

We've switched around where the switch links go, etc., but it doesn't seem to make a difference. There are also fiber switches that connect each closet together (albeit only a 100-200Mbps fiber link, older tech), but these switches don't seem to be the issue. Either one of the core switches will go down or a different switch in the closets.

Is this just a sign of hardware failure, or maybe an overload of traffic? We have recently finally upgraded our internet connection from a shared T1 line with ~1500 users, to a dedicated Comcast Cable line for each building, connected through VPN. The issues have magnified since this upgrade, although they were still around before it.

Thanks in advance for any help!
 

cprince

Senior member
May 8, 2007
hmmm. It could be that there are computers on the network with bad NICs that cause the switches to error out. Could you log into the switch to see the specific error message?
 

Lexxon

Member
Nov 22, 2004
I look occasionally, but I rarely get any errors that are worth anything. I'll be looking more and more, though. It seems after a power cycle, though, that a lot of the cache is cleared. The one I just tried was a BayStack 310.

Are there any specific tools we could run during the day to track down bad NICs? We've tried a few with little luck, but we have a TON of old machines, nearing the 10 year mark, so bad hardware is very likely.
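One low-tech way to narrow down a bad NIC, sketched under assumptions: take two snapshots of each port's input-error counter (the standard IF-MIB object is ifInErrors) a few hours apart and see which ports climbed. How the numbers get collected (an SNMP walk, the console port statistics screen) depends on the switch and is left out here. The function names, port numbers, and threshold are made up for illustration.

```python
# Sketch: compare two snapshots of per-port input-error counters.
# Snapshots are plain dicts {port_number: error_count}; collecting them
# from the switch is NOT shown and depends on the hardware.

COUNTER32_MOD = 2 ** 32  # ifInErrors is a 32-bit counter and can wrap


def error_deltas(before, after):
    """Return {port: errors added between the two snapshots}."""
    deltas = {}
    for port, new in after.items():
        old = before.get(port)
        if old is None:
            continue  # port missing from the first snapshot, skip it
        # Modulo arithmetic keeps the delta correct across a counter wrap.
        deltas[port] = (new - old) % COUNTER32_MOD
    return deltas


def suspect_ports(before, after, threshold=100):
    """Ports whose error count grew by at least `threshold`, worst first."""
    deltas = error_deltas(before, after)
    flagged = [port for port, delta in deltas.items() if delta >= threshold]
    return sorted(flagged, key=lambda port: -deltas[port])
```

Both snapshots need to come from the same uptime window: a power cycle resets the counters, which this sketch would misread as a huge jump.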
 

spidey07

No Lifer
Aug 4, 2000
What immediately comes to mind is spanning tree. What is your network diameter, and how big is your broadcast domain? Meaning: how many switches are there, how are they connected, and is there any "daisy chaining," where a single access closet has multiple switches linked together?

Also has any thought been given to the root bridge of your network? Some good sound design principles could likely fix your problems. A lot of this can't be done in a forum but we can try to help.

Also, I'd document all the switches and what software they are running, and then search or call Bay Networks to see if you've got a bug.
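To make the root bridge question concrete, here is a minimal sketch of how classic 802.1D spanning tree picks the root: the bridge with the lowest bridge ID wins, where the ID is the configured priority followed by the switch's MAC address. With every switch left at the default priority of 32768, the lowest MAC wins, which is often just the oldest box on the network. The switch names and MAC addresses below are hypothetical.

```python
# Sketch of 802.1D root bridge election: lowest (priority, MAC) wins.
# Switch names and MAC addresses are invented for illustration.

DEFAULT_PRIORITY = 32768  # 802.1D default bridge priority


def bridge_id(priority, mac):
    """Return a sortable bridge ID: priority first, then MAC as an integer."""
    return (priority, int(mac.replace(":", ""), 16))


def elect_root(bridges):
    """bridges maps name -> (priority, mac); return the name of the root."""
    return min(bridges, key=lambda name: bridge_id(*bridges[name]))


switches = {
    "core": (DEFAULT_PRIORITY, "00:80:2d:aa:00:10"),
    "closet-a-old": (DEFAULT_PRIORITY, "00:00:81:00:00:01"),
    "closet-b": (DEFAULT_PRIORITY, "00:80:2d:bb:00:20"),
}

# All defaults: the lowest MAC wins, here the old closet switch.
root_with_defaults = elect_root(switches)

# Lowering the core's priority makes it the root regardless of MAC.
switches["core"] = (4096, "00:80:2d:aa:00:10")
root_after_tuning = elect_root(switches)
```

Whether these particular BayStack models let you set the bridge priority is something to confirm in their configuration menus.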
 

jlazzaro

Golden Member
May 6, 2004
i don't see bad NICs causing any of the problems you mentioned above...

when these switches stop forwarding traffic, are you able to console into them? are you polling them for any useful SNMP information, and are you logging to a central syslog server? are you sure your network is completely loop-free?

as i'm sure you're aware, those switches are ANCIENT and have always had problems...
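For the central syslog idea, a minimal sketch of a UDP receiver that appends whatever the switches send to one file, so the last messages before a hang survive the power cycle. This assumes the switch firmware can send syslog at all, which is not confirmed for these models. The file name and function names are placeholders. The priority parsing follows the classic BSD syslog format, where the number in angle brackets is facility * 8 + severity.

```python
# Sketch of a tiny central syslog collector (UDP, BSD-style messages).
# Assumes the switches can be pointed at this host; not verified here.
import re
import socket
from datetime import datetime

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_pri(line):
    """Split '<PRI>rest' into (facility, severity, rest)."""
    match = PRI_RE.match(line)
    if not match:
        return None, None, line  # no priority prefix, pass through as-is
    pri = int(match.group(1))
    return pri // 8, pri % 8, match.group(2)


def serve(log_path="switches.log", host="0.0.0.0", port=514):
    """Receive syslog datagrams forever and append them to log_path."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))  # port 514 usually needs admin rights
    with open(log_path, "a") as log:
        while True:
            data, (source_ip, _) = sock.recvfrom(4096)
            text = data.decode("ascii", "replace")
            facility, severity, message = parse_pri(text)
            stamp = datetime.now().isoformat(timespec="seconds")
            log.write(f"{stamp} {source_ip} facility={facility} "
                      f"severity={severity} {message.strip()}\n")
            log.flush()  # keep the file current in case the host dies too
```

Calling serve() on a machine that stays up, then reading the tail of the file after the next outage, shows what each switch reported just before it went quiet.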
 

Lexxon

Member
Nov 22, 2004
Bay Networks got gobbled up by Nortel some time ago, and I don't know how much they support the old switches. They're well out of warranty, I'm sure.

There are some spanning tree options in the switches, but this was all set up while I was still in middle school (!), and all the original network architects are long gone. We have a new firewall we just put in, but the root switches have been the same for quite a while.

Generally the closets are 3-5 switches deep, daisy chained except for the fiber link to the server room. So generally an entire closet will simply go down, likely due to the fiber link switch in the closet going out (or, god forbid, the main fiber switch).
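To put a number on the network diameter for a layout like this, a rough sketch: model the switches as a graph and count the switches on the longest shortest path between any two of them. The topology below is invented (a core switch, two closets, each with a fiber switch and four daisy-chained switches behind it), not the real building. The classic 802.1D default timers are commonly described as sized for a diameter of about seven bridges, so a result well above that is worth a second look.

```python
# Sketch: spanning-tree "diameter" of a switch topology, counted in switches.
# The link list is a made-up example, not the actual network.
from collections import deque


def build_graph(links):
    """Turn a list of (switch_a, switch_b) links into an adjacency map."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hops_from(graph, start):
    """Breadth-first search: hop count from start to every reachable switch."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def diameter_in_switches(graph):
    """Switches on the longest shortest path (hops + 1)."""
    longest_hops = max(max(hops_from(graph, s).values()) for s in graph)
    return longest_hops + 1


links = [("core", "fiber-a"), ("fiber-a", "a1"), ("a1", "a2"),
         ("a2", "a3"), ("a3", "a4"),
         ("core", "fiber-b"), ("fiber-b", "b1"), ("b1", "b2"),
         ("b2", "b3"), ("b3", "b4")]
example_diameter = diameter_in_switches(build_graph(links))
```

In this example the worst path runs from the last switch in one closet, up through the core, and down to the last switch in the other closet, which is why deep daisy chains add up so quickly.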

It's a good challenge for a learning network admin such as myself, but times like this I wish I knew more about how this was built and how everything works. We do employ a consultant and could bounce some ideas off him (at a price).

In a perfect world we'd upgrade everything to gigabit regardless (a lot of our links are 10Mbps!), but in a city budget, that's not going to happen anytime soon.

Thanks for the tips as always!
 

spikespiegal

Golden Member
Oct 10, 2005
Since you don't have a lot of resources to troubleshoot this, you might consider upgrading a single core switch in each closet as a stopgap until you can afford to upgrade the whole mess. Or simply upgrade one core switch in the buggiest closet just to see if it fixes the problem. That one switch upgrade might be enough to isolate downstream events.

I've used the 'one switch fix' in areas where even passive hubs were daisy chained to death and budgets were extremely limited.
 

WavingBlue75

Member
Feb 4, 2008
Get hold of a port sniffer and see if a certain type of traffic is creating havoc. I hate to say this, but with the age of your equipment you could have multiple equipment failures, and you probably need to plan for significant replacement of your infrastructure.
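As one example of what to look for in a capture, a small sketch that measures how much of the traffic is broadcast or multicast, since a very high share is a common sign of a loop or a misbehaving host. It assumes the destination MAC addresses have already been exported from the sniffer into a list; the sample addresses are illustrative, apart from ff:ff:ff:ff:ff:ff (broadcast) and 01:80:c2:00:00:00 (the address spanning tree BPDUs are sent to).

```python
# Sketch: share of broadcast/multicast frames in a list of destination MACs.
# Getting the MACs out of the capture tool is not shown.


def _normalize(mac):
    """Lowercase and use ':' separators so both common formats compare equal."""
    return mac.lower().replace("-", ":")


def is_broadcast(mac):
    """True for the all-ones Ethernet broadcast address."""
    return _normalize(mac) == "ff:ff:ff:ff:ff:ff"


def is_multicast(mac):
    """True if the group bit (lowest bit of the first octet) is set.

    Broadcast is a special case of this, so it also returns True for it.
    """
    first_octet = int(_normalize(mac).split(":")[0], 16)
    return bool(first_octet & 0x01)


def flood_share(dest_macs):
    """Fraction of frames addressed to broadcast or multicast destinations."""
    if not dest_macs:
        return 0.0
    flooded = sum(1 for mac in dest_macs if is_multicast(mac))
    return flooded / len(dest_macs)
```

What counts as "too high" depends on the network, so comparing a capture from a healthy morning against one taken just before a switch hangs is more telling than any fixed threshold.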