Linux networking help needed: cannot SSH to another box

spanky

Lifer
Jun 19, 2001
25,716
4
81
Hi guys,

Please let me know if this isn't the correct forum to post in. After the forum makeover, I get lost sometimes. Also, I'm not a Linux guru by any means, so please excuse my novice terminology.

Here is a rundown of some servers I have at my disposal:

NEC1 - 10.10.10.100
NEC2 - 10.10.10.101
Dell - 10.10.10.102
DB - 10.10.10.22
FYI, 10.10.10.x is a private network with no DHCP (all IPs are statically assigned).

NEC1, NEC2, and Dell are fresh installs of RedHat EL4 ES Update4. When I installed NEC1, it was sitting on another DHCP-enabled LAN (10.1.127.x). After RH was installed, I was able to SSH, SCP, and ping to and from other servers without any issues. Then I brought NEC1 over to the private LAN (10.10.10.x). I changed my "/etc/sysconfig/network-scripts/ifcfg-eth0" file from:

DEVICE=eth0
BOOTPROTO=dhcp
HWADDR=xx:xx:xx:xx:xx:xx
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes


to

DEVICE=eth0
BOOTPROTO=none
BROADCAST=10.10.10.255
HWADDR=xx:xx:xx:xx:xx:xx
IPADDR=10.10.10.101
NETMASK=255.255.255.0
NETWORK=10.10.10.0
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
IPV6INIT=no
PEERDNS=yes


I used the "/etc/sysconfig/network-scripts/ifcfg-eth0" file from the DB server as a template. The DB server has been up on the network for a while, and I have not had any difficulties with it before. Thus, I assumed it's good and used its ifcfg-eth0 file as a template.
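A quick way to sanity-check a hand-edited ifcfg-eth0 is to recompute NETWORK and BROADCAST from IPADDR and NETMASK and compare them with what is in the file. The sketch below does that in plain sh. The function names are made up for this example and are not part of any Red Hat tooling.

```shell
#!/bin/sh
# Sketch: derive NETWORK and BROADCAST from IPADDR and NETMASK.
# Function names are hypothetical, chosen for this example only.

# Split a dotted quad into four space-separated octets.
octets() {
    echo "$1" | tr '.' ' '
}

# network_addr IP MASK -> prints the network address (IP AND MASK)
network_addr() {
    set -- $(octets "$1") $(octets "$2")
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# broadcast_addr IP MASK -> prints the broadcast address
# (network address with all host bits set)
broadcast_addr() {
    set -- $(octets "$1") $(octets "$2")
    echo "$(( ($1 & $5) | (255 - $5) )).$(( ($2 & $6) | (255 - $6) )).$(( ($3 & $7) | (255 - $7) )).$(( ($4 & $8) | (255 - $8) ))"
}
```

With the values from the post, "network_addr 10.10.10.101 255.255.255.0" gives 10.10.10.0 and "broadcast_addr 10.10.10.101 255.255.255.0" gives 10.10.10.255, which matches the NETWORK and BROADCAST lines in the edited file.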

After I changed the ifcfg-eth0 file on NEC1, I did a "service network restart". NEC1 can ping DB, and DB can ping NEC1. NEC1 cannot SSH to DB, but DB can SSH into NEC1. After a long delay, the following is the error msg I get when I try to SSH from NEC1 to DB:

ssh_exchange_identification: read: Connection reset by peer

When I installed RH, I disabled the firewall. I still checked to see if it was up by running "service iptables status". It returns "Firewall is stopped".

I googled the error msg, but I wasn't able to make much of it. I thought maybe I screwed up something with the networking, so I brought up NEC2. I installed RH on that box, and it never touched the 10.1.127.x network. As soon as the OS was installed, I stuck a static IP on him, but everything behaved exactly the same as on NEC1.

Then I remembered that I had installed RH EL4 ES Update4 on the Dell server the day before. This server was installed on the DHCP-enabled 10.1.127.x network. Everything has been working normally. I changed him over to the 10.10.10.x network. I had expected it to also have problems SSH'ing to other servers, but I was wrong. The Dell server actually worked fine. Apparently, there is something different between Dell and NEC1/NEC2, but I just don't know what it is.

So to summarize, here is what's going on:

1. NEC1, NEC2, and Dell all have a fresh install of RH EL4 ES Update4
2. All three servers have statically assigned IPs on a private network
3. NEC1 and NEC2 have network connectivity, but they cannot SSH to other servers. Dell works fine.

I'm trying to figure out why NEC1/NEC2 cannot SSH out. I'm pretty much stumped. Has anyone ever encountered an issue like this before? Can someone give me some suggestions on how I should proceed here? Thanx fellas :sun:



Edit #1 - I originally had this problem last Thursday and Friday. Today, I am still unable to SSH from NEC1/NEC2, but it's not giving me the error "ssh_exchange_identification: read: Connection reset by peer" anymore. Rather, it just hangs at the command line, waiting and waiting... :confused:



Edit #2 - See this pic

In the PuTTY window on the left, Dell is SSH'ing into NEC1.

In the PuTTY window on the right, NEC2 is trying to SSH into NEC1, but it hangs. Does anyone know why that might be?
 

nweaver

Diamond Member
Jan 21, 2001
6,813
1
0
reboot the entire box?

Perhaps the ssh server is still stuck listening on the wrong (DHCP'ed) IP.

netstat -anp | grep LISTEN to see where the ssh daemon is listening.
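To cut the netstat output down to just the SSH port, a small filter along these lines might help. It is only a sketch: the function name is made up, and it assumes the usual Linux "netstat -an" column layout (protocol first, local address fourth, state sixth).

```shell
#!/bin/sh
# Sketch: print the local address of every listening TCP socket bound to
# a given port (default 22). Reads netstat-style output on stdin.
# listeners_on_port is a hypothetical helper name.
listeners_on_port() {
    port="${1:-22}"
    awk -v port="$port" '
        # Only TCP lines whose state column starts with LIST(EN)
        $1 ~ /^tcp/ && $6 ~ /^LIST/ {
            # The port is whatever follows the last colon, which also
            # handles IPv6-style addresses such as :::22
            n = split($4, parts, ":")
            if (parts[n] == port) print $4
        }'
}
```

Usage would be "netstat -anp | listeners_on_port 22". A result of 0.0.0.0:22 or :::22 means the daemon is bound to a wildcard address rather than to one specific (possibly stale) IP.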
 

nweaver

also, unless I missed it, you have not configured a default route OUT of the network (although that shouldn't matter with any local/same subnet machines)
 

spanky

hi nweaver,

Originally posted by: nweaver
reboot the entire box?

Perhaps the ssh server is still stuck listening on the wrong (DHCP'ed) IP.

netstat -anp | grep LISTEN to see where the ssh daemon is listening.

i'll give that a shot, thanx for the suggestion :)



Originally posted by: nweaver
also, unless I missed it, you have not configured a default route OUT of the network (although that shouldn't matter with any local/same subnet machines)

sorry about the confusion. is it absolutely necessary to have a default route out of the network?

i was told to basically have this 10.10.10.x isolated, so that some benchmarking tests could be run. right now, there is a default gateway that is set to some bogus IP ("route add default gw 10.10.10.x").
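For what it's worth, traffic between two 10.10.10.x hosts should use the directly connected route (the one "route -n" lists with gateway 0.0.0.0), so the bogus default gateway is not consulted for it. A sketch for pulling both kinds of route out of "route -n" output follows. The function name is made up, and it assumes the standard eight-column net-tools layout.

```shell
#!/bin/sh
# Sketch: summarise `route -n` output read from stdin.
# route_summary is a hypothetical helper name.
# Assumed columns: Destination Gateway Genmask Flags Metric Ref Use Iface
route_summary() {
    awk '
        # Default route: destination 0.0.0.0 with the G (gateway) flag
        $1 == "0.0.0.0" && $4 ~ /G/ {
            print "default via " $2 " dev " $8
            next
        }
        # Directly connected network: gateway column is 0.0.0.0
        $2 == "0.0.0.0" && $4 ~ /U/ {
            print "connected " $1 " mask " $3 " dev " $8
        }'
}
```

Usage would be "route -n | route_summary". If a "connected 10.10.10.0 mask 255.255.255.0" line shows up, hosts on that subnet are reached directly, regardless of what the default line says.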

 

spanky

Originally posted by: nweaver
reboot the entire box?

Perhaps the ssh server is still stuck listening on the wrong (DHCP'ed) IP.

netstat -anp | grep LISTEN to see where the ssh daemon is listening.


hi again nweaver,

as for what port the ssh daemon is listening... shouldn't that only affect other servers trying to ssh into my box?

the following is what i get after i reboot and run netstat:

[root@expsvr1 ~]# netstat -anp | grep LISTEN
tcp 0 0 0.0.0.0:111 0.0.0.0:* LIST
tcp 0 0 10.10.10.100:1521 0.0.0.0:* LIST
tcp 0 0 127.0.0.1:631 0.0.0.0:* LIST
tcp 0 0 127.0.0.1:25 0.0.0.0:* LIST
tcp 0 0 0.0.0.0:826 0.0.0.0:* LIST
tcp 0 0 :::22 :::* LIST
unix 2 [ ACC ] STREAM LISTENING 7480 3489/xfs /tm
unix 2 [ ACC ] STREAM LISTENING 9771 3642/gdm-binary /tm
unix 2 [ ACC ] STREAM LISTENING 9855 4334/X /tm
unix 2 [ ACC ] STREAM LISTENING 7073 3318/acpid /va
unix 2 [ ACC ] STREAM LISTENING 7615 3544/tnslsnr /va
unix 2 [ ACC ] STREAM LISTENING 7617 3544/tnslsnr /va
unix 2 [ ACC ] STREAM LISTENING 7939 3598/dbus-daemon-1 /va
unix 2 [ ACC ] STREAM LISTENING 7339 3424/gpm /de



Edit - sorry about the formatting up there. here is a screenshot

 

cleverhandle

Diamond Member
Dec 17, 2001
3,566
3
81
Just to clarify, do you have iptables stopped on both machines? It sounds to me like a firewalling issue.

Failing any obvious answers there, I would load up a protocol analyzer like ethereal/wireshark on each end of the connection and try to see what's going on that way.
 

spanky

Originally posted by: cleverhandle
Just to clarify, do you have iptables stopped on both machines? It sounds to me like a firewalling issue.

Failing any obvious answers there, I would load up a protocol analyzer like ethereal/wireshark on each end of the connection and try to see what's going on that way.

yup, the firewall is off on all my RH boxes. i verified this by running "service iptables status", and it returns "Firewall is stopped".

ethereal sounds like a good idea, thanx :thumbsup:
 

nweaver

go into your sshd config and hard set the address there, and then restart your ssh server. It almost looks like your ssh is listening on IPv6 only?
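"Hard setting the address" would normally mean a ListenAddress line in sshd_config, for example "ListenAddress 10.10.10.100". Before changing anything, it may be worth checking what is already set. The sketch below prints the active (uncommented) ListenAddress and Port lines. The function name is made up, and the default path is an assumption based on the usual Red Hat location. Note that on Linux a listener shown as :::22 normally accepts IPv4 connections as well, which fits with the first post saying DB can SSH into NEC1.

```shell
#!/bin/sh
# Sketch: show the active ListenAddress and Port directives in an
# sshd_config-style file. sshd_listen_settings is a hypothetical name.
# The default path is assumed (usual Red Hat location).
sshd_listen_settings() {
    file="${1:-/etc/ssh/sshd_config}"
    # Skip commented-out lines; match keywords case-insensitively
    grep -Ei '^[[:space:]]*(ListenAddress|Port)[[:space:]]' "$file"
}
```

If this prints nothing, sshd is running with its built-in defaults (port 22 on all local addresses). After adding a ListenAddress line, the daemon has to be restarted for it to take effect.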
 

spanky

Originally posted by: nweaver
go into your sshd config and hard set the address there, and then restart your ssh server. It almost looks like your ssh is listening on IPv6 only?

hi nweaver,

should i be editing the ssh_config or the sshd_config file? i am thinking that the problem lies on the client side, because:

1. NEC1 cannot ssh into DB
2. Dell can ssh into DB
3. DB can ssh into NEC1
4. Dell can ssh into NEC1

is there a particular reason why you asked me to edit the sshd_config file?
 

cleverhandle

Is SSH protocol 2 enabled in /etc/ssh/ssh_config for expsvr2? Have you tried copying over the ssh_config from qalabrh to expsvr via flash/floppy (after backing up the original)? It's got to be something client side...
 

spanky

Originally posted by: cleverhandle
Is SSH protocol 2 enabled in /etc/ssh/ssh_config for expsvr2? Have you tried copying over the ssh_config from qalabrh to expsvr via flash/floppy (after backing up the original)? It's got to be something client side...

hi cleverhandle,

here is something kind of strange. on expsvr2 (NEC2), i edited "/etc/ssh/ssh_config" so that only protocol 2 is enabled (removed protocol 1). this did not appear to have any effect on my original problem.

however... if i go to any server that NEC1/NEC2 could not ssh to (ie - DB) and edit "/etc/ssh/sshd_config" to only use protocol 2, then i am able to ssh in from NEC1/NEC2. this is really strange to me, because i don't understand how this is a server side issue. but... it works, so i am befuddled.

basically, here are the steps that i took:

1. on any server that i am failing to ssh into (from NEC1/NEC2), edit "/etc/ssh/sshd_config" to only use protocol 2.
2. restart sshd

something else that i don't understand... if i leave "/etc/ssh/sshd_config" as default on the server side and want to force protocol 2 from the client side, i should be able to do so with "ssh -2 10.10.10.22". but when i run this command, it still fails (same problem as in the original post). so right now, the only way i know how to get around this issue is to only enable protocol 2 support on the server side. does this make sense to anyone?
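The two steps above (limit the server to protocol 2, then restart sshd) could be scripted roughly like this. It is a sketch, not a tested procedure: the function name is made up, the config file is passed in as an argument, and it simply drops any existing Protocol line (commented or not) and appends "Protocol 2", keeping a .bak copy of the original.

```shell
#!/bin/sh
# Sketch: make an sshd_config-style file offer SSH protocol 2 only.
# force_protocol2 is a hypothetical helper name.
# Usage: force_protocol2 FILE
force_protocol2() {
    file="$1"
    # Keep a backup of the original next to it
    cp "$file" "$file.bak" || return 1
    {
        # Drop every existing Protocol line, commented out or not
        sed '/^[[:space:]]*#*[[:space:]]*Protocol[[:space:]]/d' "$file.bak"
        # Then state the setting explicitly
        echo 'Protocol 2'
    } > "$file"
}
```

On a Red Hat box the call would presumably be "force_protocol2 /etc/ssh/sshd_config" followed by "service sshd restart". Running "ssh -v -2 10.10.10.22" from the client afterwards prints debug lines, including the protocol version the server reports, which can help confirm the change took effect.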
 

cleverhandle

It does seem rather odd. There's supposed to be an auto-negotiation kind of process going on - the connection should use the highest protocol version the two endpoints mutually support. It sounds like in your case something's going wrong in the negotiation process. Could be a bug due to different versions perhaps. Also, is this OpenSSH all around? Maybe one of the machines is using the proprietary SSH for some reason?

In any event, it's rather a moot point. There's no good reason to leave SSH protocol 1 enabled on a server anyway. So just set all your servers to use protocol 2 only and be done with it.
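For reference, the negotiation starts with each side sending an identification string of the form SSH-protoversion-softwareversion. A server that accepts both protocols announces version 1.99, a protocol-2-only server announces 2.0, and a protocol-1-only server announces a 1.x version. The sketch below classifies such a string. The function name is made up, and the software names used in the usage note are just sample values.

```shell
#!/bin/sh
# Sketch: report which SSH protocol versions an identification string
# advertises. classify_banner is a hypothetical helper name.
# Usage: classify_banner "SSH-1.99-OpenSSH_3.9p1"
classify_banner() {
    case "$1" in
        SSH-2.0-*)  echo "protocol 2 only" ;;
        # 1.99 is the compatibility value: server speaks both 1 and 2
        SSH-1.99-*) echo "protocol 1 and 2" ;;
        SSH-1.*)    echo "protocol 1 only" ;;
        *)          echo "not an SSH identification string" ;;
    esac
}
```

The string to feed it is the first line a server prints when you connect to port 22, for example with "telnet 10.10.10.22 22". If that line never arrives at all, the connection is stalling before version negotiation even begins, which points away from the protocol setting itself.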