We recently replaced the head node of our compute cluster and are now seeing a strange NFS problem. Transfers from the head node (the NFS server) to the compute nodes are very fast, but transfers from the nodes back to the NFS server are very slow. The following timings are for a 90 MB file, taken while the cluster was idle. The network is switched gigabit over copper.
cp over NFS, server -> node:
real 0m0.844s
user 0m0.012s
sys 0m0.821s
cp over NFS, node -> server:
real 11m21.770s
user 0m0.013s
sys 0m0.938s
To make sure it wasn't a network or disk issue I used rcp to do the same tests:
rcp, server -> node:
real 0m0.833s
user 0m0.006s
sys 0m0.414s
rcp, node -> server:
real 0m0.888s
user 0m0.005s
sys 0m0.157s
I got similar results for scp, so the NICs, cables, switch, and disks appear to be OK. I've also run the same tests from different nodes with similar results.
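To put the timings in perspective, here is a rough sketch that converts the `real` values from `time` into throughput. It assumes the file is exactly 90 MB (the true size was not measured more precisely), and the helper names `parse_real` and `throughput_mb_s` are made up for this illustration.

```python
import re

FILE_SIZE_MB = 90  # assumption: the test file is treated as exactly 90 MB


def parse_real(real):
    """Convert a `time`-style value such as '11m21.770s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", real)
    if match is None:
        raise ValueError(f"unrecognised time format: {real!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def throughput_mb_s(size_mb, real):
    """Average throughput in MB/s for a transfer that took `real` wall time."""
    return size_mb / parse_real(real)


# The `real` timings reported in this post.
timings = {
    "cp over NFS, server -> node": "0m0.844s",
    "cp over NFS, node -> server": "11m21.770s",
    "rcp, server -> node": "0m0.833s",
    "rcp, node -> server": "0m0.888s",
}

if __name__ == "__main__":
    for label, real in timings.items():
        rate = throughput_mb_s(FILE_SIZE_MB, real)
        print(f"{label}: {rate:.2f} MB/s")
```

Under that assumption, the three fast transfers come out at roughly 100 MB/s or a little more, which is about what gigabit Ethernet can deliver, while the NFS write from node to server is well under 1 MB/s.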
Here is the node config:
fstab:
192.168.17.10:/home /home nfs rw,hard,sync,rsize=8192,wsize=8192,bg 0 0
192.168.17.10:/data /data nfs rw,sync,hard,rsize=8192,wsize=8192,bg 0 0
kernel:
X86: 2.6.11.4-21.9-smp #1 SMP Fri Aug 19 11:58:59 UTC 2005 i686 i686 i386 GNU/Linux
X86_64: 2.6.11.4-21.10-smp #1 SMP Tue Nov 29 14:32:49 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
The only NFS package I found installed is nfs-utils-1.0.7-3.
Here is the server config:
/etc/exports:
/data 192.168.17.0/255.255.255.0(rw,no_root_squash,sync)
/home 192.168.17.0/255.255.255.0(rw,no_root_squash,sync)
kernel:
2.6.11.4-21.9-smp #1 SMP Fri Aug 19 11:58:59 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
nfs-utils is the same revision as on the nodes.
The head node is a dual Opteron with SCSI RAID 5 and onboard GigE NICs. Some of the nodes are dual Xeons and others are dual Opterons, both types with IDE drives. The same problem exists on both node types. The OS is SuSE, but I don't know exactly which version.
Any ideas?