
NFS problem

Armitage

We recently replaced the head node of our compute cluster and are now seeing a strange NFS problem. Transfers from the head node (the NFS server) to the nodes are very fast, but transfers from the nodes back to the NFS server are very slow. The following examples are for a 90MB file, run while the cluster was idle. The network is switched gigabit over copper.

cp over NFS, server -> node:
real 0m0.844s
user 0m0.012s
sys 0m0.821s

cp over NFS, node -> server:
real 11m21.770s
user 0m0.013s
sys 0m0.938s

To make sure it wasn't a network or disk issue I used rcp to do the same tests:

rcp, server -> node:
real 0m0.833s
user 0m0.006s
sys 0m0.414s

rcp, node->server:
real 0m0.888s
user 0m0.005s
sys 0m0.157s

I got similar results with scp, so it seems the NICs, cables, switch, and disks are OK. I've also done the same tests from different nodes, with similar results.
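To put the timings above in perspective, here is a small sketch that converts the `real` values into throughput for the 90MB file. The helper names (`parse_real`, `throughput_mb_s`) are made up for illustration, and it assumes the times are in the usual `XmY.YYYs` format printed by `time`.

```python
import re


def parse_real(value: str) -> float:
    """Convert a `time` value such as '11m21.770s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time format: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def throughput_mb_s(size_mb: float, real: str) -> float:
    """Average throughput in MB/s for a transfer of size_mb megabytes."""
    return size_mb / parse_real(real)


FILE_MB = 90  # file size stated in the post

# `real` times copied from the four tests above
timings = {
    "cp over NFS, server -> node": "0m0.844s",
    "cp over NFS, node -> server": "11m21.770s",
    "rcp, server -> node": "0m0.833s",
    "rcp, node -> server": "0m0.888s",
}

for label, real in timings.items():
    print(f"{label}: {throughput_mb_s(FILE_MB, real):.2f} MB/s")
```

The NFS write from node to server works out to roughly 0.13 MB/s, while the other three transfers all land at around 100 MB/s or better, so the slow direction is about 800 times slower than the fast one.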

Here is the node config:
fstab:
192.168.17.10:/home /home nfs rw,hard,sync,rsize=8192,wsize=8192,bg 0 0
192.168.17.10:/data /data nfs rw,sync,hard,rsize=8192,wsize=8192,bg 0 0
kernel:
X86: 2.6.11.4-21.9-smp #1 SMP Fri Aug 19 11:58:59 UTC 2005 i686 i686 i386 GNU/Linux
X86_64: 2.6.11.4-21.10-smp #1 SMP Tue Nov 29 14:32:49 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
The only NFS package I found installed is nfs-utils-1.0.7-3

Here is the server config:
/etc/exports:
/data 192.168.17.0/255.255.255.0(rw,no_root_squash,sync)
/home 192.168.17.0/255.255.255.0(rw,no_root_squash,sync)
kernel:
2.6.11.4-21.9-smp #1 SMP Fri Aug 19 11:58:59 UTC 2005 x86_64 x86_64 x86_64 GNU/Linux
nfs-utils is the same revision also.

The head node is a dual Opteron with SCSI RAID5 and onboard GigE NICs. Some of the nodes are dual Xeons and others are dual Opterons, both with IDE drives. The same problem exists on both node types. The OS is SuSE, but I don't know exactly which version.

Any ideas?
 
Try changing sync to async. It'll be faster, at the expense of data integrity. The scp test you did more accurately reflects async behavior, because the file written on the server still goes through the usual buffered write strategy; it's not synchronously written directly to disc. Depending on what RAID controller you have, it could also be set up to be totally write-through when using NFS. That is good for integrity but bad for performance. You should be able to trade one for the other.
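As a rough local analogy for the sync/async trade-off (this is not NFS itself, just an illustration on a local disk), the sketch below writes the same data in 8192-byte blocks twice: once through the normal buffered path, and once forcing each block to disk with `os.fsync`. The function name `write_file` and the 4 MB test size are arbitrary choices for the example, and the actual timings will depend heavily on your disk and controller.

```python
import os
import tempfile
import time

BLOCK = 8192  # same size as wsize=8192 in the fstab above


def write_file(path: str, total_bytes: int, sync_each_block: bool) -> float:
    """Write total_bytes in BLOCK-sized chunks and return elapsed seconds.

    With sync_each_block=True, every chunk is flushed and fsync'ed before
    the next one is written, similar in spirit to a synchronous write.
    """
    chunk = b"\0" * BLOCK
    start = time.perf_counter()
    with open(path, "wb") as f:
        written = 0
        while written < total_bytes:
            n = min(BLOCK, total_bytes - written)
            f.write(chunk[:n])
            written += n
            if sync_each_block:
                f.flush()
                os.fsync(f.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    size = 4 * 1024 * 1024  # 4 MB, small enough to run quickly
    with tempfile.TemporaryDirectory() as tmp:
        buffered = write_file(os.path.join(tmp, "buffered.bin"), size, False)
        synced = write_file(os.path.join(tmp, "synced.bin"), size, True)
    print(f"buffered: {buffered:.3f}s, fsync per block: {synced:.3f}s")
```

On most hardware the fsync-per-block run is noticeably slower, which is the same kind of cost the sync option imposes, with a network round trip added on top.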

If you use NFSv3 (the default these days), rsize and wsize are unnecessary and possibly counterproductive.
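To show what the suggested changes would look like when applied to the mount options from the fstab above, here is a small sketch. The helper `relax_nfs_options` is made up for this example and is plain string handling; it swaps sync for async and drops rsize/wsize, on the assumption that you want to try both suggestions at once on the client side.

```python
def relax_nfs_options(options: str) -> str:
    """Return the option string with sync -> async and rsize/wsize removed."""
    kept = []
    for opt in options.split(","):
        if opt.startswith(("rsize=", "wsize=")):
            continue  # let the client and server negotiate transfer sizes
        kept.append("async" if opt == "sync" else opt)
    return ",".join(kept)


# Option field from the /home line of the node's fstab
print(relax_nfs_options("rw,hard,sync,rsize=8192,wsize=8192,bg"))
# prints: rw,hard,async,bg
```

The resulting fstab line would then read `192.168.17.10:/home /home nfs rw,hard,async,bg 0 0`. It is probably worth changing one thing at a time and re-running the cp test, so you can tell which option made the difference.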

In general, with NFS, writes ARE slower than reads. The gap should not be as wide as you are experiencing, however.
 