What would cause the load average to skyrocket on a server even though total CPU usage is very low? Example:
Code:
top - 22:14:46 up 500 days, 9:45, 2 users, load average: 7.04, 8.96, 8.37
Tasks: 395 total, 1 running, 394 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 1.4%sy, 0.0%ni, 98.3%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8031548k total, 7902432k used, 129116k free, 186680k buffers
Swap: 8175612k total, 5060k used, 8170552k free, 6841072k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
20923 p2puser   20   0  102m 2096  984 S  1.7  0.0   1:26.39 sshd
11686 root      20   0  3932  236  192 S  1.3  0.0   6603:02 netresolv
 2458 root      20   0     0    0    0 S  0.3  0.0 316:43.62 kondemand/0
 2465 root      20   0     0    0    0 S  0.3  0.0  85:49.09 kondemand/7
20924 p2puser   20   0 57492 2560 1652 S  0.3  0.0   0:21.98 sftp-server
28677 root      20   0 15300 1496  944 R  0.3  0.0   0:00.21 top
    1 root      20   0 19356  536  292 S  0.0  0.0   0:01.11 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.06 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   1:41.90 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   6:06.15 ksoftirqd/0
    5 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/0
    6 root      RT   0     0    0    0 S  0.0  0.0   0:32.29 watchdog/0
    7 root      RT   0     0    0    0 S  0.0  0.0   0:25.89 migration/1
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/1
    9 root      20   0     0    0    0 S  0.0  0.0   3:11.19 ksoftirqd/1
   10 root      RT   0     0    0    0 S  0.0  0.0   0:24.29 watchdog/1
   11 root      RT   0     0    0    0 S  0.0  0.0   0:14.96 migration/2
   12 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/2
   13 root      20   0     0    0    0 S  0.0  0.0   4:57.36 ksoftirqd/2
   14 root      RT   0     0    0    0 S  0.0  0.0   0:19.71 watchdog/2
   15 root      RT   0     0    0    0 S  0.0  0.0   1:05.47 migration/3
   16 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/3
   17 root      20   0     0    0    0 S  0.0  0.0   3:49.03 ksoftirqd/3
   18 root      RT   0     0    0    0 S  0.0  0.0   0:21.47 watchdog/3
   19 root      RT   0     0    0    0 S  0.0  0.0   2:42.98 migration/4
   20 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 stopper/4
   21 root      20   0     0    0    0 S  0.0  0.0   3:38.70 ksoftirqd/4
   22 root      RT   0     0    0    0 S  0.0  0.0   0:22.77 watchdog/4
   23 root      RT   0     0    0    0 S  0.0  0.0   3:16.96 migration/5
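On Linux the load average counts tasks in uninterruptible sleep (state D, typically blocked on disk or network I/O) as well as runnable ones, so a high load with an idle CPU can mean tasks are stuck waiting on I/O. Whether that is the cause here is not established by the output above. A minimal sketch for checking, assuming a Linux host with /proc mounted; the function names `parse_stat_line` and `tasks_in_state` are made up for illustration:

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def tasks_in_state(states=("D",), proc_root="/proc"):
    """List (pid, comm, state) for processes whose state is in `states`."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError):
            continue  # process exited while we were reading it
        if state in states:
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    # Run this a few times while the load is high; names that keep
    # showing up are the ones blocked on I/O.
    for pid, comm, state in tasks_in_state():
        print(pid, state, comm)
```

This only takes a snapshot, so a single empty result does not rule anything out; sampling it repeatedly during a stall is more telling.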
This is my NFS server, which I've always had performance issues with, and I can't figure out why. I was in the middle of watching a movie and it just died on me because the server couldn't keep up with the stream. I'm trying to figure out if some kind of job is running, but nothing jumps out at me even though the load is crazy high.
Is there any way to troubleshoot this further to figure out what's hitting the system so hard?
I really need to redo this setup one of these days; it's always had crap performance. I'm going to try iSCSI next time, as NFS has been nothing but trouble and it's too damn slow.
This is the output of nfsstat. It's too ugly for me to make much of, and I don't really know what any of it means, but it might help:
Code:
Server rpc stats:
calls       badcalls    badclnt     badauth     xdrcall
258600560   0           0           0           0

Server nfs v3:
null            getattr         setattr         lookup          access          readlink
369 0%          13037471 0%     264994 0%       2745731 0%      6513496 0%      4190 0%
read            write           create          mkdir           symlink         mknod
2924090640 64%  1604801168 35%  105677 0%       11540 0%        0 0%            0 0%
remove          rmdir           rename          link            readdir         readdirplus
43677 0%        8259 0%         71880 0%        0 0%            493 0%          339331 0%
fsstat          fsinfo          pathconf        commit
195677 0%       697 0%          339 0%          22489 0%

Server nfs v4:
null            compound
3 0%            1486640 99%

Server nfs v4 operations:
op0-unused      op1-unused      op2-future      access          close           commit
0 0%            0 0%            0 0%            57826 1%        8543 0%         334 0%
create          delegpurge      delegreturn     getattr         getfh           link
7 0%            0 0%            8518 0%         1016757 27%     324285 8%       0 0%
lock            lockt           locku           lookup          lookup_root     nverify
0 0%            0 0%            0 0%            315992 8%       0 0%            0 0%
open            openattr        open_conf       open_dgrd       putfh           putpubfh
8543 0%         0 0%            4 0%            0 0%            1481601 39%     0 0%
putrootfh       read            readdir         readlink        remove          rename
3 0%            464118 12%      334 0%          0 0%            2 0%            0 0%
renew           restorefh       savefh          secinfo         setattr         setcltid
4761 0%         0 0%            0 0%            0 0%            19 0%           2 0%
setcltidconf    verify          write           rellockowner    bc_ctl          bind_conn
2 0%            0 0%            68232 1%        0 0%            0 0%            0 0%
exchange_id     create_ses      destroy_ses     free_stateid    getdirdeleg     getdevinfo
0 0%            0 0%            0 0%            0 0%            0 0%            0 0%
getdevlist      layoutcommit    layoutget       layoutreturn    secinfononam    sequence
0 0%            0 0%            0 0%            0 0%            0 0%            0 0%
set_ssv         test_stateid    want_deleg      destroy_clid    reclaim_comp
0 0%            0 0%            0 0%            0 0%            0 0%
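The per-operation tables above alternate a row of operation names with a row of count/percent pairs, which makes them easy to rank by count. A rough sketch, assuming exactly that layout; `rank_ops` and `V3_SAMPLE` are hypothetical names, and the sample is the "Server nfs v3" table copied from the output above:

```python
def rank_ops(table_text):
    """Pair each operation name with its count and sort busiest first."""
    rows = [ln.split() for ln in table_text.strip().splitlines() if ln.strip()]
    ops = []
    # Even rows hold names, odd rows hold "count percent count percent ...".
    for names, values in zip(rows[0::2], rows[1::2]):
        counts = values[0::2]  # drop the percent tokens
        for name, count in zip(names, counts):
            ops.append((name, int(count)))
    return sorted(ops, key=lambda item: item[1], reverse=True)


V3_SAMPLE = """
null getattr setattr lookup access readlink
369 0% 13037471 0% 264994 0% 2745731 0% 6513496 0% 4190 0%
read write create mkdir symlink mknod
2924090640 64% 1604801168 35% 105677 0% 11540 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
43677 0% 8259 0% 71880 0% 0 0% 493 0% 339331 0%
fsstat fsinfo pathconf commit
195677 0% 697 0% 339 0% 22489 0%
"""

if __name__ == "__main__":
    for name, count in rank_ops(V3_SAMPLE)[:5]:
        print(f"{name:<12}{count:>14}")
```

For this server the ranking just confirms what the percentages already show: read and write make up nearly all of the v3 traffic. These are cumulative counters since the server started, so they say what the workload looks like overall rather than what was happening during the stall.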