Dirty Cache Buffer Causing Novell Server To Crash. HELP!!!

Windogg

Lifer
Oct 9, 1999
10,241
0
0
I have a server that is driving me insane. Large file writes (400MB+) cause dirty cache buffers to go through the roof. Total cache buffers are 48,000. Without fail, as soon as dirty cache buffers hit 36,000, CPU utilization hits 100% and within 30 seconds the server hard locks and requires a cold reset.
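
To put those buffer counts in perspective, here is a rough back-of-the-envelope sketch. It assumes 4 KB cache buffers, which is not stated anywhere in this thread, so treat `BUFFER_BYTES` as an assumption and adjust it if the server is configured differently. The function names are made up for illustration.

```python
# Assumption: each cache buffer is 4 KB. The thread does not state the
# buffer size, so change BUFFER_BYTES if the server uses another value.
BUFFER_BYTES = 4096


def dirty_fraction(dirty: int, total: int) -> float:
    """Return the share of cache buffers that are dirty (0.0 to 1.0)."""
    return dirty / total


def buffers_to_mib(buffers: int, buffer_bytes: int = BUFFER_BYTES) -> float:
    """Convert a buffer count to mebibytes."""
    return buffers * buffer_bytes / (1024 * 1024)


if __name__ == "__main__":
    total, dirty = 48_000, 36_000  # figures quoted in the post
    print(f"dirty fraction: {dirty_fraction(dirty, total):.0%}")
    print(f"total cache:    {buffers_to_mib(total):.1f} MiB")
    print(f"dirty cache:    {buffers_to_mib(dirty):.1f} MiB")
```

Under that 4 KB assumption, the lockup point is 75% of the cache dirty: roughly 140.6 MiB of unwritten data out of about 187.5 MiB of cache, on a box with 256MB of RAM.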

The specs are as follows:
HP Netserver LH3
2 x 400MHz Pentium II w/ 512K cache
256MB ECC PC100 SDRAM
4 x 18GB Quantum Atlas HDD
HP RAID Adapter (RAID5)
Intel PRO100 Management Ethernet Adapter
Adaptec 2940UW SCSI
HP SureStor (Quantum) 20/40GB DLT Tape Backup
Novell NetWare v4.11
About 75 concurrent users.

Reads of all sizes are no problem and seem to put no load whatsoever on the server. The server also handles DHCP. I tried disabling real-time anti-virus scans to see if that would help; it didn't.

Is the bottleneck at the RAID adapter? Would beefing up the CPUs to Xeons with 1MB+ cache help? Or is my only solution to have users break up their databases? I personally hate the HP servers and would prefer a nice Dell or self-built unit. Please help before I commit Servercide.

Windogg
 

Xanathar

Golden Member
Oct 14, 1999
1,435
0
0
I would personally try dumping in a tad more memory if you are doing 400 meg writes and it has to keep 40 gigs cached.
 

Windogg

Xanathar: How much more would you recommend? Any idea how much another 256MB would help? The memory will give a larger cushion, but is there any way to reduce the dirty cache?

Thanks for the help.

Windogg
 

CTR

Senior member
Jun 12, 2000
654
0
0
Do you get any error messages? What does your abend.log say, if anything, about these recent lockups? Have you run dsrepair to check invalid directory services attributes for some of these files? Are all of the files' sizes being correctly reported?

More memory might not help your problem. Your CPU is hitting 100% when the OS goes to free up those dirty cache buffers, which should not happen. When you run out of cache buffers, the filesystem will slow waaaay down, but shouldn't jack your processor.

Have you ever "set upgrade low priority threads = on"? If so, that is bad. It can cause background processes such as file compression and cache cleanup to take up too much processor time. You ought to set that flag to off.

 

Windogg

CTR: DSREPAIR indicated that the file system is OK. I'm gonna check the log later for what might be causing the problem. The two messages that really stand out are "Cache Allocator Out of Memory" and "Short Term Memory Allocator Out Of Memory. xxx Attempts to Get More Memory Failed".

Someone suggested that I set Maximum Concurrent Disk Cache Writes to 1000. I checked the STARTUP.NCF and there is no such line, so I'll be giving that one a shot.

My understanding of Novell is pretty rudimentary so please bear with me for this one. Thanks for the help.

windogg
 

CTR

Those messages are very typical for the situation you are describing with the large files being committed. So you probably do need more memory, but it might not fix your problem with the lockups.

Check your drivers for storage and NIC to make sure they are all up to date. Are you using a HAM driver or a DSK driver for the RAID controller? What service pack are you running?

If you are loading conlog, see if you can capture some of the other errors.
 

CTR

Oh one more thing:

Since you are taking the server down anyway, go ahead and dismount your volumes and vrepair them. Make sure you have the latest vrepair.nlm on your c: partition, along with the appropriate name space files.
 

Xanathar

Oops, got lost in Diablo 2 and forgot where I've been posting. Windogg, if you are caching that much data you do have a bottleneck, be it memory, hard drive, or processor. Because of the RAID 5, the problem isn't the hard drives <or can't be economically fixed>, and it's doubtful your 100BT Ethernet is pushing it that hard. So one possibility is a driver issue with the hard drives. Or check your processor usage; if it's higher than 60% continuously, time to pop in some faster models. Finally, with memory, check to see how many slots you have open. If it's 2 or more, start off with one 128 MB ECC module and see how it goes; if you've only got 1 slot though, better just get the 256 right off the bat.

For testing purposes, try writing ISO files <600 MB> and then also copying something like an NT4 CD <thousands of little files>. <The NT4 CD would simulate more of the database reads/writes.>
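
If no ISO or NT4 CD is handy, the same two write patterns can be generated from a workstation. This is only a minimal sketch: the function names and file names are invented for illustration, and the target directory (for example a mapped drive on the server) is whatever you pass in. It times one large sequential write and a batch of small writes so the two cases can be compared while watching dirty cache buffers on the server console.

```python
import os
import time
from pathlib import Path

CHUNK = 1024 * 1024  # write the large file in 1 MiB pieces


def write_large_file(path: Path, total_bytes: int) -> float:
    """Write a single file of total_bytes and return elapsed seconds."""
    block = b"\0" * CHUNK
    start = time.perf_counter()
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(block[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out
    return time.perf_counter() - start


def write_small_files(directory: Path, count: int, size_bytes: int) -> float:
    """Write `count` files of size_bytes each and return elapsed seconds."""
    directory.mkdir(parents=True, exist_ok=True)
    payload = b"\0" * size_bytes
    start = time.perf_counter()
    for i in range(count):
        with open(directory / f"small_{i:05d}.bin", "wb") as f:
            f.write(payload)
    return time.perf_counter() - start
```

For example, `write_large_file(target / "big.bin", 600 * 1024 * 1024)` approximates the ISO case, and `write_small_files(target / "small", 5000, 8 * 1024)` approximates the CD full of little files. The counts and sizes there are arbitrary; pick ones that resemble the real workload.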

For the record, you should never have that many dirty cache buffers.

Definitely run vrepair <don't forget the long file name files>
 

reality bites

Member
Mar 14, 2000
95
0
0
Definitely update the SCSI driver, the NIC driver, and the service pack if not done already. Novell has a formula for RAM requirements on their knowledge base; I don't have time or I would send you a direct link, so take a look. I highly doubt the processor speed is the problem.
 

Windogg

Thanks for all the help people.

I'll call up the Novell experts at corporate and ask them about a service pack. My forte is NT and I'm kinda lost on Novell. I'm getting better though.

Windogg