Linux OOM Killer ...

Armitage

Banned
Feb 23, 2001
8,086
0
0
What does it kill?
As I understand it, the OOM killer was introduced in some of the 2.4 kernels. Its job is to find and kill certain processes when the system completely runs out of memory (physical + swap).

So, my question is, does this feature exist in the 2.4.17 kernel, and if so, is the identity of the OOM killer's victims logged anywhere?

I've got a situation here where an important process is dying unexpectedly when the machine is under very heavy memory load. I suspect the OOM killer.

Sometimes 1GB of ram just isn't enough...


 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Yes, it's still around, and yes, it should log the process name and PID in the kernel log (/var/log/kern.log on Debian systems). If the messages aren't going to a file, dmesg will print the recent kernel messages.
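
If you'd rather not eyeball the output, here is a rough Python sketch for pulling the victims out of dmesg or kern.log text. The message pattern is an assumption on my part: the exact wording varies between kernel versions, so check it against what your kernel actually prints. The function name find_oom_victims is made up for this example.

```python
import re

# ASSUMPTION: the kernel logs something shaped like
#   "Out of Memory: Killed process 1234 (myapp)."
# The wording differs between kernel versions, so adjust the pattern as needed.
OOM_LINE = re.compile(
    r"out of memory: kill(?:ed)? process (\d+) \(([^)]*)\)",
    re.IGNORECASE,
)


def find_oom_victims(log_text):
    """Return a list of (pid, process_name) for each OOM-kill line in log_text."""
    victims = []
    for line in log_text.splitlines():
        match = OOM_LINE.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims
```

Feed it whatever text you can read, for example the captured output of dmesg, and it returns the PIDs and process names it recognized. An empty list only means nothing matched the assumed pattern, not that nothing was killed.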
 

Armitage

Thanks, that's it.
I don't have root on the machine in question, but grepping dmesg confirmed that my process is falling victim to the OOM killer :(
 

Nothinman

If you can talk to the maintainer of the box, you may want to try 2.4.18 or even the -rmap patches. I believe Rik has made the OOM killer a good bit smarter in the -rmap patches.
 

Armitage

Yeah, we may have an "opportunity" to rebuild this machine soon anyway. But I doubt a new kernel will help this situation too much. We're going to have to balance the load out better, and maybe rework the memory model of the app somewhat. But it may be worth trying the new kernel.
 

Nothinman

Armitage wrote: "We're going to have to balance the load out better, and maybe rework the memory model of the app somewhat. But it may be worth trying the new kernel."

That's really the better solution. If your app is starving the system of memory, you need to either fix the app or add more memory/swap.
 

Armitage

It's already got 1GB RAM and 2GB swap. And Alpha RAM isn't cheap!
The situation hasn't been good for a while. But recently two big cron jobs that used to run sequentially have started to overlap, due to increased run size and other loads on the machine. That's what spiked it. So spacing out the cron jobs a bit will solve the problem in the short term, but we really have a long-term scalability problem.
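
One alternative to spacing the jobs out by clock time is to have both take the same lock, so the second simply waits for the first to finish. This is only a sketch of that idea using an advisory flock(2) lock from Python, not what we actually run; the job_lock helper and the lock file path are hypothetical names for illustration.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def job_lock(lock_path, blocking=True):
    """Hold an exclusive advisory lock on lock_path while the body runs.

    With blocking=True the caller waits until the other job releases the lock.
    With blocking=False a BlockingIOError is raised if the lock is already held.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)
        yield
    finally:
        # Closing the descriptor also releases the flock lock.
        os.close(fd)
```

Each cron job would wrap its main work in "with job_lock(path):" using the same path, which restores the old one-after-the-other behavior no matter how long the first job runs. It does nothing for the underlying memory footprint, of course.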
Off to look at the code...