add article to reboot linux on panic on hung processes
Signed-off-by: Nico Schottelius <nico@kr.ethz.ch>
parent 0b3c275516
commit 3b8a1339ed
1 changed file with 48 additions and 0 deletions

[[!meta title="Reboot Linux if task blocked for more than n seconds"]]

If your Linux box no longer responds to ssh because some processes are
taking away all the system resources, and you want it to reboot
by itself, this article may be for you.

The usual message that can be seen on the console of such a system is:

    INFO: task java:4242 blocked for more than 120 seconds.

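To spot such messages automatically, console or `dmesg` output can be scanned with a regular expression. The following is a minimal Python sketch, assuming the message layout shown above; the name `find_hung_tasks` is made up for illustration.

```python
import re

# Pattern for the hung task message shown above. The layout
# "INFO: task <name>:<pid> blocked for more than <n> seconds." is assumed.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return a list of (task name, pid, seconds) tuples found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits
```

The function only looks at text passed to it, so it can be fed saved console output or a log file, whichever is available on the system.
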
According to
[cateee.net](http://cateee.net/lkddb/web-lkddb/BOOTPARAM_HUNG_TASK_PANIC.html),
the panic on hung task feature was added to Linux in 2.6.30.
Looking at **kernel/hung_task.c** in Linux 2.6.35, around lines 96-99 and 105-106:

     96         printk(KERN_ERR "INFO: task %s:%d blocked for more than "
     97                         "%ld seconds.\n", t->comm, t->pid, timeout);
     98         printk(KERN_ERR "\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
     99                         " disables this message.\n");
    [...]
    105         if (sysctl_hung_task_panic)
    106                 panic("hung_task: blocked tasks");

We can see that if sysctl_hung_task_panic is true (!= 0),
the system will panic. A panicked system isn't of much
use to me either (it is similar to a hanging one), thus I would like
it to reboot.

Setting the sysctl **kernel.panic** to a value greater than
0 tells the kernel to reboot that many seconds after
a panic.

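Put together, a hung task only ends in a reboot if every link of this chain is enabled. A small Python sketch of that logic follows; the function name is hypothetical, and it assumes (as the kernel message quoted above suggests) that a timeout of 0 switches the hung task check off.

```python
def reboots_on_hung_task(hung_task_timeout_secs, hung_task_panic, panic):
    """Return True if a hung task ends in an automatic reboot.

    Assumptions: a timeout of 0 disables hung task detection, and only
    a kernel.panic value greater than 0 triggers a reboot after a panic.
    """
    detection_enabled = hung_task_timeout_secs > 0
    panics_on_hang = hung_task_panic != 0
    reboots_after_panic = panic > 0
    return detection_enabled and panics_on_hang and reboots_after_panic
```

With hung_task_panic left at 0, for instance, the function returns False: the message is still printed, but the system neither panics nor reboots.
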
Furthermore, the default timeout after which a task is considered hung
is 120 seconds, which my users would like to increase to 5 minutes.
Thus the full setup to make a Linux system reboot, by way of a panic,
after a process has hung for 300 seconds is:

    # Reboot 5 seconds after panic
    kernel.panic = 5

    # Panic if a hung task was found
    kernel.hung_task_panic = 1

    # Setup timeout for hung task to 300 seconds
    kernel.hung_task_timeout_secs = 300

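These lines use the `key = value` syntax of sysctl configuration files, and each key corresponds to a file below /proc/sys, with dots turned into slashes. As a sketch, the following Python code parses such a fragment and prints the file each setting maps to. The helper names `parse_sysctl` and `proc_path` are hypothetical, and the code deliberately does not write anything, since changing the values requires root.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map a sysctl key such as kernel.panic to its /proc/sys file."""
    return "/proc/sys/" + key.replace(".", "/")


# The settings from this article, copied verbatim.
CONFIG = """
# Reboot 5 seconds after panic
kernel.panic = 5

# Panic if a hung task was found
kernel.hung_task_panic = 1

# Setup timeout for hung task to 300 seconds
kernel.hung_task_timeout_secs = 300
"""

for key, value in parse_sysctl(CONFIG).items():
    print(proc_path(key), "<-", value)
```

To make the settings permanent they would typically go into /etc/sysctl.conf and be loaded with `sysctl -p`. The article does not name the file, so treat that location as an assumption.
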
[[!tag eth unix]]