Lazy Linux Admins Going to Server Rooms Less: Forced Reboot, Auto Reboot after Kernel Panic and Email Notification after Reboot
Having to go to the server room to reset servers is the biggest headache for admins managing a cluster of Linux servers at a remote site. Either you can ping the server but cannot SSH to it, or you cannot even ping it. There are various reasons a Linux server may crash or become unreachable over SSH. The two most common in my experience are a badly behaving process that uses up almost all physical memory and swap, and a kernel panic. In this post, I describe several techniques I have learned that let me go to the server room less often when dealing with these kinds of failures.
Force Linux to reboot even if you cannot start a shell via SSH
If the server is too busy, creating a shell via SSH may fail even though sshd is alive. Sometimes you get lucky and can still execute commands remotely by passing them directly to ssh. In that case, you may try the magic SysRq key to force Linux to restart.
ssh root@server_home \
'echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger'
Reference: Force Linux to reboot.
After this command, if your server disappears from the network, it may be rebooting. Wait a while and it may come back.
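The b trigger reboots immediately, without syncing or unmounting disks. If the server is still responsive enough, a gentler variant is to sync and remount read-only first. The following is a minimal sketch of that idea, not a tested recipe for every system: the helper name sysrq_reboot_cmd and the 2-second pauses are assumptions made for illustration.

```shell
#!/bin/sh
# Print the remote command line: enable SysRq, sync dirty data to disk (s),
# remount filesystems read-only (u), then reboot immediately (b).
# The 2-second pauses are an assumed grace period, not a documented value.
sysrq_reboot_cmd() {
    t=/proc/sysrq-trigger
    printf '%s\n' "echo 1 > /proc/sys/kernel/sysrq; echo s > $t; sleep 2; echo u > $t; sleep 2; echo b > $t"
}

# Usage (server_home is the placeholder host from the command above):
#   ssh root@server_home "$(sysrq_reboot_cmd)"
```

If the machine is so overloaded that even the sleep commands cannot start, falling back to the plain b command above is probably the only option left over SSH.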
Make Linux reboot automatically after a kernel panic
Sometimes you are unlucky and there is a kernel panic. Almost everything, including the network, stops working and you cannot connect to the server any more. That is not good, but it may not be too bad if we did some homework beforehand by configuring Linux to reboot itself after a kernel panic.
Linux has a nice feature that reboots the machine after a timeout when a kernel panic happens. Usually, it is disabled. We can turn it on, as we are lazy system admins. It is enabled by setting the kernel.panic kernel parameter.
For a running system:
# echo 20 >/proc/sys/kernel/panic
Here, 20 is the number of seconds before the kernel reboots. 0 means this feature is disabled.
To make the configuration persistent, you have at least 2 choices:
- add the kernel parameter panic=20 to your bootloader (grub or grub2).
- add kernel.panic = 20 to /etc/sysctl.conf.
I prefer the second method, which writes the configuration to /etc/sysctl.conf.
For more details, please check How to make Linux automatically reboot after a kernel panic.
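As a rough sketch of the /etc/sysctl.conf method, the helper below rewrites the file so that it contains exactly one kernel.panic line. The function name set_panic_timeout and its optional file argument are hypothetical, introduced only so the example can be tried on a scratch file before touching the real configuration.

```shell
#!/bin/sh
# set_panic_timeout SECONDS [FILE]
# Idempotently write "kernel.panic = SECONDS" into FILE
# (default: /etc/sysctl.conf), replacing any existing kernel.panic line.
set_panic_timeout() {
    secs=$1
    conf=${2:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    # Keep every line except an existing kernel.panic setting.
    # Other keys such as kernel.panic_on_oops are left untouched.
    grep -Ev '^[[:space:]]*kernel\.panic[[:space:]]*=' "$conf" > "$tmp" 2>/dev/null
    echo "kernel.panic = $secs" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Usage, as root:
#   set_panic_timeout 20 && sysctl -p    # sysctl -p reloads /etc/sysctl.conf
#   sysctl kernel.panic                  # check the value now in effect
```

Running it twice leaves the file unchanged after the first run, so it should be safe to include in a provisioning script.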
Email notifications after Linux reboot
Auto reboot is good. It is even better if the server also notifies the admins after a reboot. The technique discussed in How to email admins automatically after Linux server starts makes the server send an email notification after each reboot.
It makes use of @reboot cron jobs and mailx by adding an entry like
@reboot date | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com
For sending emails, you may check either https://www.systutorials.com/sending-email-using-mailx-in-linux-through-internal-smtp/ or https://www.systutorials.com/sending-email-from-mailx-command-in-linux-using-gmails-smtp/.
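One possible way to install the @reboot entry without opening an editor is sketched below. The helper add_reboot_entry is a hypothetical name; it reads an existing crontab on standard input and appends the entry only if it is not already there. The SMTP server and addresses are the placeholders from the entry above and need to be replaced with real ones.

```shell
#!/bin/sh
# The @reboot entry from the article; smtp.example.com and zma@example.com
# are placeholders. The backticks stay literal here and are expanded by
# cron's shell when the job runs.
ENTRY='@reboot date | mailx -S smtp=smtp://smtp.example.com -s "`hostname` started" -r zma@example.com zma@example.com'

# Read a crontab on stdin and print it with ENTRY appended,
# unless an identical line is already present.
add_reboot_entry() {
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    if ! printf '%s\n' "$current" | grep -Fxq -- "$ENTRY"; then
        printf '%s\n' "$ENTRY"
    fi
}

# Usage: rewrite the current user's crontab in one step.
#   crontab -l 2>/dev/null | add_reboot_entry | crontab -
```

Because the helper only filters text, its output can be inspected first by leaving off the final crontab - stage.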
Simply hire an intern and let him do this headache work.
Interns could be more productive.
What if remote root SSH is disabled for security purposes? How do you reboot remotely?
Shouldn’t these critical servers have remote power management tools like iLO by HP or DRAC by Dell?
I think remote power management tools are the best options in such conditions.
The tip here is for server management via ssh. Of course a piece of hardware independent of the software in the server is more reliable.