Automatically restart a service that crashed

Written by Robert

There is a problem with a service that gets killed at random intervals. This service needs to keep running. The application is terminated by the kernel's OOM (out-of-memory) killer. I haven't found the cause yet, but it appears that the system is running out of memory. To figure out why this is happening, I created a script. The script runs from cron at a fixed interval and checks whether there is a PID for a running perl process. If there is none, it starts the service and appends the date and time, the output of free -m (memory usage in megabytes) and the last 20 lines of /var/log/messages to a log file.

#!/bin/bash
# Is there an instance of /usr/bin/perl running? The PID list goes to /dev/null;
# only the exit status of pidof is used.
if ! pidof /usr/bin/perl > /dev/null; then
        # If there isn't anything running, do the following:
        /etc/init.d/servicename start > /dev/null
        # Starts the servicename service
        echo "--------------" >> /var/log/servicename.txt
        echo "--------------" >> /var/log/servicename.txt
        echo "--------------" >> /var/log/servicename.txt
        # Creates three lines with - symbols to easily see the different blocks
        echo "Restart servicename Protocol: $(date)" >> /var/log/servicename.txt
        # Adds a line with the current date and time
        free -m >> /var/log/servicename.txt
        # Appends the current memory usage in megabytes
        tail -n 20 /var/log/messages >> /var/log/servicename.txt
        # Appends the last 20 lines of /var/log/messages to the servicename.txt log file.
        # You can increase this number to get more lines in the log file.
else
        echo "servicename was working fine: $(date)" >> /var/log/servicenameisfine.txt
fi
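
One caveat with the check above: pidof /usr/bin/perl succeeds as long as any Perl process is running, not only this service. A minimal sketch of a narrower check follows. It assumes the service's script name shows up in its command line; is_service_running and the pattern servicename.pl are hypothetical placeholders, not the real names.

```shell
#!/bin/bash
# Sketch only: succeed if some process has a command line matching the
# extended regular expression given as $1. pgrep never reports itself.
is_service_running() {
        pgrep -f "$1" > /dev/null
}

# Hypothetical usage in place of the pidof line
# ("servicename.pl" is an assumed name, not taken from the real service):
#   if ! is_service_running 'servicename\.pl'; then
#           /etc/init.d/servicename start > /dev/null
#   fi
```

Pick a pattern that does not also match the watchdog script's own path. /root/servicenamestart.sh contains "servicename", so that word alone would always match while the script is running.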


Keep in mind that this is not a solution to the problem, but a way to minimize the impact while searching for the cause in the custom-written application that keeps crashing.

Here is an example of what the script logs when the service isn't running:

--------------
--------------
--------------
Restart servicename Protocol: Thu Jun  2 15:52:01 CEST 2016
             total       used       free     shared    buffers     cached
Mem:          8002       6569       1432          0        367       3729
-/+ buffers/cache:       2472       5529
Swap:        10031         13      10018
Jun  2 15:37:01 web /usr/sbin/cron[26211]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:38:01 web /usr/sbin/cron[26244]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:39:01 web /usr/sbin/cron[26287]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:40:01 web /usr/sbin/cron[26311]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:41:01 web /usr/sbin/cron[26384]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:42:01 web /usr/sbin/cron[26412]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:43:01 web /usr/sbin/cron[26435]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:44:01 web /usr/sbin/cron[26490]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:45:01 web /usr/sbin/cron[26575]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:46:01 web /usr/sbin/cron[26636]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:47:01 web /usr/sbin/cron[26691]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:48:01 web /usr/sbin/cron[26748]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:49:01 web /usr/sbin/cron[26800]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:50:01 web /usr/sbin/cron[26836]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:51:01 web /usr/sbin/cron[26928]: (root) CMD (/root/servicenamestart.sh)
Jun  2 15:52:01 web /usr/sbin/cron[26970]: (root) CMD (/root/servicenamestart.sh)
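
The last 20 lines of /var/log/messages in the example above only show cron entries, so they don't confirm that the OOM killer was involved. A minimal sketch that searches the whole log for OOM killer traces follows. It assumes the kernel messages end up in /var/log/messages and use the usual "Out of memory", "oom-killer" or "Killed process" wording, which can differ between kernel versions. find_oom_events is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: print lines that look like OOM killer activity.
# $1 is the log file to search; it defaults to /var/log/messages.
find_oom_events() {
        local logfile="${1:-/var/log/messages}"
        # -i ignores case, -E enables the | alternation
        grep -iE 'out of memory|oom-killer|killed process' "$logfile"
}
```

Calling find_oom_events /var/log/messages as root prints any matching lines. The exit status is that of grep, so it is 1 when nothing matched.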


Here is an example of the log when the service is running properly:

servicename was working fine: Thu Jun  2 15:51:01 CEST 2016
servicename was working fine: Thu Jun  2 15:56:01 CEST 2016
servicename was working fine: Thu Jun  2 16:01:01 CEST 2016
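
Every restart writes one "Restart servicename Protocol:" line, so the restart log can be summarized to see how often the service goes down. A rough sketch follows. It assumes the date format shown in the examples above (the default output of date depends on the locale), and restarts_per_day is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: count restart entries per day.
# $1 is the restart log; it defaults to /var/log/servicename.txt.
restarts_per_day() {
        local logfile="${1:-/var/log/servicename.txt}"
        # Fields of a restart line: $5 = month, $6 = day of month, $9 = year
        awk '/^Restart servicename Protocol:/ { count[$9 " " $5 " " $6]++ }
             END { for (day in count) print day ": " count[day] }' "$logfile" | sort
}
```

For a log that contains only the restart block shown earlier, this should print 2016 Jun 2: 1. The sort is alphabetical, so months do not come out in calendar order.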


After creating the script, make it executable with the chmod +x command. You can then add it to the crontab by invoking the crontab -e command. In this case I wanted to run it every 5 minutes.

*/5 * * * * /root/servicenamestart.sh
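
Editing the crontab by hand works fine on a single server. If the entry has to be installed from a script instead, a sketch like the following may help. It relies on crontab reading a new table from standard input when given "-" as the file name, and add_cron_line is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: read an existing crontab on stdin and print it back,
# appending the line given as $1 unless an identical line already exists.
add_cron_line() {
        local entry="$1" current
        current="$(cat)"
        if [ -n "$current" ]; then
                printf '%s\n' "$current"
        fi
        # -F: fixed string, -x: the whole line must match, -q: no output
        if ! printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
                printf '%s\n' "$entry"
        fi
}

# Hypothetical usage (run as the user that owns the crontab):
#   chmod +x /root/servicenamestart.sh
#   crontab -l 2>/dev/null | add_cron_line '*/5 * * * * /root/servicenamestart.sh' | crontab -
```

Because the helper skips a line that is already present, running it twice does not create a duplicate entry.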


Depending on the distribution, the cron service might have to be restarted. Because the server this script runs on uses SUSE Linux Enterprise Server, the cron service has to be restarted by invoking the command service cron restart.
