How to get notifications if a SystemD unit fails

I just noticed that my backups, triggered by a SystemD timer, failed (a few times in a row) …and that got me thinking.

Why don’t SystemD (and similar) errors show in the System Notifications?

How can I have them show as either notifications or a list of failed units (e.g. as a plasmoid in the panel)?


Can be done with a bash script

#!/bin/bash

set -x

cd $XDG_RUNTIME_DIR
echo -n > systemd-status.old

while true; do
    systemctl status --failed > systemd-status
    systemctl --user status --failed >> systemd-status
    
    if ! cmp -s systemd-status{,.old}; then
        notify-send --app-name=systemd-monitor "$(cat systemd-status)"
    fi
    
    mv -f systemd-status{,.old}
    sleep 5
done

That is pretty cool.

I was thinking something more KDE-ish and persistent perhaps.

But if no-one has a better suggestion, I might start off with the script, thanks.

I just found @tubbadu’s Scriptinator and I hope I can figure out how to use that to do what I want.

To be brutally honest, I am a bit lost in the help of it. I will poke it around a bit if I can figure it out :slight_smile:


Let’s see if this works:

#!/bin/bash

set -x

cd $XDG_RUNTIME_DIR
echo -n > systemd-status.old

systemctl status --failed > systemd-status
systemctl --user status --failed >> systemd-status
 
if ! cmp -s systemd-status{,.old}; then
    # notify-send --app-name=systemd-monitor "$(cat systemd-status)"
    echo "{PlasmoidStatusStart}attention{PlasmoidStatusEnd}"
    echo "{PlasmoidIconStart}(dialog-error){PlasmoidIconEnd}"
    echo "{PlasmoidTooltipStart}"
    cat systemd-status
    echo "{PlasmoidToolTipEnd}"
fi
    
mv -f systemd-status{,.old}

edit: fixed the script … I think … no I didn’t, at least it does not seem to work with Scriptinator

wow someone is using my plasmoids! :smiley:
you can use the Scriptinator hidden in the system tray (you have to enable it by right-clicking on the tray > configure system tray > entries > Scriptinator and setting it to “Show when relevant”). Then you can use the script suggested above (perhaps without the while loop), adding echo {PlasmoidStatusStart}insert new status here{PlasmoidStatusEnd}:

#!/bin/bash

set -x

cd $XDG_RUNTIME_DIR
echo -n > systemd-status.old

systemctl status --failed > systemd-status
systemctl --user status --failed >> systemd-status

if ! cmp -s systemd-status{,.old}; then
    notify-send --app-name=systemd-monitor "$(cat systemd-status)"
    echo "{PlasmoidStatusStart}active{PlasmoidStatusEnd}"
    echo "{PlasmoidIconStart}dialog-error{PlasmoidIconEnd}"
else
    echo "{PlasmoidStatusStart}passive{PlasmoidStatusEnd}"
    echo "{PlasmoidIconStart}dialog-ok{PlasmoidIconEnd}"
fi

mv -f systemd-status{,.old}

Save this script somewhere and copy its location (let’s say /home/hook/Documents/systemd-status-scriptinator.sh). Then open the system tray and right-click on Scriptinator > configure scriptinator. Let’s say you want to check for systemd errors every 30 seconds, and at startup.
Set “Init script” and “periodic script” both to bash /home/hook/Documents/systemd-status-scriptinator.sh (or whatever path your script is). Set the timeout to 30 (or to whatever time you wish), then apply and click OK, and you should be done! Scriptinator will run the script every 30 seconds, checking for systemd errors. If errors are found, it will appear in your panel with a red error icon; if no error is found, it will stay quietly hidden in the tray with a “no problem” icon. You can optionally set it to trigger the script on click as well, so if you’re trying to solve the problem you don’t have to wait 30 seconds to find out whether you fixed it.

hope this helps! feel free to ask anything!

(disclaimer: I haven’t tested the script, it may not work as intended)
(disclaimer 2: I wrote scriptinator in my free time, so it has some problems and may not always work as it should. If you find any problem, feel free to report it and I’ll fix them as soon as I can work on it)


make sure that dialog-error has no brackets (don’t do (dialog-error)), as scriptinator will only take what’s inside the {PlasmoidIcon___} tags and use it as the icon name, without verifying its existence (although that may be a cool feature, I can add it in the future)

echo "{PlasmoidStatusStart}attention{PlasmoidStatusEnd}"

setting it to attention will make it pulse forever, until a new status is set. You may want to add, in the OnClick script, a way to make it stop pulsing (for instance setting its status to active, so that it will still be visible, but without pulsing)
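A minimal sketch of such an OnClick script, assuming the tag format shown in this thread. The helper name set_status is made up for illustration and is not part of Scriptinator:

```sh
#!/bin/sh
# Hypothetical OnClick script: acknowledge the alert so the icon stops pulsing.

# Print a Scriptinator status tag (tag format as used in this thread).
set_status() {
    printf '{PlasmoidStatusStart}%s{PlasmoidStatusEnd}\n' "$1"
}

# "active" keeps the icon visible in the tray, without the "attention" pulsing.
set_status active
```

Using passive instead of active here would let the icon hide in the tray again after the click.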


Thanks!

Scriptinator seems to work, but something in the script seems not to trigger the change. I just triggered the backup unit and it failed, but Scriptinator still shows a :white_check_mark: even after I click it. (I did set it up at init, periodic and on-click).

can you please try to run the script inside a terminal to see what the output is? So we can understand if the problem is the script or scriptinator itself

I think the main problem is that the script cleans up after itself. The script only reports the error the first time it is run after a SystemD unit fails.

So if I run it directly after the unit fails, I get:

+ cd /run/user/1000
+ echo -n
+ systemctl status --failed
+ systemctl --user status --failed
+ cmp -s systemd-status systemd-status.old
+ echo '{PlasmoidStatusStart}attention{PlasmoidStatusEnd}'
{PlasmoidStatusStart}attention{PlasmoidStatusEnd}
+ echo '{PlasmoidIconStart}(dialog-error){PlasmoidIconEnd}'
{PlasmoidIconStart}(dialog-error){PlasmoidIconEnd}
+ echo '{PlasmoidTooltipStart}'
{PlasmoidTooltipStart}
+ cat systemd-status
× borgmatic.service - borgmatic backup
     Loaded: loaded (/etc/systemd/system/borgmatic.service; static)
     Active: failed (Result: exit-code) since Wed 2023-09-27 16:21:23 CEST; 12s ago
TriggeredBy: ● borgmatic.timer
    Process: 30288 ExecStartPre=sleep 1m (code=exited, status=0/SUCCESS)
    Process: 30503 ExecStart=systemd-inhibit --who=borgmatic --what=sleep:shutdown --why=Prevent interrupting scheduled backup /usr/bin/borgmatic --verbosity -2 --syslog-verbosity 1 (code=exited, status=1/FAILURE)
   Main PID: 30503 (code=exited, status=1/FAILURE)
        CPU: 803ms

sep 27 16:21:23 leza borgmatic[30504]: CRITICAL /etc/borgmatic/config.yaml: An error occurred
sep 27 16:21:23 leza borgmatic[30504]: CRITICAL backupserver: Error running actions for repository
sep 27 16:21:23 leza borgmatic[30504]: CRITICAL Remote: ssh: connect to host xmarksthespot.wheremymonkeyis.at port 22111: No route to host
                                       Connection closed by remote host. Is borg working on the server?
sep 27 16:21:23 leza borgmatic[30504]: CRITICAL Command 'borg create --exclude-from /etc/borgmatic/excludes --exclude-caches --exclude-if-present .nobackup --info ssh://backup@xmarksthespot.wheremymonkeyis.at/./leza::{hostname}-{now:%Y-%m-%dT%H:%M:%S.%f} /etc /home /root/.borgmatic' returned non-zero exit status 2.
sep 27 16:21:23 leza borgmatic[30504]: CRITICAL
sep 27 16:21:23 leza borgmatic[30504]: CRITICAL Need some help? https://torsion.org/borgmatic/#issues
sep 27 16:21:23 leza systemd-inhibit[30503]: /usr/bin/borgmatic failed with exit status 1.
sep 27 16:21:23 leza systemd[1]: borgmatic.service: Main process exited, code=exited, status=1/FAILURE
sep 27 16:21:23 leza systemd[1]: borgmatic.service: Failed with result 'exit-code'.
sep 27 16:21:23 leza systemd[1]: Failed to start borgmatic backup.
+ echo '{PlasmoidToolTipEnd}'
{PlasmoidToolTipEnd}
+ mv -f systemd-status systemd-status.old

But any time after that I get only this:

+ cd /run/user/1000
+ echo -n
+ systemctl status --failed
+ systemctl --user status --failed
+ cmp -s systemd-status systemd-status.old
+ echo '{PlasmoidStatusStart}passive{PlasmoidStatusEnd}'
{PlasmoidStatusStart}passive{PlasmoidStatusEnd}
+ echo '{PlasmoidIconStart}dialog-ok{PlasmoidIconEnd}'
{PlasmoidIconStart}dialog-ok{PlasmoidIconEnd}
+ mv -f systemd-status systemd-status.old

(@jinliu had a different approach in mind when he wrote it and as stand-alone his worked fine. It’s my fault I am not skilled enough to figure out how to do it otherwise.)

gotcha, the problem, as you say, is that the script is written to throw a notification once when a new error is detected, and then wait for a new error. What we’re trying to achieve is instead a way to detect when the error is raised, and then keep the error icon until… well, until you notice it and fix it, I guess. We can do something like this:

  • at the beginning, an empty systemd-status.old file is created
  • periodically, it checks if the new systemd-status file is different from the old one
    • if nothing changed, it means no errors were raised, so it can just exit (leaving the same icon as before)
    • if it is different, then a new error appeared. we will then set the icon to the error one and the status to attention, until you (for example) click on it.
  • when you click on it, you are saying “all errors that happened until now are now fixed”. So when clicked, it will move the systemd-status file to .old, meaning that it will start listening for new errors. It should then set the “no error” icon and set the status to passive (or active if you still want to see it in the tray)

would this solution be good for you?


Exactly!

Your suggestion sounds really good.

I think maybe only the following:

  “if nothing changed, it means no errors were raised”

…would instead mean no new errors were raised. But I think the logic still works as intended.

yeah exactly!

so you can set up scriptinator to work like this:

init script: create an empty systemd-status.old:

cd $XDG_RUNTIME_DIR && echo -n > systemd-status.old

periodic script: check for new errors, and change scriptinator status if found:

cd $XDG_RUNTIME_DIR

systemctl status --failed > systemd-
systemctl --user status --failed >> systemd-status

if ! cmp -s systemd-status{,.old}; then
    echo "{PlasmoidStatusStart}active{PlasmoidStatusEnd}"
    echo "{PlasmoidIconStart}dialog-error{PlasmoidIconEnd}"
else
    # do nothing
fi

onClick script: set scriptinator status to no-error and reset the systemd-status.old to the current situation

cd $XDG_RUNTIME_DIR
systemctl status --failed > systemd-status.old
systemctl --user status --failed >> systemd-status.old

echo "{PlasmoidStatusStart}passive{PlasmoidStatusEnd}" # set to active if you want the icon not to hide in the tray
echo "{PlasmoidIconStart}dialog-ok{PlasmoidIconEnd}"

hope this works!


Does not seem to work, I’m afraid.

There was an error in the second script, so I commented out the 10th (else) line. I also s/systemd-/systemd-status in the 4th line.

Even then it does not seem to trigger Scriptinator.

Is this how it should be?

Progress!

But then a new problem was that it triggered every time, because the output of systemctl status --failed contains a relative time (e.g. the “12s ago” in the output above) that changes on every run. So it was always different from the old.

So I changed it to use the less chatty systemctl --failed instead.
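If even that output turns out to be too noisy, one possible further step (a sketch only, not what the scripts below do) is to compare just the names of the failed units. The function name unit_names is made up for illustration, and it assumes its input comes from systemctl --failed --no-legend --plain, which prints one unit per line with no header, footer, or leading bullet:

```sh
#!/bin/sh
# Hypothetical helper: reduce a plain systemctl unit listing to sorted unit names.
# Expects lines such as: "borgmatic.service loaded failed failed borgmatic backup"
unit_names() {
    # Skip empty lines, keep the first column (the unit name), sort for stable diffs.
    awk 'NF { print $1 }' | sort
}
```

It could then be used as systemctl --failed --no-legend --plain | unit_names > systemd-status, so that cmp only sees a difference when the set of failed units changes.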

While I was at it, I ran it past shellcheck --shell sh to make it work in sh too.

scriptinator_is_systemd_ok.sh is now:

#!/bin/sh

cd "$XDG_RUNTIME_DIR" || exit

systemctl --failed > systemd-status

if ! cmp --quiet systemd-status systemd-status.old
then
    echo "{PlasmoidStatusStart}attention{PlasmoidStatusEnd}"
    echo "{PlasmoidIconStart}system-error{PlasmoidIconEnd}"
    echo "{PlasmoidTooltipStart}"
    systemctl --failed | sed -e '/^$/,$d'
    echo "{PlasmoidTooltipEnd}"
# else
    # do nothing
fi

and scriptinator_systemd_is_ok_now.sh:

#!/bin/sh

cd "$XDG_RUNTIME_DIR" || exit

systemctl --failed > systemd-status.old

echo "{PlasmoidStatusStart}passive{PlasmoidStatusEnd}" # set to active if you want the icon not to hide in the tray
echo "{PlasmoidIconStart}system{PlasmoidIconEnd}"
echo "{PlasmoidTooltipStart}SystemD is running fine.{PlasmoidTooltipEnd}"

I also changed the settings to have “custom tooltip” enabled, and now it works pretty much like I want it to :smiley:

Thank you both, @tubbadu and @jinliu !

For anyone else trying to set it up, these are the Scriptinator settings:


As a side note, if you tend to keep a session of htop running, you can monitor systemd units.


Thanks! I tried your scripts. It works fine except some minor issues:

  1. My system doesn’t have “system” and “system-error” icons, so I use “system-run” and “computer-fail-symbolic” instead.
  2. I don’t see the “SystemD is running fine.” tooltip. It shows “Behold! The Scriptinator!” instead. No idea why.
  3. If I click the tray icon to dismiss it, then restart the failed systemd service, the icon re-appears, since the output of systemctl has changed. So I modified scriptinator_is_systemd_ok.sh:
#!/bin/sh

cd "$XDG_RUNTIME_DIR" || exit

systemctl --failed > systemd-status

if ! cmp --quiet systemd-status systemd-status.old
then
    if grep failed systemd-status
    then
        echo "{PlasmoidStatusStart}attention{PlasmoidStatusEnd}"
        echo "{PlasmoidIconStart}system-error{PlasmoidIconEnd}"
        echo "{PlasmoidTooltipStart}"
        systemctl --failed | sed -e '/^$/,$d'
        echo "{PlasmoidTooltipEnd}"
    else
        cp systemd-status systemd-status.old
    fi
# else
    # do nothing
fi

I started using btop and btm too, but that is a great tip, thanks!

Oooh, computer-fail is a good one!

I think that’s an issue with Scriptinator. I noticed that too. What seems to happen is that when you click it, it shows the correct tooltip for a really short time and, at least in my case, at the upper-left corner of the screen.

Yes, that is a limitation in my version, that I noticed too, but was too tired to figure out how to fix it. Your modification makes a lot of sense, thanks!

Uhh, I know this went in a different direction, but answering part of the original question:

There’s OnFailure (which lets you specify another service) and ExecStopPost (which lets you specify an arbitrary command). Something like this:

[Unit]
Description=My X service

[Service]
ExecStart=/usr/bin/false
ExecStopPost=/usr/bin/notify-send "Service X failed!"
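One caveat: ExecStopPost= commands run whenever the service stops, not only when it fails. systemd passes $SERVICE_RESULT to those commands, so a small wrapper can notify on failure only. The following is a sketch under assumptions: the function name failure_message and the idea of passing the unit name as the first argument are made up for illustration, and treating an unset $SERVICE_RESULT as success is my choice for runs outside systemd.

```sh
#!/bin/sh
# Hypothetical wrapper meant to be called from ExecStopPost=.
# systemd sets SERVICE_RESULT for ExecStopPost= commands; "success" means a clean stop.

# Print a message only when the result is something other than "success".
# $1: unit name, $2: service result
failure_message() {
    if [ "$2" != "success" ]; then
        printf '%s failed (result: %s)\n' "$1" "$2"
    fi
}

# Assumption: when run by hand, SERVICE_RESULT is unset, so default to "success".
msg=$(failure_message "${1:-unknown unit}" "${SERVICE_RESULT:-success}")

# Only notify if there is something to report and notify-send is installed.
if [ -n "$msg" ] && command -v notify-send >/dev/null 2>&1; then
    notify-send --app-name=systemd-monitor "$msg"
fi
```

In the unit it could be referenced as ExecStopPost=/path/to/this-script.sh %n (the path is a placeholder; %n expands to the unit name). Note that a system-level service runs outside your desktop session, so notify-send may not reach your desktop from there; this is more likely to work as-is for --user units.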