Discussion:
OnFailure=
Jakob Schürz
2018-03-07 23:37:48 UTC
Hi there!

I built a test unit:

# cat ***@.service
[Unit]
Description=Testservice notification
OnFailure=notification-telegram@%n.service

[Service]
Type=simple
Restart=on-failure
#RestartSec=2
ExecStart=/bin/%i
SyslogIdentifier=test@%i.service
StartLimitBurst=5
StartLimitInterval=10


And the notification unit, notification-telegram@%n.service:

# cat notification-***@.service
[Unit]
Description=Send failure-notification about %i to telegram

[Service]
User=jakob
ExecStart=/bin/bash -c "/usr/local/bin/ntfy -b telegram send \"FAILED\n$(systemctl status %i)\""

When I start the test unit with systemctl start ***@false, I get 5
messages in Telegram...

The log is:
Mär 08 00:31:53 aldebaran systemd[1]: Started Testservice notification.
Mär 08 00:31:53 aldebaran systemd[1]: ***@false.service: Main process
exited, code=exited, status=1/FAILURE
Mär 08 00:31:53 aldebaran systemd[1]: ***@false.service: Failed with
result 'exit-code'.
Mär 08 00:31:53 aldebaran systemd[1]: ***@false.service: Triggering
OnFailure= dependencies.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Service
hold-off time over, scheduling restart.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Scheduled
restart job, restart counter is at 1.
Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
Mär 08 00:31:54 aldebaran systemd[1]: Started Testservice notification.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Main process
exited, code=exited, status=1/FAILURE
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Failed with
result 'exit-code'.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Triggering
OnFailure= dependencies.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Service
hold-off time over, scheduling restart.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Scheduled
restart job, restart counter is at 2.
Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
Mär 08 00:31:54 aldebaran systemd[1]: Started Testservice notification.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Main process
exited, code=exited, status=1/FAILURE
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Failed with
result 'exit-code'.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Triggering
OnFailure= dependencies.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Service
hold-off time over, scheduling restart.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Scheduled
restart job, restart counter is at 3.
Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
Mär 08 00:31:54 aldebaran systemd[1]: Started Testservice notification.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Main process
exited, code=exited, status=1/FAILURE
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Failed with
result 'exit-code'.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Triggering
OnFailure= dependencies.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Service
hold-off time over, scheduling restart.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Scheduled
restart job, restart counter is at 4.
Mär 08 00:31:54 aldebaran systemd[1]: Stopped Testservice notification.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Start request
repeated too quickly.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Failed with
result 'exit-code'.
Mär 08 00:31:54 aldebaran systemd[1]: Failed to start Testservice
notification.
Mär 08 00:31:54 aldebaran systemd[1]: ***@false.service: Triggering
OnFailure= dependencies.


As you can see, the unit from OnFailure= is triggered 5 times, not just
once at the "Failed to start Testservice notification" point.
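
That count can be checked against the journal by counting the
"Triggering OnFailure= dependencies" messages. The following is only a
sketch: count_onfailure_triggers is a made-up helper name, and it assumes
each journal message is on a single line (unlike the wrapped excerpt
above).

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): count how often systemd logged
# that it triggered the OnFailure= dependencies. Reads journal text on stdin.
count_onfailure_triggers() {
    # -c prints the number of matching lines, -F matches the fixed string.
    # grep exits non-zero when nothing matches, so "|| true" keeps callers
    # running under "set -e" from aborting; the printed count is then 0.
    grep -cF 'Triggering OnFailure= dependencies' || true
}
```

It could be fed with something like
journalctl -u "$unit" | count_onfailure_triggers, where $unit is assumed
to hold the instance name of the test unit. For the excerpt above it
would report 5.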

The man-page says:

OnFailure=
A space-separated list of one or more units that are
activated when this unit enters the "failed" state. A service unit using
Restart= enters the failed state only after the
start limits are reached.


But in this test case, the unit listed in OnFailure= is started every time
the unit fails: it fails, restarts, fails again, and after 5 times
(=StartLimitBurst) the unit ends up in the failed state. That should be
the only time OnFailure= is triggered.

My systemd version is 237-3 from Debian.

Should I file a bug at bugs.freedesktop.org?

Jakob
Andrei Borzenkov
2018-03-08 06:03:38 UTC
Post by Jakob Schürz
[...]
OnFailure=
A space-separated list of one or more units that are
activated when this unit enters the "failed" state. A service unit using
Restart= enters the failed state only after the
start limits are reached.
This statement is apparently wrong, because the service briefly passes
through the "failed" state every time it fails. It is true that if
Restart= is set, this is immediately followed by the "activating" state
again, but the OnFailure= actions are still taken.

So from the end-user perspective the unit indeed remains "failed" only
when the limits are reached, but internally it transitions through the
"failed" state every time.
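
Given that behaviour, one conceivable workaround (not proposed in this
thread, and only a sketch) is to let the notification command check
whether the unit is still in the "failed" state before it sends anything.
should_notify is a hypothetical helper; it assumes the caller passes in
the unit's ActiveState, for example from
systemctl show -p ActiveState --value UNIT. The check is inherently racy,
since the unit may change state between the query and the notification.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: decide whether a notification
# should be sent, given the unit's ActiveState as a string.
should_notify() {
    case "$1" in
        failed) return 0 ;;  # unit stayed in "failed": no restart is pending
        *)      return 1 ;;  # e.g. "activating": a restart is already scheduled
    esac
}

# Usage sketch (needs a running systemd, so it is left commented out;
# $unit is an assumed variable holding the failed unit's name):
# state=$(systemctl show -p ActiveState --value "$unit")
# should_notify "$state" && /usr/local/bin/ntfy -b telegram send "FAILED: $unit"
```

Keeping the decision in a function that takes the state as an argument
makes it easy to try out without a failing service at hand.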
Post by Jakob Schürz
[...]
Should i file a Bug in bugs.freedesktop.org?
You should create an issue on GitHub; this is where the primary bug
tracker is today:

https://github.com/systemd/systemd/
Jakob Schürz
2018-03-08 09:06:32 UTC
[...]
Post by Andrei Borzenkov
You should create an issue on GitHub; this is where the primary bug
tracker is today:
https://github.com/systemd/systemd/
Thanks. I filed a bug there:

https://github.com/systemd/systemd/issues/8398

Jakob
