Discussion:
Debugging active timers that do not trigger
(too old to reply)
Daniel Drake
2018-11-08 06:35:16 UTC
Permalink
Hi,

On Endless we have the following eos-autoupdater.timer:

[Unit]
Description=Endless OS Automatic Update Timer
Documentation=man:eos-autoupdater(8)
ConditionKernelCommandLine=!endless.live_boot
ConditionKernelCommandLine=ostree

[Timer]
OnBootSec=15m
OnUnitInactiveSec=1h
RandomizedDelaySec=30min

[Install]
WantedBy=multi-user.target

This ordinarily works fine, but we have seen a couple of random, rare
occasions where this timer doesn't trigger the target
eos-autoupdater.service. I have one case here in front of me now with
details below.

In the list-timers output you can see it has "n/a" for NEXT/LAST etc.
There is no evidence of eos-autoupdater.service having started at any
point in the journal (nor any crashes).

This is not a major concern as it seems to only happen rarely, and
fixes itself upon reboot. Also so far we have only reproduced this on
systemd-237; it's hard to judge whether it's fixed in a newer version
due to the low occurance rate of the issue. But I would be curious if
there are any easy debugging steps I can follow when we see this -
I'll leave the system running in this state for a couple of days in
case there are suggestions.

$ systemctl status eos-autoupdater.timer
● eos-autoupdater.timer - Endless OS Automatic Update Timer
Loaded: loaded (/lib/systemd/system/eos-autoupdater.timer; enabled;
vendor preset: enabled)
Active: active (elapsed) since Wed 2018-11-07 15:11:14 CST; 23h ago
Trigger: n/a
Docs: man:eos-autoupdater(8)

Nov 07 15:11:14 endless systemd[1]: Started Endless OS Automatic Update Timer.

$ systemctl status eos-autoupdater.service
● eos-autoupdater.service - Endless OS Automatic Updater
Loaded: loaded (/lib/systemd/system/eos-autoupdater.service;
indirect; vendor preset: enabled)
Active: inactive (dead)
Docs: man:eos-autoupdater(8)


$ systemctl list-timers
NEXT LEFT LAST
PASSED UNIT ACTIVATES
Thu 2018-11-08 15:34:06 CST 1h 17min left Wed 2018-11-07 15:26:02 CST
22h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Thu 2018-11-08 17:10:45 CST 2h 54min left Thu 2018-11-08 14:10:44 CST
5min ago eos-phone-home.timer eos-phone-home.service
Mon 2018-11-12 00:00:00 CST 3 days left n/a
n/a fstrim.timer fstrim.service
n/a n/a n/a
n/a eos-autoupdater.timer eos-autoupdater.service
n/a n/a Wed 2018-11-07 15:27:05 CST
22h ago systemd-readahead-done.timer systemd-readahead-done.service



$ systemctl show eos-autoupdater.timer
Unit=eos-autoupdater.service
NextElapseUSecMonotonic=infinity
LastTriggerUSecMonotonic=0
Result=success
AccuracyUSec=1min
RandomizedDelayUSec=30min
Persistent=no
WakeSystem=no
RemainAfterElapse=yes
Id=eos-autoupdater.timer
Names=eos-autoupdater.timer
Requires=sysinit.target
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=timers.target multi-user.target eos-autoupdater.service shutdown.target
After=sysinit.target
Triggers=eos-autoupdater.service
Documentation=man:eos-autoupdater(8)
Description=Endless OS Automatic Update Timer
LoadState=loaded
ActiveState=active
SubState=elapsed
FragmentPath=/lib/systemd/system/eos-autoupdater.timer
UnitFileState=enabled
UnitFilePreset=enabled
StateChangeTimestamp=Wed 2018-11-07 15:26:36 CST
StateChangeTimestampMonotonic=934682450
InactiveExitTimestamp=Wed 2018-11-07 15:11:14 CST
InactiveExitTimestampMonotonic=13380144
ActiveEnterTimestamp=Wed 2018-11-07 15:11:14 CST
ActiveEnterTimestampMonotonic=13380144
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Wed 2018-11-07 15:11:14 CST
ConditionTimestampMonotonic=13380053
AssertTimestamp=Wed 2018-11-07 15:11:14 CST
AssertTimestampMonotonic=13380122
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=c1bf78021112483db79c39221fd58d80
CollectMode=inactive


$ ls -l /var/lib/systemd/timers/
total 0
-rw-r--r-- 1 root root 0 Nov 7 15:11 stamp-fstrim.timer


Thanks
Daniel
Andrei Borzenkov
2018-11-08 10:46:36 UTC
Permalink
Post by Daniel Drake
Hi,
[Unit]
Description=Endless OS Automatic Update Timer
Documentation=man:eos-autoupdater(8)
ConditionKernelCommandLine=!endless.live_boot
ConditionKernelCommandLine=ostree
[Timer]
OnBootSec=15m
It is possible that system never ends booting. Do you have any pending
jobs (systemctl list-jobs)? What "systemctl is-system-running" says?
Post by Daniel Drake
OnUnitInactiveSec=1h
RandomizedDelaySec=30min
[Install]
WantedBy=multi-user.target
This ordinarily works fine, but we have seen a couple of random, rare
occasions where this timer doesn't trigger the target
eos-autoupdater.service. I have one case here in front of me now with
details below.
In the list-timers output you can see it has "n/a" for NEXT/LAST etc.
There is no evidence of eos-autoupdater.service having started at any
point in the journal (nor any crashes).
This is not a major concern as it seems to only happen rarely, and
fixes itself upon reboot. Also so far we have only reproduced this on
systemd-237; it's hard to judge whether it's fixed in a newer version
due to the low occurance rate of the issue. But I would be curious if
there are any easy debugging steps I can follow when we see this -
I'll leave the system running in this state for a couple of days in
case there are suggestions.
$ systemctl status eos-autoupdater.timer
● eos-autoupdater.timer - Endless OS Automatic Update Timer
Loaded: loaded (/lib/systemd/system/eos-autoupdater.timer; enabled;
vendor preset: enabled)
Active: active (elapsed) since Wed 2018-11-07 15:11:14 CST; 23h ago
Trigger: n/a
Docs: man:eos-autoupdater(8)
Nov 07 15:11:14 endless systemd[1]: Started Endless OS Automatic Update Timer.
$ systemctl status eos-autoupdater.service
● eos-autoupdater.service - Endless OS Automatic Updater
Loaded: loaded (/lib/systemd/system/eos-autoupdater.service;
indirect; vendor preset: enabled)
Active: inactive (dead)
Docs: man:eos-autoupdater(8)
$ systemctl list-timers
NEXT LEFT LAST
PASSED UNIT ACTIVATES
Thu 2018-11-08 15:34:06 CST 1h 17min left Wed 2018-11-07 15:26:02 CST
22h ago systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Thu 2018-11-08 17:10:45 CST 2h 54min left Thu 2018-11-08 14:10:44 CST
5min ago eos-phone-home.timer eos-phone-home.service
Mon 2018-11-12 00:00:00 CST 3 days left n/a
n/a fstrim.timer fstrim.service
n/a n/a n/a
n/a eos-autoupdater.timer eos-autoupdater.service
n/a n/a Wed 2018-11-07 15:27:05 CST
22h ago systemd-readahead-done.timer systemd-readahead-done.service
$ systemctl show eos-autoupdater.timer
Unit=eos-autoupdater.service
NextElapseUSecMonotonic=infinity
LastTriggerUSecMonotonic=0
Result=success
AccuracyUSec=1min
RandomizedDelayUSec=30min
Persistent=no
WakeSystem=no
RemainAfterElapse=yes
Id=eos-autoupdater.timer
Names=eos-autoupdater.timer
Requires=sysinit.target
WantedBy=multi-user.target
Conflicts=shutdown.target
Before=timers.target multi-user.target eos-autoupdater.service shutdown.target
After=sysinit.target
Triggers=eos-autoupdater.service
Documentation=man:eos-autoupdater(8)
Description=Endless OS Automatic Update Timer
LoadState=loaded
ActiveState=active
SubState=elapsed
FragmentPath=/lib/systemd/system/eos-autoupdater.timer
UnitFileState=enabled
UnitFilePreset=enabled
StateChangeTimestamp=Wed 2018-11-07 15:26:36 CST
StateChangeTimestampMonotonic=934682450
InactiveExitTimestamp=Wed 2018-11-07 15:11:14 CST
InactiveExitTimestampMonotonic=13380144
ActiveEnterTimestamp=Wed 2018-11-07 15:11:14 CST
ActiveEnterTimestampMonotonic=13380144
ActiveExitTimestampMonotonic=0
InactiveEnterTimestampMonotonic=0
CanStart=yes
CanStop=yes
CanReload=no
CanIsolate=no
StopWhenUnneeded=no
RefuseManualStart=no
RefuseManualStop=no
AllowIsolate=no
DefaultDependencies=yes
OnFailureJobMode=replace
IgnoreOnIsolate=no
NeedDaemonReload=no
JobTimeoutUSec=infinity
JobRunningTimeoutUSec=infinity
JobTimeoutAction=none
ConditionResult=yes
AssertResult=yes
ConditionTimestamp=Wed 2018-11-07 15:11:14 CST
ConditionTimestampMonotonic=13380053
AssertTimestamp=Wed 2018-11-07 15:11:14 CST
AssertTimestampMonotonic=13380122
Transient=no
Perpetual=no
StartLimitIntervalUSec=10s
StartLimitBurst=5
StartLimitAction=none
FailureAction=none
SuccessAction=none
InvocationID=c1bf78021112483db79c39221fd58d80
CollectMode=inactive
$ ls -l /var/lib/systemd/timers/
total 0
-rw-r--r-- 1 root root 0 Nov 7 15:11 stamp-fstrim.timer
Thanks
Daniel
_______________________________________________
systemd-devel mailing list
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Daniel Drake
2018-11-12 04:52:08 UTC
Permalink
Post by Andrei Borzenkov
It is possible that system never ends booting. Do you have any pending
jobs (systemctl list-jobs)? What "systemctl is-system-running" says?
Thanks for the suggestion! It sounds like a good one - I did reproduce
this on first boot and we do have a known issue in that area affecting
systemd's perception of boot completion.
https://gitlab.gnome.org/GNOME/gdm/issues/439

Unfortunately I wasn't able to leave the system in that state after
all, so I can't check directly any more, but I'll do more testing
along these lines.

Thanks
Daniel
Michal Koutný
2018-11-15 11:03:53 UTC
Permalink
Post by Andrei Borzenkov
It is possible that system never ends booting. Do you have any pending
jobs (systemctl list-jobs)? What "systemctl is-system-running" says?
I don't think this is the case. The OnBootSec= is taken relatively to
the instant when the kernel started counting time (on Linux implemented
via CLOCK_MONOTONIC).

@Daniel, is it possible there are some daemon-reloads running
concurrently with the timer? More precisely, can it happen the timer
expires exactly when systemd reloads?

Michal
Daniel Drake
2018-11-16 08:49:08 UTC
Permalink
Post by Michal Koutný
@Daniel, is it possible there are some daemon-reloads running
concurrently with the timer? More precisely, can it happen the timer
expires exactly when systemd reloads?
I don't think so. The journal only show a single "systemd[1]:
Reloading." message and that happened as part of our initramfs
scripts, before the real-root systemd was run.

Daniel

Continue reading on narkive:
Loading...