systemd issues related to watchdog
prashantkumar dhotre
2018-03-21 18:04:16 UTC
Hi systemd-experts

I am seeing a few issues related to the watchdog in systemd version 230.
Could you please help me with the queries below?


1) How do I test the hardware watchdog settings RuntimeWatchdogSec and
ShutdownWatchdogSec?

2) If I enable RuntimeWatchdogSec, should I also run watchdog.service, which
runs /usr/sbin/watchdog?

3) What is the config to set, say, a 2 min timeout for shutdown/reboot?
ShutdownWatchdogSec does not do that.

4) On the console of my device, when I reboot it, I sometimes see the message
"[ 553.001000] Uhhuh. NMI receÿ".

What does this indicate? Does it indicate expiry of the hw watchdog timer?


5) How do I see the present setting of the hw watchdog timer?

wdctl is not working:


***@bng-evo-ptx5k-c-re0:~# wdctl /dev/watchdog0

wdctl: /dev/watchdog0: watchdog already in use, terminating.

wdctl: cannot open /dev/watchdog0: Device or resource busy


***@bng-evo-ptx5k-c-re0:~# wdctl /dev/watchdog1

wdctl: /dev/watchdog1: watchdog already in use, terminating.

wdctl: cannot open /dev/watchdog1: Device or resource busy

5) reboot-force seems to be overwriting the hardware watchdog timeout value.

I have changed reboot.target to set JobTimeoutSec=5sec.
When the system boots up I see that the hardware watchdog is set to 1 min 4 sec,
but when 'systemctl reboot' times out, reboot-force is invoked and that
overwrites the hardware watchdog timeout value to 4 min.
Is this a bug, or am I missing some config?

Note that I have not set ShutdownWatchdogSec= in /etc/systemd/system.conf.


$ grep -i hardware /var/tmp/j12_systemctl_reboot_jobtimeout5sec_1

Mar 18 23:42:11 re1 systemd[1]: Hardware watchdog 'iTCO_wdt', version 0
Mar 18 23:42:11 re1 systemd[1]: Set hardware watchdog to 1min 4s.
Mar 18 23:42:32 re1 systemd[1]: Set hardware watchdog to 4min.

6) RuntimeWatchdogSec does not seem to work.
I have set it to 90 sec, but my system still gets rebooted by the
hardware watchdog, as I see 'NMI received' on my console.
This does not happen if I only have watchdog.service
(running /usr/sbin/watchdog) and do not set *RuntimeWatchdogSec*.
This observation suggests that *RuntimeWatchdogSec* does not do
what it is supposed to do.

7) The hw watchdog timeout seems to be non-configurable and always
hardcoded to 1 min 4 sec.
Neither *RuntimeWatchdogSec* nor ShutdownWatchdogSec has any effect on
the hw watchdog timeout.
I see the hw watchdog NMI at reboot 1 min 4 sec after the reboot command, even
if ShutdownWatchdogSec is configured to 10 min.
Is this a systemd bug?

Thanks
Lennart Poettering
2018-03-21 20:49:13 UTC
Post by prashantkumar dhotre
Hi systemd-experts
I am seeing a few issues related to the watchdog in systemd version 230.
Could you please help me with the queries below?
1) How do I test the hardware watchdog settings RuntimeWatchdogSec and
ShutdownWatchdogSec?
See the various suggestions in the responses to:

https://lists.freedesktop.org/archives/systemd-devel/2018-February/040428.html
Post by prashantkumar dhotre
2) If I enable RuntimeWatchdogSec, should I also run watchdog.service, which
runs /usr/sbin/watchdog?
No, you should not. There can only be one consumer of each
/dev/watchdog device, and if that's systemd then nobody else will get access.
Post by prashantkumar dhotre
3) What is the config to set, say, a 2 min timeout for shutdown/reboot?
ShutdownWatchdogSec does not do that.
ShutdownWatchdogSec= applies to the last phase of shutdown only, i.e. to
the phase where all services are already shut down, and only the final
unmounting and killing of whatever remains is done.

To apply a timeout for the first phase of shutdown, use JobTimeoutSec=
and JobTimeoutAction= in shutdown.target or so. See systemd.unit(5)
for details.
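
As a rough sketch of the JobTimeoutSec=/JobTimeoutAction= suggestion, the
shell function below writes a drop-in for a target unit. The function name,
the drop-in file name job-timeout.conf, the 2min value and the choice of
reboot-force as the action are assumptions for illustration only; check
systemd.unit(5) for the actions your systemd version supports.

```shell
#!/bin/sh
# Write a drop-in that puts a job timeout on a target unit.
# $1: unit directory (e.g. /etc/systemd/system)
# $2: target name    (e.g. shutdown.target or reboot.target)
# $3: timeout        (e.g. 2min)
# The file name and the reboot-force action are illustrative assumptions.
write_job_timeout() {
    unit_dir=$1
    target=$2
    timeout=$3
    dropin_dir="$unit_dir/$target.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/job-timeout.conf" <<EOF
[Unit]
JobTimeoutSec=$timeout
JobTimeoutAction=reboot-force
EOF
}

# Example (as root), followed by a reload so systemd picks up the drop-in:
#   write_job_timeout /etc/systemd/system shutdown.target 2min
#   systemctl daemon-reload
```

A drop-in keeps the vendor-supplied unit file untouched, so the change is easy
to revert by deleting the one file and reloading.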
Post by prashantkumar dhotre
4) On the console of my device, when I reboot it, I sometimes see the message
"[ 553.001000] Uhhuh. NMI receÿ".
What does this indicate? Does it indicate expiry of the hw watchdog
timer?
That's generated by the kernel, and is something the kernel folks
should be able to help you with.
Post by prashantkumar dhotre
5) How do I see the present setting of the hw watchdog timer?
wdctl is not working
Hmm, yeah, it's an exclusive use device. However watchdog drivers
export their current settings in /sys, too:

grep . /sys/class/watchdog/*/*
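
If the full grep output is too noisy, a narrower variant might look like the
sketch below. It assumes the kernel exposes the watchdog sysfs attributes
(identity, state, timeout, timeleft, nowayout); which of these exist depends
on the kernel configuration and the driver, so missing ones are skipped. The
function name is made up for this example.

```shell
#!/bin/sh
# Print selected attributes of every watchdog device below a sysfs class dir.
# $1: class directory, defaults to /sys/class/watchdog
show_watchdogs() {
    class_dir=${1:-/sys/class/watchdog}
    for dev in "$class_dir"/watchdog*; do
        [ -d "$dev" ] || continue
        for attr in identity state timeout timeleft nowayout; do
            # Not every driver exposes every attribute; skip missing ones.
            [ -r "$dev/$attr" ] || continue
            printf '%s %s=%s\n' "${dev##*/}" "$attr" "$(cat "$dev/$attr")"
        done
    done
}

# Example:
#   show_watchdogs
```

Unlike wdctl, this only reads sysfs and never opens /dev/watchdogN, so it
does not conflict with whoever currently holds the device.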
Post by prashantkumar dhotre
5) reboot-force seems to be overwriting the hardware watchdog timeout value.
Yes, with the ShutdownWatchdogSec= setting mentioned above.
Post by prashantkumar dhotre
I have changed reboot.target to set JobTimeoutSec=5sec.
When the system boots up I see that the hardware watchdog is set to 1 min 4 sec,
but when 'systemctl reboot' times out, reboot-force is invoked and that
overwrites the hardware watchdog timeout value to 4 min.
Is this a bug, or am I missing some config?
Note that I have not set ShutdownWatchdogSec= in
/etc/systemd/system.conf.
The default for ShutdownWatchdogSec= is 10min, and most likely your hw
can't do that, hence the next closest 4min is set instead.
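
To check which timeout was actually programmed (for example a fallback to a
smaller value than requested), the "Set hardware watchdog to ..." lines logged
by PID 1 can be filtered out of a log. A small sketch; the function name is
made up, and the message format is assumed to match the log excerpt quoted
earlier in this thread.

```shell
#!/bin/sh
# Read log text on stdin and print only the timeout values that systemd
# reported when it programmed the hardware watchdog.
watchdog_timeouts() {
    sed -n 's/.*Set hardware watchdog to \(.*\)\.$/\1/p'
}

# Example, using the journal of the current boot:
#   journalctl -b _PID=1 | watchdog_timeouts
```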
Post by prashantkumar dhotre
6) RuntimeWatchdogSec does not seem to work.
I have set it to 90 sec, but my system still gets rebooted by the
hardware watchdog, as I see 'NMI received' on my console.
This does not happen if I only have watchdog.service
(running /usr/sbin/watchdog) and do not set *RuntimeWatchdogSec*.
This observation suggests that *RuntimeWatchdogSec* does not do
what it is supposed to do.
Hmm, if you "strace -p 1", do you see the watchdog ping ioctls
happening?

IIUC you have two watchdog devices. systemd can only manage one. Is it
possible that the other might be causing this?
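
For the strace check, a sketch along these lines may help. It assumes strace
decodes the keep-alive request as WDIOC_KEEPALIVE (the ioctl defined in
linux/watchdog.h) and that tracing PID 1 is permitted on the system; the
function name is hypothetical.

```shell
#!/bin/sh
# Count watchdog keep-alive ioctls in strace output read from stdin.
# Prints the number of matching lines (0 if there are none).
count_watchdog_pings() {
    grep -c 'WDIOC_KEEPALIVE'
}

# Example (as root; trace PID 1 for two minutes, strace writes to stderr):
#   timeout 120 strace -p 1 -e trace=ioctl 2>&1 | count_watchdog_pings
```

A count of zero over a window longer than the configured RuntimeWatchdogSec=
would suggest that PID 1 is not pinging the device at all.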
Post by prashantkumar dhotre
7) The hw watchdog timeout seems to be non-configurable and always
hardcoded to 1 min 4 sec.
Neither *RuntimeWatchdogSec* nor ShutdownWatchdogSec has any effect on
the hw watchdog timeout.
I see the hw watchdog NMI at reboot 1 min 4 sec after the reboot command, even
if ShutdownWatchdogSec is configured to 10 min.
Is this a systemd bug?
Similar here, is it possible that the other watchdog is causing this?

Lennart
--
Lennart Poettering, Red Hat