Discussion:
Hardware watchdog support, slug speed.
(too old to reply)
Sébastien Luttringer
2013-03-11 22:11:19 UTC
Permalink
Hello,

I thought that systemd and the watchdog driver sbc_fitpc2_wdt was not
friends. I use the both on an archlinux i686 setup to do some network
basics.
Everything works pretty well with watchdogd, I doesn't use the
advanced features and systemd would do the job perfectly. Hum. But
not.

The issue, is not a matter of support, because systemd (and the whole
things communicating with it) become very slow when I enable the
watchdog.

Some metrics:

systemctl call, without watchdog enabled : 0s
# time systemctl --no-page
systemctl --no-pager 0,00s user 0,01s system 27% cpu 0,049 total

systemctl call, with watchdog enabled.: 4s
# time systemctl --no-page
systemctl --no-pager 0,01s user 0,01s system 0% cpu 4,359 total

whole system boot-up without watchdog : 5s
# systemd-analyze
Startup finished in 1987ms (kernel) + 2884ms (userspace) = 4871ms

whole system boot-up with watchdog : 2min57s (aye!!!)
# systemd-analyze
Startup finished in 2154ms (kernel) + 175579ms (userspace) = 177733ms

Even the call to systemd-analyse is longer. Firstly I guess that the
driver is slowing the whole system, but only systemd is slow and when
using watchdogd, systemd is fast.

I use a simple configuration

# grep Watch /etc/systemd/system.conf
RuntimeWatchdogSec=1min
ShutdownWatchdogSec=2min

After reexecuting the daemon, I got the following info in the journal.

# systemctl daemon-reexec
# LC_ALL=C journalctl -f
mars 11 20:50:15 slice systemd[1]: systemd 198 running in system mode.
(+PAM -LIBWRAP -AUDIT -SELINUX -IMA -S...L +XZ)
mars 11 20:50:15 slice systemd[1]: Hardware watchdog 'SBC-FITPC2
Watchdog', version 0
mars 11 20:50:16 slice systemd[1]: Set hardware watchdog to 1min.
mars 11 20:50:16 slice systemd[1]: Failed to enable hardware watchdog:
Inappropriate ioctl for device

No load average.
# LC_ALL=C uptime
21:11:29 up 1 day, 23:12, 3 users, load average: 0.13, 0.18, 0.22

No tremendous read/write.
# dstat
You did not select any stats, using -cdngy by default.
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
1 1 98 0 0 0| 650B 3219B| 0 0 | 0 0 | 676 421
1 1 98 0 0 0| 0 0 | 162k 315k| 0 0 | 477 317

No process running at 100%. But systemd act as a slug.

Stracing pid 1, there is (even if the message in jounal is failed to
enable) an ioctl evey 30s has expected.

# strace -t -e ioctl -p 1
Process 1 attached
22:09:28 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:09:59 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:10:30 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:11:00 ioctl(38, WDIOC_KEEPALIVE, 0) = 0

But everytime I call a systemd command, the ioctl is called. Here is
the output of the same strace, with a call on systemctl --no-pager.

# strace -t -e ioctl -p 1
Process 1 attached
22:12:41 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:44 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:44 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:45 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:46 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:46 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:47 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:48 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:48 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:49 ioctl(38, WDIOC_KEEPALIVE, 0) = 0
22:12:49 ioctl(38, WDIOC_KEEPALIVE, 0) = 0

I guess issue come from the manager_loop (src/core/manager.c) which
call the watchog_ping each loop iteration.
Even when there is successive and positive return from manager_dispatch_*.

Seeing that's I figure that all my system with watchdog enabled are
"slugged" by watchdog activation. And it's true.
All my computers call ioctl more than 10th by second. But with
powerful intel modern computer I doesn't see it since weeks.

So, I thnk this code should be reworked and only do the ioctl at
interval defined in RuntimeWatchdogSec, firstly because we use too
many cycle (global speed/power) and
secondly, because there is no way to trigger a system reset by the
wathdog in these condition.
I'm also interogative about calling ioctl (and trying to reopen
/dev/watchdog) even when the initial setup of the watchdog have
failed.

I could provide some patches to fix this if you are interested.

Cheers,
--
Sébastien "Seblu" Luttringer
https://www.seblu.net
GPG: 0x2072D77A
Zbigniew Jędrzejewski-Szmek
2013-03-11 22:42:44 UTC
Permalink
Post by Sébastien Luttringer
After reexecuting the daemon, I got the following info in the journal.
# systemctl daemon-reexec
# LC_ALL=C journalctl -f
mars 11 20:50:15 slice systemd[1]: systemd 198 running in system mode.
(+PAM -LIBWRAP -AUDIT -SELINUX -IMA -S...L +XZ)
mars 11 20:50:15 slice systemd[1]: Hardware watchdog 'SBC-FITPC2
Watchdog', version 0
mars 11 20:50:16 slice systemd[1]: Set hardware watchdog to 1min.
Inappropriate ioctl for device
Hi Sébastien,
thank you for the great bug report.
Post by Sébastien Luttringer
So, I thnk this code should be reworked and only do the ioctl at
interval defined in RuntimeWatchdogSec, firstly because we use too
many cycle (global speed/power) and
secondly, because there is no way to trigger a system reset by the
wathdog in these condition.
I'm also interogative about calling ioctl (and trying to reopen
/dev/watchdog) even when the initial setup of the watchdog have
failed.
I could provide some patches to fix this if you are interested.
Definitely. There are some places in systemd where unexpected failure
results in excessive usage of resources. Patches (preferably two,
for the two separate issues you describe) are always welcome.

Zbyszek
Lennart Poettering
2013-03-11 22:46:55 UTC
Permalink
Post by Zbigniew Jędrzejewski-Szmek
Post by Sébastien Luttringer
After reexecuting the daemon, I got the following info in the journal.
# systemctl daemon-reexec
# LC_ALL=C journalctl -f
mars 11 20:50:15 slice systemd[1]: systemd 198 running in system mode.
(+PAM -LIBWRAP -AUDIT -SELINUX -IMA -S...L +XZ)
mars 11 20:50:15 slice systemd[1]: Hardware watchdog 'SBC-FITPC2
Watchdog', version 0
mars 11 20:50:16 slice systemd[1]: Set hardware watchdog to 1min.
Inappropriate ioctl for device
Hi Sébastien,
thank you for the great bug report.
Humm the ioctl() is supposed to be cheap. And it is on all hw I have
tested it with. It appears that ob the hw in question it is not so
cheap, but that really sounds like a driver issue to me.
Post by Zbigniew Jędrzejewski-Szmek
Post by Sébastien Luttringer
So, I thnk this code should be reworked and only do the ioctl at
interval defined in RuntimeWatchdogSec, firstly because we use too
many cycle (global speed/power) and
secondly, because there is no way to trigger a system reset by the
wathdog in these condition.
I'm also interogative about calling ioctl (and trying to reopen
/dev/watchdog) even when the initial setup of the watchdog have
failed.
I could provide some patches to fix this if you are interested.
Definitely. There are some places in systemd where unexpected failure
results in excessive usage of resources. Patches (preferably two,
for the two separate issues you describe) are always welcome.
Well, the ioctl issue above just indicates that the driver sucks, but
given how weakly the kernel iface is defined this is generally not a
reason not to continue to ping the hw.

To me this really appears as if the driver needs some updating, and we
shouldn't attempt to tape over that by calling the ioctl less often.

I mean, the whole logic of a watchdog is to ping it when we are still
alive and well, so that it gets triggered when we aren't. By pinging
them in every loop we do this when we are awake anyway, so it's
basically free...

Lennart
--
Lennart Poettering - Red Hat, Inc.
Zbigniew Jędrzejewski-Szmek
2013-03-11 22:57:47 UTC
Permalink
Post by Lennart Poettering
Post by Zbigniew Jędrzejewski-Szmek
Post by Sébastien Luttringer
After reexecuting the daemon, I got the following info in the journal.
# systemctl daemon-reexec
# LC_ALL=C journalctl -f
mars 11 20:50:15 slice systemd[1]: systemd 198 running in system mode.
(+PAM -LIBWRAP -AUDIT -SELINUX -IMA -S...L +XZ)
mars 11 20:50:15 slice systemd[1]: Hardware watchdog 'SBC-FITPC2
Watchdog', version 0
mars 11 20:50:16 slice systemd[1]: Set hardware watchdog to 1min.
Inappropriate ioctl for device
Hi Sébastien,
thank you for the great bug report.
Humm the ioctl() is supposed to be cheap. And it is on all hw I have
tested it with. It appears that ob the hw in question it is not so
cheap, but that really sounds like a driver issue to me.
It is definitely not cheap:

static void wdt_send_data(unsigned char command, unsigned char data)
{
outb(data, DATA_PORT);
msleep(200);
outb(command, COMMAND_PORT);
msleep(100);
}
Post by Lennart Poettering
Post by Zbigniew Jędrzejewski-Szmek
Post by Sébastien Luttringer
So, I thnk this code should be reworked and only do the ioctl at
interval defined in RuntimeWatchdogSec, firstly because we use too
many cycle (global speed/power) and
secondly, because there is no way to trigger a system reset by the
wathdog in these condition.
I'm also interogative about calling ioctl (and trying to reopen
/dev/watchdog) even when the initial setup of the watchdog have
failed.
I could provide some patches to fix this if you are interested.
Definitely. There are some places in systemd where unexpected failure
results in excessive usage of resources. Patches (preferably two,
for the two separate issues you describe) are always welcome.
Well, the ioctl issue above just indicates that the driver sucks, but
given how weakly the kernel iface is defined this is generally not a
reason not to continue to ping the hw.
To me this really appears as if the driver needs some updating, and we
shouldn't attempt to tape over that by calling the ioctl less often.
I mean, the whole logic of a watchdog is to ping it when we are still
alive and well, so that it gets triggered when we aren't. By pinging
them in every loop we do this when we are awake anyway, so it's
basically free...
Looking at 'git log drivers/watchdog/sbc_fitpc2_wdt.c', this board is
prone to hanging if not treated very carefully. Hence probably those
sleeps in there. If is a driver/hardward issue to some extent. The question
is whether it makes sense to support it.

Zbyszek
Sébastien Luttringer
2013-03-11 23:53:46 UTC
Permalink
On Mon, Mar 11, 2013 at 11:57 PM, Zbigniew Jędrzejewski-Szmek
Post by Zbigniew Jędrzejewski-Szmek
Post by Lennart Poettering
Post by Sébastien Luttringer
After reexecuting the daemon, I got the following info in the journal.
Humm the ioctl() is supposed to be cheap. And it is on all hw I have
tested it with. It appears that ob the hw in question it is not so
cheap, but that really sounds like a driver issue to me.
static void wdt_send_data(unsigned char command, unsigned char data)
{
outb(data, DATA_PORT);
msleep(200);
outb(command, COMMAND_PORT);
msleep(100);
}
Nice ! This explain why it's so sensitive with this hardware. I see
there is others drivers doing this.

rwolf ~/scm/misc/linux/drivers/watchdog $ git grep sleep|cat
pc87413_wdt.c: pr_info("Watchdog disabled, sleeping again...\n");
pcwd.c: msleep(500);
sbc8360.c: msleep_interruptible(100);
sbc8360.c: msleep_interruptible(100);
sbc8360.c: msleep_interruptible(100);
sbc_fitpc2_wdt.c: msleep(200);
sbc_fitpc2_wdt.c: msleep(100);
smsc37b787_wdt.c: pr_info("Watchdog disabled, sleeping again...\n");
smsc37b787_wdt.c: pr_info("Watchdog initialized and sleeping
(nowayout=%d)...\n",
wm831x_wdt.c: gpio_set_value_cansleep(driver_data->update_gpio,

I'm not sure ioctl to watchdog API can be considered as nonblocking
and can be used in an event loop calmly.
Post by Zbigniew Jędrzejewski-Szmek
Looking at 'git log drivers/watchdog/sbc_fitpc2_wdt.c', this board is
prone to hanging if not treated very carefully. Hence probably those
sleeps in there. If is a driver/hardward issue to some extent. The question
is whether it makes sense to support it.
Honoring the configuration file, allow without effort to support all watchdog.
--
Sébastien "Seblu" Luttringer
https://www.seblu.net
GPG: 0x2072D77A
Lennart Poettering
2013-03-12 13:46:36 UTC
Permalink
Post by Sébastien Luttringer
On Mon, Mar 11, 2013 at 11:57 PM, Zbigniew Jędrzejewski-Szmek
Post by Zbigniew Jędrzejewski-Szmek
Post by Lennart Poettering
Post by Sébastien Luttringer
After reexecuting the daemon, I got the following info in the journal.
Humm the ioctl() is supposed to be cheap. And it is on all hw I have
tested it with. It appears that ob the hw in question it is not so
cheap, but that really sounds like a driver issue to me.
static void wdt_send_data(unsigned char command, unsigned char data)
{
outb(data, DATA_PORT);
msleep(200);
outb(command, COMMAND_PORT);
msleep(100);
}
Nice ! This explain why it's so sensitive with this hardware. I see
there is others drivers doing this.
rwolf ~/scm/misc/linux/drivers/watchdog $ git grep sleep|cat
pc87413_wdt.c: pr_info("Watchdog disabled, sleeping again...\n");
pcwd.c: msleep(500);
sbc8360.c: msleep_interruptible(100);
sbc8360.c: msleep_interruptible(100);
sbc8360.c: msleep_interruptible(100);
sbc_fitpc2_wdt.c: msleep(200);
sbc_fitpc2_wdt.c: msleep(100);
smsc37b787_wdt.c: pr_info("Watchdog disabled, sleeping again...\n");
smsc37b787_wdt.c: pr_info("Watchdog initialized and sleeping
(nowayout=%d)...\n",
wm831x_wdt.c: gpio_set_value_cansleep(driver_data->update_gpio,
I'm not sure ioctl to watchdog API can be considered as nonblocking
and can be used in an event loop calmly.
Ah, if there are sleeps of 500ms in the ioctl and this is used
everywhere then we have to consider the ioctl of the "blocking"
kind. What a fuckup.

I think the best approach for this might be to make it "non-blocking" in
userspace, i.e. spawn a thread that invokes the ioctl and be done with
it. This should be hidden away as well as we can in watchdog.c, because
threads in PID 1 are just an awful idea, but I guess that's what it takes.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Sébastien Luttringer
2013-03-11 23:45:13 UTC
Permalink
On Mon, Mar 11, 2013 at 11:46 PM, Lennart Poettering
Post by Lennart Poettering
Post by Zbigniew Jędrzejewski-Szmek
Hi Sébastien,
thank you for the great bug report.
Thanks. I tried to come with technical facts :)
Post by Lennart Poettering
Humm the ioctl() is supposed to be cheap. And it is on all hw I have
tested it with. It appears that ob the hw in question it is not so
cheap, but that really sounds like a driver issue to me.
Before everything, ioctl is a syscall, and even if the hardware /
driver is buggy, doing a syscall every loop turn (for nothing) is a
performance cost which should be avoided.
This is even more true on embedded devices where power is also
important. You can do more than 10 ioctl each seconds (10 context
swtich) in the pid 1.
I don't thinks you can tell it's really cheap.
Post by Lennart Poettering
Post by Zbigniew Jędrzejewski-Szmek
Post by Sébastien Luttringer
I could provide some patches to fix this if you are interested.
Definitely. There are some places in systemd where unexpected failure
results in excessive usage of resources. Patches (preferably two,
for the two separate issues you describe) are always welcome.
Ok, maybe I should wait for Lennart final words, to start hacking.
Post by Lennart Poettering
Well, the ioctl issue above just indicates that the driver sucks, but
given how weakly the kernel iface is defined this is generally not a
reason not to continue to ping the hw.
Why there is a variable to define in how seconds the watchdog is
ping'ed if you don't honor it?
Before blaming the driver/kernel developers, we can look into "our"
interfaces and see why we don't respect it.
Post by Lennart Poettering
To me this really appears as if the driver needs some updating, and we
shouldn't attempt to tape over that by calling the ioctl less often.
Some hardware watchdog expect a ping between an interval not starting
at 0. We cannot respect this by our hysteric behaviour. Even if an API
was well define.
So I guess 10 ioctl/seconds (on my iTCO_wdt), only consume power and
slow down my laptop for free!

# modinfo iTCO_wdt|grep timeout
parm: heartbeat:Watchdog timeout in seconds. 5..76 (TCO v1)
or 3..614 (TCO v2), default=30) (int)

Why not just call the ioctl at RuntimeWatchdogSec? As everyone except.
This let users choice how often they wants to call the hardware and
even let them adapt to a buggy hardware with an higher value.
Post by Lennart Poettering
I mean, the whole logic of a watchdog is to ping it when we are still
alive and well, so that it gets triggered when we aren't. By pinging
them in every loop we do this when we are awake anyway, so it's
basically free...
With all the respect I owe you Lennart, I disagree.
The logic is to ping at an fixed X interval, knowing that the watchdog
reset the hardware with no good news in an Y interval.
This means that a process must be able to "write" into a fd in Y seconds.
Fine tuning of watchdog can avoid some overload and buggy behaviour.

To resume:
- It's not free, it consume a lot of syscall by seconds in the main
systemd event loop.
- It doesn't honor the user configuration.
- It doesn't respect the hardware minimum value (not only because
there is no API).
- It affects all watchdog drivers, slowing all systems.
- It continue even if the initial configuration have failed, making
things worst by trying to reopen /dev/watchdog each loop turn.
- We are not able to use alternative watchdog device, when we have
more than one (case on my laptop).

Regards,
--
Sébastien "Seblu" Luttringer
https://www.seblu.net
GPG: 0x2072D77A
Lennart Poettering
2013-03-12 13:57:58 UTC
Permalink
Post by Sébastien Luttringer
Post by Lennart Poettering
Humm the ioctl() is supposed to be cheap. And it is on all hw I have
tested it with. It appears that ob the hw in question it is not so
cheap, but that really sounds like a driver issue to me.
Before everything, ioctl is a syscall, and even if the hardware /
driver is buggy, doing a syscall every loop turn (for nothing) is a
performance cost which should be avoided.
No, this is the main event loop. It's executed only on events. It is not
performance relevant.
Post by Sébastien Luttringer
This is even more true on embedded devices where power is also
important. You can do more than 10 ioctl each seconds (10 context
swtich) in the pid 1.
I don't thinks you can tell it's really cheap.
the main loop is not really a tight loop. It's something that runs on
some event, and not otherwise. There's little point in optimizing this
without having a clear knoweledge what the bootlenecks are.
Post by Sébastien Luttringer
Post by Lennart Poettering
Well, the ioctl issue above just indicates that the driver sucks, but
given how weakly the kernel iface is defined this is generally not a
reason not to continue to ping the hw.
Why there is a variable to define in how seconds the watchdog is
ping'ed if you don't honor it?
That variable declares how often to ping the hw *at least*. We optimize
power behaviour by always sleeping as long as we can, and not waking up
any earlier than strictly necessary. This also means we ping the hw when
we are woken up anyway, right before we go to sleep again, so that we
maximize our sleep time.
Post by Sébastien Luttringer
Before blaming the driver/kernel developers, we can look into "our"
interfaces and see why we don't respect it.
Blocking ioctls are unconditionally a stupid idea. There's no excuse for
that, and I am happy to tell that any kernel developer who wants to
listen...
Post by Sébastien Luttringer
Why not just call the ioctl at RuntimeWatchdogSec? As everyone except.
This let users choice how often they wants to call the hardware and
even let them adapt to a buggy hardware with an higher value.
The loop is optimized to run as seldom as possible, not as often as
possible.

Also, just invoking the ioctl less often doesn't fix the issue too
much. A sleep of 500ms in the event loop is never OK. The only sane way
out of this is making this ioctl non-blocking in userspace, by invoking
it in a thread.
Post by Sébastien Luttringer
- It's not free, it consume a lot of syscall by seconds in the main
systemd event loop.
Yeah, but the lesson here is not to call it less often. It is instead of
making it non-blocking.
Post by Sébastien Luttringer
- It doesn't honor the user configuration.
Hmm? It certainly does.
Post by Sébastien Luttringer
- It doesn't respect the hardware minimum value (not only because
there is no API).
Hmm? How would that matter?
Post by Sébastien Luttringer
- It continue even if the initial configuration have failed, making
things worst by trying to reopen /dev/watchdog each loop turn.
The enable ioctl is optional. It's the right thing to ignore it.
Post by Sébastien Luttringer
- We are not able to use alternative watchdog device, when we have
more than one (case on my laptop).
If this is necessary we can beef up the code to ping all available
watchdogs rather than just one.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Loading...