[systemd-devel] [PATCH v3 0/4] watchdog handling with systemd

This patch adds the WatchdogRestartSec and WatchdogRebootSec
properties to services. Systemd will restart the service / reboot the
system if the watchdog timeout has not been updated for the configured
amount of time.
This functionality is only enabled if the watchdog timeout is set at
least once.
---
changes in v3:
- fix typo in service_timer_event (reboot vs. restart)
- use now(CLOCK_MONOTONIC) in service_coldplug()

changes in v2:
- adapt to changes from "d200735 dbus: more efficient implementation of properties"
- stop timers in service_done() and restart in service_coldplug()

man/systemd.service.xml | 32 ++++++++++++++++++++++++++++
src/dbus-service.c | 4 +++
src/load-fragment-gperf.gperf.m4 | 2 +
src/service.c | 42 ++++++++++++++++++++++++++++++++++++++
src/service.h | 4 +++
5 files changed, 84 insertions(+), 0 deletions(-)

diff --git a/man/systemd.service.xml b/man/systemd.service.xml
index 0baddd1..48f63ce 100644
--- a/man/systemd.service.xml
+++ b/man/systemd.service.xml
@@ -460,6 +460,38 @@
</varlistentry>

<varlistentry>
+ <term><varname>WatchdogRestartSec=</varname></term>
+ <listitem><para>Configures the time to
+ wait before restarting a service. This
+ is activated with the first
+ <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>
+ call with "WATCHDOG=1". If the time
+ between two such calls is larger than
+ the configured time then the service
+ is restarted. Defaults to 0s, meaning
+ watchdog triggered restart is
+ disabled.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>WatchdogRebootSec=</varname></term>
+ <listitem><para>Configures the time to
+ wait before rebooting the system. This
+ is basically the same as
+ <varname>WatchdogRestartSec=</varname>
+ but the whole system is rebooted
+ instead of just restarting the
+ service. The typical use-case is to
+ set this to
+ <varname>WatchdogRestartSec</varname>
+ + <varname>TimeoutSec</varname> to
+ reboot in case the service restart
+ fails. Defaults to 0s, meaning
+ watchdog triggered reboot is
+ disabled.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><varname>Restart=</varname></term>
<listitem><para>Configures whether the
main service process shall be
diff --git a/src/dbus-service.c b/src/dbus-service.c
index d7529ec..c219aba 100644
--- a/src/dbus-service.c
+++ b/src/dbus-service.c
@@ -43,6 +43,8 @@
" <property name=\"NotifyAccess\" type=\"s\" access=\"read\"/>\n" \
" <property name=\"RestartUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRestartUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRebootUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestamp\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPre") \
@@ -115,6 +117,8 @@ static const BusProperty bus_service_properties[] = {
{ "NotifyAccess", bus_service_append_notify_access, "s", offsetof(Service, notify_access) },
{ "RestartUSec", bus_property_append_usec, "t", offsetof(Service, restart_usec) },
{ "TimeoutUSec", bus_property_append_usec, "t", offsetof(Service, timeout_usec) },
+ { "WatchdogRestartUSec", bus_property_append_usec, "t", offsetof(Service, watchdog_restart_usec) },
+ { "WatchdogRebootUSec", bus_property_append_usec, "t", offsetof(Service, watchdog_reboot_usec) },
{ "WatchdogTimestamp", bus_property_append_usec, "t", offsetof(Service, watchdog_timestamp.realtime)},
{ "WatchdogTimestampMonotonic",bus_property_append_usec, "t", offsetof(Service, watchdog_timestamp.monotonic)},
BUS_EXEC_COMMAND_PROPERTY("ExecStartPre", offsetof(Service, exec_command[SERVICE_EXEC_START_PRE]), true ),
diff --git a/src/load-fragment-gperf.gperf.m4 b/src/load-fragment-gperf.gperf.m4
index 14c0606..f4d7523 100644
--- a/src/load-fragment-gperf.gperf.m4
+++ b/src/load-fragment-gperf.gperf.m4
@@ -134,6 +134,8 @@ Service.ExecStop, config_parse_exec, SERVICE_EXE
Service.ExecStopPost, config_parse_exec, SERVICE_EXEC_STOP_POST, offsetof(Service, exec_command)
Service.RestartSec, config_parse_usec, 0, offsetof(Service, restart_usec)
Service.TimeoutSec, config_parse_usec, 0, offsetof(Service, timeout_usec)
+Service.WatchdogRestartSec, config_parse_usec, 0, offsetof(Service, watchdog_restart_usec)
+Service.WatchdogRebootSec, config_parse_usec, 0, offsetof(Service, watchdog_reboot_usec)
Service.Type, config_parse_service_type, 0, offsetof(Service, type)
Service.Restart, config_parse_service_restart, 0, offsetof(Service, restart)
Service.PermissionsStartOnly, config_parse_bool, 0, offsetof(Service, permissions_start_only)
diff --git a/src/service.c b/src/service.c
index e107179..1e9bf98 100644
--- a/src/service.c
+++ b/src/service.c
@@ -112,6 +112,10 @@ static void service_init(Unit *u) {

s->timeout_usec = DEFAULT_TIMEOUT_USEC;
s->restart_usec = DEFAULT_RESTART_USEC;
+
+ s->watchdog_restart_watch.type = WATCH_INVALID;
+ s->watchdog_reboot_watch.type = WATCH_INVALID;
+
s->timer_watch.type = WATCH_INVALID;
#ifdef HAVE_SYSV_COMPAT
s->sysv_start_priority = -1;
@@ -208,14 +212,33 @@ static void service_connection_unref(Service *s) {
static void service_stop_watchdog(Service *s) {
assert(s);

+ unit_unwatch_timer(UNIT(s), &s->watchdog_restart_watch);
+ unit_unwatch_timer(UNIT(s), &s->watchdog_reboot_watch);
s->watchdog_timestamp.realtime = 0;
s->watchdog_timestamp.monotonic = 0;
}

+static void service_setup_watchdog_timer(Service *s, usec_t offset) {
+ int r;
+ assert(s);
+
+ if (s->watchdog_restart_usec) {
+ r = unit_watch_timer(UNIT(s), s->watchdog_restart_usec - offset, &s->watchdog_restart_watch);
+ if (r < 0)
+ log_warning("%s failed to install watchdog restart timer: %s", UNIT(s)->id, strerror(-r));
+ }
+ if (s->watchdog_reboot_usec) {
+ r = unit_watch_timer(UNIT(s), s->watchdog_reboot_usec - offset, &s->watchdog_reboot_watch);
+ if (r < 0)
+ log_warning("%s failed to install watchdog reboot timer: %s", UNIT(s)->id, strerror(-r));
+ }
+}
+
static void service_reset_watchdog(Service *s) {
assert(s);

dual_timestamp_get(&s->watchdog_timestamp);
+ service_setup_watchdog_timer(s, 0);
}

static void service_done(Unit *u) {
@@ -259,6 +282,8 @@ static void service_done(Unit *u) {

unit_ref_unset(&s->accept_socket);

+ service_stop_watchdog(s);
+
unit_unwatch_timer(u, &s->timer_watch);
}

@@ -1566,6 +1591,12 @@ static int service_coldplug(Unit *u) {

service_set_state(s, s->deserialized_state);
}
+ if (dual_timestamp_is_set(&s->watchdog_timestamp)) {
+ usec_t t;
+
+ t = now(CLOCK_MONOTONIC);
+ service_setup_watchdog_timer(s, t - s->watchdog_timestamp.monotonic);
+ }

return 0;
}
@@ -2896,6 +2927,17 @@ static void service_timer_event(Unit *u, uint64_t elapsed, Watch* w) {
assert(s);
assert(elapsed == 1);

+ if (w == &s->watchdog_restart_watch) {
+ log_error("%s watchdog timeout: restarting service...", u->id);
+ manager_add_job(u->manager, JOB_RESTART, u, JOB_FAIL, true, 0, 0);
+ return;
+ }
+ if (w == &s->watchdog_reboot_watch) {
+ log_error("%s watchdog timeout: rebooting...", u->id);
+ manager_add_job_by_name(u->manager, JOB_START, "reboot.target", JOB_REPLACE, true, 0, 0);
+ return;
+ }
+
assert(w == &s->timer_watch);

switch (s->state) {
diff --git a/src/service.h b/src/service.h
index dbae68b..d6a8290 100644
--- a/src/service.h
+++ b/src/service.h
@@ -101,6 +101,10 @@ struct Service {
usec_t timeout_usec;

dual_timestamp watchdog_timestamp;
+ usec_t watchdog_restart_usec;
+ usec_t watchdog_reboot_usec;
+ Watch watchdog_restart_watch;
+ Watch watchdog_reboot_watch;

ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX];
ExecContext exec_context;

--
1.7.7.3

Lennart Poettering

2012-02-01 18:42:16 UTC

Post by Michael Olbrich
This patch adds the WatchdogRestartSec and WatchdogRebootSec
properties to services. Systemd will restart the service / reboot the
system if the watchdog timeout has not been updated for the configured
amount of time.
This functionality is only enabled if the watchdog timeout is set at
least once.

Do we really need two timers for this? To me it appears more natural to
introduce one timer, and one option to configure what should happen if
the timeout is reached, simply because we might end up with more than
just two options here (in fact, already I figure we need four...):

WatchdogSec=...
WatchdogAction=restart|reboot|reboot-force|reboot-immediate

Where:

restart = restart the service
reboot = reboot the the machine cleanly, i.e. with shutting down all
services and unmounting/syncing all disks, via initrd if used
reboot-force = reboot the machine semi-cleanly, i.e. don't shut down
anything, but still unmount/sync disks and initrd
reboot-immediate = reboot the machine immediately, i.e. don't do
anything, but calling the reboot() system call right-away.

Does that make sense? Or can you make a case for having individual
timeouts for these?

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-01 19:11:28 UTC

Do we really need two timers for this? To me it appears more natural to
introduce one timer, and one option to configure what should happen if
the timeout is reached, simply because we might end up with more than
WatchdogSec=...
WatchdogAction=restart|reboot|reboot-force|reboot-immediate
restart = restart the service
reboot = reboot the the machine cleanly, i.e. with shutting down all
services and unmounting/syncing all disks, via initrd if used
reboot-force = reboot the machine semi-cleanly, i.e. don't shut down
anything, but still unmount/sync disks and initrd
reboot-immediate = reboot the machine immediately, i.e. don't do
anything, but calling the reboot() system call right-away.
Does that make sense? Or can you make a case for having individual
timeouts for these?

The problem is, that restart and reboot are used to recover from two
different error sources. Restart is of errors in the service itself. Reboot
is to recover from outside problems.
The idea is to have multiple escalation layers:
Lets say the basic error is, that reading from a block device will block
forever (hardware or driver issue).
- First we try to restart the application. This will fail again.
- Next we try to reboot. This might work, if the block device is not needed
to shutdown the system
- If systemd is affected as well, then at some point the hardware watchdog
triggers. Hopefully after rebooting the block device will work again.

If just one action is required, that can be achieved with the appropriate
timeouts.

Michael

Lennart Poettering

2012-02-01 22:26:09 UTC

Do we really need two timers for this? To me it appears more natural to
introduce one timer, and one option to configure what should happen if
the timeout is reached, simply because we might end up with more than
WatchdogSec=...
WatchdogAction=restart|reboot|reboot-force|reboot-immediate
restart = restart the service
reboot = reboot the the machine cleanly, i.e. with shutting down all
services and unmounting/syncing all disks, via initrd if used
reboot-force = reboot the machine semi-cleanly, i.e. don't shut down
anything, but still unmount/sync disks and initrd
reboot-immediate = reboot the machine immediately, i.e. don't do
anything, but calling the reboot() system call right-away.
Does that make sense? Or can you make a case for having individual
timeouts for these?

The problem is, that restart and reboot are used to recover from two
different error sources. Restart is of errors in the service itself. Reboot
is to recover from outside problems.
Lets say the basic error is, that reading from a block device will block
forever (hardware or driver issue).
- First we try to restart the application. This will fail again.
- Next we try to reboot. This might work, if the block device is not needed
to shutdown the system
- If systemd is affected as well, then at some point the hardware watchdog
triggers. Hopefully after rebooting the block device will work again.
If just one action is required, that can be achieved with the appropriate
timeouts.

Hmm, so I think this should work differently. We already have a failure
logic for services, and if a service fails to send us WATCHDOG=1 in time
we should make use of that. That way, if service start up times out, if
a service crashes or exits with a return value != 0 or if it doesn't
send watchdog requests will be handled the same and would take advantage
of the normal, already existing restart logic, that has the holdoff time
implemented and everything.

And then, on top of that we should have a configurable restart
ratelimiter, that can be configured to disable restarting if triggered,
or trigger a reboot. Putting all that together we should provide more or
less what you are asking for but also have the benefit that the hold off
time applies and that we can benefit from the reboot logic also for
services that fail due to some other reason.

Does that make sense?

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-02 12:07:23 UTC

Do we really need two timers for this? To me it appears more natural to
introduce one timer, and one option to configure what should happen if
the timeout is reached, simply because we might end up with more than
WatchdogSec=...
WatchdogAction=restart|reboot|reboot-force|reboot-immediate
restart = restart the service
reboot = reboot the the machine cleanly, i.e. with shutting down all
services and unmounting/syncing all disks, via initrd if used
reboot-force = reboot the machine semi-cleanly, i.e. don't shut down
anything, but still unmount/sync disks and initrd
reboot-immediate = reboot the machine immediately, i.e. don't do
anything, but calling the reboot() system call right-away.
Does that make sense? Or can you make a case for having individual
timeouts for these?

The problem is, that restart and reboot are used to recover from two
different error sources. Restart is of errors in the service itself. Reboot
is to recover from outside problems.
Lets say the basic error is, that reading from a block device will block
forever (hardware or driver issue).
- First we try to restart the application. This will fail again.
- Next we try to reboot. This might work, if the block device is not needed
to shutdown the system
- If systemd is affected as well, then at some point the hardware watchdog
triggers. Hopefully after rebooting the block device will work again.
If just one action is required, that can be achieved with the appropriate
timeouts.

So instead of just restarting the service, we set the state to "failed".
Then the existing restart-on-failure can kick in. Did I understand that
correctly?
Still, we need to configure what "a service fails to send us WATCHDOG=1 in
time" means. A timeout needs to be defined somewhere.
I see this in the service file, or maybe send "WATCHDOG=10s" and "in time"
means in the next 10 seconds, or something like that?

Post by Lennart Poettering
And then, on top of that we should have a configurable restart
ratelimiter, that can be configured to disable restarting if triggered,
or trigger a reboot. Putting all that together we should provide more or
less what you are asking for but also have the benefit that the hold off
time applies and that we can benefit from the reboot logic also for
services that fail due to some other reason.
Does that make sense?

So we try to restart the failed service. If we run into the startup timeout
we reboot. This should work correctly if the service sends the first
WATCHDOG=1 with the READY=1.

Still, this depends on a correctly working systemd. What we need is an
integrity check that is executed before writing the keep-alive to the
hardware watchdog.

Michael

Lennart Poettering

2012-02-02 18:42:46 UTC

Post by Lennart Poettering
Hmm, so I think this should work differently. We already have a failure
logic for services, and if a service fails to send us WATCHDOG=1 in time
we should make use of that. That way, if service start up times out, if
a service crashes or exits with a return value != 0 or if it doesn't
send watchdog requests will be handled the same and would take advantage
of the normal, already existing restart logic, that has the holdoff time
implemented and everything.

So instead of just restarting the service, we set the state to
"failed".

Yes, pretty much that. Currently failed is a boolean. We probably should
make that an enum though, so that we can distuingish the failure
reasons.

Post by Michael Olbrich
Then the existing restart-on-failure can kick in. Did I understand that
correctly?

Yes.

Post by Michael Olbrich
Still, we need to configure what "a service fails to send us WATCHDOG=1 in
time" means. A timeout needs to be defined somewhere.
I see this in the service file, or maybe send "WATCHDOG=10s" and "in time"
means in the next 10 seconds, or something like that?

I think a WatchdogSec= would make a ton of sense for this. If set to 0
(the default) we'd not have any watchdog logic, but if it is set for a
service it needs to send a message at least this often.

Hmm, it might make sense to pass the information that we expect a
watchdog msg in a certain interval also to the executed processes via an
env var.

Maybe we should just set WATCHDOG_USEC=4711000000 when spawning a process
with WatchdogSec=4711 set? That way, a the usual event loops of the
various services might just parse this env var and configure its wakeups
for it.

So we try to restart the failed service. If we run into the startup timeout
we reboot. This should work correctly if the service sends the first
WATCHDOG=1 with the READY=1.

Well, we'd try to restart a failed service, and if it fails too often in
a certain time frame we'd either just stop restarting it or reboot with
any of the three actions.

Post by Michael Olbrich
Still, this depends on a correctly working systemd. What we need is an
integrity check that is executed before writing the keep-alive to the
hardware watchdog.

Yes, I think I understand your need. Just haven't wrapped my head around
how this should really work in the end. For now I'd like to see this as
different steps:

a) track the watchdog alive messages (already merged)

b) hook up the watchdog with the existing failure logic and introduce
WatchdogSec for that.

c) pass watchdog frequency to executed processes via env var

d) replace failure boolean with an enum

e) extend start logic to do configurable rate limiting of starts,
plus optionally reboots when rate limiter is triggered. New options for
this:

StartLimitInterval=5min
StartLimitBurst=5000
StartLimitAction=none|reboot|reboot-force|reboot-immediate

f) hookup /dev/watchdog with all of this

Of these b) and c) should be fairly easy. I will implement d) in the
next half hour. e) is fairly easy since we already have ratelimit.c
which can be trivially reused i guess, and we actually already have a
static ratelimiter for this built-in, so this is mostly about adding
configurability and the reboot options to it. f) I am not entirely sure
how this should look in the end, need to think about this more.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Lennart Poettering

2012-02-03 04:07:53 UTC

Post by Lennart Poettering
a) track the watchdog alive messages (already merged)
b) hook up the watchdog with the existing failure logic and introduce
WatchdogSec for that.
c) pass watchdog frequency to executed processes via env var
d) replace failure boolean with an enum
e) extend start logic to do configurable rate limiting of starts,
plus optionally reboots when rate limiter is triggered. New options for
StartLimitInterval=5min
StartLimitBurst=5000
StartLimitAction=none|reboot|reboot-force|reboot-immediate
f) hookup /dev/watchdog with all of this
Of these b) and c) should be fairly easy. I will implement d) in the
next half hour. e) is fairly easy since we already have ratelimit.c
which can be trivially reused i guess, and we actually already have a
static ratelimiter for this built-in, so this is mostly about adding
configurability and the reboot options to it. f) I am not entirely sure
how this should look in the end, need to think about this more.

I ave now implemented d), it's available in git. There's now a
ServiceResult enum in service.h which should be extended for
SERVICE_FAILURE_WATCHDOG.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-03 20:14:25 UTC

Signed-off-by: Michael Olbrich <***@pengutronix.de>
---
src/dbus-service.c | 7 +++++++
src/load-fragment-gperf.gperf.m4 | 3 +++
src/load-fragment.c | 1 +
src/load-fragment.h | 1 +
src/service.c | 14 +++++++++++++-
src/service.h | 16 ++++++++++++++++
6 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/src/dbus-service.c b/src/dbus-service.c
index fedfc1d..da9510b 100644
--- a/src/dbus-service.c
+++ b/src/dbus-service.c
@@ -46,6 +46,9 @@
" <property name=\"WatchdogUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestamp\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"StartLimitInterval\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"StartLimitBurst\" type=\"u\" access=\"read\"/>\n" \
+ " <property name=\"StartLimitAction\" type=\"s\" access=\"read\"/>\n" \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPre") \
BUS_EXEC_COMMAND_INTERFACE("ExecStart") \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPost") \
@@ -101,6 +104,7 @@ static DEFINE_BUS_PROPERTY_APPEND_ENUM(bus_service_append_type, service_type, Se
static DEFINE_BUS_PROPERTY_APPEND_ENUM(bus_service_append_restart, service_restart, ServiceRestart);
static DEFINE_BUS_PROPERTY_APPEND_ENUM(bus_service_append_notify_access, notify_access, NotifyAccess);
static DEFINE_BUS_PROPERTY_APPEND_ENUM(bus_service_append_service_result, service_result, ServiceResult);
+static DEFINE_BUS_PROPERTY_APPEND_ENUM(bus_service_append_start_limit_action, start_limit_action, StartLimitAction);

static const BusProperty bus_exec_main_status_properties[] = {
{ "ExecMainStartTimestamp", bus_property_append_usec, "t", offsetof(ExecStatus, start_timestamp.realtime) },
@@ -123,6 +127,9 @@ static const BusProperty bus_service_properties[] = {
{ "WatchdogUSec", bus_property_append_usec, "t", offsetof(Service, watchdog_usec) },
{ "WatchdogTimestamp", bus_property_append_usec, "t", offsetof(Service, watchdog_timestamp.realtime) },
{ "WatchdogTimestampMonotonic",bus_property_append_usec, "t", offsetof(Service, watchdog_timestamp.monotonic) },
+ { "StartLimitInterval", bus_property_append_usec, "t", offsetof(Service, start_limit_interval) },
+ { "StartLimitBurst", bus_property_append_uint32, "u", offsetof(Service, start_limit_burst) },
+ { "StartLimitAction", bus_service_append_start_limit_action,"s", offsetof(Service, start_limit_action) },
BUS_EXEC_COMMAND_PROPERTY("ExecStartPre", offsetof(Service, exec_command[SERVICE_EXEC_START_PRE]), true ),
BUS_EXEC_COMMAND_PROPERTY("ExecStart", offsetof(Service, exec_command[SERVICE_EXEC_START]), true ),
BUS_EXEC_COMMAND_PROPERTY("ExecStartPost", offsetof(Service, exec_command[SERVICE_EXEC_START_POST]), true ),
diff --git a/src/load-fragment-gperf.gperf.m4 b/src/load-fragment-gperf.gperf.m4
index 9191f90..c93a757 100644
--- a/src/load-fragment-gperf.gperf.m4
+++ b/src/load-fragment-gperf.gperf.m4
@@ -135,6 +135,9 @@ Service.ExecStopPost, config_parse_exec, SERVICE_EXE
Service.RestartSec, config_parse_usec, 0, offsetof(Service, restart_usec)
Service.TimeoutSec, config_parse_usec, 0, offsetof(Service, timeout_usec)
Service.WatchdogSec, config_parse_usec, 0, offsetof(Service, watchdog_usec)
+Service.StartLimitInterval, config_parse_usec, 0, offsetof(Service, start_limit_interval)
+Service.StartLimitBurst, config_parse_unsigned, 0, offsetof(Service, start_limit_burst)
+Service.StartLimitAction, config_parse_start_limit_action, 0, offsetof(Service, start_limit_action)
Service.Type, config_parse_service_type, 0, offsetof(Service, type)
Service.Restart, config_parse_service_restart, 0, offsetof(Service, restart)
Service.PermissionsStartOnly, config_parse_bool, 0, offsetof(Service, permissions_start_only)
diff --git a/src/load-fragment.c b/src/load-fragment.c
index b2d43fb..b963d7f 100644
--- a/src/load-fragment.c
+++ b/src/load-fragment.c
@@ -1648,6 +1648,7 @@ int config_parse_unit_condition_null(
}

DEFINE_CONFIG_PARSE_ENUM(config_parse_notify_access, notify_access, NotifyAccess, "Failed to parse notify access specifier");
+DEFINE_CONFIG_PARSE_ENUM(config_parse_start_limit_action, start_limit_action, StartLimitAction, "Failed to parse start limit action specifier");

int config_parse_unit_cgroup_attr(
const char *filename,
diff --git a/src/load-fragment.h b/src/load-fragment.h
index fbb31f9..79fc76d 100644
--- a/src/load-fragment.h
+++ b/src/load-fragment.h
@@ -76,6 +76,7 @@ int config_parse_unit_condition_string(const char *filename, unsigned line, cons
int config_parse_unit_condition_null(const char *filename, unsigned line, const char *section, const char *lvalue, int ltype, const char *rvalue, void *data, void *userdata);
int config_parse_kill_mode(const char *filename, unsigned line, const char *section, const char *lvalue, int ltype, const char *rvalue, void *data, void *userdata);
int config_parse_notify_access(const char *filename, unsigned line, const char *section, const char *lvalue, int ltype, const char *rvalue, void *data, void *userdata);
+int config_parse_start_limit_action(const char *filename, unsigned line, const char *section, const char *lvalue, int ltype, const char *rvalue, void *data, void *userdata);
int config_parse_unit_cgroup_attr(const char *filename, unsigned line, const char *section, const char *lvalue, int ltype, const char *rvalue, void *data, void *userdata);
int config_parse_unit_cpu_shares(const char *filename, unsigned line, const char *section, const char *lvalue, int ltype, const char *rvalue, void *data, void *userdata);
int config_parse_unit_memory_limit(const char *filename, unsigned line, const char *section, const char *lvalue, int ltype, const char *rvalue, void *data, void *userdata);
diff --git a/src/service.c b/src/service.c
index 9bd3332..c85c62f 100644
--- a/src/service.c
+++ b/src/service.c
@@ -115,6 +115,9 @@ static void service_init(Unit *u) {

s->watchdog_watch.type = WATCH_INVALID;

+ s->start_limit_interval = 10*USEC_PER_SEC;
+ s->start_limit_burst = 5;
+
s->timer_watch.type = WATCH_INVALID;
#ifdef HAVE_SYSV_COMPAT
s->sysv_start_priority = -1;
@@ -125,7 +128,8 @@ static void service_init(Unit *u) {

exec_context_init(&s->exec_context);

- RATELIMIT_INIT(s->ratelimit, 10*USEC_PER_SEC, 5);
+ /* FIXME: when are the .service file values available? */
+ RATELIMIT_INIT(s->ratelimit, s->start_limit_interval, s->start_limit_burst);

s->control_command_id = _SERVICE_EXEC_COMMAND_INVALID;
}
@@ -3652,6 +3656,14 @@ static const char* const service_result_table[_SERVICE_RESULT_MAX] = {

DEFINE_STRING_TABLE_LOOKUP(service_result, ServiceResult);

+static const char* const start_limit_action_table[_SERVICE_START_LIMIT_MAX] = {
+ [SERVICE_START_LIMIT_NONE] = "none",
+ [SERVICE_START_LIMIT_REBOOT] = "reboot",
+ [SERVICE_START_LIMIT_REBOOT_FORCE] = "reboot-force",
+ [SERVICE_START_LIMIT_REBOOT_IMMEDIATE] = "reboot-immediate"
+};
+DEFINE_STRING_TABLE_LOOKUP(start_limit_action, StartLimitAction);
+
const UnitVTable service_vtable = {
.suffix = ".service",
.object_size = sizeof(Service),
diff --git a/src/service.h b/src/service.h
index 32341f0..752d905 100644
--- a/src/service.h
+++ b/src/service.h
@@ -100,6 +100,15 @@ typedef enum ServiceResult {
_SERVICE_RESULT_INVALID = -1
} ServiceResult;

+typedef enum StartLimitAction {
+ SERVICE_START_LIMIT_NONE,
+ SERVICE_START_LIMIT_REBOOT,
+ SERVICE_START_LIMIT_REBOOT_FORCE,
+ SERVICE_START_LIMIT_REBOOT_IMMEDIATE,
+ _SERVICE_START_LIMIT_MAX,
+ _SERVICE_START_LIMIT_INVALID = -1
+} StartLimitAction;
+
struct Service {
Unit meta;

@@ -116,6 +125,10 @@ struct Service {
usec_t watchdog_usec;
Watch watchdog_watch;

+ usec_t start_limit_interval;
+ unsigned start_limit_burst;
+ StartLimitAction start_limit_action;
+
ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX];
ExecContext exec_context;

@@ -203,4 +216,7 @@ NotifyAccess notify_access_from_string(const char *s);
const char* service_result_to_string(ServiceResult i);
ServiceResult service_result_from_string(const char *s);

+const char* start_limit_action_to_string(StartLimitAction i);
+StartLimitAction start_limit_action_from_string(const char *s);
+
#endif

--
1.7.8.3

Lennart Poettering

2012-02-07 14:39:34 UTC

Post by Michael Olbrich
+ s->start_limit_interval = 10*USEC_PER_SEC;
+ s->start_limit_burst = 5;
+

Just use RATELIMIT_INIT() here, as previously.

Post by Michael Olbrich
s->timer_watch.type = WATCH_INVALID;
#ifdef HAVE_SYSV_COMPAT
s->sysv_start_priority = -1;
@@ -125,7 +128,8 @@ static void service_init(Unit *u) {
exec_context_init(&s->exec_context);
- RATELIMIT_INIT(s->ratelimit, 10*USEC_PER_SEC, 5);
+ /* FIXME: when are the .service file values available? */
+ RATELIMIT_INIT(s->ratelimit, s->start_limit_interval,
s->start_limit_burst);

The logic here is that _init() is called immediately after malloc()ing
the Service object. It should just do constant initialization of
everything where just initialization to 0 is not good enough.

Then, later on the actual configuration is applied to the Service
object, overriding these static defaults, directly via the calls in
load-fragment.c. In fact, you should just modify load-fragment.c to
directly modify the ratelimit struct here, there's no need to
duplicate the variables in it. RateLimit is not an opaque struct.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-03 20:14:23 UTC

---
man/systemd.service.xml | 17 +++++++++++++++++
src/dbus-service.c | 2 ++
src/load-fragment-gperf.gperf.m4 | 1 +
src/service.c | 30 ++++++++++++++++++++++++++++++
src/service.h | 3 +++
5 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/man/systemd.service.xml b/man/systemd.service.xml
index 0baddd1..bf6a5ea 100644
--- a/man/systemd.service.xml
+++ b/man/systemd.service.xml
@@ -460,6 +460,23 @@
</varlistentry>

<varlistentry>
+ <term><varname>WatchdogSec=</varname></term>
+ <listitem><para>Configures the watchdog
+ timeout for a service. This is activated
+ with the first
+ <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>
+ call with "WATCHDOG=1". If the time
+ between two such calls is larger than
+ the configured time then the service
+ enters a failure state. By setting
+ <term><varname>Restart=</varname></term>
+ to <option>on-failure</option> or
+ <option>always</option> the service can
+ be restarted. Defaults to 0s, which
+ disables this feature.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><varname>Restart=</varname></term>
<listitem><para>Configures whether the
main service process shall be
diff --git a/src/dbus-service.c b/src/dbus-service.c
index 738dc7b..fedfc1d 100644
--- a/src/dbus-service.c
+++ b/src/dbus-service.c
@@ -43,6 +43,7 @@
" <property name=\"NotifyAccess\" type=\"s\" access=\"read\"/>\n" \
" <property name=\"RestartUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestamp\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPre") \
@@ -119,6 +120,7 @@ static const BusProperty bus_service_properties[] = {
{ "NotifyAccess", bus_service_append_notify_access, "s", offsetof(Service, notify_access) },
{ "RestartUSec", bus_property_append_usec, "t", offsetof(Service, restart_usec) },
{ "TimeoutUSec", bus_property_append_usec, "t", offsetof(Service, timeout_usec) },
+ { "WatchdogUSec", bus_property_append_usec, "t", offsetof(Service, watchdog_usec) },
{ "WatchdogTimestamp", bus_property_append_usec, "t", offsetof(Service, watchdog_timestamp.realtime) },
{ "WatchdogTimestampMonotonic",bus_property_append_usec, "t", offsetof(Service, watchdog_timestamp.monotonic) },
BUS_EXEC_COMMAND_PROPERTY("ExecStartPre", offsetof(Service, exec_command[SERVICE_EXEC_START_PRE]), true ),
diff --git a/src/load-fragment-gperf.gperf.m4 b/src/load-fragment-gperf.gperf.m4
index 14c0606..9191f90 100644
--- a/src/load-fragment-gperf.gperf.m4
+++ b/src/load-fragment-gperf.gperf.m4
@@ -134,6 +134,7 @@ Service.ExecStop, config_parse_exec, SERVICE_EXE
Service.ExecStopPost, config_parse_exec, SERVICE_EXEC_STOP_POST, offsetof(Service, exec_command)
Service.RestartSec, config_parse_usec, 0, offsetof(Service, restart_usec)
Service.TimeoutSec, config_parse_usec, 0, offsetof(Service, timeout_usec)
+Service.WatchdogSec, config_parse_usec, 0, offsetof(Service, watchdog_usec)
Service.Type, config_parse_service_type, 0, offsetof(Service, type)
Service.Restart, config_parse_service_restart, 0, offsetof(Service, restart)
Service.PermissionsStartOnly, config_parse_bool, 0, offsetof(Service, permissions_start_only)
diff --git a/src/service.c b/src/service.c
index b6bbfab..f030ccf 100644
--- a/src/service.c
+++ b/src/service.c
@@ -112,6 +112,9 @@ static void service_init(Unit *u) {

s->timeout_usec = DEFAULT_TIMEOUT_USEC;
s->restart_usec = DEFAULT_RESTART_USEC;
+
+ s->watchdog_watch.type = WATCH_INVALID;
+
s->timer_watch.type = WATCH_INVALID;
#ifdef HAVE_SYSV_COMPAT
s->sysv_start_priority = -1;
@@ -208,14 +211,27 @@ static void service_connection_unref(Service *s) {
static void service_stop_watchdog(Service *s) {
assert(s);

+ unit_unwatch_timer(UNIT(s), &s->watchdog_watch);
s->watchdog_timestamp.realtime = 0;
s->watchdog_timestamp.monotonic = 0;
}

+static void service_setup_watchdog_timer(Service *s, usec_t offset) {
+ int r;
+ assert(s);
+
+ if (s->watchdog_usec) {
+ r = unit_watch_timer(UNIT(s), s->watchdog_usec - offset, &s->watchdog_watch);
+ if (r < 0)
+ log_warning("%s failed to install watchdog timer: %s", UNIT(s)->id, strerror(-r));
+ }
+}
+
static void service_reset_watchdog(Service *s) {
assert(s);

dual_timestamp_get(&s->watchdog_timestamp);
+ service_setup_watchdog_timer(s, 0);
}

static void service_done(Unit *u) {
@@ -259,6 +275,8 @@ static void service_done(Unit *u) {

unit_ref_unset(&s->accept_socket);

+ service_stop_watchdog(s);
+
unit_unwatch_timer(u, &s->timer_watch);
}

@@ -1570,6 +1588,12 @@ static int service_coldplug(Unit *u) {

service_set_state(s, s->deserialized_state);
}
+ if (dual_timestamp_is_set(&s->watchdog_timestamp)) {
+ usec_t t;
+
+ t = now(CLOCK_MONOTONIC);
+ service_setup_watchdog_timer(s, t - s->watchdog_timestamp.monotonic);
+ }

return 0;
}
@@ -2922,6 +2946,12 @@ static void service_timer_event(Unit *u, uint64_t elapsed, Watch* w) {
assert(s);
assert(elapsed == 1);

+ if (w == &s->watchdog_watch) {
+ log_error("%s watchdog timeout: ...", u->id);
+ service_enter_dead(s, SERVICE_FAILURE_WATCHDOG_TIMEOUT, true);
+ return;
+ }
+
assert(w == &s->timer_watch);

switch (s->state) {
diff --git a/src/service.h b/src/service.h
index b1e8b90..32341f0 100644
--- a/src/service.h
+++ b/src/service.h
@@ -95,6 +95,7 @@ typedef enum ServiceResult {
SERVICE_FAILURE_EXIT_CODE,
SERVICE_FAILURE_SIGNAL,
SERVICE_FAILURE_CORE_DUMP,
+ SERVICE_FAILURE_WATCHDOG_TIMEOUT,
_SERVICE_RESULT_MAX,
_SERVICE_RESULT_INVALID = -1
} ServiceResult;
@@ -112,6 +113,8 @@ struct Service {
usec_t timeout_usec;

dual_timestamp watchdog_timestamp;
+ usec_t watchdog_usec;
+ Watch watchdog_watch;

ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX];
ExecContext exec_context;

--
1.7.8.3

Michael Olbrich

2012-02-07 12:18:11 UTC

Hi,

I've been thinking a bit more about this. There are some problems with
this. Consider a service that works like this:

do_setup()
sd_notify("READY=1")
while (true) {
do_work()
sd_notify("WATCHDOG=1")
}

Now the error is, that do_work() never finishes, even after restart.
First the service is restarted. That works (we get "READY=1"). The old
timeout is still there, but the watchdog timer is never started and no
further action is taken. Not good.
I see multiple options here:

1. Clear the timeout on restart, and require the service to send
"WATCHDOG=1" when started.
2. Consider "READY=1" (and any other startup notification) as the first
"WATCHDOG=1", when WatchdogSec ist set.
3. Same as 2. but only when restarting and at least on "WATCHDOG=1" was
received before.
4. Add a config option for 1., 2., 3.

I don't like 1.: For services with socket or D-Bus interfaces I'd like to
make it possible to send "WATCHDOG=1" from a separate process. Fork it at
startup and access the service via its actual interface and send
"WATCHDOG=1" as appropriate. This makes it possible to monitor services
that do not support it, without code modifications.
The startup notification comes from the actual service (e.g. via BusName)
and cannot be combined with the first "WATCHDOG=1".

2. might be a bit too restrictive for more relaxed scenarios, especially
when combined with rebooting.

And 3. is just the opposite. It's not enough for critical services. Also, I
think there is a bit too much magic here for my taste.

I'd probably prefer 4. with with an option to select between 1. and 2.

Comments?

Regards,
Michael

Lennart Poettering

2012-02-07 14:24:27 UTC

Post by Michael Olbrich
Hi,
I've been thinking a bit more about this. There are some problems with
2. might be a bit too restrictive for more relaxed scenarios, especially
when combined with rebooting.
And 3. is just the opposite. It's not enough for critical services. Also, I
think there is a bit too much magic here for my taste.
I'd probably prefer 4. with with an option to select between 1. and 2.
Comments?

The startup phase (o.e. until READY=1) is covered by the usual timeout
logic anyway, so I'd just say that we shouldn't even wait for the first
WATCHDOG=1 to be sent, but immediately start the watchdog when we leave
the startup phase, if WatchdogSec= is set. That way, the startup phase
is covered by TimeoutSec= and the runtime by WatchdogSec= and beyond
these settings not further conditionalized. I think this would be simple
and straightforward to understand?

Lennart

--
Lennart Poettering - Red Hat, Inc.

Lennart Poettering

2012-02-07 14:33:37 UTC

Post by Michael Olbrich
switch (s->state) {
diff --git a/src/service.h b/src/service.h
index b1e8b90..32341f0 100644
--- a/src/service.h
+++ b/src/service.h
@@ -95,6 +95,7 @@ typedef enum ServiceResult {
SERVICE_FAILURE_EXIT_CODE,
SERVICE_FAILURE_SIGNAL,
SERVICE_FAILURE_CORE_DUMP,
+ SERVICE_FAILURE_WATCHDOG_TIMEOUT,

I'd just shorten this to "watchdog" instead of "watchdog-timeout".

You also need to add this new enum entry to the string array for it down
in service.c.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-03 20:14:22 UTC

Post by Lennart Poettering
b) hook up the watchdog with the existing failure logic and introduce
WatchdogSec for that.

This is the first patch. It works, but I'm not sure about the
implementation. I tried to use the existing Watch. However, as far as I can
tell there are collisions e.g. with exec post handling.

Post by Lennart Poettering
c) pass watchdog frequency to executed processes via env var

The second patch. This is rather trivial.

Post by Lennart Poettering
e) extend start logic to do configurable rate limiting of starts,
plus optionally reboots when rate limiter is triggered. New options for
StartLimitInterval=5min
StartLimitBurst=5000
StartLimitAction=none|reboot|reboot-force|reboot-immediate

The third patch. This is work in progress. Just the variables so far. I
tried to follow other examples. Not sure if I did everything correctly.
And I could not find any place to initialize the rate limiter with the
configured values. They are available when querying via dbus but not in any
initialization function that I could find. Where would I do that?

And can you elaborate reboot, reboot-force and reboot-immediate exactly
mean?

Michael

Michael Olbrich (3):
introduce WatchdogSec and hook up the watchdog with the existing
failure logic
set WATCHDOG_USEC environmen variable
WIP: service: add StartLimitInterval/StartLimitBurst/StartLimitAction

man/systemd.service.xml | 17 +++++++++++++
src/dbus-service.c | 9 +++++++
src/load-fragment-gperf.gperf.m4 | 4 +++
src/load-fragment.c | 1 +
src/load-fragment.h | 1 +
src/service.c | 50 +++++++++++++++++++++++++++++++++++++-
src/service.h | 19 ++++++++++++++
7 files changed, 100 insertions(+), 1 deletions(-)

--
1.7.8.3

Michael Olbrich

2012-02-03 20:14:24 UTC

---
src/service.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/src/service.c b/src/service.c
index f030ccf..9bd3332 100644
--- a/src/service.c
+++ b/src/service.c
@@ -1726,6 +1726,12 @@ static int service_spawn(
goto fail;
}

+ if (s->watchdog_usec > 0)
+ if (asprintf(our_env + n_env++, "WATCHDOG_USEC=%llu", (unsigned long long) s->watchdog_usec) < 0) {
+ r = -ENOMEM;
+ goto fail;
+ }
+
if (!(final_env = strv_env_merge(2,
UNIT(s)->manager->environment,
our_env,

--
1.7.8.3

Lennart Poettering

2012-02-07 14:25:32 UTC

Post by Michael Olbrich
---
src/service.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/src/service.c b/src/service.c
index f030ccf..9bd3332 100644
--- a/src/service.c
+++ b/src/service.c
@@ -1726,6 +1726,12 @@ static int service_spawn(
goto fail;
}
+ if (s->watchdog_usec > 0)
+ if (asprintf(our_env + n_env++, "WATCHDOG_USEC=%llu", (unsigned long long) s->watchdog_usec) < 0) {
+ r = -ENOMEM;
+ goto fail;
+ }
+

our_env needs to be allocated larger for this. You need to increase the
new0() line to 5 instead of the current 4 char* objects.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-07 14:57:51 UTC

our_env needs to be allocated larger for this. You need to increase the
new0() line to 5 instead of the current 4 char* objects.

Not in this case I think. An earlier commit removed on env variable without
decreasing the new0() line. with this patch we have 3 possible env
variables here: NOTIFY_SOCKET, MAINPID and WATCHDOG_USEC, so we need 4
char* objects, right?

Michael

Lennart Poettering

2012-02-07 15:15:53 UTC

Post by Lennart Poettering
our_env needs to be allocated larger for this. You need to increase the
new0() line to 5 instead of the current 4 char* objects.

Oh, you are right, indeed!

Sorry for the confusion.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Lennart Poettering

2012-02-07 14:29:54 UTC

Post by Michael Olbrich
The third patch. This is work in progress. Just the variables so far. I
tried to follow other examples. Not sure if I did everything correctly.
And I could not find any place to initialize the rate limiter with the
configured values. They are available when querying via dbus but not in any
initialization function that I could find. Where would I do that?
And can you elaborate reboot, reboot-force and reboot-immediate exactly
mean?

reboot = just enqueue a job for reboot.target.

reboot-force = Just set m->exit_code to MANAGER_REBOOT. This will cause
the systemd event loop to terminate immeidately as your code returns
control to it, and then results in a reboot.

reboot-immediate = invoke reboot(LINUX_REBOOT_CMD_RESTART) right-away.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-01 16:17:12 UTC

This patch adds WatchdogTimestamp[Monotonic] to the systemd service
D-Bus API. The timestamp is updated to the current time when the
service calls 'sd_nofity("WATCHDOG=1\n")'.
Using a timestamp instead of an 'alive' flag has two advantages:
1. No timeout is needed to define when a service is no longer alive.
This simplifies both configuration (no timeout value) and
implementation (no timeout event).
2. It is more robust. A 'dead' service might not be detected should
systemd 'forget' to reset an 'alive' flag. It is much less likely
to get a valid new timestamp if a service died.
---
changes in v2:
- adapt to changes from "d200735 dbus: more efficient implementation of properties"
- serialize timestamp (needed for correct daemon-reload/daemon-reload handling)

man/sd_notify.xml | 12 ++++++++++++
src/dbus-service.c | 5 +++++
src/service.c | 25 +++++++++++++++++++++++++
src/service.h | 2 ++
src/systemd/sd-daemon.h | 5 +++++
5 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/man/sd_notify.xml b/man/sd_notify.xml
index 0209146..9797a5f 100644
--- a/man/sd_notify.xml
+++ b/man/sd_notify.xml
@@ -151,6 +151,18 @@
itself. Example:
"MAINPID=4711"</para></listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>WATCHDOG=1</term>
+
+ <listitem><para>Tells systemd to
+ update the watchdog timestamp.
+ Services using this feature should do
+ this in regular intervals. A watchdog
+ framework can use the timestamps to
+ detect failed
+ services.</para></listitem>
+ </varlistentry>
</variablelist>

<para>It is recommended to prefix variable names that
diff --git a/src/dbus-service.c b/src/dbus-service.c
index e1f6370..d7529ec 100644
--- a/src/dbus-service.c
+++ b/src/dbus-service.c
@@ -43,6 +43,8 @@
" <property name=\"NotifyAccess\" type=\"s\" access=\"read\"/>\n" \
" <property name=\"RestartUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogTimestamp\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPre") \
BUS_EXEC_COMMAND_INTERFACE("ExecStart") \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPost") \
@@ -86,6 +88,7 @@ const char bus_service_invalidating_properties[] =
"ExecStop\0"
"ExecStopPost\0"
"ExecMain\0"
+ "WatchdogTimestamp\0"
"MainPID\0"
"ControlPID\0"
"StatusText\0";
@@ -112,6 +115,8 @@ static const BusProperty bus_service_properties[] = {
{ "NotifyAccess", bus_service_append_notify_access, "s", offsetof(Service, notify_access) },
{ "RestartUSec", bus_property_append_usec, "t", offsetof(Service, restart_usec) },
{ "TimeoutUSec", bus_property_append_usec, "t", offsetof(Service, timeout_usec) },
+ { "WatchdogTimestamp", bus_property_append_usec, "t", offsetof(Service, watchdog_timestamp.realtime)},
+ { "WatchdogTimestampMonotonic",bus_property_append_usec, "t", offsetof(Service, watchdog_timestamp.monotonic)},
BUS_EXEC_COMMAND_PROPERTY("ExecStartPre", offsetof(Service, exec_command[SERVICE_EXEC_START_PRE]), true ),
BUS_EXEC_COMMAND_PROPERTY("ExecStart", offsetof(Service, exec_command[SERVICE_EXEC_START]), true ),
BUS_EXEC_COMMAND_PROPERTY("ExecStartPost", offsetof(Service, exec_command[SERVICE_EXEC_START_POST]), true ),
diff --git a/src/service.c b/src/service.c
index 4dcd306..e107179 100644
--- a/src/service.c
+++ b/src/service.c
@@ -205,6 +205,19 @@ static void service_connection_unref(Service *s) {
unit_ref_unset(&s->accept_socket);
}

+static void service_stop_watchdog(Service *s) {
+ assert(s);
+
+ s->watchdog_timestamp.realtime = 0;
+ s->watchdog_timestamp.monotonic = 0;
+}
+
+static void service_reset_watchdog(Service *s) {
+ assert(s);
+
+ dual_timestamp_get(&s->watchdog_timestamp);
+}
+
static void service_done(Unit *u) {
Service *s = SERVICE(u);

@@ -1476,6 +1489,9 @@ static void service_set_state(Service *s, ServiceState state) {
service_connection_unref(s);
}

+ if (state == SERVICE_STOP)
+ service_stop_watchdog(s);
+
/* For the inactive states unit_notify() will trim the cgroup,
* but for exit we have to do that ourselves... */
if (state == SERVICE_EXITED && UNIT(s)->manager->n_reloading <= 0)
@@ -2411,6 +2427,9 @@ static int service_serialize(Unit *u, FILE *f, FDSet *fds) {
unit_serialize_item_format(u, f, "main-exec-status-status", "%i", s->main_exec_status.status);
}
}
+ if (dual_timestamp_is_set(&s->watchdog_timestamp)) {
+ dual_timestamp_serialize(f, "watchdog-timestamp", &s->watchdog_timestamp);
+ }

return 0;
}
@@ -2511,6 +2530,8 @@ static int service_deserialize_item(Unit *u, const char *key, const char *value,
dual_timestamp_deserialize(value, &s->main_exec_status.start_timestamp);
else if (streq(key, "main-exec-status-exit"))
dual_timestamp_deserialize(value, &s->main_exec_status.exit_timestamp);
+ else if (streq(key, "watchdog-timestamp"))
+ dual_timestamp_deserialize(value, &s->watchdog_timestamp);
else
log_debug("Unknown serialization key '%s'", key);

@@ -3069,6 +3090,10 @@ static void service_notify_message(Unit *u, pid_t pid, char **tags) {
}

}
+ if (strv_find(tags, "WATCHDOG=1")) {
+ log_debug("%s: got WATCHDOG=1", u->id);
+ service_reset_watchdog(s);
+ }

/* Notify clients about changed status or main pid */
unit_add_to_dbus_queue(u);
diff --git a/src/service.h b/src/service.h
index 0b4f8be..dbae68b 100644
--- a/src/service.h
+++ b/src/service.h
@@ -100,6 +100,8 @@ struct Service {
usec_t restart_usec;
usec_t timeout_usec;

+ dual_timestamp watchdog_timestamp;
+
ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX];
ExecContext exec_context;

diff --git a/src/systemd/sd-daemon.h b/src/systemd/sd-daemon.h
index eb2a606..7b664bf 100644
--- a/src/systemd/sd-daemon.h
+++ b/src/systemd/sd-daemon.h
@@ -217,6 +217,11 @@ int sd_is_mq(int fd, const char *path);
MAINPID=... The main pid of a daemon, in case systemd did not
fork off the process itself. Example: "MAINPID=4711"

+ WATCHDOG=1 Tells systemd to update the watchdog timestamp.
+ Services using this feature should do this in
+ regular intervals. A watchdog framework can use the
+ timestamps to detect failed services.
+
Daemons can choose to send additional variables. However, it is
recommended to prefix variable names not listed above with X_.

--
1.7.7.3

Lennart Poettering

2012-02-01 18:36:18 UTC

On Wed, 01.02.12 17:17, Michael Olbrich (***@pengutronix.de) wrote:

heya,

sorry it took so much time to review these patches. And thanks for
staying on it.

Post by Michael Olbrich
This patch adds WatchdogTimestamp[Monotonic] to the systemd service
D-Bus API. The timestamp is updated to the current time when the
service calls 'sd_nofity("WATCHDOG=1\n")'.
1. No timeout is needed to define when a service is no longer alive.
This simplifies both configuration (no timeout value) and
implementation (no timeout event).
2. It is more robust. A 'dead' service might not be detected should
systemd 'forget' to reset an 'alive' flag. It is much less likely
to get a valid new timestamp if a service died.

Merged this one now, with some minor changes.

Thanks,

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-01 16:17:14 UTC

This patch adds WatchdogRebootTimestamp[Monotonic] to the systemd
manager API. It contains the earliest point in time when systemd might
reboot the system because the timer for WatchdogRebootUSec for a
service expired.
If we assume the system takes Xus to shut down then
WatchdogRebootTimestamp + Xus should never be in the past. A watchdog
daemon handling the hardware watchdog can use this information to
determine when to let the hardware watchdog restart the system.
This is convenience information for a watchdog daemon. With this the
it can avoid a lot of D-Bus calls that are necessary to calculate the
same value.
---
changes in v2:
- adapt to changes from "d200735 dbus: more efficient implementation of properties"

src/dbus-manager.c | 39 +++++++++++++++++++++++++++++++++++++++
1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/src/dbus-manager.c b/src/dbus-manager.c
index 6d272cb..d49e947 100644
--- a/src/dbus-manager.c
+++ b/src/dbus-manager.c
@@ -223,6 +223,8 @@
" <property name=\"StartupTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"FinishTimestamp\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"FinishTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRebootTimestamp\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRebootTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"LogLevel\" type=\"s\" access=\"readwrite\"/>\n" \
" <property name=\"LogTarget\" type=\"s\" access=\"readwrite\"/>\n" \
" <property name=\"NNames\" type=\"u\" access=\"read\"/>\n" \
@@ -310,6 +312,41 @@ static int bus_manager_append_tainted(DBusMessageIter *i, const char *property,
return 0;
}

+static int bus_manager_append_watchdog(DBusMessageIter *i, const char *property, void *data) {
+ Iterator iter;
+ Unit *u;
+ Service *s;
+ const char *k;
+ uint64_t value;
+ dual_timestamp timestamp = {0, 0};
+ Manager *m = data;
+
+ assert(i);
+ assert(property);
+ assert(m);
+
+ HASHMAP_FOREACH_KEY(u, k, m->units, iter) {
+ if (!(s = SERVICE(u)))
+ continue;
+ if (!dual_timestamp_is_set(&s->watchdog_timestamp))
+ continue;
+ if (!dual_timestamp_is_set(&timestamp) || ((s->watchdog_timestamp.monotonic + s->watchdog_reboot_usec) < timestamp.monotonic)) {
+ timestamp.monotonic = s->watchdog_timestamp.monotonic + s->watchdog_reboot_usec;
+ timestamp.realtime = s->watchdog_timestamp.realtime + s->watchdog_reboot_usec;
+ }
+ }
+
+ if (strcmp(property, "WatchdogRebootTimestamp") == 0)
+ value = timestamp.realtime;
+ else
+ value = timestamp.monotonic;
+
+ if (!dbus_message_iter_append_basic(i, DBUS_TYPE_UINT64, &value))
+ return -ENOMEM;
+
+ return 0;
+}
+
static int bus_manager_append_log_target(DBusMessageIter *i, const char *property, void *data) {
const char *t;

@@ -509,6 +546,8 @@ static const BusProperty bus_manager_properties[] = {
{ "StartupTimestampMonotonic", bus_property_append_uint64, "t", offsetof(Manager, startup_timestamp.monotonic) },
{ "FinishTimestamp", bus_property_append_uint64, "t", offsetof(Manager, finish_timestamp.realtime) },
{ "FinishTimestampMonotonic", bus_property_append_uint64, "t", offsetof(Manager, finish_timestamp.monotonic) },
+ { "WatchdogRebootTimestamp", bus_manager_append_watchdog, "t", 0 },
+ { "WatchdogRebootTimestampMonotonic", bus_manager_append_watchdog, "t", 0 },
{ "LogLevel", bus_manager_append_log_level, "s", 0, 0, bus_manager_set_log_level },
{ "LogTarget", bus_manager_append_log_target, "s", 0, 0, bus_manager_set_log_target },
{ "NNames", bus_manager_append_n_names, "u", 0 },

--
1.7.7.3

Lennart Poettering

2012-02-01 19:05:27 UTC

Post by Michael Olbrich
This patch adds WatchdogRebootTimestamp[Monotonic] to the systemd
manager API. It contains the earliest point in time when systemd might
reboot the system because the timer for WatchdogRebootUSec for a
service expired.
If we assume the system takes Xus to shut down then
WatchdogRebootTimestamp + Xus should never be in the past. A watchdog
daemon handling the hardware watchdog can use this information to
determine when to let the hardware watchdog restart the system.
This is convenience information for a watchdog daemon. With this the
it can avoid a lot of D-Bus calls that are necessary to calculate the
same value.

I had a longer discussion with Kay about this the other day, i.e. how we
want the introduction of awatchdog look in the end. We kinda came to the
formula that systemd should supervise services and the hw watchdog
should supervise systemd. That way systemd would just write to the hw
watchdog in its core event loop (i.e. in PID1) but not make any
connection between the actual services and the hw watchdog. Now, your
idea seems to be that watchdog acts as a multiplexer for the hw watchdog
for services, right?

I am wondering now what the right way to handle this is in the
end. Would it make sense to drop the multiplexing thing for you, or is
there a strong case for doing this?

(As I figured out newer Intel chipsets all have watchdogs now, so I am
actually quite keen to see this implemented in systemd now, since I can
actually test it.)

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-01 19:22:29 UTC

I had a longer discussion with Kay about this the other day, i.e. how we
want the introduction of awatchdog look in the end. We kinda came to the
formula that systemd should supervise services and the hw watchdog
should supervise systemd. That way systemd would just write to the hw
watchdog in its core event loop (i.e. in PID1) but not make any
connection between the actual services and the hw watchdog. Now, your
idea seems to be that watchdog acts as a multiplexer for the hw watchdog
for services, right?
I am wondering now what the right way to handle this is in the
end. Would it make sense to drop the multiplexing thing for you, or is
there a strong case for doing this?

Well my use-cases come from embedded scenarios. Traditionally there was on
application and it handled the hardware watchdog by itself. Nowadays there
are multiple applications and all need to be supervised. Restarting an
application should be the first action to recover, so that the other
applications can continue uninterrupted. However it must be possible to
restart (if necessary triggered by the watchdog) the whole system, in case
an application cannot recover. I'm not sure how this can be achieved
without some kind of watchdog multiplexing.

Michael

Chris Paulson-Ellis

2012-02-01 19:24:28 UTC

Post by Lennart Poettering
(As I figured out newer Intel chipsets all have watchdogs now, so I am
actually quite keen to see this implemented in systemd now, since I can
actually test it.)

Just a warning to anyone who's thinking of depending on the chipset
watchdog... In my experience, many boards are not correctly wired up to
reset properly when the chipset watchdog fires. Although it works most
of the time, I've had boards hang under testing using the iTCO_wdt
watchdog driver. I expect the chipset & processor reset fine, but the
rest of the board doesn't. On boards like these I use the softdog driver
instead as Linux rarely hangs so badly that it can't run the emergency
restart code. We're developing external watchdog hardware to allow us to
continue to use these cheap PC boards with confidence.

C.

Albert Strasheim

2012-02-01 19:27:31 UTC

Post by Chris Paulson-Ellis

Hello

Post by Lennart Poettering
(As I figured out newer Intel chipsets all have watchdogs now, so I am
actually quite keen to see this implemented in systemd now, since I can
actually test it.)

Some more info:

I've found that the "Optimal Defaults" BIOS option on some
motherboards disables the iTCO_wdt watchdog.

Luckily server boards usually also have an IPMI watchdog, which works
great under Linux.

Regards

Albert

Lennart Poettering

2012-02-01 19:29:05 UTC

Post by Chris Paulson-Ellis

Post by Lennart Poettering
(As I figured out newer Intel chipsets all have watchdogs now, so I am
actually quite keen to see this implemented in systemd now, since I can
actually test it.)

Just a warning to anyone who's thinking of depending on the chipset
watchdog... In my experience, many boards are not correctly wired up
to reset properly when the chipset watchdog fires. Although it works
most of the time, I've had boards hang under testing using the
iTCO_wdt watchdog driver. I expect the chipset & processor reset
fine, but the rest of the board doesn't. On boards like these I use
the softdog driver instead as Linux rarely hangs so badly that it
can't run the emergency restart code. We're developing external
watchdog hardware to allow us to continue to use these cheap PC
boards with confidence.

In systemd we will not make use of the watchdog unless enabled in the
configuration and it will default to off.

Lennart

--
Lennart Poettering - Red Hat, Inc.

Michael Olbrich

2012-02-01 16:17:15 UTC