Discussion:
[PATCH 0/4] systemd and watchdog
(too old to reply)
Michael Olbrich
2011-09-28 16:59:14 UTC
Permalink
Hi,

I've been playing with some ideas on how to add watchdog support to
systemd. I don't like talking about vaporware so here are some patches with
a prototype implementation. It should give you an idea on how this could be
done.

A few words on the ideas behind this: When working with a watchdog in
Linux, the typical scenario is one hardware watchdog, but multiple
processes that should be monitored. Beyond that the hardware watchdog
should be the last line of defence. A more graceful recovery should be
tried first.

How to implement this is systemd:
systemd already has the concept of a state for each service and a very
simple method (sd_notify) for the service to provide status information to
systemd.
This is implemented in the first patch. A service can send keep-alive
messages with sd_notify, and the timestamp of the latest message is exposed
as a service property.

The second patch implements service restart / reboot when no keep-alive
message was received for a certain amount of time.
Note: This only triggers if at least one keep-alive was received. I don't
think anything can be done if a service fails to start. This should be
handled outside of systemd.

I think, the watchdog hardware should be handled in a separate service, for
several reasons:
- It's not useful on systems without watchdog hardware. This gives us a
clean way to disable it.
- This is a rather critical part to implement. The code is much simpler
this way.
- There are many different requirements and options on how to handle the
watchdog hardware. It's a lot easier to replace a separate daemon with a
custom implementation, should it be necessary.

The third patch is helper code. It provides a single time stamp for when
systemd will reboot if no more keep-alive are sent.
This way the watchdog service only needs to make one D-Bus call to get the
necessary data.
The last patch adds a simple daemon that handled the watchdog device.

What do you think?

Regards,
Michael


Michael Olbrich (4):
WIP: service: add watchdog timestamp
WIP: service: add watchdog restart/reboot timeouts
WIP: manager: add a global watchdog reboot timestamp
WIP: add basic watchdog daemon

Makefile.am | 21 ++++++-
src/99-systemd.rules.in | 2 +
src/dbus-manager.c | 4 +
src/dbus-service.c | 8 +++
src/load-fragment-gperf.gperf.m4 | 2 +
src/manager.c | 20 ++++++
src/manager.h | 3 +
src/service.c | 49 +++++++++++++++
src/service.h | 6 ++
src/watchdogd.c | 119 ++++++++++++++++++++++++++++++++++++
units/systemd-watchdogd.service.in | 16 +++++
11 files changed, 248 insertions(+), 2 deletions(-)
create mode 100644 src/watchdogd.c
create mode 100644 units/systemd-watchdogd.service.in
--
1.7.5.4
Michael Olbrich
2011-09-28 16:59:15 UTC
Permalink
---
src/dbus-service.c | 4 ++++
src/service.c | 20 ++++++++++++++++++++
src/service.h | 2 ++
3 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/src/dbus-service.c b/src/dbus-service.c
index 3486623..5a740de 100644
--- a/src/dbus-service.c
+++ b/src/dbus-service.c
@@ -43,6 +43,8 @@
" <property name=\"NotifyAccess\" type=\"s\" access=\"read\"/>\n" \
" <property name=\"RestartUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogTimestamp\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPre") \
BUS_EXEC_COMMAND_INTERFACE("ExecStart") \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPost") \
@@ -104,6 +106,8 @@ DBusHandlerResult bus_service_message_handler(Unit *u, DBusConnection *connectio
{ "org.freedesktop.systemd1.Service", "NotifyAccess", bus_service_append_notify_access, "s", &u->service.notify_access },
{ "org.freedesktop.systemd1.Service", "RestartUSec", bus_property_append_usec, "t", &u->service.restart_usec },
{ "org.freedesktop.systemd1.Service", "TimeoutUSec", bus_property_append_usec, "t", &u->service.timeout_usec },
+ { "org.freedesktop.systemd1.Service", "WatchdogTimestamp", bus_property_append_usec, "t", &u->service.watchdog_timestamp.realtime },
+ { "org.freedesktop.systemd1.Service", "WatchdogTimestampMonotonic",bus_property_append_usec,"t", &u->service.watchdog_timestamp.monotonic },
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START_PRE], "ExecStartPre"),
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START], "ExecStart"),
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START_POST], "ExecStartPost"),
diff --git a/src/service.c b/src/service.c
index c2053ce..6fbc023 100644
--- a/src/service.c
+++ b/src/service.c
@@ -194,6 +194,19 @@ static void service_connection_unref(Service *s) {
s->accept_socket = NULL;
}

+static void service_stop_watchdog(Service *s) {
+ assert(s);
+
+ s->watchdog_timestamp.realtime = 0;
+ s->watchdog_timestamp.monotonic = 0;
+}
+
+static void service_reset_watchdog(Service *s) {
+ assert(s);
+
+ dual_timestamp_get(&s->watchdog_timestamp);
+}
+
static void service_done(Unit *u) {
Service *s = SERVICE(u);

@@ -1514,6 +1527,9 @@ static void service_set_state(Service *s, ServiceState state) {
service_connection_unref(s);
}

+ if (state == SERVICE_STOP)
+ service_stop_watchdog(s);
+
/* For the inactive states unit_notify() will trim the cgroup,
* but for exit we have to do that ourselves... */
if (state == SERVICE_EXITED && s->meta.manager->n_reloading <= 0)
@@ -2985,6 +3001,10 @@ static void service_notify_message(Unit *u, pid_t pid, char **tags) {
}

}
+ if (strv_find(tags, "WATCHDOG=1")) {
+ log_debug("%s: got WATCHDOG=1", u->meta.id);
+ service_reset_watchdog(s);
+ }

/* Notify clients about changed status or main pid */
unit_add_to_dbus_queue(u);
diff --git a/src/service.h b/src/service.h
index e28f74b..3801e84 100644
--- a/src/service.h
+++ b/src/service.h
@@ -98,6 +98,8 @@ struct Service {
usec_t restart_usec;
usec_t timeout_usec;

+ dual_timestamp watchdog_timestamp;
+
ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX];
ExecContext exec_context;
--
1.7.5.4
Michael Olbrich
2011-09-28 16:59:16 UTC
Permalink
---
src/dbus-service.c | 4 ++++
src/load-fragment-gperf.gperf.m4 | 2 ++
src/service.c | 27 +++++++++++++++++++++++++++
src/service.h | 4 ++++
4 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/src/dbus-service.c b/src/dbus-service.c
index 5a740de..34eabdf 100644
--- a/src/dbus-service.c
+++ b/src/dbus-service.c
@@ -43,6 +43,8 @@
" <property name=\"NotifyAccess\" type=\"s\" access=\"read\"/>\n" \
" <property name=\"RestartUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRestartUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRebootUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestamp\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPre") \
@@ -106,6 +108,8 @@ DBusHandlerResult bus_service_message_handler(Unit *u, DBusConnection *connectio
{ "org.freedesktop.systemd1.Service", "NotifyAccess", bus_service_append_notify_access, "s", &u->service.notify_access },
{ "org.freedesktop.systemd1.Service", "RestartUSec", bus_property_append_usec, "t", &u->service.restart_usec },
{ "org.freedesktop.systemd1.Service", "TimeoutUSec", bus_property_append_usec, "t", &u->service.timeout_usec },
+ { "org.freedesktop.systemd1.Service", "WatchdogRestartUSec", bus_property_append_usec, "t", &u->service.watchdog_restart_usec },
+ { "org.freedesktop.systemd1.Service", "WatchdogRebootUSec", bus_property_append_usec, "t", &u->service.watchdog_reboot_usec },
{ "org.freedesktop.systemd1.Service", "WatchdogTimestamp", bus_property_append_usec, "t", &u->service.watchdog_timestamp.realtime },
{ "org.freedesktop.systemd1.Service", "WatchdogTimestampMonotonic",bus_property_append_usec,"t", &u->service.watchdog_timestamp.monotonic },
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START_PRE], "ExecStartPre"),
diff --git a/src/load-fragment-gperf.gperf.m4 b/src/load-fragment-gperf.gperf.m4
index 7749b88..fd5e586 100644
--- a/src/load-fragment-gperf.gperf.m4
+++ b/src/load-fragment-gperf.gperf.m4
@@ -130,6 +130,8 @@ Service.ExecStop, config_parse_exec, SERVICE_EXE
Service.ExecStopPost, config_parse_exec, SERVICE_EXEC_STOP_POST, offsetof(Service, exec_command)
Service.RestartSec, config_parse_usec, 0, offsetof(Service, restart_usec)
Service.TimeoutSec, config_parse_usec, 0, offsetof(Service, timeout_usec)
+Service.WatchdogRestartSec, config_parse_usec, 0, offsetof(Service, watchdog_restart_usec)
+Service.WatchdogRebootSec, config_parse_usec, 0, offsetof(Service, watchdog_reboot_usec)
Service.Type, config_parse_service_type, 0, offsetof(Service, type)
Service.Restart, config_parse_service_restart, 0, offsetof(Service, restart)
Service.PermissionsStartOnly, config_parse_bool, 0, offsetof(Service, permissions_start_only)
diff --git a/src/service.c b/src/service.c
index 6fbc023..62bb0ef 100644
--- a/src/service.c
+++ b/src/service.c
@@ -112,6 +112,11 @@ static void service_init(Unit *u) {

s->timeout_usec = DEFAULT_TIMEOUT_USEC;
s->restart_usec = DEFAULT_RESTART_USEC;
+
+ s->watchdog_restart_usec = DEFAULT_TIMEOUT_USEC;
+ s->watchdog_restart_watch.type = WATCH_INVALID;
+ s->watchdog_reboot_watch.type = WATCH_INVALID;
+
s->timer_watch.type = WATCH_INVALID;
#ifdef HAVE_SYSV_COMPAT
s->sysv_start_priority = -1;
@@ -197,14 +202,25 @@ static void service_connection_unref(Service *s) {
static void service_stop_watchdog(Service *s) {
assert(s);

+ unit_unwatch_timer(UNIT(s), &s->watchdog_restart_watch);
+ unit_unwatch_timer(UNIT(s), &s->watchdog_reboot_watch);
s->watchdog_timestamp.realtime = 0;
s->watchdog_timestamp.monotonic = 0;
}

static void service_reset_watchdog(Service *s) {
+ int r;
assert(s);

dual_timestamp_get(&s->watchdog_timestamp);
+ if (s->watchdog_restart_usec) {
+ if ((r = unit_watch_timer(UNIT(s), s->watchdog_restart_usec, &s->watchdog_restart_watch)) < 0)
+ log_warning("%s failed to install watchdog restart timer: %s", s->meta.id, strerror(-r));
+ }
+ if (s->watchdog_restart_usec) {
+ if ((r = unit_watch_timer(UNIT(s), s->watchdog_reboot_usec, &s->watchdog_reboot_watch)) < 0)
+ log_warning("%s failed to install watchdog reboot timer: %s", s->meta.id, strerror(-r));
+ }
}

static void service_done(Unit *u) {
@@ -2821,6 +2837,17 @@ static void service_timer_event(Unit *u, uint64_t elapsed, Watch* w) {
assert(s);
assert(elapsed == 1);

+ if (w == &s->watchdog_restart_watch) {
+ log_error("%s watchdog timeout: restarting service...", u->meta.id);
+ manager_add_job(u->meta.manager, JOB_RESTART, u, JOB_FAIL, true, 0, 0);
+ return;
+ }
+ if (w == &s->watchdog_reboot_watch) {
+ log_error("%s watchdog timeout: rebooting...", u->meta.id);
+ manager_add_job_by_name(u->meta.manager, JOB_START, "reboot.target", JOB_REPLACE, true, 0, 0);
+ return;
+ }
+
assert(w == &s->timer_watch);

switch (s->state) {
diff --git a/src/service.h b/src/service.h
index 3801e84..cab18c2 100644
--- a/src/service.h
+++ b/src/service.h
@@ -99,6 +99,10 @@ struct Service {
usec_t timeout_usec;

dual_timestamp watchdog_timestamp;
+ usec_t watchdog_restart_usec;
+ usec_t watchdog_reboot_usec;
+ Watch watchdog_restart_watch;
+ Watch watchdog_reboot_watch;

ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX];
ExecContext exec_context;
--
1.7.5.4
Michael Olbrich
2011-09-28 16:59:18 UTC
Permalink
---
Makefile.am | 21 ++++++-
src/99-systemd.rules.in | 2 +
src/watchdogd.c | 119 ++++++++++++++++++++++++++++++++++++
units/systemd-watchdogd.service.in | 16 +++++
4 files changed, 156 insertions(+), 2 deletions(-)
create mode 100644 src/watchdogd.c
create mode 100644 units/systemd-watchdogd.service.in

diff --git a/Makefile.am b/Makefile.am
index 570b6be..3959526 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -173,7 +173,8 @@ rootlibexec_PROGRAMS = \
systemd-detect-virt \
systemd-sysctl \
systemd-logind \
- systemd-uaccess
+ systemd-uaccess \
+ systemd-watchdogd

if ENABLE_BINFMT
rootlibexec_PROGRAMS += \
@@ -414,6 +415,7 @@ nodist_systemunit_DATA = \
units/systemd-ask-password-wall.service \
units/systemd-ask-password-console.service \
units/systemd-sysctl.service \
+ units/systemd-watchdogd.service \
units/halt.service \
units/poweroff.service \
units/reboot.service \
@@ -477,6 +479,7 @@ EXTRA_DIST = \
units/systemd-ask-password-wall.service.in \
units/systemd-ask-password-console.service.in \
units/systemd-sysctl.service.in \
+ units/systemd-watchdogd.service.in \
units/halt.service.in \
units/poweroff.service.in \
units/reboot.service.in \
@@ -1307,6 +1310,19 @@ systemd_kmsg_syslogd_LDADD = \
libsystemd-basic.la \
libsystemd-daemon.la

+systemd_watchdogd_SOURCES = \
+ src/watchdogd.c \
+ src/dbus-common.c
+
+systemd_watchdogd_CFLAGS = \
+ $(AM_CFLAGS) \
+ $(DBUS_CFLAGS)
+
+systemd_watchdogd_LDADD = \
+ libsystemd-basic.la \
+ libsystemd-daemon.la \
+ $(DBUS_LIBS)
+
systemctl_SOURCES = \
src/systemctl.c \
src/utmp-wtmp.c \
@@ -1896,7 +1912,8 @@ if ENABLE_LOCALED
endif
( cd $(DESTDIR)$(systemunitdir)/basic.target.wants && \
rm -f systemd-tmpfiles-clean.timer && \
- $(LN_S) ../systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.timer )
+ $(LN_S) ../systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.timer && \
+ $(LN_S) ../systemd-watchdogd.service systemd-watchdogd.service )
( cd $(DESTDIR)$(dbussessionservicedir) && \
rm -f org.freedesktop.systemd1.service && \
$(LN_S) ../system-services/org.freedesktop.systemd1.service org.freedesktop.systemd1.service )
diff --git a/src/99-systemd.rules.in b/src/99-systemd.rules.in
index b2481ae..8e9dfcd 100644
--- a/src/99-systemd.rules.in
+++ b/src/99-systemd.rules.in
@@ -12,6 +12,8 @@ SUBSYSTEM=="tty", KERNEL=="tty[a-zA-Z]*|hvc*|xvc*|hvsi*", TAG+="systemd"

KERNEL=="vport*", TAG+="systemd"

+KERNEL=="watchdog", TAG+="systemd"
+
SUBSYSTEM=="block", KERNEL!="ram*|loop*", TAG+="systemd"
SUBSYSTEM=="block", KERNEL!="ram*|loop*", ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}=="1", ENV{SYSTEMD_READY}="0"

diff --git a/src/watchdogd.c b/src/watchdogd.c
new file mode 100644
index 0000000..44251cc
--- /dev/null
+++ b/src/watchdogd.c
@@ -0,0 +1,119 @@
+/*-*- Mode: C; c-basic-offset: 8; indent-tabs-mode: nil -*-*/
+
+/***
+ This file is part of systemd.
+
+ Copyright 2010 Michael Olbrich
+
+ systemd is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ systemd is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with systemd; If not, see <http://www.gnu.org/licenses/>.
+***/
+
+#include <dbus/dbus.h>
+
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <time.h>
+#include <string.h>
+#include <errno.h>
+#include <linux/watchdog.h>
+
+#include "log.h"
+#include "util.h"
+#include "dbus-common.h"
+#include "sd-daemon.h"
+
+int main(int argc, char *argv[]) {
+ int fd, ret;
+ struct timespec ts;
+ DBusConnection *bus = NULL;
+ DBusMessage *m, *reply = NULL;
+ DBusError error;
+ DBusMessageIter iter, sub;
+ usec_t watchdog_usec, now_usec;
+ const char *iface = "org.freedesktop.systemd1.Manager";
+ const char *property = "WatchdogRebootTimestampMonotonic";
+ int timeout = 10;
+ usec_t reboot_delay_usec = 1 * USEC_PER_SEC;
+
+ log_set_target(LOG_TARGET_AUTO);
+ log_parse_environment();
+ log_open();
+
+ umask(0022);
+
+ dbus_error_init(&error);
+
+ if ((ret = bus_connect(DBUS_BUS_SYSTEM, &bus, NULL, &error)) < 0) {
+ log_error("Failed to get D-Bus connection: %s", bus_error_message(&error));
+ return 1;
+ }
+ if (!(m = dbus_message_new_method_call("org.freedesktop.systemd1", "/org/freedesktop/systemd1", "org.freedesktop.DBus.Properties", "Get"))) {
+ log_error("Could not allocate message.");
+ return 1;
+ }
+ if (!dbus_message_append_args(m,
+ DBUS_TYPE_STRING, &iface,
+ DBUS_TYPE_STRING, &property,
+ DBUS_TYPE_INVALID)) {
+ log_error("Could not attach target and flag information to message.");
+ return 1;
+ }
+
+ if ((fd = open("/dev/watchdog", O_RDWR)) < 0) {
+ log_error("Could not open /dev/watchdog: %s", strerror(errno));
+ return 1;
+ }
+
+ if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) < 0) {
+ log_error("Failed to set watchdog timeout: %s", strerror(errno));
+ return 1;
+ }
+ log_info("Watchdog timeout set to %ds", timeout);
+ sd_notify(false,
+ "READY=1\n"
+ "STATUS=Watchdog started.");
+
+ timeout /= 2;
+ clock_gettime(CLOCK_MONOTONIC, &ts);
+ while (true) {
+ now_usec = timespec_load(&ts);
+ clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, 0);
+ ts.tv_sec += timeout;
+
+ if (reply)
+ dbus_message_unref(reply);
+
+ if (!(reply = dbus_connection_send_with_reply_and_block(bus, m, -1, &error))) {
+ log_error("Error fetching reboot timestamp: %s", bus_error_message(&error));
+ continue;
+ }
+ if (!dbus_message_iter_init(reply, &iter)) {
+ log_error("Failed to parse reply (init).");
+ continue;
+ }
+ dbus_message_iter_recurse(&iter, &sub);
+ if (bus_iter_get_basic_and_next(&sub, DBUS_TYPE_UINT64, &watchdog_usec, false) < 0) {
+ log_error("Failed to parse reply (value).");
+ continue;
+ }
+ if (watchdog_usec && ((watchdog_usec + reboot_delay_usec) < now_usec)) {
+ log_error("Reboot failed: now = %llu, watchdog = %llu", now_usec, watchdog_usec);
+ continue;
+ }
+ ioctl(fd, WDIOC_KEEPALIVE, 0);
+ }
+
+
+}
diff --git a/units/systemd-watchdogd.service.in b/units/systemd-watchdogd.service.in
new file mode 100644
index 0000000..643b3da
--- /dev/null
+++ b/units/systemd-watchdogd.service.in
@@ -0,0 +1,16 @@
+# This file is part of systemd.
+#
+# systemd is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+
+[Unit]
+Description=Watchdog Daemon
+DefaultDependencies=no
+BindTo=dev-watchdog.device
+After=dev-watchdog.device
+
+[Service]
+Type=notify
+ExecStart=@rootlibexecdir@/systemd-watchdogd
--
1.7.5.4
Michael Olbrich
2011-09-28 16:59:17 UTC
Permalink
---
src/dbus-manager.c | 4 ++++
src/manager.c | 20 ++++++++++++++++++++
src/manager.h | 3 +++
src/service.c | 2 ++
4 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/src/dbus-manager.c b/src/dbus-manager.c
index 7b68156..2254ec8 100644
--- a/src/dbus-manager.c
+++ b/src/dbus-manager.c
@@ -223,6 +223,8 @@
" <property name=\"StartupTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"FinishTimestamp\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"FinishTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRebootTimestamp\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRebootTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"LogLevel\" type=\"s\" access=\"readwrite\"/>\n" \
" <property name=\"LogTarget\" type=\"s\" access=\"readwrite\"/>\n" \
" <property name=\"NNames\" type=\"u\" access=\"read\"/>\n" \
@@ -506,6 +508,8 @@ static DBusHandlerResult bus_manager_message_handler(DBusConnection *connection,
{ "org.freedesktop.systemd1.Manager", "StartupTimestampMonotonic", bus_property_append_uint64, "t", &m->startup_timestamp.monotonic },
{ "org.freedesktop.systemd1.Manager", "FinishTimestamp", bus_property_append_uint64, "t", &m->finish_timestamp.realtime },
{ "org.freedesktop.systemd1.Manager", "FinishTimestampMonotonic", bus_property_append_uint64, "t",&m->finish_timestamp.monotonic },
+ { "org.freedesktop.systemd1.Manager", "WatchdogRebootTimestamp", bus_property_append_uint64, "t", &m->watchdog_timestamp.realtime },
+ { "org.freedesktop.systemd1.Manager", "WatchdogRebootTimestampMonotonic", bus_property_append_uint64, "t",&m->watchdog_timestamp.monotonic },
{ "org.freedesktop.systemd1.Manager", "LogLevel", bus_manager_append_log_level, "s", m, bus_manager_set_log_level },
{ "org.freedesktop.systemd1.Manager", "LogTarget", bus_manager_append_log_target, "s", m, bus_manager_set_log_target },
{ "org.freedesktop.systemd1.Manager", "NNames", bus_manager_append_n_names, "u", m },
diff --git a/src/manager.c b/src/manager.c
index e626347..55290fa 100644
--- a/src/manager.c
+++ b/src/manager.c
@@ -3166,6 +3166,26 @@ bool manager_get_show_status(Manager *m) {
return plymouth_running();
}

+void manager_update_watchdog(Manager *m) {
+ Iterator i;
+ Unit *u;
+ Service *s;
+ const char *k;
+ dual_timestamp timestamp = {0, 0};
+
+ HASHMAP_FOREACH_KEY(u, k, m->units, i) {
+ if (!(s = SERVICE(u)))
+ continue;
+ if (!dual_timestamp_is_set(&s->watchdog_timestamp))
+ continue;
+ if (!dual_timestamp_is_set(&timestamp) || ((s->watchdog_timestamp.monotonic + s->watchdog_reboot_usec) < timestamp.monotonic)) {
+ timestamp.monotonic = s->watchdog_timestamp.monotonic + s->watchdog_reboot_usec;
+ timestamp.realtime = s->watchdog_timestamp.realtime + s->watchdog_reboot_usec;
+ }
+ }
+ m->watchdog_timestamp = timestamp;
+}
+
static const char* const manager_running_as_table[_MANAGER_RUNNING_AS_MAX] = {
[MANAGER_SYSTEM] = "system",
[MANAGER_USER] = "user"
diff --git a/src/manager.h b/src/manager.h
index 5deb569..872d2ea 100644
--- a/src/manager.h
+++ b/src/manager.h
@@ -147,6 +147,7 @@ struct Manager {
dual_timestamp initrd_timestamp;
dual_timestamp startup_timestamp;
dual_timestamp finish_timestamp;
+ dual_timestamp watchdog_timestamp;

char *generator_unit_path;

@@ -293,6 +294,8 @@ void manager_recheck_syslog(Manager *m);
void manager_set_show_status(Manager *m, bool b);
bool manager_get_show_status(Manager *m);

+void manager_update_watchdog(Manager *m);
+
const char *manager_running_as_to_string(ManagerRunningAs i);
ManagerRunningAs manager_running_as_from_string(const char *s);

diff --git a/src/service.c b/src/service.c
index 62bb0ef..8bc5518 100644
--- a/src/service.c
+++ b/src/service.c
@@ -206,6 +206,7 @@ static void service_stop_watchdog(Service *s) {
unit_unwatch_timer(UNIT(s), &s->watchdog_reboot_watch);
s->watchdog_timestamp.realtime = 0;
s->watchdog_timestamp.monotonic = 0;
+ manager_update_watchdog(s->meta.manager);
}

static void service_reset_watchdog(Service *s) {
@@ -221,6 +222,7 @@ static void service_reset_watchdog(Service *s) {
if ((r = unit_watch_timer(UNIT(s), s->watchdog_reboot_usec, &s->watchdog_reboot_watch)) < 0)
log_warning("%s failed to install watchdog reboot timer: %s", s->meta.id, strerror(-r));
}
+ manager_update_watchdog(s->meta.manager);
}

static void service_done(Unit *u) {
--
1.7.5.4
Albert Strasheim
2011-09-28 17:16:40 UTC
Permalink
Hello

On Wed, Sep 28, 2011 at 6:59 PM, Michael Olbrich
Post by Michael Olbrich
systemd already has the concept of a state for each service and a very
simple method (sd_notify) for the service to provide status information to
systemd.
This is implemented in the first patch. A service can send keep-alive
messages with sd_notify, and the timestamp of the latest message is exposed
as a service property.
Very cool. I've been wondering how we could restart services that hang
(e.g., deadlock or go into infinite loops) but don't crash.
Post by Michael Olbrich
The second patch implements service restart / reboot when no keep-alive
message was received for a certain amount of time.
Note: This only triggers if at least one keep-alive was received. I don't
think anything can be done if a service fails to start. This should be
handled outside of systemd.
A question at this point: are ExecStartPosts executed if a service
fails? If they are, and if they can obtain the main exit status (if
that's a well-defined concept), they could take further action.
Post by Michael Olbrich
I think, the watchdog hardware should be handled in a separate service, for
Agreed. We've had good results with an IPMI watchdog and Fedora's
watchdog package. I think it might even include a .service file, or
maybe I wrote a simple one.

Regards

Albert
Michael Olbrich
2011-10-12 13:12:18 UTC
Permalink
Hi,
Post by Michael Olbrich
What do you think?
No comments at all? I realize, that this is not a topic that is relevant
for most people. But maybe I could get some feedback if anything like this
has a chance to be merged at all, or if I should work on a solution outside
of systemd.

Regards,
Michael
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Kay Sievers
2011-10-12 16:48:01 UTC
Permalink
Post by Michael Olbrich
Post by Michael Olbrich
What do you think?
No comments at all? I realize, that this is not a topic that is relevant
for most people. But maybe I could get some feedback if anything like this
has a chance to be merged at all, or if I should work on a solution outside
of systemd.
Sorry, we've been kind of busy.

I like the general idea and Lennart and I talked about it briefly
already. We should look into the details of what systemd can provide
here. We need to be very careful though, to not turn systemd into
anything like a service monitor.

Sorry again for the delay, we'll get back to you with the next round
of review and planning.

Kay
Michael Olbrich
2011-10-24 16:04:01 UTC
Permalink
Hi,

I've updated the patch series. Mostly just rebased to the latest git
and some documentation. I hope this makes my ideas a bit clearer.

Regards,
Michael Olbrich

Michael Olbrich (4):
WIP: service: add watchdog timestamp
WIP: service: add watchdog restart/reboot timeouts
WIP: manager: add a global watchdog reboot timestamp
WIP: add basic watchdog daemon

Makefile.am | 21 ++++-
man/systemd.service.xml | 28 ++++++
src/99-systemd.rules.in | 2 +
src/dbus-manager.c | 4 +
src/dbus-service.c | 8 ++
src/load-fragment-gperf.gperf.m4 | 2 +
src/manager.c | 20 ++++
src/manager.h | 3 +
src/sd-daemon.h | 5 +
src/service.c | 49 ++++++++++
src/service.h | 6 +
src/watchdogd.c | 178 ++++++++++++++++++++++++++++++++++++
units/systemd-watchdogd.service.in | 16 +++
13 files changed, 340 insertions(+), 2 deletions(-)
create mode 100644 src/watchdogd.c
create mode 100644 units/systemd-watchdogd.service.in
--
1.7.7
Michael Olbrich
2011-10-24 16:04:04 UTC
Permalink
This patch adds WatchdogRebootTimestamp[Monotonic] to the systemd
manager API. It contains the earliest point in time when systemd might
reboot the system because the timer for WatchdogRebootUSec for a
service expired.
If we assume the system takes Xus to shut down then
WatchdogRebootTimestamp + Xus should never be in the past. A watchdog
daemon handling the hardware watchdog can use this information to
determine when to let the hardware watchdog restart the system.
This is convenience information for a watchdog daemon. With this the
it can avoid a lot of D-Bus calls that are necessary to calculate the
same value.
---
src/dbus-manager.c | 4 ++++
src/manager.c | 20 ++++++++++++++++++++
src/manager.h | 3 +++
src/service.c | 2 ++
4 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/src/dbus-manager.c b/src/dbus-manager.c
index 7b68156..2254ec8 100644
--- a/src/dbus-manager.c
+++ b/src/dbus-manager.c
@@ -223,6 +223,8 @@
" <property name=\"StartupTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"FinishTimestamp\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"FinishTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRebootTimestamp\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRebootTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"LogLevel\" type=\"s\" access=\"readwrite\"/>\n" \
" <property name=\"LogTarget\" type=\"s\" access=\"readwrite\"/>\n" \
" <property name=\"NNames\" type=\"u\" access=\"read\"/>\n" \
@@ -506,6 +508,8 @@ static DBusHandlerResult bus_manager_message_handler(DBusConnection *connection,
{ "org.freedesktop.systemd1.Manager", "StartupTimestampMonotonic", bus_property_append_uint64, "t", &m->startup_timestamp.monotonic },
{ "org.freedesktop.systemd1.Manager", "FinishTimestamp", bus_property_append_uint64, "t", &m->finish_timestamp.realtime },
{ "org.freedesktop.systemd1.Manager", "FinishTimestampMonotonic", bus_property_append_uint64, "t",&m->finish_timestamp.monotonic },
+ { "org.freedesktop.systemd1.Manager", "WatchdogRebootTimestamp", bus_property_append_uint64, "t", &m->watchdog_timestamp.realtime },
+ { "org.freedesktop.systemd1.Manager", "WatchdogRebootTimestampMonotonic", bus_property_append_uint64, "t",&m->watchdog_timestamp.monotonic },
{ "org.freedesktop.systemd1.Manager", "LogLevel", bus_manager_append_log_level, "s", m, bus_manager_set_log_level },
{ "org.freedesktop.systemd1.Manager", "LogTarget", bus_manager_append_log_target, "s", m, bus_manager_set_log_target },
{ "org.freedesktop.systemd1.Manager", "NNames", bus_manager_append_n_names, "u", m },
diff --git a/src/manager.c b/src/manager.c
index 111167a..b20db14 100644
--- a/src/manager.c
+++ b/src/manager.c
@@ -3174,6 +3174,26 @@ bool manager_get_show_status(Manager *m) {
return plymouth_running();
}

+void manager_update_watchdog(Manager *m) {
+ Iterator i;
+ Unit *u;
+ Service *s;
+ const char *k;
+ dual_timestamp timestamp = {0, 0};
+
+ HASHMAP_FOREACH_KEY(u, k, m->units, i) {
+ if (!(s = SERVICE(u)))
+ continue;
+ if (!dual_timestamp_is_set(&s->watchdog_timestamp))
+ continue;
+ if (!dual_timestamp_is_set(&timestamp) || ((s->watchdog_timestamp.monotonic + s->watchdog_reboot_usec) < timestamp.monotonic)) {
+ timestamp.monotonic = s->watchdog_timestamp.monotonic + s->watchdog_reboot_usec;
+ timestamp.realtime = s->watchdog_timestamp.realtime + s->watchdog_reboot_usec;
+ }
+ }
+ m->watchdog_timestamp = timestamp;
+}
+
static const char* const manager_running_as_table[_MANAGER_RUNNING_AS_MAX] = {
[MANAGER_SYSTEM] = "system",
[MANAGER_USER] = "user"
diff --git a/src/manager.h b/src/manager.h
index 5deb569..872d2ea 100644
--- a/src/manager.h
+++ b/src/manager.h
@@ -147,6 +147,7 @@ struct Manager {
dual_timestamp initrd_timestamp;
dual_timestamp startup_timestamp;
dual_timestamp finish_timestamp;
+ dual_timestamp watchdog_timestamp;

char *generator_unit_path;

@@ -293,6 +294,8 @@ void manager_recheck_syslog(Manager *m);
void manager_set_show_status(Manager *m, bool b);
bool manager_get_show_status(Manager *m);

+void manager_update_watchdog(Manager *m);
+
const char *manager_running_as_to_string(ManagerRunningAs i);
ManagerRunningAs manager_running_as_from_string(const char *s);

diff --git a/src/service.c b/src/service.c
index b0c775c..0e51071 100644
--- a/src/service.c
+++ b/src/service.c
@@ -206,6 +206,7 @@ static void service_stop_watchdog(Service *s) {
unit_unwatch_timer(UNIT(s), &s->watchdog_reboot_watch);
s->watchdog_timestamp.realtime = 0;
s->watchdog_timestamp.monotonic = 0;
+ manager_update_watchdog(s->meta.manager);
}

static void service_reset_watchdog(Service *s) {
@@ -221,6 +222,7 @@ static void service_reset_watchdog(Service *s) {
if ((r = unit_watch_timer(UNIT(s), s->watchdog_reboot_usec, &s->watchdog_reboot_watch)) < 0)
log_warning("%s failed to install watchdog reboot timer: %s", s->meta.id, strerror(-r));
}
+ manager_update_watchdog(s->meta.manager);
}

static void service_done(Unit *u) {
--
1.7.7
Lennart Poettering
2011-11-02 01:45:11 UTC
Permalink
Post by Michael Olbrich
This patch adds WatchdogRebootTimestamp[Monotonic] to the systemd
manager API. It contains the earliest point in time when systemd might
reboot the system because the timer for WatchdogRebootUSec for a
service expired.
If we assume the system takes Xus to shut down then
WatchdogRebootTimestamp + Xus should never be in the past. A watchdog
daemon handling the hardware watchdog can use this information to
determine when to let the hardware watchdog restart the system.
This is convenience information for a watchdog daemon. With this the
it can avoid a lot of D-Bus calls that are necessary to calculate the
same value.
Hmm, wouldn't it be nicer to generate this value on the fly on the
server side when a client asks for it? i.e. instead of having an actual
field watchdog_timestamp just fake it when the client asks for it?

WHat I am not getting here, don't you also want to send out
notifications each time the watchdog timestamps are refreshed?


Lennart
--
Lennart Poettering - Red Hat, Inc.
Michael Olbrich
2011-11-03 10:49:37 UTC
Permalink
Post by Lennart Poettering
Post by Michael Olbrich
This patch adds WatchdogRebootTimestamp[Monotonic] to the systemd
manager API. It contains the earliest point in time when systemd might
reboot the system because the timer for WatchdogRebootUSec for a
service expired.
If we assume the system takes Xus to shut down then
WatchdogRebootTimestamp + Xus should never be in the past. A watchdog
daemon handling the hardware watchdog can use this information to
determine when to let the hardware watchdog restart the system.
This is convenience information for a watchdog daemon. With this the
it can avoid a lot of D-Bus calls that are necessary to calculate the
same value.
Hmm, wouldn't it be nicer to generate this value on the fly on the
server side when a client asks for it? i.e. instead of having an actual
field watchdog_timestamp just fake it when the client asks for it?
Of course, I just need to figure out how to do this :-). Can you point me
to some other property that does something like that?
Post by Lennart Poettering
WHat I am not getting here, don't you also want to send out
notifications each time the watchdog timestamps are refreshed?
Same reason. I don't know how to do that yet. Again an pointer to an
existing property doing that would be great.

Michael
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Michael Olbrich
2011-10-24 16:04:05 UTC
Permalink
This patch introduces a small watchdog daemon that supervises systemd
and handles /dev/watchdog. This is mostly a prove of concept for now.
---
Makefile.am | 21 ++++-
src/99-systemd.rules.in | 2 +
src/watchdogd.c | 178 ++++++++++++++++++++++++++++++++++++
units/systemd-watchdogd.service.in | 16 +++
4 files changed, 215 insertions(+), 2 deletions(-)
create mode 100644 src/watchdogd.c
create mode 100644 units/systemd-watchdogd.service.in

diff --git a/Makefile.am b/Makefile.am
index dabe32a..d336938 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -173,7 +173,8 @@ rootlibexec_PROGRAMS = \
systemd-detect-virt \
systemd-sysctl \
systemd-logind \
- systemd-uaccess
+ systemd-uaccess \
+ systemd-watchdogd

if ENABLE_BINFMT
rootlibexec_PROGRAMS += \
@@ -416,6 +417,7 @@ nodist_systemunit_DATA = \
units/systemd-ask-password-wall.service \
units/systemd-ask-password-console.service \
units/systemd-sysctl.service \
+ units/systemd-watchdogd.service \
units/halt.service \
units/poweroff.service \
units/reboot.service \
@@ -479,6 +481,7 @@ EXTRA_DIST = \
units/systemd-ask-password-wall.service.in \
units/systemd-ask-password-console.service.in \
units/systemd-sysctl.service.in \
+ units/systemd-watchdogd.service.in \
units/halt.service.in \
units/poweroff.service.in \
units/reboot.service.in \
@@ -1309,6 +1312,19 @@ systemd_kmsg_syslogd_LDADD = \
libsystemd-basic.la \
libsystemd-daemon.la

+systemd_watchdogd_SOURCES = \
+ src/watchdogd.c \
+ src/dbus-common.c
+
+systemd_watchdogd_CFLAGS = \
+ $(AM_CFLAGS) \
+ $(DBUS_CFLAGS)
+
+systemd_watchdogd_LDADD = \
+ libsystemd-basic.la \
+ libsystemd-daemon.la \
+ $(DBUS_LIBS)
+
systemctl_SOURCES = \
src/systemctl.c \
src/utmp-wtmp.c \
@@ -1898,7 +1914,8 @@ if ENABLE_LOCALED
endif
( cd $(DESTDIR)$(systemunitdir)/basic.target.wants && \
rm -f systemd-tmpfiles-clean.timer && \
- $(LN_S) ../systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.timer )
+ $(LN_S) ../systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.timer && \
+ $(LN_S) ../systemd-watchdogd.service systemd-watchdogd.service )
( cd $(DESTDIR)$(dbussessionservicedir) && \
rm -f org.freedesktop.systemd1.service && \
$(LN_S) ../system-services/org.freedesktop.systemd1.service org.freedesktop.systemd1.service )
diff --git a/src/99-systemd.rules.in b/src/99-systemd.rules.in
index b2481ae..8e9dfcd 100644
--- a/src/99-systemd.rules.in
+++ b/src/99-systemd.rules.in
@@ -12,6 +12,8 @@ SUBSYSTEM=="tty", KERNEL=="tty[a-zA-Z]*|hvc*|xvc*|hvsi*", TAG+="systemd"

KERNEL=="vport*", TAG+="systemd"

+KERNEL=="watchdog", TAG+="systemd"
+
SUBSYSTEM=="block", KERNEL!="ram*|loop*", TAG+="systemd"
SUBSYSTEM=="block", KERNEL!="ram*|loop*", ENV{DM_UDEV_DISABLE_OTHER_RULES_FLAG}=="1", ENV{SYSTEMD_READY}="0"

diff --git a/src/watchdogd.c b/src/watchdogd.c
new file mode 100644
index 0000000..269f42e
--- /dev/null
+++ b/src/watchdogd.c
@@ -0,0 +1,178 @@
+/*-*- Mode: C; c-basic-offset: 8; indent-tabs-mode: nil -*-*/
+
+/***
+ This file is part of systemd.
+
+ Copyright 2010 Michael Olbrich
+
+ systemd is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ systemd is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with systemd; If not, see <http://www.gnu.org/licenses/>.
+***/
+
+#include <dbus/dbus.h>
+
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <time.h>
+#include <string.h>
+#include <errno.h>
+#include <getopt.h>
+#include <linux/watchdog.h>
+
+#include "log.h"
+#include "util.h"
+#include "dbus-common.h"
+#include "sd-daemon.h"
+
+static int timeout = 10;
+static usec_t reset_delay_usec = 10 * USEC_PER_SEC;
+
+static int help(void) {
+
+ printf("%s [OPTIONS...]\n\n"
+ "Handles /dev/watchdog and monitors systemd to check when a hardware\n"
+ "reset is necessary\n\n"
+ " -h --help Show this help\n"
+ " -T --timeout=TIMEOUT Watchdog timeout in seconds\n"
+ " -R --reset-delay=REBOOT The amount of time systemd get to reboot\n"
+ " the system gracefully in seconds\n",
+ program_invocation_short_name);
+
+ return 0;
+}
+
+static int parse_argv(int argc, char *argv[]) {
+
+ static const struct option options[] = {
+ { "help", no_argument, NULL, 'h' },
+ { "timeout", required_argument, NULL, 'T' },
+ { "reset-delay", required_argument, NULL, 'R' },
+ { NULL, 0, NULL, 0 }
+ };
+
+ int c;
+
+ assert(argc >= 0);
+ assert(argv);
+
+ while ((c = getopt_long(argc, argv, "+hT:R:", options, NULL)) >= 0) {
+
+ switch (c) {
+
+ case 'h':
+ help();
+ return 0;
+
+ case 'T':
+ timeout = atoi(optarg);
+ break;
+
+ case 'R':
+ reset_delay_usec = atoi(optarg) * USEC_PER_SEC;
+ break;
+
+ default:
+ log_error("Unknown option code %c", c);
+ help();
+ return -EINVAL;
+ }
+ }
+
+ return 1;
+}
+
+int main(int argc, char *argv[]) {
+ int fd, ret;
+ struct timespec ts;
+ DBusConnection *bus = NULL;
+ DBusMessage *m, *reply = NULL;
+ DBusError error;
+ DBusMessageIter iter, sub;
+ usec_t watchdog_usec, now_usec;
+ const char *iface = "org.freedesktop.systemd1.Manager";
+ const char *property = "WatchdogRebootTimestampMonotonic";
+
+ log_set_target(LOG_TARGET_AUTO);
+ log_parse_environment();
+ log_open();
+
+ if ((ret = parse_argv(argc, argv)) <= 0)
+ return ret;
+
+ umask(0022);
+
+ dbus_error_init(&error);
+
+ if ((ret = bus_connect(DBUS_BUS_SYSTEM, &bus, NULL, &error)) < 0) {
+ log_error("Failed to get D-Bus connection: %s", bus_error_message(&error));
+ return 1;
+ }
+ if (!(m = dbus_message_new_method_call("org.freedesktop.systemd1", "/org/freedesktop/systemd1", "org.freedesktop.DBus.Properties", "Get"))) {
+ log_error("Could not allocate message.");
+ return 1;
+ }
+ if (!dbus_message_append_args(m,
+ DBUS_TYPE_STRING, &iface,
+ DBUS_TYPE_STRING, &property,
+ DBUS_TYPE_INVALID)) {
+ log_error("Could not attach target and flag information to message.");
+ return 1;
+ }
+
+ if ((fd = open("/dev/watchdog", O_RDWR)) < 0) {
+ log_error("Could not open /dev/watchdog: %s", strerror(errno));
+ return 1;
+ }
+
+ if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) < 0) {
+ log_error("Failed to set watchdog timeout: %s", strerror(errno));
+ return 1;
+ }
+ log_info("Watchdog timeout set to %ds", timeout);
+ sd_notify(false,
+ "READY=1\n"
+ "STATUS=Watchdog started.");
+
+ timeout /= 2;
+ clock_gettime(CLOCK_MONOTONIC, &ts);
+ while (true) {
+ now_usec = timespec_load(&ts);
+ clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, 0);
+ ts.tv_sec += timeout;
+
+ if (reply)
+ dbus_message_unref(reply);
+
+ if (!(reply = dbus_connection_send_with_reply_and_block(bus, m, -1, &error))) {
+ log_error("Error fetching reboot timestamp: %s", bus_error_message(&error));
+ continue;
+ }
+ if (!dbus_message_iter_init(reply, &iter)) {
+ log_error("Failed to parse reply (init).");
+ continue;
+ }
+ dbus_message_iter_recurse(&iter, &sub);
+ if (bus_iter_get_basic_and_next(&sub, DBUS_TYPE_UINT64, &watchdog_usec, false) < 0) {
+ log_error("Failed to parse reply (value).");
+ continue;
+ }
+ if (watchdog_usec && ((watchdog_usec + reset_delay_usec) < now_usec)) {
+ log_error("Reboot failed: now = %llu, watchdog = %llu", now_usec, watchdog_usec);
+ continue;
+ }
+ ioctl(fd, WDIOC_KEEPALIVE, 0);
+ }
+
+
+}
diff --git a/units/systemd-watchdogd.service.in b/units/systemd-watchdogd.service.in
new file mode 100644
index 0000000..643b3da
--- /dev/null
+++ b/units/systemd-watchdogd.service.in
@@ -0,0 +1,16 @@
+# This file is part of systemd.
+#
+# systemd is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+
+[Unit]
+Description=Watchdog Daemon
+DefaultDependencies=no
+BindTo=dev-watchdog.device
+After=dev-watchdog.device
+
+[Service]
+Type=notify
+ExecStart=@rootlibexecdir@/systemd-watchdogd
--
1.7.7
Lennart Poettering
2011-11-02 01:53:45 UTC
Permalink
Post by Michael Olbrich
This patch introduces a small watchdog daemon that supervises systemd
and handles /dev/watchdog. This is mostly a prove of concept for now.
Hmm, do we really want to do this externally? Maybe it would be nicer to
do this right from PID 1? I mean, it's just two ioctl()s right? i.e. a
tight loop that each time we get a TIMEOUT=1 notification message just
calls WDIOC_KEEPALIVE and WDIOC_SETTIMEOUT should be sufficient, right?
This sounds like something we could do in 10 lines in PID 1, which
otherwise would cost 200 lines outside of it. This would also allow us
to get rid of the extra properties on the Manager object, right?

Lennart
--
Lennart Poettering - Red Hat, Inc.
Michael Olbrich
2011-11-03 11:07:26 UTC
Permalink
Post by Lennart Poettering
Post by Michael Olbrich
This patch introduces a small watchdog daemon that supervises systemd
and handles /dev/watchdog. This is mostly a prove of concept for now.
Hmm, do we really want to do this externally? Maybe it would be nicer to
do this right from PID 1? I mean, it's just two ioctl()s right? i.e. a
tight loop that each time we get a TIMEOUT=1 notification message just
calls WDIOC_KEEPALIVE and WDIOC_SETTIMEOUT should be sufficient, right?
This sounds like something we could do in 10 lines in PID 1, which
otherwise would cost 200 lines outside of it. This would also allow us
to get rid of the extra properties on the Manager object, right?
As long as it's just about handling /dev/watchdog with a fixed interval,
then that's fine. However, depending on the use-case there might be other
requirements, e.g. whether it should be necessary or even possible to
disable the watchdog.
Also the watchdog hardware may be an external unit that a daemon
communicates with using a higher level protocol.

So how about a simple implementation inside systemd that is enabled if an
interval is configured (where would I put that?) and /dev/watchdog exists.

As long as we have something like patch 3, an external implementation can
handle any more complex scenarios.

Related to all this is something else I've been wondering about: Is it
possible to block any reboot requests? e.g. during a system update, or
something like that.

Michael
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Michael Olbrich
2011-10-24 16:04:02 UTC
Permalink
This patch adds WatchdogTimestamp[Monotonic] to the systemd service
D-Bus API. The timestamp is updated to the current time when the
service calls 'sd_nofity("WATCHDOG=1\n")'.
Using a timestamp instead of an 'alive' flag has two advantages:
1. No timeout is needed to define when a service is no longer alive.
This simplifies both configuration (no timeout value) and
implementation (no timeout event).
2. It is more robust. A 'dead' service might not be detected should
systemd 'forget' to reset an 'alive' flag. It is much less likely
to get a valid new timestamp if a service died.
---
src/dbus-service.c | 4 ++++
src/sd-daemon.h | 5 +++++
src/service.c | 20 ++++++++++++++++++++
src/service.h | 2 ++
4 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/src/dbus-service.c b/src/dbus-service.c
index 3486623..5a740de 100644
--- a/src/dbus-service.c
+++ b/src/dbus-service.c
@@ -43,6 +43,8 @@
" <property name=\"NotifyAccess\" type=\"s\" access=\"read\"/>\n" \
" <property name=\"RestartUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogTimestamp\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPre") \
BUS_EXEC_COMMAND_INTERFACE("ExecStart") \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPost") \
@@ -104,6 +106,8 @@ DBusHandlerResult bus_service_message_handler(Unit *u, DBusConnection *connectio
{ "org.freedesktop.systemd1.Service", "NotifyAccess", bus_service_append_notify_access, "s", &u->service.notify_access },
{ "org.freedesktop.systemd1.Service", "RestartUSec", bus_property_append_usec, "t", &u->service.restart_usec },
{ "org.freedesktop.systemd1.Service", "TimeoutUSec", bus_property_append_usec, "t", &u->service.timeout_usec },
+ { "org.freedesktop.systemd1.Service", "WatchdogTimestamp", bus_property_append_usec, "t", &u->service.watchdog_timestamp.realtime },
+ { "org.freedesktop.systemd1.Service", "WatchdogTimestampMonotonic",bus_property_append_usec,"t", &u->service.watchdog_timestamp.monotonic },
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START_PRE], "ExecStartPre"),
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START], "ExecStart"),
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START_POST], "ExecStartPost"),
diff --git a/src/sd-daemon.h b/src/sd-daemon.h
index 46dc7fd..17536f7 100644
--- a/src/sd-daemon.h
+++ b/src/sd-daemon.h
@@ -217,6 +217,11 @@ int sd_is_mq(int fd, const char *path);
MAINPID=... The main pid of a daemon, in case systemd did not
fork off the process itself. Example: "MAINPID=4711"

+ WATCHDOG=1 Tells systemd to update the watchdog timestamp.
+ Services using this feature should do this in
+ regular intervals. A watchdog framework can use the
+ timestamps to detect failed services.
+
Daemons can choose to send additional variables. However, it is
recommended to prefix variable names not listed above with X_.

diff --git a/src/service.c b/src/service.c
index 6184390..0d68d8d 100644
--- a/src/service.c
+++ b/src/service.c
@@ -194,6 +194,19 @@ static void service_connection_unref(Service *s) {
s->accept_socket = NULL;
}

+static void service_stop_watchdog(Service *s) {
+ assert(s);
+
+ s->watchdog_timestamp.realtime = 0;
+ s->watchdog_timestamp.monotonic = 0;
+}
+
+static void service_reset_watchdog(Service *s) {
+ assert(s);
+
+ dual_timestamp_get(&s->watchdog_timestamp);
+}
+
static void service_done(Unit *u) {
Service *s = SERVICE(u);

@@ -1522,6 +1535,9 @@ static void service_set_state(Service *s, ServiceState state) {
service_connection_unref(s);
}

+ if (state == SERVICE_STOP)
+ service_stop_watchdog(s);
+
/* For the inactive states unit_notify() will trim the cgroup,
* but for exit we have to do that ourselves... */
if (state == SERVICE_EXITED && s->meta.manager->n_reloading <= 0)
@@ -2993,6 +3009,10 @@ static void service_notify_message(Unit *u, pid_t pid, char **tags) {
}

}
+ if (strv_find(tags, "WATCHDOG=1")) {
+ log_debug("%s: got WATCHDOG=1", u->meta.id);
+ service_reset_watchdog(s);
+ }

/* Notify clients about changed status or main pid */
unit_add_to_dbus_queue(u);
diff --git a/src/service.h b/src/service.h
index e28f74b..3801e84 100644
--- a/src/service.h
+++ b/src/service.h
@@ -98,6 +98,8 @@ struct Service {
usec_t restart_usec;
usec_t timeout_usec;

+ dual_timestamp watchdog_timestamp;
+
ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX];
ExecContext exec_context;
--
1.7.7
Lennart Poettering
2011-11-02 01:34:04 UTC
Permalink
On Mon, 24.10.11 18:04, Michael Olbrich (***@pengutronix.de) wrote:

Heya,

Sorry for the delay in reviewing. We've been busy with getting F16 out
of the door and didn't want to add major new features into systemd at
that time. But now git is open for new features again.

When Kay and I first saw your patch we initially were inclined to say
"no" to this, since we didn't want to have monitoring capabilities in
systemd. It is our intention to provide the right hooks to do
monitoring, but not do the monitoring itself. i.e. provide the data that
monitoring tools need, but insist that they need to run outside of
system. Or with other words: we never ever want to see code in systemd
that is an HTTP client to check whether apache is still responding.

However, your patch is actually quite minimal and very generic, and
reuses the existing notification channel for the watchdog messages which
are just trivial extensions of it. That makes it a lot more interesting,
in particular because this means systemd never has to ask the clients,
but they have to query systemd. So, yupp, we are convinced, and would
like to merge this.
Post by Michael Olbrich
--- a/src/sd-daemon.h
+++ b/src/sd-daemon.h
@@ -217,6 +217,11 @@ int sd_is_mq(int fd, const char *path);
MAINPID=... The main pid of a daemon, in case systemd did not
fork off the process itself. Example: "MAINPID=4711"
+ WATCHDOG=1 Tells systemd to update the watchdog timestamp.
+ Services using this feature should do this in
+ regular intervals. A watchdog framework can use the
+ timestamps to detect failed services.
+
Daemons can choose to send additional variables. However, it is
recommended to prefix variable names not listed above with X_.
Please add the same to the man page of sd_notify().

Otherwise looks flawless!

Lennart
--
Lennart Poettering - Red Hat, Inc.
Michael Olbrich
2011-11-03 10:31:19 UTC
Permalink
Hi,
Post by Lennart Poettering
Sorry for the delay in reviewing. We've been busy with getting F16 out
of the door and didn't want to add major new features into systemd at
that time. But now git is open for new features again.
When Kay and I first saw your patch we initially were inclined to say
"no" to this, since we didn't want to have monitoring capabilities in
systemd. It is our intention to provide the right hooks to do
monitoring, but not do the monitoring itself. i.e. provide the data that
monitoring tools need, but insist that they need to run outside of
system. Or with other words: we never ever want to see code in systemd
that is an HTTP client to check whether apache is still responding.
I've been thinking on how to actually do some kind of monitoring for
services that can't (for whatever reason) do the sd_notify() stuff by
themselves. One of the use-cases is actually to monitor a web-server or an
attached fastCGI service. What I came up with is this:

- start the monitor process instead of the actual service.
- fork
- exec the actual service in the parent process
- is this enough to now break any main-pid detection?
- in the child process:
while (true) {
if (service_is_ok())
sd_notify(...)
sleep()
}

It's not all that elegant, but it works well enough. It would of course be
nicer to have a ExecWatchdog= or something like that. Would you accept a
patch for that?
Post by Lennart Poettering
However, your patch is actually quite minimal and very generic, and
reuses the existing notification channel for the watchdog messages which
are just trivial extensions of it. That makes it a lot more interesting,
in particular because this means systemd never has to ask the clients,
but they have to query systemd. So, yupp, we are convinced, and would
like to merge this.
Great.
Post by Lennart Poettering
Post by Michael Olbrich
--- a/src/sd-daemon.h
+++ b/src/sd-daemon.h
@@ -217,6 +217,11 @@ int sd_is_mq(int fd, const char *path);
MAINPID=... The main pid of a daemon, in case systemd did not
fork off the process itself. Example: "MAINPID=4711"
+ WATCHDOG=1 Tells systemd to update the watchdog timestamp.
+ Services using this feature should do this in
+ regular intervals. A watchdog framework can use the
+ timestamps to detect failed services.
+
Daemons can choose to send additional variables. However, it is
recommended to prefix variable names not listed above with X_.
Please add the same to the man page of sd_notify().
ok.
Post by Lennart Poettering
Otherwise looks flawless!
Great. I'll try to send an updated version soon.

Regards,
Michael
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Michael Olbrich
2011-10-24 16:04:03 UTC
Permalink
This patch adds the WatchdogRestartUSec and WatchdogRebootUSec
properties to services. Systemd will restart the service / reboot the
system if the watchdog timeout has not been updated for the configured
amount of time.
This functionality is only enabled if the watchdog timeout is set at
least once.
---
man/systemd.service.xml | 28 ++++++++++++++++++++++++++++
src/dbus-service.c | 4 ++++
src/load-fragment-gperf.gperf.m4 | 2 ++
src/service.c | 27 +++++++++++++++++++++++++++
src/service.h | 4 ++++
5 files changed, 65 insertions(+), 0 deletions(-)

diff --git a/man/systemd.service.xml b/man/systemd.service.xml
index 7b6f12d..90989cf 100644
--- a/man/systemd.service.xml
+++ b/man/systemd.service.xml
@@ -460,6 +460,34 @@
</varlistentry>

<varlistentry>
+ <term><varname>WatchdogRestartUSec=</varname></term>
+ <listitem><para>Configures the time to
+ wait before restarting a service. This
+ is activated with the first
+ <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>
+ call with "WATCHDOG=1". If the time
+ between two such calls is larger than
+ the configured time then the service
+ is restarted.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>WatchdogRebootUSec=</varname></term>
+ <listitem><para>Configures the time to
+ wait before rebooting the system. This
+ is basically the same as
+ <varname>WatchdogRestartUSec=</varname>
+ but the whole system is rebooted
+ instead of just restarting the
+ service. The typical use-case is to
+ set this to
+ <varname>WatchdogRestartUSec</varname>
+ + <varname>TimeoutSec</varname> to
+ reboot in case the service restart
+ fails.</para></listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><varname>Restart=</varname></term>
<listitem><para>Configures whether the
main service process shall be
diff --git a/src/dbus-service.c b/src/dbus-service.c
index 5a740de..34eabdf 100644
--- a/src/dbus-service.c
+++ b/src/dbus-service.c
@@ -43,6 +43,8 @@
" <property name=\"NotifyAccess\" type=\"s\" access=\"read\"/>\n" \
" <property name=\"RestartUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"TimeoutUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRestartUSec\" type=\"t\" access=\"read\"/>\n" \
+ " <property name=\"WatchdogRebootUSec\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestamp\" type=\"t\" access=\"read\"/>\n" \
" <property name=\"WatchdogTimestampMonotonic\" type=\"t\" access=\"read\"/>\n" \
BUS_EXEC_COMMAND_INTERFACE("ExecStartPre") \
@@ -106,6 +108,8 @@ DBusHandlerResult bus_service_message_handler(Unit *u, DBusConnection *connectio
{ "org.freedesktop.systemd1.Service", "NotifyAccess", bus_service_append_notify_access, "s", &u->service.notify_access },
{ "org.freedesktop.systemd1.Service", "RestartUSec", bus_property_append_usec, "t", &u->service.restart_usec },
{ "org.freedesktop.systemd1.Service", "TimeoutUSec", bus_property_append_usec, "t", &u->service.timeout_usec },
+ { "org.freedesktop.systemd1.Service", "WatchdogRestartUSec", bus_property_append_usec, "t", &u->service.watchdog_restart_usec },
+ { "org.freedesktop.systemd1.Service", "WatchdogRebootUSec", bus_property_append_usec, "t", &u->service.watchdog_reboot_usec },
{ "org.freedesktop.systemd1.Service", "WatchdogTimestamp", bus_property_append_usec, "t", &u->service.watchdog_timestamp.realtime },
{ "org.freedesktop.systemd1.Service", "WatchdogTimestampMonotonic",bus_property_append_usec,"t", &u->service.watchdog_timestamp.monotonic },
BUS_EXEC_COMMAND_PROPERTY("org.freedesktop.systemd1.Service", u->service.exec_command[SERVICE_EXEC_START_PRE], "ExecStartPre"),
diff --git a/src/load-fragment-gperf.gperf.m4 b/src/load-fragment-gperf.gperf.m4
index 41797d2..cde21d2 100644
--- a/src/load-fragment-gperf.gperf.m4
+++ b/src/load-fragment-gperf.gperf.m4
@@ -131,6 +131,8 @@ Service.ExecStop, config_parse_exec, SERVICE_EXE
Service.ExecStopPost, config_parse_exec, SERVICE_EXEC_STOP_POST, offsetof(Service, exec_command)
Service.RestartSec, config_parse_usec, 0, offsetof(Service, restart_usec)
Service.TimeoutSec, config_parse_usec, 0, offsetof(Service, timeout_usec)
+Service.WatchdogRestartSec, config_parse_usec, 0, offsetof(Service, watchdog_restart_usec)
+Service.WatchdogRebootSec, config_parse_usec, 0, offsetof(Service, watchdog_reboot_usec)
Service.Type, config_parse_service_type, 0, offsetof(Service, type)
Service.Restart, config_parse_service_restart, 0, offsetof(Service, restart)
Service.PermissionsStartOnly, config_parse_bool, 0, offsetof(Service, permissions_start_only)
diff --git a/src/service.c b/src/service.c
index 0d68d8d..b0c775c 100644
--- a/src/service.c
+++ b/src/service.c
@@ -112,6 +112,11 @@ static void service_init(Unit *u) {

s->timeout_usec = DEFAULT_TIMEOUT_USEC;
s->restart_usec = DEFAULT_RESTART_USEC;
+
+ s->watchdog_restart_usec = DEFAULT_TIMEOUT_USEC;
+ s->watchdog_restart_watch.type = WATCH_INVALID;
+ s->watchdog_reboot_watch.type = WATCH_INVALID;
+
s->timer_watch.type = WATCH_INVALID;
#ifdef HAVE_SYSV_COMPAT
s->sysv_start_priority = -1;
@@ -197,14 +202,25 @@ static void service_connection_unref(Service *s) {
static void service_stop_watchdog(Service *s) {
assert(s);

+ unit_unwatch_timer(UNIT(s), &s->watchdog_restart_watch);
+ unit_unwatch_timer(UNIT(s), &s->watchdog_reboot_watch);
s->watchdog_timestamp.realtime = 0;
s->watchdog_timestamp.monotonic = 0;
}

static void service_reset_watchdog(Service *s) {
+ int r;
assert(s);

dual_timestamp_get(&s->watchdog_timestamp);
+ if (s->watchdog_restart_usec) {
+ if ((r = unit_watch_timer(UNIT(s), s->watchdog_restart_usec, &s->watchdog_restart_watch)) < 0)
+ log_warning("%s failed to install watchdog restart timer: %s", s->meta.id, strerror(-r));
+ }
+ if (s->watchdog_restart_usec) {
+ if ((r = unit_watch_timer(UNIT(s), s->watchdog_reboot_usec, &s->watchdog_reboot_watch)) < 0)
+ log_warning("%s failed to install watchdog reboot timer: %s", s->meta.id, strerror(-r));
+ }
}

static void service_done(Unit *u) {
@@ -2829,6 +2845,17 @@ static void service_timer_event(Unit *u, uint64_t elapsed, Watch* w) {
assert(s);
assert(elapsed == 1);

+ if (w == &s->watchdog_restart_watch) {
+ log_error("%s watchdog timeout: restarting service...", u->meta.id);
+ manager_add_job(u->meta.manager, JOB_RESTART, u, JOB_FAIL, true, 0, 0);
+ return;
+ }
+ if (w == &s->watchdog_reboot_watch) {
+ log_error("%s watchdog timeout: rebooting...", u->meta.id);
+ manager_add_job_by_name(u->meta.manager, JOB_START, "reboot.target", JOB_REPLACE, true, 0, 0);
+ return;
+ }
+
assert(w == &s->timer_watch);

switch (s->state) {
diff --git a/src/service.h b/src/service.h
index 3801e84..cab18c2 100644
--- a/src/service.h
+++ b/src/service.h
@@ -99,6 +99,10 @@ struct Service {
usec_t timeout_usec;

dual_timestamp watchdog_timestamp;
+ usec_t watchdog_restart_usec;
+ usec_t watchdog_reboot_usec;
+ Watch watchdog_restart_watch;
+ Watch watchdog_reboot_watch;

ExecCommand* exec_command[_SERVICE_EXEC_COMMAND_MAX];
ExecContext exec_context;
--
1.7.7
Lennart Poettering
2011-11-02 01:42:12 UTC
Permalink
Post by Michael Olbrich
<varlistentry>
+
<term><varname>WatchdogRestartUSec=</varname></term>
Hmm, so there's a bit of an asymmetry in systemd here. While the bus
properties usually use "USec" as unit, since they are just 64bit
integers, the configuration settings in the files are actually all
suffixed "Sec", since they all parse unit-less values as seconds. It's a
bit weird, but should clarify that WatchdogRestartSec=5 means 5s, while
still accepting WatchdogRestartSec=50ms -- which will be 50ms, even if
the switch is a bit surprising there.

Anyway, this might be a bit surprising in general, but it does make some
sense. And hence I'd like to ask you to use Sec=, not USec= as suffix
for your settings, especially since you use config_parse_usec() like the
other timeouts.
Post by Michael Olbrich
+ <listitem><para>Configures the time to
+ wait before restarting a service. This
+ is activated with the first
+ <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>
+ call with "WATCHDOG=1". If the time
+ between two such calls is larger than
+ the configured time then the service
+ is restarted.</para></listitem>
+ </varlistentry>
Please mention that this defaults to 0, which means no watchdog enabled.
Post by Michael Olbrich
dual_timestamp_get(&s->watchdog_timestamp);
+ if (s->watchdog_restart_usec) {
+ if ((r = unit_watch_timer(UNIT(s), s->watchdog_restart_usec, &s->watchdog_restart_watch)) < 0)
+ log_warning("%s failed to install watchdog restart timer: %s", s->meta.id, strerror(-r));
+ }
+ if (s->watchdog_restart_usec) {
+ if ((r = unit_watch_timer(UNIT(s), s->watchdog_reboot_usec, &s->watchdog_reboot_watch)) < 0)
+ log_warning("%s failed to install watchdog reboot timer: %s", s->meta.id, strerror(-r));
+ }
}
Hmm, please change this syntax:

if ((r = foo()) < 0) { ...

to this:

r = foo();
if (r < 0) { ...

I started systemd using the first syntax, but in order to come closer to
the kernel coding style all new code should use the second syntax.

Otherwise looks good.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Michael Olbrich
2011-11-03 10:36:58 UTC
Permalink
Post by Lennart Poettering
Post by Michael Olbrich
<varlistentry>
+
<term><varname>WatchdogRestartUSec=</varname></term>
Hmm, so there's a bit of an asymmetry in systemd here. While the bus
properties usually use "USec" as unit, since they are just 64bit
integers, the configuration settings in the files are actually all
suffixed "Sec", since they all parse unit-less values as seconds. It's a
bit weird, but should clarify that WatchdogRestartSec=5 means 5s, while
still accepting WatchdogRestartSec=50ms -- which will be 50ms, even if
the switch is a bit surprising there.
Anyway, this might be a bit surprising in general, but it does make some
sense. And hence I'd like to ask you to use Sec=, not USec= as suffix
for your settings, especially since you use config_parse_usec() like the
other timeouts.
I just messed up the documentation, right? It's sec in the service file and
usec in the D-Bus API. I copied most of it from Timeout[U]Sec.
Post by Lennart Poettering
Post by Michael Olbrich
+ <listitem><para>Configures the time to
+ wait before restarting a service. This
+ is activated with the first
+ <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>
+ call with "WATCHDOG=1". If the time
+ between two such calls is larger than
+ the configured time then the service
+ is restarted.</para></listitem>
+ </varlistentry>
Please mention that this defaults to 0, which means no watchdog enabled.
ok.
Post by Lennart Poettering
Post by Michael Olbrich
dual_timestamp_get(&s->watchdog_timestamp);
+ if (s->watchdog_restart_usec) {
+ if ((r = unit_watch_timer(UNIT(s), s->watchdog_restart_usec, &s->watchdog_restart_watch)) < 0)
+ log_warning("%s failed to install watchdog restart timer: %s", s->meta.id, strerror(-r));
+ }
+ if (s->watchdog_restart_usec) {
+ if ((r = unit_watch_timer(UNIT(s), s->watchdog_reboot_usec, &s->watchdog_reboot_watch)) < 0)
+ log_warning("%s failed to install watchdog reboot timer: %s", s->meta.id, strerror(-r));
+ }
}
if ((r = foo()) < 0) { ...
r = foo();
if (r < 0) { ...
I started systemd using the first syntax, but in order to come closer to
the kernel coding style all new code should use the second syntax.
ok.
Post by Lennart Poettering
Otherwise looks good.
Michael
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Continue reading on narkive:
Loading...