Discussion:
[systemd-devel] ExecRestart
Brandon Black
2012-11-29 04:41:09 UTC
Hi all,

I'm trying to write a systemd.service unit file for an existing
well-behaved daemon that's used to managing itself. The daemon binary
doubles as its own controller for sysvinit-like commands. For example, "foo
start" launches a new daemon. "foo stop" stops an existing instance of the
daemon. Similarly for restart, condrestart, status, etc. This makes
things very simple in the world of sysv-like init systems. The
"initscript" just execs the daemon binary and passes on the user action
argument, and all of the tricky bits are well-managed within the daemon's
own code (pidfiles, sockets, logging, strange corner cases, timing issues,
etc).

I can't simply convert the daemon to expect all of systemd's nice features
and gut all of its self-management code, as it must still be portable to
non-systemd platforms where these features are handy. For the most part
I've been able to successfully work around things, but lack of an
ExecRestart is one of my remaining hangups.

I certainly can publish a unit file without this, and restart would be
performed by ExecStop -> ExecStart. However, the daemon has a bunch of
nice code to do a better restart than that, and I'd need an ExecRestart to
allow users to continue to take advantage of that.

The daemon's "fast restart" code does all of the expensive startup
operations in the new daemon first (e.g. parsing large data input), then
signals the existing daemon to shut itself down, waits for it to release
its critical resources (e.g. sockets, pidfile), and finally takes over
those resources and finishes starting itself. Basically it's using the
overlap to avoid long service downtimes during that initial parsing phase
(and if that parsing fails, it leaves the old daemon running to boot).
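
The ordering described above can be sketched roughly as follows. This is only an illustration, not the daemon's actual code: the four callables (load_config, signal_old, wait_released, take_over) are hypothetical stand-ins for the daemon's own routines.

```python
def fast_restart(load_config, signal_old, wait_released, take_over):
    """Restart with overlap; all four callables are hypothetical hooks."""
    try:
        # Expensive work happens first, while the old daemon is still serving.
        config = load_config()
    except Exception as exc:
        # Parsing failed: the old instance is never signalled and keeps running.
        return "old daemon left running: %s" % exc
    signal_old()        # ask the old instance to shut itself down
    wait_released()     # block until it has released sockets and pidfile
    take_over(config)   # claim those resources and finish starting
    return "new daemon running"
```

The point of the ordering is that the only window without a running daemon is the hand-over itself, not the parsing phase.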

Is there some design reason that we can't have an ExecRestart command?
Successful exit of that command would mean the old instance was killed
(which systemd could confirm), and that the restart command has launched a
new instance (which systemd can also figure out via PIDFile or
guessing/cgroup). Failure exit of that command would mean the existing
daemon instance was left alone (and again, systemd could confirm that
state).
Alexander E. Patrakov
2012-11-29 17:36:32 UTC
Post by Brandon Black
The daemon's "fast restart" code does all of the expensive startup
operations in the new daemon first (e.g. parsing large data input), then
signals the existing daemon to shut itself down, waits for it to release
its critical resources (e.g. sockets, pidfile), and finally takes over
those resources and finishes starting itself.  Basically it's using the
overlap to avoid long service downtimes during that initial parsing phase
(and if that parsing fails, it leaves the old daemon running to boot).
This is not what a restart means in the systemd world. What you described is just a nice way to do a reload. However, as the main PID changes during this reload, please be careful - several years ago that would hit an assertion in systemd, and due to my laziness I have not verified the fix.
--
Alexander E. Patrakov
Sent from Nokia N900
Brandon Black
2012-11-29 23:15:59 UTC
On Thu, Nov 29, 2012 at 11:36 AM, Alexander E. Patrakov
Post by Alexander E. Patrakov
Post by Brandon Black
The daemon's "fast restart" code does all of the expensive startup
operations in the new daemon first (e.g. parsing large data input), then
This is not what a restart means in systemd world. What you described is
just a nice way to do a reload. However, as the main pid changes during
this reload, please be careful - several years ago that would hit an
assertion in systemd, and due to my laziness I have not verified the fix.
The current ExecReload code apparently intentionally doesn't allow for a
restart-like operation, though. If ExecReload replaces the process with a
fresh PID, that PID ends up getting SIGKILL'd by systemd. cf. this
post/thread:
http://lists.freedesktop.org/archives/systemd-devel/2012-June/005402.html .
I've confirmed similar behavior recently with my daemon and F18's copy of
systemd.
Jóhann B. Guðmundsson
2012-11-29 19:12:36 UTC
Post by Brandon Black
Hi all,
I'm trying to write a systemd.service unit file for an existing
well-behaved daemon that's used to managing itself. The daemon binary
doubles as its own controller for sysvinit-like commands. For example,
"foo start" launches a new daemon. "foo stop" stops an existing
instance of the daemon. Similarly for restart, condrestart, status,
etc. This makes things very simple in the world of sysv-like init
systems. The "initscript" just execs the daemon binary and passes on
the user action argument, and all of the tricky bits are well-managed
within the daemon's own code (pidfiles, sockets, logging, strange
corner cases, timing issues, etc).
I can't simply convert the daemon to expect all of systemd's nice
features and gut all of its self-management code, as it must still be
portable to non-systemd platforms where these features are handy. For
the most part I've been able to successfully work around things, but
lack of an ExecRestart is one of my remaining hangups.
I certainly can publish a unit file without this, and restart would be
performed by ExecStop -> ExecStart. However, the daemon has a bunch
of nice code to do a better restart than that, and I'd need an
ExecRestart to allow users to continue to take advantage of that.
The daemon's "fast restart" code does all of the expensive startup
operations in the new daemon first (e.g. parsing large data input),
then signals the existing daemon to shut itself down, waits for it to
release its critical resources (e.g. sockets, pidfile), and finally
takes over those resources and finishes starting itself. Basically
it's using the overlap to avoid long service downtimes during that
initial parsing phase (and if that parsing fails, it leaves the old
daemon running to boot).
Is there some design reason that we can't have an ExecRestart command?
Successful exit of that command would mean the old instance was
killed (which systemd could confirm), and that the restart command has
launched a new instance (which systemd can also figure out via PIDFile
or guessing/cgroup). Failure exit of that command would mean the
existing daemon instance was left alone (and again, systemd could
confirm that state).
Which daemon is this ?

JBG
Colin Guthrie
2012-11-30 11:59:09 UTC
Post by Brandon Black
The daemon's "fast restart" code does all of the expensive startup
operations in the new daemon first (e.g. parsing large data input), then
signals the existing daemon to shut itself down, waits for it to release
its critical resources (e.g. sockets, pidfile), and finally takes over
those resources and finishes starting itself. Basically it's using the
overlap to avoid long service downtimes during that initial parsing
phase (and if that parsing fails, it leaves the old daemon running to boot).
You should likely look into socket activation. It won't give you all the
features above, but it should mean that no connections are lost in that window.

What I would probably recommend here is that you write a basic wrapper
daemon (which itself can be socket activated, but that's beside the
point really), that can accept a signal to trigger the reload. You could
then signal this wrapper daemon in ExecReload (something like:
"ExecReload=/usr/bin/kill -s SIGUSR1 $MAINPID").

It can then pass around the socket as needed to your real daemon.

I think it's a basic design decision that ExecReload will not allow for
an effective restart here. Reload != Restart after all.

Col
--
Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
Tribalogic Limited http://www.tribalogic.net/
Open Source:
Mageia Contributor http://www.mageia.org/
PulseAudio Hacker http://www.pulseaudio.org/
Trac Hacker http://trac.edgewall.org/
David Strauss
2012-12-01 01:03:14 UTC
Post by Colin Guthrie
It won't give you all the features above, but it should mean that no connections are lost in that window.
I don't think socket activation will do that. If systemd has spawned a
process for each connection, those will get shut down. If the service
handles its own accept() for connections, systemd won't know about
those connection file descriptors or handle their persistence across a
daemon restart.
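
For reference, a minimal sketch of how a service discovers the sockets systemd hands it, following the sd_listen_fds(3) convention (LISTEN_PID and LISTEN_FDS in the environment, descriptors starting at 3). The function name and its parameters are made up for illustration. Only the listening sockets configured in the .socket unit arrive this way; connections the daemon has already accept()ed itself are not part of this set.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the sd_listen_fds convention

def inherited_listen_fds(environ=None, pid=None):
    """Return the descriptor numbers passed in by the service manager, if any."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables only apply to the process they were set for.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        # Missing or malformed variables: not socket activated.
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))
```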

--
David Strauss
| ***@davidstrauss.net
| +1 512 577 5827 [mobile]
Lennart Poettering
2012-12-19 22:46:13 UTC
Post by Brandon Black
The daemon's "fast restart" code does all of the expensive startup
operations in the new daemon first (e.g. parsing large data input), then
signals the existing daemon to shut itself down, waits for it to release
its critical resources (e.g. sockets, pidfile), and finally takes over
those resources and finishes starting itself. Basically it's using the
overlap to avoid long service downtimes during that initial parsing phase
(and if that parsing fails, it leaves the old daemon running to boot).
We specifically don't allow ExecRestart= in order to guarantee that all
restarts are comprehensive and really do what is necessary to be
done. For example, we consider it a good thing to kill all processes
forked off a daemon when we restart the daemon. We can do this easily
with the information from the cgroup, and we generally believe that such
a killing spree is best done from outside the daemon in question rather
than from the suicidal daemon itself.

That said I do acknowledge that there is a bit of value of supporting
daemons which can do reexec on their own, where "reexec" is something
between the superficial "reload" and the hardcore "restart". In fact I
am kinda interested to implement "reexec" in some of systemd's own
services (such as journald), so that open sockets are kept around.

Now, the reason why we have no support for a nice "reexec" verb yet is
simply because I am a bit afraid of adding something that might turn out
not to be necessary, and that might just be a special case of "reload"
after all. I mean, the difference between restart and reload is kinda
complex already, and adding a third verb (plus all the various
transitive products of this such as "try-restart", "reload-or-restart"
for reexec) makes me feel a bit uncomfortable.

Or in other words:

I am pretty sure that we should not alter the current restart logic, and
should not introduce ExecRestart=. However, we really should think about
either introducing ExecReexec= or somehow making ExecReload= useful for
reexec-style reloading, too. But I haven't made my mind up on this, how
this could look like.

Michal, Zbigniew, Kay, do you have ideas about this?

Lennart
--
Lennart Poettering - Red Hat, Inc.
Daniel P. Berrange
2012-12-20 10:44:34 UTC
Post by Lennart Poettering
Post by Brandon Black
The daemon's "fast restart" code does all of the expensive startup
operations in the new daemon first (e.g. parsing large data input), then
signals the existing daemon to shut itself down, waits for it to release
its critical resources (e.g. sockets, pidfile), and finally takes over
those resources and finishes starting itself. Basically it's using the
overlap to avoid long service downtimes during that initial parsing phase
(and if that parsing fails, it leaves the old daemon running to boot).
[snip]
Post by Lennart Poettering
I am pretty sure that we should not alter the current restart logic, and
should not introduce ExecRestart=. However, we really should think about
either introducing ExecReexec= or somehow making ExecReload= useful for
reexec-style reloading, too. But I haven't made my mind up on this, how
this could look like.
FWIW, as previously mentioned, I'd love to see an explicitly supported
way to trigger a re-exec of a daemon. Currently I'm just relying on the
ability to send a custom signal to libvirt's virtlockd daemon. The problem
is that sysadmins would need to learn a different signal number for each
project's daemon. So I think there's value to admins in having a standard
way to trigger this via sysadmin. Personally I think this should also be
separate from ExecReload which is merely used to refresh configuration
files.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Colin Guthrie
2012-12-20 11:37:53 UTC
Post by Daniel P. Berrange
Post by Lennart Poettering
Post by Brandon Black
The daemon's "fast restart" code does all of the expensive startup
operations in the new daemon first (e.g. parsing large data input), then
signals the existing daemon to shut itself down, waits for it to release
its critical resources (e.g. sockets, pidfile), and finally takes over
those resources and finishes starting itself. Basically it's using the
overlap to avoid long service downtimes during that initial parsing phase
(and if that parsing fails, it leaves the old daemon running to boot).
[snip]
Post by Lennart Poettering
I am pretty sure that we should not alter the current restart logic, and
should not introduce ExecRestart=. However, we really should think about
either introducing ExecReexec= or somehow making ExecReload= useful for
reexec-style reloading, too. But I haven't made my mind up on this, how
this could look like.
FWIW, as previously mentioned, I'd love to see an explicitly supported
way to trigger a re-exec of a daemon. Currently I'm just relying on the
ability to send a custom signal to libvirt's virtlockd daemon. The problem
is that sysadmins would need to learn a different signal number for each
project's daemon. So I think there's value to admins in having a standard
way to trigger this via sysadmin. Personally I think this should also be
separate from ExecReload which is merely used to refresh configuration
files.
Hmmm, this gives me a small idea.

I've had a few users report similar things on different projects, e.g.
one user complained that they used to use:

service httpd closelogs

but with systemd that no longer worked.


Perhaps rather than trying to define a whole new language here we could
instead define some generic way to send signals to a "unit" (obviously
just to the main pid really) and some kind of nice way to cosmetically
rename signals.

e.g.:

SignalMap=closelogs=HUP reexec=USR1

Then you could simply run:
systemctl signal closelogs httpd.service

Or something similar.

Keeps things fairly defined still, but also gives a reasonable amount of
flexibility.

Of course running "systemctl signal HUP httpd.service" should also be
supported and the SignalMap is really just a lightweight wrapper for
cosmetic purposes.


(this doesn't solve all the problems I've had reported of "random" verbs
being used in initscripts. e.g. IIRC the postgres initscript had an
"initdb" verb... this won't be solvable via this mechanism, but I don't
think there is really a desire to allow ad-hoc verb registration in
units with custom "ExecVerb[initdb]=/usr/bin/..." style syntax - I could
be wrong tho'. FWIW Specifically in the postgres case I think it can be
solved differently anyway with an ExecStartPre that checks to see if the
db is init'ed and if not runs an init routine - that's how our mariadb
stuff works these days I think and it seems robust enough)
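
A rough sketch of the kind of check such an ExecStartPre helper could perform. The helper name, the choice of PG_VERSION as the marker file, and the fake init routine are assumptions for illustration, not what any distribution actually ships.

```python
import os
import tempfile

def ensure_initialized(data_dir, marker, run_init):
    """Run run_init(data_dir) only if the marker file is missing.

    Returns True if initialization was performed, False if it was skipped.
    """
    if os.path.exists(os.path.join(data_dir, marker)):
        return False
    run_init(data_dir)
    return True

def demo():
    def fake_init(path):
        # Stand-in for the real init routine: just drop the marker file.
        with open(os.path.join(path, "PG_VERSION"), "w") as handle:
            handle.write("placeholder\n")

    with tempfile.TemporaryDirectory() as data_dir:
        first = ensure_initialized(data_dir, "PG_VERSION", fake_init)
        second = ensure_initialized(data_dir, "PG_VERSION", fake_init)
    return first, second
```

Because the check is idempotent, it is safe to run on every start rather than only on the first one.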

Col
--
Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
Tribalogic Limited http://www.tribalogic.net/
Open Source:
Mageia Contributor http://www.mageia.org/
PulseAudio Hacker http://www.pulseaudio.org/
Trac Hacker http://trac.edgewall.org/
Lennart Poettering
2012-12-20 11:46:14 UTC
Post by Colin Guthrie
Post by Daniel P. Berrange
FWIW, as previously mentioned, I'd love to see an explicitly supported
way to trigger a re-exec of a daemon. Currently I'm just relying on the
ability to send a custom signal to libvirt's virtlockd daemon. The problem
is that sysadmins would need to learn a different signal number for each
project's daemon. So I think there's value to admins in having a standard
way to trigger this via sysadmin. Personally I think this should also be
separate from ExecReload which is merely used to refresh configuration
files.
Hmmm, this gives me a small idea.
I've had a few users report similar things on different projects, e.g.
service httpd closelogs
but with systemd that no longer worked.
This is actually documented explicitly, that we don't support this:

http://www.freedesktop.org/wiki/Software/systemd/Incompatibilities

And I am pretty strongly of the opinion that service-specific verbs
should not be handled in systemd, since there is no need for it and for
many of the verbs you really don't want systemd in the game. For example
apache init scripts frequently have a "configtest" or "info" verb, which
does some apache specific stuff, but where you really don't want all the
magic of systemd with detached ttys and things, because you actually
want the output on the local tty.

If we add more service verbs to systemd, then we should only do that for
verbs that many will actually implement, i.e. which are abstract enough
for people to use more widely.
Post by Colin Guthrie
Perhaps rather than trying to define a whole new language here we could
instead define some generic way to send signals to a "unit" (obviously
just to the main pid really) and some kind of nice way to cosmetically
rename signals.
Signals are really not useful for this as they are asynchronous. I am
pretty sure that we should push people towards implementation of these
verbs in a way that they can rely that the operation finished after
systemctl returned. By adding special support for signals for these
things we'd push people to make these things racy, but we really should
try to push people to make them synchronous and hence non-racy by default.
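
To illustrate the distinction, here is a small sketch of a synchronous request: the client sends a command over a socket and blocks until the daemon acknowledges that the work is finished. A signal, by contrast, returns to the sender as soon as it is queued. The command name, the reply strings, and the function names are invented for the example.

```python
import socket
import threading

def serve_one(conn, do_reload):
    """Daemon side: read one command, perform it, then acknowledge."""
    command = conn.recv(64).decode().strip()
    if command == "reload":
        do_reload()                  # the work completes before the reply is sent
        conn.sendall(b"done\n")
    else:
        conn.sendall(b"unknown\n")
    conn.close()

def request(conn, command):
    """Client side: send a command and block until the daemon replies."""
    conn.sendall(command.encode() + b"\n")
    reply = conn.recv(64).decode().strip()
    conn.close()
    return reply

def demo_synchronous_reload():
    state = {"generation": 0}

    def do_reload():
        state["generation"] += 1

    client, daemon = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(daemon, do_reload))
    worker.start()
    reply = request(client, "reload")
    # By the time the reply arrives, the reload has already taken effect.
    generation_seen = state["generation"]
    worker.join()
    return reply, generation_seen
```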
Lennart
--
Lennart Poettering - Red Hat, Inc.
Timothy Pepper
2012-12-20 22:51:41 UTC
Post by Lennart Poettering
http://www.freedesktop.org/wiki/Software/systemd/Incompatibilities
And I am pretty strongly of the opinion that service-specific verbs
should not be handled in systemd, since there is no need for it and for
many of the verbs you really don't want systemd in the game.
Not intending to argue the need for arbitrary verbs (don't think that
route is desirable/necessary), but I've been kicking the following use
case around my head for a while:

I'd like to have systemd automatically restart my daemon on failure
(ie: Restart=on-failure) in a controlled way (ie: StartLimitInterval and
StartLimitBurst) and have the daemon aware of how it is doing relative
to the start limit controls.

I might be satisfied with something like a
StartLimitAction=Exec=/path/to/daemon/reseter action. And I could
undoubtedly also cobble something equivalent together with additional
unit files and scripts that do management actions, OnFailure=, track
restart counts and do 'systemctl reset-failed' calls. Or an OnFailure
variant which is only triggered at the StartLimit being hit, but then
that overloads the idea of failure relative to Restart=on-failure.
All of these feel kludgy to me as systemd is already capable of pretty
sophisticated management of my daemon and I just want some insight into
what's happening one level up.
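
As a simplified model of what an interval/burst limit does (this is not systemd's implementation, and the class and method names are hypothetical), the bookkeeping looks roughly like this. The remaining() value is the sort of insight the daemon currently has no way to obtain.

```python
class StartLimit:
    """Simplified interval/burst start limit (illustrative model only)."""

    def __init__(self, interval, burst):
        self.interval = interval    # window length in seconds
        self.burst = burst          # starts allowed per window
        self.window_start = None
        self.count = 0

    def allow(self, now):
        """Record a start attempt at time `now`; return False once the limit is hit."""
        if self.window_start is None or now - self.window_start > self.interval:
            # The previous window has expired: begin a new one.
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        return False

    def remaining(self):
        """Starts still available in the current window."""
        return self.burst - self.count
```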
Post by Lennart Poettering
Signals are really not useful for this as they are asynchronous. I am
pretty sure that we should push people towards implementation of these
verbs in a way that they can rely that the operation finished after
systemctl returned. By adding special support for signals for these
things we'd push people to make these things racy, but we really should
try to push people to make them synchronous and hence non-racy by default.
I'm inclined to think what I'd really like are some environment variables
passed to me along the lines of how WATCHDOG_USEC informs an executed
service process for WatchdogSec=. Such variables would allow my code
and systemd to have some shared configuration context just as there is
shared context with respect to the watchdog.
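
For comparison, this is roughly how a service can consume the existing watchdog variable. The function name is made up; pinging at half the configured timeout follows the recommendation in sd_watchdog_enabled(3).

```python
import os

def watchdog_ping_interval(environ=None):
    """Return the suggested keep-alive interval in seconds, or None if unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get("WATCHDOG_USEC")
    if raw is None:
        return None                 # watchdog not enabled for this service
    try:
        timeout_usec = int(raw)
    except ValueError:
        return None                 # malformed value: treat as disabled
    if timeout_usec <= 0:
        return None
    # Ping at half the timeout so one delayed ping does not trip the watchdog.
    return timeout_usec / 2 / 1_000_000
```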

Along those lines, could env variables similarly be used to give
a daemon some start context (eg: cold start, full restart, quick
restart), allowing it some latitude to do custom things if it wanted?

But then unless sufficiently generic this also just starts becoming a
generic verb support mechanism...
--
Tim Pepper <***@linux.intel.com>
Intel Open Source Technology Centre