Discussion:
Revisiting the "ExecRestart" issue
Brandon Black
2014-03-28 17:12:27 UTC
Hi all,
I've brought this up before, but I became busy/discouraged and dropped
the ball. As systemd becomes increasingly widely deployed, I can no longer
afford to do so, so I'd like to explore this area a bit further on the list
again and see if we can't come up with a workable solution, or if perhaps
I've missed some systemd/cgroups change in the past year or so that already
allows a workaround.

To recap the previous discussion, see the threads at these links (same
thread, two different months in the thread-list):
http://lists.freedesktop.org/archives/systemd-devel/2012-November/007595.html
http://lists.freedesktop.org/archives/systemd-devel/2012-December/007804.html
As well as this referenced/related thread from even earlier (different
author, but I suspect his issues are similar at the core of things):
http://lists.freedesktop.org/archives/systemd-devel/2012-June/005400.html

The daemon I'm working on is the DNS server gdnsd (
https://github.com/~blblack/gdnsd ). While trying to keep this short (fat
chance!), these are the core unique things that matter about it from a
systemd perspective, and how they seem to paint me into a corner:

0) It's meant to be somewhat portable outside of systemd and Linux, at
least to the *BSDs. While I'm completely open to doing some small
(runtime-|autoconf-)conditional blocks of systemd-specific code in place of
traditional daemon code where it makes sense, I can't go and rewrite
everything in a new structure that only makes sense under systemd.

1) The daemon is designed to work as its own initscript. Not unique, but
certainly less-common. It ships a daemon binary which accepts
initscript-actions on the commandline. So, "/usr/sbin/gdnsd start" forks
off a daemon, "/usr/sbin/gdnsd stop" kills the existing daemon, ditto for
"/usr/sbin/gdnsd status", and all the other common initscript verbs. The
internal code is already handling race-free stops and starts, pidfile locking,
reliable "status", proper daemonization, privilege drop, etc., through all of
this. Most traditional sysvinit-like systems will of course use a real
shell initscript at runtime, and the real initscript can just invoke these
verbs, perhaps redirecting their verbose output to /dev/null, knowing that
pidfiles and processes and whatnot are already well-managed, without needing
to write clunky/racy shell code to try to solve those problems.
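Point 1 leans on pidfile locking to make start/stop/status reliable. As an illustration only (this is not gdnsd's code, and the helper names are hypothetical), a minimal C sketch of fcntl()-based pidfile locking on a POSIX system might look like this:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not gdnsd's real API): take an exclusive fcntl()
 * lock on the whole pidfile and write our PID into it. Returns the open fd,
 * which must stay open for the daemon's lifetime (closing it drops the
 * lock), or -1 if another process holds the lock or an error occurred. */
int pidfile_lock_and_write(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 = lock through end of file */

    if (fcntl(fd, F_SETLK, &fl) < 0) {
        /* EACCES or EAGAIN: another process (e.g. the old daemon) holds it */
        close(fd);
        return -1;
    }

    char buf[32];
    int len = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    if (ftruncate(fd, 0) < 0 || pwrite(fd, buf, (size_t)len, 0) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical "status" helper: returns the PID of the process holding the
 * lock, 0 if nobody holds it, or -1 on error. Call it from a separate
 * process (such as a "status" verb): fcntl() locks owned by the caller never
 * conflict with its own query, and closing any fd on the file drops all of
 * the caller's own locks on it. */
long pidfile_lock_holder(const char *path)
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return errno == ENOENT ? 0 : -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;    /* ask: would a write lock conflict? */
    fl.l_whence = SEEK_SET;

    long rv = -1;
    if (fcntl(fd, F_GETLK, &fl) == 0)
        rv = (fl.l_type == F_UNLCK) ? 0 : (long)fl.l_pid;
    close(fd);
    return rv;
}
```

The lock, not the file's mere existence, is what marks the daemon as running: the kernel releases fcntl() locks when a process exits, so a stale pidfile left behind by a crash does not block a fresh start.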

2) During startup of a fresh daemon, a number of operations have to happen
in a serial fashion due to hard dependency constraints, and for some users
these startup operations can take significant wallclock time relative to
desired service availability. These include things like loading
zonefiles (which can be expensive for large files or large counts of files,
which is a real world use-case today) and doing initial network-monitoring
polls of remote resources to set their initial state (which involve
timeouts for network responses - these are done in parallel to the degree
possible, but this can still add several seconds for reasonable
monitor-counts with reasonable timeouts). All of these things must
complete before the new daemon can begin answering requests legitimately on
its listening sockets.

3) As you can imagine, this creates a problem for the traditional "restart"
verb: If one stops and then starts, there can be a long gap of service
unavailability. To remedy this, I moved in the direction of having the
internal "restart" verb work in an overlapped fashion. The way "restart"
is implemented basically follows this logic:
a) restart is just a special case of "start"
b) it parses configuration and does all the potentially-long operations
of a normal start first
c) if anything fails (due to a new configuration error, etc), it dies
and leaves the old daemon instance alone.
d) when it successfully reaches the point where it and the existing
daemon can no longer co-exist (because it needs to steal the bound
sockets), it *then* kills the old daemon using the "stop" logic, locks the
pidfile for itself, binds the sockets, and continues on as the new daemon.
e) (and actually, in the upcoming next branch, SO_REUSEPORT will be used
to overlap the sockets as well, allowing for truly zero-packets-lost during
these restart operations).
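Step (e) relies on SO_REUSEPORT, which on Linux (3.9 and later) lets a second socket bind the same address and port as long as every socket sets the option before bind() and they share the same effective user. A hedged C sketch of how the overlap could look, with hypothetical helper names and 127.0.0.1 standing in for the real listen address:

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket on 127.0.0.1:port with SO_REUSEPORT set before bind().
 * port == 0 lets the kernel pick an ephemeral port. Returns fd or -1. */
int bind_udp_reuseport(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the local port a socket is bound to (host byte order), or 0. */
unsigned short local_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;
    if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0)
        return 0;
    return ntohs(sin.sin_port);
}
```

In the overlapped restart, the new instance would bind the configured port this way while the old daemon is still serving, finish its slow startup, and only then stop the old instance. While both sockets are open, the kernel spreads incoming datagrams between them.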

4) Socket Activation! I know this is what some will scream when they skim
the above, but it's not a realistic solution in this case for a few reasons:
a) The startup delay, in some cases, can be many whole wallclock
seconds. This is necessary and acceptable in the general sense (this is
a network service that people use with large server-side installations, not a
desktop thing).
b) The primary socket traffic we care about is UDP, and further we
*really* care about request->response latency for this traffic. Even if
you could set a large enough receive buffer to handle several seconds of
heavy UDP requests (and you can't, for at least some installations), the
multi-second-delay in the responses isn't reasonable.
c) Another side-point that might be better addressed in another thread:
even if both of the above weren't true, this daemon uses several sockets
for multiple "roles" internally, some of which share all low-level details
(e.g. two distinct use-cases for multiple TCP sockets that serve different
high-level protocols, where the user might choose arbitrary ports for
both). I'm not seeing any trivial way to distinguish these via socket
activation - perhaps some kind of socket "label" that could be accessed by
the daemon via sd_* APIs to distinguish would be useful here?

5) ExecReexec - this was one of Lennart's musings in the previous thread in
December 2012. However, this doesn't map well to gdnsd's model if implemented
in the "obvious" manner of having ExecReexec send a signal to the running
daemon to re-exec itself. It would map well if gdnsd could respond to
SIGFOO via fork()->execve() on itself with the "restart" verb and let the
new instance replace the old one when it's ready. The problem is that the
new restarting copy needs elevated privileges to bind its sockets, which the
running daemon has already given up permanently (and thus can't provide to
the newly execve'd copy). In some cases we could pass the sockets on by
clearing FD_CLOEXEC, but there's no guarantee as to what socket bindings
the new daemon will have: typically the same as before, but perhaps the
address or port number has changed in the config file for one of five
different sockets.
To try to infer and diff the config/states of the old and new daemon
would be a complex mess. What "gdnsd restart" wants to do is not a
"reload" or some halfway point between reload and restart, it's a full,
complete restart that re-evaluates everything freshly. It just wants to
use overlapping in the time dimension to reduce the downtime of that event.
(We do have a separate reload event for when just zonefiles have changed
but the rest of the configuration has not, and even support for monitoring
those at runtime without needing an event, but that's neither here nor
there and doesn't remove the need for an overlapped restart operation on
real config changes).
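The FD_CLOEXEC hand-off mentioned in point 5 amounts to clearing one descriptor flag before execve(). A minimal C sketch with hypothetical helper names; how the new image learns which descriptor numbers it inherited (argv, an environment variable, etc.) is left out and would be the daemon's own convention:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Clear FD_CLOEXEC so fd survives execve() into the new instance.
 * Returns 0 on success, -1 on error. */
int keep_fd_across_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) < 0 ? -1 : 0;
}
```

As the text notes, this only helps when the socket set is unchanged: a socket whose address or port changed in the config still has to be bound fresh, which is exactly where the dropped privileges get in the way.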

6) The TL;DR finale:

What I'm really looking for here is a mechanism by which we can overlap
two daemon instances temporarily for a single service, with the new one
eventually replacing the old one. The ideal would be that ExecRestart (or
whatever verb it ends up being) allows the restart command to fork a new
daemon that becomes the main PID for the service after killing off the
existing one and taking over the pidfile.

I've superficially looked around, and it's possible that I can do this
already (using ExecReload for the moment...) by essentially having the new
daemon read the cgroups of the old daemon and set them on itself manually
while it's still root, although I'm not sure what exactly would happen when
the primary PID changes out from under systemd (via the pidfile being
updated at "runtime" from systemd's perspective) and the old process dies.
I have a bad feeling this would still lead to a SIGKILL of the new process
unless there were another mechanism to notify systemd of the changed PID,
but I haven't tested yet. Even if such a hack works, I fear the basic
manual-cgroup-copying operation would be considered an unsupported
mechanism/interface and break in a future version.

Given where things are at today, as best I can tell my best bet is to go
down that sort of road, though, and try to clone over the cgroups
memberships manually somehow during an ExecReload= command for this restart
(even though it really is a restart), leaving true reloads (SIGHUP to a
running daemon) to be done from outside systemd. And if that doesn't work,
well, I don't know what to do at this stage. I understand the reluctance
to add these sorts of mechanisms in the general case because they're ripe
for mis-use by those porting hacky sysvinit scripts and whatnot. Perhaps
rather than a new unit-file verb, a better way to allow this is through
re-purposing ExecReload for daemons like this, and having API calls (over
dbus? or a shlib call, either way) that the new daemon instance can invoke
that do the cgroup-copying and main-pid-switching? I'd be happy to hack on
patches for some kind of solution myself, but I don't want to go off
hacking in a direction that will never get merged.

Another option that crosses my mind is that perhaps there are existing
mechanisms (requiring some compile-time support in the code of gdnsd) for
it to become a manager of its own sub-scope of some kind where it's free to
handle these cases in the way that it wants to. I really don't understand
how that works yet, but if there are reasonable paths forward in that
direction, I'd be willing to give that a shot as well. I'm in the process
of updating/refactoring/improving the daemonization and restart code in
general for a new major release, so this is an ideal time to try to fix
systemd compatibility issues while I'm in there.

Thanks,
-- Brandon
Michael Scherer
2014-03-30 02:09:32 UTC
Post by Brandon Black
Hi all,
I've brought this up before, but I became busy/discouraged and
dropped the ball. As systemd becomes increasingly widely deployed, I
can no longer afford to do so, so I'd like to explore this area a bit
further on the list again and see if we can't come up with a workable
solution, or if perhaps I've missed some systemd/cgroups change in the
past year or so that already allows a workaround.
[.. snip .. ]
4) Socket Activation! I know this is what some will scream when they
skim the above, but it's not a realistic solution in this case for a
few reasons:
a) The startup delay, in some cases, can be many whole wallclock
seconds. This is necessary and acceptable in the general sense (this
is a network service that people use with large server-side
installations, not a desktop thing).
It only occurs on the first start, no?
Post by Brandon Black
b) The primary socket traffic we care about is UDP, and further we
*really* care about request->response latency for this traffic. Even
if you could set a large enough receive buffer to handle several
seconds of heavy UDP requests (and you can't, for at least some
installations), the multi-second-delay in the responses isn't
reasonable.
Again, that's a multi-second delay only on the first start; after that,
the daemon uses the socket directly, as it normally would.
Post by Brandon Black
c) Another side-point that might be better addressed in another
thread: even if both of the above weren't true, this daemon uses
several sockets for multiple "roles" internally, some of which share
all low-level details (e.g. two distinct use-cases for multiple TCP
sockets that serve different high-level protocols, where the user
might choose arbitrary ports for both). I'm not seeing any trivial
way to distinguish these via socket activation - perhaps some kind of
socket "label" that could be accessed by the daemon via sd_* APIs to
distinguish would be useful here?
You can use getsockopt to get some information, and match the port/type
to the appropriate structure.
See https://trac.torproject.org/projects/tor/ticket/8908 for a patch
doing that kind of thing for tor.
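Michael's suggestion can be sketched in C without linking libsystemd. The environment handling below is a simplified re-implementation of the documented socket-activation convention that sd_listen_fds() follows (LISTEN_PID must match our PID, LISTEN_FDS gives the count, descriptors start at 3); the helper names are hypothetical, and only AF_INET/AF_INET6 sockets are classified:

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define LISTEN_FDS_START 3  /* same value as SD_LISTEN_FDS_START */

/* Number of sockets passed to us, or 0 if none were meant for this PID. */
int listen_fds_count(void)
{
    const char *pid = getenv("LISTEN_PID");
    const char *n = getenv("LISTEN_FDS");
    if (!pid || !n)
        return 0;
    if (strtol(pid, NULL, 10) != (long)getpid())
        return 0;
    long count = strtol(n, NULL, 10);
    return count > 0 ? (int)count : 0;
}

/* Report socket type (SOCK_STREAM, SOCK_DGRAM, ...) and local port.
 * Returns 0 on success, -1 on error or for non-inet sockets. */
int describe_socket(int fd, int *type, unsigned *port)
{
    socklen_t len = sizeof *type;
    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, type, &len) < 0)
        return -1;

    struct sockaddr_storage ss;
    socklen_t sslen = sizeof ss;
    if (getsockname(fd, (struct sockaddr *)&ss, &sslen) < 0)
        return -1;
    if (ss.ss_family == AF_INET)
        *port = ntohs(((struct sockaddr_in *)&ss)->sin_port);
    else if (ss.ss_family == AF_INET6)
        *port = ntohs(((struct sockaddr_in6 *)&ss)->sin6_port);
    else
        return -1;
    return 0;
}

/* Print what each inherited socket looks like, e.g. at daemon startup. */
void dump_inherited_sockets(void)
{
    int n = listen_fds_count();
    for (int fd = LISTEN_FDS_START; fd < LISTEN_FDS_START + n; fd++) {
        int type;
        unsigned port;
        if (describe_socket(fd, &type, &port) == 0)
            printf("fd %d: %s port %u\n", fd,
                   type == SOCK_DGRAM ? "udp" :
                   type == SOCK_STREAM ? "tcp" : "other", port);
    }
}
```

This recovers each socket's type and local port, which is enough when ports map one-to-one onto roles. As Brandon points out in his reply, it cannot tell two TCP roles apart when the user picks arbitrary ports for both.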
--
Michael Scherer
Brandon Black
2014-03-31 16:03:24 UTC
Post by Michael Scherer
Post by Brandon Black
4) Socket Activation! I know this is what some will scream when they
skim the above, but it's not a realistic solution in this case for a
few reasons:
a) The startup delay, in some cases, can be many whole wallclock
seconds. This is necessary and acceptable in the general sense (this
is a network service that people use with large server-side
installations, not a desktop thing).
It only occurs on the first start, no?
No, these delays (well, for configurations large enough to involve
substantial delays) will happen on every fresh start, including "restart"
starts. This means the sequential stop->start that systemd wants to do is
always going to leave an availability gap where no daemon is processing
requests for a while. Socket activation would keep the sockets open during
that window, but the buffers would just overflow anyway and/or the
eventual responses would be far too late to matter. The command I want to
execute for ExecRestart doesn't have this issue because it knows how to
coordinate with itself for overlapping, so that the expensive "start"
operations happen before "stop".
Post by Michael Scherer
Post by Brandon Black
socket "label" that could be accessed by the daemon via sd_* APIs to
distinguish would be useful here?
You can use getsockopt to get some information, and match the port/type
to the appropriate structure.
See https://trac.torproject.org/projects/tor/ticket/8908 for a patch
doing that kind of thing for tor.
What I was trying to say (perhaps very unclearly): there might be
distinctions between the many sockets which getsockname() does not capture.
For a generic example: the daemon may allow the user to configure 0->N TCP
sockets for HTTP and 0->M other TCP sockets for HTTPS. The user gets to
choose arbitrary port numbers for them all. getsockname() is going to show
me M+N TCP sockets on arbitrary ports, but how does the information about
which was meant for which role get from user -> service unit -> actual
daemon code?
Brandon Black
2014-04-01 06:55:37 UTC
Post by Brandon Black
Given where things are at today, as best I can tell my best bet is to go
down that sort of road, though, and try to clone over the cgroups
memberships manually somehow during an ExecReload= command for this restart
(even though it really is a restart), and leaving true reloads (SIGHUP to a
running daemon) to be done from outside systemd.
I've done some experimenting this evening (on a Fedora 20 system with
systemd-208), playing with methods of MAINPID notification and how to coerce
ExecReload into letting me do an overlapped restart. The result is that I
can make it work, but it's hacky. The main thing that bothers me about it is
that the mechanisms probably aren't officially supported interfaces, and my
methods will randomly fail on a future version of systemd (or on a
differently-configured distro).

To recap my results: there were primarily two things in the way of naively
using ExecReload to trigger gdnsd's overlapped restart:

1) gdnsd wants to use sd_notifyf() to indicate the MAINPID switch in the
new daemon, which is a descendant of the ExecReload process. The
ExecReload process doesn't get a copy of $NOTIFY_SOCKET even with
NotifyAccess=all. So I hacked around that by having the daemon set
$NOTIFY_SOCKET for itself, to the value "@/org/freedesktop/systemd1/notify",
which seems semi-standard for the moment.

2) ExecReload control processes can't become the MAINPID even after
notification because they're not in the correct cgroup (or subgroup, or
whatever it is that's special about most control procs), unlike
ExecStart's control process, which is in the right cgroup for its
descendants to become MAINPID successfully. This was hacked around by
grabbing the basic unit name from sd_pid_get_unit() (let's call the
result "$U") and then writing our pid to
"/sys/fs/cgroup/systemd/system.slice/$U/cgroup.procs" from the new daemon
before it drops root privs and later notifies about the MAINPID switch.
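Workaround 1 boils down to sending a datagram like MAINPID=<pid> to the notification socket. For illustration, here is a hedged C sketch of that send without libsystemd (sd_notifyf() does this for you). The helper names are hypothetical, and the hard-coded fallback quoted above is the undocumented value from this experiment, not a supported interface:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Send one datagram such as "MAINPID=1234" to $NOTIFY_SOCKET, or to
 * `fallback` when the variable is unset. A leading '@' selects the Linux
 * abstract namespace. Returns 1 if sent, 0 if no socket was configured,
 * -1 on error (loosely mirroring sd_notify()'s >0 / 0 / <0 convention). */
int notify_send(const char *state, const char *fallback)
{
    const char *path = getenv("NOTIFY_SOCKET");
    if (!path || !*path)
        path = fallback;
    if (!path || !*path)
        return 0;

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    size_t plen = strlen(path);
    if (plen >= sizeof addr.sun_path)
        return -1;
    memcpy(addr.sun_path, path, plen);

    socklen_t alen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + plen);
    if (path[0] == '@')
        addr.sun_path[0] = '\0';  /* abstract socket: no trailing NUL counted */
    else
        alen += 1;                /* filesystem path: include the NUL */

    int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return -1;
    size_t slen = strlen(state);
    ssize_t n = sendto(fd, state, slen, MSG_NOSIGNAL,
                       (struct sockaddr *)&addr, alen);
    close(fd);
    return (n >= 0 && (size_t)n == slen) ? 1 : -1;
}

/* Tell the service manager which PID is now the service's main process. */
int notify_mainpid(pid_t pid, const char *fallback)
{
    char buf[32];
    snprintf(buf, sizeof buf, "MAINPID=%ld", (long)pid);
    return notify_send(buf, fallback);
}
```

A new instance would call something like notify_mainpid(getpid(), "@/org/freedesktop/systemd1/notify") once it has taken over. Even with the message delivered, systemd-208 only accepted the new PID after the cgroup move described in point 2.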

(And of course, re-purposing ExecReload isn't ideal in the first place.
It's semantically wrong and it wastes the reload verb, forcing actual
reload actions to happen outside of systemctl.)
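Workaround 2 is just a write of the PID into the unit's cgroup.procs file. The C sketch below shows only that mechanical step, with hypothetical helper names and the cgroup-v1 layout quoted above (/sys/fs/cgroup/systemd/system.slice/$U/). In real use $U would come from sd_pid_get_unit(), and Lennart warns later in the thread that cgroupfs write access may be taken away from applications, so treat this as illustrative only:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "<root>/system.slice/<unit>/cgroup.procs". The root is a parameter
 * so the sketch can be exercised against a scratch file; on the system
 * described above it would be "/sys/fs/cgroup/systemd". Returns 0 on
 * success, -1 if the buffer is too small. */
int cgroup_procs_path(char *buf, size_t len, const char *root, const char *unit)
{
    int n = snprintf(buf, len, "%s/system.slice/%s/cgroup.procs", root, unit);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Move pid into the cgroup by writing its decimal value to cgroup.procs.
 * On a real cgroupfs this needs root, so it must happen before the new
 * daemon drops privileges. Returns 0 on success, -1 on error. */
int cgroup_attach_pid(const char *procs_path, pid_t pid)
{
    FILE *f = fopen(procs_path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%ld\n", (long)pid) > 0;
    if (fclose(f) != 0)   /* the kernel may only report errors on flush */
        ok = 0;
    return ok ? 0 : -1;
}
```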

The resulting commit (which is off in a testing branch of a development
branch for now, there's plenty of time to work out alternate solutions) is
here:

https://github.com/blblack/gdnsd/commit/17a40b0483da7d072912169e832df31d69349440
Lennart Poettering
2014-04-23 20:06:30 UTC
Post by Brandon Black
Post by Brandon Black
Given where things are at today, as best I can tell my best bet is to go
down that sort of road, though, and try to clone over the cgroups
memberships manually somehow during an ExecReload= command for this restart
(even though it really is a restart), and leaving true reloads (SIGHUP to a
running daemon) to be done from outside systemd.
I've done some experimenting this evening (on a Fedora 20 system with
systemd-208), playing with methods of MAINPID notification and how to coerce
ExecReload into letting me do an overlapped restart. The result is that I
can make it work, but it's hacky. The main thing that bothers me about it is
that the mechanisms probably aren't officially supported interfaces, and my
methods will randomly fail on a future version of systemd (or on a
differently-configured distro).
You should be able to either inform systemd of the new PID by sending
MAINPID=, or simply write out a new PID file; systemd should read it
again if you configure it with PIDFile=.
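If the PIDFile= route is used, the new instance would ideally replace the pidfile so that a reader never sees a partly written PID. A common way is to write a temporary file and rename() it into place; a minimal C sketch with a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Replace the pidfile atomically: write "<path>.tmp", then rename() it over
 * the old file, so a reader sees either the old PID or the new one.
 * Returns 0 on success, -1 on error. */
int pidfile_replace(const char *path, pid_t pid)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
    if (fd < 0)
        return -1;

    char buf[32];
    int len = snprintf(buf, sizeof buf, "%ld\n", (long)pid);
    if (write(fd, buf, (size_t)len) != (ssize_t)len) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) < 0 || rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

One design consequence: rename() installs a new inode, so an fcntl() lock the old daemon held on the previous pidfile does not carry over. A lock-based scheme would have to take its lock on the new file (or keep a separate lock file).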
Post by Brandon Black
To recap my results: there were primarily two things in the way of naively
using ExecReload to trigger gdnsd's overlapped restart:
1) gdnsd wants to use sd_notifyf() to indicate the MAINPID switch in the
new daemon, which is a descendant of the ExecReload process. The
ExecReload process doesn't get a copy of $NOTIFY_SOCKET even with
NotifyAccess=all. So I hacked around that by having the daemon set
$NOTIFY_SOCKET for itself, to the value "@/org/freedesktop/systemd1/notify",
which seems semi-standard for the moment.
2) ExecReload control processes can't become the MAINPID even after
notification because they're not in the correct cgroup (or subgroup, or
whatever it is that's special about most control procs), unlike
ExecStart's control process, which is in the right cgroup for its
descendants to become MAINPID successfully. This was hacked around by
grabbing the basic unit name from sd_pid_get_unit() (let's call the
result "$U") and then writing our pid to
"/sys/fs/cgroup/systemd/system.slice/$U/cgroup.procs" from the new daemon
before it drops root privs and later notifies about the MAINPID switch.
Hmm, yeah, the new process really needs to be forked off the original
main process. Control processes really can't become the main process I
fear...


Lennart
--
Lennart Poettering, Red Hat
Brandon Black
2014-04-24 02:20:10 UTC
On Wed, Apr 23, 2014 at 3:06 PM, Lennart Poettering
Post by Brandon Black
Post by Brandon Black
To recap my results: there were primarily two things in the way of naively
using ExecReload to trigger gdnsd's overlapped restart:
1) gdnsd wants to use sd_notifyf() to indicate the MAINPID switch in the
new daemon, which is a descendant of the ExecReload process. The
ExecReload process doesn't get a copy of $NOTIFY_SOCKET even with
NotifyAccess=all. So I hacked around that by having the daemon set
$NOTIFY_SOCKET for itself, to the value "@/org/freedesktop/systemd1/notify",
which seems semi-standard for the moment.
2) ExecReload control processes can't become the MAINPID even after
notification because they're not in the correct cgroup (or subgroup, or
whatever it is that's special about most control procs), unlike
ExecStart's control process, which is in the right cgroup for its
descendants to become MAINPID successfully. This was hacked around by
grabbing the basic unit name from sd_pid_get_unit() (let's call the
result "$U") and then writing our pid to
"/sys/fs/cgroup/systemd/system.slice/$U/cgroup.procs" from the new daemon
before it drops root privs and later notifies about the MAINPID switch.
Hmm, yeah, the new process really needs to be forked off the original
main process.
The problem here is that the daemon performs operations that require root
privilege on startup, and then dumps its privileges for runtime, and thus
its execve'd child won't have the root privs it would need to start
everything over again. In theory some of these privileged things, like
listening sockets, could be handed to the exec child, but that assumes the
configured set of listening sockets hasn't changed (which might be the
reason for the restart). Other things like mlockall() can't be handed off
over fork/execve once privileges are gone.
Post by Lennart Poettering
Control processes really can't become the main process I
fear...
They can; I've already done it by writing to /sys as documented above, but
that doesn't seem like a reliable API for doing so going forward on all
platforms and in all situations. What's the fundamental problem with
officially letting a control process from ExecReload= become the main
process via some reasonably-standard mechanism? That's already what
happens to the "control process" for ExecStart=.

I'd propose two changes (and work on the patches myself) that would make
this case work for me reliably, if they're acceptable:

1) Can we get $NOTIFY_SOCKET set for control procs like ExecReload
when NotifyAccess=all? That's what I initially thought that setting would
do, but apparently it doesn't. Or any other standard mechanism I could
rely on so that I'm not hardcoding a fallback socket path.

2) Given the above, would it be reasonable that if a control process from a
verb like ExecReload sent a MAINPID= message over the control socket,
systemd would accept this as the new main pid *and* internally take care of
promoting the specified PID to the proper cgroup?

-- Brandon
Lennart Poettering
2014-04-24 06:34:54 UTC
Post by Brandon Black
The problem here is that the daemon performs operations that require root
privilege on startup, and then dumps its privileges for runtime, and thus
its execve'd child won't have the root privs it would need to start
everything over again. In theory some of these privileged things, like
listening sockets, could be handed to the exec child, but that assumes the
configured set of listening sockets hasn't changed (which might be the
reason for the restart).
There's always the option to raise the privs again via some setuid
helper...
Post by Brandon Black
Other things like mlockall() can't be handed off
over fork/execve once privileges are gone.
mlockall()? what's that supposed to do here? this is usually snakeoil...
Post by Brandon Black
Post by Lennart Poettering
Control processes really can't become the main process I
fear...
They can; I've already done it by writing to /sys as documented above, but
that doesn't seem like a reliable API for doing so going forward on all
platforms and in all situations. What's the fundamental problem with
Also note that sooner or later cgroupfs write access will be removed
from userspace applications...
Post by Brandon Black
officially letting a control process from ExecReload= become the main
process via some reasonably-standard mechanism? That's already what
happens to the "control process" for ExecStart=.
Well, ExecStart= is very special; it's not really a control process.
Post by Brandon Black
I'd propose two changes (and work on the patches myself) that would make
this case work for me reliably, if they're acceptable:
1) Can we get $NOTIFY_SOCKET set for control procs like ExecReload
when NotifyAccess=all? That's what I initially thought that setting would
do, but apparently it doesn't. Or any other standard mechanism I could
rely on so that I'm not hardcoding a fallback socket path.
Hmm, we don't do this yet? This sounds like a useful thing to do. Added
to the TODO list for now...
Post by Brandon Black
2) Given the above, would it be reasonable that if a control process from a
verb like ExecReload sent a MAINPID= message over the control socket,
systemd would accept this as the new main pid *and* internally take care of
promoting the specified PID to the proper cgroup?
Hmm, this becomes messy if the daemon actually is more than one
process (think worker processes)... Not sure how we would handle that?

Lennart
--
Lennart Poettering, Red Hat
Brandon Black
2014-05-04 23:57:06 UTC
On Thu, Apr 24, 2014 at 1:34 AM, Lennart Poettering
Post by Lennart Poettering
Post by Brandon Black
The problem here is that the daemon performs operations that require root
privilege on startup, and then dumps its privileges for runtime, and thus
its execve'd child won't have the root privs it would need to start
everything over again. In theory some of these privileged things, like
listening sockets, could be handed to the exec child, but that assumes the
configured set of listening sockets hasn't changed (which might be the
reason for the restart).
There's always the option to raise the privs again via some setuid
helper...
That would seem to defeat the purpose of losing them in the first place
(limiting the damage potential of a compromised daemon).
Post by Lennart Poettering
Post by Brandon Black
Other things like mlockall() can't be handed off
over fork/execve once privileges are gone.
mlockall()? what's that supposed to do here? this is usually snakeoil...
This seems like another side-topic, but what about mlockall is snakeoil?
Should that be documented somewhere? It was just the first example of a
privileged operation we use that came to mind. It's an optional
(default-off) thing in gdnsd, but it seems like if you care about response
latency enough to minimize syscalls, minimizing pagefaults in the presence
of less-important batch processes that may consume significant memory is a
good idea as well. In any case, no, I don't think I can completely get rid
of privileged ops on startup at this time.
Post by Lennart Poettering
Post by Brandon Black
officially letting a control process from ExecReload= become the main
process via some reasonably-standard mechanism? That's already what
happens to the "control process" for ExecStart=.
Well ExecStart= is very special, it's not the control process, really.
Semantics. Can we not have some other verb be as special? My point is,
the systemd code certainly knows how to do this; it just doesn't choose to
for ExecReload. There could be an option declared for that behavior,
though, if it were a solution.
Post by Lennart Poettering
Post by Brandon Black
2) Given the above, would it be reasonable that if a control process from a
verb like ExecReload sent a MAINPID= message over the control socket,
systemd would accept this as the new main pid *and* internally take care of
promoting the specified PID to the proper cgroup?
Hmm, this becomes messy if the daemon actually is more than one
process (think worker processes)... Not sure how we would handle that?
I assume you mean worker processes which detach themselves from the parent
via setsid() and thus don't show a relationship to it (why would the daemon
choose to disassociate worker children like that? I have no idea).
Otherwise we could just move the whole process group of the sender of the
MAINPID= message.

Lennart Poettering
2014-04-23 06:18:46 UTC
Post by Brandon Black
4) Socket Activation! I know this is what some will scream when they skim
the above, but it's not a realistic solution in this case for a few reasons:
a) The startup delay, in some cases, can be many whole wallclock
seconds. This is necessary and acceptable in the general sense (this is
a network service that people use with large server-side installations, not a
desktop thing).
UDP is lossy anyway, and a startup delay of a few seconds shouldn't be
an issue at all. If we are speaking of 15min or so here, that might be a
problem, but otherwise this really sounds fine. And if your daemon
really takes 15min this sounds like something to look into...
Post by Brandon Black
even if both of the above weren't true, this daemon uses several sockets
for multiple "roles" internally, some of which share all low-level details
(e.g. two distinct use-cases for multiple TCP sockets that serve different
high-level protocols, where the user might choose arbitrary ports for
both). I'm not seeing any trivial way to distinguish these via socket
activation - perhaps some kind of socket "label" that could be accessed by
the daemon via sd_* APIs to distinguish would be useful here?
You can query the listening ports and properties using getsockname() and
friends. Also, sd-daemon provides sd_is_socket() which allows you to do
similar checks.

On our TODO list is to add an "fd store" concept to units where service
code can push fds to systemd, and pull them out again (to make reloads
nice). At the same time we'd add the concept of labelling them.
Post by Brandon Black
5) ExecReexec - this was one of Lennart's musings in the previous thread in
December 2012. However, this doesn't map well to gdnsd's model if implemented
in the "obvious" manner of having ExecReexec send a signal to the running
daemon to re-exec itself. It would map well if gdnsd could respond to
SIGFOO via fork()->execve() on itself with the "restart" verb and let the
new instance replace the old one when it's ready. The problem is that the
new restarting copy needs elevated privileges to bind its sockets, which the
running daemon has already given up permanently (and thus can't provide to
the newly execve'd copy). In some cases we could pass the sockets on by
clearing FD_CLOEXEC, but there's no guarantee as to what socket bindings
the new daemon will have: typically the same as before, but perhaps the
address or port number has changed in the config file for one of five
different sockets.
At this point in time I am quite sure that ExecReload= should simply be
used for this.

I am quite sure that "systemctl restart" should do the same thing for
all services, and that means stopping the service, followed by starting,
and have both of these jobs follow the usual ordering dependency
logic (so that other jobs might be ordered between the stop/start!).

OTOH "systemctl reload" should be the verb where some service-specific
reload operation is executed, where no restriction is made how this
ultimately is implemented, and where no ordering logic really
applies. Whether a process reexec is done for this or not is an
implementation detail of the specific service, where systemd shouldn't
really have to be involved. In general the only suggestion we'd make is
that the effect of ExecReload should be synchronous, as comprehensive as
possible, yet also as graceful as possible. Reexecing as part of reload
sounds like a good idea, if enough care is taken not to stop any ongoing
connections or transactions.

There have been some changes in systemd a while back that make sure
that ExecReload= can replace the process, so this should pretty much
work now if the daemon is up to it.

Lennart
--
Lennart Poettering, Red Hat
Brandon Black
2014-04-24 02:01:31 UTC
On Wed, Apr 23, 2014 at 1:18 AM, Lennart Poettering
Post by Lennart Poettering
UDP is lossy anyway, and a startup delay of a few seconds shouldn't be
an issue at all. If we are speaking of 15min or so here, that might be a
problem, but otherwise this really sounds fine. And if your daemon
really takes 15min this sounds like something to look into...
There are many values between a few seconds and 15 minutes that are both
(a) reasonable startup times given the user's large configuration and (b)
undesirable downtime for a critical service like DNS.
Post by Lennart Poettering
At this point in time I am quite sure that ExecReload= should simply be
used for this.
That's an acceptable answer, although I think in the long term it poses
some questions about additional custom verbs, since at least gdnsd now
really wants two different reload-like actions (a simple SIGHUP that
reloads zone data vs the overlapped-restart under discussion here). But
for now, the easy case (SIGHUP) can just be done outside of
systemd/systemctl without any ill effects.

-- Brandon
Lennart Poettering
2014-04-24 06:16:32 UTC
Post by Brandon Black
Post by Lennart Poettering
At this point in time I am quite sure that ExecReload= should simply be
used for this.
That's an acceptable answer, although I think in the long term it poses
some questions about additional custom verbs, since at least gdnsd now
really wants two different reload-like actions (a simple SIGHUP that
reloads zone data vs the overlapped-restart under discussion here). But
for now, the easy case (SIGHUP) can just be done outside of
systemd/systemctl without any ill effects.
Yeah, I am not convinced that custom verbs are something to support in
systemd. They are not generic, and systemd/systemctl should really just
cover the generic verbs. I mean, as soon as you do generic verbs you
probably also want to extend them with extra modifying switches and so
on. But that all is probably better done in some specific, auxiliary
tool shipped along with the package.

I mean, there's really no point in abstracting something within the
systemd/systemctl context that is inherently not abstractable, if you
follow what I mean.

SMF allowed extending services with custom verbs. I don't think that
that was one of their better design decisions...

Lennart
--
Lennart Poettering, Red Hat