Discussion:
Side effects of `systemctl daemon-reload`?
Daniel Wang
2018-01-24 18:33:33 UTC
I have a cluster of hundreds of nodes running systemd 232. To work around a recently
discovered bug in systemd (https://github.com/systemd/systemd/issues/7798),
I want to deploy a timer to all my nodes that will check the number of
units and run `systemctl daemon-reload` once a certain threshold is hit
(100K for example).

I am asked whether I could have the timer skip the check and blindly run
`systemctl daemon-reload` at every invocation. Would there be any problem
with that? IIUC, reloading the daemon is a rather "safe" operation?
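
For reference, the check-then-reload variant described above could look
roughly like the sketch below. It is only a sketch: the helper names
(`count_units`, `should_reload`, `main`) are made up for illustration, the
100K threshold is the example figure from this post, and counting the lines
of `systemctl list-units --all --no-legend --plain` is an assumed way of
getting the unit count that may not match the units affected by the linked
issue.

```python
import subprocess


def count_units(list_units_output: str) -> int:
    """Count non-empty lines of `systemctl list-units` output (one unit per line)."""
    return sum(1 for line in list_units_output.splitlines() if line.strip())


def should_reload(unit_count: int, threshold: int = 100_000) -> bool:
    """Reload only once the number of loaded units reaches the threshold."""
    return unit_count >= threshold


def main() -> None:
    # --no-legend drops the header and footer, --plain drops the status
    # glyphs, so every remaining line describes exactly one unit.
    out = subprocess.run(
        ["systemctl", "list-units", "--all", "--no-legend", "--plain"],
        check=True, capture_output=True, text=True,
    ).stdout
    if should_reload(count_units(out)):
        subprocess.run(["systemctl", "daemon-reload"], check=True)
```

The service activated by the timer would call `main()`; the "blind" variant
would simply drop the `should_reload` check.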
--
Best,
Daniel
Andrei Borzenkov
2018-01-24 19:16:59 UTC
There have been quite a few subtle issues caused by state being
incompletely preserved across a reload. So YMMV.
Lennart Poettering
2018-01-24 20:11:54 UTC
When you have unit files that have multiple ExecXYZ= lines of the same
type (such as a Type=oneshot file with three ExecStart= lines), and
you change these lines between reloads then there's the conceptual
problem that it's not clear where systemd shall continue execution
next: the line it currently executes might have been removed or
changed, and there's no clear rule how to recognize in the changed
version at which line we are now, and which one to start next.

Example:

Consider a service:

[Service]
ExecStart=/usr/bin/one
ExecStart=/usr/bin/two
ExecStart=/usr/bin/three

Now you change it to:

[Service]
ExecStart=/usr/bin/one
ExecStart=/usr/bin/foo
ExecStart=/usr/bin/qux
ExecStart=/usr/bin/three

And then you reload while the process "two" is running. What should
happen now? Should systemd run "foo" next? Or maybe "qux"? Or "three"?

And the problem is not limited to replacing lines outright or adding
and removing some: it also arises when you only slightly alter the
command-line arguments.

Because it isn't conceptually clear what to run next in such a case,
versions of systemd before 233 had a simple rule, "we start from the
beginning again": after a reload we'd always start the first of the
ExecStart= lines again.

In 233 this was improved a bit. There we'll try to find the right line
to continue with by searching for it literally, falling back to a
simple index-based approach if we can't find it literally. It's
making the best of a bad situation, and at least makes things fully
reliable when the unit file didn't change.
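
The two rules can be illustrated with a toy model. This is a sketch of the
behaviour as described in this message, not systemd's actual code: the
function names are hypothetical, each command line is reduced to a single
string, and preferring a literal match at the same index before searching
the rest of the list is an assumption made here so that repeated lines
behave sensibly.

```python
from typing import List, Optional


def position_before_233(new_cmds: List[str]) -> Optional[int]:
    """Toy model of the pre-233 rule: start from the first line again."""
    return 0 if new_cmds else None


def position_since_233(old_cmds: List[str], new_cmds: List[str],
                       running_idx: int) -> Optional[int]:
    """Toy model of the 233 rule: literal search, then index fallback.

    Returns the index in new_cmds treated as the currently running line,
    or None if neither a literal match nor the old index exists.
    """
    running = old_cmds[running_idx]
    # Prefer a literal match at the same index (covers unchanged files,
    # including files that repeat the same command line).
    if running_idx < len(new_cmds) and new_cmds[running_idx] == running:
        return running_idx
    # Otherwise take the first literal match anywhere in the new list.
    if running in new_cmds:
        return new_cmds.index(running)
    # No literal match: fall back to the old index if it still exists.
    return running_idx if running_idx < len(new_cmds) else None


old = ["/usr/bin/one", "/usr/bin/two", "/usr/bin/three"]
new = ["/usr/bin/one", "/usr/bin/foo", "/usr/bin/qux", "/usr/bin/three"]

print(position_before_233(new))         # 0: back to "/usr/bin/one"
print(position_since_233(old, old, 1))  # 1: unchanged file, still "two"
print(position_since_233(old, new, 2))  # 3: "three" is found literally
print(position_since_233(old, new, 1))  # 1: "two" is gone, index fallback lands on "foo"
```

The last call is the ambiguous case from the example: in this model the
index fallback treats "foo" as the running line, which may or may not be
what the author of the unit file intended.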

The above is the major caveat when reloading the daemon in full: if
you have unit files with multiple ExecXYZ= lines of the same type and
you keep reloading systemd, you are in trouble on < 233; on >= 233
you are still possibly in trouble if you modify them in between, but
are safe if you just keep them as-is.

The relevant PR is https://github.com/systemd/systemd/pull/5354

Another caveat is that because all state is flushed out and then
rebuilt in full, the operation is relatively heavy if you have tons of
units.
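
One hedged way to gauge how heavy a reload is on a given node is simply to
time it. The helper below is a hypothetical sketch (`time_command` is not
part of any systemd tooling), and it assumes that `systemctl daemon-reload`
returns only once the manager has finished reloading.

```python
import subprocess
import time
from typing import Sequence


def time_command(argv: Sequence[str]) -> float:
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


# On a real node (needs sufficient privileges):
# elapsed = time_command(["systemctl", "daemon-reload"])
# print(f"daemon-reload took {elapsed:.2f}s")
```

Comparing the result on a node with few units and on one near the threshold
would show whether reloading at every timer invocation is affordable.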

Other than the above, doing reloads should be safe and not lose state.

Lennart
--
Lennart Poettering, Red Hat
Daniel Wang
2018-01-24 21:09:54 UTC
Well explained. Thanks Lennart.
--
Best,
Daniel