Discussion:
Changed ordering of systemd-resolved.service
Dimitri John Ledkov
2018-04-16 10:47:34 UTC
On 13 April 2018 at 16:40, Paul Menzel
Dear Dimitri, dear systemd folks,
In commit 1f158013 (resolved.service: set DefaultDependencies=no) the
ordering of systemd-resolved.service was changed. (How do I find the merge
request, to look for possible discussion? Also, the commit message description
is too specific in my opinion, as it gives no clue that more was changed.)
https://github.com/systemd/systemd/pull/7609
I like starting systemd-resolved earlier, but unfortunately ordering it
before `network.target` adds a delay on systems wanting to start as fast as
possible. But why did you change it from `network-online.target` to
`network.target`? I’d say `network-online.target` is more correct.
For my use case of a fast system start-up, this change delays it by at least
100 ms, as now it takes longer to reach the end of the network target.
cloud-init initializes the networking configuration by fetching,
potentially, remote sources to customize an instance on first boot.
Specifically, it may run DHCP on any interface to reach a metadata source,
download the real networking configuration, reconfigure networking to
match the final networking details (all interfaces / public IP
addresses / etc.), and proceed to complete networking.target and
network-online.target.

This means that resolved is required earlier in the boot cycle: before
networking.target.

There are things that expect the network to be up at
"network-online.target", which some take to imply working DNS
resolution too, unfortunately.
If your systems have problems with it, they have wrong dependencies, don’t
they? Also, they should probably be able to deal with the situation that
DNS does not work, as that can happen during operation.
So, I’d really like to rework that ordering change.
Unfortunately, reworking that change will break certain public cloud
providers, because the public clouds' metadata providers are odd.

Note that we cannot use D-Bus activation in this case, as dbus-daemon is
not up yet, and the systemd-resolve command-line client also does not work
at this point.

If you want to make it an optional dependency that early, maybe it
would be possible to convert systemd-resolved to be socket activated on
TCP/UDP?
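
To make the socket-activation idea concrete: under systemd's socket-activation protocol, the service manager binds the sockets itself and hands them to the started process as file descriptors beginning at 3, announced through the LISTEN_PID and LISTEN_FDS environment variables (the checks sd_listen_fds(3) performs). The Python sketch below only parses that environment; the helper name is hypothetical, and this is an illustration of the protocol, not how systemd-resolved is or would be implemented.

```python
import os
from typing import Mapping

# First passed file descriptor, as defined by sd_listen_fds(3).
SD_LISTEN_FDS_START = 3


def listen_fds(env: Mapping[str, str], pid: int) -> list[int]:
    """Hypothetical helper: return the fds handed over via socket activation.

    Mirrors the checks sd_listen_fds(3) performs: LISTEN_PID must match our
    own PID, and LISTEN_FDS gives the number of consecutive fds starting at 3.
    Returns an empty list when the process was not socket activated.
    """
    try:
        if int(env.get("LISTEN_PID", "")) != pid:
            return []  # the variables were meant for a different process
        count = int(env.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


if __name__ == "__main__":
    print(listen_fds(os.environ, os.getpid()))
```

A real service would wrap each returned descriptor with socket.socket(fileno=fd). sd_listen_fds() can also unset the variables so that child processes do not inherit them; the sketch omits that.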

Alternatively, as a system integrator, you may want to change these
dependencies in your distro, especially if you do not configure the
resolved _stub resolver_ as the default provider of /etc/resolv.conf,
or, for example, do not use the recommended default stub provider
on 127.0.0.53 and instead use the NSS module over D-Bus.

The above dependencies are correct and recommended for the default
setup of /etc/resolv.conf pointing at the stub-resolv.conf
generated by resolved at runtime.

Specifically, the dependencies as they are now are "too early" if one uses
the last two modes of /etc/resolv.conf handling described in the
man page: https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html#/etc/resolv.conf
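
As a rough illustration of why the mode matters, the sketch below maps the target of /etc/resolv.conf onto the four modes listed in the systemd-resolved.service man page of that era, and reports whether local clients would query the 127.0.0.53 stub listener, which is the case the early ordering is meant to serve. The helper name and the mode labels are illustrative assumptions, not part of systemd.

```python
import os

# Paths named in the systemd-resolved.service man page.
STUB_RESOLV = "/run/systemd/resolve/stub-resolv.conf"
STATIC_STUB_RESOLV = "/usr/lib/systemd/resolv.conf"
UPLINK_RESOLV = "/run/systemd/resolve/resolv.conf"


def classify_resolv_conf(target: str) -> tuple[str, bool]:
    """Hypothetical helper: map the resolved path of /etc/resolv.conf to a
    mode label and whether clients query the 127.0.0.53 stub listener."""
    path = os.path.normpath(target)  # e.g. collapses "/usr//lib" to "/usr/lib"
    if path == STUB_RESOLV:
        return ("resolved stub (dynamic file)", True)
    if path == STATIC_STUB_RESOLV:
        return ("resolved stub (static file)", True)
    if path == UPLINK_RESOLV:
        return ("upstream servers listed by resolved", False)
    return ("managed by another tool", False)


if __name__ == "__main__":
    print(classify_resolv_conf(os.path.realpath("/etc/resolv.conf")))
```

Only the first two modes route lookups through the stub; in the last two, clients talk to the listed DNS servers directly, so ordering resolved before network.target buys them nothing.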
--
Regards,

Dimitri.
Dimitri John Ledkov
2018-04-16 16:51:19 UTC
On 16 April 2018 at 14:25, Paul Menzel
Dear Dimitri,
Thank you for your quick response.
Post by Dimitri John Ledkov
In commit 1f158013 (resolved.service: set DefaultDependencies=no) the
ordering of systemd-resolved.service was changed. (How do I find the merge
request to find possible discussion? Also the commit message description is
too specific in my opinion, as it doesn’t give a clue that more is changed.)
https://github.com/systemd/systemd/pull/7609
Thank you, no idea why I didn’t find it with `git log --oneline --graph`.
Hmm, it looks like Lennart put that commit directly into master without
merging the pull request.
Post by Dimitri John Ledkov
I like starting systemd-resolved earlier, but unfortunately ordering it
before `network.target` adds a delay on systems wanting to start as fast as
possible. But why did you change it from `network-online.target` to
`network.target`? I’d say `network-online.target` is more correct.
For my use case of a fast system start-up, this change delays it by at least
100 ms, as now it takes longer to reach the end of the network target.
cloud-init initializes networking configuration by fetching,
potentially, remote sources to customize an instance on first boot.
Specifically it may dhcp any interface, to reach a metadata source,
download the real networking configuration, reconfigure networking to
match the final networking details (all interfaces / public ip
addresses / etc), and proceed to complete networking.target and
network-online.target.
This means that resolved is required earlier in the boot cycle. Before
networking.target.
Just to be sure, you mean *network.target*, right?
Thank you for specifying the requirement. I agree that it should be started
as early as possible, but I disagree with the rest.
Post by Dimitri John Ledkov
There are things that expect network to be up in
"network-online.target", which by some is implied to mean DNS
resolution too, unfortunately.
Sorry for being ignorant, but could you please be specific about what these
things are? If these units have that requirement, order them after
`network-online.target`.
Post by Dimitri John Ledkov
If your systems have problems with it, they have wrong dependencies, don’t
they? Also, they should probably be able to deal with the situation, that
DNS does not work, as that can happen during operation.
So, I’d really like to rework that ordering change.
Reworking that change will break certain public cloud providers
unfortunately because of public clouds metadata providers being odd.
Note, we cannot use dbus activation in this case as dbus-daemon is not
up yet, and systemd-resolve command line client also does not work at
this point.
If you want to make it an optional dependency that early, maybe it
will be possible to convert systemd-resolved to be socket activated on
tcp/udp?
Alternatively, as a system integrator, you may want to change these
dependencies in your distro, especially if you do not configure
resolved _stub resolver_ as the default provider of /etc/resolv.conf
or, for example, do not use the recommended default stub-provider
over 127.0.0.53 and instead use the nss module over dbus.
The above dependencies are correct and recommended for the default
setup of /etc/resolv.conf pointing at the stub-resolv.conf as
generated by resolved at runtime.
Specifically, the dependencies as is are "too-early" if one uses the
last two modes of the /etc/resolv.conf handling as described in the
man page -
https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html#/etc/resolv.conf
First, I think the terminology of *early* leads to misunderstandings. For
you it includes ordering with `Before=`; for me it’s just about `After=`
statements.
It's actually both. Cloud-init is a cross-distribution tool, and it
injects itself at multiple points during boot. It pre-empts network.target,
runs between network.target and network-online.target, and runs after
network-online.target.

Without this upstream change, cloud-init was not able to pre-empt
network.target; this resulted in a dependency cycle, and systems
booted degraded (because the dependency cycle was resolved by
shooting an arbitrary unit in the head), in a default upstream systemd
configuration.
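
A toy model shows how such a cycle arises. The sketch below treats each Before=/After= pair as an edge and uses Python's graphlib to look for a valid start order. The edge sets are a hypothetical simplification of the situation described above, not the real unit files: cloud-init must finish before network.target and needs resolved, so if resolved is itself ordered after network.target, no order exists.

```python
from graphlib import CycleError, TopologicalSorter


def start_order(after: dict[str, set[str]]) -> list[str]:
    """Return a valid start order; after[u] is the set of units that u is
    ordered After=, i.e. units that must be started before u."""
    return list(TopologicalSorter(after).static_order())


# Hypothetical, simplified edges for illustration only.
old = {
    "network.target": {"cloud-init.service"},            # cloud-init Before=network.target
    "cloud-init.service": {"systemd-resolved.service"},  # needs DNS via resolved
    "systemd-resolved.service": {"network.target"},      # resolved After=network.target
}
new = {
    "network.target": {"cloud-init.service", "systemd-resolved.service"},
    "cloud-init.service": {"systemd-resolved.service"},
    "systemd-resolved.service": set(),                   # started early, no default deps
}

if __name__ == "__main__":
    try:
        start_order(old)
    except CycleError as err:
        print("cycle:", err.args[1])  # graphlib stores the cycle's nodes here
    print(start_order(new))
```

graphlib simply refuses a cyclic graph; systemd instead breaks an ordering cycle at boot by deleting one of the jobs involved, which is the degraded boot described above.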
Anyway, regressing the user experience for everyone only because it’s
Can you please explain what has degraded? Starting systemd-resolved
before or after network*.target shouldn't make any difference in the
wall-clock time to reach multi-user.target. And in my boot testing, I did
not see any boot regressions.

Or are you explicitly measuring time to network.target, separate from
time to network-online.target, and separate from reaching the default
target?

Have you previously been booting with network-online.target &
systemd-resolved pulled into the default boot target? And if you were
booting without them, was that expected?

I am also getting multiple support requests for networking and DNS
resolution to be available in emergency and maintenance shell
consoles, so pulling resolved in earlier made a lot of sense, to give the
root shell at least some ability to talk to the outside world to
download fixes for the system.
required for cloud-init is not right in my opinion. As you pointed out, the
system integrator can adapt certain things, so I throw the ball back to you
and kindly ask you to adapt systemd locally so it works with your use case,
or let’s come up with a better solution.
Hm... cloud-init is distribution-agnostic, packaged and shipped in
most distributions. And in a stock configuration, one would expect any
Linux distro to work nicely with upstream releases of cloud-init &
systemd.

Please explain the regression you have identified, so that we can design a
solution fit for all purposes.
Maybe a new target is needed, which you can order your services after, as
ordering them after systemd-resolved.service is too specific.
Possibly, but what are your requirements which you have noticed to
have regressed that we need to fix?
I submitted a merge/pull request to change the ordering [1].
-1 from me.

Please explain, in detail, the regression/bug observed before jumping
to reverting things. It's not like things are changed without reason,
or without fixing actual bugs discovered in production that affect a wide
array of users (due to the nature of public clouds).
Kind regards,
Paul
[1] https://github.com/systemd/systemd/pull/8731
--
Regards,

Dimitri.
Dimitri John Ledkov
2018-04-16 23:13:48 UTC
On 16 April 2018 at 18:20, Paul Menzel
Dear Dimitri,
Post by Dimitri John Ledkov
Post by Dimitri John Ledkov
In commit 1f158013 (resolved.service: set DefaultDependencies=no) the
ordering of systemd-resolved.service was changed. (How do I find the merge
request to find possible discussion? Also the commit message description is
too specific in my opinion, as it doesn’t give a clue that more is changed.)
https://github.com/systemd/systemd/pull/7609
Thank you, no idea, why I didn’t find it with `git log --oneline --graph`.
Hmm, looks like, Lennart directly put that commit in master without merging
the pull request.
Post by Dimitri John Ledkov
I like starting systemd-resolved earlier, but unfortunately ordering it
before `network.target` adds a delay on systems wanting to start as fast as
possible. But why did you change it from `network-online.target` to
`network.target`? I’d say `network-online.target` is more correct.
For my use case of a fast system start-up, this change delays it by at least
100 ms, as now it takes longer to reach the end of the network target.
cloud-init initializes networking configuration by fetching,
potentially, remote sources to customize an instance on first boot.
Specifically it may dhcp any interface, to reach a metadata source,
download the real networking configuration, reconfigure networking to
match the final networking details (all interfaces / public ip
addresses / etc), and proceed to complete networking.target and
network-online.target.
This means that resolved is required earlier in the boot cycle. Before
networking.target.
Just to be sure, you mean *network.target*, right?
Thank you for specifying the requirement. I agree, that it should be started
as early as possible, but I disagree with the rest.
Post by Dimitri John Ledkov
There are things that expect network to be up in
"network-online.target", which by some is implied to mean DNS
resolution too, unfortunately.
Sorry for being ignorant, but could you please be specific, what these
things are. If these units have that requirement order them after
`network-online.target`.
Post by Dimitri John Ledkov
If your systems have problems with it, they have wrong dependencies, don’t
they? Also, they should probably be able to deal with the situation, that
DNS does not work, as that can happen during operation.
So, I’d really like to rework that ordering change.
Reworking that change will break certain public cloud providers
unfortunately because of public clouds metadata providers being odd.
Note, we cannot use dbus activation in this case as dbus-daemon is not
up yet, and systemd-resolve command line client also does not work at
this point.
If you want to make it an optional dependency that early, maybe it
will be possible to convert systemd-resolved to be socket activated on
tcp/udp?
Alternatively, as a system integrator, you may want to change these
dependencies in your distro, especially if you do not configure
resolved _stub resolver_ as the default provider of /etc/resolv.conf
or, for example, do not use the recommended default stub-provider
over 127.0.0.53 and instead use the nss module over dbus.
The above dependencies are correct and recommended for the default
setup of /etc/resolv.conf pointing at the stub-resolv.conf as
generated by resolved at runtime.
Specifically, the dependencies as is are "too-early" if one uses the
last two modes of the /etc/resolv.conf handling as described in the
man page -
https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html#/etc/resolv.conf
First, I think, the terminology of *early* leads to misunderstandings. For
you it includes ordering with `Before=`, for me it’s just about `After=`
statements.
It's actually both. Cloud-init is a cross-distribution tool, and it
injects itself at multiple points during boot. It pre-empts networking
target, is between networking & network-online, and after
network-online target.
Without this upstream change, cloud-init was not able to pre-empt
network.target, was resulting in a dependency cycle, and systems
resulted booting degraded (due to dependency cycle resolved by
shooting arbitrary unit in the head), in a default upstream systemd
configuration.
Anyway, regressing the user experience for everyone only because it’s
Can you please explain what has degraded? starting systemd-resolved
before or after network*.target shouldn't make any difference in wall
clock time to reach multi-user.target. And in my boot testing, I did
not see any boot regressions.
Just look at what is ordered after the network target:
1. units/rc-local.service.in:After=network.target
2. units/systemd-user-sessions.service.in:After=remote-fs.target nss-user-lookup.target network.target
Both are needed for the login screen.
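
Paul's point is about the critical path: every unit ordered after network.target waits for the slowest unit ordered before it. A minimal sketch of that arithmetic, assuming each unit starts as soon as all units it is ordered After= have finished, with unlimited parallelism. The unit names come from the thread; every duration except the 100 ms Paul reports for resolved is a made-up placeholder.

```python
from graphlib import TopologicalSorter


def finish_times(after: dict[str, set[str]], cost_ms: dict[str, int]) -> dict[str, int]:
    """Earliest finish time of each unit, if a unit starts once every unit it
    is ordered After= has finished (unlimited parallelism assumed)."""
    done: dict[str, int] = {}
    for unit in TopologicalSorter(after).static_order():
        start = max((done[dep] for dep in after.get(unit, ())), default=0)
        done[unit] = start + cost_ms.get(unit, 0)
    return done


# Placeholder durations in milliseconds (hypothetical, except resolved's 100 ms).
cost = {"systemd-networkd.service": 30, "systemd-resolved.service": 100,
        "systemd-user-sessions.service": 5}

base = {
    "network.target": {"systemd-networkd.service"},
    "systemd-user-sessions.service": {"network.target"},
}
# The same graph with resolved additionally ordered Before=network.target.
with_resolved = {**base, "network.target": {"systemd-networkd.service",
                                             "systemd-resolved.service"}}

if __name__ == "__main__":
    print(finish_times(base, cost)["systemd-user-sessions.service"])           # 35
    print(finish_times(with_resolved, cost)["systemd-user-sessions.service"])  # 105
```

In this model, the extra ordering only costs time when resolved is the slowest unit ordered before network.target on a given machine; otherwise the finish time of systemd-user-sessions.service does not change.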
Post by Dimitri John Ledkov
Or are you explicitly measuring time to network.target, separate from
time to network-online.target, and separate from reaching the default
target?
Have you been previously booting with network-online.target &
systemd-resolved pulled into the default boot target? And if you were
booting without them, was that expected?
No, `systemd-networkd-wait-online.service` is disabled.
That's not the default, and it kind of makes it a very limited / offline
system (or at least an initially offline system). Most systems today
require the network to be up to be considered booted. But that's what
your system is, if I understand this setup correctly.

Is systemd-networkd.service enabled? And network.target? Is
network-online.target part of the initial boot target? Did you have
systemd-resolved enabled before?

It almost feels like you do not want systemd-resolved.service to have
[Install] WantedBy=multi-user.target, but instead [Install]
WantedBy=network-online.target? And to ensure that network-online.target
is not pulled into the initial transaction (no units want it); that
way you can even keep resolved & networkd-wait-online "enabled" yet
not actually part of the initial boot transaction, or at least
not on the critical path to rc-local / systemd-user-sessions. I think
keeping the systemd-resolved ordering as is, but ensuring that it is not
pulled in when not needed (offline system), makes sense, no?
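
The suggestion relies on the fact that an [Install] WantedBy= line only adds a Wants= link from that target once the unit is enabled; the unit joins the boot transaction only if something reachable from the default target actually pulls that target in. The Python sketch below models just that reachability over Wants= edges and ignores Requires=, ordering and everything else systemd considers; both graphs are hypothetical.

```python
from collections import deque


def transaction(wants: dict[str, set[str]], root: str) -> set[str]:
    """Units pulled in by following Wants= edges from root (breadth-first).
    A simplification: real transactions also follow Requires=, Requisite=, etc."""
    seen = {root}
    queue = deque([root])
    while queue:
        for dep in wants.get(queue.popleft(), set()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical Wants= graphs created by enabled units' WantedBy= lines.
current = {"multi-user.target": {"systemd-resolved.service", "systemd-networkd.service"}}
proposed = {
    "multi-user.target": {"systemd-networkd.service"},
    # resolved and wait-online stay enabled, but hang off network-online.target,
    # which nothing in this graph wants
    "network-online.target": {"systemd-resolved.service",
                              "systemd-networkd-wait-online.service"},
}

if __name__ == "__main__":
    print("systemd-resolved.service" in transaction(current, "multi-user.target"))   # True
    print("systemd-resolved.service" in transaction(proposed, "multi-user.target"))  # False
```

As soon as any unit in the transaction wants network-online.target, resolved and wait-online are pulled back in, which is the behavior the suggestion depends on.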

Do you per chance have an old bootchart and a new one, to see what
new units got pulled in, and when, and what else is enabled?

I ask because systemd-resolved has also recently been ordered before
nss-lookup.target and has been made to want it.

BTW, do you use systemd-resolved at all, and do you ever need it
running by default?

If you (1) use libnss-resolve, (2) have no need for the systemd-resolved
stub listener running (DNSStubListener=no), (3) have it D-Bus
activated only, and (4) have /etc/resolv.conf pointing at the static
/usr/lib/systemd/resolv.conf, you should disable
systemd-resolved.service from your boot.

If you do use DNSStubListener, I think it would then need to become socket
activated; that way your initially offline system (without any units
trying to hammer the network on the critical boot path) should boot just as
fast as before, and gain DNS resolution once networking is finally up.
Post by Dimitri John Ledkov
I am also getting multiple support requests for networking and DNS
resolution to be available during emergency and maintenance shell
consoles, thus pulling resolved earlier made a lot of sense to give
root shell at least some ability to talk to the outside world to
download fixes to the system.
required for cloud-init is not right in my opinion. As you pointed out, the
system integrator can adapt certain things, and in my opinion, I throw the
ball back to you, and kindly ask you, to adapt systemd locally so it works
with your use-case or let’s come up with a better solution.
Hm... cloud-init is distribution agnostic, packaged and shipped in
most distributions. And in stock configuration, one would expect any
Linux distro to work nicely with an upstream releases of cloud-init &
systemd.
Do I understand correctly that systemd was adapted so that *one* tool,
cloud-init, could work? For a long time before systemd 237, DNS
resolution did not necessarily have to be working before the network
target was reached.
No, systemd has not been "adapted". There were multiple bugs in
systemd-resolved and its unit that were fixed to address multiple
issues:
1) it was not possible to use, or manually start, systemd-resolved
before dbus-daemon, as it would later connect to the system bus and
the systemd-resolve tool would not work;
2) it thus gained the ability to operate with DefaultDependencies=no;
3) it was moved to start as early as possible, such that DNS look-ups
start working as soon as networking is up and network-online.target is
reached, and things start hammering the network to pull updates as
soon as possible;
4) ultimately, this speeds up the boot for the majority of systems,
which have networking on boot / expect to reach network-online.target
as part of the initial boot transaction;
5) as a side effect, using cloud-init with only
systemd + networkd + resolved was fixed to work on a type of cloud
deployment.

Disabling wait-online is niche among commonly deployed systems.
For example, even installers often do not disable it, and instead
attempt automatic DHCP networking configuration on boot (for example,
the new Ubuntu Server installer does that).
Post by Dimitri John Ledkov
Please explain the regression you have identified, to design a
solution fit for all purposes.
Maybe a new target is needed, where you can order your services after, as
ordering them after systemd-resolved.service is too specific.
Possibly, but what are your requirements which you have noticed to
have regressed that we need to fix?
It takes longer to reach the login screen.
ack.
Post by Dimitri John Ledkov
I submitted a merge/pull request to change the ordering [1].
-1 from me.
Please explain, in detail, the regression/bug observed before jumping
onto reverting things. It's not like things are changed without reason
/ without fixing actual production discovered bugs affecting a wide
array of users (due to public cloud nature).
I reach the console login over 100 ms earlier when removing the ordering.
systemd-resolved unfortunately takes that long.
To summarize: Do you use resolved? Do you need it at all as part of
the boot transaction? Should it be disabled too? Can we move
WantedBy=multi-user.target to network-online.target instead? Would
socket-activating resolved help?

Also, systemd-resolved taking 100 ms to start sounds long. I wonder
what it is doing.
--
Regards,

Dimitri.