Discussion:
Changed ordering of systemd-resolved.service
Dimitri John Ledkov
2018-04-16 10:47:34 UTC
On 13 April 2018 at 16:40, Paul Menzel
<pmenzel+systemd-***@molgen.mpg.de> wrote:
> Dear Dimitri, dear systemd folks,
>
>
> In commit 1f158013 (resolved.service: set DefaultDependencies=no) the
> ordering of systemd-resolved.service was changed. (How do I find the merge
> request to find possible discussion? Also the commit message description is
> too specific in my opinion, as it doesn’t give a clue that more is changed.)
>

https://github.com/systemd/systemd/pull/7609

>
> I like starting systemd-resolved earlier, but unfortunately ordering it
> before `network.target` adds a delay on systems wanting to start as fast as
> possible. But why did you change it from `network-online.target` to
> `network.target`? I’d say `network-online.target` is more correct.
>
> For my use case of a fast system start-up, this change delays it by at least
> 100 ms, as now it takes longer to reach the end of the network target.
>

cloud-init initializes networking configuration by fetching,
potentially, remote sources to customize an instance on first boot.
Specifically, it may DHCP any interface to reach a metadata source,
download the real networking configuration, reconfigure networking to
match the final networking details (all interfaces / public IP
addresses / etc.), and proceed to complete networking.target and
network-online.target.

This means that resolved is required earlier in the boot cycle. Before
networking.target.

There are things that expect network to be up in
"network-online.target", which by some is implied to mean DNS
resolution too, unfortunately.

>
> If your systems have problems with it, they have wrong dependencies, don’t
> they? Also, they should probably be able to deal with the situation, that
> DNS does not work, as that can happen during operation.
>
> So, I’d really like to rework that ordering change.
>

Unfortunately, reworking that change will break certain public cloud
providers, because public cloud metadata providers are odd.

Note, we cannot use dbus activation in this case as dbus-daemon is not
up yet, and systemd-resolve command line client also does not work at
this point.

If you want to make it an optional dependency that early, maybe it
will be possible to convert systemd-resolved to be socket activated on
tcp/udp?

Alternatively, as a system integrator, you may want to change these
dependencies in your distro, especially if you do not configure the
resolved _stub resolver_ as the default provider of /etc/resolv.conf
or, for example, do not use the recommended default stub-provider
over 127.0.0.53 and instead use the nss module over dbus.

The above dependencies are correct and recommended for the default
setup of /etc/resolv.conf pointing at the stub-resolv.conf as
generated by resolved at runtime.

Specifically, the dependencies as is are "too-early" if one uses the
last two modes of the /etc/resolv.conf handling as described in the
man page - https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html#/etc/resolv.conf

--
Regards,

Dimitri.
Dimitri John Ledkov
2018-04-16 16:51:19 UTC
On 16 April 2018 at 14:25, Paul Menzel
<pmenzel+systemd-***@molgen.mpg.de> wrote:
> Dear Dimitri,
>
>
> Thank you for your quick response.
>
>
> On 04/16/18 12:47, Dimitri John Ledkov wrote:
>>
>> On 13 April 2018 at 16:40, Paul Menzel wrote:
>
>
>>> In commit 1f158013 (resolved.service: set DefaultDependencies=no) the
>>> ordering of systemd-resolved.service was changed. (How do I find the
>>> merge
>>> request to find possible discussion? Also the commit message description
>>> is
>>> too specific in my opinion, as it doesn’t give a clue that more is
>>> changed.)
>>
>>
>> https://github.com/systemd/systemd/pull/7609
>
>
> Thank you, no idea, why I didn’t find it with `git log --oneline --graph`.
> Hmm, looks like, Lennart directly put that commit in master without merging
> the pull request.
>
>>> I like starting systemd-resolved earlier, but unfortunately ordering it
>>> before `network.target` adds a delay on systems wanting to start as fast
>>> as
>>> possible. But why did you change it from `network-online.target` to
>>> `network.target`? I’d say `network-online.target` is more correct.
>>>
>>> For my use case of a fast system start-up, this change delays it by at
>>> least
>>> 100 ms, as now it takes longer to reach the end of the network target.
>>
>>
>> cloud-init initializes networking configuration by fetching,
>> potentially, remote sources to customize an instance on first boot.
>> Specifically it may dhcp any interface, to reach a metadata source,
>> download the real networking configuration, reconfigure networking to
>> match the final networking details (all interfaces / public ip
>> addresses / etc), and proceed to complete networking.target and
>> network-online.target.
>>
>> This means that resolved is required earlier in the boot cycle. Before
>> networking.target.
>
>
> Just to be sure, you mean *network.target*, right?
>
> Thank you for specifying the requirement. I agree, that it should be started
> as early as possible, but I disagree with the rest.
>
>> There are things that expect network to be up in
>> "network-online.target", which by some is implied to mean DNS
>> resolution too, unfortunately.
>
>
> Sorry for being ignorant, but could you please be specific, what these
> things are. If these units have that requirement order them after
> `network-online.target`.
>
>>> If your systems have problems with it, they have wrong dependencies,
>>> don’t
>>> they? Also, they should probably be able to deal with the situation, that
>>> DNS does not work, as that can happen during operation.
>>>
>>> So, I’d really like to rework that ordering change.
>>
>>
>> Reworking that change will break certain public cloud providers
>> unfortunately because of public clouds metadata providers being odd.
>>
>> Note, we cannot use dbus activation in this case as dbus-daemon is not
>> up yet, and systemd-resolve command line client also does not work at
>> this point.
>>
>> If you want to make it an optional dependency that early, maybe it
>> will be possible to convert systemd-resolved to be socket activated on
>> tcp/udp?
>>
>> Alternatively, as a system integrator, you may want to change these
>> dependencies in your distro, especially if you do not configure
>> resolved _stub resolver_ as the default provider of /etc/resolv.conf
>> or for example to do not use the recommended default stub-provider
>> over 127.0.0.53 and instead use the nss module over dbus.
>>
>> The above dependencies are correct and recommend, for the default
>> setup of /etc/resolv.conf pointing at the stub-resolv.conf as
>> generated by resolved at runtime.
>>
>> Specifically, the dependencies as is are "too-early" if one uses the
>> last two modes of the /etc/resolv.conf handling as described in the
>> man page -
>> https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html#/etc/resolv.conf
>
>
> First, I think, the terminology of *early* leads to misunderstandings. For
> you it includes ordering with `Before=`, for me it’s just about `After=`
> statements.
>

It's actually both. Cloud-init is a cross-distribution tool, and it
injects itself at multiple points during boot. It pre-empts networking
target, is between networking & network-online, and after
network-online target.

Without this upstream change, cloud-init was not able to pre-empt
network.target: the result was a dependency cycle, and systems
booted degraded (the dependency cycle being resolved by
shooting an arbitrary unit in the head) in a default upstream systemd
configuration.


> Anyway, regressing the user experience for everyone only because it’s

Can you please explain what has degraded? Starting systemd-resolved
before or after network*.target shouldn't make any difference in wall
clock time to reach multi-user.target. And in my boot testing, I did
not see any boot regressions.

Or are you explicitly measuring time to network.target, separate from
time to network-online.target, and separate from reaching the default
target?

Have you been previously booting with network-online.target &
systemd-resolved pulled into the default boot target? And if you were
booting without them, was that expected?

I am also getting multiple support requests for networking and DNS
resolution to be available during emergency and maintenance shell
consoles, thus pulling resolved earlier made a lot of sense to give
root shell at least some ability to talk to the outside world to
download fixes to the system.


> required for cloud-init is not right in my opinion. As you pointed out, the
> system integrator can adapt certain things, and in my opinion, I throw the
> ball back to you, and kindly ask you, to adapt systemd locally so it works
> with your use-case or let’s come up with a better solution.
>

Hm... cloud-init is distribution agnostic, packaged and shipped in
most distributions. And in stock configuration, one would expect any
Linux distro to work nicely with upstream releases of cloud-init &
systemd.

Please explain the regression you have identified, to design a
solution fit for all purposes.

> Maybe a new target is needed, where you can order your services after, as
> ordering them after systemd-resolved.service is too specific.
>

Possibly, but what are your requirements which you have noticed to
have regressed that we need to fix?

> I submitted a merge/pull request to change the ordering [1].
>

-1 from me.

Please explain, in detail, the regression/bug observed before jumping
onto reverting things. It's not like things are changed without reason
/ without fixing actual production discovered bugs affecting a wide
array of users (due to public cloud nature).

>
> Kind regards,
>
> Paul
>
>
> [1] https://github.com/systemd/systemd/pull/8731
>



--
Regards,

Dimitri.
Dimitri John Ledkov
2018-04-16 23:13:48 UTC
On 16 April 2018 at 18:20, Paul Menzel
<pmenzel+systemd-***@molgen.mpg.de> wrote:
> Dear Dimitri,
>
>
> On 04/16/18 18:51, Dimitri John Ledkov wrote:
>
>> On 16 April 2018 at 14:25, Paul Menzel wrote:
>
>
>>> On 04/16/18 12:47, Dimitri John Ledkov wrote:
>>>>
>>>>
>>>> On 13 April 2018 at 16:40, Paul Menzel wrote:
>>>
>>>
>>>>> In commit 1f158013 (resolved.service: set DefaultDependencies=no) the
>>>>> ordering of systemd-resolved.service was changed. (How do I find the
>>>>> merge
>>>>> request to find possible discussion? Also the commit message
>>>>> description
>>>>> is
>>>>> too specific in my opinion, as it doesn’t give a clue that more is
>>>>> changed.)
>>>>
>>>>
>>>>
>>>> https://github.com/systemd/systemd/pull/7609
>>>
>>>
>>> Thank you, no idea, why I didn’t find it with `git log --oneline
>>> --graph`.
>>> Hmm, looks like, Lennart directly put that commit in master without
>>> merging
>>> the pull request.
>>>
>>>>> I like starting systemd-resolved earlier, but unfortunately ordering it
>>>>> before `network.target` adds a delay on systems wanting to start as
>>>>> fast
>>>>> as
>>>>> possible. But why did you change it from `network-online.target` to
>>>>> `network.target`? I’d say `network-online.target` is more correct.
>>>>>
>>>>> For my use case of a fast system start-up, this change delays it by at
>>>>> least
>>>>> 100 ms, as now it takes longer to reach the end of the network target.
>>>>
>>>>
>>>>
>>>> cloud-init initializes networking configuration by fetching,
>>>> potentially, remote sources to customize an instance on first boot.
>>>> Specifically it may dhcp any interface, to reach a metadata source,
>>>> download the real networking configuration, reconfigure networking to
>>>> match the final networking details (all interfaces / public ip
>>>> addresses / etc), and proceed to complete networking.target and
>>>> network-online.target.
>>>>
>>>> This means that resolved is required earlier in the boot cycle. Before
>>>> networking.target.
>>>
>>>
>>>
>>> Just to be sure, you mean *network.target*, right?
>>>
>>> Thank you for specifying the requirement. I agree, that it should be
>>> started
>>> as early as possible, but I disagree with the rest.
>>>
>>>> There are things that expect network to be up in
>>>> "network-online.target", which by some is implied to mean DNS
>>>> resolution too, unfortunately.
>>>
>>>
>>>
>>> Sorry for being ignorant, but could you please be specific, what these
>>> things are. If these units have that requirement order them after
>>> `network-online.target`.
>>>
>>>>> If your systems have problems with it, they have wrong dependencies,
>>>>> don’t
>>>>> they? Also, they should probably be able to deal with the situation,
>>>>> that
>>>>> DNS does not work, as that can happen during operation.
>>>>>
>>>>> So, I’d really like to rework that ordering change.
>>>>
>>>>
>>>>
>>>> Reworking that change will break certain public cloud providers
>>>> unfortunately because of public clouds metadata providers being odd.
>>>>
>>>> Note, we cannot use dbus activation in this case as dbus-daemon is not
>>>> up yet, and systemd-resolve command line client also does not work at
>>>> this point.
>>>>
>>>> If you want to make it an optional dependency that early, maybe it
>>>> will be possible to convert systemd-resolved to be socket activated on
>>>> tcp/udp?
>>>>
>>>> Alternatively, as a system integrator, you may want to change these
>>>> dependencies in your distro, especially if you do not configure
>>>> resolved _stub resolver_ as the default provider of /etc/resolv.conf
>>>> or for example to do not use the recommended default stub-provider
>>>> over 127.0.0.53 and instead use the nss module over dbus.
>>>>
>>>> The above dependencies are correct and recommend, for the default
>>>> setup of /etc/resolv.conf pointing at the stub-resolv.conf as
>>>> generated by resolved at runtime.
>>>>
>>>> Specifically, the dependencies as is are "too-early" if one uses the
>>>> last two modes of the /etc/resolv.conf handling as described in the
>>>> man page -
>>>>
>>>> https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html#/etc/resolv.conf
>>>
>>>
>>> First, I think, the terminology of *early* leads to misunderstandings.
>>> For
>>> you it includes ordering with `Before=`, for me it’s just about `After=`
>>> statements.
>>
>>
>> It's actually both. Cloud-init is a cross-distribution tool, and it
>> injects itself at multiple points during boot. It pre-empts networking
>> target, is between networking & network-online, and after
>> network-online target.
>>
>> Without this upstream change, cloud-init was not able to pre-empt
>> network.target, was resulting in a dependency cycle, and systems
>> resulted booting degraded (due to dependency cycle resolved by
>> shooting arbitrary unit in the head), in a default upstream systemd
>> configuration.
>>
>>> Anyway, regressing the user experience for everyone only because it’s
>>
>>
>> Can you please explain what has degraded? starting systemd-resolved
>> before or after network*.target shouldn't make any difference in wall
>> clock time to reach multi-user.target. And in my boot testing, I did
>> not see any boot regressions.
>
>
> Just look, what is ordered after the network target.
>
> 1. units/rc-local.service.in:After=network.target
> 2. units/systemd-user-sessions.service.in:After=remote-fs.target
> nss-user-lookup.target network.target
>
> Both are needed for the login screen.
>
>> Or are you explicitly measuring time to network.target, separate from
>> time to network-online.target, and separate from reaching the default
>> target?
>>
>> Have you been previously booting with network-online.target &
>> systemd-resolved pulled into the default boot target? And if you were
>> booting without them, was that expected
>
>
> No, `systemd-networkd-wait-online.service` is disabled.
>


That's not the default, and it makes for a very limited / offline
system (or at least an initially offline system). Most systems today
require the network to be up to be considered booted. But that's what
your system is, if I understand this setup correctly.

Is systemd-networkd.service enabled? Is network.target enabled? Is
network-online.target part of the initial boot target? Did you have
systemd-resolved enabled before?

It almost feels like you do not want systemd-resolved.service to have
[Install] WantedBy=multi-user.target, but [Install]
WantedBy=network-online.target instead. If you also ensure that
network-online.target is not pulled into the initial transaction (no
units want it), you can even keep resolved & networkd-wait-online
"enabled" yet not actually part of the initial boot transaction, or at
least not on the critical path to rc-local / systemd-user-sessions. I
think keeping the systemd-resolved ordering as is, but ensuring that it
is not pulled in when not needed (offline system), makes sense, no?
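
As a rough sketch of that idea, assuming a full local copy of the unit
at /etc/systemd/system/systemd-resolved.service (the path and the use
of a full copy are assumptions, not something tested in this thread),
only the [Install] section would change:

```ini
# Sketch only: [Install] section of a locally overridden
# systemd-resolved.service. The rest of the unit stays as shipped.
[Install]
# Was: WantedBy=multi-user.target
WantedBy=network-online.target
```

The [Install] section is only evaluated when the unit is enabled, so
the unit would have to be re-enabled (e.g. `systemctl reenable`) for
the new WantedBy= to take effect.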

Do you perchance have an old bootchart and a new one, to see what
new units got pulled in, and when, and what else is enabled?

Because systemd-resolved has also been recently ordered before
nss-lookup.target and has been made to want it.
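
For a unit that genuinely needs both connectivity and name resolution,
the ordering could look roughly like the sketch below. The unit name
and the ExecStart= command are hypothetical; the targets are the ones
discussed in this thread.

```ini
# needs-dns.service (hypothetical example unit)
[Unit]
Description=Example service that needs network and DNS
# Pull in and wait for the "network is up" synchronization point.
Wants=network-online.target
After=network-online.target
# Order after name resolution services, without pulling the target in.
After=nss-lookup.target

[Service]
# Hypothetical command, for illustration only.
ExecStart=/usr/local/bin/fetch-updates

[Install]
WantedBy=multi-user.target
```

Only units that declare these dependencies wait for them; units
ordered merely after network.target are not affected by this fragment.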

BTW do you use systemd-resolved at all, and do you need it running
ever by default?

If you (1) use libnss-resolve; (2) have no need for the systemd-resolved
stub daemon running (DNSStubListener=no); (3) have it dbus
activated only; and (4) have /etc/resolv.conf pointing at the static
/usr/lib/systemd/resolv.conf, you should disable
systemd-resolved.service from your boot.
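
Condition (2) maps to a single setting in resolved.conf. A minimal
sketch, assuming the configuration lives in /etc/systemd/resolved.conf:

```ini
# /etc/systemd/resolved.conf
[Resolve]
# Do not run the local DNS stub listener.
DNSStubListener=no
```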

If you do use DNSStubListener, I think it needs to become socket
activated. Your initially offline system (without any units
trying to hammer the network on the critical boot path) should then run
just as fast as before, and gain DNS resolution once networking is
finally up.
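
To make the socket-activation idea concrete, a hypothetical
systemd-resolved.socket might look like the sketch below. This is not
an existing upstream unit, and resolved itself would need to be taught
to accept the passed-in sockets, so it only shows the unit side of the
proposal.

```ini
# systemd-resolved.socket (hypothetical; not an existing upstream unit)
[Unit]
Description=DNS stub listener sockets for systemd-resolved (sketch)

[Socket]
# The stub address mentioned earlier in the thread, on UDP and TCP.
ListenDatagram=127.0.0.53:53
ListenStream=127.0.0.53:53

[Install]
WantedBy=sockets.target
```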

>> I am also getting multiple support requests for networking and DNS
>> resolution to be available during emergency and maintenance shell
>> consoles, thus pulling resolved earlier made a lot of sense to give
>> root shell at least some ability to talk to the outside world to
>> download fixes to the system.
>>
>>> required for cloud-init is not right in my opinion. As you pointed out,
>>> the
>>> system integrator can adapt certain things, and in my opinion, I throw
>>> the
>>> ball back to you, and kindly ask you, to adapt systemd locally so it
>>> works
>>> with your use-case or let’s come up with a better solution.
>>
>>
>> Hm... cloud-init is distribution agnostic, packaged and shipped in
>> most distributions. And in stock configuration, one would expect any
>> Linux distro to work nicely with an upstream releases of cloud-init &
>> systemd.
>
>
> Do I understand it correctly, that systemd was adapted, so that *one* tool,
> cloud-init, could work? Before systemd 237 it worked for a long time that
> DNS resolution did not necessarily had to be working before the network
> target was reached.
>

No, systemd has not been "adapted". There were multiple bugs in
systemd-resolved and its unit that were fixed to address multiple
issues:

1) It was not possible to use, or manually start, systemd-resolved
   before the dbus daemon, as it would later connect to the system bus
   and the systemd-resolve tool would not work.
2) It thus gained the ability to operate with DefaultDependencies=no.
3) It was moved to start as early as possible, such that DNS look-ups
   start working as soon as networking is up, network-online.target is
   reached, and things start hammering the network to pull updates as
   soon as possible.
4) This ultimately speeds up the boot for the majority of systems, which
   have networking on boot / expect to reach network-online.target as
   part of the initial boot transaction.
5) As a side effect, using cloud-init with systemd+networkd+resolved
   only was fixed to work on a type of cloud deployment.

Disabling wait-online is niche among commonly deployed systems.
For example, even installers often do not disable it, and instead
attempt automatic DHCP networking configuration on boot (for example,
the new Ubuntu Server Installer does that).

>> Please explain the regression you have identified, to design a
>> solution fit for all purposes.
>
>>
>>>
>>> Maybe a new target is needed, where you can order your services after, as
>>> ordering them after systemd-resolved.service is too specific.
>>
>>
>> Possibly, but what are your requirements which you have noticed to
>> have regressed that we need to fix?
>
>
> It takes longer to reach the login screen.
>

ack.

>>> I submitted a merge/pull request to change the ordering [1].
>>
>>
>> -1 from me.
>>
>> Please explain, in detail, the regression/bug observed before jumping
>> onto reverting things. It's not like things are changed without reason
>> / without fixing actual production discovered bugs affecting a wide
>> array of users (due to public cloud nature).
>
>
> I reach the console login over 100 ms earlier, when removing the ordering.
> systemd-resolved unfortunately takes so long.
>

To summarize: Do you use resolved? Do you need it at all as part of
the boot transaction? Should it be disabled too? Can we move
WantedBy=multi-user.target to network-online.target instead? Would
socket activating resolved help?

Also, 100 ms to start systemd-resolved sounds long. I wonder
what it is doing.

--
Regards,

Dimitri.