[systemd-devel] [systemd-commits] units/basic.target units/poweroff.target units/reboot.target

Post by Lennart Poettering
Ahum.
This needs more discussion.

units: disable job timeouts
For boot, we might kill fsck in the middle, with likely catastrophic
consequences.

This I can agree with for now. However, we really should revisit this.

Yeah, that was supposed to be temporary, until we figure things out.

Isn't this what the various "download updates and reboot" gnome-y
things are doing?

Zbyszek

Colin Guthrie

2014-11-06 08:57:06 UTC

Post by Lennart Poettering
However, this one appears bogus to me. Is there any such software
around that really does this? And if so, this really appears weird to
me to support. Delaying shutdown for more than 30min is just wrong.

Isn't this what the various "download updates and reboot" gnome-y
things are doing?

I thought they rebooted into a special mode and then did their upgrades
and then rebooted again. I don't think the updates are actually
performed *while* rebooting. But I could be wrong here as I've not
looked at it too closely yet (as much as I would like to).

Col
--
Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
Tribalogic Limited http://www.tribalogic.net/
Open Source:
Mageia Contributor http://www.mageia.org/
PulseAudio Hacker http://www.pulseaudio.org/
Trac Hacker http://trac.edgewall.org/

Lennart Poettering

2014-11-06 11:48:23 UTC

Isn't this what the various "download updates and reboot" gnome-y
things are doing?

No, they shutdown, reboot into a special mode, install, reboot again.

Lennart

--
Lennart Poettering, Red Hat

Zbigniew Jędrzejewski-Szmek

2014-11-06 13:24:13 UTC

Isn't this what the various "download updates and reboot" gnome-y
things are doing?

No, they shutdown, reboot into a special mode, install, reboot again.

I know that package *installation* is done after reboot. I was thinking that
the *download* was done during shutdown. But it appears that the 'install
and shutdown' dialog button appears only after they have been downloaded.
In other words, they are downloaded while system is running. It seems that
indeed there's no reason for long shutdowns right now. I have now re-enabled the
timeouts in git.

Zbyszek

Patrick Häcker

2014-11-06 11:45:25 UTC

Isn't this what the various "download updates and reboot" gnome-y
things are doing?

At least unattended-upgrades from Debian/Ubuntu/... can be configured to
install updates on shutdown (without any special mode or something). And,
yes, this can run for more than 30 minutes, which I could already observe in
its default mode (installing during normal system activities), so I see no
reason why this should not happen when configured to install during shutdown.
The reason is that unattended-upgrades can basically update the whole
distribution to the next version, which naturally can take a lot of time.

It's questionable if this is a sane setup, but I can think of setups where
this might be useful, e.g. having two identically configured servers for
redundancy reasons where one server would be enough. Then it might make sense
to update one system during shutdown while the other one takes over. This has
the advantage that normally running servers either have the old or the new
state, but never some intermediate state during the update. The shutdown time
does not really matter in this case and a watchdog killing the system
wouldn't be welcome. But all in all this seems like an exotic use case.

Kind regards
Patrick

Lennart Poettering

2014-11-06 13:28:12 UTC

Isn't this what the various "download updates and reboot" gnome-y
things are doing?

At least unattended-upgrades from Debian/Ubuntu/... can be configured to
install updates on shutdown (without any special mode or something). And,
yes, this can run for more than 30 minutes, which I could already observe in
its default mode (installing during normal system activities), so I see no
reason why this should not happen when configured to install during shutdown.
The reason is that unattended-upgrades can basically update the whole
distribution to the next version, which naturally can take a lot of time.
It's questionable if this is a sane setup, but I can think of setups where
this might be useful, e.g. having two identically configured servers for
redundancy reasons where one server would be enough. Then it might make sense
to update one system during shutdown while the other one takes over. This has
the advantage that normally running servers either have the old or the new
state, but never some intermediate state during the update. The shutdown time
does not really matter in this case and a watchdog killing the system
wouldn't be welcome. But all in all this seems like an exotic use case.

Is "unattended-upgrades" a package of its own? If so, I'd probably ask
the packagers to include drop-ins for reboot.target to override the
timeout. That way, as soon as you install it the shutdown timeouts are
disabled.
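A minimal sketch of such a drop-in, assuming the shutdown timeout in question is the JobTimeoutSec= setting on reboot.target. The file name is hypothetical, and a package would ship it under /usr/lib/systemd/system/reboot.target.d/ rather than /etc. The `infinity` spelling is what current systemd.unit(5) documents for "no timeout"; older versions may differ, so check the man page for the version in use:

```ini
# /etc/systemd/system/reboot.target.d/no-job-timeout.conf (hypothetical name)
# Any *.conf file in reboot.target.d/ is merged into reboot.target,
# so only the settings being overridden need to be listed.
[Unit]
# Disable the job timeout; with no timeout, JobTimeoutAction= never fires.
JobTimeoutSec=infinity
```

A matching file would be needed for poweroff.target if it carries the same timeout, followed by `systemctl daemon-reload`.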

I think we should find good defaults that work for most cases, and
make things robust in the common case. Now, an fsck or selinux relabel
taking a long time is a pretty common case, we shouldn't break that,
hence I figure turning off the boot timeout is probably a good
idea. However, doing unattended upgrades at shutdown is not really a
common case.

Lennart

Zbigniew Jędrzejewski-Szmek

2014-11-06 13:44:00 UTC

Isn't this what the various "download updates and reboot" gnome-y
things are doing?

At least unattended-upgrades from Debian/Ubuntu/... can be configured to
install updates on shutdown (without any special mode or something). And,
yes, this can run for more than 30 minutes, which I could already observe in
its default mode (installing during normal system activities), so I see no
reason why this should not happen when configured to install during shutdown.
The reason is that unattended-upgrades can basically update the whole
distribution to the next version, which naturally can take a lot of time.
It's questionable if this is a sane setup, but I can think of setups where
this might be useful, e.g. having two identically configured servers for
redundancy reasons where one server would be enough. Then it might make sense
to update one system during shutdown while the other one takes over. This has
the advantage that normally running servers either have the old or the new
state, but never some intermediate state during the update. The shutdown time
does not really matter in this case and a watchdog killing the system
wouldn't be welcome. But all in all this seems like an exotic use case.

That is suboptimal. There really should be a way to do this dynamically, like saying:
I'm a long-running job, I need more time, but everything is still fine. This
type of status should require periodic pings, watchdog style. Let's say that
a backup job running during shutdown hangs because there's no network: we want
to shut down at some point anyway.

Zbyszek

Zbigniew Jędrzejewski-Szmek

2014-11-06 14:08:02 UTC

On a related note: if I read the code correctly, reboot -f or
JobFailureAction=reboot-force should sync the filesystems. But this doesn't
seem to work:
- on fedora-devel Adam W. said that fsck ran after a boot timeout
- yesterday I did something like 'sudo install ./systemd /usr/lib/systemd/ && sudo reboot -f'
and ended up with an _empty_ file in /usr/lib/systemd/systemd.
Am I missing something? Does this look like a kernel bug, or some
timing issue? Maybe the sync is running asynchronously?

Zbyszek

Lennart Poettering

2014-11-06 14:22:50 UTC

Post by Zbigniew Jędrzejewski-Szmek
On a related note: if I read the code correctly, reboot -f or
JobFailureAction=reboot-force should sync the filesystems. But this doesn't
seem to work:
- on fedora-devel Adam W. said that fsck ran after a boot timeout
- yesterday I did something like 'sudo install ./systemd /usr/lib/systemd/ && sudo reboot -f'
and ended up with an _empty_ file in /usr/lib/systemd/systemd.
Am I missing something? Does this look like a kernel bug, or some
timing issue? Maybe the sync is running asynchronously?

Well, it will sync() but the fs will still be dirty and thus require
fsck on next boot.

If you want to avoid that, you need to unmount all file systems before
rebooting.

That said, while we have the sync() in place before we invoke
reboot() during clean shutdowns (see shutdown.c) we actually are
missing that when you invoke "reboot -f". Or, more correctly: we
*were* missing it until 30s ago, when I added it there too.

I'd really recommend running "systemctl reboot -f" though in such
emergency situations, since it will immediately reboot, but still
umount all file systems before. "systemctl reboot -ff" (which is the
same as "reboot -f") is really just the last emergency if PID 1 is
hung.

Lennart

Zbigniew Jędrzejewski-Szmek

2014-11-06 15:18:17 UTC

Well, it will sync() but the fs will still be dirty and thus require
fsck on next boot.

Well, I'd expect the journal to be replayed without any fsck.

Post by Lennart Poettering
That said, while we have the sync() in place before we invoke
reboot() during clean shutdowns (see shutdown.c) we actually are
missing that when you invoke "reboot -f". Or, more correctly: we
*were* missing it until 30s ago, when I added it there too.

Thanks. That completely explains what I was seeing.

Post by Lennart Poettering
I'd really recommend running "systemctl reboot -f" though in such
emergency situations, since it will immediately reboot, but still
umount all file systems before. "systemctl reboot -ff" (which is the
same as "reboot -f") is really just the last emergency if PID 1 is
hung.

Zbyszek

Lennart Poettering

2014-11-10 21:53:46 UTC

Isn't this what the various "download updates and reboot" gnome-y
things are doing?

At least unattended-upgrades from Debian/Ubuntu/... can be configured to
install updates on shutdown (without any special mode or something). And,
yes, this can run for more than 30 minutes, which I could already observe in
its default mode (installing during normal system activities), so I see no
reason why this should not happen when configured to install during shutdown.
The reason is that unattended-upgrades can basically update the whole
distribution to the next version, which naturally can take a lot of time.
It's questionable if this is a sane setup, but I can think of setups where
this might be useful, e.g. having two identically configured servers for
redundancy reasons where one server would be enough. Then it might make sense
to update one system during shutdown while the other one takes over. This has
the advantage that normally running servers either have the old or the new
state, but never some intermediate state during the update. The shutdown time
does not really matter in this case and a watchdog killing the system
wouldn't be welcome. But all in all this seems like an exotic use case.

I'm a long-running job, I need more time, but everything is still fine. This
type of status should require periodic pings, watchdog style. Let's say that
a backup job running during shutdown hangs because there's no network: we want
to shut down at some point anyway.

So, we always had per-unit timeouts in place, and they are opt-out
(with the exception of Type=oneshot services where they are
opt-in). Hence adding a second level of opt-out timeouts doesn't
sound particularly attractive to me.

The reason I added the system-wide startup/shutdown timeouts was
really to be a safety net, so that the individual per-unit timeouts
and the opted-out exceptions don't add up beyond bounds.
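The two layers described here can be sketched in unit-file form. The unit name, the script path and the 2h/15min values are illustrative assumptions, not what systemd ships: a Type=oneshot service opts in to its own start timeout, while a drop-in for basic.target provides the boot-wide safety net that fires regardless of what individual units chose:

```ini
# /etc/systemd/system/example-long-task.service (hypothetical name)
[Unit]
Description=Example long-running early-boot task (hypothetical)
DefaultDependencies=no
Before=basic.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/example-long-task
# Per-unit layer: Type=oneshot has no start timeout by default,
# so the bound has to be opted in to explicitly.
TimeoutStartSec=2h

[Install]
WantedBy=basic.target

# /etc/systemd/system/basic.target.d/safety-net.conf (hypothetical name)
[Unit]
# System-wide layer: if the basic.target job has not completed
# within 15 minutes, force the machine off.
JobTimeoutSec=15min
JobTimeoutAction=poweroff-force
```

With these example values the 15-minute net would cut the 2-hour task short, which is the kind of conflict this thread is about when the long task is an fsck or an SELinux relabel.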

The specific usecase I had for this (beyond the obvious one in
embedded/HA setups) was my Lenovo Yoga laptop. It has the power button
on the outside, that is reachable even when the laptop is closed (this
is due to the fact that it is convertible into tablet mode, where the
button needs to stay accessible). Now, if the system is suspended, the
lid is closed, and the power button is accidentally hit because the
laptop was stuffed in a backpack, then since a couple of versions ago
this is not a problem: after a short while the system will suspend
again. However, if the machine is powered off with the lid closed and
the power button is hit, the machine currently powers up, then boots up
until the LUKS prompt is hit and then just hangs there, forever,
heating up my backpack, so then when I finally unpack it the battery
is completely empty. I figure this is not only an issue with laptops
like the Yoga but in general with all kinds of devices.

Now, the question is what we can do now about this:

a) we could move logind into early boot. This has multiple problems
though: it would need to track system state as gettys on other ttys
should only be started in multi-user mode, not in early boot. Also,
the behaviour would probably not be ideal: I think it would be
preferable if the system shuts down rather than suspends if we hang
during boot.

b) specifically do something about LUKS prompt timeouts: when a very
long timeout is hit for essential devices we could simply turn off
the machine again. This would fix my immediate problem, but I am
not sure I like it too much; I think other hangs should really be
covered too...

c) we can come up with a scheme that explicitly excludes fsck, selinux
relabel and so on from the overall-timeout. Sounds messy and
non-obvious given that they all have individual timeouts
anyway... Two layers of opting out of timeouts sounds suspicious?

Any other ideas?

Lennart

Zbigniew Jędrzejewski-Szmek

2014-11-11 03:48:31 UTC

Isn't this what the various "download updates and reboot" gnome-y
things are doing?

At least unattended-upgrades from Debian/Ubuntu/... can be configured to
install updates on shutdown (without any special mode or something). And,
yes, this can run for more than 30 minutes, which I could already observe in
its default mode (installing during normal system activities), so I see no
reason why this should not happen when configured to install during shutdown.
The reason is that unattended-upgrades can basically update the whole
distribution to the next version, which naturally can take a lot of time.
It's questionable if this is a sane setup, but I can think of setups where
this might be useful, e.g. having two identically configured servers for
redundancy reasons where one server would be enough. Then it might make sense
to update one system during shutdown while the other one takes over. This has
the advantage that normally running servers either have the old or the new
state, but never some intermediate state during the update. The shutdown time
does not really matter in this case and a watchdog killing the system
wouldn't be welcome. But all in all this seems like an exotic use case.

Agreed.

Post by Lennart Poettering
The reason I added the system-wide startup/shutdown timeouts was
really to be a safety net, so that the individual per-unit timeouts
and the opted-out exceptions don't add up beyond bounds.

I guess that this is part of the issue: it is hard to define what
"without bounds" means. A fsck, selinux relabel, package
installation and probably many other things are effectively unbounded.
And they might happen together at the same boot. So any kind of
fixed limit is unlikely to work in the general case.

[snip Yoga case]
Sure, it solves this specific problem, but it causes significant
problems in other configurations. It seems that we're trying to solve
the problem in the wrong place. Even with the current JobTimeout
configured for basic.target there's a big window of opportunity for
the system to hang before systemd-logind.service is
started. systemd-logind.service has After=nss-user-lookup.target, and
I can imagine things going wrong there, especially with custom
configurations. It would be nice if the guard we put in place would
cover this too.

Post by Lennart Poettering
a) we could move logind into early boot. This has multiple problems
though: it would need to track system state as gettys on other ttys
should only be started in multi-user mode, not in early boot. Also,
the behaviour would probably not be ideal: I think it would be
preferable if the system shuts down rather than suspends if we hang
during boot.
b) specifically do something about LUKS prompt timeouts: when a very
long timeout is hit for essential devices we could simply turn off
the machine again. This would fix my immediate problem, but I am
not sure I like it too much; I think other hangs should really be
covered too...
c) we can come up with a scheme that explicitly excludes fsck, selinux
relabel and so on from the overall-timeout. Sounds messy and
non-obvious given that they all have individual timeouts
anyway... Two layers of opting out of timeouts sounds suspicious?

No good ideas so far. But whatever we do, I think we should treat
portable and non-portable devices differently. The trade-offs are
simply different. Otherwise, we could simply make this opt-in. After
all, designing the power button so that it can be pushed
accidentally is a special feat of design that does not happen too
often. (*)

Zbyszek

(*) I remember one server machine with the power button in the wrong
place which ended with the owner taking a screwdriver and ripping the
damn thing out.

Jóhann B. Guðmundsson

2014-11-06 14:12:52 UTC

Post by Lennart Poettering
However, doing unattended upgrades at shutdown is not really a
common case.

Well, for Debian and Debian-based distributions it most certainly can be
the case, since Debian has allowed its update/upgrade mechanism to be
configured to install updates on shutdown.

In Fedora we had at least a couple of cases where users were doing the same.

And if memory serves me correctly, at least one case was a
large installation at a university where they had a cron job that updated
the lab computers and then shut them down at a specific time of day, or
updated them before they got shut down.

In both cases (cron job, shutdown command) I think the solution would
be to create a unit that is installed and executed before the shutdown
target.
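A minimal sketch of such a unit, assuming a site-provided script does the actual update (the unit name, the script path and the 1h limit are hypothetical). This variant uses the common RemainAfterExit=/ExecStop= pattern rather than a oneshot pulled in by shutdown.target: the service is started as a no-op at boot, and the work happens when systemd stops it while shutting down or rebooting:

```ini
# /etc/systemd/system/update-on-shutdown.service (hypothetical name)
[Unit]
Description=Install pending updates at shutdown (hypothetical example)
# Stop jobs run in reverse start order, so ordering this unit after
# network-online.target lets ExecStop= run while the network is still up.
Wants=network-online.target
After=network-online.target
# Default dependencies (kept here) already add Conflicts= and Before=
# on shutdown.target, so the unit is stopped before that target is reached.

[Service]
Type=oneshot
# Nothing to do at boot; staying "active" is what makes systemd
# stop the unit, and thus run ExecStop=, on poweroff and reboot.
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/local/sbin/install-pending-updates
# Per-unit bound on the stop step; illustrative value.
TimeoutStopSec=1h

[Install]
WantedBy=multi-user.target
```

Since the work happens inside the shutdown transaction, any job timeout on the shutdown targets still applies on top of TimeoutStopSec=, and a manual `systemctl stop` of the unit would run the update as well.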

JBG

Zbigniew Jędrzejewski-Szmek

2014-11-06 14:16:50 UTC

Post by Jóhann B. Guðmundsson

Post by Lennart Poettering
However, doing unattended upgrades at shutdown is not really a
common case.

Well, for Debian and Debian-based distributions it most certainly can
be the case, since Debian has allowed its update/upgrade mechanism
to be configured to install updates on shutdown.
In Fedora we had at least a couple of cases where users were doing the same.
And if memory serves me correctly, at least one case was a
large installation at a university where they had a cron job that
updated the lab computers and then shut them down at a specific time
of day, or updated them before they got shut down.
In both cases (cron job, shutdown command) I think the solution
would be to create a unit that is installed and executed before the
shutdown target.

What matters is how it is all arranged:

- if there's a job that does stuff, and then calls reboot or shutdown
- a hook into the shutdown or reboot target does some work

In the first case, things are fine. In the second, not. But it seems
that only the first case is actually used.
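The first arrangement, a job that does its work and then requests the shutdown itself, might look like this with a systemd timer standing in for the lab's cron job. The unit names, the script path and the 18:00 schedule are assumptions; on distributions that still ship systemctl in /bin, the ExecStartPost= path has to be adjusted:

```ini
# /etc/systemd/system/lab-update.service (hypothetical name)
[Unit]
Description=Update lab machine, then power off (hypothetical example)
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/lab-update
# Runs only if ExecStart= succeeded. --no-block queues the poweroff
# job instead of waiting for it from inside a unit it will stop.
ExecStartPost=/usr/bin/systemctl --no-block poweroff

# /etc/systemd/system/lab-update.timer (hypothetical name)
[Unit]
Description=Run lab-update.service every evening

[Timer]
# Activates the service with the same name at 18:00 each day.
OnCalendar=*-*-* 18:00:00

[Install]
WantedBy=timers.target
```

Because the update has finished before poweroff.target is even queued, the shutdown itself stays short and the target-level timeouts never come into play; if the update fails, the machine simply stays up.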

Zbyszek

Simon McVittie

2014-11-06 15:21:33 UTC

Post by Zbigniew Jędrzejewski-Szmek
- if there's a job that does stuff, and then calls reboot or shutdown
- a hook into the shutdown or reboot target does some work

unattended-upgrades is currently the latter: the user shuts down (or is
reminded to shut down by an update notification), and
unattended-upgrades runs as a side-effect.

This is an optional (non-default) configuration of an optional package,
not core Debian/Ubuntu functionality; so it doesn't necessarily have to
be like this forever, it could be modified to tell systemd "I'm still
shutting down, continue to wait" periodically, it could be modified to
use "reboot into a special mode, install, then reboot again" logic under
systemd if that's something you already have, and, worst-case, it could
install a drop-in to override the timeout.

The default configuration is currently to perform the upgrades in-place
and prompt the user to reboot when convenient, just like a manual
apt/zypp/etc. run would.

I have worked on improving u-u's upgrade-during-shutdown mode for
SteamOS, where it is actively used in that mode (SteamOS doesn't use
systemd yet, as far as I know). With my patchset, it pre-downloads all
necessary packages and performs a pre-prepared transaction during
shutdown. Without my patchset, it downloads stuff during shutdown, which
seemed rather more precarious than anyone wants.

S

Zbigniew Jędrzejewski-Szmek

2014-11-06 15:24:34 UTC

Post by Simon McVittie

Post by Zbigniew Jędrzejewski-Szmek
- if there's a job that does stuff, and then calls reboot or shutdown
- a hook into the shutdown or reboot target does some work

unattended-upgrades is currently the latter: the user shuts down (or is
reminded to shut down by an update notification), and
unattended-upgrades runs as a side-effect.
This is an optional (non-default) configuration of an optional package,
not core Debian/Ubuntu functionality; so it doesn't necessarily have to
be like this forever, it could be modified to tell systemd "I'm still
shutting down, continue to wait" periodically, it could be modified to
use "reboot into a special mode, install, then reboot again" logic under
systemd if that's something you already have, and, worst-case, it could
install a drop-in to override the timeout.
The default configuration is currently to perform the upgrades in-place
and prompt the user to reboot when convenient, just like a manual
apt/zypp/etc. run would.
I have worked on improving u-u's upgrade-during-shutdown mode for
SteamOS, where it is actively used in that mode (SteamOS doesn't use
systemd yet, as far as I know). With my patchset, it pre-downloads all
necessary packages and performs a pre-prepared transaction during
shutdown. Without my patchset, it downloads stuff during shutdown, which
seemed rather more precarious than anyone wants.

Hm, so maybe I was a bit hasty with the revert. It doesn't really matter
if download+updates or just updates are done during shutdown. In either
case, a fixed timeout is dangerous.

Zbyszek

Colin Guthrie

2014-11-06 16:59:45 UTC

Post by Simon McVittie

Post by Zbigniew Jędrzejewski-Szmek
- if there's a job that does stuff, and then calls reboot or shutdown
- a hook into the shutdown or reboot target does some work

Was there not talk of teaching the sd-notify protocol the ability to
tell systemd that "I'm still alive and doing stuff - so please don't
kill me"?

A sort of keep-alive (or keep-me-in-this-state-please) ping.

Not sure if that ever came to pass but I remember seeing a discussion
kicking around the list about this.

Col

Lennart Poettering

2014-11-10 19:25:37 UTC

Post by Colin Guthrie

Post by Simon McVittie

Post by Zbigniew Jędrzejewski-Szmek
- if there's a job that does stuff, and then calls reboot or shutdown
- a hook into the shutdown or reboot target does some work

Was there not talk of teaching the sd-notify protocol the ability to
tell systemd that "I'm still alive and doing stuff - so please don't
kill me"?

That has existed for quite some time. It's the WATCHDOG= field in sd_notify().

Not sure if this is really useful here though...
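For reference, the unit side of that mechanism is sketched below (the unit name and binary path are hypothetical). WatchdogSec= is a liveness check: if the keep-alive pings stop, systemd treats the service as failed. It does not lengthen TimeoutStopSec= or a target's JobTimeoutSec=, which may be part of why it does not map directly onto the long-shutdown case:

```ini
# /etc/systemd/system/example-daemon.service (hypothetical name)
[Unit]
Description=Daemon supervised by the systemd watchdog (hypothetical example)

[Service]
# The daemon reports readiness with sd_notify(0, "READY=1").
Type=notify
ExecStart=/usr/local/sbin/example-daemon
# The daemon must send sd_notify(0, "WATCHDOG=1") at least this often;
# systemd passes the interval to it in $WATCHDOG_USEC.
WatchdogSec=30s
# A missed ping marks the service as failed; restart it in that case.
Restart=on-failure
```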

Lennart

Patrick Häcker

2014-11-07 14:02:49 UTC

Post by Lennart Poettering
Is "unattended-upgrades" a package of its own?

Yes, it's a separate package (although it's obviously closely coupled with
the apt package manager).

Post by Lennart Poettering
If so, I'd probably ask the packagers to include drop-ins for
reboot.target to override the timeout. That way, as soon as you install
it the shutdown timeouts are disabled.

That should be possible. Currently the package contains:

[Unit]
Description=Unattended Upgrades
DefaultDependencies=no
Before=shutdown.target reboot.target halt.target
Documentation=man:unattended-upgrade(8)
[Service]
Type=oneshot
ExecStart=/usr/share/unattended-upgrades/unattended-upgrade-shutdown
[Install]
WantedBy=shutdown.target

Only the maintainer Michael Vogt can decide if he wants to go in that
direction, thus I added him as CC.

@Michael Vogt:
The discussion is about adding a watchdog to systemd to power down the system
if the shutdown takes longer than some time (i.e. 30 minutes). The question
was how to avoid killing unattended-upgrade during a longer upgrade if it is
configured to update the packages at shutdown.

Kind regards
Patrick

Zbigniew Jędrzejewski-Szmek

2014-11-07 14:39:15 UTC

Post by Patrick Häcker
That should be possible. Currently the package contains

Only the maintainer Michael Vogt can decide if he wants to go in that
direction, thus I added him as CC.
The discussion is about adding a watchdog to systemd to power down the system
if the shutdown takes longer than some time (i.e. 30 minutes). The question
was how to avoid killing unattended-upgrade during a longer upgrade if it is
configured to update the packages at shutdown.

Systemd has to provide a mechanism, but how it is to be implemented is our
problem. I don't think that this is a question for maintainers of other packages
to answer.