Discussion:
reboot delay due to lack of VG deactivation before cryptsetup close
Chris Murphy
2018-04-02 01:05:47 UTC
Summary: I have a LUKS-encrypted partition, which is in turn made
into a PV. So all the LVs made in this volume group are encrypted,
including vg/swap which is found in fstab. The /etc/crypttab opens
this LUKS device using a keyfile during boot.

At reboot, there is a hang for cryptsetup, and the only device being
used is vg/swap. It seems to me systemd should deactivate swap, and
then it can deactivate the VG, and then it can stop cryptsetup. I
*think* what's happening is it's not deactivating the VG, therefore
the LVs are all still active even though they aren't in use, and
therefore cryptsetup close is failing.
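
For reference, the order described above can be sketched as a small shell function. This is only an illustration of the proposed sequence, not what systemd actually runs at shutdown. The names "vg" and "swap" come from the post; the mapping name "luks-example" is hypothetical, since the post does not say what the crypttab entry is called. The function only prints the commands unless DRY_RUN=0 is set, in which case it needs root.

```shell
#!/bin/sh
# Sketch only: the teardown order argued for above, top of the stack first.
# "vg" and "swap" come from the post; "luks-example" is a hypothetical
# crypttab mapping name (the post does not say what the mapping is called).

teardown_stack() {
    vg=$1 swap_lv=$2 mapping=$3

    # Dry run by default: print each command instead of executing it.
    if [ "${DRY_RUN:-1}" = 1 ]; then run=echo; else run=; fi

    $run swapoff "/dev/$vg/$swap_lv" &&   # 1. stop using the swap LV
    $run vgchange -an "$vg" &&            # 2. deactivate every LV in the VG
    $run cryptsetup close "$mapping"      # 3. only now close the LUKS mapping
}

teardown_stack vg swap luks-example
```

Each step is chained with &&, so a failure (for example an LV that is still mounted) stops the sequence instead of letting cryptsetup close run against a busy device.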


journal log
https://drive.google.com/open?id=1PAX7RwKYi5WzwpRfPxQrS-IzhCchZy88


[***@f27s ~]$ uname -r
4.15.14-300.fc27.x86_64
[***@f27s ~]$ rpm -q systemd
systemd-234-10.git5f8984e.fc27.x86_64


[ 547.107624] f27s.localdomain systemd[1]: Deactivated swap /dev/vg/swap.

Looks like swap really is deactivated.

[ 551.832626] f27s.localdomain systemd-cryptsetup[1473]: Failed to deactivate: Device or resource busy

Before this message, I see no evidence that the VG is deactivated
first, which is probably why cryptsetup close fails. So it hangs for
1m30s.
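
One way to test this hypothesis would be to look at the open counts of the device-mapper devices just before cryptsetup close is attempted. The sketch below is an assumption about how that could be done, not something taken from the journal: it filters "name:open_count" pairs and prints the devices that are still held open. The sample input is made up to match the hypothesis (idle LVs that are still active keep the LUKS mapping busy), and "luks-example" is a hypothetical mapping name.

```shell
#!/bin/sh
# Sketch: print the names of device-mapper devices whose open count is > 0.
# Input: one "name:open_count" pair per line on stdin.
list_open() {
    awk -F: '$2 > 0 { print $1 }'
}

# On a real system this would be fed from dmsetup (needs root):
#   dmsetup info -c --noheadings --separator : -o name,open | list_open

# Made-up sample input, NOT taken from the journal in this thread:
# both LVs are unused (open count 0) but still active, so the mapping
# underneath them still has two openers.
printf '%s\n' 'luks-example:2' 'vg-root:0' 'vg-swap:0' | list_open
```

With this sample input only luks-example is printed, which is the situation in which cryptsetup close would be expected to report "Device or resource busy".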

[ 635.123287] f27s.localdomain systemd[1]: lvm2-lvmetad.service: Main process exited, code=killed, status=9/KILL

and

[ 637.132128] f27s.localdomain systemd[1]: dm-event.service: Main process exited, code=killed, status=9/KILL


And then reboot happens.

If I'm right, this has nothing to do with swap being on this VG. Even
without swap on it, the VG will still have active LVs, which will
prevent cryptsetup close from completing. The VG has to be deactivated
first.
--
Chris Murphy
Andrei Borzenkov
2018-04-02 17:02:56 UTC
Post by Chris Murphy
If I'm right, this has nothing to do with swap being on this VG. Even
without swap on it, the VG will still have active LVs, which will
prevent cryptsetup close from completing. The VG has to be deactivated
first.
You are most likely right, but I'd say that is something LVM folks
should implement. So I would say this mail lacks at least one more
address in To or Cc :)

Of course there is the generic problem of ordering services that are
responsible for configuring devices. Such a service cannot be ordered
After the device, because otherwise the device would never appear in
the first place. systemd refuses to order services Before a device (it
fails with an error message). So we can only add Requires or BindsTo
and hope for the best. But it also means that e.g. cryptsetup is not
ordered against services that need the encrypted device, so during
stop they will run concurrently.

This is partially mitigated by cryptsetup being ordered very early
during boot and hence very late during shutdown, so most services are
already stopped at this point. But if you are going to support
arbitrarily deep stacks of nested virtual devices, this will become a
real problem.

systemd needs a clean framework to properly order the service
responsible for setting up and tearing down a device against all other
units needing this device.
Chris Murphy
2018-04-03 00:00:49 UTC
Post by Andrei Borzenkov
You are most likely right, but I'd say that is something LVM folks
should implement. So I would say this mail lacks at least one more
address in To or Cc :)
I've tried to subscribe to the LVM list, and I've received neither a
success nor a failure message, and my emails to the list appear to go
to /dev/null - and the same for the list owner. *shrug* It's almost
like there's symmetry here.
Call me oblivious, or maybe a wishful thinker, but both LVM and
dm-crypt are ultimately owned by device-mapper. It seems to me systemd
would ideally, in order: umount everything in reverse order except /
(meaning also deactivate swap) and then "simply" send a message to
device-mapper saying, in effect, that it's reboot time - and then
device-mapper handles the teardown of its own stuff in the proper
order, whatever that is.

A crude way to approximate it would be to umount the top level (file
systems, and deactivate swap), and then simultaneously "stop"
everything else. Wait. And then do it again. Wait. Then do it again.
By clobbering everyone at the same time, they all end up shutting down
in the proper sequence just by their nature.
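
The "clobber, wait, repeat" idea can be sketched in shell roughly as follows. This is only an illustration of the approach described above, with no claim about how systemd behaves. The function name teardown_passes and the PASS_DELAY variable are made up for this sketch, and the two small hook functions assume that device names can be read from /sys/block/dm-*/dm/name and that "dmsetup remove" is an acceptable way to drop a device. Running it for real needs root and would tear down every device-mapper device, so the block only defines the functions.

```shell
#!/bin/sh
# Sketch of the "try everything, wait, try again" teardown described above.
# All function and variable names here are hypothetical.

# Hook: print the names of all remaining device-mapper devices.
list_devices() {
    for f in /sys/block/dm-*/dm/name; do
        if [ -e "$f" ]; then cat "$f"; fi
    done
}

# Hook: try to remove one device; fails while something still holds it open.
remove_device() {
    dmsetup remove "$1"
}

# Make up to $1 passes (default 5). In each pass, try to remove every
# remaining device; devices that are still busy are retried in the next pass.
# Returns 0 once nothing is left, 1 if devices remain after the last pass.
teardown_passes() {
    max_passes=${1:-5}
    pass=1
    while [ "$pass" -le "$max_passes" ]; do
        devices=$(list_devices)
        if [ -z "$devices" ]; then return 0; fi
        for dev in $devices; do
            remove_device "$dev" 2>/dev/null || true
        done
        sleep "${PASS_DELAY:-1}"
        pass=$((pass + 1))
    done
    [ -z "$(list_devices)" ]
}
```

With a LUKS mapping under two LVs, the first pass would fail on the mapping and remove the LVs, and the second pass would then be able to remove the mapping, so the stack comes apart in the right order without anyone having to know that order in advance.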

As it is, a 1m30s delay is not the worst thing in the world. But
systemd folks have spent a lot of effort on faster boot times. And
this problem effectively makes reboot slow, by a lot.
--
Chris Murphy