[systemd-devel] dev-root.device is not active, results in an umount spree

Discussion:

[systemd-devel] dev-root.device is not active, results in an umount spree

Dimitri John Ledkov

2015-05-13 11:48:14 UTC

I am booting without initramfs, using a plan9 filesystem as rootfs in a kvm.

Thus my /proc/cmdline has:
root=/dev/root rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p

# mount
/dev/root on / type 9p
(rw,relatime,sync,dirsync,rw,trans=virtio,version=9p2000.L)

Yet, dev-root.device is dead:
# systemctl status dev-root.device
● dev-root.device
Loaded: loaded
Active: inactive (dead)

This is very bad. As a harmless action like following:

# mount --bind /opt /opt

Results in opt.mount unit to be generated which BindsTo
dev-root.device, which is inactive, thus systemd tries to stop that
unit straight away, and umount fails and is retried infinitely...
effectively DoSing init.

What am I missing and/or how can I force make dev-root.device to be
considered active?

I am planning to locally patch mount_add_device_links to skip if what
is "/dev/root", to avoid a call to unit_add_node_link... But I'm not
sure if this is the right thing to do or not.

--
Regards,

Dimitri.
Pura Vida!

https://clearlinux.org
Open Source Technology Center
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ.

Martin Pitt

2015-05-13 11:53:19 UTC

Hey Dimitri,

Post by Dimitri John Ledkov
# systemctl status dev-root.device
● dev-root.device
Loaded: loaded
Active: inactive (dead)
# mount --bind /opt /opt
Results in opt.mount unit to be generated which BindsTo
dev-root.device, which is inactive, thus systemd tries to stop that
unit straight away, and umount fails and is retried infinitely...
effectively DoSing init.

For the record, I got a similar bug report a while ago:
https://bugs.launchpad.net/systemd/+bug/1444402
This is reproducible in a container with udev running, see the small
reproducer in the bug trail.

This is on my plate to investigate/fix, I just got interrupted by a
couple of security issues, so not this week.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Dimitri John Ledkov

2015-05-13 12:10:27 UTC

Heya,

Post by Martin Pitt
Hey Dimitri,

Post by Dimitri John Ledkov
# systemctl status dev-root.device
● dev-root.device
Loaded: loaded
Active: inactive (dead)
# mount --bind /opt /opt
Results in opt.mount unit to be generated which BindsTo
dev-root.device, which is inactive, thus systemd tries to stop that
unit straight away, and umount fails and is retried infinitely...
effectively DoSing init.

https://bugs.launchpad.net/systemd/+bug/1444402
This is reproducible in a container with udev running, see the small
reproducer in the bug trail.
This is on my plate to investigate/fix, I just got interrupted by a
couple of security issues, so not this week.

Yes, quite.

So this affects / affected rocket, dracut, ostree, rkt and any other
similar containers / VMs.

Colin, has this been addressed and fixed at all? Or was this
workaround with "don't do this".

My current "plug" is to exclude umount har har.

--
Regards,

Dimitri.
Pura Vida!

https://clearlinux.org
Open Source Technology Center
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ.

Lennart Poettering

2015-05-13 12:43:49 UTC

Post by Dimitri John Ledkov
I am booting without initramfs, using a plan9 filesystem as rootfs in a kvm.
root=/dev/root rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p
# mount
/dev/root on / type 9p
(rw,relatime,sync,dirsync,rw,trans=virtio,version=9p2000.L)
# systemctl status dev-root.device
● dev-root.device
Loaded: loaded
Active: inactive (dead)

Yeah, this /dev/root thing is really weird in the kernel. It's not an
actual device, it's just a weird string. We probably should filter it
out entirely, and never ever generate a unit dependency for it.

Please check if this fixes things for you:

http://cgit.freedesktop.org/systemd/systemd/commit/?id=7ba2711d3fd283c389db2a1e7b9598ba9f0dac0c

That said, I figure you should backport
628c89cc68ab96fce2de7ebba5933725d147aecc and friends to your tree,
which should also make this problem go away for you.

Post by Dimitri John Ledkov
I am planning to locally patch mount_add_device_links to skip if what
is "/dev/root", to avoid a call to unit_add_node_link... But I'm not
sure if this is the right thing to do or not.

It is, it's the change I made now, too.

Lennart

--
Lennart Poettering, Red Hat

Dimitri John Ledkov

2015-05-13 13:20:28 UTC

Post by Lennart Poettering

Post by Dimitri John Ledkov
I am booting without initramfs, using a plan9 filesystem as rootfs in a kvm.
root=/dev/root rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p
# mount
/dev/root on / type 9p
(rw,relatime,sync,dirsync,rw,trans=virtio,version=9p2000.L)
# systemctl status dev-root.device
● dev-root.device
Loaded: loaded
Active: inactive (dead)

Yeah, this /dev/root thing is really weird in the kernel. It's not an
actual device, it's just a weird string. We probably should filter it
out entirely, and never ever generate a unit dependency for it.
http://cgit.freedesktop.org/systemd/systemd/commit/?id=7ba2711d3fd283c389db2a1e7b9598ba9f0dac0c

Yeap. All good now, can this go into v219-stable tree as well please?

Post by Lennart Poettering
That said, I figure you should backport
628c89cc68ab96fce2de7ebba5933725d147aecc and friends to your tree,
which should also make this problem go away for you.

I have these via stable tree.

Post by Lennart Poettering

Post by Dimitri John Ledkov
I am planning to locally patch mount_add_device_links to skip if what
is "/dev/root", to avoid a call to unit_add_node_link... But I'm not
sure if this is the right thing to do or not.

It is, it's the change I made now, too.

Ok, cool.

--
Regards,

Dimitri.
Pura Vida!

https://clearlinux.org
Open Source Technology Center
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ.

Martin Pitt

2015-05-14 10:51:37 UTC

Hello all,

Post by Dimitri John Ledkov
I am booting without initramfs, using a plan9 filesystem as rootfs in a kvm.
root=/dev/root rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p
# mount
/dev/root on / type 9p
(rw,relatime,sync,dirsync,rw,trans=virtio,version=9p2000.L)
# systemctl status dev-root.device
â dev-root.device
Loaded: loaded
Active: inactive (dead)
# mount --bind /opt /opt
Results in opt.mount unit to be generated which BindsTo
dev-root.device, which is inactive, thus systemd tries to stop that
unit straight away, and umount fails and is retried infinitely...
effectively DoSing init.

As I mentioned before, simply ignoring /dev/root doesn't help in all
cases, and hardcoding it in the code is a bit ugly.

The attached patch is a more general solution which stops creating
dead stub .device units for devices which don't exist at all. I. e.
while in your case -.mount was probably BindsTo=dev-root.device (or in
my case, in my container I had mnt.mount BindsTo=dev-sda3.device,
which exists on the host but not in the container), but with this
patch it should now not be bound to anything.

Lennart, if you don't like this there is an alternative, and more
complicated solution: We could instead teach device_found_node() to
actually do create a proper .device with the passed DeviceFound (which
is "MOUNT"), resulting in creating a state "tentative" device. I have
a half-working patch for this, but it's brittle as with pretty much
any uevent or moutinfo change the status changes back to "dead",
causing the unmount attempt. IMHO this approach isn't conceptually
correct either -- in such containers, if don't have the corresponding
device nodes at all, we shouldn't try to react to hotplug events and
clean up mounts. I. e. the "tentative" concept does not really apply
there. But I have misunderstood the intent.

Thanks for considering,

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Andrei Borzenkov

2015-05-14 11:24:13 UTC

Ð Thu, 14 May 2015 12:51:37 +0200

Post by Martin Pitt
Hello all,

Post by Dimitri John Ledkov
I am booting without initramfs, using a plan9 filesystem as rootfs in a kvm.
root=/dev/root rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p
# mount
/dev/root on / type 9p
(rw,relatime,sync,dirsync,rw,trans=virtio,version=9p2000.L)
# systemctl status dev-root.device
â dev-root.device
Loaded: loaded
Active: inactive (dead)
# mount --bind /opt /opt
Results in opt.mount unit to be generated which BindsTo
dev-root.device, which is inactive, thus systemd tries to stop that
unit straight away, and umount fails and is retried infinitely...
effectively DoSing init.

As I mentioned before, simply ignoring /dev/root doesn't help in all
cases, and hardcoding it in the code is a bit ugly.
The attached patch is a more general solution which stops creating
dead stub .device units for devices which don't exist at all. I. e.
while in your case -.mount was probably BindsTo=dev-root.device (or in
my case, in my container I had mnt.mount BindsTo=dev-sda3.device,
which exists on the host but not in the container), but with this
patch it should now not be bound to anything.

Will it be "rebound" when device appears? Otherwise any mount that
happens before udev is started/happens to notice device will not be
associated with device. Most common case is probably mounts inherited
from initrd.

Post by Martin Pitt
Lennart, if you don't like this there is an alternative, and more
complicated solution: We could instead teach device_found_node() to
actually do create a proper .device with the passed DeviceFound (which
is "MOUNT"), resulting in creating a state "tentative" device. I have
a half-working patch for this, but it's brittle as with pretty much
any uevent or moutinfo change the status changes back to "dead",
causing the unmount attempt. IMHO this approach isn't conceptually
correct either -- in such containers, if don't have the corresponding
device nodes at all, we shouldn't try to react to hotplug events and
clean up mounts. I. e. the "tentative" concept does not really apply
there. But I have misunderstood the intent.
Thanks for considering,
Martin

Martin Pitt

2015-05-18 09:50:32 UTC

Post by Andrei Borzenkov
Will it be "rebound" when device appears? Otherwise any mount that
happens before udev is started/happens to notice device will not be
associated with device. Most common case is probably mounts inherited
from initrd.

Not with the first patch (the one you replied to). The v2 patches do
create the .device right from the start in state "tentative" (that was
the intention judging by the comments, it just doesn't actually behave
that way) and thus .mounts do get bound to the .devices.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Lennart Poettering

2015-05-14 16:09:07 UTC

Post by Martin Pitt

Post by Dimitri John Ledkov
# mount --bind /opt /opt
Results in opt.mount unit to be generated which BindsTo
dev-root.device, which is inactive, thus systemd tries to stop that
unit straight away, and umount fails and is retried infinitely...
effectively DoSing init.

As I mentioned before, simply ignoring /dev/root doesn't help in all
cases, and hardcoding it in the code is a bit ugly.

It doesn't help in all cases? Which ones? Can you elaborate?

And why is this ugly?

/dev/root is special in the kernel. No sysfs device by that name ever
exists, it's pretty much a fixed string the kernel inserts into the
mount table for the root device specified on the kernel cmdline. It
never exists otherwise and hence it is the right logic to check for it
explicitly and ignore it explicitly. It gets particularly confusing
for btrfs file systems, since those are device-less file systems
nominally, but when you use them as root fs without inird, they
suddenly appear as device backed fs this way...

Post by Martin Pitt
The attached patch is a more general solution which stops creating
dead stub .device units for devices which don't exist at all. I. e.
while in your case -.mount was probably BindsTo=dev-root.device (or in
my case, in my container I had mnt.mount BindsTo=dev-sda3.device,
which exists on the host but not in the container), but with this
patch it should now not be bound to anything.

The patch is not right, Because it will not load device units for
devices that have not been referenced yet. This would break things for
mount entries that carry no deps on their own, but only reference a
source device.

Post by Martin Pitt
Lennart, if you don't like this there is an alternative, and more
complicated solution: We could instead teach device_found_node() to
actually do create a proper .device with the passed DeviceFound (which
is "MOUNT"), resulting in creating a state "tentative" device. I have
a half-working patch for this, but it's brittle as with pretty much
any uevent or moutinfo change the status changes back to "dead",
causing the unmount attempt. IMHO this approach isn't conceptually
correct either -- in such containers, if don't have the corresponding
device nodes at all, we shouldn't try to react to hotplug events and
clean up mounts. I. e. the "tentative" concept does not really apply
there. But I have misunderstood the intent.

Hmm, not following I must admit.

Note though that for containers we do not longer create any device
dependencies anyway, see 47bc12e1ba35d38.

Lennart

--
Lennart Poettering, Red Hat

Martin Pitt

2015-05-17 11:02:35 UTC

Hey Lennart,

Post by Lennart Poettering

Post by Martin Pitt
As I mentioned before, simply ignoring /dev/root doesn't help in all
cases, and hardcoding it in the code is a bit ugly.

It doesn't help in all cases? Which ones? Can you elaborate?

It doesn't seem to help at all in e. g. LXC. This is with systemd git
master in a "test" container: with "lxc.mount.auto = sys:rw cgroup":

# mount -o bind /bin/ /mnt; systemctl show -p ActiveState,SubState,BindsTo,BoundBy mnt.mount
BindsTo=dev-sda3.device
BoundBy=
ActiveState=deactivating
SubState=unmounting

Journal says:

Mai 17 12:37:05 test systemd[1]: mnt.mount: Unit is bound to inactive unit dev-sda3.device. Stopping, too.
Mai 17 12:37:05 test systemd[1]: Unmounting /mnt...
Mai 17 12:37:05 test systemd[1]: Unmounted /mnt.

because of the BindsTo=, which is my system partition on the host. The
container doesn't have any /dev/sd*.

# systemctl show -p ActiveState,SubState,BindsTo,BoundBy dev-sda3.device
BindsTo=
BoundBy=etc-schroot.mount
ActiveState=inactive
SubState=dead

This exists becayse of the manager_load_unit() in unit_add_node_link()
which synthesizes such .device units with "dead" state, and there's no
check whether they actually exist.

Post by Lennart Poettering
And why is this ugly?
/dev/root is special in the kernel. No sysfs device by that name ever
exists, it's pretty much a fixed string the kernel inserts into the
mount table for the root device specified on the kernel cmdline. It
never exists otherwise and hence it is the right logic to check for it
explicitly and ignore it explicitly. It gets particularly confusing
for btrfs file systems, since those are device-less file systems
nominally, but when you use them as root fs without inird, they
suddenly appear as device backed fs this way...

Agreed that /dev/root is "special" and not a real thing. I just don't
like hardcoding such names in code if it can be avoided, and IMHO here
it can: Handling this special case isn't sufficient (see above), and
it's not necessary as a more generic solution would include it.

That said, I don't want to dwell on the /dev/root special case -- if
you like it, let's keep it, it's not harmful. (I just noticed that in
other cases you tried to avoid hardcoding stuff).

Post by Lennart Poettering

Post by Martin Pitt
The attached patch is a more general solution which stops creating
dead stub .device units for devices which don't exist at all. I. e.
while in your case -.mount was probably BindsTo=dev-root.device (or in
my case, in my container I had mnt.mount BindsTo=dev-sda3.device,
which exists on the host but not in the container), but with this
patch it should now not be bound to anything.

The patch is not right, Because it will not load device units for
devices that have not been referenced yet. This would break things for
mount entries that carry no deps on their own, but only reference a
source device.

I'm not sure I understand this, can you please elaborate with an
example? I'd like to understand the use case for synthesizing dead
.device units which are known to not exist and binding to them.

Post by Lennart Poettering

Post by Martin Pitt
Lennart, if you don't like this there is an alternative, and more
complicated solution: We could instead teach device_found_node() to
actually do create a proper .device with the passed DeviceFound (which
is "MOUNT"), resulting in creating a state "tentative" device. I have
a half-working patch for this, but it's brittle as with pretty much
any uevent or moutinfo change the status changes back to "dead",
causing the unmount attempt. IMHO this approach isn't conceptually
correct either -- in such containers, if don't have the corresponding
device nodes at all, we shouldn't try to react to hotplug events and
clean up mounts. I. e. the "tentative" concept does not really apply
there. But I have misunderstood the intent.

Hmm, not following I must admit.

In terms of the above example, what happens when parsing /mountinfo
generates a mnt.mount:

- status quo: manager_load_unit() in unit_add_node_link() creates a
dev-sda3.device which doesn't actually exist, mnt.mount then
BindsTo= that, and gets unmounted because it's a "dead" device.

- This patch: manager_get_unit() doesn't find a dev-sda3.device and
hence no BindsTo= is added. This would break if adding /dev/*
nodes *after* existing mounts is expected to add a BindsTo=.

- Alternative: Fix device_found_node() to create a .device with the
correct ("tentative") state if the device doesn't exist in /dev/
or udev (yet). Then manager_load_unit() would not create a "dead"
stub .device any more, but use the existing "tentative" .device
and mounts would also stop being auto-unmounted.

So as this proposed patch apparently isn't good (I do believe that it
breaks stuff, I'd just like to understand how), I'll look into fixing
my half-working patch for the alternative.

Post by Lennart Poettering
Note though that for containers we do not longer create any device
dependencies anyway, see 47bc12e1ba35d38.

This doesn't work in a container with a writable /sys and udev
running. (I'm not entirely sure why people would do this, but
apparently they do; we get bug reports for them and there might be
legitimate use cases). But TBH such explicit checks like "is /sys
writable" appear to be more like a brittle hack than a robust
solution; this doesn't tell you anything about what your /dev looks
like or whether or not udev is running, etc.

I also have a gut feeling that Dimitri's use case with a plan9 file
system would still break with the /dev/root special case fix if the
file system wasn't the root fs but e. g. /home, and you try to
bind-mount anything from it.

Thanks,

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Martin Pitt

2015-05-17 13:54:03 UTC

Hello again,

Post by Martin Pitt
- Alternative: Fix device_found_node() to create a .device with the
correct ("tentative") state if the device doesn't exist in /dev/
or udev (yet). Then manager_load_unit() would not create a "dead"
stub .device any more, but use the existing "tentative" .device
and mounts would also stop being auto-unmounted.

The attached patches implement this now. The whole device state logic
with the "tentative" handling isn't that easy to understand, and
indeed there are still some "nice" traps.

The first patch fixes the incomplete handling of tentative devices
that got picked up by mountinfo -- you NACK'ed the first patch which
would avoid creating the stub "dead" .device units for those, so this
now makes sure that this comment in device_found_node():

/* This is called whenever we find a device referenced in
* /proc/swaps or /proc/self/mounts. Such a device might be
* mounted/enabled at a time where udev has not finished
* probing it yet, and we thus haven't learned about it
* yet. In this case we will set the device unit to
* "tentative" state. */

will actually become true: i. e. with this patch this now creates a
.device unit with found == DEVICE_FOUND_MOUNT and thus state ==
DEVICE_TENTATIVE. This avoids the aforementioned trap of
manager_load_unit() creating bogus units later on, but keeps the idea
of having a real .device unit in "tentative" state for non-udev
devices.

This fixes the original "systemd immediately unmounts my mounts" bug,
but not for very long: If you remount or unmount just one mount on a
tentative device, mountinfo changes, and device_found_node() now calls
device_update_found_one() with "add == 0". Then

n = add ? (d->found | found) : (d->found & ~found);

would unset the previous DEVICE_FOUND_MOUNT flag, leaving 0 (i. e.
DEVICE_NOT_FOUND). Thus the previously "tentative" device would once
again be set to "dead", and systemd would unmount all other mounts
from it. This must never happen, we simply can't declare tentative
devices as dead and clean up their unmounts, this only works for
"plugged" ones that we know via udev.

I tested this now with various container setups (with and without
udev/writable /sys), real iron, VM, and the infamous "remove mounted
CD"; the latter still cleans up the stale mount point as intended.

Perhaps it's time to stick our heads together again, write down the
use cases where this stuff should actually do something (e. g. cleanup
mounts and track .device units), and see if we can simplify this
again?

Thanks,

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

systemd github import bot

2015-05-17 14:05:11 UTC

Patchset imported to github.
Pull request:
<https://github.com/systemd-devs/systemd/compare/master...systemd-mailing-devs:20150517135402.GD3281%40piware.de>

--
Generated by https://github.com/haraldh/mail2git

Martin Pitt

2015-05-18 14:08:51 UTC

Post by Martin Pitt
This fixes the original "systemd immediately unmounts my mounts" bug,
but not for very long: If you remount or unmount just one mount on a
tentative device, mountinfo changes, and device_found_node() now calls
device_update_found_one() with "add == 0". Then
n = add ? (d->found | found) : (d->found & ~found);
would unset the previous DEVICE_FOUND_MOUNT flag, leaving 0 (i. e.
DEVICE_NOT_FOUND). Thus the previously "tentative" device would once
again be set to "dead", and systemd would unmount all other mounts
from it. This must never happen, we simply can't declare tentative
devices as dead and clean up their unmounts, this only works for
"plugged" ones that we know via udev.

Eek, I attached the wrong 0002- patch, sorry about that. The above is
fixed by the attached patch, please ignore 0002- from the previous
mail. For completeness I also re-attach 0001- again (unchanged).

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Lennart Poettering

2015-05-18 20:57:34 UTC

Change device_found_node() to also create a .device unit if a device is not
known by udev; this is the case for "tentative" devices picked up by mountinfo
(DEVICE_FOUND_MOUNT). With that we can record the "found" attribute on the
unit.

Ah, I see now. This makes sense!

static int device_setup_unit(Manager *m, struct udev_device *dev, const char *path, bool main) {
_cleanup_free_ char *e = NULL;
- const char *sysfs;
+ const char *sysfs = NULL;
Unit *u = NULL;
bool delete;
int r;
assert(m);
- assert(dev);
assert(path);
- sysfs = udev_device_get_syspath(dev);
- if (!sysfs)
- return 0;
-
r = unit_name_from_path(path, ".device", &e);
if (r < 0)
return log_error_errno(r, "Failed to generate unit name from device path: %m");
u = manager_get_unit(m, e);
+ if (dev) {
+ sysfs = udev_device_get_syspath(dev);
+ if (!sysfs)
+ return 0;
+ }

Why move this down? In order to keep the patch small and easy to grok,
can this stay up where it was, but simply be enclosed in the "if (dev) {}" check?

- if (stat(node, &st) < 0) {
- if (errno == ENOENT)
+ if (stat(node, &st) >= 0) {
+ if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
return 0;
- return log_error_errno(errno, "Failed to stat device node file %s: %m", node);
- }
-
- if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
- return 0;
+ dev = udev_device_new_from_devnum(m->udev, S_ISBLK(st.st_mode) ? 'b' : 'c', st.st_rdev);
+ if (!dev && errno != ENOENT)
+ return log_oom();

Hmm, why are all these errors supposed to be OOM?
udev_device_new_from_devnum sets errno correctly, hence we should just
print what it really is set to with log_error_errno(), unless of
course it is ENOENT.

- dev = udev_device_new_from_devnum(m->udev, S_ISBLK(st.st_mode) ? 'b' : 'c', st.st_rdev);
- if (!dev) {
- if (errno == ENOENT)
- return 0;
-
- return log_oom();
+ } else {
+ if (errno != ENOENT)
+ return log_error_errno(errno, "Failed to stat device node file %s: %m", node);

The if "else {" and "if (errn..." lines could be merged into one
"else if (errno != ...", no?

From c18dbd432ecd16c57123b5fc04313d625e8a8e88 Mon Sep 17 00:00:00 2001
Date: Sun, 17 May 2015 15:17:58 +0200
Subject: [PATCH 2/3] device: never transition from "tentative" to "dead"
Only set a device to state "dead" if it was previously plugged through udev. We
must never transition from "tentative" to "dead", as there is no way to be sure
that this is actually true.
This fixes systemd unmounting everything from a tentative device as soon as
mountinfo changes.

Not following on this patch: what's the rationale here? your patch
basically means we would never ever clean up "tentative" devices, if
they once were referenced in mountinfo or /proc/swaps, but no longer
are.

Can you elaborate on what this patch is supposed to achieve? How could
it be problematic to deactivate device units that are neither
announced anywhere in /proc nor in udev?

Lennart

--
Lennart Poettering, Red Hat

Martin Pitt

2015-05-19 06:29:16 UTC

Hello,

Post by Lennart Poettering

+ if (dev) {
+ sysfs = udev_device_get_syspath(dev);
+ if (!sysfs)
+ return 0;
+ }

Why move this down? In order to keep the patch small and easy to grok,
can this stay up where it was, but simply be enclosed in the "if (dev) {}" check?

Right, that was probably just the result of multiple edits/iterations;
moved back.

Post by Lennart Poettering

- if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
- return 0;
+ dev = udev_device_new_from_devnum(m->udev, S_ISBLK(st.st_mode) ? 'b' : 'c', st.st_rdev);
+ if (!dev && errno != ENOENT)
+ return log_oom();

Hmm, why are all these errors supposed to be OOM?

Not sure, perhaps hysterical raisins? But..

Post by Lennart Poettering
udev_device_new_from_devnum sets errno correctly, hence we should just
print what it really is set to with log_error_errno(), unless of
course it is ENOENT.

Makes sense, done now. It's an unrelated change to this patch, but if
you don't mind let's clean it up.

Post by Lennart Poettering

+ } else {
+ if (errno != ENOENT)
+ return log_error_errno(errno, "Failed to stat device node file %s: %m", node);

The if "else {" and "if (errn..." lines could be merged into one
"else if (errno != ...", no?

Right, done.

Updated patch attached. It doesn't look significantly smaller, mostly
because lots of it is an indentation change :/

Re-tested on a normal system, nspawn, LXC, and with ejecting a mounted
CD.

Post by Lennart Poettering

Subject: [PATCH 2/3] device: never transition from "tentative" to "dead"

Not following on this patch

Will reply on your other response. TL/DR: Current code in master is
overzealous, this patch is overcautious, this needs some more thought
and work; I don't have an updated patch yet.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

systemd github import bot

2015-05-19 07:05:41 UTC

Patchset imported to github.
Pull request:
<https://github.com/systemd-devs/systemd/compare/master...systemd-mailing-devs:20150519062916.GD3222%40piware.de>

--
Generated by https://github.com/haraldh/mail2git

Lennart Poettering

2015-05-19 11:03:19 UTC

From 8bbd9d1df6877867ce7958c2e51574b3e74c68e6 Mon Sep 17 00:00:00 2001
Date: Sun, 17 May 2015 15:07:47 +0200
Subject: [PATCH] device: create units with intended "found" value

Applied this one! Thanks!

Lennart

--
Lennart Poettering, Red Hat

Lennart Poettering

2015-05-18 21:04:37 UTC

Post by Martin Pitt

Post by Martin Pitt
This fixes the original "systemd immediately unmounts my mounts" bug,
but not for very long: If you remount or unmount just one mount on a
tentative device, mountinfo changes, and device_found_node() now calls
device_update_found_one() with "add == 0". Then
n = add ? (d->found | found) : (d->found & ~found);
would unset the previous DEVICE_FOUND_MOUNT flag, leaving 0 (i. e.
DEVICE_NOT_FOUND). Thus the previously "tentative" device would once
again be set to "dead", and systemd would unmount all other mounts
from it. This must never happen, we simply can't declare tentative
devices as dead and clean up their unmounts, this only works for
"plugged" ones that we know via udev.

Eek, I attached the wrong 0002- patch, sorry about that. The above is
fixed by the attached patch, please ignore 0002- from the previous
mail. For completeness I also re-attach 0001- again (unchanged).

Still not getting what the purpose of the 0002 patch is, even in this
revision...

Please elaborate!

Post by Martin Pitt
--- a/src/core/device.c
+++ b/src/core/device.c
@@ -465,12 +465,13 @@ static void device_update_found_one(Device *d, bool add, DeviceFound found, bool
* now referenced by the kernel, then we assume the
* kernel knows it now, and udev might soon too. */
device_set_state(d, DEVICE_TENTATIVE);
- else
- /* If nobody sees the device, or if the device was
- * previously seen by udev and now is only referenced
- * from the kernel, then we consider the device is
+ else if (previous & DEVICE_FOUND_UDEV)
+ /* If the device was previously seen by udev and now is only
+ * referenced from the kernel, then we consider the device is
* gone, the kernel just hasn't noticed it yet. */
device_set_state(d, DEVICE_DEAD);
+ /* We never move from TENTATIVE to DEAD, as we can only determine this
+ * status update with udev, not with mountinfo */
}

Devices that show up in /proc/self/mountinfo or /proc/swap, and then
disappear again, without ever showing up in udev, need to be cleaned
up.

Lennart

--
Lennart Poettering, Red Hat

Martin Pitt

2015-05-19 06:59:17 UTC

Post by Lennart Poettering

Post by Martin Pitt

Post by Martin Pitt
This fixes the original "systemd immediately unmounts my mounts" bug,
but not for very long: If you remount or unmount just one mount on a
tentative device, mountinfo changes, and device_found_node() now calls
device_update_found_one() with "add == 0". Then
n = add ? (d->found | found) : (d->found & ~found);
would unset the previous DEVICE_FOUND_MOUNT flag, leaving 0 (i. e.
DEVICE_NOT_FOUND). Thus the previously "tentative" device would once
again be set to "dead", and systemd would unmount all other mounts
from it. This must never happen, we simply can't declare tentative
devices as dead and clean up their unmounts, this only works for
"plugged" ones that we know via udev.

Eek, I attached the wrong 0002- patch, sorry about that. The above is
fixed by the attached patch, please ignore 0002- from the previous
mail. For completeness I also re-attach 0001- again (unchanged).

Still not getting what the purpose of the 0002 patch is, even in this
revision...
Please elaborate!

I'll try to explain step by step:

Say you are booting with plan9 fs, a container, or some other thing
with a mount that references a device "/dev/foo" which isn't actually
available as /dev/foo and in udev. Note that you can't reproduce this
in nspawn, as it forcefully mounts /sys as r/o, which triggers the
unit_type_supported(UNIT_DEVICE) safety check in unit_add_node_link();
this happens in environments with writable /sys.

- Boot, dev-foo.device becomes DEVICE_FOUND_MOUNT/tentative

- Do some more mounts from /dev/foo, e. g.

mkdir /tmp/etc /tmp/boot
mount -o bind /etc /tmp/etc
mount -o bind /boot /tmp/boot

(In practice, you'd probably do such bind mounts with nspawn --bind
or lxc.mount.entry)

tmp-etc.mount and tmp-boot.mount will be BindsTo=dev-foo.device,
and dev-foo.device's status is still unchanged
(DEVICE_FOUND_MOUNT/tentative)

- Now do umount /tmp/boot. This *also* umounts /tmp/etc!

Journal says

systemd[1]: Requested transaction contradicts existing jobs: Resource deadlock avoided
systemd[1]: Unmounted /tmp/etc.

and dev-sda3.device is now inactive/dead, which tears down
the bound tmp-etc.mount. Boom!

Reason: Unmounting /tmp/boot triggers
device_update_found_one(found==DEVICE_FOUND_MOUNT, add==0).
d->found previously was DEVICE_FOUND_MOUNT, and this

n = add ? (d->found | found) : (d->found & ~found);

computes the new state to 0 (i. e. DEVICE_NOT_FOUND), and calls
device_set_state(DEVICE_DEAD). Thus here we (1) don't consider that
dev-foo.device is still bound by other units (tmp-etc.mount) and (2)
lose the information that dev-foo.device is tentative.

So the problem is that this tentative â dead transition only works if
a device is referenced once, but causes overzealous unmounts if there
are more references.

We need a reference count, or check if the device is bound by other
device still. Come to think of it now, we actually already have that:

Id=dev-foo.device
BoundBy=tmp-boot.mount tmp-etc.mount

But my patch doesn't take that into account yet.

Post by Lennart Poettering
Devices that show up in /proc/self/mountinfo or /proc/swap, and then
disappear again, without ever showing up in udev, need to be cleaned
up.

That's right, I forgot about that, thanks for catching! So current
master is too overzealous, and my current patch never cleans up. So
this needs some more work.

Thanks,

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Martin Pitt

2015-05-19 08:23:46 UTC

Hello all,

I have a better fix now.

Post by Martin Pitt
- Boot, dev-foo.device becomes DEVICE_FOUND_MOUNT/tentative
- Do some more mounts from /dev/foo, e. g.
mkdir /tmp/etc /tmp/boot
mount -o bind /etc /tmp/etc
mount -o bind /boot /tmp/boot
(In practice, you'd probably do such bind mounts with nspawn --bind
or lxc.mount.entry)
tmp-etc.mount and tmp-boot.mount will be BindsTo=dev-foo.device,
and dev-foo.device's status is still unchanged
(DEVICE_FOUND_MOUNT/tentative)
- Now do umount /tmp/boot. This *also* umounts /tmp/etc!

With this patch, the unit states are now as expected:

Id=dev-foo.device
BindsTo=
BoundBy=tmp-boot.mount
ActiveState=activating
SubState=tentative

Id=tmp-boot.mount
BindsTo=dev-foo.device
BoundBy=
ActiveState=active
SubState=mounted

And /tmp/etc stays around. After manually unmounting /tmp/mount, the
device state properly goes to "dead":

Id=dev-foo.device
BindsTo=
BoundBy=
ActiveState=inactive
SubState=dead

Post by Martin Pitt
So the problem is that this tentative â dead transition only works if
a device is referenced once, but causes overzealous unmounts if there
are more references.
We need a reference count, or check if the device is bound by other
Id=dev-foo.device
BoundBy=tmp-boot.mount tmp-etc.mount
But my patch doesn't take that into account yet.

This one does now. I left a fairly comprehensive commit log as this
isn't that easy to explain/understand. If it's too verbose for you I
can also trim it back a bit.

Thanks,

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

systemd github import bot

2015-05-19 09:02:38 UTC

Patchset imported to github.
Pull request:
<https://github.com/systemd-devs/systemd/compare/master...systemd-mailing-devs:20150519082346.GG3222%40piware.de>

--
Generated by https://github.com/haraldh/mail2git

Lennart Poettering

2015-05-19 11:56:17 UTC

Post by Martin Pitt
Hello all,
I have a better fix now.

So the problem is that this tentative → dead transition only works if
a device is referenced once, but causes overzealous unmounts if there
are more references.

Ah, indeed, that makes sense!

Post by Martin Pitt

We need a reference count, or check if the device is bound by other
Id=dev-foo.device
BoundBy=tmp-boot.mount tmp-etc.mount
But my patch doesn't take that into account yet.

This one does now. I left a fairly comprehensive commit log as this
isn't that easy to explain/understand. If it's too verbose for you I
can also trim it back a bit.

I have now committed a different fix now, that keeps counting of the
mount points in mount.c, instead of "reaching over" from device.c.

I only gave this light testing, would be cool if you could check if
this fixes things for you.

http://cgit.freedesktop.org/systemd/systemd/commit/?id=fcd8b266edf0df2b85079fcf7b099cd4028740e6

This commit will now collect two sets of devices while going through
/proc/self/mountinfo: the devices of lines that are no longer there,
and the devices of lines that are there. Only for devices in the
former set that are not in the latter we will now propagate an event
to device.c.

Does this make sense?

Post by Martin Pitt
+ /* If we found a tentative device via mountinfo which is still
+ * referenced by other mounts than the one that just got unmounted, we
+ * must leave it FOUND_MOUNT/tentative; otherwise we'd trigger
+ * unmounting of all other bound mounts. */
+ if (d->found == DEVICE_FOUND_MOUNT && n == 0) {
+ Iterator i;
+ Unit *bound;
+
+ SET_FOREACH(bound, UNIT(d)->dependencies[UNIT_BOUND_BY], i) {
+ if (bound->type == UNIT_MOUNT &&
+ MOUNT(bound)->state != MOUNT_DEAD && MOUNT(bound)->state != MOUNT_FAILED) {
+ log_unit_debug(UNIT(d), "Still bound by %s (%s), keeping state.",
+ bound->id, mount_state_to_string(MOUNT(bound)->state));
+ return;
+ }
+ }
+ log_unit_debug(UNIT(d), "Not referenced by any mounts any more, changing to dead.");
+ }
+
previous = d->found;
d->found = n;

Lennart

--
Lennart Poettering, Red Hat

Martin Pitt

2015-05-19 13:17:30 UTC

Hey Lennart,

Post by Lennart Poettering
I have now committed a different fix now, that keeps counting of the
mount points in mount.c, instead of "reaching over" from device.c.
I only gave this light testing, would be cool if you could check if
this fixes things for you.
http://cgit.freedesktop.org/systemd/systemd/commit/?id=fcd8b266edf0df2b85079fcf7b099cd4028740e6
This commit will now collect two sets of devices while going through
/proc/self/mountinfo: the devices of lines that are no longer there,
and the devices of lines that are there. Only for devices in the
former set that are not in the latter we will now propagate an event
to device.c.
Does this make sense?

It does, and it indeed should avoid some round trips. However, it does
not work yet. I added this for extra debugging:

--- a/src/core/device.c
+++ b/src/core/device.c
@@ -771,6 +771,9 @@ int device_found_node(Manager *m, const char *node, bool add, DeviceFound found,
assert(m);
assert(node);

+ if (node[0] == '/')
+ log_warning("XXXX device_found_node node %s add %i found %i now %i", node, add, found, now);
+
/* This is called whenever we find a device referenced in
* /proc/swaps or /proc/self/mounts. Such a device might be
* mounted/enabled at a time where udev has not finished

After unmounting /tmp/etc, my dev-sda3.device (which plays the role of
"dev-foo.device") still becomes "dead", and /tmp/boot gets unmounted:

---- unmounting /tmp/etc ----
Id=dev-sda3.device
BindsTo=
BoundBy=
ActiveState=inactive
SubState=dead

Id=tmp-boot.mount
BindsTo=
BoundBy=
ActiveState=inactive
SubState=dead

Journal follows. The first bits are from the manual "mount" commands:

| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: tmp-etc.mount: Changed dead -> mounted
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: tmp-boot.mount: Changed dead -> mounted
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1

Now "umount /tmp/etc" happens:

| systemd[1]: tmp-etc.mount: Changed mounted -> dead
| systemd[1]: XXXX device_found_node node /dev/sda3 add 0 found 2 now 1

^ So device_found_node() already gets called here, although there
should still be another active mount on sda3. This now causes the
usual cleanup unmount slaughter:

| systemd[1]: dev-sda3.device: Changed tentative -> dead
| systemd[1]: tmp-boot.mount: Trying to enqueue job tmp-boot.mount/stop/replace
| systemd[1]: tmp-boot.mount: Installed new job tmp-boot.mount/stop as 357
| systemd[1]: tmp-boot.mount: Enqueued job tmp-boot.mount/stop as 357
| systemd[1]: tmp-etc.mount: Collecting.
| systemd[1]: Failed to reset devices.list on /lxc/test/system.slice/tmp-boot.mount: Permission denied

(FTR, I get thousands of those, but that's unrelated)

| systemd[1]: tmp-boot.mount: About to execute: /bin/umount /tmp/boot -n
| systemd[1]: tmp-boot.mount: Forked /bin/umount as 641
| systemd[1]: tmp-boot.mount: Changed mounted -> unmounting
| systemd[1]: Unmounting /tmp/boot...
| systemd[641]: tmp-boot.mount: Executing: /bin/umount /tmp/boot -n

I'm not sure where this comes from now; there is no manual mount
command to bring back /tmp/boot. It looks like it "bounces", and
quickly remounts /tmp/boot and then unmounts it again:

| systemd[1]: XXXX device_found_node node /dev/sda3 add 1 found 2 now 1
| systemd[1]: dev-sda3.device: Changed dead -> tentative
| systemd[1]: tmp-boot.mount: Trying to enqueue job tmp-boot.mount/start/fail
| systemd[1]: Requested transaction contradicts existing jobs: Resource deadlock avoided
| systemd[1]: XXXX device_found_node node /dev/sda3 add 0 found 2 now 1
| systemd[1]: dev-sda3.device: Changed tentative -> dead
| systemd[1]: Received SIGCHLD from PID 641 (umount).
| systemd[1]: Child 641 (umount) died (code=exited, status=0/SUCCESS)
| systemd[1]: tmp-boot.mount: Child 641 belongs to tmp-boot.mount
| systemd[1]: tmp-boot.mount: Mount process exited, code=exited status=0
| systemd[1]: tmp-boot.mount: Changed unmounting -> dead
| systemd[1]: tmp-boot.mount: Job tmp-boot.mount/stop finished, result=done
| systemd[1]: Unmounted /tmp/boot.
| systemd[1]: tmp-boot.mount: Collecting.
| systemd[1]: dev-sda3.device: Collecting.

My first hunch is that this is caused by calling
mount_load_proc_self_mountinfo() in mount_dispatch_io()
(src/core/mount.c:1682) *before* it goes through that new
SET_FOREACH() loop. That call will already see the removed mount and
call device_found_node() with the removal.

I'll debug this more closely ASAP, just need to finish something else
first.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Martin Pitt

2015-05-19 14:05:00 UTC

Post by Martin Pitt
My first hunch is that this is caused by calling
mount_load_proc_self_mountinfo() in mount_dispatch_io()
(src/core/mount.c:1682) *before* it goes through that new
SET_FOREACH() loop. That call will already see the removed mount and
call device_found_node() with the removal.

That was a red herring. Turns out that your patch only added devices
to "around" which had "mount->just_mounted || mount->just_changed",
which is obviously not the case for mounts which have been around for
a while (and these are the ones we need to keep!). This patch fixes
that; although I'm unsure yet whether we *also* need to record the ones
with "mount->just_mounted || mount->just_changed"; my gut feeling says
yes, and we need to restructure this a bit to run the set_put(around, ...)
for all mount->is_mounted Mount units. WDYT?

Anyway, with this it behaves as it should.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

systemd github import bot

2015-05-19 14:05:45 UTC

Patchset imported to github.
Pull request:
<https://github.com/systemd-devs/systemd/compare/master...systemd-mailing-devs:20150519140500.GK3222%40piware.de>

--
Generated by https://github.com/haraldh/mail2git

Lennart Poettering

2015-05-19 15:23:39 UTC

Post by Martin Pitt
That was a red herring. Turns out that your patch only added devices
to "around" which had "mount->just_mounted || mount->just_changed",
which is obviously not the case for mounts which have been around for
a while (and these are the ones we need to keep!). This patch fixes
that; although I'm unsure yet whether we *also* need to record the ones
with "mount->just_mounted || mount->just_changed"; my gut feeling says
yes, and we need to restructure this a bit to run the set_put(around, ...)
for all mount->is_mounted Mount units. WDYT?

Yes, we need to also record those which were just mounted or changed,
in case systemd rereads /proc/self/mountinfo at a time where two
mounts where created at once from the same device.

I now committed a change to implement that, can you check if this
fixes the issue for you?

Thanks!

Lennart

--
Lennart Poettering, Red Hat

Martin Pitt

2015-05-19 15:31:00 UTC

Hey Lennart,

Post by Lennart Poettering
Yes, we need to also record those which were just mounted or changed,
in case systemd rereads /proc/self/mountinfo at a time where two
mounts where created at once from the same device.

OK, as I suspected.

Post by Lennart Poettering
I now committed a change to implement that, can you check if this
fixes the issue for you?

It does, and it looks much more straightforward now.

Many thanks for your patience, this was a hairy thing to untangle..

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Lennart Poettering

2015-05-19 15:46:22 UTC

Post by Martin Pitt
Hey Lennart,

Post by Lennart Poettering
Yes, we need to also record those which were just mounted or changed,
in case systemd rereads /proc/self/mountinfo at a time where two
mounts where created at once from the same device.

OK, as I suspected.

Post by Lennart Poettering
I now committed a change to implement that, can you check if this
fixes the issue for you?

It does, and it looks much more straightforward now.

Perfect! Thanks for tracking this down and keeping it up even when I
was too dumb to understand what the issue was!

Thanks!

Lennart

--
Lennart Poettering, Red Hat

Umut Tezduyar Lindskog

2015-05-19 13:43:07 UTC

On Tue, May 19, 2015 at 1:56 PM, Lennart Poettering

Post by Lennart Poettering

Post by Martin Pitt
Hello all,
I have a better fix now.

So the problem is that this tentative → dead transition only works if
a device is referenced once, but causes overzealous unmounts if there
are more references.

Ah, indeed, that makes sense!

Post by Martin Pitt

We need a reference count, or check if the device is bound by other
Id=dev-foo.device
BoundBy=tmp-boot.mount tmp-etc.mount
But my patch doesn't take that into account yet.

This one does now. I left a fairly comprehensive commit log as this
isn't that easy to explain/understand. If it's too verbose for you I
can also trim it back a bit.

I have now committed a different fix now, that keeps counting of the
mount points in mount.c, instead of "reaching over" from device.c.
I only gave this light testing, would be cool if you could check if
this fixes things for you.
http://cgit.freedesktop.org/systemd/systemd/commit/?id=fcd8b266edf0df2b85079fcf7b099cd4028740e6
This commit will now collect two sets of devices while going through
/proc/self/mountinfo: the devices of lines that are no longer there,
and the devices of lines that are there. Only for devices in the
former set that are not in the latter we will now propagate an event
to device.c.
Does this make sense?

Post by Martin Pitt
+ /* If we found a tentative device via mountinfo which is still
+ * referenced by other mounts than the one that just got unmounted, we
+ * must leave it FOUND_MOUNT/tentative; otherwise we'd trigger
+ * unmounting of all other bound mounts. */
+ if (d->found == DEVICE_FOUND_MOUNT && n == 0) {
+ Iterator i;
+ Unit *bound;
+
+ SET_FOREACH(bound, UNIT(d)->dependencies[UNIT_BOUND_BY], i) {
+ if (bound->type == UNIT_MOUNT &&
+ MOUNT(bound)->state != MOUNT_DEAD && MOUNT(bound)->state != MOUNT_FAILED) {
+ log_unit_debug(UNIT(d), "Still bound by %s (%s), keeping state.",
+ bound->id, mount_state_to_string(MOUNT(bound)->state));
+ return;
+ }
+ }
+ log_unit_debug(UNIT(d), "Not referenced by any mounts any more, changing to dead.");
+ }
+
previous = d->found;
d->found = n;

Lennart

Didn't work for me. Removed device doesn't go in to either cases, so
it never makes it to the "around" set but it makes it to "gone" set.

if (!mount->is_mounted)
{

}
else if (mount->just_mounted || mount->just_changed)
{

}

Umut

Post by Lennart Poettering
--
Lennart Poettering, Red Hat
_______________________________________________
systemd-devel mailing list
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

Lennart Poettering

2015-05-17 16:06:21 UTC

Post by Martin Pitt
Hey Lennart,

Post by Lennart Poettering

Post by Martin Pitt
As I mentioned before, simply ignoring /dev/root doesn't help in all
cases, and hardcoding it in the code is a bit ugly.

It doesn't help in all cases? Which ones? Can you elaborate?

It doesn't seem to help at all in e. g. LXC.

Sounds like borkage in LXC.

Please ask LXC to follow these guidelines:

https://wiki.freedesktop.org/www/Software/systemd/ContainerInterface/

If you follow these guidelines with your container software systemd
will work fine. If you don't, then you are on your own.

More specifically, they should follow the second item in the
"Execution Environment" section: pre-mount /sys read-only in the
container. Whether device management is available is detected by
systemd by checking if /sys is writable. If it is writable it is
assumed that the full device logic is available. This results in udev
being started, .device units in systemd are made available and .mount
units get dependencies on them.

If LXC mounts /sys read-only however, then udev does not get started,
no .device units are available, and no dependencies create on
them. And all should work as intended.

Lennart

--
Lennart Poettering, Red Hat

Martin Pitt

2015-05-18 04:41:37 UTC

Post by Lennart Poettering
More specifically, they should follow the second item in the
"Execution Environment" section: pre-mount /sys read-only in the
container.

That's the default indeed, but you can configure it otherwise. While
that might be questionable, it's just one way to exhibit the bugs, as
Dimitri's plan9 example shows there are other cases where file systems
aren't on a "real" /dev/ device.

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Lennart Poettering

2015-05-18 12:10:53 UTC

Post by Martin Pitt

Post by Lennart Poettering
More specifically, they should follow the second item in the
"Execution Environment" section: pre-mount /sys read-only in the
container.

That's the default indeed, but you can configure it otherwise. While
that might be questionable, it's just one way to exhibit the bugs, as
Dimitri's plan9 example shows there are other cases where file systems
aren't on a "real" /dev/ device.

I don't really grok what the problem you are experencing is supposed
to be: note that a device showing up in /proc/self/mountinfo means it
will be set to "tentative" state, and thus will not resolve in an
unmount. What more do you need?

The whole point of the "tentative" state is that devices showing up in
/proc/self/mountinfo but not in /sys are put in it. Are you saying
that does not work?

Generally, dependencies should not be generated depending on
state. Dependencies should be static as much as possible, and
sometimes augmented as we load in more units, but the very clear focus
needs to be to keep them as static as possible. Acting on them should
be dynamic however, and hence our state engines for the units should
be optimized to reflect the unit state as fine-grained as necessary.

I don't understand your patch I must say, but from what I grok it
appears to make deps dynamic, based on state, and that's something I
really don't like.

Lennart

--
Lennart Poettering, Red Hat

Martin Pitt

2015-05-18 12:37:42 UTC

Hello Lennart,

Post by Lennart Poettering
I don't really grok what the problem you are experencing is supposed
to be: note that a device showing up in /proc/self/mountinfo means it
will be set to "tentative" state, and thus will not resolve in an
unmount. What more do you need?

Exactly, that's what these patches do.

Post by Lennart Poettering
The whole point of the "tentative" state is that devices showing up in
/proc/self/mountinfo but not in /sys are put in it. Are you saying
that does not work?

Right. As I said, the problem is that when /mountinfo is being read,
.device units (with state tentative/found == MOUNT) are not actually
being created, despite the comment saying so. They are created later,
rather unintentionally, in unit_add_node_link() which calls
manager_load_unit(); but the latter always just creates stubs with
state "dead", and never sets any proper state.

Post by Lennart Poettering
Generally, dependencies should not be generated depending on state.
Dependencies should be static as much as possible, and sometimes
augmented as we load in more units, but the very clear focus needs
to be to keep them as static as possible. Acting on them should be
dynamic however, and hence our state engines for the units should be
optimized to reflect the unit state as fine-grained as necessary.

Fully agreed.

Post by Lennart Poettering
I don't understand your patch I must say, but from what I grok it
appears to make deps dynamic, based on state, and that's something I
really don't like.

No, it just fixes the creation of the .device units to happen at the
right and intended time (when we actually have a proper state), and
thus have the intended state "tentative" instead of "dead".

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Martin Pitt

2015-05-18 13:51:47 UTC

Post by Lennart Poettering
The whole point of the "tentative" state is that devices showing up in
/proc/self/mountinfo but not in /sys are put in it. Are you saying
that does not work?

Simple demonstration with some bind mount:

# systemd-nspawn -b -D /tmp/myroot --bind /etc/cron.daily/

with current systemd you'll get

dev-sda3.device loaded inactive dead dev-sda3.device

and with the v2 patch you get

dev-sda3.device loaded activating tentative dev-sda3.device

which is what's intended, right? (This simple setup doesn't reproduce
the overzealous unmounting yet, but it shows the wrong state)

Thanks,

Martin

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

33 Replies
149 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Dimitri John Ledkov 2015-05-13 11:48:14 UTC

Martin Pitt 2015-05-13 11:53:19 UTC

Dimitri John Ledkov 2015-05-13 12:10:27 UTC

Lennart Poettering 2015-05-13 12:43:49 UTC

Dimitri John Ledkov 2015-05-13 13:20:28 UTC

Martin Pitt 2015-05-14 10:51:37 UTC

Andrei Borzenkov 2015-05-14 11:24:13 UTC

Martin Pitt 2015-05-18 09:50:32 UTC

Lennart Poettering 2015-05-14 16:09:07 UTC

Martin Pitt 2015-05-17 11:02:35 UTC

Martin Pitt 2015-05-17 13:54:03 UTC

systemd github import bot 2015-05-17 14:05:11 UTC

Martin Pitt 2015-05-18 14:08:51 UTC

Lennart Poettering 2015-05-18 20:57:34 UTC

Martin Pitt 2015-05-19 06:29:16 UTC

systemd github import bot 2015-05-19 07:05:41 UTC

Lennart Poettering 2015-05-19 11:03:19 UTC

Lennart Poettering 2015-05-18 21:04:37 UTC

Martin Pitt 2015-05-19 06:59:17 UTC

Martin Pitt 2015-05-19 08:23:46 UTC

systemd github import bot 2015-05-19 09:02:38 UTC

Lennart Poettering 2015-05-19 11:56:17 UTC

Martin Pitt 2015-05-19 13:17:30 UTC

Martin Pitt 2015-05-19 14:05:00 UTC

systemd github import bot 2015-05-19 14:05:45 UTC

Lennart Poettering 2015-05-19 15:23:39 UTC

Martin Pitt 2015-05-19 15:31:00 UTC

Lennart Poettering 2015-05-19 15:46:22 UTC

Umut Tezduyar Lindskog 2015-05-19 13:43:07 UTC

Lennart Poettering 2015-05-17 16:06:21 UTC

Martin Pitt 2015-05-18 04:41:37 UTC

Lennart Poettering 2015-05-18 12:10:53 UTC

Martin Pitt 2015-05-18 12:37:42 UTC

Martin Pitt 2015-05-18 13:51:47 UTC

about - legalese

Loading...