Discussion:
timed out waiting for device dev-disk-by\x2duuid
Chris Murphy
2014-05-13 02:48:03 UTC
A two-device Btrfs volume with one device missing (simulated) will not boot, even with rootflags=degraded set, which is currently required to enable Btrfs degraded mounts. Upon reaching a dracut shell after basic.target fails with a timeout, I can mount -o subvol=root,degraded, exit, and continue booting normally with just the single device.
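For reference, the manual workaround at the dracut emergency shell is roughly the following (the /sysroot target is assumed; the surviving member device could also be named directly instead of using -U):

mount -t btrfs -o subvol=root,degraded -U c27871d6-5dbe-4d32-9ba9-03b425076da4 /sysroot
exit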

The problem seems to be that systemd (udev?) is not finding the volume by UUID for some reason, and therefore never attempts to mount it. But I don't know why it can't find it, or even how the find-by-UUID mechanism works this early in boot. So I'm not sure whether this is a systemd or udev bug, a dracut bug, or a kernel bug.

The problem happens with systemd 208-9.fc20 and kernel 3.11.10-301.fc20, and with systemd 212-4.fc21 and kernel 3.15.0-0.rc5.git0.1.fc21.


Details:

[ 0.000000] localhost.localdomain kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-3.11.10-301.fc20.x86_64 root=UUID=c27871d6-5dbe-4d32-9ba9-03b425076da4 ro rootflags=subvol=root,degraded elevator=deadline systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M


A partial excerpt of rdsosreport.txt from the above boot, with systemd logging set to debug:


[ 1.466282] systemd[1]: Reached target Basic System.
[ 1.466362] systemd[1]: Accepted connection on private bus.
[ 1.466808] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
[ 1.466898] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
[ 91.089693] systemd[1]: Job dev-disk-by\x2duuid-c27871d6\x2d5dbe\x2d4d32\x2d9ba9\x2d03b425076da4.device/start timed out.
[ 91.098727] systemd[1]: Job dev-disk-by\x2duuid-c27871d6\x2d5dbe\x2d4d32\x2d9ba9\x2d03b425076da4.device/start finished, result=timeout
[ 91.099205] systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-c27871d6\x2d5dbe\x2d4d32\x2d9ba9\x2d03b425076da4.device.
[ 91.099227] systemd[1]: Job sysroot.mount/start finished, result=dependency
[ 91.101537] systemd[1]: Dependency failed for /sysroot.
[ 91.101548] systemd[1]: Job initrd-root-fs.target/start finished, result=dependency
[ 91.102482] systemd[1]: Dependency failed for Initrd Root File System.
[ 91.102497] systemd[1]: Job initrd-parse-etc.service/start finished, result=dependency
[ 91.104125] systemd[1]: Dependency failed for Reload Configuration from the Real Root.
[ 91.104135] systemd[1]: Triggering OnFailure= dependencies of initrd-parse-etc.service.
[ 91.104141] systemd[1]: Trying to enqueue job emergency.target/start/replace
[ 91.104165] systemd[1]: Installed new job emergency.target/start as 41
[ 91.104170] systemd[1]: Installed new job emergency.service/start as 42
[ 91.104174] systemd[1]: Installed new job sysinit.target/stop as 45
[ 91.104179] systemd[1]: Installed new job basic.target/stop as 46
[ 91.104183] systemd[1]: Job initrd.target/start finished, result=canceled
[ 91.104196] systemd[1]: Closed jobs progress timerfd.
[ 91.104200] systemd[1]: Installed new job initrd.target/stop as 47
[ 91.104205] systemd[1]: Job dracut-mount.service/start finished, result=canceled
[ 91.104272] systemd[1]: Installed new job dracut-mount.service/stop as 48
[ 91.104278] systemd[1]: Job dracut-pre-pivot.service/start finished, result=canceled
[ 91.104306] systemd[1]: Installed new job dracut-pre-pivot.service/stop as 49
[ 91.104310] systemd[1]: Enqueued job emergency.target/start as 41
[ 91.104317] systemd[1]: Triggering OnFailure= dependencies of initrd-root-fs.target.
[ 91.104322] systemd[1]: Trying to enqueue job emergency.target/start/isolate
[ 91.104365] systemd[1]: Job initrd.target/stop finished, result=canceled
[ 91.104371] systemd[1]: Job dracut-mount.service/stop finished, result=canceled
[ 91.104376] systemd[1]: Job dracut-pre-pivot.service/stop finished, result=canceled
[ 91.104382] systemd[1]: Merged into installed job emergency.target/start as 41
[ 91.104387] systemd[1]: Merged into installed job emergency.service/start as 42
[ 91.104391] systemd[1]: Merged into installed job sysinit.target/stop as 45
[ 91.104396] systemd[1]: Merged into installed job basic.target/stop as 46
[ 91.104400] systemd[1]: Installed new job local-fs.target/stop as 61
[ 91.104406] systemd[1]: Installed new job system.slice/stop as 62
[ 91.104411] systemd[1]: Installed new job slices.target/stop as 63
[ 91.104415] systemd[1]: Installed new job -.slice/stop as 64
[ 91.104419] systemd[1]: Installed new job sockets.target/stop as 65
[ 91.104424] systemd[1]: Installed new job systemd-journald.service/stop as 66
[ 91.104429] systemd[1]: Installed new job systemd-udevd-kernel.socket/stop as 67
[ 91.104434] systemd[1]: Installed new job systemd-udevd-control.socket/stop as 68
[ 91.104439] systemd[1]: Installed new job systemd-udevd.service/stop as 69
[ 91.104443] systemd[1]: Installed new job swap.target/stop as 70
[ 91.104448] systemd[1]: Installed new job systemd-tmpfiles-setup-dev.service/stop as 71
[ 91.104453] systemd[1]: Installed new job kmod-static-nodes.service/stop as 72
[ 91.104457] systemd[1]: Installed new job systemd-udev-trigger.service/stop as 73
[ 91.104461] systemd[1]: Installed new job plymouth-start.service/stop as 74
[ 91.104465] systemd[1]: Installed new job systemd-ask-password-plymouth.path/stop as 75
[ 91.104469] systemd[1]: Installed new job timers.target/stop as 76
[ 91.104473] systemd[1]: Installed new job paths.target/stop as 77
[ 91.104477] systemd[1]: Job initrd-fs.target/start finished, result=canceled
[ 91.104482] systemd[1]: Installed new job initrd-fs.target/stop as 78
[ 91.104486] systemd[1]: Installed new job dracut-initqueue.service/stop as 79
[ 91.104490] systemd[1]: Enqueued job emergency.target/start as 41
[ 91.104530] systemd[1]: Collecting dracut-pre-pivot.service
[ 91.104535] systemd[1]: Collecting initrd-cleanup.service
[ 91.104558] systemd[1]: Collecting dracut-mount.service
[ 91.104562] systemd[1]: Collecting initrd.target
[ 91.104645] systemd[1]: Collecting rescue.target
[ 91.104649] systemd[1]: Collecting rescue.service
[ 91.104654] systemd[1]: Collecting dracut-cmdline.service
[ 91.104658] systemd[1]: Collecting dracut-pre-trigger.service
[ 91.104662] systemd[1]: Collecting dracut-pre-udev.service
[ 91.104666] systemd[1]: Collecting cryptsetup.target
[ 91.104712] systemd[1]: Stopping dracut initqueue hook...
[ 91.104718] systemd[1]: dracut-initqueue.service changed exited -> dead
[ 91.104729] systemd[1]: Job dracut-initqueue.service/stop finished, result=done
[ 91.104736] systemd[1]: Stopped dracut initqueue hook.
[ 91.104776] systemd[1]: Job initrd-fs.target/stop finished, result=done
[ 91.104783] systemd[1]: Stopped target Initrd File Systems.
[ 91.104791] systemd[1]: Stopping Basic System.
[ 91.104796] systemd[1]: basic.target changed active -> dead
[ 91.104806] systemd[1]: Job basic.target/stop finished, result=done
[ 91.104812] systemd[1]: Stopped target Basic System.


The complete rdsosreport.txt

https://drive.google.com/file/d/0B_2Asp8DGjJ9MWctT3JfdlQ1S0k/edit?usp=sharing


Chris Murphy
poma
2014-05-13 11:13:41 UTC
On 13.05.2014 04:48, Chris Murphy wrote:
...
Post by Chris Murphy
The problem seems to be that systemd (udev?) is not finding the volume by uuid for some reason, and therefore not attempting to mount it. But I don't know why it can't find it, or even how the find by uuid mechanism works this early in boot. So I'm not sure if this is a systemd or udev bug, or a dracut, or kernel bug.
The problem happens with systemd 208-9.fc20 with kernel 3.11.10-301.fc20, and systemd 212-4.fc21 and kernel 3.15.0-0.rc5.git0.1.fc21.
...

[ ***] A start job is running for device dev-disk-by\x2duuid-...
...which loops and loops while I go make my cup of hot water with a bit of milk...
and afterwards a drop into the dracut shell.
Bare metal, no Btrfs ("butter-tree") involved here.

I've encountered this as well, but afterwards, with
kernel-3.15.0-0.rc5.git1.1.fc21.x86_64 and
systemd-212-9.git9ae7021.20140512.fc21.x86_64, it no longer happens. :)


poma
poma
2014-05-16 08:18:20 UTC
Here's an example:

$ cat /etc/fstab
UUID=8c618270-30ae-4a18-a921-1d99034c35a5 / ext4 defaults 1 1
UUID=c40ada21-740e-49d9-bbd1-2c2a7c10b028 /boot ext4 defaults 1 2
UUID=85e74fda-7354-4384-8baf-4338e84b9ebe swap swap defaults 0 0

# journalctl -b -x
...
:04:10 kernel: Kernel command line: BOOT_IMAGE=/vmlinuz-3.15.0-0.rc5.git2.8.fc21.x86_64 root=UUID=8c618270-30ae-4a18-a921-1d99034c35a5 ... initrd=/initramfs-3.15.0-0.rc5.git2.8.fc21.x86_64.img
...
:04:10 systemd[1]: Expecting device dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.device...
...
:04:11 systemd[1]: Found device SAMSUNG_HD103SJ 3.
-- Subject: Unit dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.device has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.device has finished starting up.
--
-- The start-up result is done.
:04:11 systemd[1]: Starting File System Check on /dev/disk/by-uuid/8c618270-30ae-4a18-a921-1d99034c35a5...
-- Subject: Unit systemd-***@dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-***@dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.service has begun starting up.
...
:04:12 systemd-fsck[259]: /dev/sda3: clean, 1133241/6627328 files, 12143849/26479136 blocks
:04:12 kernel: fsck (260) used greatest stack depth: 4592 bytes left
:04:12 systemd[1]: Started File System Check on /dev/disk/by-uuid/8c618270-30ae-4a18-a921-1d99034c35a5.
-- Subject: Unit systemd-***@dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-***@dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.service has finished starting up.
--
-- The start-up result is done.
:04:12 systemd[1]: Mounting /sysroot...
-- Subject: Unit sysroot.mount has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit sysroot.mount has begun starting up.
:04:12 kernel: EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
:04:12 systemd[1]: Mounted /sysroot.
-- Subject: Unit sysroot.mount has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit sysroot.mount has finished starting up.
--
-- The start-up result is done.

:04:12 kernel: mount (273) used greatest stack depth: 4096 bytes left
:04:12 systemd[1]: Starting Initrd Root File System.
-- Subject: Unit initrd-root-fs.target has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit initrd-root-fs.target has begun starting up.
:04:12 systemd[1]: Reached target Initrd Root File System.
-- Subject: Unit initrd-root-fs.target has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit initrd-root-fs.target has finished starting up.
--
-- The start-up result is done.
:04:12 systemd[1]: Starting Reload Configuration from the Real Root...
-- Subject: Unit initrd-parse-etc.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit initrd-parse-etc.service has begun starting up.
:04:12 systemd[1]: Reloading.
:04:12 kernel: systemd-fstab-g (281) used greatest stack depth: 4032 bytes left
:04:12 systemd[1]: Started Reload Configuration from the Real Root.
-- Subject: Unit initrd-parse-etc.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit initrd-parse-etc.service has finished starting up.
--
-- The start-up result is done.
:04:12 systemd[1]: Starting Initrd File Systems.
-- Subject: Unit initrd-fs.target has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit initrd-fs.target has begun starting up.
:04:12 systemd[1]: Reached target Initrd File Systems.
-- Subject: Unit initrd-fs.target has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit initrd-fs.target has finished starting up.
--
-- The start-up result is done.
...
:04:24 kernel: systemd-udevd (400) used greatest stack depth: 3200 bytes left
...
:05:45 systemd[1]: Job dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.device/start timed out.
:05:45 systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.device.
-- Subject: Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.device has failed.
--
-- The result is timeout.
:05:45 systemd[1]: Dependency failed for /dev/disk/by-uuid/85e74fda-7354-4384-8baf-4338e84b9ebe.
-- Subject: Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.swap has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.swap has failed.
--
-- The result is dependency.
:05:45 systemd[1]: Dependency failed for Swap.
-- Subject: Unit swap.target has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit swap.target has failed.
--
-- The result is dependency.
:05:45 systemd[1]: Job dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.device/start timed out.
:05:45 systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.device.
-- Subject: Unit dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.device has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.device has failed.
--
-- The result is timeout.
:05:45 systemd[1]: Dependency failed for /boot.
-- Subject: Unit boot.mount has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit boot.mount has failed.
--
-- The result is dependency.
:05:45 systemd[1]: Dependency failed for Local File Systems.
-- Subject: Unit local-fs.target has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit local-fs.target has failed.
--
-- The result is dependency.
...
:05:45 systemd[1]: Triggering OnFailure= dependencies of local-fs.target.
:05:45 systemd[1]: Dependency failed for File System Check on /dev/disk/by-uuid/c40ada21-740e-49d9-bbd1-2c2a7c10b028.
-- Subject: Unit systemd-***@dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-***@dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.service has failed.
--
-- The result is dependency.
...
:05:45 systemd[1]: Starting Emergency Shell...
-- Subject: Unit emergency.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit emergency.service has begun starting up.
:05:45 systemd[1]: Started Emergency Shell.
-- Subject: Unit emergency.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit emergency.service has finished starting up.
--
-- The start-up result is done.

:05:45 systemd[1]: Starting Emergency Mode.
-- Subject: Unit emergency.target has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit emergency.target has begun starting up.
:05:45 systemd[1]: Reached target Emergency Mode.
-- Subject: Unit emergency.target has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit emergency.target has finished starting up.
--
-- The start-up result is done.
...
:05:53 systemd[1]: Startup finished in 1.315s (kernel) + 2.882s (initrd) + 1min 39.528s (userspace) = 1min 43.726s.
-- Subject: System start-up is now complete
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- All system services necessary queued for starting at boot have been
-- successfully started. Note that this does not mean that the machine is
-- now idle as services might still be busy with completing start-up.
--
-- Kernel start-up required 1315394 microseconds.
--
-- Initial RAM disk start-up required 2882579 microseconds.
--
-- Userspace start-up required 99528761 microseconds.


poma
poma
2014-05-16 11:26:26 UTC
And this is the output when it's working properly:

[ 1.509163] localhost.localdomain systemd[1]: Expecting device dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.device...
-- Subject: Unit dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.device has finished start-up
-- Unit dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.device has finished starting up.
[ 2.694743] localhost.localdomain systemd[1]: Starting File System Check on /dev/disk/by-uuid/8c618270-30ae-4a18-a921-1d99034c35a5...
-- Subject: Unit systemd-***@dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.service has begun with start-up
-- Unit systemd-***@dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.service has begun starting up.
[ 2.871497] localhost.localdomain systemd[1]: Started File System Check on /dev/disk/by-uuid/8c618270-30ae-4a18-a921-1d99034c35a5.
-- Subject: Unit systemd-***@dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.service has finished start-up
-- Unit systemd-***@dev-disk-by\x2duuid-8c618270\x2d30ae\x2d4a18\x2da921\x2d1d99034c35a5.service has finished starting up.
[ 6.974099] localhost.localdomain systemd[1]: Expecting device dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.device...
-- Subject: Unit dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.device has begun with start-up
-- Unit dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.device has begun starting up.
-- Subject: Unit dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.device has finished start-up
-- Unit dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.device has finished starting up.
[ 11.609924] localhost.localdomain systemd[1]: Starting File System Check on /dev/disk/by-uuid/c40ada21-740e-49d9-bbd1-2c2a7c10b028...
-- Subject: Unit systemd-***@dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.service has begun with start-up
-- Unit systemd-***@dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.service has begun starting up.
-- Subject: Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.device has finished start-up
-- Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.device has finished starting up.
[ 11.843937] localhost.localdomain systemd[1]: Activating swap /dev/disk/by-uuid/85e74fda-7354-4384-8baf-4338e84b9ebe...
-- Subject: Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.swap has begun with start-up
-- Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.swap has begun starting up.
[ 12.116326] localhost.localdomain systemd[1]: Activated swap /dev/disk/by-uuid/85e74fda-7354-4384-8baf-4338e84b9ebe.
-- Subject: Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.swap has finished start-up
-- Unit dev-disk-by\x2duuid-85e74fda\x2d7354\x2d4384\x2d8baf\x2d4338e84b9ebe.swap has finished starting up.
[ 12.545564] localhost.localdomain systemd[1]: Started File System Check on /dev/disk/by-uuid/c40ada21-740e-49d9-bbd1-2c2a7c10b028.
-- Subject: Unit systemd-***@dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.service has finished start-up
-- Unit systemd-***@dev-disk-by\x2duuid-c40ada21\x2d740e\x2d49d9\x2dbbd1\x2d2c2a7c10b028.service has finished starting up.


poma
Lennart Poettering
2014-05-15 17:29:55 UTC
Post by Chris Murphy
Two device Btrfs volume, with one device missing (simulated) will not
boot, even with rootflags=degraded set which is currently required to
enable Btrfs degraded mounts. Upon reaching a dracut shell after
basic.target fails with time out, I can mount -o subvol=root,degraded
and exit and continue boot normally with just the single device.
The problem seems to be that systemd (udev?) is not finding the volume
by uuid for some reason, and therefore not attempting to mount it. But
I don't know why it can't find it, or even how the find by uuid
mechanism works this early in boot. So I'm not sure if this is a
systemd or udev bug, or a dracut, or kernel bug.
The problem happens with systemd 208-9.fc20 with kernel
3.11.10-301.fc20, and systemd 212-4.fc21 and kernel
3.15.0-0.rc5.git0.1.fc21.
As soon as btrfs reports that a file system is ready, systemd will pick
it up. This is handled with the "btrfs" udev built-in, invoked via
/usr/lib/udev/rules.d/64-btrfs.rules. rootflags has no influence on
that, as at that point it is not clear whether the block device will be
the one that carries the root file system or any other file system.
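Roughly, the rule does this (paraphrased here, not a verbatim copy of the shipped file):

SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# ask the kernel whether all member devices of this btrfs file system are present
IMPORT{builtin}="btrfs ready $devnode"

# if not, keep systemd treating the block device as not ready
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

LABEL="btrfs_end"

systemd only considers the corresponding .device unit plugged once SYSTEMD_READY is unset or 1.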

Not sure what we should be doing about this. Maybe introduce a new
btrfs=degraded switch that acts globally, and influences the udev built-in?

Kay?

Lennart
--
Lennart Poettering, Red Hat
Lennart Poettering
2014-05-15 18:16:18 UTC
Post by Lennart Poettering
Post by Chris Murphy
Two device Btrfs volume, with one device missing (simulated) will not
boot, even with rootflags=degraded set which is currently required to
enable Btrfs degraded mounts. Upon reaching a dracut shell after
basic.target fails with time out, I can mount -o subvol=root,degraded
and exit and continue boot normally with just the single device.
The problem seems to be that systemd (udev?) is not finding the volume
by uuid for some reason, and therefore not attempting to mount it. But
I don't know why it can't find it, or even how the find by uuid
mechanism works this early in boot. So I'm not sure if this is a
systemd or udev bug, or a dracut, or kernel bug.
The problem happens with systemd 208-9.fc20 with kernel
3.11.10-301.fc20, and systemd 212-4.fc21 and kernel
3.15.0-0.rc5.git0.1.fc21.
As soon as btrfs reports that a file system is ready, systemd will pick
it up. This is handled with the "btrfs" udev built-in, and invoked via
/usr/lib/udev/rules.d/64-btrfs.rules. rootflags has no influence on
that, as at that point it is not clear whether the block device will be
the once that carries the root file system, or any other file system.
Not sure what we should be doing about this. Maybe introduce a new
btrfs=degraded switch that acts globally, and influences the udev built-in?
Kay?
So, as it turns out there's no kernel API available to check whether a
btrfs raid array is now complete enough for a degraded mount to
succeed. There's only a way to check whether it is fully complete.

And even if we had an API for this, how would this even work at all? I
mean, just having a catch-all switch to boot in degraded mode is really
dangerous if people have more than one array: we might end up
mounting an fs in degraded mode that actually would be fully available
if we just waited 50ms longer...

I mean this is even a problem with just one array: if you have
redundancy across 3 disks, when do you start mounting the thing when
degraded mode is requested on the kernel command line? As soon as
degraded mounting is possible (thus possibly fucking up all 3 disks
that happened to show up last), or later?

I have no idea how this all should work really, it's a giant mess. There
probably needs to be some btrfs userspace daemon thing that watches
btrfs arrays and does some magic if they time out.

But for now I am pretty sure we should just leave everything in fully
manual mode, that's the safest thing to do...

Lennart
--
Lennart Poettering, Red Hat
Goffredo Baroncelli
2014-05-15 20:57:13 UTC
[...]
Post by Lennart Poettering
So, as it turns out there's no kernel APi available to check whether a
btrfs raid array is now complete enough for degraded mode to
succeed. There's only a way to check whether it is fully complete.
And even if we had an API for this, how would this even work at all?
How should this be any different from the normal RAID case?

In both cases there are two timeouts: the first is for waiting for the full set of devices, the second is for the minimal set of disks needed for a degraded mount. If even the second timeout passes, then we should consider the filesystem not assemblable.

How is this handled for RAID today? Knowing that, we should consider applying the same strategy to btrfs (maybe we need some userspace tool to do it); a rough sketch of the idea is below.
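Something like this, as pseudo-shell only ($dev, the timeouts and the mount target are all made up for illustration):

#!/bin/sh
# Two phases of up to 120 half-second polls each; $dev is the root device.

# Phase 1: wait for the complete set of member devices.
i=0
while [ $i -lt 120 ]; do
    if btrfs device ready "$dev"; then
        mount "$dev" /sysroot
        exit $?
    fi
    sleep 0.5; i=$((i + 1))
done

# Phase 2: wait for enough devices for a degraded mount to succeed.
i=0
while [ $i -lt 120 ]; do
    if mount -o degraded "$dev" /sysroot; then
        exit 0
    fi
    sleep 0.5; i=$((i + 1))
done

# Both timeouts passed: consider the filesystem not assemblable.
exit 1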
Post by Lennart Poettering
mean, just having a catchall switch to boot in degraded mode is really
dangerous if people have more than one array and we might end up
mounting an fs in degraded mode that actually is fully available if we
just waited 50ms longer...
I mean this is even the problem with just one array: if you have
redundancy of 3 disks, when do you start mounting the thing when
degraded mode is requested on the kernel command line? as soon as
degrdaded mounting is possible (thus fucking up possible all 3 disks
that happened to show up last), or later?
I have no idea how this all should work really, it's a giant mess. There
probably needs to be some btrfs userspace daemon thing that watches
btrfs arrays and does some magic if they timeout.
But for now I am pretty sure we should just leave everything in fully
manual mode, that's the safest thing to do...
Lennart
--
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Kay Sievers
2014-05-15 21:10:06 UTC
On Thu, May 15, 2014 at 10:57 PM, Goffredo Baroncelli
Post by Goffredo Baroncelli
[...]
Post by Lennart Poettering
So, as it turns out there's no kernel APi available to check whether a
btrfs raid array is now complete enough for degraded mode to
succeed. There's only a way to check whether it is fully complete.
And even if we had an API for this, how would this even work at all?
In what this should be different than the normal RAID system ?
In both case there are two timeout: the first one is for waiting the full system, the second one is for the minimal set of disks to a degraded mode. If even the second timeout is passed, then we should consider the filesystem not build-able.
How it is handle for the RAID system ? Knowing that we should consider to apply the same strategies fro btrfs (may be we need some userspace tool to do that)
RAID is not handled by systemd, it is handled by other tools or not at
all. Initrds have some logic here, but nothing convincing, and it is
just the same mess as this.

Kay
Chris Murphy
2014-05-15 21:54:05 UTC
Post by Goffredo Baroncelli
[...]
Post by Lennart Poettering
So, as it turns out there's no kernel APi available to check whether a
btrfs raid array is now complete enough for degraded mode to
succeed. There's only a way to check whether it is fully complete.
And even if we had an API for this, how would this even work at all?
In what this should be different than the normal RAID system ?
I think it's because with md, array assembly is separate from the fs mount. I don't know what timeout it uses, but it does do automatic degraded assembly eventually. Once assembled (degraded or normal), the md device is "ready"; that's when udev rules start to apply and systemd will try to mount the root fs.

However on Btrfs, the degraded assembly and fs mount concepts are combined. So without degraded assembly first, it sounds like udev+systemd don't even try to mount the fs.

At least that's my rudimentary understanding.

The udev rule right now is asking whether all Btrfs member devices are present, and with a missing device the answer is no; so systemd never attempts a mount at all, rather than attempting a degraded mount specifically for the root=UUID device(s).

What is parsing the boot parameters ro, root=, and rootflags=? Are those recognized by the kernel or systemd?
Post by Goffredo Baroncelli
In both case there are two timeout: the first one is for waiting the full system, the second one is for the minimal set of disks to a degraded mode. If even the second timeout is passed, then we should consider the filesystem not build-able.
How it is handle for the RAID system ? Knowing that we should consider to apply the same strategies fro btrfs (may be we need some userspace tool to do that)
Well, that sounds like a userspace tool that would need to be in the initramfs so it can do this logic before systemd even attempts to mount the root fs. Or it's done by kernel code, if it's possible for it to parse root=UUID and only care about the member devices of that volume.


Chris Murphy
Lennart Poettering
2014-05-15 22:37:48 UTC
Post by Chris Murphy
The udev rule right now is asking if all Btrfs member devices are
present and it sounds like that answer is no with a missing device; so
a mount isn't even attempted by systemd rather than attempting a
degraded mount specifically for the root=UUID device(s).
Yes, correct.

And my suspicion is that if any logic more complex than this is to take
place, then it should probably be managed by some kind of storage
daemon still to be written, not udev. udev doesn't do timeouts, and has
no clue about raid arrays. If things get this complex there really needs
to be some other component in place that can handle this...
Post by Chris Murphy
What is parsing the boot parameters ro, root=, and rootflags=? Are
those recognized by the kernel or systemd?
When an initrd is used it is the initrd, otherwise the kernel itself.
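In the initrd case you can see the result of that parsing: systemd-fstab-generator turns root= and rootflags= into a sysroot.mount unit. Roughly, from the dracut emergency shell (generator output path as I recall it):

cat /proc/cmdline
ls /run/systemd/generator/
cat /run/systemd/generator/sysroot.mount   # Options= should carry the rootflags=, e.g. subvol=root,degraded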

Lennart
--
Lennart Poettering, Red Hat
Chris Murphy
2014-05-17 23:30:45 UTC
[....]
Post by Chris Murphy
The udev rule right now is asking if all Btrfs member devices are
present and it sounds like that answer is no with a missing device;
so a mount isn't even attempted by systemd rather than attempting a
degraded mount specifically for the root=UUID device(s).
Who is in charge of mounting the filesystem?
Ultimately systemd mounts the defined root file system to /sysroot. It knows what volume to mount based on boot parameter root=UUID= but it doesn't even try to mount it until the volume UUID for root fs has appeared. Until then, the mount command isn't even issued.
cat /usr/lib/dracut/modules.d/90btrfs/80-btrfs.rules
SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION!="add|change", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
RUN+="/sbin/btrfs device scan $env{DEVNAME}"
RUN+="/sbin/initqueue --finished --unique --name btrfs_finished /sbin/btrfs_finished"
LABEL="btrfs_end"
and
cat btrfs_finished.sh
#!/bin/sh
# -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
# ex: ts=8 sw=4 sts=4 et filetype=sh
type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh
btrfs_check_complete() {
    local _rootinfo _dev
    _dev="${1:-/dev/root}"
    [ -e "$_dev" ] || return 0
    _rootinfo=$(udevadm info --query=env "--name=$_dev" 2>/dev/null)
    if strstr "$_rootinfo" "ID_FS_TYPE=btrfs"; then
        info "Checking, if btrfs device complete"
        unset __btrfs_mount
        mount -o ro "$_dev" /tmp >/dev/null 2>&1
        __btrfs_mount=$?
        [ $__btrfs_mount -eq 0 ] && umount "$_dev" >/dev/null 2>&1
        return $__btrfs_mount
    fi
    return 0
}

btrfs_check_complete $1
exit $?
It seems that when a new btrfs device appears, the system attempts to mount it. If that succeeds, then it is assumed that all devices are present.
No, the system definitely does not attempt to mount it if there's a missing device. Systemd never executes /bin/mount at all in that case. A prerequisite for the mount attempt is this line:

[ 1.621517] localhost.localdomain systemd[1]: dev-disk-by\x2duuid-9ff63135\x2dce42\x2d4447\x2da6de\x2dd7c9b4fb6d66.device changed dead -> plugged

That line only appears if all devices are present. Without it, the mount attempt doesn't happen and the system just hangs.

However, if I do an rd.break=pre-mount, and get to a dracut shell this command works:

mount -t btrfs -o subvol=root,ro,degraded -U <uuid> /sysroot

The volume UUID is definitely present even though not all devices are. So in this case it's confusing why this UUID's device unit hasn't gone from dead to plugged. Until it's plugged, the mount command won't happen.
To allow a degraded boot, it should be sufficient to replace
mount -o ro "$_dev" /tmp >/dev/null 2>&1
with
OPTS="ro"
grep -q degraded /proc/cmdline && OPTS="ro,degraded"
mount -o $OPTS "$_dev" /tmp >/dev/null 2>&1
The problem isn't that the degraded mount option isn't being used by systemd. The problem is that systemd isn't changing the device from dead to plugged.

And the problem there is that there are actually four possible states for an array, yet btrfs device ready apparently only distinguishes between 1 and not 1 (i.e. 2, 3, 4).

1. All devices ready.
2. Minimum number of data/metadata devices ready, allow degraded rw mount.
3. Minimum number of data devices not ready, but enough metadata devices are ready, allow degraded ro mount.
4. Minimum number of data/metadata devices not ready, degraded mount not possible.

So I think it's a question for the btrfs list, to see what the long term strategy is, given that rootflags=degraded alone does not work on systemd systems. Once I'm on 208-16 on Fedora 20, I get the same hang as on Rawhide. So I actually have to force power off, reboot with the rd.break=pre-mount boot parameter, mount the volume manually, and exit twice. That's fine for me, but it's non-obvious for most users.

The thing to put to the Btrfs list is how they expect this to work down the road.

Right now, the way md handles this, md itself doesn't do anything at all. It's actually dracut scripts that check for the existence of the root fs volume UUID up to 240 times, with a 0.5 second sleep between attempts. After 240 failed attempts, dracut runs mdadm -R, which forcibly runs the array with the available devices (i.e. degraded assembly). At that moment the volume UUID becomes available, the device goes from dead to plugged, and systemd mounts it. Boot then continues normally.

So maybe Btrfs can leverage that same loop used for md degraded booting. But after the loop completes, then what? I don't see how systemd gets informed to conditionally use the additional mount option "degraded". I think the equivalent of dracut's mdadm -R for btrfs would be something like 'btrfs device allowdegraded -U <uuid>' to set a state on the volume that permits normal mounts to work. Then the device goes from dead to plugged, and systemd just issues the usual mount command.
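As a sketch only (the allowdegraded subcommand is hypothetical and doesn't exist today):

# in an initramfs hook, after the wait loop for 'btrfs device ready' has given up
if ! btrfs device ready "$dev"; then
    btrfs device allowdegraded -U "$uuid"                      # hypothetical subcommand
    udevadm trigger --action=change --subsystem-match=block    # re-run the udev ready check
fi
# the .device unit should then go dead -> plugged and systemd mounts as usual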

*shrug*



Chris Murphy
Chris Murphy
2014-05-18 18:24:25 UTC
Post by Chris Murphy
[ 1.621517] localhost.localdomain systemd[1]: dev-disk-by\x2duuid-9ff63135\x2dce42\x2d4447\x2da6de\x2dd7c9b4fb6d66.device changed dead -> plugged
That line only appears if all devices are present. And mount attempt doesn't happen. The system just hangs.
mount -t btrfs -o subvol=root,ro,degraded -U <uuid>
The volume UUID is definitely present even though not all devices are present. So actually in this case it's confusing why this uuid hasn't gone from dead to plugged. Until it's plugged, the mount command won't happen.
A 2-device Btrfs raid1, sda3 and sdb3. When both are available I get these lines:

[ 2.168697] localhost.localdomain systemd-udevd[109]: creating link '/dev/disk/by-uuid/9ff63135-ce42-4447-a6de-d7c9b4fb6d66' to '/dev/sda3'
[ 2.170232] localhost.localdomain systemd-udevd[135]: creating link '/dev/disk/by-uuid/9ff63135-ce42-4447-a6de-d7c9b4fb6d66' to '/dev/sdb3'

That precipitates systemd changing the by-uuid .device unit from dead to plugged. If I remove one device, udev does not create the by-uuid link for the remaining device. Therefore the expected UUID never appears to systemd, so it doesn't attempt the mount, and the system hangs indefinitely.
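This can be cross-checked from the dracut shell, roughly like so (assuming sda3 is the surviving member; property names as I understand the udev btrfs built-in):

btrfs device ready /dev/sda3; echo $?
# non-zero while a member device is missing, 0 once the kernel sees them all

udevadm info --query=property --name=/dev/sda3 | grep -E 'ID_BTRFS_READY|SYSTEMD_READY'
# with a member missing I'd expect ID_BTRFS_READY=0 and SYSTEMD_READY=0,
# which keeps the .device unit dead and the mount from ever being attempted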




Chris Murphy

Chris Murphy
2014-05-15 20:57:51 UTC
Post by Lennart Poettering
Post by Lennart Poettering
Post by Chris Murphy
Two device Btrfs volume, with one device missing (simulated) will not
boot, even with rootflags=degraded set which is currently required to
enable Btrfs degraded mounts. Upon reaching a dracut shell after
basic.target fails with time out, I can mount -o subvol=root,degraded
and exit and continue boot normally with just the single device.
The problem seems to be that systemd (udev?) is not finding the volume
by uuid for some reason, and therefore not attempting to mount it. But
I don't know why it can't find it, or even how the find by uuid
mechanism works this early in boot. So I'm not sure if this is a
systemd or udev bug, or a dracut, or kernel bug.
The problem happens with systemd 208-9.fc20 with kernel
3.11.10-301.fc20, and systemd 212-4.fc21 and kernel
3.15.0-0.rc5.git0.1.fc21.
As soon as btrfs reports that a file system is ready, systemd will pick
it up. This is handled with the "btrfs" udev built-in, and invoked via
/usr/lib/udev/rules.d/64-btrfs.rules. rootflags has no influence on
that, as at that point it is not clear whether the block device will be
the once that carries the root file system, or any other file system.
Not sure what we should be doing about this. Maybe introduce a new
btrfs=degraded switch that acts globally, and influences the udev built-in?
Kay?
So, as it turns out there's no kernel APi available to check whether a
btrfs raid array is now complete enough for degraded mode to
succeed. There's only a way to check whether it is fully complete.
And even if we had an API for this, how would this even work at all? I
mean, just having a catchall switch to boot in degraded mode is really
dangerous if people have more than one array and we might end up
mounting an fs in degraded mode that actually is fully available if we
just waited 50ms longer...
I mean this is even the problem with just one array: if you have
redundancy of 3 disks, when do you start mounting the thing when
degraded mode is requested on the kernel command line? as soon as
degrdaded mounting is possible (thus fucking up possible all 3 disks
that happened to show up last), or later?
I have no idea how this all should work really, it's a giant mess. There
probably needs to be some btrfs userspace daemon thing that watches
btrfs arrays and does some magic if they timeout.
But for now I am pretty sure we should just leave everything in fully
manual mode, that's the safest thing to do…
Is it that the existing udev rule either doesn't know, or doesn't have a way of knowing, that rootflags=degraded should enable only the root=UUID device to bypass the "ready" rule?

Does udev expect a different readiness state for attempting a mount than a manual mount from the dracut shell does? I'm confused why the Btrfs volume is "not ready" for systemd, which then doesn't even attempt to mount it, and yet at a dracut shell it is ready when I mount it manually. That seems like two readiness states.

I'd say it's not udev's responsibility, but rather the Btrfs kernel code's, to make sure things don't get worse for the file system, regardless of what devices it's presented with. At mount time it has its own logic for normal and degraded mounts, checking whether the minimum number of devices is present, and if not it fails. The degraded mount option is also per volume, not global.

For example, if I remove a device, boot degraded, and work for a few hours making lots of changes (even doing a system update, which is probably insane to do), I can later reboot with the "stale" device attached and Btrfs figures it out passively. That means when a file is read it detects whether there's a newer copy, forwards the newest copy to user space, and "fixes" the stale copy on the previously missing device. A manual balance ensures all new files also have redundancy. I think the eventual intent is a smarter balance "catch up" filter that can also run automatically in such a case. In any case the file system isn't trashed.
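The post-degraded cleanup, once the stale device is reattached and the volume is mounted normally, is roughly this (exact balance filters are a judgment call):

# rewrite stale copies from the good device, verified against checksums
btrfs scrub start /
# re-replicate data written while degraded so it has redundancy again
btrfs balance start /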



Chris Murphy
Kay Sievers
2014-05-15 21:20:40 UTC
Post by Chris Murphy
Post by Lennart Poettering
Post by Lennart Poettering
Post by Chris Murphy
Two device Btrfs volume, with one device missing (simulated) will not
boot, even with rootflags=degraded set which is currently required to
enable Btrfs degraded mounts. Upon reaching a dracut shell after
basic.target fails with time out, I can mount -o subvol=root,degraded
and exit and continue boot normally with just the single device.
The problem seems to be that systemd (udev?) is not finding the volume
by uuid for some reason, and therefore not attempting to mount it. But
I don't know why it can't find it, or even how the find by uuid
mechanism works this early in boot. So I'm not sure if this is a
systemd or udev bug, or a dracut, or kernel bug.
The problem happens with systemd 208-9.fc20 with kernel
3.11.10-301.fc20, and systemd 212-4.fc21 and kernel
3.15.0-0.rc5.git0.1.fc21.
As soon as btrfs reports that a file system is ready, systemd will pick
it up. This is handled with the "btrfs" udev built-in, and invoked via
/usr/lib/udev/rules.d/64-btrfs.rules. rootflags has no influence on
that, as at that point it is not clear whether the block device will be
the once that carries the root file system, or any other file system.
Not sure what we should be doing about this. Maybe introduce a new
btrfs=degraded switch that acts globally, and influences the udev built-in?
Kay?
So, as it turns out there's no kernel APi available to check whether a
btrfs raid array is now complete enough for degraded mode to
succeed. There's only a way to check whether it is fully complete.
And even if we had an API for this, how would this even work at all? I
mean, just having a catchall switch to boot in degraded mode is really
dangerous if people have more than one array and we might end up
mounting an fs in degraded mode that actually is fully available if we
just waited 50ms longer...
I mean this is even the problem with just one array: if you have
redundancy of 3 disks, when do you start mounting the thing when
degraded mode is requested on the kernel command line? as soon as
degrdaded mounting is possible (thus fucking up possible all 3 disks
that happened to show up last), or later?
I have no idea how this all should work really, it's a giant mess. There
probably needs to be some btrfs userspace daemon thing that watches
btrfs arrays and does some magic if they timeout.
But for now I am pretty sure we should just leave everything in fully
manual mode, that's the safest thing to do…
Is it that the existing udev rule either doesn't know, or doesn't have a way of knowing, that rootflags=degraded should enable only the root=UUID device to bypass the "ready" rule?
Does udev expect a different readiness state to attempt a mount, than a manual mount from dracut shell? I'm confused why the Btrfs volume is "not ready" for systemd which then doesn't even attempt to mount it; and yet at a dracut shell it is ready when I do the mount manually. That seems like two readiness states.
Btrfs in the kernel has only one readiness state, and that is what udev reacts to.
Post by Chris Murphy
I'd say it's not udev's responsibility, but rather Btrfs kernel code, to make sure things don't get worse with the file system, regardless of what devices it's presented with. At the time it tries to do the mount, it has its own logic for normal and degraded mounts whether the minimum number of devices are present or not and if not it fails. The degraded mount is also per volume, not global.
For example if I remove a device, and boot degraded and work for a few hours making lots of changes (even doing a system update, which is probably insane to do), I can later reboot with the "stale" device attached and Btrfs figures it out, passively. That means it figures out if there's a newer copy when a file is read, and forwards the newest copy to user space, and "fixes" the stale copy on the previously missing device. A manual balance ensures all new files also have redundancy. I think it's intended eventually to have a smarter balance "catch up" filter that can also run automatically in such a case. In any case the file system isn't trashed.
The problem is when to actively force things to go degraded when devices
do not show up in time. That is nothing the kernel can know; it would
need to be userspace making that decision. But udev does not really
have that information at that level; it would need to keep trying until
the kernel is satisfied mounting a volume degraded.

This is all probably not a job for udev or systemd, but for a
specialized storage daemon which has explicit configuration/policy
about how to mess around with the user's data.

This is not an area where we should try to be smart; falling back to
manual intervention from udev's side sounds like the right approach,
looking at the tools we (don't) have at hand at the moment.

Kay