Discussion:
Race condition between udev rules and hwdb
Peter Hutterer
2018-02-05 06:02:07 UTC
Hi all,

I think there is a race condition between the udev rules and the hwdb,
and I cannot rely on udev rules being applied consistently to a device.

To reproduce, build libinput and run
sudo ./build/libinput-test-suite-runner --filter-test=lid_update_hw_on_key_multiple_keyboards
Run it repeatedly and at some point it will fail.
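
A minimal sketch of automating the "run it repeatedly" step. The helper name run_until_failure and the max_runs limit are made up for illustration; only the test-runner command line comes from the post.

```python
import subprocess


def run_until_failure(cmd, max_runs=50):
    """Run cmd repeatedly; return the 1-based number of the first run that
    exits with a non-zero status, or None if every run succeeded."""
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            return attempt
    return None


# Command from the post; it needs root and a built libinput tree.
TEST_CMD = ["sudo", "./build/libinput-test-suite-runner",
            "--filter-test=lid_update_hw_on_key_multiple_keyboards"]
```

Calling run_until_failure(TEST_CMD) from the libinput source directory would then report how many runs it took to hit the failure, if it shows up within max_runs attempts.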

The source of the issue is that the udev properties for the test device
*sometimes* get overwritten by the hwdb value from a rule file that sorts
earlier lexically. In other words: 90-something.hwdb+rules overwrites
99-myrule.rules. That shouldn't happen (or at least not randomly).

For more detail, the relevant parts are:

A 90-foo.hwdb entry with a DMI match that assigns a property:

libinput:name:*Lid Switch*:dmi:*:ct9:*
 LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=reliable

and the matching 90-foo.rules:
KERNELS=="input*", \
IMPORT{builtin}="hwdb 'libinput:name:$attr{name}:$attr{[dmi/id]modalias}'"

This assigns 'reliable' to the device.

I also have per-device udev rules, 92-foo.rules, in this case:

ATTRS{name}=="litest Lid Switch Surface3*", \
ENV{ID_INPUT_SWITCH}="1", \
ENV{LIBINPUT_ATTR_LID_SWITCH_RELIABILITY}="write_open"

This overwrites 'reliable' with 'write_open'.
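
As a toy model of the ordering I expect (this is an illustration only, not how udev is implemented; apply_rules is a made-up helper): rule files are processed in lexical order of their names, and a later assignment to the same property replaces the earlier one.

```python
def apply_rules(rule_files):
    """Toy model: process files in lexical order of their names; a later
    assignment to the same property replaces an earlier one."""
    properties = {}
    for name in sorted(rule_files):
        properties.update(rule_files[name])
    return properties


# Assignments taken from the rules in the post.
rule_files = {
    "92-foo.rules": {
        "ID_INPUT_SWITCH": "1",
        "LIBINPUT_ATTR_LID_SWITCH_RELIABILITY": "write_open",
    },
    "90-foo.rules": {  # value imported from 90-foo.hwdb
        "LIBINPUT_ATTR_LID_SWITCH_RELIABILITY": "reliable",
    },
}

final = apply_rules(rule_files)
print(final["LIBINPUT_ATTR_LID_SWITCH_RELIABILITY"])  # write_open
```

Under this model the device should always end up with 'write_open', which is why the occasional 'reliable' is surprising.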

On test-runner start, we install all the udev rules and hwdb files, run hwdb
--update and then start the tests.

In *most* cases, 'write_open' is correctly assigned to the device. In the
failure cases, 'reliable' is assigned. Nothing changes in the udevadm test
output, and I've verified that the order appears to change: in the failure
cases 92-foo.rules is applied first and the hwdb value then overwrites it.
The rules I used to check this:

ATTRS{name}=="litest Lid Switch Surface3*", \
ENV{ID_INPUT_SWITCH}="1", \
ENV{BONGO}="litest", \
ENV{LIBINPUT_ATTR_LID_SWITCH_RELIABILITY}="write_open"

ATTRS{name}=="litest Lid Switch Surface3*", \
ENV{FIRSTVAL}="$env{LIBINPUT_ATTR_LID_SWITCH_RELIABILITY}"

Running with this udev rule shows that FIRSTVAL is write_open but the real
property is 'reliable'.

This happens anywhere between 1 out of 3 and 1 out of 50, though it seems to
be more common when creating/removing uinput devices like crazy.

What's going on here and where could the issue lie?

Thanks

Cheers,
Peter
Mantas Mikulėnas
2018-02-05 09:51:30 UTC
Here's a wild guess...

I wonder if the race condition is in ATTRS{}; attributes are not cached but
read directly from sysfs, and for ATTRS udev has to walk up the entire
/sys hierarchy, for each and every rule, I believe.

So it could be that some rules do not match because by that time the device
has already disappeared. What happens if you change the rules to rely
entirely on ENV{} matches?
--
Mantas Mikulėnas
Peter Hutterer
2018-02-05 23:21:09 UTC
Thanks for the tip, unfortunately I couldn't verify it. I tried that
yesterday but it's... difficult. Matching on ENV{NAME} turned out to be more
volatile than I expected [1] but the main issue I have here is that NAME is
set on the parent input device, not on the evdev node where everything else
is set. There is no ENVS support to search up from the device, so I'm not
sure I could coerce the rules and hwdb matching into the required hierarchy.
Some of the match bits require the event node's detail, so I can't easily
assign the properties to the parent device.

That aside, I *know* that the devices aren't removed until I've finished with
them: they're created in the same process as part of the test setup/teardown.
And after the uinput device is created, I even wait for udev's 'add'
event before passing control to libinput [2]. So in theory, there's no way
the device is deleted before udev has had time to set everything up.

Given they're a hierarchy too, what's the likelihood that the
/sys/.../input/input123 device isn't there when the
/sys/.../input/input123/event0 device shows up?

Cheers,
Peter

[1] I think udev easily gets confused here because the name includes the
quotes so any rule has to either use * or use some magic I can't figure out
to work around that.
[2] https://github.com/wayland-project/libinput/blob/master/test/litest.c#L2732
Mantas Mikulėnas
2018-02-06 07:14:46 UTC
Yeah, after some sleep I realized I'm probably *way* off.

(That said, there is IMPORT{parent}="fooenv".)
--
Mantas Mikulėnas
Peter Hutterer
2018-02-07 04:40:26 UTC
Yep, thanks - IMPORT{parent} worked, and it confirmed it's not an issue with
ATTRS. I replaced the rule with:

IMPORT{parent}="NAME"
ENV{NAME}=="*litest Lid Switch Surface3*", \
ENV{ID_INPUT_SWITCH}="1", \
ENV{LIBINPUT_ATTR_LID_SWITCH_RELIABILITY}="write_open"
ENV{NAME}=="*litest Lid Switch Surface3*", \
ENV{FIRSTVAL}="$env{LIBINPUT_ATTR_LID_SWITCH_RELIABILITY}"

Gives me:
E: FIRSTVAL=write_open
E: LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=reliable

So it's the same issue: the value set by the rule is overwritten by a hwdb
import that should have happened earlier.
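
For anyone who wants to check this in a loop, here is a rough sketch that flags the bad case from the "E: KEY=VALUE" lines above. The function names are made up, and it assumes the FIRSTVAL debug rule from this thread is installed.

```python
def parse_properties(udevadm_output):
    """Collect the 'E: KEY=VALUE' lines of udevadm info output into a dict."""
    props = {}
    for line in udevadm_output.splitlines():
        if line.startswith("E: ") and "=" in line:
            key, value = line[3:].split("=", 1)
            props[key] = value
    return props


def overwritten_after_rule(props, key="LIBINPUT_ATTR_LID_SWITCH_RELIABILITY"):
    """True if the value recorded in FIRSTVAL differs from the final one."""
    return "FIRSTVAL" in props and props["FIRSTVAL"] != props.get(key)


# The two lines quoted above.
sample = """\
E: FIRSTVAL=write_open
E: LIBINPUT_ATTR_LID_SWITCH_RELIABILITY=reliable
"""
props = parse_properties(sample)
print(overwritten_after_rule(props))  # True
```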

Cheers,
Peter
