Fail to reset-failed as user
Olivier Brunel
2014-12-12 15:06:09 UTC
Hi,

Today I had one unit in failed state, and after taking care of things I
wanted to simply reset its state (to inactive) w/out having to start it.

Looking up the man page, I see there's a command reset-failed for this
exact purpose, awesome. So I go:

% systemctl reset-failed backups2.service
Failed to reset failed state of unit backups2.service: No such device or
address

I was nicely asked to authenticate, but then it failed stating the unit
doesn't exist or something (not sure what the error message refers to)?
Now of course said unit does exist:

% systemctl is-failed backups2.service
failed

And I could eventually do it, as root:

% sudo systemctl reset-failed backups2.service

This worked fine and is probably what I would have done had my fingers
not slipped (sc instead of ssc, aliases for `systemctl` and `sudo
systemctl` resp.), but I'm not sure what I missed: since I was properly
authenticated, shouldn't the systemctl call also have worked?

FYI here's what shows up in the journal, confirming the auth:
Dec 12 15:40:00 arch.local polkitd[670]: Operator of unix-session:c3
successfully authenticated as unix-user:jjacky to gain TEMPORARY
authorization for action org.freedesktop.systemd1.manage-units for
system-bus-name::1.259 [systemctl reset-failed backups2.service] (owned
by unix-user:jjacky)

What am I missing/misunderstanding? (or is this a bug?)
Thanks,
-j
Lennart Poettering
2015-02-03 21:17:53 UTC
On Fri, 12.12.14 16:06, Olivier Brunel (***@jjacky.com) wrote:

Sorry for resurrecting this old thread this late. Is this still an
issue? Does this work on current git?
Post by Olivier Brunel
Today I had one unit in failed state, and after taking care of things I
wanted to simply reset its state (to inactive) w/out having to start it.
Looking up the man page, I see there's a command reset-failed for this
% systemctl reset-failed backups2.service
Failed to reset failed state of unit backups2.service: No such device or
address
Hmm, did you issue this from some weird environment (su/sudo context,
from a system service context or so?)

If this is still an issue, could you try to reproduce this after
issuing "systemd-analyze set-log-level debug"? Then please attach the
log output this generates!

Thanks,

Lennart
--
Lennart Poettering, Red Hat
Olivier Brunel
2015-02-05 18:20:34 UTC
Post by Lennart Poettering
Sorry for resurrecting this old thread this late. Is this still an
issue? Does this work on current git?
Still an issue w/ 218 yes, haven't actually had time to try with current
git. I'll try to do that over the weekend.
Meanwhile, this is what I get today: http://ix.io/gaR
This is not from some weird environment no (or, not that I'm aware of),
but an (almost) up-to-date Arch Linux x64, systemd 218.

-j
Lennart Poettering
2015-02-11 20:13:58 UTC
Puzzled. Don't see how this can happen. Also, works fine here...

If you can reproduce this on git, it would be good to gdb this. For
that:

a) start gdb, type "attach 1", to attach to PID 1

b) add a breakpoint on method_reset_failed_unit, by issuing "b
method_reset_failed_unit"

c) Continue execution of PID 1, by typing in the line "c"

d) trigger the issue, gdb should break at that instant.

e) now, check which call fails by stepping through the function with
"n". As soon as the function is left, use "c" to continue
execution. Note that the function will be executed twice, one after
the other. The first invocation will be without PolicyKit privs,
the second one with PolicyKit privs. The second invocation is the
interesting one. Check why it exits non-zero, and whether
unit_reset_failed() is invoked at all (which actually does the
interesting work).

f) post your findings here

g) leave gdb again with ^D

Don't do much more than this at the same time. Since you stop
execution of PID 1, a lot of things will be slow and may time
out while you run all this.

Thanks,

Lennart
--
Lennart Poettering, Red Hat
Olivier Brunel
2015-02-14 18:37:00 UTC
Alright so I did some testing, here's what I found:

- on that second invocation, method_reset_failed_unit() fails from its
call to bus_unit_method_reset_failed(), and that comes down to
bus_message_enter_struct() returning -ENXIO.

- I don't know how this whole thing is supposed to work, but what I
noticed is that bus_message_enter_struct() is called twice from
method_reset_failed_unit(), once from bus_verify_manage_unit_async() and
then from bus_unit_method_reset_failed(). Details follow:

First, when bus_verify_manage_unit_async() is called:

#0 bus_message_enter_struct (m=0x7f5fb0cb88b0, c=0x7f5fb0cb8aa0,
contents=0x7f5faef0d152 "bba{ss}", item_size=0x7fffcebd4928,
offsets=0x7fffcebd4918,
n_offsets=0x7fffcebd4920) at src/libsystemd/sd-bus/bus-message.c:3865
#1 0x00007f5faee80136 in sd_bus_message_enter_container
(m=0x7f5fb0cb88b0, type=114 'r',
contents=0x7f5faef0d152 "bba{ss}") at
src/libsystemd/sd-bus/bus-message.c:4012
#2 0x00007f5faee8e00d in bus_verify_polkit_async (call=0x7f5fb0ca59a0,
capability=21,
action=0x7f5faeef05f8 "org.freedesktop.systemd1.manage-units",
interactive=false,
registry=0x7f5fb0c0a890, error=0x7fffcebd4ad0) at
src/libsystemd/sd-bus/bus-util.c:374
#3 0x00007f5faee0aa00 in bus_verify_manage_unit_async
(m=0x7f5fb0c0a460, call=0x7f5fb0ca59a0,
error=0x7fffcebd4ad0) at src/core/dbus.c:1196
#4 0x00007f5faee0c801 in method_reset_failed_unit (bus=0x7f5fb0ca32f0,
message=0x7f5fb0ca59a0,
userdata=0x7f5fb0c0a460, error=0x7fffcebd4ad0) at
src/core/dbus-manager.c:574

(gdb) p *c
$38 = {enclosing = 0 '\000', need_offsets = false, index = 0,
saved_index = 0,
signature = 0x7f5fb0c09110 "(bba{ss})", before = 0, begin = 0, end =
133, array_size = 0x0,
offsets = 0x0, n_offsets = 0, offsets_allocated = 0, offset_index = 0,
item_size = 133,
peeked_signature = 0x0}
(gdb) p contents
$39 = 0x7f5faef0d152 "bba{ss}"

It eventually returns 1.

Then it gets called from bus_unit_method_reset_failed():

#0 bus_message_enter_struct (m=0x7f5fb0cb88b0, c=0x7f5fb0cb8250,
contents=0x7f5faef0d152 "bba{ss}", item_size=0x7fffcebd48e8,
offsets=0x7fffcebd48d8,
n_offsets=0x7fffcebd48e0) at src/libsystemd/sd-bus/bus-message.c:3865
#1 0x00007f5faee80136 in sd_bus_message_enter_container
(m=0x7f5fb0cb88b0, type=114 'r',
contents=0x7f5faef0d152 "bba{ss}") at
src/libsystemd/sd-bus/bus-message.c:4012
#2 0x00007f5faee8e00d in bus_verify_polkit_async (call=0x7f5fb0ca59a0,
capability=21,
action=0x7f5faeef05f8 "org.freedesktop.systemd1.manage-units",
interactive=false,
registry=0x7f5fb0c0a890, error=0x7fffcebd4ad0) at
src/libsystemd/sd-bus/bus-util.c:374
#3 0x00007f5faee0aa00 in bus_verify_manage_unit_async
(m=0x7f5fb0c0a460, call=0x7f5fb0ca59a0,
error=0x7fffcebd4ad0) at src/core/dbus.c:1196
#4 0x00007f5faee12feb in bus_unit_method_reset_failed (bus=0x7f5fb0ca32f0,
message=0x7f5fb0ca59a0, userdata=0x7f5fb0cc7ff0, error=0x7fffcebd4ad0)
at src/core/dbus-unit.c:496
#5 0x00007f5faee0c8aa in method_reset_failed_unit (bus=0x7f5fb0ca32f0,
message=0x7f5fb0ca59a0,
userdata=0x7f5fb0c0a460, error=0x7fffcebd4ad0) at
src/core/dbus-manager.c:588

(gdb) p *c
$40 = {enclosing = 114 'r', need_offsets = true, index = 2, saved_index
= 2,
signature = 0x7f5fb0ca3ec0 "bba{ss}", before = 0, begin = 0, end =
133, array_size = 0x0,
offsets = 0x0, n_offsets = 0, offsets_allocated = 8391685410159683651,
offset_index = 0,
item_size = 0, peeked_signature = 0x0}
(gdb) p contents
$41 = 0x7f5faef0d152 "bba{ss}"

And this will fail on:
if (c->signature[c->index] != SD_BUS_TYPE_STRUCT_BEGIN ||
and return -ENXIO.


Hope this can be helpful,
-j
Lennart Poettering
2015-04-24 15:25:13 UTC
On Sat, 14.02.15 19:37, Olivier Brunel (***@jjacky.com) wrote:

Heya!
Post by Olivier Brunel
Hope this can be helpful,
Yes it was!

I am pretty sure this was fixed with
1d22e9068c52c1cf935bcdff70b9b9654e3c939e. Can you check if this fixes
the issue for you?

(This was simply that we checked the PK auth twice, unnecessarily. And
the second time the read ptr into the PK message was already at the
end of the message which meant parsing it failed. But with the change
pointed out above this is fixed, we should authenticate only once
now.)
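
To illustrate the failure mode described above, here is a minimal C
sketch. It is not the sd-bus code: struct reply_cursor and the toy_*
functions are hypothetical names invented for this illustration, and
only the signature string "(bba{ss})" and the -ENXIO result are taken
from the gdb output earlier in the thread. It models a read cursor that
a first verification pass consumes, so a second pass on the same
message no longer finds the struct opening where it expects one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified read cursor over a reply signature.
 * This is an assumption for illustration, not the sd-bus container type. */
struct reply_cursor {
    const char *signature; /* e.g. "(bba{ss})" as seen in the gdb output */
    size_t index;          /* current read position in the signature */
};

/* Toy "enter struct": succeeds only if the cursor sits on '('. */
int toy_enter_struct(struct reply_cursor *c) {
    if (c->signature[c->index] != '(')
        return -ENXIO;
    c->index++;
    return 1;
}

/* Toy "read boolean": succeeds only if the cursor sits on 'b'. */
int toy_read_bool(struct reply_cursor *c) {
    if (c->signature[c->index] != 'b')
        return -ENXIO;
    c->index++;
    return 1;
}

/* One verification pass: enter the struct, then read the two booleans.
 * The cursor is left advanced; nothing rewinds it. */
int toy_verify(struct reply_cursor *c) {
    int r = toy_enter_struct(c);
    if (r < 0)
        return r;
    r = toy_read_bool(c);
    if (r < 0)
        return r;
    return toy_read_bool(c);
}

/* Usage: the first pass succeeds, the second fails on the same cursor. */
void toy_demo(void) {
    struct reply_cursor c = { "(bba{ss})", 0 };
    assert(toy_verify(&c) == 1);      /* cursor now points at 'a' */
    assert(toy_verify(&c) == -ENXIO); /* 'a' is not '(' */
}
```

In this model, verifying only once avoids the second parse entirely,
which is the behaviour the fix is described as restoring.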

Thanks for gdb'ing this!

Lennart
--
Lennart Poettering, Red Hat