Discussion:
possible message leak in bus->wqueue?
eshark
2015-08-07 09:18:15 UTC
Hi, all
If a message has been placed in bus->wqueue and the ioctl(KDBUS_CMD_SEND) for it fails with r < 0,
I found that the message remains in bus->wqueue. If the peer is killed for some reason, the message can never be sent and stays in the wqueue forever.


This is because in dispatch_wqueue(), when bus_write_message() returns r < 0, dispatch_wqueue() simply returns that r to the caller.
And since the wqueue is invisible to the user application, the application also cannot remove the message to handle this error case.


I wonder whether this is a problem, and if so, whether we should remove the message in dispatch_wqueue() when r < 0?
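
To make the question concrete, here is a minimal sketch of the behaviour I am proposing (heavily simplified; the real dispatch_wqueue() in sd-bus.c also tracks partial writes via bus->windex and more state, so treat this as an illustration only, not the actual code):

"""
/* Simplified sketch, not the actual sd-bus code. */
static int dispatch_wqueue(sd_bus *bus) {
        int r, ret = 0;

        while (bus->wqueue_size > 0) {
                r = bus_write_message(bus, bus->wqueue[0]);
                if (r < 0) {
                        /* Proposed change: drop the head message on a fatal
                         * error, so a dead peer cannot pin it in the queue. */
                        sd_bus_message_unref(bus->wqueue[0]);
                        bus->wqueue_size--;
                        memmove(bus->wqueue, bus->wqueue + 1,
                                sizeof(sd_bus_message*) * bus->wqueue_size);
                        return r;
                }
                if (r == 0)
                        return ret;     /* would block; retry on the next dispatch */

                /* Message fully written; pop it and continue with the next one. */
                sd_bus_message_unref(bus->wqueue[0]);
                bus->wqueue_size--;
                memmove(bus->wqueue, bus->wqueue + 1,
                        sizeof(sd_bus_message*) * bus->wqueue_size);
                ret = 1;
        }

        return ret;
}
"""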




Thanks a lot!




Li Cheng
cee1
2015-08-07 10:07:18 UTC
Post by eshark
Hi, all
If a message has been placed in bus->wqueue and the ioctl(KDBUS_CMD_SEND)
for it fails with r < 0,
I found that the message remains in bus->wqueue. If the peer is
killed for some reason, the message can never be sent and stays in the
wqueue forever.
This is because in dispatch_wqueue(), when bus_write_message() returns r < 0,
dispatch_wqueue() simply returns that r to the caller.
And since the wqueue is invisible to the user application, the application also
cannot remove the message to handle this error case.
I wonder whether this is a problem, and if so, whether we should remove the
message in dispatch_wqueue() when r < 0?
I have the same question.

E.g. the call chain is:

dispatch_wqueue()
  -> bus_write_message()
    -> bus_kernel_write_message()

"""
r = ioctl(bus->output_fd, KDBUS_CMD_SEND, &cmd);
if (r < 0) {
...
else if (errno == ENXIO || errno == ESRCH) {
...
if (m->header->type == SD_BUS_MESSAGE_METHOD_CALL)
sd_bus_error_setf(&error,
SD_BUS_ERROR_SERVICE_UNKNOWN, "Destination %s not known",
m->destination);
else {
log_debug("Could not deliver message
to %s as destination is not known. Ignoring.", m->destination);
return 0;
}
}
"""

If A __returns__ a result to B, but B has already died (after sending
a "method call" message):

1. It will return ENXIO or ESRCH, right?
2. dispatch_wqueue(), bus_write_message() and
bus_kernel_write_message() return 0
3. The next time dispatch_wqueue() is called, it will retry, but never
succeed - so, a deadlock?
--
Regards,

- cee1
David Herrmann
2015-08-24 10:26:23 UTC
Hi
Post by cee1
Post by eshark
Hi, all
If a message has been placed in bus->wqueue and the ioctl(KDBUS_CMD_SEND)
for it fails with r < 0, I found that the message remains in bus->wqueue.
If the peer is killed for some reason, the message can never be sent and
stays in the wqueue forever.
This is because in dispatch_wqueue(), when bus_write_message() returns r < 0,
dispatch_wqueue() simply returns that r to the caller.
And since the wqueue is invisible to the user application, the application also
cannot remove the message to handle this error case.
I wonder whether this is a problem, and if so, whether we should remove the
message in dispatch_wqueue() when r < 0?
I have the same question.
E.g. the call chain is:
dispatch_wqueue()
  -> bus_write_message()
    -> bus_kernel_write_message()
"""
r = ioctl(bus->output_fd, KDBUS_CMD_SEND, &cmd);
if (r < 0) {
...
else if (errno == ENXIO || errno == ESRCH) {
...
if (m->header->type == SD_BUS_MESSAGE_METHOD_CALL)
sd_bus_error_setf(&error,
SD_BUS_ERROR_SERVICE_UNKNOWN, "Destination %s not known",
m->destination);
else {
log_debug("Could not deliver message
to %s as destination is not known. Ignoring.", m->destination);
return 0;
This probably needs to be "return 1;". Lennart, any comments?
Post by cee1
                }
        }
"""
If A __returns__ a result to B, but B has already died (after sending
a "method call" message):
1. It will return ENXIO or ESRCH, right?
It returns EBADSLT.
Post by cee1
2. dispatch_wqueue(), bus_write_message() and
bus_kernel_write_message() return 0
3. The next time dispatch_wqueue() is called, it will retry, but never
succeed - so, a deadlock?
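To spell out the interaction (a sketch of the convention as I read it, not the literal source: dispatch_wqueue() only pops the head message when the write function reports it as consumed, i.e. returns > 0):

"""
/* Illustrative only; the kdbus command setup and surrounding code are
 * elided and simplified. */
static int kernel_write_message_sketch(sd_bus *bus, sd_bus_message *m,
                                       struct kdbus_cmd_send *cmd) {
        int r;

        r = ioctl(bus->output_fd, KDBUS_CMD_SEND, cmd);
        if (r >= 0)
                return 1;                /* sent; dispatch_wqueue() pops it */

        if (errno == EAGAIN || errno == EINTR)
                return 0;                /* keep it queued, retry later */

        if (errno == ENXIO || errno == ESRCH) {
                /* Destination is gone.  "return 0" (as in the excerpt above)
                 * keeps the message in the wqueue forever; "return 1" marks
                 * it as handled so dispatch_wqueue() can drop it. */
                log_debug("Could not deliver message to %s as destination is not known. Ignoring.",
                          m->destination);
                return 1;
        }

        return -errno;                   /* fatal connection error */
}
"""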
Thanks
David
eshark
2015-11-26 04:27:36 UTC
Hi, all,

It seems that exec_spawn() will return 0 if fork() fails, because the
        return log_unit_error_errno(params->unit_id, errno, "Failed to fork: %m");
statement eventually returns -r, and at that point r is still the return value of
exec_context_load_environment(), which has already succeeded.

This can cause big trouble for callers of exec_spawn(). For example, mount_spawn(), which
also calls exec_spawn(), will not go to its fail label but will continue running in this case,
and the subsequent unit_watch_pid() will then fail at assert(pid >= 1);.
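
As a generic, self-contained illustration of this failure pattern (not the actual systemd code; every name below is made up for the example), the fork() error path derives its return value from a stale status variable:

"""
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for an earlier call that succeeds, like exec_context_load_environment(). */
static int load_environment(void) {
        return 0;
}

static int spawn_sketch(pid_t *ret_pid) {
        int r;
        pid_t pid;

        r = load_environment();
        if (r < 0)
                return r;

        pid = fork();
        if (pid < 0) {
                fprintf(stderr, "Failed to fork: %s\n", strerror(errno));
                return -r;        /* BUG: r is still 0 here, so the caller sees "success" */
        }

        if (pid == 0)
                _exit(0);         /* child */

        *ret_pid = pid;           /* parent; never set on the buggy path above */
        return 0;
}
"""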

I have also committed a patch to fix this problem; please help review it.

Thanks a lot!

Li Cheng
Lennart Poettering
2015-11-27 13:18:08 UTC
Post by eshark
Hi, all,
It seems that exec_spawn() will return 0 if fork() fails, because the
return log_unit_error_errno(params->unit_id, errno, "Failed to fork: %m");
statement eventually returns -r, and at that point r is still the return value of
exec_context_load_environment(), which has already succeeded.
This can cause big trouble for callers of exec_spawn(). For example, mount_spawn(), which
also calls exec_spawn(), will not go to its fail label but will continue running in this case,
and the subsequent unit_watch_pid() will then fail at assert(pid >= 1);.
I have also committed a patch to fix this problem; please help review it.
Indeed, this is a bug! Patch looks great! I turned this now into a
github PR (https://github.com/systemd/systemd/pull/2048), because
that's how we prefer submissions these days. Will merge as soon as the
CI likes it too.

Thanks a lot!

Lennart
--
Lennart Poettering, Red Hat
eshark
2015-12-06 09:34:26 UTC
Hi, All


         In sd-bus.c, there are four conditions under which bus_enter_closing() is called:

                        if (r == -ENOTCONN || r == -ECONNRESET || r == -EPIPE || r == -ESHUTDOWN) {
                                bus_enter_closing(bus);
                                return -ECONNRESET;
                        }

         While working with kdbus in the kernel, kdbus returns -EPIPE to the remote connection when the local connection dies, and sometimes it also returns -ECONNRESET in that case.

         Thus it seems unreasonable for the remote connection to enter the BUS_CLOSING state when it gets -EPIPE, or sometimes -ECONNRESET, in this situation.

         If you agree that these are issues, which is the better choice: to modify the conditions in sd-bus.c, or to modify the errno returned by the kdbus kernel?

Because the kdbus kernel always returns -EPIPE to the remote connection, would it be reasonable to remove EPIPE from the check in sd-bus.c? And because the kdbus kernel sometimes returns -ECONNRESET to the remote connection when the local connection dies, would it be reasonable to use a different errno for that case, such as EPERM or EHOSTUNREACH?
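
If the sd-bus side were changed as suggested, the check might become roughly the following (an illustrative sketch only, not a reviewed patch; whether dropping EPIPE is also safe for the socket transport is part of the question):

                        /* sketch: EPIPE no longer treated as "our own connection is gone" */
                        if (r == -ENOTCONN || r == -ECONNRESET || r == -ESHUTDOWN) {
                                bus_enter_closing(bus);
                                return -ECONNRESET;
                        }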


B.R.
Li Cheng


   
eshark
2017-12-29 09:19:35 UTC
Hi, All
I tried to test socket activation with a simple foobar.socket and a foobar.service, defined as follows:
foobar.socket:
[Socket]
ListenStream=/dev/socket/foobar


[Install]
WantedBy=sockets.target


foobar.service:
[Service]
Type=simple
ExecStart=/usr/bin/test-socket
Restart=no


I also wrote a simple program that connects to /dev/socket/foobar in order to activate foobar.service.
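For reference, a minimal trigger client roughly along these lines is enough to activate the service (a generic sketch, not necessarily the exact program used here):

"""
/* Generic sketch of a trigger client: connect to the listening socket so
 * that systemd activates the service, send a little data, and exit. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int main(void) {
        struct sockaddr_un sa = { .sun_family = AF_UNIX };
        int fd;

        strncpy(sa.sun_path, "/dev/socket/foobar", sizeof(sa.sun_path) - 1);

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
                perror("socket");
                return 1;
        }
        if (connect(fd, (struct sockaddr*) &sa, sizeof(sa)) < 0) {
                perror("connect");
                close(fd);
                return 1;
        }

        /* The connection itself queues traffic on foobar.socket and makes
         * systemd start foobar.service. */
        write(fd, "hello\n", 6);
        close(fd);
        return 0;
}
"""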
When I ran the program, foobar.service was started by systemd and foobar.socket changed from the 'listening' state to the 'running' state.
Everything worked as expected, but when I killed test-socket, it was started again by systemd even though I did not run my program again.
From the system journal logs, I found:
"
Line 2035: 31,29604,571630004,-;systemd[1]: foobar.socket got notified about service death (failed permanently: no)
Line 2038: 31,29605,571630065,-;systemd[1]: foobar.socket changed running -> listening
Line 2050: 31,29609,571632385,-;systemd[1]: Incoming traffic on foobar.socket
Line 2056: 28,29611,571633087,-;systemd[1]: Cannot add dependency job for unit systemd-bus-proxyd.socket, ignoring: Unit systemd-bus-proxyd.socket failed to load: No such file or directory.
Line 2065: 31,29614,571633544,-;systemd[1]: foobar.socket changed listening -> running
"
It seems that, immediately after the death of foobar.service, some unknown incoming traffic on foobar.socket made systemd start foobar.service again.
Could anyone suggest how to find out who connected to foobar.socket? Any idea about how to debug this problem is much appreciated.


Thanks a lot.
eshark
2018-01-02 02:09:49 UTC
The program source code, foobar.service, and foobar.socket are attached.


Thanks for any suggestion!






Lennart Poettering
2018-01-24 11:01:42 UTC
Post by eshark
Hi, All
[Socket]
ListenStream=/dev/socket/foobar
[Install]
WantedBy=sockets.target
[Service]
Type=simple
ExecStart=/usr/bin/test-socket
Restart=no
I also wrote a simple program that connects to /dev/socket/foobar in order to activate foobar.service.
When I ran the program, foobar.service was started by systemd and foobar.socket changed from the 'listening' state to the 'running' state.
Everything worked as expected, but when I killed test-socket, it was started again by systemd even though I did not run my program again.
From the system journal logs, I found:
"
Line 2035: 31,29604,571630004,-;systemd[1]: foobar.socket got notified about service death (failed permanently: no)
Line 2038: 31,29605,571630065,-;systemd[1]: foobar.socket changed running -> listening
Line 2050: 31,29609,571632385,-;systemd[1]: Incoming traffic on foobar.socket
Line 2056: 28,29611,571633087,-;systemd[1]: Cannot add dependency job for unit systemd-bus-proxyd.socket, ignoring: Unit systemd-bus-proxyd.socket failed to load: No such file or directory.
Line 2065: 31,29614,571633544,-;systemd[1]: foobar.socket changed listening -> running
"
It seems that, immediately after the death of foobar.service, some unknown incoming traffic on foobar.socket made systemd start foobar.service again.
Could anyone suggest how to find out who connected to foobar.socket? Any idea about how to debug this problem is much appreciated.
If your service is activated due to incoming traffic and you do not
process the incoming traffic and exit, then you will be started
immediately again, as the socket still has traffic queued.

The idea is usually to process everything queued up on the listening
socket, and then exit until the next incoming traffic.

If you never accept any of the incoming connections, you will hence make
systemd busy-loop around your service.
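
For example, a socket-activated service roughly along these lines (an illustrative sketch using sd_listen_fds() from libsystemd, not your test-socket program) accepts and drains the queued connection before exiting, so the socket unit returns to the listening state cleanly:

"""
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <systemd/sd-daemon.h>

int main(void) {
        char buf[4096];
        ssize_t len;
        int n, fd, conn;

        /* systemd passes the listening socket as fd 3 (SD_LISTEN_FDS_START). */
        n = sd_listen_fds(0);
        if (n != 1) {
                fprintf(stderr, "Expected one socket from systemd, got %d\n", n);
                return 1;
        }
        fd = SD_LISTEN_FDS_START;

        /* Accept and drain the pending connection that triggered activation. */
        conn = accept(fd, NULL, NULL);
        if (conn >= 0) {
                while ((len = read(conn, buf, sizeof(buf))) > 0)
                        ; /* consume whatever the client sent */
                close(conn);
        }

        return 0;
}
"""

A real service would keep accepting until nothing is queued (or stay running); the sketch is built against libsystemd.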

Lennart
--
Lennart Poettering, Red Hat