Discussion:
socket unit refusing connection when JOB_STOP is pending
Moravec, Stanislav (ERT)
2017-05-16 11:28:04 UTC
Hello all,

I wanted to seek your opinion about the correctness of the current behavior
of socket-activated units.

Let's assume we have a socket-activated service (for example authd, via auth.socket)
and some other background service (called authtest.service for the purpose of this
test) that needs to connect to the socket-activated service to stop itself properly.

The authtest defines dependency on auth.socket as expected:

# cat /usr/lib/systemd/system/authtest.service
[Unit]
Description=Test Script to connect auth during shutdown
After=auth.socket
Requires=auth.socket

[Service]
ExecStart=/bin/true
ExecStop=/usr/bin/connect_authd
Type=oneshot
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

Yet authtest doesn't stop correctly (in our test case the connection just fails;
it is not a real failure), because auth.socket refuses connections as soon as the
pending job on auth.socket is JOB_STOP, even if it's not yet time to actually stop
the unit.

The auth.socket:
May 16 11:23:41 pra0097 systemd[1]: Installed new job auth.socket/stop as 9395
May 16 11:23:41 pra0097 systemd[1]: Incoming traffic on auth.socket
May 16 11:23:41 pra0097 systemd[1]: Suppressing connection request on auth.socket since unit stop is scheduled.
// NOTE the above
May 16 11:24:44 pra0097 systemd[1]: auth.socket changed listening -> dead
May 16 11:24:44 pra0097 systemd[1]: Job auth.socket/stop finished, result=done
May 16 11:24:44 pra0097 systemd[1]: Closed Authd Activation Socket.
May 16 11:24:44 pra0097 systemd[1]: Stopping Authd Activation Socket.

The authtest:
May 16 11:23:41 pra0097 systemd[1]: Installed new job authtest.service/stop as 9337
May 16 11:23:41 pra0097 systemd[1]: About to execute: /usr/bin/connect_authd
May 16 11:23:41 pra0097 systemd[1]: Forked /usr/bin/connect_authd as 7051
May 16 11:23:41 pra0097 systemd[1]: authtest.service changed exited -> stop
May 16 11:23:41 pra0097 systemd[1]: Stopping Test Script to connect auth during shutdown...
May 16 11:23:41 pra0097 systemd[7051]: Executing: /usr/bin/connect_authd
May 16 11:23:41 pra0097 connect_authd[7051]: Tue May 16 11:23:41 CEST 2017
May 16 11:23:41 pra0097 connect_authd[7051]: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
May 16 11:23:41 pra0097 connect_authd[7051]: systemd 1 root 38u IPv6 19431 0t0 TCP *:auth (LISTEN)
May 16 11:23:41 pra0097 connect_authd[7051]: ERROR reading from socket: Connection reset by peer
May 16 11:23:41 pra0097 connect_authd[7051]: sending message: 80,80
May 16 11:23:41 pra0097 systemd[1]: Child 7051 belongs to authtest.service
May 16 11:23:41 pra0097 systemd[1]: authtest.service: control process exited, code=exited status=0
May 16 11:23:41 pra0097 systemd[1]: authtest.service got final SIGCHLD for state stop
May 16 11:23:41 pra0097 systemd[1]: authtest.service changed stop -> dead
May 16 11:23:41 pra0097 systemd[1]: Job authtest.service/stop finished, result=done
May 16 11:23:41 pra0097 systemd[1]: Stopped Test Script to connect auth during shutdown.
May 16 11:23:41 pra0097 systemd[1]: authtest.service: cgroup is empty


The relevant piece of code:
static void socket_enter_running(Socket *s, int cfd) {
        ...
        /* We don't take connections anymore if we are supposed to shut down anyway */
        if (unit_stop_pending(UNIT(s))) {
                log_unit_debug(UNIT(s), "Suppressing connection request since unit stop is scheduled.");
                ...


bool unit_stop_pending(Unit *u) {
        ...
        return u->job && u->job->type == JOB_STOP;
}
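
To make the effect of that check concrete, here is a minimal, self-contained model of it. The Job and Unit structs, the SocketState enum and would_accept_connection() are hypothetical simplifications for illustration only, not systemd's real definitions; only the unit_stop_pending() test mirrors the excerpt above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for systemd's Job and Unit types. */
typedef enum { JOB_START, JOB_STOP } JobType;
typedef enum { SOCKET_LISTENING, SOCKET_DEAD } SocketState;

typedef struct {
        JobType type;
} Job;

typedef struct {
        SocketState state; /* what the unit is doing right now */
        Job *job;          /* job installed for the unit, or NULL */
} Unit;

/* Same test as in the excerpt: only the installed job is consulted,
 * not the current state of the unit. */
static bool unit_stop_pending(const Unit *u) {
        return u->job && u->job->type == JOB_STOP;
}

/* Model of the decision: true if an incoming connection would be served. */
static bool would_accept_connection(const Unit *u) {
        if (u->state != SOCKET_LISTENING)
                return false;
        if (unit_stop_pending(u))
                return false; /* "Suppressing connection request ..." */
        return true;
}
```

In this model a unit that is still in the listening state stops serving connections the moment a stop job is installed, which matches the 11:23:41 log lines above: the job is installed and the connection is suppressed roughly a minute before the socket actually changes to dead.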

Wouldn't it make sense to still allow connections while the unit is still running?
Or maybe, for compatibility, a boolean could be added to the socket unit definition
to allow the socket to keep answering connections until it really is stopped.

If it were not a socket-activated unit, the two services would be ordered and work
just fine, so why should a socket unit be different?

Opinions?

Thanks!
StanM
Moravec, Stanislav (ERT)
2017-05-24 11:40:53 UTC
No one has any opinion?

Thanks again
Stan

Lennart Poettering
2017-05-29 15:44:59 UTC
Post by Moravec, Stanislav (ERT)
Would not it make sense to still allow connections while the unit is still running?
Or maybe for compatibility some boolean could be added to socket unit definition to allow
the socket to keep answering connection until it really is stopped.
If it was not a socket activated unit the 2 services would order and work just fine,
so why should socket unit be different?
Opinions?
This is indeed a shortcoming in systemd's model right now: we don't
permit a start and a stop job to be enqueued for the same unit at the
same time. But to do what you want to do we'd need to permit that: the
service is supposed to stop, but also temporarily start.

I don't really have any nice way out to recommend to you, I
fear. Permitting multiple jobs to be enqueued for the same unit would
be a major change in the design of systemd, and would result in a
number of complex problems (e.g. detecting cycles and deadlocks
becomes much more complex).

The best I can offer is to change the design of the services in
question: instead of connecting to the other service only at shutdown,
establish the connection when starting up, and leave the
connection around. This way abnormal exits could be detected as well,
and no activation would be necessary anymore at shutdown.

I hope that helps in any way?

Lennart
--
Lennart Poettering, Red Hat
Moravec, Stanislav (ERT)
2017-05-30 18:07:23 UTC
OK. Understood, thanks much!
We'll try to follow up by using some parent process (xinetd or something like that).
Eventually, however, this is a limitation that prevents using systemd as a full init
replacement and should be addressed in some way sooner or later, I guess.
Thanks again
Stan

Uoti Urpala
2017-05-30 19:47:38 UTC
Post by Moravec, Stanislav (ERT)
OK. Understood, thanks much!
We'll try to follow up on using some parent process (xinetd or something like that).
BTW your problem description wasn't very clear. Is your specific
problem case about socket activation of normal services (the issue
being that if the service has not been started before shutdown it won't
be, but things work if something did start it earlier) or about
Accept=true sockets that start a new service instance from template for
each connection? I'd guess that the latter might be somewhat easier to
support architecturally, though I'm not familiar enough with that to
say for sure.
Moravec, Stanislav (ERT)
2017-05-30 20:05:12 UTC
Uoti,

yes, it was about the Red Hat/CentOS 7 authd (RFC 1413) service, so it was
the latter: one child process per connection:

***@.service:[Unit]
***@.service:Description=Authd Ident Protocol Requests Server
***@.service:After=local-fs.target
***@.service:
***@.service:[Service]
***@.service:User=ident
***@.service:ExecStart=/usr/sbin/in.authd -t60 --xerror --os -E
***@.service:StandardInput=socket

auth.socket:[Unit]
auth.socket:Description=Authd Activation Socket
auth.socket:
auth.socket:[Socket]
auth.socket:ListenStream=113
auth.socket:Accept=true
auth.socket:
auth.socket:[Install]
auth.socket:WantedBy=sockets.target


regards
Stan
Michal Sekletar
2017-05-31 08:14:44 UTC
On Mon, May 29, 2017 at 5:44 PM, Lennart Poettering
Post by Lennart Poettering
This is indeed a shortcoming in systemd's model right now: we don't
permit a start and a stop job to be enqueued for the same unit at the
same time. But to do what you want to do we'd need to permit that: the
service is supposed to stop, but also temporarily start.
AFAIU, this is not exactly the case Stanislav is talking about. He
wants systemd to activate an instance of a service during shutdown while
a stop job is already enqueued for the respective socket unit (which is
a different unit). At that time there can't be any stop job enqueued for
the service instance, since that isn't running yet. Hence there is no
conflict between start and stop jobs. *But* this is only true when we
talk about the service instance itself. That instance can have
dependencies that are already running and are scheduled to be stopped,
and here we have the problem that Lennart is talking about.
Moravec, Stanislav (ERT)
2017-05-31 13:43:30 UTC
FYI:
I tried to simply bypass the pending job check:
+int ignore_stop_pending = true;
static void socket_enter_running(Socket *s, int cfd) {
...
- if (unit_stop_pending(UNIT(s))) {
+ if (!ignore_stop_pending && unit_stop_pending(UNIT(s))) {

But, as expected, it's not that easy: the startup of the service fails to get queued.

...
May 31 16:07:30 pra0097 systemd[1]: Stopping Test Script to connect auth during shutdown...
May 31 16:07:30 pra0097 systemd[3830]: Executing: /usr/bin/connect_authd
May 31 16:07:30 pra0097 connect_authd[3830]: Wed May 31 16:07:30 CEST 2017
May 31 16:07:31 pra0097 connect_authd[3830]: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
May 31 16:07:31 pra0097 connect_authd[3830]: systemd 1 root 46u IPv6 20535 0t0 TCP *:auth (LISTEN)
May 31 16:07:31 pra0097 systemd[1]: auth.socket failed to queue service startup job (Maybe the service file is missing or not a non-template unit?): Transaction is destructive.
May 31 16:07:31 pra0097 systemd[1]: Closed Authd Activation Socket.
May 31 16:07:31 pra0097 systemd[1]: Unit auth.socket entered failed state.
May 31 16:09:00 pra0097 systemd[1]: authtest.service stopping timed out. Terminating.
...

Stan
Michal Sekletar
2017-06-05 09:49:58 UTC
On Wed, May 31, 2017 at 3:43 PM, Moravec, Stanislav (ERT)
Post by Moravec, Stanislav (ERT)
+int ignore_stop_pending = true;
static void socket_enter_running(Socket *s, int cfd) {
...
- if (unit_stop_pending(UNIT(s))) {
+ if (!ignore_stop_pending && unit_stop_pending(UNIT(s))) {
But, as expected, it's not that easy: the startup of the service fails to get queued.
This is because stop jobs queued on shutdown have a special job mode
that doesn't allow them to be replaced. When you removed the check, you
let activation go through, and that generated start jobs that
would normally replace the pending stop jobs. But, like I said, on shutdown
those stop job objects have the special job mode (flag) that prohibits
this.
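
The replacement rule described here can be modelled in a few lines. This is a simplified sketch, not systemd's transaction code: the irreversible field, the EnqueueResult enum and try_enqueue() are assumptions introduced for illustration, and real transactions span many units rather than a single pending job.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { JOB_START, JOB_STOP } JobType;

/* Hypothetical simplified job: 'irreversible' models the special mode
 * that stop jobs get when they are queued by shutdown. */
typedef struct {
        JobType type;
        bool irreversible;
} Job;

typedef enum {
        ENQUEUE_OK,          /* no conflict with the pending job */
        ENQUEUE_REPLACED,    /* conflicting pending job was replaced */
        ENQUEUE_DESTRUCTIVE  /* refused: "Transaction is destructive." */
} EnqueueResult;

/* In this two-type model, start and stop are the only conflicting pair. */
static bool job_types_conflict(JobType a, JobType b) {
        return a != b;
}

/* Try to install 'incoming' for a unit whose pending job is '*pending'. */
static EnqueueResult try_enqueue(Job **pending, Job *incoming) {
        if (!*pending) {
                *pending = incoming;
                return ENQUEUE_OK;
        }
        if (!job_types_conflict((*pending)->type, incoming->type))
                return ENQUEUE_OK; /* same kind of job, nothing to replace */
        if ((*pending)->irreversible)
                return ENQUEUE_DESTRUCTIVE; /* pending job may not be replaced */
        *pending = incoming;
        return ENQUEUE_REPLACED;
}
```

With an ordinary pending stop job the start job wins and replaces it; with a stop job queued by shutdown the start job is refused, which corresponds to the "Transaction is destructive." message in the log quoted earlier in the thread.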

Michal
