[systemd-devel] Processes running after a service has stopped

Ross Lagerwall

2014-11-28 13:42:22 UTC

The handling of a service with KillMode set to something other than cgroup
is a bit confusing (as of systemd 208).

Suppose I have a service which has KillMode set to process and it happens
to leave some children behind.
# systemctl start tester
# systemctl status tester
tester.service - tester service
Loaded: loaded (/etc/systemd/system/tester.service; static)
Active: active (running) since Fri 2014-11-28 13:32:40 GMT; 2s ago
Main PID: 5690 (tester)
CGroup: /system.slice/tester.service
├─5690 /home/ross/tester start
└─5691 /home/ross/tester start

# systemctl stop tester
# systemctl status tester
tester.service - tester service
Loaded: loaded (/etc/systemd/system/tester.service; static)
Active: inactive (dead)

Now even though there is still a process running, systemd doesn't indicate
this. Furthermore, trying to kill these processes doesn't work because the
service is "stopped":
# systemctl kill --kill-who=all tester.service
Failed to issue method call: Unit tester.service is not loaded.

Even more confusing, when the service is started again, the existing process
reappears:
# systemctl start tester
# systemctl status tester
tester.service - tester service
Loaded: loaded (/etc/systemd/system/tester.service; static)
Active: active (running) since Fri 2014-11-28 13:36:09 GMT; 7s ago
Main PID: 5730 (tester)
CGroup: /system.slice/tester.service
├─5691 /home/ross/tester start
├─5730 /home/ross/tester start
└─5731 /home/ross/tester start

Is there a reason for the way this is handled? Perhaps systemd could show
existing processes for a service regardless of the state the service is in?

Also, perhaps systemd could allow killing these processes even if the service
is "stopped"?
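
In the meantime, the kernel's own view of the cgroup can be read directly, whatever state systemd reports for the unit. A minimal sketch, not something systemd provides: list_cgroup_pids is a hypothetical helper name, and the default CGROUP_ROOT assumes the named systemd hierarchy is mounted at /sys/fs/cgroup/systemd, which may differ on other setups.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): print the PIDs the kernel
# still tracks in a cgroup directory, one per line.
list_cgroup_pids() {
    cgroup_dir=$1
    # cgroup.procs is the kernel-provided file listing member PIDs.
    if [ -r "$cgroup_dir/cgroup.procs" ]; then
        cat "$cgroup_dir/cgroup.procs"
    else
        echo "no readable cgroup.procs in: $cgroup_dir" >&2
        return 1
    fi
}

# Assumed mount point of the systemd hierarchy; override if yours differs.
CGROUP_ROOT=${CGROUP_ROOT:-/sys/fs/cgroup/systemd}

# Example call for the unit discussed here:
#   list_cgroup_pids "$CGROUP_ROOT/system.slice/tester.service"
```

If this prints a PID while "systemctl status" says inactive (dead), the process was left behind in the unit's cgroup.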

Regards
--
Ross Lagerwall

Lennart Poettering

2014-11-28 18:53:33 UTC

Post by Ross Lagerwall
The handling of a service with KillMode set to something other than cgroup
is a bit confusing (as of systemd 208).

Hmm, could you test this with newer systemd please? 208 is already
quite old.

Where (in terms of: "which cgroup"?) does "systemd-cgls" show the left-over processes?

We should show the cgroup contents regardless of the state of a
service actually, nothing should be hidden there. If things are hidden
just because of the service state then this would be a bug. If you can
reproduce it with 217 or so that would be great!

Thanks!

Lennart

--
Lennart Poettering, Red Hat

Ross Lagerwall

2014-11-28 20:03:23 UTC

Post by Lennart Poettering

Post by Ross Lagerwall
The handling of a service with KillMode set to something other than cgroup
is a bit confusing (as of systemd 208).

Hmm, could you test this with newer systemd please? 208 is already
quite old.
Where (in terms of: "which cgroup"?) does "systemd-cgls" show the left-over processes?

In its own cgroup, as would normally be the case:

│ ├─tester.service
│ │ └─24709 /home/ross/Downloads/tester start

Post by Lennart Poettering
We should show the cgroup contents regardless of the state of a
service actually, nothing should be hidden there. If things are hidden
just because of the service state then this would be a bug. If you can
reproduce it with 217 or so that would be great!

The same behavior seems to occur with 217 (on Arch):
# systemctl start tester.service
# systemctl status tester.service
● tester.service - Tester service
Loaded: loaded (/etc/systemd/system/tester.service; static)
Active: active (running) since Fri 2014-11-28 19:46:21 GMT; 4s ago
Main PID: 25067 (tester)
CGroup: /system.slice/tester.service
├─25067 /home/ross/Downloads/tester start
└─25068 /home/ross/Downloads/tester start
# systemctl stop tester
# systemctl status tester.service
● tester.service - Tester service
Loaded: loaded (/etc/systemd/system/tester.service; static)
Active: inactive (dead)

# ps aux | grep tester
root 25068 0.0 0.0 4048 76 ? S 19:46 0:00 /home/ross/Downloads/tester start

# systemctl start tester.service
# systemctl status tester.service
● tester.service - Tester service
Loaded: loaded (/etc/systemd/system/tester.service; static)
Active: active (running) since Fri 2014-11-28 19:50:58 GMT; 2s ago
Main PID: 25148 (tester)
CGroup: /system.slice/tester.service
├─25068 /home/ross/Downloads/tester start <-- the left over process!
├─25148 /home/ross/Downloads/tester start
└─25149 /home/ross/Downloads/tester start

With 217, running "systemctl kill --kill-who=all -s KILL tester.service"
doesn't fail, but it doesn't seem to do anything either.
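
As a stopgap, the left-over processes can be signalled by hand from the cgroup's member list. This is only a sketch under assumptions: kill_cgroup_pids is a hypothetical helper, the cgroup directory is passed in by the caller (on my machines it would be under /sys/fs/cgroup/systemd, which may not hold elsewhere), and the member list can change while it is being read.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): send a signal to every PID
# listed in a cgroup's cgroup.procs file.
kill_cgroup_pids() {
    cgroup_dir=$1
    signal=${2:-TERM}   # default to SIGTERM when no signal name is given
    [ -r "$cgroup_dir/cgroup.procs" ] || return 1
    while read -r pid; do
        # A process may already have exited; report it and carry on.
        kill -s "$signal" "$pid" 2>/dev/null || echo "could not signal $pid" >&2
    done < "$cgroup_dir/cgroup.procs"
}

# Example call (path is an assumption, see above):
#   kill_cgroup_pids /sys/fs/cgroup/systemd/system.slice/tester.service KILL
```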

Thanks,

--
Ross Lagerwall

Ross Lagerwall

2014-11-29 15:27:14 UTC

If a cgroup fails to be destroyed (most likely because there are still
processes running as part of a service after the main pid exits), don't
free and remove the cgroup unit from the manager. This fixes a
regression introduced by the cgroup rework in v205 where systemd would
forget about processes still running after the unit becomes inactive.
(This can happen when the main pid exits and KillMode=process or none).
---

Not sure if this is the correct fix but it seems to fix the issue I reported.
When the issue occurs, this notice is visible in the logs:
Nov 28 22:11:32 centi7 systemd[1]: Failed to destroy cgroup /system.slice/tester.service: Device or resource busy

src/core/cgroup.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 70fc925..af04835 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -791,8 +791,10 @@ void unit_destroy_cgroup(Unit *u) {
                 return;

         r = cg_trim_everywhere(u->manager->cgroup_supported, u->cgroup_path, !unit_has_name(u, SPECIAL_ROOT_SLICE));
-        if (r < 0)
+        if (r < 0) {
                 log_debug_errno(r, "Failed to destroy cgroup %s: %m", u->cgroup_path);
+                return;
+        }

         hashmap_remove(u->manager->cgroup_unit, u->cgroup_path);

--
2.1.2
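
The idea behind the patch can be modelled outside systemd: try to remove the cgroup directory, and drop the bookkeeping for it only when the removal succeeded. The sketch below is a rough shell analogy, not the systemd code; destroy_cgroup_if_empty is a hypothetical shell function, and on an ordinary filesystem a non-empty directory merely stands in for a cgroup that still has processes (where rmdir fails with "Device or resource busy").

```shell
#!/bin/sh
# Hypothetical model of the patched behaviour (not the systemd code):
# report "removed" and return 0 only when the directory could be removed,
# otherwise report "kept" and return 1 so the caller keeps tracking it.
destroy_cgroup_if_empty() {
    cgroup_dir=$1
    if rmdir "$cgroup_dir" 2>/dev/null; then
        echo "removed"
    else
        # Removal failed, e.g. because processes are still inside.
        echo "kept"
        return 1
    fi
}

# Example call: forget the cgroup only on success.
#   destroy_cgroup_if_empty "$dir" && echo "safe to forget $dir"
```

Before the patch, the equivalent of the "safe to forget" step ran unconditionally, which is why the remaining processes dropped out of sight.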

Lennart Poettering

2014-12-09 01:33:03 UTC

Post by Ross Lagerwall
If a cgroup fails to be destroyed (most likely because there are still
processes running as part of a service after the main pid exits), don't
free and remove the cgroup unit from the manager. This fixes a
regression introduced by the cgroup rework in v205 where systemd would
forget about processes still running after the unit becomes inactive.
(This can happen when the main pid exits and KillMode=process or none).
---
Not sure if this is the correct fix but it seems to fix the issue I reported.
Nov 28 22:11:32 centi7 systemd[1]: Failed to destroy cgroup /system.slice/tester.service: Device or resource busy

Patch looks great actually. Applied! Thanks for tracking this down and
prepping a patch.

After applying it I renamed the unit_destroy_cgroup() call to
unit_destroy_cgroup_if_empty() since it is now much less destructive
than it used to be.

Thanks,

Lennart

--
Lennart Poettering, Red Hat