[systemd-devel] rghvk@outlook.com
Aravindhan Krishnan
2021-06-07 15:56:12 UTC
Hi Folks,

I am seeing anomalous behavior when running a dhclient process inside a
Docker container on a vanilla Ubuntu 16.04 host. The service enters the
"deactivating" state and stays stuck there indefinitely. I have attached a
minimal reproduction of the issue to this mail.

Working logic:

- There is a sample ***@.service template unit that invokes the `trial`
binary with the option passed to the systemd service as the instance name
(the part after the @).
- The valid options are sleep and dhclient_<interface_name>.
- Depending on the input, the binary starts either a long-lived sleep
process or a dhclient process on the given interface.
- The binary then spawns the `kill_trial.sh` script, which sleeps for
20 seconds and then kills its parent, the `trial` binary. The signal is
SIGKILL in this example; in the real world it could be a SIGSEGV indicating
a crash in the parent process.
- If the trial binary was started with the sleep option, things work fine
and the service goes into the "failed" state as expected.
- With dhclient, however, the service is stuck in the "deactivating"
state if the underlying host OS is Ubuntu 16.04. It works correctly if the
host is running Ubuntu 20.04.
- We have set TimeoutStopSec to infinity because, in real-world
deployments, core collection after a crash takes a varying amount of time
depending on the memory configuration of the host.
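
The attached scripts are not reproduced in this thread. Purely as an illustration of the "sleep, then SIGKILL the parent" step described above, a stand-in could look like the sketch below. The function name `kill_after` and its arguments are hypothetical and are not taken from the attachment.

```shell
#!/bin/sh
# Hypothetical stand-in for kill_trial.sh; the real script ships in the
# attachment and may differ.

# kill_after DELAY PID [SIGNAL]: wait DELAY seconds, then send SIGNAL
# (default: KILL) to PID.
kill_after() {
    delay=$1
    pid=$2
    sig=${3:-KILL}
    sleep "$delay"
    kill -s "$sig" "$pid"
}

# In the reproduction the target would be the parent `trial` process, e.g.:
#   kill_after 20 "$PPID" KILL
```

Passing SEGV instead of KILL as the third argument would approximate the real-world crash case mentioned above.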


Steps to reproduce
# tar -xf minimal_repro.tar -C minimal_repro/
# cd minimal_repro/
# docker build -t trial .
# docker rm -f trial
# docker run -it -d --net=host --privileged -v /sys/fs/cgroup:/sys/fs/cgroup:ro --name trial trial
# docker exec -it trial bash

# systemctl start ***@dhclient_eth1.service

# #Keep monitoring ***@dhclient_eth1.service -- issue should be seen within 20-30 seconds on Ubuntu 16.04 host

# systemctl status ***@dhclient_eth1.service
● ***@dhclient_eth1.service - Trial
     Loaded: loaded (/etc/systemd/system/***@.service; static; vendor preset: enabled)
     Active: deactivating (stop-sigterm) (Result: signal) since Mon 2021-06-07 13:19:12 UTC; 1min 11s ago
    Process: 55 ExecStartPre=/bin/bash /etc/systemd/system/trial_service_script.sh pre_start dhclient_eth1 (code=exited, status=0/SUCCESS)
    Process: 56 ExecStart=/bin/bash /etc/systemd/system/trial_service_script.sh start dhclient_eth1 (code=killed, signal=KILL)
   Main PID: 56 (code=killed, signal=KILL)
      Tasks: 0 (limit: 38590)
     Memory: 588.0K
     CGroup: /docker/903fca0cee1387b7c2113a36ee5efdb3a25edd1e60584fe5da5d0c5b5ffd8241/system.slice/system-trial.slice/***@dhclient_eth1.service

# #NOTE: `Active: deactivating` -- in stuck state
# #Running `systemctl daemon-reload` forces the service to go to failed state

# systemctl start ***@sleep.service

# #Keep monitoring ***@sleep.service -- would be killed in 20-30 seconds and goes into failed state as expected

# systemctl status ***@sleep.service
● ***@sleep.service - Trial
     Loaded: loaded (/etc/systemd/system/***@.service; static; vendor preset: enabled)
     Active: failed (Result: signal) since Mon 2021-06-07 13:38:19 UTC; 21s ago
    Process: 113 ExecStartPre=/bin/bash /etc/systemd/system/trial_service_script.sh pre_start sleep (code=exited, status=0/SUCCESS)
    Process: 114 ExecStart=/bin/bash /etc/systemd/system/trial_service_script.sh start sleep (code=killed, signal=KILL)
    Process: 129 ExecStopPost=/bin/bash /etc/systemd/system/trial_service_script.sh post_stop sleep (code=exited, status=0/SUCCESS)
   Main PID: 114 (code=killed, signal=KILL)
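
The "keep monitoring" steps above can also be scripted rather than run by hand. The following is a minimal sketch, assuming only that `systemctl show -p ActiveState -p SubState UNIT` prints KEY=VALUE lines. The helper names `get_prop` and `watch_unit` are made up for illustration and are not part of the attachment.

```shell
#!/bin/sh
# get_prop NAME: read KEY=VALUE lines (as printed by `systemctl show`) from
# stdin and print the value of NAME. Returns 1 if NAME is not present.
get_prop() {
    while IFS='=' read -r key value; do
        if [ "$key" = "$1" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# watch_unit UNIT [TRIES]: print a timestamp and ActiveState/SubState of UNIT
# once per second, TRIES times (default 30).
watch_unit() {
    unit=$1
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        out=$(systemctl show -p ActiveState -p SubState "$unit")
        printf '%s %s/%s\n' "$(date -u +%T)" \
            "$(printf '%s\n' "$out" | get_prop ActiveState)" \
            "$(printf '%s\n' "$out" | get_prop SubState)"
        sleep 1
        i=$((i + 1))
    done
}
```

Run inside the container as `watch_unit <unit name> 60`. A unit that never leaves the deactivating state keeps reporting that state until the loop ends, whereas the sleep instance should switch to failed.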

Please advise on what can help us in alleviating the issue.

Thanks,
Aravindhan

Regards,
Aravindhan Krishnan...
Aravindhan Krishnan
2021-06-07 15:57:35 UTC
Adding Raghav.

And sorry, the subject should have read: Discrepancy in using dhclient
between Ubuntu 20.04 and Ubuntu 16.04

Regards,
Aravindhan Krishnan...
Reindl Harald
2021-06-07 16:23:32 UTC
Post by Aravindhan Krishnan
Adding Raghav.
And sorry the subject should have stated: Discrepancy in using dhclient
b/w ubuntu 20.04 and ubuntu 16.04
and why didn't you fix it in your own reply?

to your problem:
you have a wild mix of docker, systemd units and shell scripts but don't
provide the source of either the scripts or the systemd unit

overly complex for something that can be as trivial as:

[***@srv-rhsoft:~]$ cat /etc/systemd/system/network-wan-dhcp.service
[Unit]
Description=Internet DHCP-Client

[Service]
Type=forking
ExecStart=/usr/sbin/dhclient -4 -q --no-pid --request-options subnet-mask,broadcast-address,routers br-wan
PermissionsStartOnly=yes
SuccessExitStatus=80
Restart=always
RestartSec=5
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=-/var/lib/dhclient
PrivateTmp=yes
NoNewPrivileges=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
MemoryDenyWriteExecute=yes
CapabilityBoundingSet=CAP_NET_ADMIN CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_RAW
LockPersonality=yes
PrivateDevices=yes
ProtectHostname=yes
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
ProtectClock=true
ProtectKernelLogs=true
UMask=077
SystemCallArchitectures=native
SystemCallFilter=@system-service @network-io @privileged
SystemCallFilter=~@aio @chown @clock @cpu-emulation @debug @keyring @module @mount @obsolete @raw-io @reboot @resources @swap
InaccessiblePaths=-/boot
InaccessiblePaths=-/efi
InaccessiblePaths=-/root
Lennart Poettering
2021-06-07 16:54:19 UTC
Post by Aravindhan Krishnan
Hi Folks,
I am finding anomalous behavior when I am trying to run dhclient process
inside my docker container in vanilla Ubuntu 16.04 host. The service gets
into "deactivating" state and is stuck forever. In the mail I have attached
a minimalistic reproduction of the issue seen.
Are you running systemd inside of a Docker container on Ubuntu 16.04?

Docker isn't really up to that. In particular not 5y old versions of it.

Lennart

--
Lennart Poettering, Berlin
Aravindhan Krishnan
2021-06-07 17:17:12 UTC
Hi Lennart,

Thanks for the quick response. Yes, we are running systemd inside the
Docker container. We also see the same issue on a CentOS 7.9 host.

Here are the kernel and OS details of the CentOS host:

# uname -r
3.10.0-1160.25.1.el7.x86_64

# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)


Regards,
Aravindhan Krishnan...
Lennart Poettering
2021-06-07 20:15:00 UTC
Post by Aravindhan Krishnan
Hi Lennart,
Thanks for the quick response. Yes, we are running systemd inside the
docker. We were also able to see the same issue even on top of
Centos 7.9.
Unlike pretty much all other container managers Docker doesn't really
make it easy to run systemd inside it. Docker upstream is pretty
hostile towards systemd, so this is unlikely to change.

We document pretty extensively what container managers have to do to
make sure systemd just works inside containers. Pretty much all
container managers just implement that, but Docker doesn't. This is
what they need to implement:

https://systemd.io/CONTAINER_INTERFACE

Consider switching to a different container manager implementation;
there are plenty of others. (In particular, podman is mostly a drop-in
replacement for Docker, if you need Docker semantics. Podman upstream
isn't hostile towards systemd, so things mostly just work there.)
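
As a rough illustration of the podman route, the `docker run` line from the reproduction might translate as sketched below. This is an untested assumption rather than a verified fix: it presumes the image has been rebuilt with `podman build -t trial .`, and the wrapper name `run_trial_podman` and the `DRY_RUN` switch are hypothetical. `--systemd=always` is the podman option that requests systemd mode for the container regardless of the command it runs.

```shell
#!/bin/sh
# run_trial_podman: start the `trial` image under podman instead of docker.
# Set DRY_RUN=1 to print the command line instead of executing it.
run_trial_podman() {
    # Same name, network mode and privilege level as the docker invocation;
    # the explicit cgroup bind mount is dropped on the assumption that
    # podman's systemd mode takes care of it.
    set -- podman run -d --name trial --net=host --privileged \
        --systemd=always trial
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling `DRY_RUN=1 run_trial_podman` shows the command without starting anything, which helps when comparing it against the original docker invocation.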
Post by Aravindhan Krishnan
Attaching the kernel and OS details of the centos host
# uname -r
3.10.0-1160.25.1.el7.x86_64
# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)
This is very old. You might want to switch to a newer OS for this
anyway.

Lennart

--
Lennart Poettering, Berlin

Silvio Knizek
2021-06-07 19:26:55 UTC
Post by Aravindhan Krishnan
Hi Folks,
I am finding anomalous behavior when I am trying to run dhclient
process inside my docker container in vanilla Ubuntu 16.04 host. The
service gets into "deactivating" state and is stuck forever. In the
mail I have attached a minimalistic reproduction of the issue seen.
Thanks,
Aravindhan
Regards,
Aravindhan Krishnan...
Hi Aravindhan,

don't run systemd in a docker container in the first place? Also Ubuntu
16.04 is really old.
IMHO all your problems are created by your setup itself. I really
appreciate the minimal example you attached, but if your premise
(running systemd in a docker container and not just one simple process)
is already wrong, then no solution can be right.

BR
Silvio