Aravindhan Krishnan
2021-06-07 15:56:12 UTC
Hi Folks,
I am finding anomalous behavior when I am trying to run dhclient process
inside my docker container in vanilla Ubuntu 16.04 host. The service gets
into "deactivating" state and is stuck forever. In the mail I have attached
a minimalistic reproduction of the issue seen.
Working logic:
- There is a sample ***@.service script which invokes the `trial`
binary with the option passed to the systemd service via @ option
- The valid options are sleep and dhclient_<interface_name>
- The binary either invokes a long-lived sleep process or dhclient
process on the said interface_name based on the input
- The binary then spawns `kill_trial.sh` script. The script sleeps for
20 seconds and kills the parent `trial` binary. The kill signal is SIGKILL
in the trial example. In the real-world, this can be a SIGSEGV indicating a
crash in the parent process.
- If the trial binary was started for sleep process things work fine and
service goes into "failed" state as expected
- However, in case of dhclient, the service is stuck in "deactivating"
state if the underlying host OS is Ubuntu 16.04. This works well if the
host is running Ubuntu 20.04.
- We have kept TimeoutStopSec to infinity, because in real-word
deployments, the core collection post a crash takes varying time depending
on the memory config on the host.
Steps to reproduce
# tar -xf minimal_repro.tar -C minimal_repro/
# cd minimal_repro/
# docker build -t trial .
# docker rm -f trial
# docker run -it -d --net=host --privileged -v
/sys/fs/cgroup:/sys/fs/cgroup:ro --name trial trial
# docker exec -it trial bash
# systemctl start ***@dhclient_eth1.service
# #Keep monitoring ***@dhclient_eth1.service -- issue should be seen
within 20-30 seconds on Ubuntu 16.04 host
# systemctl status ***@dhclient_eth1.service
â ***@dhclient_eth1.service - Trial
Loaded: loaded (/etc/systemd/system/***@.service; static; vendor
preset: enabled)
Active: deactivating (stop-sigterm) (Result: signal) since Mon
2021-06-07 13:19:12 UTC; 1min 11s ago
Process: 55 ExecStartPre=/bin/bash
/etc/systemd/system/trial_service_script.sh pre_start dhclient_eth1
(code=exited, status=0/SUCCESS)
Process: 56 ExecStart=/bin/bash
/etc/systemd/system/trial_service_script.sh start dhclient_eth1
(code=killed, signal=KILL)
Main PID: 56 (code=killed, signal=KILL)
Tasks: 0 (limit: 38590)
Memory: 588.0K
CGroup:
/docker/903fca0cee1387b7c2113a36ee5efdb3a25edd1e60584fe5da5d0c5b5ffd8241/system.slice/system-trial.slice/***@dhclient_eth1.service
# #NOTE: `Active: deactivating` -- in stuck state
# #Running `systemctl daemon-reload` forces the service to go to failed
state
# systemctl start ***@sleep.service
# #Keep monitoring ***@sleep.service -- would be killed in 20-30 seconds
and goes into failed state as expected
# # systemctl status ***@sleep.service
â ***@sleep.service - Trial
Loaded: loaded (/etc/systemd/system/***@.service; static; vendor
preset: enabled)
Active: failed (Result: signal) since Mon 2021-06-07 13:38:19 UTC; 21s
ago
Process: 113 ExecStartPre=/bin/bash
/etc/systemd/system/trial_service_script.sh pre_start sleep (code=exited,
status=0/SUCCESS)
Process: 114 ExecStart=/bin/bash
/etc/systemd/system/trial_service_script.sh start sleep (code=killed,
signal=KILL)
Process: 129 ExecStopPost=/bin/bash
/etc/systemd/system/trial_service_script.sh post_stop sleep (code=exited,
status=0/SUCCESS)
Main PID: 114 (code=killed, signal=KILL)
Please advise on what can help us in alleviating the issue.
Thanks,
Aravindhan
Regards,
Aravindhan Krishnan...
I am finding anomalous behavior when I am trying to run dhclient process
inside my docker container in vanilla Ubuntu 16.04 host. The service gets
into "deactivating" state and is stuck forever. In the mail I have attached
a minimalistic reproduction of the issue seen.
Working logic:
- There is a sample ***@.service script which invokes the `trial`
binary with the option passed to the systemd service via @ option
- The valid options are sleep and dhclient_<interface_name>
- The binary either invokes a long-lived sleep process or dhclient
process on the said interface_name based on the input
- The binary then spawns `kill_trial.sh` script. The script sleeps for
20 seconds and kills the parent `trial` binary. The kill signal is SIGKILL
in the trial example. In the real-world, this can be a SIGSEGV indicating a
crash in the parent process.
- If the trial binary was started for sleep process things work fine and
service goes into "failed" state as expected
- However, in case of dhclient, the service is stuck in "deactivating"
state if the underlying host OS is Ubuntu 16.04. This works well if the
host is running Ubuntu 20.04.
- We have kept TimeoutStopSec to infinity, because in real-word
deployments, the core collection post a crash takes varying time depending
on the memory config on the host.
Steps to reproduce
# tar -xf minimal_repro.tar -C minimal_repro/
# cd minimal_repro/
# docker build -t trial .
# docker rm -f trial
# docker run -it -d --net=host --privileged -v
/sys/fs/cgroup:/sys/fs/cgroup:ro --name trial trial
# docker exec -it trial bash
# systemctl start ***@dhclient_eth1.service
# #Keep monitoring ***@dhclient_eth1.service -- issue should be seen
within 20-30 seconds on Ubuntu 16.04 host
# systemctl status ***@dhclient_eth1.service
â ***@dhclient_eth1.service - Trial
Loaded: loaded (/etc/systemd/system/***@.service; static; vendor
preset: enabled)
Active: deactivating (stop-sigterm) (Result: signal) since Mon
2021-06-07 13:19:12 UTC; 1min 11s ago
Process: 55 ExecStartPre=/bin/bash
/etc/systemd/system/trial_service_script.sh pre_start dhclient_eth1
(code=exited, status=0/SUCCESS)
Process: 56 ExecStart=/bin/bash
/etc/systemd/system/trial_service_script.sh start dhclient_eth1
(code=killed, signal=KILL)
Main PID: 56 (code=killed, signal=KILL)
Tasks: 0 (limit: 38590)
Memory: 588.0K
CGroup:
/docker/903fca0cee1387b7c2113a36ee5efdb3a25edd1e60584fe5da5d0c5b5ffd8241/system.slice/system-trial.slice/***@dhclient_eth1.service
# #NOTE: `Active: deactivating` -- in stuck state
# #Running `systemctl daemon-reload` forces the service to go to failed
state
# systemctl start ***@sleep.service
# #Keep monitoring ***@sleep.service -- would be killed in 20-30 seconds
and goes into failed state as expected
# # systemctl status ***@sleep.service
â ***@sleep.service - Trial
Loaded: loaded (/etc/systemd/system/***@.service; static; vendor
preset: enabled)
Active: failed (Result: signal) since Mon 2021-06-07 13:38:19 UTC; 21s
ago
Process: 113 ExecStartPre=/bin/bash
/etc/systemd/system/trial_service_script.sh pre_start sleep (code=exited,
status=0/SUCCESS)
Process: 114 ExecStart=/bin/bash
/etc/systemd/system/trial_service_script.sh start sleep (code=killed,
signal=KILL)
Process: 129 ExecStopPost=/bin/bash
/etc/systemd/system/trial_service_script.sh post_stop sleep (code=exited,
status=0/SUCCESS)
Main PID: 114 (code=killed, signal=KILL)
Please advise on what can help us in alleviating the issue.
Thanks,
Aravindhan
Regards,
Aravindhan Krishnan...