Discussion:
sometimes systemd takes a very long time to restart service
(too old to reply)
JB
2013-01-02 00:27:07 UTC
Permalink
Hello all,
I do have one more question about systemd. Sometimes, it takes a
really really long time to get a service to stop or restart.
Othertimes, it doesn't. Now, when it does take a long time I get
entires in the logs such as this:

Jan 1 16:23:42 localhost systemd[1]: webrickd.service stopping timed
out (2). Killing.
Jan 1 16:25:12 localhost systemd[1]: webrickd.service still around
after SIGKILL (2). Entering failed mode.
Jan 1 16:25:12 localhost systemd[1]: Unit webrickd.service entered
failed state.
Jan 1 16:25:12 localhost pulseaudio[9196]: main.c: Unable to contact
D-Bus: org.freedesktop.DBus.Error.Spawn.ExecFailed: /bin/dbus-launch
terminated abnormally with the following error: Autolaunch error: X11
initialization failed.

Might also be nice to figure out how to get rid of that last pulseaudio
error as well but I'll do that some other time.

The service is actually no longer running and does stop very quickly but
systemd still sits there timing out and I have no idea why.

Here is the service file:
***********************************************
[Unit]
Description=DriveSafety Configuration ruby webrick daemon
After=network.target

[Service]
Type=forking
WorkingDirectory=/home/sim/vection/bin
PIDFile=/home/sim/vection/data/logs/simd.pid
ExecStartPre=/home/sim/vection/system/scripts/preStartSimd.sh
ExecStart=/home/sim/vection/bin/simd.rb -d -p
/home/sim/vection/data/logs/simd.pid
StandardOutput=null
StandardError=null
User=sim
Group=sim
ControlGroup=cpu:/

[Install]
WantedBy=multi-user.target
***********************************************

I don't know why this works sometimes but not others. I would think if
the process has ended, it should see that but it doesn't seem to.

Thanks in advance for any ideas on how I can diagnose this or solve it.


JB
Reindl Harald
2013-01-02 00:49:49 UTC
Permalink
Post by JB
Hello all,
I do have one more question about systemd. Sometimes, it takes a really really long time to get a service to
Jan 1 16:23:42 localhost systemd[1]: webrickd.service stopping timed out (2). Killing.
Jan 1 16:25:12 localhost systemd[1]: webrickd.service still around after SIGKILL (2). Entering failed mode.
Jan 1 16:25:12 localhost systemd[1]: Unit webrickd.service entered failed state.
the same happens on F17 with rsyslog if it is configured for mysqld
on a remote host because network.service is stopped before rsyslog
and rsyslog does not handle correctly the siutuation

it took me months to find out what hangs permamently 1.5 minutes
on production machines, my workaround is a systemd-unit which
starts after network.service and TimeoutStopSec=1
________________________________

cat /etc/systemd/system/rsyslog.service
[Unit]
Description=System Logging Service
After=network.service

[Service]
EnvironmentFile=-/etc/sysconfig/rsyslog
ExecStart=/sbin/rsyslogd -n $SYSLOGD_OPTIONS
Sockets=syslog.socket
StandardOutput=null
Restart=always
RestartSec=1
TimeoutStopSec=1

[Install]
WantedBy=multi-user.target
Alias=syslog.service
JB
2013-01-02 01:52:38 UTC
Permalink
Thanks! I'll try and it may work in my case. What's interesting is
that in your case it sounded like rsyslog was hanging around while it
was having problems dealing with the condition of having the network
unavailable. In my case, webrickd actually stops and shuts down almost
immediately but for some reason systemd doesn't or can't figure that
out. Right now it is happening every single time I do:
systemctl restart webrickd.service

I timed it and like you said it sits for 3 minutes every time. Very
irritating. I did have it working at one point but not consistently.
Boot's were always quick and often restarts of just the service were
quick. I'm not even doing a full shutdown. I wish I could figure out
why it is so inconsistent.

I'll try your solution but I'm concerned it's covering up a deeper
problem and I already have quite a bit of those. I call these types of
solutions "technical debt" because sooner or later you will wind up
having to paying for it :)

JB
Post by Reindl Harald
Post by JB
Hello all,
I do have one more question about systemd. Sometimes, it takes a really really long time to get a service to
Jan 1 16:23:42 localhost systemd[1]: webrickd.service stopping timed out (2). Killing.
Jan 1 16:25:12 localhost systemd[1]: webrickd.service still around after SIGKILL (2). Entering failed mode.
Jan 1 16:25:12 localhost systemd[1]: Unit webrickd.service entered failed state.
the same happens on F17 with rsyslog if it is configured for mysqld
on a remote host because network.service is stopped before rsyslog
and rsyslog does not handle correctly the siutuation
it took me months to find out what hangs permamently 1.5 minutes
on production machines, my workaround is a systemd-unit which
starts after network.service and TimeoutStopSec=1
________________________________
cat /etc/systemd/system/rsyslog.service
[Unit]
Description=System Logging Service
After=network.service
[Service]
EnvironmentFile=-/etc/sysconfig/rsyslog
ExecStart=/sbin/rsyslogd -n $SYSLOGD_OPTIONS
Sockets=syslog.socket
StandardOutput=null
Restart=always
RestartSec=1
TimeoutStopSec=1
[Install]
WantedBy=multi-user.target
Alias=syslog.service
------------------------------------------------------------------------
_______________________________________________
systemd-devel mailing list
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Andrey Borzenkov
2013-01-02 06:20:49 UTC
Permalink
В Tue, 01 Jan 2013 18:52:38 -0700
Post by JB
Thanks! I'll try and it may work in my case. What's interesting is
that in your case it sounded like rsyslog was hanging around while it
was having problems dealing with the condition of having the network
unavailable. In my case, webrickd actually stops and shuts down almost
immediately but for some reason systemd doesn't or can't figure that
out.
Showing "systemctl status webrickd.service" before restarting and
during restart may give some hints.
JB
2013-01-02 06:37:56 UTC
Permalink
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 18:52:38 -0700
Post by JB
Thanks! I'll try and it may work in my case. What's interesting is
that in your case it sounded like rsyslog was hanging around while it
was having problems dealing with the condition of having the network
unavailable. In my case, webrickd actually stops and shuts down almost
immediately but for some reason systemd doesn't or can't figure that
out.
Showing "systemctl status webrickd.service" before restarting and
during restart may give some hints.
Good idea.

It goes from active/running to this immediately after restart...

webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: deactivating (final-sigterm) since Tue, 01 Jan 2013
22:44:39 -0700; 1min 14s ago
Process: 15771 ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited, status=0/SUCCESS)

Then this, anywhere from 1 to 3 minutes later...

webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: active (running) since Tue, 01 Jan 2013 22:47:40
-0700; 15s ago
Process: 15807 ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited, status=0/SUCCESS)

Any ideas why it sits in state "deactivating (final-sigterm)" for 1 to 3
minutes?
Andrey Borzenkov
2013-01-02 07:09:31 UTC
Permalink
В Tue, 01 Jan 2013 23:37:56 -0700
Post by JB
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 18:52:38 -0700
Post by JB
Thanks! I'll try and it may work in my case. What's interesting is
that in your case it sounded like rsyslog was hanging around while it
was having problems dealing with the condition of having the network
unavailable. In my case, webrickd actually stops and shuts down almost
immediately but for some reason systemd doesn't or can't figure that
out.
Showing "systemctl status webrickd.service" before restarting and
during restart may give some hints.
Good idea.
It goes from active/running to this immediately after restart...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: deactivating (final-sigterm) since Tue, 01 Jan 2013
22:44:39 -0700; 1min 14s ago
Process: 15771 ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited, status=0/SUCCESS)
Then this, anywhere from 1 to 3 minutes later...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: active (running) since Tue, 01 Jan 2013 22:47:40
-0700; 15s ago
Process: 15807 ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited, status=0/SUCCESS)
And initial state (i.e. after it is started)?
Post by JB
Any ideas why it sits in state "deactivating (final-sigterm)" for 1 to 3
minutes?
JB
2013-01-02 07:13:24 UTC
Permalink
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 23:37:56 -0700
Post by JB
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 18:52:38 -0700
Post by JB
Thanks! I'll try and it may work in my case. What's interesting is
that in your case it sounded like rsyslog was hanging around while it
was having problems dealing with the condition of having the network
unavailable. In my case, webrickd actually stops and shuts down almost
immediately but for some reason systemd doesn't or can't figure that
out.
Showing "systemctl status webrickd.service" before restarting and
during restart may give some hints.
Good idea.
It goes from active/running to this immediately after restart...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: deactivating (final-sigterm) since Tue, 01 Jan 2013
22:44:39 -0700; 1min 14s ago
Process: 15771 ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited, status=0/SUCCESS)
Then this, anywhere from 1 to 3 minutes later...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: active (running) since Tue, 01 Jan 2013 22:47:40
-0700; 15s ago
Process: 15807 ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited, status=0/SUCCESS)
And initial state (i.e. after it is started)?
Sorry I wasn't clear, the second status block up there that says
"Active: active (running)" is the status (i.e. after it is started).

The former status block above is what it changes to nearly immediately
after executing "systemctl restart webrickd.service" then it stays that
way (verified on another console) until after the command returns 1 to 3
minutes later at which point the status changes to what is reported in
the latter block above. Repeating reproduces the same result.
JB
2013-01-02 09:28:19 UTC
Permalink
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 23:37:56 -0700
Post by JB
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 18:52:38 -0700
Post by JB
Thanks! I'll try and it may work in my case. What's interesting is
that in your case it sounded like rsyslog was hanging around while it
was having problems dealing with the condition of having the network
unavailable. In my case, webrickd actually stops and shuts down almost
immediately but for some reason systemd doesn't or can't figure that
out.
Showing "systemctl status webrickd.service" before restarting and
during restart may give some hints.
Good idea.
It goes from active/running to this immediately after restart...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: deactivating (final-sigterm) since Tue, 01 Jan 2013
22:44:39 -0700; 1min 14s ago
Process: 15771 ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited, status=0/SUCCESS)
Then this, anywhere from 1 to 3 minutes later...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: active (running) since Tue, 01 Jan 2013 22:47:40
-0700; 15s ago
Process: 15807 ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited, status=0/SUCCESS)
And initial state (i.e. after it is started)?
Post by JB
Any ideas why it sits in state "deactivating (final-sigterm)" for 1 to 3
minutes
It gets better. I took a very stripped simple daemon to try to
understand the interaction between systemd and the process. Here it is:
*******************************************
#!/bin/env
/usr/bin/ruby

require "webrick"

class SomeServlet < WEBrick::HTTPServlet::AbstractServlet
def do_GET (request, response)
response.status=200
response.body="Howdy"
end
end

class SomeServer < WEBrick::HTTPServer
def initialize
super(:Port => 1234, :DocumentRoot => ".")
mount "/", SomeServlet
['TERM'].each { |signal| trap(signal) { shutdown }
}

start
end
end

WEBrick::Daemon.start
s = SomeServer.new
*******************************************


Here's the service file:
*******************************************
[Unit]
Description=Webrick Test Service
After=network.target

[Service]
Type=forking
ExecStart=/usr/bin/ruby /home/rtuser/test.rb

[Install]
WantedBy=multi-user.target
*******************************************

I put this in /etc/systemd/system/webrickd.service
Then ran systemctl enable webrickd.service
Then ran systemctl --system daemon-reload
Then ran systemctl start webrickd.service

Running it from the command line runs as it should. Web server starts
and test.rb is running in the background as a daemon returning me back
to the shell.

But as soon as I try to use systemd, it starts but as soon as the
WEBrick::Daemon.start call is made, everything is killed.
Nothing happens after that. I've tried using the "simple" service
type. I've tried trapping different signals, I've tried starting it up
different ways (e.g. just /home/rtuser/test.rb instead of call to ruby
first.) I've tried just starting the server and not calling
WEBrick.Daemonize then systemd reports all kinds of errors and
failures. I've tried all kinds of things. I opened a log file both
before and after the WEBrick::Daemon.start call to see what was and was
not getting executed. Before call to WEBrick::Daemon.start everything
works, afterward everything is dead but the systemctl command exits quickly.

systemctl status shows:
webrickd.service - Webrick Test Service
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: inactive (dead) since Wed, 02 Jan 2013 01:23:01 -0700;
9min ago
Process: 1605 ExecStart=/usr/bin/ruby /home/rtuser/test.rb
(code=exited, status=0/SUCCESS)
Main PID: 1607 (code=exited, status=0/SUCCESS)
CGroup: name=systemd:/system/webrickd.service

I'm really trying to understand systemd but it is times like this when
do long for the simpler days of init. At this point I would *gladly*
trade a longer boot time for all the headache this has given me. I had
a working init script that has run great for years but quit working with
systemd so I tried the systemd way and just can't make this stupid thing
work. It either takes forever or it flat out fails.

If someone could tell me the "right way" to make this work with systemd,
I would love to use it but I've been at this on and off for weeks and it
isn't getting any easier.
Andrey Borzenkov
2013-01-02 10:14:19 UTC
Permalink
В Wed, 02 Jan 2013 02:28:19 -0700
Post by JB
*******************************************
[Unit]
Description=Webrick Test Service
After=network.target
[Service]
Type=forking
ExecStart=/usr/bin/ruby /home/rtuser/test.rb
[Install]
WantedBy=multi-user.target
*******************************************
I put this in /etc/systemd/system/webrickd.service
Then ran systemctl enable webrickd.service
That is redundant. You can start service even if it is "disabled".
Post by JB
Then ran systemctl --system daemon-reload
Then ran systemctl start webrickd.service
Running it from the command line runs as it should. Web server starts
and test.rb is running in the background as a daemon returning me back
to the shell.
But as soon as I try to use systemd, it starts but as soon as the
WEBrick::Daemon.start call is made, everything is killed.
That is because of ...
Post by JB
webrickd.service - Webrick Test Service
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: inactive (dead) since Wed, 02 Jan 2013 01:23:01 -0700;
9min ago
Process: 1605 ExecStart=/usr/bin/ruby /home/rtuser/test.rb
(code=exited, status=0/SUCCESS)
Main PID: 1607 (code=exited, status=0/SUCCESS)
^^^^^^
Post by JB
CGroup: name=systemd:/system/webrickd.service
You daemon forks too much. This would be a problem with some of
implementations of initscript as well. Even worse, it may sometimes
work due to race condition and sometimes it may fail.
Post by JB
I'm really trying to understand systemd but it is times like this when
do long for the simpler days of init.
initscripts often hide design or programming errors when systemd makes
them obvious. Do not shoot the messenger.
Post by JB
If someone could tell me the "right way" to make this work with systemd,
I would love to use it but I've been at this on and off for weeks and it
isn't getting any easier.
JB
2013-01-02 11:02:27 UTC
Permalink
Post by Andrey Borzenkov
В Wed, 02 Jan 2013 02:28:19 -0700
Post by JB
*******************************************
[Unit]
Description=Webrick Test Service
After=network.target
[Service]
Type=forking
ExecStart=/usr/bin/ruby /home/rtuser/test.rb
[Install]
WantedBy=multi-user.target
*******************************************
I put this in /etc/systemd/system/webrickd.service
Then ran systemctl --system daemon-reload
Then ran systemctl start webrickd.service
Running it from the command line runs as it should. Web server starts
and test.rb is running in the background as a daemon returning me back
to the shell.
But as soon as I try to use systemd, it starts but as soon as the
WEBrick::Daemon.start call is made, everything is killed.
That is because of ...
Post by JB
webrickd.service - Webrick Test Service
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: inactive (dead) since Wed, 02 Jan 2013 01:23:01 -0700;
9min ago
Process: 1605 ExecStart=/usr/bin/ruby /home/rtuser/test.rb
(code=exited, status=0/SUCCESS)
Main PID: 1607 (code=exited, status=0/SUCCESS)
^^^^^^
Post by JB
CGroup: name=systemd:/system/webrickd.service
You daemon forks too much. This would be a problem with some of
implementations of initscript as well. Even worse, it may sometimes
work due to race condition and sometimes it may fail.
As far as I know, it forks once. You sure it forks "too much?" I
should pull an strace just to see exactly how many times it's forking so
I can be sure. Who decided how much forking was too much? Pretty sure
that's always been a "feature" of unix to be used at the discretion of
the programmer. Albeit an expensive one but a feature nonetheless. Not
trying to argue but I'm hoping you see my point. Systemd decided that
it would by default *only* allow daemon processes to fork once, yet
accommodate through special configuration (RemainAfterExit) what I would
consider normal unix programming behavior? Init doesn't care how many
times you fork because it's flexible that way. Now systemd came along
and decided that was somehow a programming error?
Post by Andrey Borzenkov
Post by JB
I'm really trying to understand systemd but it is times like this when
do long for the simpler days of init.
initscripts often hide design or programming errors when systemd makes
them obvious. Do not shoot the messenger.
Not interested in shooting anyone but I am interested in the
interpretation of "programming error." Seems like you're assuming the
systemd assumptions and paradigm are perfect. Pretty sure Unix never
dictated default forking limits, at least not until things like fork
bombs came about and security issues were considered. Forking more than
once could hardly be considered a programming error. I think systemd
folks came up with that one all by themselves :) My point is that
assuming systemd's default allow one fork approach is "pure" and then
passing the rigid inflexibility off as design flaws and programming
errors on the part of daemon developers, IMHO goes too far.
Post by Andrey Borzenkov
Post by JB
If someone could tell me the "right way" to make this work with systemd,
I would love to use it but I've been at this on and off for weeks and it
isn't getting any easier.
Andrey Borzenkov
2013-01-02 12:11:36 UTC
Permalink
В Wed, 02 Jan 2013 04:02:27 -0700
Post by JB
Post by Andrey Borzenkov
В Wed, 02 Jan 2013 02:28:19 -0700
Post by JB
*******************************************
[Unit]
Description=Webrick Test Service
After=network.target
[Service]
Type=forking
ExecStart=/usr/bin/ruby /home/rtuser/test.rb
[Install]
WantedBy=multi-user.target
*******************************************
I put this in /etc/systemd/system/webrickd.service
Then ran systemctl --system daemon-reload
Then ran systemctl start webrickd.service
Running it from the command line runs as it should. Web server starts
and test.rb is running in the background as a daemon returning me back
to the shell.
But as soon as I try to use systemd, it starts but as soon as the
WEBrick::Daemon.start call is made, everything is killed.
That is because of ...
Post by JB
webrickd.service - Webrick Test Service
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: inactive (dead) since Wed, 02 Jan 2013 01:23:01 -0700;
9min ago
Process: 1605 ExecStart=/usr/bin/ruby /home/rtuser/test.rb
(code=exited, status=0/SUCCESS)
Main PID: 1607 (code=exited, status=0/SUCCESS)
^^^^^^
Post by JB
CGroup: name=systemd:/system/webrickd.service
You daemon forks too much. This would be a problem with some of
implementations of initscript as well. Even worse, it may sometimes
work due to race condition and sometimes it may fail.
As far as I know, it forks once. You sure it forks "too much?"
The only other explanation would be that daemon simply exits
immediately. But then it hardly can be called "daemon"?
Post by JB
Not interested in shooting anyone but I am interested in the
interpretation of "programming error." Seems like you're assuming the
systemd assumptions and paradigm are perfect. Pretty sure Unix never
dictated default forking limits, at least not until things like fork
bombs came about and security issues were considered. Forking more than
once could hardly be considered a programming error. I think systemd
folks came up with that one all by themselves :) My point is that
assuming systemd's default allow one fork approach is "pure" and then
passing the rigid inflexibility off as design flaws and programming
errors on the part of daemon developers, IMHO goes too far.
systemd monitors service. Service is represented by process. If you
allow unlimited flexibility in how many times service can fork before
it finally starts its main process - how can systemd know which process
to monitor? When fork chain stops? I am sure if you suggest race free
way to do it that works in all cases, everyone will be happy to
integrate it in systemd.

If you do not really care whether systemd can monitor state of service
and are OK with "start it and forget about it", just add
RemainAfterExit=true. sytemd is flexible enough to support both ways.
Post by JB
So, if I proceed using "Simple" and then later as a result of one of the
web service calls made to the server either use fork/exec or system() to
launch another process that runs for anywhere from 30 seconds to several
hours, then that process exits but not the one that originally started,
will systemd have a problem with that?
It should not. That is what many services do already.
JB
2013-01-02 18:24:08 UTC
Permalink
Post by Andrey Borzenkov
В Wed, 02 Jan 2013 04:02:27 -0700
Post by JB
Post by Andrey Borzenkov
В Wed, 02 Jan 2013 02:28:19 -0700
Post by JB
*******************************************
[Unit]
Description=Webrick Test Service
After=network.target
[Service]
Type=forking
ExecStart=/usr/bin/ruby /home/rtuser/test.rb
[Install]
WantedBy=multi-user.target
*******************************************
I put this in /etc/systemd/system/webrickd.service
Then ran systemctl --system daemon-reload
Then ran systemctl start webrickd.service
Running it from the command line runs as it should. Web server starts
and test.rb is running in the background as a daemon returning me back
to the shell.
But as soon as I try to use systemd, it starts but as soon as the
WEBrick::Daemon.start call is made, everything is killed.
That is because of ...
Post by JB
webrickd.service - Webrick Test Service
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: inactive (dead) since Wed, 02 Jan 2013 01:23:01 -0700;
9min ago
Process: 1605 ExecStart=/usr/bin/ruby /home/rtuser/test.rb
(code=exited, status=0/SUCCESS)
Main PID: 1607 (code=exited, status=0/SUCCESS)
^^^^^^
Post by JB
CGroup: name=systemd:/system/webrickd.service
You daemon forks too much. This would be a problem with some of
implementations of initscript as well. Even worse, it may sometimes
work due to race condition and sometimes it may fail.
As far as I know, it forks once. You sure it forks "too much?"
The only other explanation would be that daemon simply exits
immediately. But then it hardly can be called "daemon"?
Agreed, only other thing I could think of is if it is somehow
immediately getting a SIGINT or SIGTERM somehow.
Post by Andrey Borzenkov
Post by JB
Not interested in shooting anyone but I am interested in the
interpretation of "programming error." Seems like you're assuming the
systemd assumptions and paradigm are perfect. Pretty sure Unix never
dictated default forking limits, at least not until things like fork
bombs came about and security issues were considered. Forking more than
once could hardly be considered a programming error. I think systemd
folks came up with that one all by themselves :) My point is that
assuming systemd's default allow one fork approach is "pure" and then
passing the rigid inflexibility off as design flaws and programming
errors on the part of daemon developers, IMHO goes too far.
systemd monitors service. Service is represented by process. If you
allow unlimited flexibility in how many times service can fork before
it finally starts its main process - how can systemd know which process
to monitor? When fork chain stops? I am sure if you suggest race free
way to do it that works in all cases, everyone will be happy to
integrate it in systemd.
If you do not really care whether systemd can monitor state of service
and are OK with "start it and forget about it", just add
RemainAfterExit=true. sytemd is flexible enough to support both ways.
Post by JB
So, if I proceed using "Simple" and then later as a result of one of the
web service calls made to the server either use fork/exec or system() to
launch another process that runs for anywhere from 30 seconds to several
hours, then that process exits but not the one that originally started,
will systemd have a problem with that?
It should not. That is what many services do already.
Thanks, I'll try that.
Zbigniew Jędrzejewski-Szmek
2013-01-02 14:22:32 UTC
Permalink
Post by JB
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 23:37:56 -0700
Post by JB
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 18:52:38 -0700
Post by JB
Thanks! I'll try and it may work in my case. What's
interesting is that in your case it sounded like rsyslog was
hanging around while it was having problems dealing with the
condition of having the network unavailable. In my case,
webrickd actually stops and shuts down almost immediately
but for some reason systemd doesn't or can't figure that
out.
Showing "systemctl status webrickd.service" before restarting and
during restart may give some hints.
Good idea.
It goes from active/running to this immediately after restart...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: deactivating (final-sigterm) since Tue, 01 Jan
2013 22:44:39 -0700; 1min 14s ago
Process: 15771
ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited,
status=0/SUCCESS)
Then this, anywhere from 1 to 3 minutes later...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: active (running) since Tue, 01 Jan 2013
22:47:40 -0700; 15s ago
Process: 15807
ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited,
status=0/SUCCESS)
And initial state (i.e. after it is started)?
Post by JB
Any ideas why it sits in state "deactivating (final-sigterm)"
for 1 to 3 minutes
It gets better. I took a very stripped simple daemon to try to
Hi,
your process is forking as it should (as least on my machine).
Putting systemd in debug mode
sudo kill -SIGRTMIN+22 1
and running
sudo systemctl start webrickd.service
gives me:
Jan 02 08:55:27 spora sudo[20648]: zbyszek : TTY=pts/0 ; PWD=/home/zbyszek/src/systemd ; USER=root ; COMMAND=/bin/systemctl start webrickd.service
Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
Jan 02 08:55:27 spora systemd[1]: Running GC...
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.StartUnit() on /org/freedesktop/systemd1
Jan 02 08:55:27 spora systemd[1]: Trying to enqueue job webrickd.service/start/replace
Jan 02 08:55:27 spora systemd[1]: Installed new job webrickd.service/start as 2851
Jan 02 08:55:27 spora systemd[1]: Enqueued job webrickd.service/start as 2851
Jan 02 08:55:27 spora systemd[1]: About to execute: /usr/bin/ruby /home/zbyszek/src/systemd/test.rb
Jan 02 08:55:27 spora systemd[1]: Forked /usr/bin/ruby as 20651
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed failed -> start
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.Get() on /org/freedesktop/systemd1/unit/webrickd_2eservice
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20651 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20651 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20651 died (code=exited, status=0/SUCCESS)
Jan 02 08:55:27 spora systemd[1]: Child 20651 belongs to webrickd.service
Jan 02 08:55:27 spora systemd[1]: webrickd.service: control process exited, code=exited status=0
Jan 02 08:55:27 spora systemd[1]: webrickd.service got final SIGCHLD for state start
Jan 02 08:55:27 spora systemd[1]: Main PID guessed: 20653
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed start -> running
Jan 02 08:55:27 spora systemd[1]: Job webrickd.service/start finished, result=done
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20653 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20653 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20653 died (code=exited, status=0/SUCCESS)
Jan 02 08:55:27 spora systemd[1]: Child 20653 belongs to webrickd.service
Jan 02 08:55:27 spora systemd[1]: webrickd.service: main process exited, code=exited, status=0
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed running -> stop-sigterm
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20656 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20656 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20656 died (code=exited, status=1/FAILURE)
Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
Jan 02 08:55:27 spora systemd[1]: webrickd.service: cgroup is empty
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed stop-sigterm -> dead
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
Jan 02 09:01:01 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
Jan 02 09:01:01 spora systemd[1]: Running GC...
Jan 02 09:01:01 spora systemd[1]: Collecting webrickd.service

With normal UNIX double forking, the process should fork twice, and
the first two should processes should exit. This is what happens,
but in the wrong order. The first process is only supposed to exit
after the main process is alive and ready to serve requests. As you
can see from the log above, the first process exits while the second
process is still running, and thus systemd thinks that this is the main
process. Subsequenty it exits, and the service is considered dead.

OTOH, your original problem (with restart taking >1 minute) is
something different. What systemd version exactly are you using? Can
you enable debugging and then post the output from the long 'systemctl
restart webrick.service'?

Zbyszek
JB
2013-01-03 20:41:57 UTC
Permalink
Post by Zbigniew Jędrzejewski-Szmek
Post by JB
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 23:37:56 -0700
Post by JB
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 18:52:38 -0700
Post by JB
Thanks! I'll try and it may work in my case. What's
interesting is that in your case it sounded like rsyslog was
hanging around while it was having problems dealing with the
condition of having the network unavailable. In my case,
webrickd actually stops and shuts down almost immediately
but for some reason systemd doesn't or can't figure that
out.
Showing "systemctl status webrickd.service" before restarting and
during restart may give some hints.
Good idea.
It goes from active/running to this immediately after restart...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: deactivating (final-sigterm) since Tue, 01 Jan
2013 22:44:39 -0700; 1min 14s ago
Process: 15771
ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited,
status=0/SUCCESS)
Then this, anywhere from 1 to 3 minutes later...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: active (running) since Tue, 01 Jan 2013
22:47:40 -0700; 15s ago
Process: 15807
ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited,
status=0/SUCCESS)
And initial state (i.e. after it is started)?
Post by JB
Any ideas why it sits in state "deactivating (final-sigterm)"
for 1 to 3 minutes
It gets better. I took a very stripped simple daemon to try to
Hi,
your process is forking as it should (as least on my machine).
Putting systemd in debug mode
sudo kill -SIGRTMIN+22 1
and running
sudo systemctl start webrickd.service
Jan 02 08:55:27 spora sudo[20648]: zbyszek : TTY=pts/0 ; PWD=/home/zbyszek/src/systemd ; USER=root ; COMMAND=/bin/systemctl start webrickd.service
Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
Jan 02 08:55:27 spora systemd[1]: Running GC...
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.StartUnit() on /org/freedesktop/systemd1
Jan 02 08:55:27 spora systemd[1]: Trying to enqueue job webrickd.service/start/replace
Jan 02 08:55:27 spora systemd[1]: Installed new job webrickd.service/start as 2851
Jan 02 08:55:27 spora systemd[1]: Enqueued job webrickd.service/start as 2851
Jan 02 08:55:27 spora systemd[1]: About to execute: /usr/bin/ruby /home/zbyszek/src/systemd/test.rb
Jan 02 08:55:27 spora systemd[1]: Forked /usr/bin/ruby as 20651
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed failed -> start
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.Get() on /org/freedesktop/systemd1/unit/webrickd_2eservice
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20651 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20651 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20651 died (code=exited, status=0/SUCCESS)
Jan 02 08:55:27 spora systemd[1]: Child 20651 belongs to webrickd.service
Jan 02 08:55:27 spora systemd[1]: webrickd.service: control process exited, code=exited status=0
Jan 02 08:55:27 spora systemd[1]: webrickd.service got final SIGCHLD for state start
Jan 02 08:55:27 spora systemd[1]: Main PID guessed: 20653
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed start -> running
Jan 02 08:55:27 spora systemd[1]: Job webrickd.service/start finished, result=done
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20653 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20653 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20653 died (code=exited, status=0/SUCCESS)
Jan 02 08:55:27 spora systemd[1]: Child 20653 belongs to webrickd.service
Jan 02 08:55:27 spora systemd[1]: webrickd.service: main process exited, code=exited, status=0
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed running -> stop-sigterm
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20656 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20656 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20656 died (code=exited, status=1/FAILURE)
Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
Jan 02 08:55:27 spora systemd[1]: webrickd.service: cgroup is empty
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed stop-sigterm -> dead
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
Jan 02 09:01:01 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
Jan 02 09:01:01 spora systemd[1]: Running GC...
Jan 02 09:01:01 spora systemd[1]: Collecting webrickd.service
With normal UNIX double forking, the process should fork twice, and
the first two should processes should exit. This is what happens,
but in the wrong order. The first process is only supposed to exit
after the main process is alive and ready to serve requests. As you
can see from the log above, the first process exits while the second
process is still running, and thus systemd thinks that this is the main
process. Subsequenty it exits, and the service is considered dead.
Ahh, that makes tremendous sense. Thank you so much for taking a look
at that. I'm going to have to read up on how to put systemd in debug
mode. I've written daemon's before in C where I have complete control
over this. I wonder if the forking implementation Ruby's daemonize has
this backwards. The more I learn about Ruby the more frustrated I
become with it. I've worked around this problem by using the "Simple"
type and not daemonizing the process.
Post by Zbigniew Jędrzejewski-Szmek
OTOH, your original problem (with restart taking >1 minute) is
something different. What systemd version exactly are you using? Can
you enable debugging and then post the output from the long 'systemctl
restart webrick.service'?
I'm using whatever comes with FC15. According to the log it's version
26 (RPM is 26-18)
I'm currently in crisis with this app and have already stripped the
daemon code out and rewritten accordingly. When I have this released,
I'll go ahead and run the test you suggested but I think they might be
related.
Zbigniew Jędrzejewski-Szmek
2013-01-04 18:51:35 UTC
Permalink
Post by JB
Post by Zbigniew Jędrzejewski-Szmek
Post by JB
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 23:37:56 -0700
Post by JB
Post by Andrey Borzenkov
В Tue, 01 Jan 2013 18:52:38 -0700
Post by JB
Thanks! I'll try and it may work in my case. What's
interesting is that in your case it sounded like rsyslog was
hanging around while it was having problems dealing with the
condition of having the network unavailable. In my case,
webrickd actually stops and shuts down almost immediately
but for some reason systemd doesn't or can't figure that
out.
Showing "systemctl status webrickd.service" before restarting and
during restart may give some hints.
Good idea.
It goes from active/running to this immediately after restart...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: deactivating (final-sigterm) since Tue, 01 Jan
2013 22:44:39 -0700; 1min 14s ago
Process: 15771
ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited,
status=0/SUCCESS)
Then this, anywhere from 1 to 3 minutes later...
webrickd.service - Configuration ruby webrick daemon
Loaded: loaded (/etc/systemd/system/webrickd.service)
Active: active (running) since Tue, 01 Jan 2013
22:47:40 -0700; 15s ago
Process: 15807
ExecStart=/home/rtuser/app/bin/webrickd.rb -d -p
/home/rtuser/app/data/logs/webrickd.pid (code=exited,
status=0/SUCCESS)
And initial state (i.e. after it is started)?
Post by JB
Any ideas why it sits in state "deactivating (final-sigterm)"
for 1 to 3 minutes
It gets better. I took a very stripped simple daemon to try to
Hi,
your process is forking as it should (as least on my machine).
^not (arghhh!)
Post by JB
Post by Zbigniew Jędrzejewski-Szmek
Putting systemd in debug mode sudo kill -SIGRTMIN+22 1
and running
sudo systemctl start webrickd.service
Jan 02 08:55:27 spora sudo[20648]: zbyszek : TTY=pts/0 ; PWD=/home/zbyszek/src/systemd ; USER=root ; COMMAND=/bin/systemctl start webrickd.service
Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
Jan 02 08:55:27 spora systemd[1]: Running GC...
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.StartUnit() on /org/freedesktop/systemd1
Jan 02 08:55:27 spora systemd[1]: Trying to enqueue job webrickd.service/start/replace
Jan 02 08:55:27 spora systemd[1]: Installed new job webrickd.service/start as 2851
Jan 02 08:55:27 spora systemd[1]: Enqueued job webrickd.service/start as 2851
Jan 02 08:55:27 spora systemd[1]: About to execute: /usr/bin/ruby /home/zbyszek/src/systemd/test.rb
Jan 02 08:55:27 spora systemd[1]: Forked /usr/bin/ruby as 20651
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed failed -> start
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.GetUnit() on /org/freedesktop/systemd1
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.Get() on /org/freedesktop/systemd1/unit/webrickd_2eservice
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20651 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20651 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20651 died (code=exited, status=0/SUCCESS)
Jan 02 08:55:27 spora systemd[1]: Child 20651 belongs to webrickd.service
Jan 02 08:55:27 spora systemd[1]: webrickd.service: control process exited, code=exited status=0
Jan 02 08:55:27 spora systemd[1]: webrickd.service got final SIGCHLD for state start
Jan 02 08:55:27 spora systemd[1]: Main PID guessed: 20653
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed start -> running
Jan 02 08:55:27 spora systemd[1]: Job webrickd.service/start finished, result=done
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20653 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20653 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20653 died (code=exited, status=0/SUCCESS)
Jan 02 08:55:27 spora systemd[1]: Child 20653 belongs to webrickd.service
Jan 02 08:55:27 spora systemd[1]: webrickd.service: main process exited, code=exited, status=0
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed running -> stop-sigterm
Jan 02 08:55:27 spora systemd[1]: Received SIGCHLD from PID 20656 (ruby).
Jan 02 08:55:27 spora systemd[1]: Got SIGCHLD for process 20656 (ruby)
Jan 02 08:55:27 spora systemd[1]: Child 20656 died (code=exited, status=1/FAILURE)
Jan 02 08:55:27 spora systemd[1]: Accepted connection on private bus.
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Agent.Released() on /org/freedesktop/systemd1/agent
Jan 02 08:55:27 spora systemd[1]: webrickd.service: cgroup is empty
Jan 02 08:55:27 spora systemd[1]: webrickd.service changed stop-sigterm -> dead
Jan 02 08:55:27 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
Jan 02 09:01:01 spora systemd[1]: Got D-Bus request: org.freedesktop.DBus.NameOwnerChanged() on /org/freedesktop/DBus
Jan 02 09:01:01 spora systemd[1]: Running GC...
Jan 02 09:01:01 spora systemd[1]: Collecting webrickd.service
With normal UNIX double forking, the process should fork twice, and
the first two should processes should exit. This is what happens,
but in the wrong order. The first process is only supposed to exit
after the main process is alive and ready to serve requests. As you
can see from the log above, the first process exits while the second
process is still running, and thus systemd thinks that this is the main
process. Subsequenty it exits, and the service is considered dead.
Ahh, that makes tremendous sense. Thank you so much for taking a
look at that. I'm going to have to read up on how to put systemd in
debug mode.
Everything should be described in systemd(1). Cf. sections SIGNALS and KERNEL
COMMAND LINE.
Post by JB
I've written daemon's before in C where I have complete
control over this. I wonder if the forking implementation Ruby's
daemonize has this backwards. The more I learn about Ruby the more
frustrated I become with it. I've worked around this problem by
using the "Simple" type and not daemonizing the process.
Post by Zbigniew Jędrzejewski-Szmek
OTOH, your original problem (with restart taking >1 minute) is
something different. What systemd version exactly are you using? Can
you enable debugging and then post the output from the long 'systemctl
restart webrick.service'?
I'm using whatever comes with FC15. According to the log it's
version 26 (RPM is 26-18)
Oh, this is really really old. Please retest with latest FC.
Post by JB
I'm currently in crisis with this app and have already stripped the
daemon code out and rewritten accordingly. When I have this
released, I'll go ahead and run the test you suggested but I think
they might be related.
Zbyszek
JB
2013-01-04 20:40:18 UTC
Permalink
Post by Zbigniew Jędrzejewski-Szmek
Post by JB
Post by Zbigniew Jędrzejewski-Szmek
Post by JB
It gets better. I took a very stripped simple daemon to try to
Hi,
your process is forking as it should (as least on my machine).
^not (arghhh!)
Figured that's what you meant after reading your comments.
Post by Zbigniew Jędrzejewski-Szmek
Post by JB
I'm using whatever comes with FC15. According to the log it's
version 26 (RPM is 26-18)
Oh, this is really really old. Please retest with latest FC.
If only it were that simple. Unfortunately using the latest FC is
simply not an option with this application. It relies up on RTAI lxrt
userspace realtime computation system. Currently RTAI only provides a
64-bit patch for 2.6.38.8 kernel. I've tried running that kernel on FC17
and FC16 without success so I need to make this work.

Continue reading on narkive:
Loading...