Discussion:
[systemd-devel] possible bug guessing mainpid when service type=forking..
Marius Tolzmann
2011-06-20 17:34:47 UTC
Permalink
Hello..

Today we hit a bug where incorrect guessing of the MAINPID in a
Type=forking service always killed the service somehow..

The service in question was ypbind from the ypbind-mt package from
kernel.org/utils/NIS..

It forks two times to detach and provides the mainpid in
/var/run/ypbind.pid .. setting PIDFile=/var/run/ypbind.pid and setting
Type=forking always did the job - until today, when we installed it on a
somehow slower hardware.

There seems to be some kind of race-condition where systemd takes the
pid of the first fork() as the MAINPID which then gets killed after the
second fork() - i don't know for sure and may be totally wrong here.

Since tracing systemd and/or increasing debug level changed the behavior
so much that the bug wasn't triggered anymore i can't provide any more
information about this.

We fixed this issue by patching ypbind-mt to run in foreground an let
systemd do the forking - as intended 8)

But it still seems to be some kind of bug in setting the correct MAINPID
in Type=forking service files.

The bug was triggered when providing PIDFile= and also when PIDFile= was
commented out. It works fine on slightly faster systems and may also
work fine on even slower hardware.

I tried systemd v27 and v29 - same problem.

Fixing the ypbind binary solved everything for us - so i don't know if
anybody cares to fix this.. but it was hard to find what could have
happened because the service just won't start even if started again and
again manually and worked all the time.

BTW a patch for ypbind-mt adding a -foreground commandline switch will
be sent to the maintainers soon..

bye marius..
--
Dipl.-Inf. Marius Tolzmann <***@molgen.mpg.de>
----------------------------------.------------------------------
MPI f. molekulare Genetik |
Ihnestrasse 63-73, D-14195 Berlin | ==> MarIuX GNU/Linux <==
Phone: +49 (0)30 8413 1709 |
----------------------------------^------------------------------
God put me on earth to accomplish a certain number of things.
Right now I am so far behind..
..I will never die. <by calvin from calvin&hobbes ;)>
Lennart Poettering
2011-06-20 19:36:39 UTC
Permalink
Post by Marius Tolzmann
We fixed this issue by patching ypbind-mt to run in foreground an let
systemd do the forking - as intended 8)
But it still seems to be some kind of bug in setting the correct MAINPID
in Type=forking service files.
This very much sounds as if the service does not wait in the top-level
process that the daemon process is properly forked and finished writing
the PID file. systemd tries to read the PID file immediately after the
top-level process exits which means that systemd will race against the
daemon there.
Post by Marius Tolzmann
Fixing the ypbind binary solved everything for us - so i don't know if
anybody cares to fix this.. but it was hard to find what could have
happened because the service just won't start even if started again and
again manually and worked all the time.
So my guess is that ypbind-mt does something like this:

fork() --> fork() --> write_pid_file()
exit() exit() run_main_loop()
...

but it should be doing:

fork() ------------> fork() --> write_pid_file()
wait_for_notify() exit() notify_grandparent()
exit() run_main_loop()
...

i.e. the main process should not exit before it hasn't ensure that the
PID file of the daemon process has been written. Otherwise any usage of
the PID file is necessarily racy.
Post by Marius Tolzmann
BTW a patch for ypbind-mt adding a -foreground commandline switch will
be sent to the maintainers soon..
Nice, this is good to hear, Type=simple is definitely the ncier solution.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Marius Tolzmann
2011-06-21 16:50:01 UTC
Permalink
hi..
Post by Lennart Poettering
This very much sounds as if the service does not wait in the top-level
process that the daemon process is properly forked and finished writing
the PID file. systemd tries to read the PID file immediately after the
top-level process exits which means that systemd will race against the
daemon there.
fork() --> fork() --> write_pid_file()
exit() exit() run_main_loop()
thats it..
Post by Lennart Poettering
fork() ------------> fork() --> write_pid_file()
wait_for_notify() exit() notify_grandparent()
exit() run_main_loop()
...
the run-in-foreground patch works fine.. and as far as i understood
Type=forking is just good for compatibility and daemons should not fork
away anymore!?
Post by Lennart Poettering
i.e. the main process should not exit before it hasn't ensure that the
PID file of the daemon process has been written. Otherwise any usage of
the PID file is necessarily racy.
but it doesn't work without setting PIDFile= either - so the PID itself
is racy, too.
Post by Lennart Poettering
Post by Marius Tolzmann
BTW a patch for ypbind-mt adding a -foreground commandline switch will
be sent to the maintainers soon..
Nice, this is good to hear, Type=simple is definitely the ncier solution.
may be it could be mentioned in the manpage where Type=forking is
explained that there could be a race if the daemon is forking twice or
if it behaves odd in another way.. - GuessMainPID= already states that
it may not work reliable..

there was nothing written to the logs that pointed in that direction and
increasing verbosity changed everything 8) tricky situation and i think
systemd should not be blamed for this.. ever 8)

bye marius..
--
Dipl.-Inf. Marius Tolzmann <***@molgen.mpg.de>
----------------------------------.------------------------------
MPI f. molekulare Genetik |
Ihnestrasse 63-73, D-14195 Berlin | ==> MarIuX GNU/Linux <==
Phone: +49 (0)30 8413 1709 |
----------------------------------^------------------------------
God put me on earth to accomplish a certain number of things.
Right now I am so far behind..
..I will never die. <by calvin from calvin&hobbes ;)>
Lennart Poettering
2011-06-21 17:25:57 UTC
Permalink
Post by Marius Tolzmann
Post by Lennart Poettering
fork() ------------> fork() --> write_pid_file()
wait_for_notify() exit() notify_grandparent()
exit() run_main_loop()
...
the run-in-foreground patch works fine.. and as far as i understood
Type=forking is just good for compatibility and daemons should not fork
away anymore!?
Yes, daemons which do not double-fork are definitely much nicer than
those which do.
Post by Marius Tolzmann
Post by Lennart Poettering
i.e. the main process should not exit before it hasn't ensure that the
PID file of the daemon process has been written. Otherwise any usage of
the PID file is necessarily racy.
but it doesn't work without setting PIDFile= either - so the PID itself
is racy, too.
Hmm?

There's one important distinction here: daemons which return in the
parent before having made sure that the PID file is fully written are
simply broken. But services which run more than one process by default
are not broken but perfectly valid.
Post by Marius Tolzmann
Post by Lennart Poettering
Post by Marius Tolzmann
BTW a patch for ypbind-mt adding a -foreground commandline switch will
be sent to the maintainers soon..
Nice, this is good to hear, Type=simple is definitely the ncier solution.
may be it could be mentioned in the manpage where Type=forking is
explained that there could be a race if the daemon is forking twice or
if it behaves odd in another way.. - GuessMainPID= already states that
it may not work reliable..
Well, GuessMainPID= does not work properly for some correctly written
daemon (i.e. those which run more than one process continously). But the
PID file reading will only break for broken services. We generally do
not add extra comments regarding bugs in software to the
documentation. We instead prefer if those bug are fixed. Which is why the
comment regarding GuessMainPID='s reliability is there but no comment
regarding handling of daemons incorrectly writing PID files.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Loading...