Discussion:
Significant performance loss caused by commit
v***@pengaru.com
2017-07-07 20:14:39 UTC
In doing some casual journalctl profiling and stracing, it became apparent
that when `journalctl -b --no-pager` runs across a significant quantity of logs,
~10% of the time is thrown away on getpid() calls due to commit a65f06b.
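
For reference, the check that commit adds to every public entrypoint is
roughly of this shape (a paraphrase, not the exact source; assert_return()
is systemd's internal macro that bails out with the given error code):

static bool journal_pid_changed(sd_journal *j) {
        /* Objects must not be reused across a fork(): compare the PID
         * recorded when the object was allocated with the current one. */
        return j->original_pid != getpid();
}

_public_ int sd_journal_next(sd_journal *j) {
        assert_return(j, -EINVAL);
        assert_return(!journal_pid_changed(j), -ECHILD);
        /* ... actual iteration ... */
}

With glibc's userspace PID cache this was a single memory comparison per
call; without it, it is one getpid() syscall per library call, which is
what the strace below shows.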

As-is:
# time ./journalctl -b --no-pager > /dev/null

real 0m11.033s
user 0m10.084s
sys 0m0.943s


After changing journal_pid_changed() to simply return 1:
# time ./journalctl -b --no-pager > /dev/null

real 0m9.641s
user 0m9.449s
sys 0m0.191s


More system time is being expended in repeated getpid() calls than on write();
see the strace:

12:51:56.939287 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.001276>
12:51:56.940633 getpid() = 10713 <0.000032>
12:51:56.940732 getpid() = 10713 <0.000012>
12:51:56.940801 getpid() = 10713 <0.000032>
12:51:56.940867 getpid() = 10713 <0.000041>
12:51:56.940942 getpid() = 10713 <0.000041>
12:51:56.941047 getpid() = 10713 <0.000012>
12:51:56.941117 getpid() = 10713 <0.000012>
12:51:56.941185 getpid() = 10713 <0.000011>
12:51:56.941253 getpid() = 10713 <0.000011>
12:51:56.941320 getpid() = 10713 <0.000039>
12:51:56.941395 getpid() = 10713 <0.000041>
12:51:56.941494 getpid() = 10713 <0.000011>
12:51:56.941561 getpid() = 10713 <0.000012>
12:51:56.941629 getpid() = 10713 <0.000039>
12:51:56.942942 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.000058>
12:51:56.943052 getpid() = 10713 <0.000039>
12:51:56.943156 getpid() = 10713 <0.000017>
12:51:56.943230 getpid() = 10713 <0.000018>
12:51:56.943305 getpid() = 10713 <0.000012>
12:51:56.943374 getpid() = 10713 <0.000017>
12:51:56.943449 getpid() = 10713 <0.000011>
12:51:56.943517 getpid() = 10713 <0.000011>
12:51:56.943585 getpid() = 10713 <0.000011>
12:51:56.943652 getpid() = 10713 <0.000011>
12:51:56.943721 getpid() = 10713 <0.000030>
12:51:56.943796 getpid() = 10713 <0.000041>
12:51:56.943870 getpid() = 10713 <0.000041>
12:51:56.943944 getpid() = 10713 <0.000041>
12:51:56.944061 getpid() = 10713 <0.001334>
12:51:56.945459 write(1, "Jul 07 09:25:23 x61s unknown"..., 56) = 56 <0.000018>
12:51:56.945544 getpid() = 10713 <0.000017>
12:51:56.945620 getpid() = 10713 <0.000017>
12:51:56.945694 getpid() = 10713 <0.000012>
12:51:56.945763 getpid() = 10713 <0.000011>
12:51:56.945832 getpid() = 10713 <0.000012>
12:51:56.945901 getpid() = 10713 <0.000011>
12:51:56.945969 getpid() = 10713 <0.000011>
12:51:56.946048 getpid() = 10713 <0.000013>
12:51:56.946118 getpid() = 10713 <0.000024>
12:51:56.946188 getpid() = 10713 <0.000047>
12:51:56.946277 getpid() = 10713 <0.000041>
12:51:56.946353 getpid() = 10713 <0.000041>
12:51:56.946428 getpid() = 10713 <0.000040>
12:51:56.946539 getpid() = 10713 <0.001363>

As this is public sd-journal API, it's somewhat set in stone. However,
there's nothing preventing the systemd-internal tooling from linking against
a less defensive/conformant build of the underlying implementation shared with
the public API, leaving these kinds of overheads to the public entrypoints alone.

For the curious; the logs being processed for this boot are 48 * 8 MiB on SSD,
1.8 GHz Core 2 Duo, 4.12 kernel.

Regards,
Vito Caputo
Mantas Mikulėnas
2017-07-07 20:37:08 UTC
Back when that commit was made, didn't glibc cache the getpid() result in
userspace? That would explain why it was not noticed.
Post by v***@pengaru.com
In doing some casual journalctl profiling and stracing, it became apparent
that `journalctl -b --no-pager` runs across a significant quantity of logs,
~10% of the time was thrown away on getpid() calls due to commmit a65f06b.
# time ./journalctl -b --no-pager > /dev/null
real 0m11.033s
user 0m10.084s
sys 0m0.943s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.641s
user 0m9.449s
sys 0m0.191s
More system time is being expended in repeated getpid() calls than write(),
12:51:56.939287 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.001276>
12:51:56.940633 getpid() = 10713 <0.000032>
12:51:56.940732 getpid() = 10713 <0.000012>
12:51:56.940801 getpid() = 10713 <0.000032>
12:51:56.940867 getpid() = 10713 <0.000041>
12:51:56.940942 getpid() = 10713 <0.000041>
12:51:56.941047 getpid() = 10713 <0.000012>
12:51:56.941117 getpid() = 10713 <0.000012>
12:51:56.941185 getpid() = 10713 <0.000011>
12:51:56.941253 getpid() = 10713 <0.000011>
12:51:56.941320 getpid() = 10713 <0.000039>
12:51:56.941395 getpid() = 10713 <0.000041>
12:51:56.941494 getpid() = 10713 <0.000011>
12:51:56.941561 getpid() = 10713 <0.000012>
12:51:56.941629 getpid() = 10713 <0.000039>
12:51:56.942942 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.000058>
12:51:56.943052 getpid() = 10713 <0.000039>
12:51:56.943156 getpid() = 10713 <0.000017>
12:51:56.943230 getpid() = 10713 <0.000018>
12:51:56.943305 getpid() = 10713 <0.000012>
12:51:56.943374 getpid() = 10713 <0.000017>
12:51:56.943449 getpid() = 10713 <0.000011>
12:51:56.943517 getpid() = 10713 <0.000011>
12:51:56.943585 getpid() = 10713 <0.000011>
12:51:56.943652 getpid() = 10713 <0.000011>
12:51:56.943721 getpid() = 10713 <0.000030>
12:51:56.943796 getpid() = 10713 <0.000041>
12:51:56.943870 getpid() = 10713 <0.000041>
12:51:56.943944 getpid() = 10713 <0.000041>
12:51:56.944061 getpid() = 10713 <0.001334>
12:51:56.945459 write(1, "Jul 07 09:25:23 x61s unknown"..., 56) = 56 <0.000018>
12:51:56.945544 getpid() = 10713 <0.000017>
12:51:56.945620 getpid() = 10713 <0.000017>
12:51:56.945694 getpid() = 10713 <0.000012>
12:51:56.945763 getpid() = 10713 <0.000011>
12:51:56.945832 getpid() = 10713 <0.000012>
12:51:56.945901 getpid() = 10713 <0.000011>
12:51:56.945969 getpid() = 10713 <0.000011>
12:51:56.946048 getpid() = 10713 <0.000013>
12:51:56.946118 getpid() = 10713 <0.000024>
12:51:56.946188 getpid() = 10713 <0.000047>
12:51:56.946277 getpid() = 10713 <0.000041>
12:51:56.946353 getpid() = 10713 <0.000041>
12:51:56.946428 getpid() = 10713 <0.000040>
12:51:56.946539 getpid() = 10713 <0.001363>
As this is public sd-journal API, it's somewhat set in stone. However,
there's nothing preventing the systemd-internal tooling from linking with
a less defensive/conformant underlying implementation shared with the public
API implementation where these kinds of overheads can be preserved.
For the curious; the logs being processed for this boot are 48 * 8MiB on SSD,
1.8Ghz Core2 Duo, 4.12 kernel.
Regards,
Vito Caputo
--
Mantas Mikulėnas <***@gmail.com>
Sent from my phone
v***@pengaru.com
2017-07-07 20:49:54 UTC
Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
from Debian jessie to stretch overnight on this machine and haven't rebooted.

Perhaps the vDSO isn't working and the costly getpid() is a red herring; I will
reboot and retest to confirm.
Post by Mantas Mikulėnas
Back when that commit was made, didn't glibc cache the getpid() result in
userspace? That would explain why it was not noticed.
Post by v***@pengaru.com
In doing some casual journalctl profiling and stracing, it became apparent
that `journalctl -b --no-pager` runs across a significant quantity of logs,
~10% of the time was thrown away on getpid() calls due to commmit a65f06b.
# time ./journalctl -b --no-pager > /dev/null
real 0m11.033s
user 0m10.084s
sys 0m0.943s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.641s
user 0m9.449s
sys 0m0.191s
More system time is being expended in repeated getpid() calls than write(),
12:51:56.939287 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.001276>
12:51:56.940633 getpid() = 10713 <0.000032>
12:51:56.940732 getpid() = 10713 <0.000012>
12:51:56.940801 getpid() = 10713 <0.000032>
12:51:56.940867 getpid() = 10713 <0.000041>
12:51:56.940942 getpid() = 10713 <0.000041>
12:51:56.941047 getpid() = 10713 <0.000012>
12:51:56.941117 getpid() = 10713 <0.000012>
12:51:56.941185 getpid() = 10713 <0.000011>
12:51:56.941253 getpid() = 10713 <0.000011>
12:51:56.941320 getpid() = 10713 <0.000039>
12:51:56.941395 getpid() = 10713 <0.000041>
12:51:56.941494 getpid() = 10713 <0.000011>
12:51:56.941561 getpid() = 10713 <0.000012>
12:51:56.941629 getpid() = 10713 <0.000039>
12:51:56.942942 write(1, "Jul 07 09:25:23 x61s unknown"..., 57) = 57 <0.000058>
12:51:56.943052 getpid() = 10713 <0.000039>
12:51:56.943156 getpid() = 10713 <0.000017>
12:51:56.943230 getpid() = 10713 <0.000018>
12:51:56.943305 getpid() = 10713 <0.000012>
12:51:56.943374 getpid() = 10713 <0.000017>
12:51:56.943449 getpid() = 10713 <0.000011>
12:51:56.943517 getpid() = 10713 <0.000011>
12:51:56.943585 getpid() = 10713 <0.000011>
12:51:56.943652 getpid() = 10713 <0.000011>
12:51:56.943721 getpid() = 10713 <0.000030>
12:51:56.943796 getpid() = 10713 <0.000041>
12:51:56.943870 getpid() = 10713 <0.000041>
12:51:56.943944 getpid() = 10713 <0.000041>
12:51:56.944061 getpid() = 10713 <0.001334>
12:51:56.945459 write(1, "Jul 07 09:25:23 x61s unknown"..., 56) = 56 <0.000018>
12:51:56.945544 getpid() = 10713 <0.000017>
12:51:56.945620 getpid() = 10713 <0.000017>
12:51:56.945694 getpid() = 10713 <0.000012>
12:51:56.945763 getpid() = 10713 <0.000011>
12:51:56.945832 getpid() = 10713 <0.000012>
12:51:56.945901 getpid() = 10713 <0.000011>
12:51:56.945969 getpid() = 10713 <0.000011>
12:51:56.946048 getpid() = 10713 <0.000013>
12:51:56.946118 getpid() = 10713 <0.000024>
12:51:56.946188 getpid() = 10713 <0.000047>
12:51:56.946277 getpid() = 10713 <0.000041>
12:51:56.946353 getpid() = 10713 <0.000041>
12:51:56.946428 getpid() = 10713 <0.000040>
12:51:56.946539 getpid() = 10713 <0.001363>
As this is public sd-journal API, it's somewhat set in stone. However,
there's nothing preventing the systemd-internal tooling from linking with
a less defensive/conformant underlying implementation shared with the public
API implementation where these kinds of overheads can be preserved.
For the curious; the logs being processed for this boot are 48 * 8MiB on SSD,
1.8Ghz Core2 Duo, 4.12 kernel.
Regards,
Vito Caputo
--
Sent from my phone
v***@pengaru.com
2017-07-07 21:09:47 UTC
Post by v***@pengaru.com
Post by Mantas Mikulėnas
Back when that commit was made, didn't glibc cache the getpid() result in
userspace? That would explain why it was not noticed.
Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
from Debian jessie to stretch overnight machine and haven't rebooted.
Perhaps the vdso isn't working and the costly getpid() is a red herring, will
reboot and retest to confirm.
Nope, same getpid() performance hit after rebooting.
v***@pengaru.com
2017-07-07 21:35:16 UTC
Post by v***@pengaru.com
Post by Mantas Mikulėnas
Back when that commit was made, didn't glibc cache the getpid() result in
userspace? That would explain why it was not noticed.
Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
from Debian jessie to stretch overnight machine and haven't rebooted.
Perhaps the vdso isn't working and the costly getpid() is a red herring, will
reboot and retest to confirm.
It appears Debian has a glibc patch to disable the caching (I was unaware
such an elaborate dance was being performed to cache this!)

https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/debian/patches/any?id=5850253f509604dd46a6131acc057ea26e1588ba

Unsure where I stand on core system software assuming certain syscalls are
always going to be exceptionally cheap though...
Zbigniew Jędrzejewski-Szmek
2017-07-07 22:34:22 UTC
Post by v***@pengaru.com
Post by v***@pengaru.com
Post by Mantas Mikulėnas
Back when that commit was made, didn't glibc cache the getpid() result in
userspace? That would explain why it was not noticed.
Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
from Debian jessie to stretch overnight machine and haven't rebooted.
Perhaps the vdso isn't working and the costly getpid() is a red herring, will
reboot and retest to confirm.
It appears Debian has a glibc patch to disable the caching (I was unaware
such an elaborate dance was being performed to cache this!)
https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/debian/patches/any?id=5850253f509604dd46a6131acc057ea26e1588ba
Do we know the justification for this patch?
Post by v***@pengaru.com
Unsure where I stand on core system software assuming certain syscalls are
always going to be exceptionally cheap though...
Optimization is never in a vacuum. If glibc does something cheaply, it
seems reasonable to take advantage of it.

Zbyszek
v***@pengaru.com
2017-07-07 22:54:09 UTC
Post by Zbigniew Jędrzejewski-Szmek
Post by v***@pengaru.com
Post by v***@pengaru.com
Post by Mantas Mikulėnas
Back when that commit was made, didn't glibc cache the getpid() result in
userspace? That would explain why it was not noticed.
Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
from Debian jessie to stretch overnight machine and haven't rebooted.
Perhaps the vdso isn't working and the costly getpid() is a red herring, will
reboot and retest to confirm.
It appears Debian has a glibc patch to disable the caching (I was unaware
such an elaborate dance was being performed to cache this!)
https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/debian/patches/any?id=5850253f509604dd46a6131acc057ea26e1588ba
Do we know the justification for this patch?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909

Which references this upstream glibc bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=19957
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d
Post by Zbigniew Jędrzejewski-Szmek
Post by v***@pengaru.com
Unsure where I stand on core system software assuming certain syscalls are
always going to be exceptionally cheap though...
Optimization is never in a vacuum. If glibc does something cheaply, it
seems reasonable to take advantage of it.
Except there's always a risk of these things regressing to normal syscalls,
and one has to weigh the utility against that. It's unclear to me what
significant utility having the sd-journal API police changing pids by
calling getpid() at every public entrypoint is bringing to the table.
Zbigniew Jędrzejewski-Szmek
2017-07-08 03:49:11 UTC
Post by v***@pengaru.com
Post by Zbigniew Jędrzejewski-Szmek
Post by v***@pengaru.com
Post by v***@pengaru.com
Post by Mantas Mikulėnas
Back when that commit was made, didn't glibc cache the getpid() result in
userspace? That would explain why it was not noticed.
Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
from Debian jessie to stretch overnight machine and haven't rebooted.
Perhaps the vdso isn't working and the costly getpid() is a red herring, will
reboot and retest to confirm.
It appears Debian has a glibc patch to disable the caching (I was unaware
such an elaborate dance was being performed to cache this!)
https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/debian/patches/any?id=5850253f509604dd46a6131acc057ea26e1588ba
Do we know the justification for this patch?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909
https://sourceware.org/bugzilla/show_bug.cgi?id=19957
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d
Post by Zbigniew Jędrzejewski-Szmek
Post by v***@pengaru.com
Unsure where I stand on core system software assuming certain syscalls are
always going to be exceptionally cheap though...
Optimization is never in a vacuum. If glibc does something cheaply, it
seems reasonable to take advantage of it.
Except there's always a risk of these things regressing to normal syscalls,
and one has to weigh the utility against that. It's unclear to me what
significant utility having the sd-journal API police changing pids by
calling getpid() at every public entrypoint is bringing to the table.
So it seems the issue has been fixed in glibc upstream more than a year
ago, and it doesn't seem to make sense to optimize current systemd git for
that.

I see the argument that the getpid() checks are a bit excessive. Is their
overhead actually noticeable with current glibc?

Zbyszek
v***@pengaru.com
2017-07-10 21:04:44 UTC
Post by Zbigniew Jędrzejewski-Szmek
Post by v***@pengaru.com
Post by Zbigniew Jędrzejewski-Szmek
Post by v***@pengaru.com
Post by v***@pengaru.com
Post by Mantas Mikulėnas
Back when that commit was made, didn't glibc cache the getpid() result in
userspace? That would explain why it was not noticed.
Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
from Debian jessie to stretch overnight machine and haven't rebooted.
Perhaps the vdso isn't working and the costly getpid() is a red herring, will
reboot and retest to confirm.
It appears Debian has a glibc patch to disable the caching (I was unaware
such an elaborate dance was being performed to cache this!)
https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/debian/patches/any?id=5850253f509604dd46a6131acc057ea26e1588ba
Do we know the justification for this patch?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909
https://sourceware.org/bugzilla/show_bug.cgi?id=19957
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d
Post by Zbigniew Jędrzejewski-Szmek
Post by v***@pengaru.com
Unsure where I stand on core system software assuming certain syscalls are
always going to be exceptionally cheap though...
Optimization is never in a vacuum. If glibc does something cheaply, it
seems reasonable to take advantage of it.
Except there's always a risk of these things regressing to normal syscalls,
and one has to weigh the utility against that. It's unclear to me what
significant utility having the sd-journal API police changing pids by
calling getpid() at every public entrypoint is bringing to the table.
So it seems the issue has been fixed in glibc upstream more than a year
ago, and it doesn't seem to make sense to optimize current systemd git for
that.
Can you provide a commit id? I took a glance at sourceware.org/git/gitweb.cgi
for getpid commits and didn't see anything relevant since the removal[1].
Post by Zbigniew Jędrzejewski-Szmek
I see the argument that the getpid() checks are a bit excessive. Is their
overhead actually noticeable with current glibc?
On my spare Arch system I still see gratuitous getpid() calls from
journalctl, which is on glibc 2.25-2.

The pollution of strace output alone due to these checks is enough of a nuisance
for me to want the checks removed, considering their only value is to catch
programmer errors. There's an abundance of potential programmer errors
we're not making any effort to prevent; why is this one so privileged that
it warrants policing?

I appreciate Lennart's point about the hazards of forking from threaded
programs. It just doesn't seem like a valid rationalization for sprinkling
a system library's entrypoints with getpid() calls to catch this in
production.

Considering the associated potential costs, and the historic controversy
surrounding the caching of this particular syscall[2] I'm a bit confused by
the status quo.

Cheers,
Vito Caputo

1: https://sourceware.org/git/gitweb.cgi?p=glibc.git&a=search&h=HEAD&st=commit&s=getpid
2: http://yarchive.net/comp/linux/getpid_caching.html
Lennart Poettering
2017-07-11 07:35:59 UTC
Post by v***@pengaru.com
Post by Zbigniew Jędrzejewski-Szmek
Post by v***@pengaru.com
Except there's always a risk of these things regressing to normal syscalls,
and one has to weigh the utility against that. It's unclear to me what
significant utility having the sd-journal API police changing pids by
calling getpid() at every public entrypoint is bringing to the table.
So it seems the issue has been fixed in glibc upstream more than a year
ago, and it doesn't seem to make sense to optimize current systemd git for
that.
Can you provide a commit id? I took a glance at sourcewaire.org/git/gitweb.cgi
for getpid commits and didn't see anything relevant since the removal[1].
Post by Zbigniew Jędrzejewski-Szmek
I see the argument that the getpid() checks are a bit excessive. Is their
overhead actually noticeable with current glibc?
On my spare arch system I still see gratuitous getpid() calls from
journalctl, which is on glibc 2.5-2.
The pollution of strace output alone due to these checks is nuisance enough
for me to want the checks removed, considering their only value is to catch
programmer errors. There's an abundance of potential programmer errors
we're not making any effort to prevent, why is this one so privileged that
it warrants policing?
strace doesn't show getpid() calls if they are properly cached. While
stracing journalctl I have never seen a single one of them...
Post by v***@pengaru.com
I appreciate Lennart's point about the hazards of forking from threaded
programs. It just doesn't seem like a valid rationalization for sprinkling
a system library's entrypoints with getpid() calls to catch this in
production.
Normally it's dead cheap to check that; it's just reading and
comparing one memory location. It's a pity that this isn't the case
currently on Debian, but it appears this is an oversight on their
side, and I am sure it will eventually be fixed there, if it hasn't been already.
Post by v***@pengaru.com
Considering the associated potential costs, and the historic controversy
surrounding the caching of this particular syscall[2] I'm a bit confused by
the status quo.
Well, normally there's next to no cost to this; on pretty much any non-Debian
distro there isn't any price to pay for this...

I mean, we could certainly cache that value in our code too, but given
that glibc does that already in the normal case I think this is better
left to be fixed in glibc rather than our code.

Lennart
--
Lennart Poettering, Red Hat
Uoti Urpala
2017-07-11 09:55:43 UTC
Post by Lennart Poettering
Normally it's dead cheap to check that, it's just reading and
comparing one memory location. It's a pitty that this isn't the case
currently on Debian, but as it appears this is an oversight on their
side, and I am sure it will be eventually fixed there, if it hasn't already.
Are you sure about those "Debian only" and "will be 'fixed'" parts? The
Debian patch seems to be a cherry pick from upstream glibc. Is there
evidence of some error that would cause effects only visible on Debian
and nowhere else? And/or has the change been reverted or behavior
otherwise modified upstream to limit the range of relevant versions?
Lennart Poettering
2017-07-11 10:00:14 UTC
Post by Uoti Urpala
Post by Lennart Poettering
Normally it's dead cheap to check that, it's just reading and
comparing one memory location. It's a pitty that this isn't the case
currently on Debian, but as it appears this is an oversight on their
side, and I am sure it will be eventually fixed there, if it hasn't already.
Are you sure about those "Debian only" and "will be 'fixed'" parts? The
Debian patch seems to be a cherry pick from upstream glibc. Is there
evidence of some error that would cause effects only visible on Debian
and nowhere else? And/or has the change been reverted or behavior
otherwise modified upstream to limit the range of relevant versions?
See the links Vito provided:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d

i.e. Debian undid the PID caching to fix some issue that has been fixed
properly now, and hence the PID caching should be turned on again.

On Fedora at least getpid() is not visible in strace, and is fully
cached, as it should be.

Lennart
--
Lennart Poettering, Red Hat
Uoti Urpala
2017-07-11 10:17:40 UTC
Resend with correct list address
Post by v***@pengaru.com
Post by Uoti Urpala
Are you sure about those "Debian only" and "will be 'fixed'" parts? The
Debian patch seems to be a cherry pick from upstream glibc. Is there
evidence of some error that would cause effects only visible on Debian
and nowhere else? And/or has the change been reverted or behavior
otherwise modified upstream to limit the range of relevant versions?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d
i.e. Debian undid the PID caching to fix some issue that has been fix
properly now, and hence the PID caching should be turned on again.
That seems backwards: the commit cherry-picked by Debian seems to be
c579f48edba88380635a, which is NEWER than above 0cb313f7cb0e418b3d56.
In other words, it seems 0cb313 was a failed attempt at a fix and the
patch cherry-picked by Debian was needed to properly fix things.
Post by v***@pengaru.com
On Fedora at least getpid() is not visible in strace, and is fully
cached, as it should be.
Is that glibc 2.25? It seems to contain c579f48 at least; could be a
Fedora-specific change though?
Michael Chapman
2017-07-11 11:59:45 UTC
Post by v***@pengaru.com
Post by Uoti Urpala
Post by Lennart Poettering
Normally it's dead cheap to check that, it's just reading and
comparing one memory location. It's a pitty that this isn't the case
currently on Debian, but as it appears this is an oversight on their
side, and I am sure it will be eventually fixed there, if it hasn't already.
Are you sure about those "Debian only" and "will be 'fixed'" parts? The
Debian patch seems to be a cherry pick from upstream glibc. Is there
evidence of some error that would cause effects only visible on Debian
and nowhere else? And/or has the change been reverted or behavior
otherwise modified upstream to limit the range of relevant versions?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d
i.e. Debian undid the PID caching to fix some issue that has been fix
properly now, and hence the PID caching should be turned on again.
On Fedora at least getpid() is not visible in strace, and is fully
cached, as it should be.
I just tested this on F25 and F26 beta, and it's certainly visible for me
on both of them:

# cat /etc/system-release
Fedora release 25 (Twenty Five)
# rpm -q glibc
glibc-2.24-9.fc25.x86_64
# strace -c journalctl --since -1hour 2>&1 >/dev/null | head -10
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
93.93 0.167020 2 83761 getpid
3.93 0.006983 2 3025 write
0.54 0.000953 10 97 mmap
0.39 0.000696 13 52 8 open
0.31 0.000558 14 40 munmap
0.19 0.000332 8 42 mprotect
0.15 0.000264 6 45 fstat
0.14 0.000246 6 44 close

# cat /etc/system-release
Fedora release 26 (Twenty Six)
# rpm -q glibc
glibc-2.25-7.fc26.x86_64
# strace -c journalctl --since -1hour 2>&1 >/dev/null | head -10
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
62.84 0.007874 4 2063 getpid
7.96 0.000998 12 86 mmap
7.85 0.000983 20 48 8 open
3.85 0.000483 9 54 mprotect
2.80 0.000351 5 71 write
2.75 0.000345 11 32 read
2.69 0.000337 8 41 fstat
2.66 0.000333 8 40 close

The second machine had just been started, which is why the numbers are a
lot lower. Nevertheless, getpid is still taking by far the largest share
of time spent in syscalls.

Both of these are on Fedora's testing branch, but I don't think Fedora's
regular branch has a significantly different version of glibc.
Zbigniew Jędrzejewski-Szmek
2017-07-11 12:37:06 UTC
Post by Michael Chapman
Post by v***@pengaru.com
Post by Uoti Urpala
Post by Lennart Poettering
Normally it's dead cheap to check that, it's just reading and
comparing one memory location. It's a pitty that this isn't the case
currently on Debian, but as it appears this is an oversight on their
side, and I am sure it will be eventually fixed there, if it hasn't already.
Are you sure about those "Debian only" and "will be 'fixed'" parts? The
Debian patch seems to be a cherry pick from upstream glibc. Is there
evidence of some error that would cause effects only visible on Debian
and nowhere else? And/or has the change been reverted or behavior
otherwise modified upstream to limit the range of relevant versions?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d
i.e. Debian undid the PID caching to fix some issue that has been fix
properly now, and hence the PID caching should be turned on again.
On Fedora at least getpid() is not visible in strace, and is fully
cached, as it should be.
I just tested this on F25 and F26 beta, and it's certainly visible
# cat /etc/system-release
Fedora release 25 (Twenty Five)
# rpm -q glibc
glibc-2.24-9.fc25.x86_64
# strace -c journalctl --since -1hour 2>&1 >/dev/null | head -10
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
93.93 0.167020 2 83761 getpid
3.93 0.006983 2 3025 write
0.54 0.000953 10 97 mmap
0.39 0.000696 13 52 8 open
0.31 0.000558 14 40 munmap
0.19 0.000332 8 42 mprotect
0.15 0.000264 6 45 fstat
0.14 0.000246 6 44 close
# cat /etc/system-release
Fedora release 26 (Twenty Six)
# rpm -q glibc
glibc-2.25-7.fc26.x86_64
# strace -c journalctl --since -1hour 2>&1 >/dev/null | head -10
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
62.84 0.007874 4 2063 getpid
7.96 0.000998 12 86 mmap
7.85 0.000983 20 48 8 open
3.85 0.000483 9 54 mprotect
2.80 0.000351 5 71 write
2.75 0.000345 11 32 read
2.69 0.000337 8 41 fstat
2.66 0.000333 8 40 close
The second machine had just been started, which is why the numbers
are a lot lower. Nevertheless, getpid is still taking by far the
most amount of time in syscalls.
Both of these are on Fedora's testing branch, but I don't think
Fedora's regular branch has a significantly different version of
glibc.
Yep, I can confirm that: many many getpid() syscalls on F26 and rawhide,
and none on F24.

Zbyszek
Lennart Poettering
2017-07-11 14:07:11 UTC
Post by v***@pengaru.com
Post by Uoti Urpala
Post by Lennart Poettering
Normally it's dead cheap to check that, it's just reading and
comparing one memory location. It's a pitty that this isn't the case
currently on Debian, but as it appears this is an oversight on their
side, and I am sure it will be eventually fixed there, if it hasn't already.
Are you sure about those "Debian only" and "will be 'fixed'" parts? The
Debian patch seems to be a cherry pick from upstream glibc. Is there
evidence of some error that would cause effects only visible on Debian
and nowhere else? And/or has the change been reverted or behavior
otherwise modified upstream to limit the range of relevant versions?
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=857909
https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commit;h=0cb313f7cb0e418b3d56f3a2ac69790522ab825d
i.e. Debian undid the PID caching to fix some issue that has been fix
properly now, and hence the PID caching should be turned on again.
On Fedora at least getpid() is not visible in strace, and is fully
cached, as it should be.
I just tested this on F25 and F26 beta, and it's certainly visible for me on
# cat /etc/system-release
Fedora release 25 (Twenty Five)
# rpm -q glibc
glibc-2.24-9.fc25.x86_64
# strace -c journalctl --since -1hour 2>&1 >/dev/null | head -10
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
93.93 0.167020 2 83761 getpid
3.93 0.006983 2 3025 write
0.54 0.000953 10 97 mmap
0.39 0.000696 13 52 8 open
0.31 0.000558 14 40 munmap
0.19 0.000332 8 42 mprotect
0.15 0.000264 6 45 fstat
0.14 0.000246 6 44 close
Hmm, so I run a slightly older glibc, as I haven't updated my system
in a while:

$ strace -c journalctl --since -1hour 2>&1 >/dev/null | head -10
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
25.95 0.001276 7 195 mmap
23.21 0.001141 7 164 30 open
22.29 0.001096 9 119 munmap
6.53 0.000321 2 134 close
6.10 0.000300 2 135 fstat
5.15 0.000253 5 56 mprotect
4.88 0.000240 2 102 fstatfs
2.30 0.000113 4 32 read

getpid() is nowhere to be seen in this... Seems Fedora regressed on
this too recently. Meh.

Somebody should probably file a bug about this regression and get
clarification on whether this is going to remain slow, or if they are going to
fix that again... If this is going to remain slow, we should probably
find a different way to detect forks... we could install a
pthread_atfork() handler and set some flag...
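
A minimal sketch of that idea (hypothetical code, not anything that exists
in systemd today): register a child-side atfork handler that invalidates a
process-local cache, and have our own getpid() wrapper consult the cache first.

#include <pthread.h>
#include <unistd.h>

static pid_t cached_pid = 0;

/* Runs in the child after fork(): drop the cache so the next call
 * re-reads the real PID of the new process. */
static void invalidate_pid_cache(void) {
        cached_pid = 0;
}

/* Call once, e.g. from a library constructor, before threads exist. */
static void install_pid_cache(void) {
        pthread_atfork(NULL, NULL, invalidate_pid_cache);
}

static pid_t cached_getpid(void) {
        if (cached_pid == 0)
                cached_pid = getpid();  /* one syscall per process */
        return cached_pid;
}

Like the old glibc cache, this would still be blind to raw clone()
invocations, and a real implementation would need an atomic for cached_pid
in threaded code, so this is only a sketch of the direction, not a drop-in fix.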

Lennart
--
Lennart Poettering, Red Hat
Lennart Poettering
2017-07-11 14:10:38 UTC
Post by Lennart Poettering
Hmm, so I run a slightly older glibc, as I haven't updated my system
$ strace -c journalctl --since -1hour 2>&1 >/dev/null | head -10
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
25.95 0.001276 7 195 mmap
23.21 0.001141 7 164 30 open
22.29 0.001096 9 119 munmap
6.53 0.000321 2 134 close
6.10 0.000300 2 135 fstat
5.15 0.000253 5 56 mprotect
4.88 0.000240 2 102 fstatfs
2.30 0.000113 4 32 read
getpid() is nowhere to be seen in this... Seems Fedora regressed on
this too recently. Meh.
Forgot to mention:

$ rpm -qa glibc
glibc-2.24-4.fc25.x86_64

Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64, then.

Lennart
--
Lennart Poettering, Red Hat
Tomasz Torcz
2017-07-11 14:55:04 UTC
Permalink
Raw Message
Post by Lennart Poettering
Post by Lennart Poettering
Hmm, so I run a slightly older glibc, as I haven't updated my system
$ strace -c journalctl --since -1hour 2>&1 >/dev/null | head -10
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
25.95 0.001276 7 195 mmap
23.21 0.001141 7 164 30 open
22.29 0.001096 9 119 munmap
6.53 0.000321 2 134 close
6.10 0.000300 2 135 fstat
5.15 0.000253 5 56 mprotect
4.88 0.000240 2 102 fstatfs
2.30 0.000113 4 32 read
getpid() is nowhere to be seen in this... Seems Fedora regressed on
this too recently. Meh.
$ rpm -qa glibc
glibc-2.24-4.fc25.x86_64
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
From glibc changelog:

* Wed Jun 07 2017 Arjun Shankar <***@lostca.se> - 2.24-6
- Auto-sync with upstream release/2.24/master,
commit 7b60553e360731338631ccdda71590ac5deca137, fixing:
- Remove the PID cache (#1443976)
--
Tomasz Torcz To co nierealne -- tutaj jest normalne.
xmpp: ***@chrome.pl Ziomale na życie mają tu patenty specjalne.
Lennart Poettering
2017-07-11 15:20:10 UTC
Post by Tomasz Torcz
Post by Lennart Poettering
Post by Lennart Poettering
Hmm, so I run a slightly older glibc, as I haven't updated my system
$ strace -c journalctl --since -1hour 2>&1 >/dev/null | head -10
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
25.95 0.001276 7 195 mmap
23.21 0.001141 7 164 30 open
22.29 0.001096 9 119 munmap
6.53 0.000321 2 134 close
6.10 0.000300 2 135 fstat
5.15 0.000253 5 56 mprotect
4.88 0.000240 2 102 fstatfs
2.30 0.000113 4 32 read
getpid() is nowhere to be seen in this... Seems Fedora regressed on
this too recently. Meh.
$ rpm -qa glibc
glibc-2.24-4.fc25.x86_64
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
- Auto-sync with upstream release/2.24/master,
- Remove the PID cache (#1443976)
I commented on that bug now. It doesn't really have a proper
explanation; all it says is that "The glibc PID cache negatively
interacts with setting up containers and namespaces."...

Lennart
--
Lennart Poettering, Red Hat
Tomasz Torcz
2017-07-11 16:06:22 UTC
Post by Lennart Poettering
Post by Tomasz Torcz
Post by Lennart Poettering
$ rpm -qa glibc
glibc-2.24-4.fc25.x86_64
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
- Auto-sync with upstream release/2.24/master,
- Remove the PID cache (#1443976)
I commented on that bug now. It doesn't really have a proper
explanation, all it says is that "The glibc PID cache negatively
interacts with setting up containers and namespaces."...
The upstream commit has a longer rationale:
http://repo.or.cz/glibc.git/commit/c579f48edba88380635ab98cb612030e3ed8691e

But I'm not competent enough to judge it.
--
Tomasz Torcz To co nierealne -- tutaj jest normalne.
xmpp: ***@chrome.pl Ziomale na życie mają tu patenty specjalne.
Mantas Mikulėnas
2017-07-12 08:25:25 UTC
Post by Tomasz Torcz
Post by Lennart Poettering
Post by Tomasz Torcz
Post by Lennart Poettering
$ rpm -qa glibc
glibc-2.24-4.fc25.x86_64
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
- Auto-sync with upstream release/2.24/master,
- Remove the PID cache (#1443976)
I commented on that bug now. It doesn't really have a proper
explanation, all it says is that "The glibc PID cache negatively
interacts with setting up containers and namespaces."...
http://repo.or.cz/glibc.git/commit/c579f48edba88380635ab98cb612030e3ed8691e
Looks like they're basically saying that the same could be done via VDSO in
a much simpler way.
--
Mantas Mikulėnas <***@gmail.com>
Florian Weimer
2017-07-11 19:26:26 UTC
Post by Lennart Poettering
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
Yes, I backported the fork cache removal to Fedora 25. There is no
longer a good way to maintain such a cache in userspace, because glibc
can no longer intercept all the ways that can change the PID of the
current process; the kernel interfaces for process management
are incredibly rich these days.
Lennart Poettering
2017-07-12 07:39:04 UTC
Post by Florian Weimer
Post by Lennart Poettering
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
Yes, I backported the fork cache removal to Fedora 25. There is no
longer a good way to main such a cache in userspace because glibc
cannot intercept anymore all the ways that can change the PID of the
current process because the kernel interfaces for process management
are incredibly rich these days.
Please be more specific here. What is this all about? What triggered
this specifically? Is this about docker? docker is written in golang,
iirc, which doesn't bother with linking to libc anyway?

Is this a glibc upstream choice primarily? Were the regressions this
causes considered?

I mean, the getpid() checking code is not only in use in systemd, but
in various other bits, in particular PulseAudio, where I started
adding these checks for a good reason. It sounds pretty strange to me
to just regress all that...

Were there considerations to solve this differently? (For example,
expose some way getpid() caching can be turned on
specifically for apps needing it? It appears to me that, after all,
the number of apps calling clone() directly is very small,
while a ton of apps would benefit from getpid() caching...)

Please elaborate!

Lennart
--
Lennart Poettering, Red Hat
Florian Weimer
2017-07-12 07:51:06 UTC
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
Yes, I backported the fork cache removal to Fedora 25. There is no
longer a good way to main such a cache in userspace because glibc
cannot intercept anymore all the ways that can change the PID of the
current process because the kernel interfaces for process management
are incredibly rich these days.
Please be more specific here. What is this all about?
We got many bug reports over the years about sandboxes and other heavy
users of namespaces and clone in which the glibc PID cache got out of
sync, both in child and parent (!) processes.
Post by Lennart Poettering
What triggered this specifically? is this about docker? docker is
written in golang anyway, iirc, which doesn't bother with linking to
libc anyway?
It needs glibc for access to the host and user databases.
Post by Lennart Poettering
Is this a glibc upstream choice primarily? Were the regressions this
causes considered?
I raised the problem of applications calling getpid frequently and
named OpenSSL as an example.
Post by Lennart Poettering
I mean, the getpid() checking code is not only in use in systemd, but
in various other bits, in particular PulseAudio, where I started
adding these checks for a good reason. It sounds pretty strange to me
to just regress all that...
Fork detection using getpid is not reliable. It gives false negatives
in the case of double-forks, where the process can be different but
the PID is the same due to reuse. Considering that this use case is
broken, I don't think it's worthwhile to jump through hoops to support
code which is fundamentally broken anyway.
Lennart Poettering
2017-07-12 08:23:38 UTC
Post by Florian Weimer
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
Yes, I backported the fork cache removal to Fedora 25. There is no
longer a good way to main such a cache in userspace because glibc
cannot intercept anymore all the ways that can change the PID of the
current process because the kernel interfaces for process management
are incredibly rich these days.
Please be more specific here. What is this all about?
We got many bug reports over the years about sandboxes and other heavy
users of namespaces and clone that the glibc PID cache got out of
sync, both in child and parent (!) processes.
have any links?
Post by Florian Weimer
Post by Lennart Poettering
What triggered this specifically? is this about docker? docker is
written in golang anyway, iirc, which doesn't bother with linking to
libc anyway?
It needs glibc for access to the host and user databases.
Can you elaborate? I fail to see any relationship between
unshare()/fork()/getpid() and NSS?
Post by Florian Weimer
Post by Lennart Poettering
Is this a glibc upstream choice primarily? Were the regressions this
causes considered?
I raised the problem of applications calling getpid frequently and
named OpenSSL as an example.
Link?
Post by Florian Weimer
Post by Lennart Poettering
I mean, the getpid() checking code is not only in use in systemd, but
in various other bits, in particular PulseAudio, where I started
adding these checks for a good reason. It sounds pretty strange to me
to just regress all that...
Fork detection using getpid is not reliable. It gives false negatives
in the case of double-forks, where the process can be different but
the PID is the same due to reuse. Considering that this use case is
broken, I don't think it's worthwhile to jump through hoops to support
code which is fundamentally broken anyway.
Uh, that's a bit nonchalant, no? Yes, the UNIX PID concept is awfully
designed, but if you argue on that level, you'd have to remove kill(),
and half of the other syscalls that take a PID from glibc too...

The primary intention of checking the PID in our calls is to filter
out cases where people assume they can use our context objects across
fork()s: a clean, early error code is a ton better than memory
corruption. And I am pretty sure the use case is very valid... And yes,
even if checking getpid() misses some theoretical corner cases,
pthread_atfork() or whatever else you propose will miss others too,
and is much uglier codewise, introduces deps, yadda yadda...

Gah, this is all so ugly. I understand systemd is not a program you
are particularly interested in, but making us chase around your
regressions is just mean...

Lennart
--
Lennart Poettering, Red Hat
Reindl Harald
2017-07-12 09:33:30 UTC
Post by Lennart Poettering
Post by Florian Weimer
Fork detection using getpid is not reliable. It gives false negatives
in the case of double-forks, where the process can be different but
the PID is the same due to reuse. Considering that this use case is
broken, I don't think it's worthwhile to jump through hoops to support
code which is fundamentally broken anyway.
Uh, that's a bit non-chalant, no? Yes, the UNIX PID concept is awfully
designed, but if you argue on that level, you'd have to remove kill(),
and half of the other syscalls that take a PID from glibc too...
don't get me wrong: this is exactly how you have argued to everyone else over
the years: "fundamentally broken anyway, fix it"
Florian Weimer
2017-07-12 09:47:38 UTC
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
Yes, I backported the fork cache removal to Fedora 25. There is no
longer a good way to main such a cache in userspace because glibc
cannot intercept anymore all the ways that can change the PID of the
current process because the kernel interfaces for process management
are incredibly rich these days.
Please be more specific here. What is this all about?
We got many bug reports over the years about sandboxes and other heavy
users of namespaces and clone that the glibc PID cache got out of
sync, both in child and parent (!) processes.
have any links?
<https://bugs.chromium.org/p/chromium/issues/detail?id=484870>

You guys ran into this as well and wrote a raw_getpid function which
calls the system call. (You should have reported the bug instead.)
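
For readers not familiar with it, such a wrapper is essentially just a direct
syscall, bypassing whatever libc may cache (a sketch; the real helper may
differ in details):

#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel directly, ignoring any userspace PID cache. */
static pid_t raw_getpid(void) {
        return (pid_t) syscall(SYS_getpid);
}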
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
What triggered this specifically? is this about docker? docker is
written in golang anyway, iirc, which doesn't bother with linking to
libc anyway?
It needs glibc for access to the host and user databases.
can you elaborate? I fail to see any relationship between
unshare()/fork()/getpid() and NSS?
You asked why docker links against glibc.
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Is this a glibc upstream choice primarily? Were the regressions this
causes considered?
I raised the problem of applications calling getpid frequently and
named OpenSSL as an example.
Link?
See the collection of links in the other message.
Post by Lennart Poettering
And I am pretty sure the usecase is very valid... And yes,
even if checking getpid() misses some theoretical corner cases,
pthread_atfork() or whatever else you propose will miss others too,
and is much uglier codewise, introduces deps, yadda yadda...
We actually increased the accuracy of your fork detection logic (even
though it's still broken), so I'm puzzled why you keep calling this a
regression.

Lennart Poettering
2017-07-12 08:46:47 UTC
Post by Florian Weimer
Post by Lennart Poettering
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
Yes, I backported the fork cache removal to Fedora 25. There is no
longer a good way to main such a cache in userspace because glibc
cannot intercept anymore all the ways that can change the PID of the
current process because the kernel interfaces for process management
are incredibly rich these days.
BTW, with this change you are breaking expressly documented behaviour:

http://man7.org/linux/man-pages/man2/getpid.2.html

"Since glibc version 2.3.4, the glibc wrapper function for
getpid() caches PIDs, so as to avoid additional system calls when
a process calls getpid() repeatedly."

Lennart
--
Lennart Poettering, Red Hat
Florian Weimer
2017-07-12 09:31:05 UTC
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Apparently, this regressed between this version and
glibc-2.24-9.fc25.x86_64 hence.
Yes, I backported the fork cache removal to Fedora 25. There is no
longer a good way to main such a cache in userspace because glibc
cannot intercept anymore all the ways that can change the PID of the
current process because the kernel interfaces for process management
are incredibly rich these days.
http://man7.org/linux/man-pages/man2/getpid.2.html
"Since glibc version 2.3.4, the glibc wrapper function for
getpid() caches PIDs, so as to avoid additional system calls when
a process calls getpid() repeatedly."
Note that the manpages project, while extremely valuable, is not
official glibc documentation and sometimes explains library internals
which can change over time.
Lennart Poettering
2017-07-10 11:29:21 UTC
Post by v***@pengaru.com
Post by v***@pengaru.com
Post by Mantas Mikulėnas
Back when that commit was made, didn't glibc cache the getpid() result in
userspace? That would explain why it was not noticed.
Hmm, this crossed my mind, and come to think of it I did a dist-upgrade
from Debian jessie to stretch overnight machine and haven't rebooted.
Perhaps the vdso isn't working and the costly getpid() is a red herring, will
reboot and retest to confirm.
It appears Debian has a glibc patch to disable the caching (I was unaware
such an elaborate dance was being performed to cache this!)
https://anonscm.debian.org/cgit/pkg-glibc/glibc.git/commit/debian/patches/any?id=5850253f509604dd46a6131acc057ea26e1588ba
Unsure where I stand on core system software assuming certain syscalls are
always going to be exceptionally cheap though...
Hmm, we generally assume that a couple of syscalls are indeed very
cheap. That's getpid(), getuid() as well as the various ways to get
the system time.

Yes, glibc caching of the PID is very annoying in some cases (as it
complicates code that invokes clone() through a direct syscall
invocation), but still, I think it should be OK to assume that it is
cheap to invoke it.

Lennart
--
Lennart Poettering, Red Hat
Michael Chapman
2017-07-08 06:24:18 UTC
Post by v***@pengaru.com
In doing some casual journalctl profiling and stracing, it became apparent
that `journalctl -b --no-pager` runs across a significant quantity of logs,
~10% of the time was thrown away on getpid() calls due to commmit a65f06b.
# time ./journalctl -b --no-pager > /dev/null
real 0m11.033s
user 0m10.084s
sys 0m0.943s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.641s
user 0m9.449s
sys 0m0.191s
[...]
Post by v***@pengaru.com
As this is public sd-journal API, it's somewhat set in stone.
So it's arguable whether making an API work in _more_ situations than it
previously did is a "breaking" change.

I've tried to go through the history for the various *_pid_changed()
functions in the APIs systemd presents, and I'm struggling to find a good
justification for them. It seems like the check was originally added for sd-bus
in:

https://github.com/systemd/systemd/commit/d5a2b9a6f455468a0f29483303657ab4fd7013d8

And then other APIs copied it to be consistent with sd-bus:

https://github.com/systemd/systemd/commit/a65f06bb27688a6738f2f94b7f055f4c66768d63
https://github.com/systemd/systemd/commit/eaa3cbef3b8c214cd5c2d75b04e70ad477187e17
https://github.com/systemd/systemd/commit/adf412b9ec7292e0c83aaf9ab93e08c2c8bd524a

Unfortunately none of these commits describe what will go wrong if one of
these APIs is used across fork. Does anybody know what specifically is the
problem being addressed here? Can we detect this problem in some other
way?

My guess is that it has something to do with epoll's boneheaded design,
which basically requires code to recreate its epoll sets completely after
a fork. But surely if sd-event is the only thing that uses epoll we can
just have only it check the PID and let it propagate ECHILD back up to
those APIs that use it, rather than testing the PID in each and every API
individually?
Post by v***@pengaru.com
However,
there's nothing preventing the systemd-internal tooling from linking with
a less defensive/conformant underlying implementation shared with the public
API implementation where these kinds of overheads can be preserved.
This would be useful for external users of these APIs as well.
Lennart Poettering
2017-07-10 11:39:50 UTC
Post by Michael Chapman
Post by v***@pengaru.com
In doing some casual journalctl profiling and stracing, it became apparent
that `journalctl -b --no-pager` runs across a significant quantity of logs,
~10% of the time was thrown away on getpid() calls due to commmit a65f06b.
# time ./journalctl -b --no-pager > /dev/null
real 0m11.033s
user 0m10.084s
sys 0m0.943s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.641s
user 0m9.449s
sys 0m0.191s
[...]
Post by v***@pengaru.com
As this is public sd-journal API, it's somewhat set in stone.
So it's arguable whether making an API work in _more_ situations than it
previously did is a "breaking" change.
I've tried to go through the history for the various *_pid_changed()
functions in the APIs systemd presents, and I'm struggling to find a good
https://github.com/systemd/systemd/commit/d5a2b9a6f455468a0f29483303657ab4fd7013d8
https://github.com/systemd/systemd/commit/a65f06bb27688a6738f2f94b7f055f4c66768d63
https://github.com/systemd/systemd/commit/eaa3cbef3b8c214cd5c2d75b04e70ad477187e17
https://github.com/systemd/systemd/commit/adf412b9ec7292e0c83aaf9ab93e08c2c8bd524a
Unfortunately none of these commits describe what will go wrong if one of
these APIs is used across fork. Does anybody know what specifically is the
problem being addressed here? Can we detect this problem in some
other way?
This all stems from my experiences with PulseAudio back in the day:
people do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process; moreover, all
data structures are copied as they are, and that's a time bomb really:
consider one of our context objects being used by one thread at the
moment another thread invokes fork(): the thread using the object is
busy making changes to the object, rearranging some data structure (for
example, rehashing a hash table, because it hit its fill limit) and
suchlike. Now the fork() happens while it is doing that: the data
structure will be copied in its half-written, half-updated state,
and in the child process there's no thread that could finish what has
been started, and there's no way to roll back the changes that
are in progress either.

In order to protect us against the time bomb we hence have these
checks that error out early and explicitly on all our calls, with
clean error codes, as soon as a fork() is detected and people try to
reuse our objects.

Now, you could argue that on purely single-threaded processes all
this is too restrictive, but it's really not like that anymore, as many
libraries (including our own, for example in sd-resolve) fork off
background threads and invoke our code in them. Never forget that our
libraries such as sd-bus and so on are invoked by NSS, which is very
frequently invoked in threads (because NSS is synchronous, which is
very much incompatible with most graphical apps).

So yeah, we never know in which context our stuff is called, and hence
I think it's safe to refuse in a friendly way early on rather than
just wait for the time bomb to explode when our code touches those
objects again.

Lennart
--
Lennart Poettering, Red Hat
Michael Chapman
2017-07-10 11:51:13 UTC
Post by Lennart Poettering
Post by Michael Chapman
Post by v***@pengaru.com
In doing some casual journalctl profiling and stracing, it became apparent
that `journalctl -b --no-pager` runs across a significant quantity of logs,
~10% of the time was thrown away on getpid() calls due to commmit a65f06b.
# time ./journalctl -b --no-pager > /dev/null
real 0m11.033s
user 0m10.084s
sys 0m0.943s
# time ./journalctl -b --no-pager > /dev/null
real 0m9.641s
user 0m9.449s
sys 0m0.191s
[...]
Post by v***@pengaru.com
As this is public sd-journal API, it's somewhat set in stone.
So it's arguable whether making an API work in _more_ situations than it
previously did is a "breaking" change.
I've tried to go through the history for the various *_pid_changed()
functions in the APIs systemd presents, and I'm struggling to find a good
https://github.com/systemd/systemd/commit/d5a2b9a6f455468a0f29483303657ab4fd7013d8
https://github.com/systemd/systemd/commit/a65f06bb27688a6738f2f94b7f055f4c66768d63
https://github.com/systemd/systemd/commit/eaa3cbef3b8c214cd5c2d75b04e70ad477187e17
https://github.com/systemd/systemd/commit/adf412b9ec7292e0c83aaf9ab93e08c2c8bd524a
Unfortunately none of these commits describe what will go wrong if one of
these APIs is used across fork. Does anybody know what specifically is the
problem being addressed here? Can we detect this problem in some
other way?
People do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process, moreover all
consider one of our context objects is being used by one thread at the
moment another thread invokes fork(): the thread using the object is
busy making changes to the object, rearranging some data structure (for
example, rehashing a hash table because it hit its fill limit) and
suchlike. Now the fork() happens while it is doing that: the data
structure will be copied in its half-written, half-updated state,
and in the child process there's no thread that could finish what has
been started, nor is there a way to roll back the changes that
are in progress.
[...]

Thanks, that really does clear things up.

It's a pity glibc doesn't provide an equivalent for pthread_atfork()
outside of the pthread library. Having a notification that a fork has just
occurred would allow us to do the PID caching ourselves.

Of course, there's still a problem with people calling the clone syscall
directly... but I think once people start doing that we have to trust them
to know what they're doing.
Lennart Poettering
2017-07-10 12:04:59 UTC
Permalink
Raw Message
Post by Michael Chapman
Post by Lennart Poettering
People do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process, moreover all
consider one of our context objects is being used by one thread at the
moment another thread invokes fork(): the thread using the object is
busy making changes to the object, rearranging some data structure (for
example, rehashing a hash table because it hit its fill limit) and
suchlike. Now the fork() happens while it is doing that: the data
structure will be copied in its half-written, half-updated state,
and in the child process there's no thread that could finish what has
been started, nor is there a way to roll back the changes that
are in progress.
[...]
Thanks, that really does clear things up.
It's a pity glibc doesn't provide an equivalent for pthread_atfork() outside
of the pthread library. Having a notification that a fork has just occurred
would allow us to do the PID caching ourselves.
Well, pthread_atfork() is probably more a source of problems than a solution
for them.

Mutexes and fork() do not mix well: if a thread holds a mutex at the
moment another thread calls fork(), that thread will not exist in the
child, but the mutex remains locked there. Now, you could use
pthread_atfork() to unlock
it, but that really works only in trivial cases, with trivial data
structures, and otherwise creates ABBA problems and similar. I mean,
mutexes are supposed to make pieces of code atomic from the outside
view: but if you duplicate a process without the thread it will appear
aborted to the outside, and that's quite far from "atomic"...
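
A compressed illustration of the hazard (a hypothetical stand-alone
example, not systemd code):

#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
        pthread_mutex_lock(&table_lock);
        /* ... long-running rehash of some table ... */
        pthread_mutex_unlock(&table_lock);
        return arg;
}

int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);

        pid_t pid = fork();     /* may land in the middle of the rehash */
        if (pid == 0) {
                /* Child: the worker thread does not exist here, but the
                 * mutex it held may still be locked and the table
                 * half-rewritten. This can block forever. */
                pthread_mutex_lock(&table_lock);
                _exit(0);
        }
        pthread_join(t, NULL);
        waitpid(pid, NULL, 0);
        return 0;
}
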
Post by Michael Chapman
Of course, there's still a problem with people calling the clone syscall
directly... but I think once people start doing that we have to trust them
to know what they're doing.
Yes: if you invoke clone() directly, you should really invoke execve()
soon afterwards, and in the time between these two syscalls you should
not invoke getpid() and should limit yourself to known-safe calls.
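
In code that would look roughly like this (a hypothetical helper; the
raw SYS_clone argument order shown is the x86-64 one and differs on
some architectures):

#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

static pid_t spawn(char *const argv[], char *const envp[]) {
        /* fork-like clone(2) that bypasses the glibc fork() wrapper */
        long pid = syscall(SYS_clone, (unsigned long) SIGCHLD, NULL, NULL, NULL, NULL);
        if (pid == 0) {
                /* In the child: no getpid(), no malloc(), no locks --
                 * only async-signal-safe work, then execve() promptly. */
                execve(argv[0], argv, envp);
                _exit(127);
        }
        return (pid_t) pid;     /* < 0 on error, child PID otherwise */
}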

Lennart
--
Lennart Poettering, Red Hat
Michael Chapman
2017-07-10 12:27:33 UTC
Permalink
Raw Message
Post by Lennart Poettering
Post by Michael Chapman
Post by Lennart Poettering
People do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process, moreover all
consider one of our context objects is being used by one thread at the
moment another thread invokes fork(): the thread using the object is
busy making changes to the object, rearranging some data structure (for
example, rehashing a hash table because it hit its fill limit) and
suchlike. Now the fork() happens while it is doing that: the data
structure will be copied in its half-written, half-updated state,
and in the child process there's no thread that could finish what has
been started, nor is there a way to roll back the changes that
are in progress.
[...]
Thanks, that really does clear things up.
It's a pity glibc doesn't provide an equivalent for pthread_atfork() outside
of the pthread library. Having a notification that a fork has just occurred
would allow us to do the PID caching ourselves.
Well, pthread_atfork() is probably more a source of problems than a solution
for them.
Mutexes and fork() do not mix well: if a thread holds a mutex at the
moment another thread calls fork(), that thread will not exist in the
child, but the mutex remains locked there. Now, you could use
pthread_atfork() to unlock
it, but that really works only in trivial cases, with trivial data
structures, and otherwise creates ABBA problems and similar. I mean,
mutexes are supposed to make pieces of code atomic from the outside
view: but if you duplicate a process without the thread it will appear
aborted to the outside, and that's quite far from "atomic"...
I understand that... which is why I was only talking about PID caching.
That is, it could be used to avoid the getpid() calls.

Anyway, it's all moot as I don't think we'd want to use pthread_atfork in
any systemd APIs -- I'm not sure if they all link to libpthread yet
anyway.
Post by Lennart Poettering
Post by Michael Chapman
Of course, there's still a problem with people calling the clone syscall
directly... but I think once people start doing that we have to trust them
to know what they're doing.
Yes: if you invoke clone() directly, you should really invoke execve()
soon afterwards, and in the time between these two syscalls you should
not invoke getpid() and should limit yourself to known-safe calls.
Lennart
Florian Weimer
2017-07-11 19:22:53 UTC
Permalink
Raw Message
Post by Michael Chapman
It's a pity glibc doesn't provide an equivalent for pthread_atfork()
outside of the pthread library. Having a notification that a fork has just
occurred would allow us to do the PID caching ourselves.
In fact, it does, as a public symbol: __register_atfork.

It's just not really documented, so it lives a bit in a vacuum, like the
old __secure_getenv call. To fix this, we need to make the dynamic
linker disregard sonames when resolving non-interposed versioned
symbols, and then move the definition from libpthread to libc.
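
For anyone who wants to try it today, the prototype that callers
typically declare by hand looks roughly like this -- since the symbol
is undocumented, treat the exact signature as an assumption to verify
against the installed glibc:

#include <unistd.h>

/* Undocumented glibc symbol; the libc-internal counterpart of
 * pthread_atfork(). The last argument is the caller's DSO handle. */
extern int __register_atfork(void (*prepare)(void),
                             void (*parent)(void),
                             void (*child)(void),
                             void *dso_handle);
extern void *__dso_handle;

static pid_t cached_pid;

static void refresh_cached_pid(void) {
        cached_pid = getpid();
}

__attribute__((constructor)) static void init(void) {
        refresh_cached_pid();
        (void) __register_atfork(NULL, NULL, refresh_cached_pid, __dso_handle);
}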
Florian Weimer
2017-07-11 19:24:17 UTC
Permalink
Raw Message
Post by Lennart Poettering
People do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process, moreover all
These days, the PID can change even without a fork, so the story is a
bit different.
Lennart Poettering
2017-07-12 07:31:22 UTC
Permalink
Raw Message
Post by Florian Weimer
Post by Lennart Poettering
People do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process, moreover all
These days, the PID can change even without a fork, so the story is a
bit different.
Can you elaborate?

Are you talking about cases where you invoke clone() directly, instead
of via glibc's wrappers? We do that too in systemd, but I am not sure
this is really reason enough to introduce this regression in glibc:
this is easily worked around (which we do in systemd), and given that
the time between clone() and execve() should be short and the code
between the two minimal, this isn't really much of a problem.

Where was this discussed in detail? Do you have any links about the
discussions about this?

Lennart
--
Lennart Poettering, Red Hat
Florian Weimer
2017-07-12 07:53:10 UTC
Permalink
Raw Message
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
People do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process, moreover all
These days, the PID can change even without a fork, so the story is a
bit different.
Can you elaborate?
I'm no longer sure that the PID can change with the current kernel,
but I cannot rule it out, either. But other weirdness is possible:
for example, after a fork, getpid and getppid could be equal.
Post by Lennart Poettering
Are you talking about cases where you invoke clone() directly, instead
of via glibc's wrappers? We do that too in systemd, but I am not sure
this is easily worked around (which we do in systemd), and given that
the time between clone() and execve() should be short, and the
code between the two minimal this isn't really much of a problem.
There are other uses of clone which do not immediately lead to an
execve.
Post by Lennart Poettering
Where was this discussed in detail? Do you have any links about the
discussions about this?
It was on libc-alpha and the glibc bug tracker.
Lennart Poettering
2017-07-12 08:25:38 UTC
Permalink
Raw Message
Post by Florian Weimer
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
People do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process, moreover all
These days, the PID can change even without a fork, so the story is a
bit different.
Can you elaborate?
I'm no longer sure that the PID can change with the current kernel,
for example, after a fork, getpid and getppid could be equal.
Please be less vague. So I understand correctly that the PID cannot
change without fork after all? So why then regress so badly? And what
does getppid have to do with all this?
Post by Florian Weimer
Post by Lennart Poettering
Are you talking about cases where you invoke clone() directly, instead
of via glibc's wrappers? We do that too in systemd, but I am not sure
this is easily worked around (which we do in systemd), and given that
the time between clone() and execve() should be short, and the
code between the two minimal this isn't really much of a problem.
There are other uses of clone which do not immediately lead to an
execve.
Can you elaborate? Have a real-life user of this?
Post by Florian Weimer
Post by Lennart Poettering
Where was this discussed in detail? Do you have any links about the
discussions about this?
It was on libc-alpha and the glibc bug tracker.
Link?

Lennart
--
Lennart Poettering, Red Hat
Michael Chapman
2017-07-12 09:08:19 UTC
Permalink
Raw Message
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Where was this discussed in detail? Do you have any links about the
discussions about this?
It was on libc-alpha and the glibc bug tracker.
Link?
Lennart
These seem quite relevant:

https://sourceware.org/ml/libc-alpha/2016-10/msg00136.html
https://sourceware.org/bugzilla/show_bug.cgi?id=19957

There's a bunch of other related bugs coming up in my searches too... it
seems that this one change was able to fix many things simultaneously.
Florian Weimer
2017-07-12 09:44:06 UTC
Permalink
Raw Message
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
People do not grok the effect of fork(): it only duplicates the
invoking thread, not any other threads of the process, moreover all
These days, the PID can change even without a fork, so the story is a
bit different.
Can you elaborate?
I'm no longer sure that the PID can change with the current kernel,
for example, after a fork, getpid and getppid could be equal.
Please be less vague. So I understand correctly that the PID cannot
change without fork after all?
I don't know. I don't see such a guarantee provided by the kernel.

I think the important aspect is that as far as glibc is concerned, the
PID *can* change even if its fork and vfork functions are never
called, so we cannot reliably invalidate a cache.

We could decide that it is undefined to call the fork/vfork/clone
system calls directly, but that would break systemd functionality, I
think.
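
To make that concrete, here is a hypothetical demo (x86-64 argument
order for raw SYS_clone): with a userspace PID cache, getpid() in the
child below could keep returning the parent's PID, while the raw
syscall always asks the kernel:

#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
        /* fork-like clone that glibc's fork()/vfork() never see */
        long pid = syscall(SYS_clone, (unsigned long) SIGCHLD, NULL, NULL, NULL, NULL);
        if (pid == 0) {
                printf("getpid()=%d  SYS_getpid=%ld\n",
                       (int) getpid(), syscall(SYS_getpid));
                _exit(0);
        }
        waitpid((pid_t) pid, NULL, 0);
        return 0;
}
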
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Are you talking about cases where you invoke clone() directly, instead
of via glibc's wrappers? We do that too in systemd, but I am not sure
this is easily worked around (which we do in systemd), and given that
the time between clone() and execve() should be short, and the
code between the two minimal this isn't really much of a problem.
There are other uses of clone which do not immediately lead to an
execve.
Can you elaborate? Have a real-life user of this?
src/nspawn/nspawn.c? A lot of things happen in the inner_child
function.
Post by Lennart Poettering
Post by Florian Weimer
Post by Lennart Poettering
Where was this discussed in detail? Do you have any links about the
discussions about this?
It was on libc-alpha and the glibc bug tracker.
Link?
https://sourceware.org/ml/libc-alpha/2016-10/msg00136.html
https://sourceware.org/ml/libc-alpha/2016-10/msg00233.html
https://sourceware.org/ml/libc-alpha/2016-11/msg00247.html
https://sourceware.org/ml/libc-alpha/2016-11/msg00303.html
https://www.postgresql.org/message-id/E1UKzBn-0006c6-***@gemulon.postgresql.org

The last link explains why using getpid() for fork detection does not
work in the way one would expect.