Discussion:
PATCH: tmpfiles cleanup
(too old to reply)
Michael Meeks
2012-01-11 12:38:58 UTC
Permalink
Hi there,

Trying to chase down my sudden keyring / tmpfile socket death
syndrome ;-) I poked at the tmpfile cleanup code.

With this debug patch:

diff --git a/src/tmpfiles.c b/src/tmpfiles.c
index 21bf44d..92082e4 100644
--- a/src/tmpfiles.c
+++ b/src/tmpfiles.c
@@ -133,14 +133,19 @@ static void load_unix_sockets(void) {

truncate_nl(line);

+ fprintf (stderr, "line '%s'\n", line);
if (strlen(line) < 53)
continue;

p = line + 53;
+ fprintf (stderr, "line[53] '%s'\n", p);
+
p += strspn(p, WHITESPACE);
p += strcspn(p, WHITESPACE);
p += strspn(p, WHITESPACE);

+ fprintf (stderr, "line[53] '%s'\n", p);
+
if (*p != '/')
continue;

@@ -149,6 +154,7 @@ static void load_unix_sockets(void) {

path_kill_slashes(s);

+ fprintf (stderr, "set_put '%s'\n", s);
if ((k = set_put(unix_sockets, s)) < 0) {
free(s);

I got this output:

line 'f2f8b940: 00000002 00000000 00010000 0001 01
42673259 /tmp/.X11-unix/X0'
line[53] ' /tmp/.X11-unix/X0'
line[53] ''
line 'f1b1fdc0: 00000002 00000000 00010000 0001 01
42674940 /tmp/.ICE-unix/9731'
line[53] ' /tmp/.ICE-unix/9731'
line[53] ''
line 'f2f56040: 00000002 00000000 00010000 0001 01
796 /var/run/dbus/system_bus_socket'
line[53] 'ar/run/dbus/system_bus_socket'
line[53] ''
line 'f2f6eb80: 00000002 00000000 00010000 0001 01
42675813 /tmp/keyring-cNhiNs/control'
line[53] ' /tmp/keyring-cNhiNs/control'
line[53] ''
...

ie. nothing going into the hash; with the attached patch as a fix, and
an additional:

@@ -149,6 +142,7 @@ static void load_unix_sockets(void) {

path_kill_slashes(s);

+ fprintf (stderr, "set_put '%s'\n", s);
if ((k = set_put(unix_sockets, s)) < 0) {
free(s);

I get:

set_put '/tmp/.X11-unix/X0'
set_put '/tmp/.ICE-unix/9731'
set_put '/var/run/dbus/system_bus_socket'
set_put '/tmp/keyring-cNhiNs/control'
set_put '/tmp/keyring-cNhiNs/ssh'
set_put '/tmp/keyring-cNhiNs/gpg'
set_put '/tmp/keyring-cNhiNs/pkcs11'
set_put '/tmp/.esd-1000/socket'
set_put
'/home/michael/.pulse/e840e2e044504d5071681f0d00000658-runtime/native'
set_put
'/home/michael/.pulse/e840e2e044504d5071681f0d00000658-runtime/dbus-socket'
set_put '/tmp/dbus-QcEdBn2bFr'
set_put '/tmp/gdm-session-WfnKDMpY'
set_put '/tmp/.X11-unix/X0'
set_put '/var/run/acpid.socket'
...

which I was expecting.

Hope that's ok: my first prototype systemd patch ;-)

Could this potentially explain my problem ?

All the best,

Michael.
--
***@suse.com <><, Pseudo Engineer, itinerant idiot
Lennart Poettering
2012-01-11 21:01:11 UTC
Permalink
Post by Michael Meeks
Hi there,
Trying to chase down my sudden keyring / tmpfile socket death
syndrome ;-) I poked at the tmpfile cleanup code.
line 'f2f8b940: 00000002 00000000 00010000 0001 01
42673259 /tmp/.X11-unix/X0'
This is interesting, it apparently boils down to the first column being
32bit for you and 64bit for me. I guess just skipping 53 chars is not OK
after all..

Is your machine 32bit?

/me goes and fixes that.

Thanks for tracking this down!

Lennart
--
Lennart Poettering - Red Hat, Inc.
Lennart Poettering
2012-01-11 21:12:57 UTC
Permalink
Post by Lennart Poettering
Is your machine 32bit?
/me goes and fixes that.
Here's the fix:

http://cgit.freedesktop.org/systemd/systemd/patch/?id=fdcad0c25579a60061b1fda956686e878a80faef

What's interesting btw is though the first column of /proc/net/unix is
64bit on 64bit machines the header line didn't get fixed and the table
is all skewed now on 64bit machines. I guess in a way that's a kernel
bug...

Lennart
--
Lennart Poettering - Red Hat, Inc.
Michael Meeks
2012-01-12 10:10:31 UTC
Permalink
Post by Lennart Poettering
Post by Michael Meeks
Trying to chase down my sudden keyring / tmpfile socket death
syndrome ;-) I poked at the tmpfile cleanup code.
...
Post by Lennart Poettering
This is interesting, it apparently boils down to the first column being
32bit for you and 64bit for me. I guess just skipping 53 chars is not OK
after all..
Right, and hard-coding such a number seems just a tad lame (to the
un-initiated me) ;-)
Post by Lennart Poettering
Is your machine 32bit?
Yes.
Post by Lennart Poettering
Thanks for tracking this down!
No problem; it is only the belt - not the braces; I'll knock up
something more robust re-using the linc-cleanup-sockets goodness, that
should also avoid the unpleasant race-condition in there whereby a
socket is created between parsing /proc/net/unix and the deletion phase.
You rob me of the joy of getting a patch into systemd ! ;-)
Post by Lennart Poettering
What's interesting btw is though the first column of /proc/net/unix is
64bit on 64bit machines the header line didn't get fixed and the table
is all skewed now on 64bit machines. I guess in a way that's a kernel
bug...
I'm amazed (given the clear heuristic we have for sorting paths from
non-paths ;-) that we prefer this more complicated, slower, more fragile
code instead - but c'est la vie ;-)

What are we trying to protect against with that extra complexity ? the
kernel switching to dumping a space delimited base64 blob after column
<n> and before the path ? ;-)

But anyhow - working on the next patch ;-)

ATB,

Michael.
--
***@suse.com <><, Pseudo Engineer, itinerant idiot
Michael Meeks
2012-01-12 11:07:30 UTC
Permalink
Post by Michael Meeks
No problem; it is only the belt - not the braces; I'll knock up
something more robust re-using the linc-cleanup-sockets goodness, that
should also avoid the unpleasant race-condition in there whereby a
socket is created between parsing /proc/net/unix and the deletion phase.
Hah - my 'race' is more imaginary than real because of the atime,
mtime, ctime checks (we are actually using atime!?).

Nevertheless, if we want to be truly clean socket-wise we could use
something like the attached ? this is also pretty quick to execute - so
unless there is some serious performance reason, we might consider
dropping the /proc/net/unix parsing altogether ? [ though of course my
understanding may be broken here, perhaps there are odd unix socket
types, and races, and so on that it will not handle ;-) ]

HTH,

Michael.


With the attached & layering this small debug patch on top, I get:

diff --git a/src/tmpfiles.c b/src/tmpfiles.c
index b3d17a7..e1c77c2 100644
--- a/src/tmpfiles.c
+++ b/src/tmpfiles.c
@@ -222,6 +222,8 @@ static bool unix_socket_alive(const char *fn) {
break;
}
done:
+ fprintf (stderr, "Socket '%s' - %s cnx %d errno %d\n",
+ fn, ret ? "alive" : "dead", cnx, cnx < 0 ? errno : 0);
if (sd >= 0)
close (sd);


Layered on top, and the parsed cache disabled I get; run one:

Socket '/tmp/at-spi2/socket-9665-278722862' - dead cnx -1 errno 111
Socket '/tmp/at-spi2/socket-9683-1804289383' - dead cnx -1 errno 111
Socket '/tmp/at-spi2/socket-9658-782288962' - dead cnx -1 errno 111
Socket '/tmp/OSL_PIPE_1000_SingleOfficeIPC_2697e28e886d5a3223771cb079e42' - alive cnx 0 errno 0
Socket '/tmp/pulse-7VI5uG1yhc3u/dbus-socket' - alive cnx 0 errno 0
Socket '/tmp/pulse-7VI5uG1yhc3u/native' - alive cnx 0 errno 0
Socket '/tmp/orbit-michael/linc-1174-0-77e8714ad4876' - dead cnx -1 errno 111
Socket '/tmp/.esd-1000/socket' - alive cnx 0 errno 0
Socket '/tmp/OSL_PIPE_1000_SingleOfficeIPC_16adf7e8b0d32dc9df645db617f77752' - alive cnx 0 errno 0
Socket '/tmp/keyring-cNhiNs/pkcs11' - alive cnx 0 errno 0
Socket '/tmp/keyring-cNhiNs/control' - alive cnx 0 errno 0
Socket '/tmp/keyring-cNhiNs/ssh' - alive cnx 0 errno 0
Socket '/tmp/keyring-cNhiNs/gpg' - alive cnx 0 errno 0
Socket '/tmp/.X11-unix/X0' - alive cnx 0 errno 0
Socket '/tmp/.ICE-unix/9731' - alive cnx 0 errno 0

run two:

Socket '/tmp/OSL_PIPE_1000_SingleOfficeIPC_2697e28e886d5a3223771cb079e42' - alive cnx 0 errno 0
Socket '/tmp/pulse-7VI5uG1yhc3u/dbus-socket' - alive cnx 0 errno 0
Socket '/tmp/pulse-7VI5uG1yhc3u/native' - alive cnx 0 errno 0
Socket '/tmp/.esd-1000/socket' - alive cnx 0 errno 0
Socket '/tmp/OSL_PIPE_1000_SingleOfficeIPC_16adf7e8b0d32dc9df645db617f77752' - alive cnx 0 errno 0
Socket '/tmp/keyring-cNhiNs/pkcs11' - alive cnx 0 errno 0
Socket '/tmp/keyring-cNhiNs/control' - alive cnx 0 errno 0
Socket '/tmp/keyring-cNhiNs/ssh' - alive cnx 0 errno 0
Socket '/tmp/keyring-cNhiNs/gpg' - alive cnx 0 errno 0
Socket '/tmp/.X11-unix/X0' - alive cnx 0 errno 0
Socket '/tmp/.ICE-unix/9731' - alive cnx 0 errno 0
--
***@suse.com <><, Pseudo Engineer, itinerant idiot
Lennart Poettering
2012-01-28 01:20:33 UTC
Permalink
On Thu, 12.01.12 11:07, Michael Meeks (***@suse.com) wrote:

heya,

Sorry for the long delay in responding!
Post by Michael Meeks
Post by Michael Meeks
No problem; it is only the belt - not the braces; I'll knock up
something more robust re-using the linc-cleanup-sockets goodness, that
should also avoid the unpleasant race-condition in there whereby a
socket is created between parsing /proc/net/unix and the deletion phase.
Hah - my 'race' is more imaginary than real because of the atime,
mtime, ctime checks (we are actually using atime!?).
Actually, the timestamps of AF_UNIX sockets nodes are not updated when
one connects to them. :-(
Post by Michael Meeks
Nevertheless, if we want to be truly clean socket-wise we could use
something like the attached ? this is also pretty quick to execute - so
unless there is some serious performance reason, we might consider
dropping the /proc/net/unix parsing altogether ? [ though of course my
understanding may be broken here, perhaps there are odd unix socket
types, and races, and so on that it will not handle ;-) ]
Hmm, so, I see a two problems with this. Firstly your patch currently
doesn't cover SOCK_DGRAM nor SOCK_SEQPACKET AF_UNIX sockets, only
SOCK_STREAM AF_UNIX sockets. This could certainly be fixed, but for
SEQPACKET and STREAM doing a test connect is not passive, but actually
causes the process behing the socket to wake up (it will see an incoming
connection that is immediately terminated). The wake up is probably not
much of a problem in most cases, but in some cases it might: proceses
that terminate when they are idle -- they'll be kept alive through our
repeated checks, even though they should long have died. (and processes
like that are not that uncommon, for example in systemd we ship a couple
of services like that.)

So, I really see the usefulness of your work, and I'd actually prefer
to implement things without parsing /proc/net/unix, but I fear that the
same amount of bugs and problems would be created through those spurious
wakeups of those processes as it fixed by these checks.

(The SOCK_DGRAM part we could actually get working without a wake-up
since connect() on a SOCK_DGRAM socket will not cause the remote side to
wake up, but SOCK_DGRAM is just one kind, and SOCK_STREAM is probably
far more relevant than SOCK_DRGAM and connect() on it does trigger the
wake-up...)

Parsing /proc/net/unix sucks quite a bit I must admit. There's one
improvement that I think we could do here, which is making use of the
more recent netlink based API for detecting local sockets, which the
"ss" tool is now using.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Continue reading on narkive:
Loading...