Discussion:
[Patch 0/2] logind: make sure that closed sessions will be cleaned
(too old to reply)
Djalal Harouni
2014-01-03 13:19:19 UTC
Permalink
On logout pam_systemd should ensures the following:
"If the last concurrent session of a user ends, the $XDG_RUNTIME_DIR
directory and all its contents are removed, too." from manpage.

Using git HEAD, and a simple systemd-nspawn test will show that the
above is not ensured and the sessions will stay!


A simple systemd-nspawn test:

1) login as user X
2) logout
3) login as user Y
4) loginctl (will list session of user X)


In this example we are session c4:

-bash-4.2# loginctl list-sessions
SESSION UID USER SEAT
1 1000 tixxdz seat0
c1 1000 tixxdz seat0
c2 0 root seat0
c3 1000 tixxdz seat0
c4 0 root seat0

-bash-4.2# loginctl show-session --property=State 1 c1 c2 c3 c4
State=closing

State=closing

State=closing

State=closing

State=active


As shown only session c4 is active, all the others are dead sessions.

To close the dead sessions and clean things up, a dbus
TerminateSession()=>session_stop() must be issued...

Please note that I'm running without pam_loginuid.so, due to another
bug related to audit: https://bugzilla.redhat.com/show_bug.cgi?id=966807


Anyway, after some debugging:

It seems that after ReleaseSession() which is called by pam_systemd,
the user,session and seat state files will also still be available.
The garbage-collector will miss them!

In src/login/logind.c:manager_gc() the while loops will never be entered.


The user slice units will start, then the match_job_removed() and co
signals on these units will call session_add_to_gc_queue() and
user_add_to_gc_queue() to push to gc_queue when done, in the mean time
the manager_gc() will consume the gc_queue and remove them

IOW *just* before and after the ReleaseSession() the manager
"{session|user}_gc_queue" queues might be empty, hence session, users
and seats will never be cleaned! the user's slice will still be alive...


To fix this, I'm attaching two patches and I can say that they are
related to each other from the perspective of the described bug, and at
the same time they are independent of each other from a general
perspective!


1) Make method_release_session() call session_add_to_gc_queue()
This will ensure that the released session is in the gc_queue.

This change gives the garbage collector a chance to collect sessions,
and should not affect the logind behaviour and other display managers,
the session_check_gc() is able to detect if the session is still valid.

The thing is that from a quick git log the method_release_session()
never called session_add_to_gc_queue(), so this bug was introduced or
made *visible* by another change... (not sure)


2) As in commit 63966da86d8e, in function session_check_gc() the session
manager will always be around so don't check it in order to
garbage-collect the session.

Thanks!
Djalal Harouni
2014-01-03 13:19:20 UTC
Permalink
Currently on logout, session and user state files might stay and will
not be cleaned up, this is true on systems where dbus TerminateSession()
is not called on logouts.

The manager garbage-collector will miss them due to the session_gc_queue
being empty. A call to dbus TerminateSession() which will call
session_stop() is needed in order to push the session into the
session_gc_queue.

To ensure that sessions will have the chance to be collected, make the
method_release_session() call session_add_to_gc_queue() before
finishing, this gives the manager_gc() a chance to check if the current
session should be collected and if so call session_stop() on it.
---
src/login/logind-dbus.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/login/logind-dbus.c b/src/login/logind-dbus.c
index 08510b5..c18a74a 100644
--- a/src/login/logind-dbus.c
+++ b/src/login/logind-dbus.c
@@ -742,6 +742,7 @@ static int method_release_session(sd_bus *bus, sd_bus_message *message, void *us
session_remove_fifo(session);
session_save(session);
user_save(session->user);
+ session_add_to_gc_queue(session);

return sd_bus_reply_method_return(message, NULL);
}
--
1.8.3.1
Djalal Harouni
2014-01-03 13:19:21 UTC
Permalink
As in commit 63966da86, the session manager will always be around, so
make sure that in function session_check_gc() we don't check it. This
gives the manager a chance to garbage-collect sessions.
---
src/login/logind-session.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/src/login/logind-session.c b/src/login/logind-session.c
index dc4b3e1..a78d4dd 100644
--- a/src/login/logind-session.c
+++ b/src/login/logind-session.c
@@ -935,9 +935,6 @@ bool session_check_gc(Session *s, bool drop_not_started) {
if (s->scope_job && manager_job_is_active(s->manager, s->scope_job))
return true;

- if (s->scope && manager_unit_is_active(s->manager, s->scope))
- return true;
-
return false;
}
--
1.8.3.1
Djalal Harouni
2014-01-08 16:12:23 UTC
Permalink
ping?

(Please if I'm missing something let me know)
Post by Djalal Harouni
"If the last concurrent session of a user ends, the $XDG_RUNTIME_DIR
directory and all its contents are removed, too." from manpage.
Using git HEAD, and a simple systemd-nspawn test will show that the
above is not ensured and the sessions will stay!
1) login as user X
2) logout
3) login as user Y
4) loginctl (will list session of user X)
-bash-4.2# loginctl list-sessions
SESSION UID USER SEAT
1 1000 tixxdz seat0
c1 1000 tixxdz seat0
c2 0 root seat0
c3 1000 tixxdz seat0
c4 0 root seat0
-bash-4.2# loginctl show-session --property=State 1 c1 c2 c3 c4
State=closing
State=closing
State=closing
State=closing
State=active
As shown only session c4 is active, all the others are dead sessions.
To close the dead sessions and clean things up, a dbus
TerminateSession()=>session_stop() must be issued...
Please note that I'm running without pam_loginuid.so, due to another
bug related to audit: https://bugzilla.redhat.com/show_bug.cgi?id=966807
It seems that after ReleaseSession() which is called by pam_systemd,
the user,session and seat state files will also still be available.
The garbage-collector will miss them!
In src/login/logind.c:manager_gc() the while loops will never be entered.
The user slice units will start, then the match_job_removed() and co
signals on these units will call session_add_to_gc_queue() and
user_add_to_gc_queue() to push to gc_queue when done, in the mean time
the manager_gc() will consume the gc_queue and remove them
IOW *just* before and after the ReleaseSession() the manager
"{session|user}_gc_queue" queues might be empty, hence session, users
and seats will never be cleaned! the user's slice will still be alive...
To fix this, I'm attaching two patches and I can say that they are
related to each other from the perspective of the described bug, and at
the same time they are independent of each other from a general
perspective!
1) Make method_release_session() call session_add_to_gc_queue()
This will ensure that the released session is in the gc_queue.
This change gives the garbage collector a chance to collect sessions,
and should not affect the logind behaviour and other display managers,
the session_check_gc() is able to detect if the session is still valid.
The thing is that from a quick git log the method_release_session()
never called session_add_to_gc_queue(), so this bug was introduced or
made *visible* by another change... (not sure)
2) As in commit 63966da86d8e, in function session_check_gc() the session
manager will always be around so don't check it in order to
garbage-collect the session.
Thanks!
_______________________________________________
systemd-devel mailing list
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
Djalal Harouni
http://opendz.org
Zbigniew Jędrzejewski-Szmek
2014-01-08 16:29:05 UTC
Permalink
Post by Djalal Harouni
ping?
(Please if I'm missing something let me know)
No. I think that nobody had time to review this.

Zbyszek
Djalal Harouni
2014-01-09 00:56:26 UTC
Permalink
Post by Zbigniew Jędrzejewski-Szmek
Post by Djalal Harouni
ping?
(Please if I'm missing something let me know)
No. I think that nobody had time to review this.
Zbyszek
Ok, thank you Zbyszek!
--
Djalal Harouni
http://opendz.org
Zbigniew Jędrzejewski-Szmek
2014-01-11 21:26:13 UTC
Permalink
Post by Djalal Harouni
"If the last concurrent session of a user ends, the $XDG_RUNTIME_DIR
directory and all its contents are removed, too." from manpage.
Using git HEAD, and a simple systemd-nspawn test will show that the
above is not ensured and the sessions will stay!
I can't reproduce this (with todays git). In the examples below, I
understand that you're logging in through getty. Can you test with
current git and/or provide a complete recipe to reproduce this?

Zbyszek
Post by Djalal Harouni
1) login as user X
2) logout
3) login as user Y
4) loginctl (will list session of user X)
-bash-4.2# loginctl list-sessions
SESSION UID USER SEAT
1 1000 tixxdz seat0
c1 1000 tixxdz seat0
c2 0 root seat0
c3 1000 tixxdz seat0
c4 0 root seat0
Djalal Harouni
2014-01-12 01:07:32 UTC
Permalink
Post by Zbigniew Jędrzejewski-Szmek
Post by Djalal Harouni
"If the last concurrent session of a user ends, the $XDG_RUNTIME_DIR
directory and all its contents are removed, too." from manpage.
Using git HEAD, and a simple systemd-nspawn test will show that the
above is not ensured and the sessions will stay!
I can't reproduce this (with todays git). In the examples below, I
understand that you're logging in through getty. Can you test with
current git and/or provide a complete recipe to reproduce this?
Yes through getty, and I guess this issue will also be visible for
ssh/remote sessions, or sessions where TerminateSession() is not called.


Ok, testing both a fedora 20 systemd and git!

Using the "Testing in a Container" from:
www.freedesktop.org/wiki/Software/systemd/VirtualizedTesting/

On a fedora 19 system, I installed a Fedora 20:
"yum --releasever=20 --installroot=$HOME/fedora-tree-20 ..."

$ cat /home/tixxdz/fedora-tree-20/etc/redhat-release
Fedora release 20 (Heisenbug)

I set the root password, disable pam_loginuid.so and boot as shown in
the "Testing in a Container" doc

1) login as root
2) logout
3) login as root (again)
-bash-4.2# loginctl
SESSION UID USER SEAT
1 0 root seat0
c1 0 root seat0

2 sessions listed.

As you can see the closed sessions are not cleaned. So I confirm the
bug from testing a Fedora 20 container.



Now from git 8042e377b8bb8f:!

I just do:
$ ./autogen.sh && ./configure --disable-kdbus --libdir=/usr/lib64
$ make -j8
# DESTDIR=/home/tixxdz/fedora-tree make install
# systemd-nspawn -bD /home/tixxdz/fedora-tree/ 3

Please note that: /usr/lib64 is needed since it will not be set
automatically. It will be set by distributions and packagers of systemd

So I make sure that pam_systemd modules are in /usr/lib64/security
instead of /usr/lib/security, since only the ones that are in
/usr/lib64/security are loaded.

Hmm a guess, perhaps the old pam_systemd that got installed by the
systemd rpm package is being loaded instead of the new compiled one!

*Coupled* with graphical display managers who perhaps kill this bug by
forcing a TerminateSession() vs getty logins!

So using today git 8042e377b8 I also confirm the same bug.


Please Zbyszek let me know if the pervious "Testing in a Container"
using a Fedora 20 (a new systemd version), shows something for you?

or perhaps check the timestamp of your *assumed right*
/usr/lib64/security/pam_systemd.so after "make install" ?


Thanks Zbyszek!
--
Djalal Harouni
http://opendz.org
Zbigniew Jędrzejewski-Szmek
2014-01-16 05:01:58 UTC
Permalink
Post by Djalal Harouni
Post by Zbigniew Jędrzejewski-Szmek
Post by Djalal Harouni
"If the last concurrent session of a user ends, the $XDG_RUNTIME_DIR
directory and all its contents are removed, too." from manpage.
Using git HEAD, and a simple systemd-nspawn test will show that the
above is not ensured and the sessions will stay!
I can't reproduce this (with todays git). In the examples below, I
understand that you're logging in through getty. Can you test with
current git and/or provide a complete recipe to reproduce this?
Yes through getty, and I guess this issue will also be visible for
ssh/remote sessions, or sessions where TerminateSession() is not called.
Thank you for the recipe. This helps.

Indeed, in a container (without your patches), sessions remain in
"closing" state. But with your patches, systemd --user instance is
started and killed immediately during login. Not good either :)
With just the first patch, session still remain as "closing".

Also, there seems to be a regression with Fedora installs with yum:
I installed a fresh one, and there was no /var/run -> /run symlink,
the first boot was mostly broken.
-> https://bugzilla.redhat.com/show_bug.cgi?id=1053983

Zbyszek
Djalal Harouni
2014-01-16 10:19:08 UTC
Permalink
Post by Zbigniew Jędrzejewski-Szmek
Post by Djalal Harouni
Post by Zbigniew Jędrzejewski-Szmek
Post by Djalal Harouni
"If the last concurrent session of a user ends, the $XDG_RUNTIME_DIR
directory and all its contents are removed, too." from manpage.
Using git HEAD, and a simple systemd-nspawn test will show that the
above is not ensured and the sessions will stay!
I can't reproduce this (with todays git). In the examples below, I
understand that you're logging in through getty. Can you test with
current git and/or provide a complete recipe to reproduce this?
Yes through getty, and I guess this issue will also be visible for
ssh/remote sessions, or sessions where TerminateSession() is not called.
Thank you for the recipe. This helps.
Indeed, in a container (without your patches), sessions remain in
"closing" state. But with your patches, systemd --user instance is
started and killed immediately during login. Not good either :)
Ok, ouch!
Post by Zbigniew Jędrzejewski-Szmek
With just the first patch, session still remain as "closing".
Ok, thanks for the input, I'll work on it soon
Post by Zbigniew Jędrzejewski-Szmek
I installed a fresh one, and there was no /var/run -> /run symlink,
the first boot was mostly broken.
-> https://bugzilla.redhat.com/show_bug.cgi?id=1053983
Oh yes, I forgot about that, yes I fixed it in my install!

Indeed, the bug is in yum, totally forget to report it, sorry :-/
Post by Zbigniew Jędrzejewski-Szmek
Zbyszek
Thanks!
--
Djalal Harouni
http://opendz.org
Djalal Harouni
2014-01-22 09:24:24 UTC
Permalink
Post by Djalal Harouni
Post by Zbigniew Jędrzejewski-Szmek
Indeed, in a container (without your patches), sessions remain in
"closing" state. But with your patches, systemd --user instance is
started and killed immediately during login. Not good either :)
Ok, ouch!
It seems that the systemd --user instance is killed without my patches!

Systemd git will confirm it.

Ok, after some thoughts and source code analysis, I came to the
conclusion that in order to correctly clean session and user state
files, we need:

a) Improve session_check_gc() and user_check_gc() by using the session
and user states. This implies b)

b) Make sure that the state of the session and the user is always
correctly set.


I'll discuss some notes about a) and follow with patches that will try
to honore b).

I'm sending this to have some feedback before spending more time on it,
ahh yes debugging this beast is hard!

Please note that I did not take a look at seats logic, only sessions and
users using getty and ssh logins.

Please feel free to correct me!


a) Improve session_check_gc() and user_check_gc()

* user_check_gc() has an "if (u->sessions)" check! this should be
updated to check if there are sessions that are not in the "closing"
state.

If *all* the sessions are in the "closing" state then check each
session to see if there are still running processes in the background:
screen,...
If so then abort the user_check_gc() or depend on a per-session
function that will perhaps do the "kill-session-processes".

* session_check_gc() should check if the session is in the "closing"
state and if there are still running processes in the background, then
decide to abort or force the "kill"session-processes"

But it should not do the following check:
"if (s->scope && manager_unit_is_active(s->manager, s->scope))"

Since the scope will always be around, the same was applied in the
commit 63966da86 for user_check_gc(), do not check the user manager!



b) Make sure that the state of the session and the user is always
correctly set. Not to mention that some logind clients depend on it.

So to make a) we must honor that b) is stable, and after an extensive
source code analysis, it seems that the state of users and sessions is
not always correct. I'll list some notes/bugs here:

* It seems that in systemd v204 after a logout from a session where
processes are still running like "screen" the state files will show:

systemd 204
session state file: user state file:
Active = 1 State = active
State = closing ACTIVE_SESSIONS = X

Where for systemd git:
session state file: user state file:
Active = 0 State = closing
State = closing ACTIVE_SESSIONS =

Not to mention that for systemd git, the session and user state files
will show the same even if there are no background processes and they
will not be cleaned (using getty or ssh logins) which is is the object
of this thread!

This is a bug, not fixed yet.


* To track the state of sessions the fifo_fd should be used since the
event manager will call session_dispatch_fifo() to remove it on eof.

Using the session_is_active() *before* the fifo_fd check is not stable
since it will return true before the fifo_fd was even created! and
also before the session scope and user serivce are created, the
session->seat->active will be set earlier before the fifo_fd creation
and will be the last one to be unset after fifo_fd removal.

And currently this affects src/login/logind-user.c:user_get_state()
logic since it checks session_is_active() before calling
session_get_state() which performs the fifo_fd check before the
session_is_active()

So user_get_state() and session_get_state() results might differ!

This is a bug, not fixed yet, user_get_state() should count on
session_get_state().


* It seems there are other problems with the session and user states,
different variables are used to check the state and later these
variables are updated at different places!

The following patches will describe the problem and try to make the
state more stable.

The design was do not add extra flags to 'User' and 'Session' structs
unless this is the only way. The logic is already there, and it needs
a littel synchronization.


Thanks!
--
Djalal Harouni
http://opendz.org
Djalal Harouni
2014-01-22 09:25:51 UTC
Permalink
Currently the user and session states are not stable:

1) session state:
To get the session state the function session_get_state() is used.

Opening state:
At login the D-Bus CreateSession() method will call session_start()
which calls user_start() and session_start_scope() to queue the scope
job and set the session->scope_job

=> session_get_state() == SESSION_OPENING (correct)

Then execution will continue from session_send_create_reply() which
is called by the D-Bus match_job_removed() signal. math_job_removed()
will check if this is the session scope and if this is the previously
queued scope job and if so it will clear the session->scope_job

=> session_get_state() == SESSION_CLOSING (incorrect)
(session closing since fifo_fd == -1)

Then scope is created successfully, call session_send_create_reply()
which will wait for the session scope *and* the user service to be
successfully created.

/* user service is still pending */
if (s->scope_job || s->user->service_job))
return 0

=> session_get_state() == SESSION_CLOSING (incorrect)

else
session_create_fifo() /* fifo_fd finally created */

=> session_get_state() == SESSION_ACTIVE (correct)

2) user state:
To get the user state the function user_get_state() is used.

I'll add that the user_get_state() and session_get_state() do not
have the same logic when it comes to sessions, this will just add
ambiguity. user_get_state() calls session_is_active() before checking
session_get_state(), and session_is_active() will return true right
from the start since the logic is set during D-Bus CreateSession().

Opening state:
At login we have session_start() which calls user_start()

here we already:

=> user_get_state() == USER_ACTIVE (incorrect)
(not fixed in this patch)

At user_start()
user_start_slice() queue the slice and set user->slice_job
user_start_service() queue the service and set user->service_job

=> user_get_state() == USER_OPENING (correct)

Then execution will continue from session_send_create_reply() which
is called by the D-Bus match_job_removed() signal. math_job_removed()
will check if these are the user service and slice and if they are
the previously queued jobs, if so it will clear the:
user->service_job and user->slice_job

=> user_get_state() == USER_ACTIVE (incorrect)
(incorrect since the fifo_fd has not been created)

Then service is created successfully, call session_send_create_reply()
which will wait for the session scope *and* the user service to be
successfully created. If so then session_send_create_reply()
will continue and call session_create_fifo()

=> user_get_state() == USER_ACTIVE (correct)
(fifo_fd was created)

Fix:
To fix the session state and remove the two incorrect SESSION_CLOSING,
we do not clear up the "session->scope_job" when we detect that this is
the session scope, we setup a temporary variable "scope_job" that will
get the current value of "session->scope_job" and update it if
necessary.

Add a new "opening" variable to check if the session scope and user
service were successfully created.

Update the session_send_create_reply() function to receive the "opening"
variable as a third bool parameter and adapt its logic.

Then clear the "session->scope_job" when session_send_create_reply() is
finished, this ensures that the state will just go from:
"SESSION_OPENING" => "SESSION_ACTIVE"

The same goes for the user state, in order to remove the incorrect state
when the fifo_fd is not created but the state shows USER_ACTIVE, we do
not clear the "user->service_job" and "user->slice_job" right away, we
store the state in a temporary variable "service_job" and update it
later if we detect that this is the user service.

Add a new "opening" variable to check if the session scope and user
service were successfully created and pass it to
session_send_create_reply().
---
src/login/logind-dbus.c | 56 ++++++++++++++++++++++++++++-------------
src/login/logind-session-dbus.c | 8 +++---
src/login/logind-session.h | 2 +-
3 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/src/login/logind-dbus.c b/src/login/logind-dbus.c
index 2c86b9f..0123a34 100644
--- a/src/login/logind-dbus.c
+++ b/src/login/logind-dbus.c
@@ -1937,54 +1937,76 @@ int match_job_removed(sd_bus *bus, sd_bus_message *message, void *userdata, sd_b

session = hashmap_get(m->session_units, unit);
if (session) {
+ bool opening, scope_job = !!session->scope_job;

- if (streq_ptr(path, session->scope_job)) {
- free(session->scope_job);
- session->scope_job = NULL;
- }
+ /* Set to false if the session scope was created */
+ if (streq_ptr(path, session->scope_job))
+ scope_job = false;
+
+ /* If the session scope and the user service are still
+ * in the creation process this will be set to true,
+ * otherwise it will be false */
+ opening = scope_job || !!session->user->service_job;

if (session->started) {
if (streq(result, "done"))
- session_send_create_reply(session, NULL);
+ session_send_create_reply(session, NULL, opening);
else {
_cleanup_bus_error_free_ sd_bus_error e = SD_BUS_ERROR_NULL;

sd_bus_error_setf(&e, BUS_ERROR_JOB_FAILED, "Start job for unit %s failed with '%s'", unit, result);
- session_send_create_reply(session, &e);
+ session_send_create_reply(session, &e, opening);
}
} else
session_save(session);

+ if (!scope_job) {
+ /* Clean this up after session_send_create_reply() */
+ free(session->scope_job);
+ session->scope_job = NULL;
+ }
+
session_add_to_gc_queue(session);
}

user = hashmap_get(m->user_units, unit);
if (user) {
+ bool opening, service_job = !!user->service_job;

- if (streq_ptr(path, user->service_job)) {
- free(user->service_job);
- user->service_job = NULL;
- }
-
- if (streq_ptr(path, user->slice_job)) {
- free(user->slice_job);
- user->slice_job = NULL;
- }
+ /* Set to false if the user service was created */
+ if (streq_ptr(path, user->service_job))
+ service_job = false;

LIST_FOREACH(sessions_by_user, session, user->sessions) {
if (!session->started)
continue;

+ /* If the session scope and the user service are
+ * still in the creation process this will be set
+ * to true, otherwise it will be false */
+ opening = service_job || !!session->scope_job;
+
if (streq(result, "done"))
- session_send_create_reply(session, NULL);
+ session_send_create_reply(session, NULL, opening);
else {
_cleanup_bus_error_free_ sd_bus_error e = SD_BUS_ERROR_NULL;

sd_bus_error_setf(&e, BUS_ERROR_JOB_FAILED, "Start job for unit %s failed with '%s'", unit, result);
- session_send_create_reply(session, &e);
+ session_send_create_reply(session, &e, opening);
}
}

+ if (!service_job) {
+ /* Clean this up after session_send_create_reply() */
+ free(user->service_job);
+ user->service_job = NULL;
+ }
+
+ if (streq_ptr(path, user->slice_job)) {
+ free(user->slice_job);
+ user->slice_job = NULL;
+ }
+
user_save(user);
user_add_to_gc_queue(user);
}
diff --git a/src/login/logind-session-dbus.c b/src/login/logind-session-dbus.c
index 54ad827..72bef4c 100644
--- a/src/login/logind-session-dbus.c
+++ b/src/login/logind-session-dbus.c
@@ -640,7 +640,7 @@ int session_send_lock_all(Manager *m, bool lock) {
return r;
}

-int session_send_create_reply(Session *s, sd_bus_error *error) {
+int session_send_create_reply(Session *s, sd_bus_error *error, bool opening) {
_cleanup_bus_message_unref_ sd_bus_message *c = NULL;
_cleanup_close_ int fifo_fd = -1;
_cleanup_free_ char *p = NULL;
@@ -649,12 +649,12 @@ int session_send_create_reply(Session *s, sd_bus_error *error) {

/* This is called after the session scope and the user service
* were successfully created, and finishes where
- * bus_manager_create_session() left off. */
+ * method_create_session() left off. */

if (!s->create_message)
return 0;

- if (!sd_bus_error_is_set(error) && (s->scope_job || s->user->service_job))
+ if (!sd_bus_error_is_set(error) && opening)
return 0;

c = s->create_message;
@@ -663,6 +663,8 @@ int session_send_create_reply(Session *s, sd_bus_error *error) {
if (error)
return sd_bus_reply_method_error(c, error);

+ /* At this stage the session scope and the user service
+ * were successfully created */
fifo_fd = session_create_fifo(s);
if (fifo_fd < 0)
return fifo_fd;
diff --git a/src/login/logind-session.h b/src/login/logind-session.h
index d724e20..0810def 100644
--- a/src/login/logind-session.h
+++ b/src/login/logind-session.h
@@ -150,7 +150,7 @@ int session_send_changed(Session *s, const char *properties, ...) _sentinel_;
int session_send_lock(Session *s, bool lock);
int session_send_lock_all(Manager *m, bool lock);

-int session_send_create_reply(Session *s, sd_bus_error *error);
+int session_send_create_reply(Session *s, sd_bus_error *error, bool opening);

const char* session_state_to_string(SessionState t) _const_;
SessionState session_state_from_string(const char *s) _pure_;
--
1.8.3.1
Djalal Harouni
2014-01-22 09:25:52 UTC
Permalink
To get the state of the session, the session_get_state() is used.
This function will check if the session->scope_job is set then it will
automatically return SESSION_OPENING. This is buggy in the context of
session closing:

At logout or D-Bus TerminateSession() fifo_fd is removed:

=> session_get_state() == SESSION_CLOSING (correct)

Then we have session_stop() which calls session_stop_scope(), this
will queue the scope job to be removed and set the session->scope_job

=> session_get_state() == SESSION_OPENING (incorrect)

After the scope job is removed the state will be again correct

=> session_get_state() == SESSION_CLOSING (correct)

To fix this and make sure that the state will always be SESSION_CLOSING
we add a flag that is used to differentiate between SESSION_OPENING and
SESSION_CLOSING when the 'session->scope_job' is set.

The 'scope_opening' flag will be set to true only during real
session opening in session_start_scope() which means that during session
closing it will be false.

And update session_get_state() to check if the 'scope_opening' is true
before returning the SESSION_OPENING, if it is not then SESSION_CLOSING
will be returned which is the correct behaviour.
---
src/login/logind-session-dbus.c | 3 +++
src/login/logind-session.c | 4 +++-
src/login/logind-session.h | 1 +
3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/login/logind-session-dbus.c b/src/login/logind-session-dbus.c
index 72bef4c..51832d6 100644
--- a/src/login/logind-session-dbus.c
+++ b/src/login/logind-session-dbus.c
@@ -669,6 +669,9 @@ int session_send_create_reply(Session *s, sd_bus_error *error, bool opening) {
if (fifo_fd < 0)
return fifo_fd;

+ /* Clean this up as soon as we have the fifo_fd */
+ s->scope_opening = false;
+
/* Update the session state file before we notify the client
* about the result. */
session_save(s);
diff --git a/src/login/logind-session.c b/src/login/logind-session.c
index ff0a7a4..ba32146 100644
--- a/src/login/logind-session.c
+++ b/src/login/logind-session.c
@@ -491,6 +491,8 @@ static int session_start_scope(Session *s) {

free(s->scope_job);
s->scope_job = job;
+ /* session scope is being created */
+ s->scope_opening = true;
}
}

@@ -875,7 +877,7 @@ void session_add_to_gc_queue(Session *s) {
SessionState session_get_state(Session *s) {
assert(s);

- if (s->scope_job)
+ if (s->scope_job && s->scope_opening)
return SESSION_OPENING;

if (s->fifo_fd < 0)
diff --git a/src/login/logind-session.h b/src/login/logind-session.h
index 0810def..8551cee 100644
--- a/src/login/logind-session.h
+++ b/src/login/logind-session.h
@@ -108,6 +108,7 @@ struct Session {

bool in_gc_queue:1;
bool started:1;
+ bool scope_opening:1; /* session scope is being created */

sd_bus_message *create_message;
--
1.8.3.1
Djalal Harouni
2014-01-22 09:25:53 UTC
Permalink
To get the state of the user, the user_get_state() is used.
This function will check if the user->slice_job or the user->service_job
are set then it will automatically return USER_OPENING. This is buggy
in the context of user closing:

At logout or D-Bus TerminateUser() calls user_stop()
user_stop()
=> session_stop() on sessions
=> session_stop_scope() on sessions

=> user_get_state() == USER_CLOSING or USER_ACTIVE or USER_ONLINE
(incorrect)

(depends on session closing execution and if we have finished the
session scope jobs or not, if all the scopes are being queued
then their session->scope_job will be set which means all
sessions are in the OPENING_STATE, so user state will be
USER_ONLINE).

The previous patch improves session_get_state() logic, and if
scopes are being queued then sessions are in SESSION_CLOSING state
which makes user_get_state() return USER_CLOSING.

At user_stop()
user_stop_slice() queues the slice and sets user->slice_job
user_stop_service() queue the service and sets user->service_job

=> user_get_state() == USER_OPENING (incorrect)

After the slice and service jobs are removed the state will be again
correct.

=> user_get_state() == USER_CLOSING (correct)

To fix this and make sure that the state will always be USER_CLOSING
we add a flag that is used to differentiate between USER_OPENING and
USER_CLOSING when 'slice_job' and 'service_job' are set.

The 'slice_opening' flag for 'user->slice_job' and 'service_opening'
for 'user->service_job' are set to true only during real user opening.
During user closing they are already in the false state.

Update user_get_state() to check if they are set if so this is
USER_OPENING, otherwise we are in the USER_CLOSING.
---
src/login/logind-dbus.c | 2 ++
src/login/logind-user.c | 5 ++++-
src/login/logind-user.h | 2 ++
3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/login/logind-dbus.c b/src/login/logind-dbus.c
index 0123a34..a7452b1 100644
--- a/src/login/logind-dbus.c
+++ b/src/login/logind-dbus.c
@@ -2000,11 +2000,13 @@ int match_job_removed(sd_bus *bus, sd_bus_message *message, void *userdata, sd_b
/* Clean this up after session_send_create_reply() */
free(user->service_job);
user->service_job = NULL;
+ user->service_opening = false;
}

if (streq_ptr(path, user->slice_job)) {
free(user->slice_job);
user->slice_job = NULL;
+ user->slice_opening = false;
}

user_save(user);
diff --git a/src/login/logind-user.c b/src/login/logind-user.c
index 0e46560..200763d 100644
--- a/src/login/logind-user.c
+++ b/src/login/logind-user.c
@@ -352,6 +352,7 @@ static int user_start_slice(User *u) {

free(u->slice_job);
u->slice_job = job;
+ u->slice_opening = true;
}
}

@@ -385,6 +386,7 @@ static int user_start_service(User *u) {

free(u->service_job);
u->service_job = job;
+ u->service_opening = true;
}
}

@@ -637,7 +639,8 @@ UserState user_get_state(User *u) {

assert(u);

- if (u->slice_job || u->service_job)
+ if ((u->slice_job && u->slice_opening) ||
+ (u->service_job && u->service_opening))
return USER_OPENING;

LIST_FOREACH(sessions_by_user, i, u->sessions) {
diff --git a/src/login/logind-user.h b/src/login/logind-user.h
index 0062880..ac43361 100644
--- a/src/login/logind-user.h
+++ b/src/login/logind-user.h
@@ -61,6 +61,8 @@ struct User {

bool in_gc_queue:1;
bool started:1;
+ bool service_opening:1; /* User service is being created */
+ bool slice_opening:1; /* User slice is being created */

LIST_HEAD(Session, sessions);
LIST_FIELDS(User, gc_queue);
--
1.8.3.1
Colin Guthrie
2014-01-22 09:51:05 UTC
Permalink
Post by Djalal Harouni
Post by Djalal Harouni
Post by Zbigniew Jędrzejewski-Szmek
Indeed, in a container (without your patches), sessions remain in
"closing" state. But with your patches, systemd --user instance is
started and killed immediately during login. Not good either :)
Ok, ouch!
It seems that the systemd --user instance is killed without my patches!
Systemd git will confirm it.
Ok, after some thoughts and source code analysis, I came to the
conclusion that in order to correctly clean session and user state
a) Improve session_check_gc() and user_check_gc() by using the session
and user states. This implies b)
b) Make sure that the state of the session and the user is always
correctly set.
FWIW, I have some backported patches in Mageia that are not (yet) in the
fedora package. As you mentioned Fedora earlier I figure that's what you
are using:

Namely a partial backport of cc377381

Patch is here:

http://svnweb.mageia.org/packages/cauldron/systemd/current/SOURCES/0511-logind-Partial-backport-of-cc377381.patch?revision=566087&view=markup&pathrev=566087

It might be useful to apply to Fedora too. EDIT: Perhaps not, see below.
Post by Djalal Harouni
I'll discuss some notes about a) and follow with patches that will try
to honore b).
I'm sending this to have some feedback before spending more time on it,
ahh yes debugging this beast is hard!
Please note that I did not take a look at seats logic, only sessions and
users using getty and ssh logins.
Please feel free to correct me!
a) Improve session_check_gc() and user_check_gc()
* user_check_gc() has an "if (u->sessions)" check! this should be
updated to check if there are sessions that are not in the "closing"
state.
If *all* the sessions are in the "closing" state then check each
screen,...
If the session is closing, it has to, by definition, still have
processes. If not, then the session wouldn't be closing, but actually
fully closed and disappeared...

That said, it's perhaps this check here that actually *prevents* this
final state from being reached due to GC not being run! So this could
fix the chicken+egg problem (I don't know the code, so can only really
talk in generalities!)
Post by Djalal Harouni
If so then abort the user_check_gc() or depend on a per-session
function that will perhaps do the "kill-session-processes".
* session_check_gc() should check if the session is in the "closing"
state and if there are still running processes in the background, then
decide to abort or force the "kill"session-processes"
"if (s->scope && manager_unit_is_active(s->manager, s->scope))"
Since the scope will always be around, the same was applied in the
commit 63966da86 for user_check_gc(), do not check the user manager!
Interesting. It seems that 63966da86 suggests my patch above shouldn't
be needed and in older code, the old "if (u->slice_job ||
u->service_job)" check should just be dropped.
Post by Djalal Harouni
b) Make sure that the state of the session and the user is always
correctly set. Not to mention that some logind clients depend on it.
So to make a) we must honor that b) is stable, and after an extensive
source code analysis, it seems that the state of users and sessions is
* It seems that in systemd v204 after a logout from a session where
systemd 204
Active = 1 State = active
State = closing ACTIVE_SESSIONS = X
Active = 0 State = closing
State = closing ACTIVE_SESSIONS =
Not to mention that for systemd git, the session and user state files
will show the same even if there are no background processes and they
will not be cleaned (using getty or ssh logins) which is is the object
of this thread!
This is a bug, not fixed yet.
So basically the problem is that the gc is simply not triggered often
enough to tidy up these old "closing" sessions with no processes left?
Post by Djalal Harouni
* To track the state of sessions the fifo_fd should be used since the
event manager will call session_dispatch_fifo() to remove it on eof.
Using the session_is_active() *before* the fifo_fd check is not stable
since it will return true before the fifo_fd was even created! and
also before the session scope and user serivce are created, the
session->seat->active will be set earlier before the fifo_fd creation
and will be the last one to be unset after fifo_fd removal.
And currently this affects src/login/logind-user.c:user_get_state()
logic since it checks session_is_active() before calling
session_get_state() which performs the fifo_fd check before the
session_is_active()
So user_get_state() and session_get_state() results might differ!
This is a bug, not fixed yet, user_get_state() should count on
session_get_state().
* It seems there are other problems with the session and user states,
different variables are used to check the state and later these
variables are updated at different places!
The following patches will describe the problem and try to make the
state more stable.
The design was do not add extra flags to 'User' and 'Session' structs
unless this is the only way. The logic is already there, and it needs
a littel synchronization.
I'll let someone else comment on these last points. They seem sensible,
but I'm not familiar with the code here so can't really make educated
comments without a bit more research and am sadly a bit too busy right now.

Cheers!

Col
--
Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
Tribalogic Limited http://www.tribalogic.net/
Open Source:
Mageia Contributor http://www.mageia.org/
PulseAudio Hacker http://www.pulseaudio.org/
Trac Hacker http://trac.edgewall.org/
Djalal Harouni
2014-01-23 16:44:55 UTC
Permalink
Post by Colin Guthrie
Post by Djalal Harouni
Post by Djalal Harouni
Post by Zbigniew Jędrzejewski-Szmek
Indeed, in a container (without your patches), sessions remain in
"closing" state. But with your patches, systemd --user instance is
started and killed immediately during login. Not good either :)
Ok, ouch!
It seems that the systemd --user instance is killed without my patches!
Systemd git will confirm it.
Ok, after some thoughts and source code analysis, I came to the
conclusion that in order to correctly clean session and user state
a) Improve session_check_gc() and user_check_gc() by using the session
and user states. This implies b)
b) Make sure that the state of the session and the user is always
correctly set.
FWIW, I have some backported patches in Mageia that are not (yet) in the
fedora package. As you mentioned Fedora earlier I figure that's what you
Yes I'm doing tests with Fedora 20 systemd and systemd from git.
Post by Colin Guthrie
Namely a partial backport of cc377381
http://svnweb.mageia.org/packages/cauldron/systemd/current/SOURCES/0511-logind-Partial-backport-of-cc377381.patch?revision=566087&view=markup&pathrev=566087
It might be useful to apply to Fedora too. EDIT: Perhaps not, see below.
Ok, yes that patch is needed in case the gc is being run while the
service and slice jobs are in the starting or stoping process:

do not garbage-collect while we are in the middel of:

user_start() => user_start_slice() && user_start_service()

and

user_stop() => user_stop_slice() && user_stop_service()

Before and after the gc should run, this for sure prevents races!

Note: after the jobs are finished they will put the user in the gc queue
which will trigger the gc again.

Ok (currently I'm only reporting to systemd upstream).



Ohh I just remembered that Lennart added some logic to wait for the
service to finish startup before notifying the client that he can make
logins, this is perhaps not related but it should interest you:

commit dd9b67aa3e94

http://cgit.freedesktop.org/systemd/systemd/commit/?id=dd9b67aa3e9476b3a4b3e231006eea6d108c841f
Post by Colin Guthrie
Post by Djalal Harouni
I'll discuss some notes about a) and follow with patches that will try
to honore b).
I'm sending this to have some feedback before spending more time on it,
ahh yes debugging this beast is hard!
Please note that I did not take a look at seats logic, only sessions and
users using getty and ssh logins.
Please feel free to correct me!
a) Improve session_check_gc() and user_check_gc()
* user_check_gc() has an "if (u->sessions)" check! this should be
updated to check if there are sessions that are not in the "closing"
state.
If *all* the sessions are in the "closing" state then check each
screen,...
If the session is closing, it has to, by definition, still have
processes. If not, then the session wouldn't be closing, but actually
Of course.
Post by Colin Guthrie
fully closed and disappeared...
Yes that's precisely where the bug is, currently fully closed sessions
are not removed they will not disappear, all processes have been closed
but sessions are still there.

Testing a systemd git will show that getty or ssh sessions will persist
after logouts unless you issue a D-Bus TerminateSession() or
TerminateUser() which is probably the case of display managers

or you will have to manuall do:$ loginctl terminate-{session|user} ...


I'm trying to trigger the gc in a sane manner in pam_systemd, but first
the patches I sent will close various session and user states races.

And yeh I got the concept of how to deal with sessions that still have
processes running in a background. Perhaps add the logind.conf
"KillUserProcesses" check in the gc, if not set then we prevent the gc
from collecting sessions.

Care must be taken to honor "KillUserProcesses" definition of the
logind.conf man page.
Post by Colin Guthrie
That said, it's perhaps this check here that actually *prevents* this
final state from being reached due to GC not being run! So this could
fix the chicken+egg problem (I don't know the code, so can only really
talk in generalities!)
Yes, you are speaking about the "if (u->sessions)" in the
user_check_gc(), yes sure it prevents some cases, just think about
sessions that are in the "closing" state without any process in the
background they should be put in the gc, but this check will prevent
this *and* this is also related to sessions that are not put in the gc
cause their scope will always be running and will always be around:

src/login/logind-session.c:session_check_gc()

if (s->scope && manager_unit_is_active(s->manager, s->scope))


So in order to remove this check we need to make the session state more
stable, fix what I've described in my patches and count on it to perform
the checks in the session_check_gc() function.

And we can't remove the "if (u->sessions)" of the user_check_gc(), it
can be updated to check if all the sessions are in the closing states.

My conclusion is that we need first to make the state of sessions/users
more stable, I've sent patches to try to achieve that.
Post by Colin Guthrie
Post by Djalal Harouni
If so then abort the user_check_gc() or depend on a per-session
function that will perhaps do the "kill-session-processes".
* session_check_gc() should check if the session is in the "closing"
state and if there are still running processes in the background, then
decide to abort or force the "kill"session-processes"
"if (s->scope && manager_unit_is_active(s->manager, s->scope))"
Since the scope will always be around, the same was applied in the
commit 63966da86 for user_check_gc(), do not check the user manager!
Interesting. It seems that 63966da86 suggests my patch above shouldn't
be needed and in older code, the old "if (u->slice_job ||
u->service_job)" check should just be dropped.
Hmm I think your patch is needed here since it will check only during
*creation* and *closing* of the slice and service to prevent races during
that time (create service and slice and close slice and service) with the gc!

After the creation and users logs in, that check in your patch will always
be false, it will be a nop, since service_job will be NULL.

Please check:
http://cgit.freedesktop.org/systemd/systemd/tree/src/login/logind-dbus.c#n1966

service_job and slice_job are set to NULL after the jobs have finished.
Post by Colin Guthrie
Post by Djalal Harouni
b) Make sure that the state of the session and the user is always
correctly set. Not to mention that some logind clients depend on it.
So to make a) we must honor that b) is stable, and after an extensive
source code analysis, it seems that the state of users and sessions is
* It seems that in systemd v204 after a logout from a session where
systemd 204
Active = 1 State = active
State = closing ACTIVE_SESSIONS = X
Active = 0 State = closing
State = closing ACTIVE_SESSIONS =
Not to mention that for systemd git, the session and user state files
will show the same even if there are no background processes and they
will not be cleaned (using getty or ssh logins) which is is the object
of this thread!
This is a bug, not fixed yet.
So basically the problem is that the gc is simply not triggered often
enough to tidy up these old "closing" sessions with no processes left?
Yes!

And if I try to trigger the gc, I always hit situations where sessions
are still active, or during startups which makes logind remove their
state files!

So I started by:
1) Fix session and user state
2) Improve the gc checks to use session and user states
3) Try to push sessions and users in the gc so it can be triggered!
Post by Colin Guthrie
Post by Djalal Harouni
* To track the state of sessions the fifo_fd should be used since the
event manager will call session_dispatch_fifo() to remove it on eof.
Using the session_is_active() *before* the fifo_fd check is not stable
since it will return true before the fifo_fd was even created! and
also before the session scope and user serivce are created, the
session->seat->active will be set earlier before the fifo_fd creation
and will be the last one to be unset after fifo_fd removal.
And currently this affects src/login/logind-user.c:user_get_state()
logic since it checks session_is_active() before calling
session_get_state() which performs the fifo_fd check before the
session_is_active()
So user_get_state() and session_get_state() results might differ!
This is a bug, not fixed yet, user_get_state() should count on
session_get_state().
* It seems there are other problems with the session and user states,
different variables are used to check the state and later these
variables are updated at different places!
The following patches will describe the problem and try to make the
state more stable.
The design was do not add extra flags to 'User' and 'Session' structs
unless this is the only way. The logic is already there, and it needs
a littel synchronization.
I'll let someone else comment on these last points. They seem sensible,
but I'm not familiar with the code here so can't really make educated
comments without a bit more research and am sadly a bit too busy right now.
Yes it took me time to debug this, hope Lennart will have some time and
check it!
Post by Colin Guthrie
Cheers!
Col
Thanks Colin!
--
Djalal Harouni
http://opendz.org
Djalal Harouni
2014-01-30 10:28:42 UTC
Permalink
Hi list,
Post by Djalal Harouni
Post by Djalal Harouni
Post by Zbigniew Jędrzejewski-Szmek
Indeed, in a container (without your patches), sessions remain in
"closing" state. But with your patches, systemd --user instance is
started and killed immediately during login. Not good either :)
Ok, ouch!
It seems that the systemd --user instance is killed without my patches!
Systemd git will confirm it.
Ok, after some thoughts and source code analysis, I came to the
conclusion that in order to correctly clean session and user state
a) Improve session_check_gc() and user_check_gc() by using the session
and user states. This implies b)
b) Make sure that the state of the session and the user is always
correctly set.
Just to let you know that I've a v2 series, a more complete one.

I'll clean it and send it, thanks!
--
Djalal Harouni
http://opendz.org
Continue reading on narkive:
Loading...