Discussion:
Default on failure dependencies
Baudouin Feildel
2018-09-15 20:32:45 UTC
Permalink
Hello there,

A few weeks ago I opened the following issue in the systemd repository:
https://github.com/systemd/systemd/issues/9373. Seeing no traction from
the existing systemd developers, I decided to give it a try and started
working on the feature I wanted in the following fork:
https://github.com/AMDG2/systemd/tree/wip-feature-default-on-failure.

I have the following problems/questions:

- I don't know how to properly load the default on-failure dependencies
- I don't know which kind of tests I should write, nor how to write them

About loading the default on-failure dependencies: I am not sure whether
I should reproduce what is done when a unit loads its own
dependencies. Will PID 1 be able to load units while parsing the manager
config file?

I could also just keep the list as it is (a list of strings) and lazily
load the units the first time they are needed, but is it worth it?

What is your suggestion on the best time and way to load the default
on-failure dependency list?
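For reference, the per-unit form of this mechanism already exists as
OnFailure=; what the issue asks for is roughly the same dependency applied
as a manager-wide default. A sketch of the two (the [Manager] option name
below is illustrative only — it is not an existing systemd.conf setting):

```ini
# Existing mechanism: a per-unit failure handler, e.g. as a drop-in
# /etc/systemd/system/foo.service.d/onfailure.conf
[Unit]
OnFailure=failure-handler@%n.service

# What the proposal amounts to: the same dependency applied to every
# unit from the manager configuration. "DefaultOnFailure" is a
# hypothetical name used here for illustration.
[Manager]
DefaultOnFailure=failure-handler@%n.service
```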

About the tests: I didn't find documentation on how to write a new test,
or on which parts of a PR should be tested. Does such documentation exist?

Thank you for your help!

Kind regards
Baudouin FEILDEL
Baudouin Feildel
2018-10-01 12:12:21 UTC
Permalink
On 15 September 2018 22:32, "Baudouin Feildel" <***@feildel.fr> wrote:

> Hello there,
>
> A few weeks ago I opened the following issue in the systemd repository:
> https://github.com/systemd/systemd/issues/9373. Seeing no traction from
> the existing systemd developers, I decided to give it a try and started
> working on the feature I wanted in the following fork:
> https://github.com/AMDG2/systemd/tree/wip-feature-default-on-failure.
>
> I have the following problems/questions:
>
> - I don't know how to properly load the default on-failure dependencies
> - I don't know which kind of tests I should write, nor how to write them
>
> About loading the default on-failure dependencies: I am not sure whether
> I should reproduce what is done when a unit loads its own
> dependencies. Will PID 1 be able to load units while parsing the manager
> config file?
>
> I could also just keep the list as it is (a list of strings) and lazily
> load the units the first time they are needed, but is it worth it?
>
> What is your suggestion on the best time and way to load the default
> on-failure dependency list?
>
> About the tests: I didn't find documentation on how to write a new test,
> or on which parts of a PR should be tested. Does such documentation exist?
>
> Thank you for your help!
>
> Kind regards
> Baudouin FEILDEL


Hello,

Any thoughts on this topic?

Regards
Baudouin
Lennart Poettering
2018-10-05 17:52:53 UTC
Permalink
On Sa, 15.09.18 22:32, Baudouin Feildel (***@feildel.fr) wrote:

(Sorry for not responding more timely, I have been travelling and am
still catching up with all the email)

> Hello there,
>
> Few weeks ago I opened the following issue in systemd repository:
> https://github.com/systemd/systemd/issues/9373. Seeing no traction from
> existing systemd developer,

Hmm, so, I figure we should first have a discussion whether this really is
desirable, because I must say I am not too sure about that.

So far we are very conservative when it comes to options that are
supposed to affect all units at once, as that tends to create various
problems that are not obvious to solve. For example, if every service
gets this kind of dep, what about the units that these deps are
supposed to start, do you create a cyclic dep there?

Moreover, I figure the services pulled in like this are usually going
to be late boot processes, but this means failures during early boot
would result in a large number of queued services that need to be
dispatched during late boot.

Moreover what happens if a service fails multiple times during early
boot (for example because Restart= is used)? What happens with these
failures, are the earlier ones dropped?

Also, what happens for services that fail during shutdown, would these
also pull in new units? But if they do, then this would result in
cyclic operations if the service to run is a regular service,
i.e. needs all basic system stuff up: we are shutting down, but in
order to process everything that happened, we then need to start
services that reverse the shut down process as they require certain
stuff to be up...

In general, there's the "philosophical incompatibility": stuff that
is supposed to process failures in the service dependency logic,
should probably not be part of the service dependency logic itself.

This all makes me wonder whether a different approach to all of this
wouldn't be better: maybe we should just consider this a logging
problem: let's make sure we log a recognizable log message (i.e. a
structured journal message with a well-defined MESSAGE_ID=) whenever a
service fails. With that in place it should be relatively easy to
write a system service that can run during regular system uptime and
can look in the journal for all failures, including getting live
notifications when something happens. Moreover, this resolves the
problems during early and late boot: the "cursor" logic of the journal
allows such a service to know exactly which failures it already
processed and which ones are still left, and it can process all
failures that took place while it was not running.
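As a rough illustration of that cursor logic (this is not systemd code:
the MESSAGE_ID value below is a placeholder for the real constant from
sd-messages.h, and an actual watcher would read entries through the
sd-journal API rather than a Python list):

```python
# Sketch of the journal-based failure watcher described above.
# Entries are modelled as dicts with the fields journalctl -o json would
# show; __CURSOR and UNIT are real journal/PID-1 fields, but the
# MESSAGE_ID value here is a placeholder, not the real constant.

UNIT_FAILED_MESSAGE_ID = "00000000000000000000000000000000"  # placeholder

def failures_since(entries, last_cursor=None):
    """Return (failed_units, new_cursor) for all unit-failure entries
    appearing after last_cursor, mimicking the journal's cursor logic."""
    failures = []
    seen_last = last_cursor is None
    cursor = last_cursor
    for entry in entries:
        if not seen_last:
            # Skip forward until we pass the last entry we processed.
            if entry["__CURSOR"] == last_cursor:
                seen_last = True
            continue
        if entry.get("MESSAGE_ID") == UNIT_FAILED_MESSAGE_ID:
            failures.append(entry["UNIT"])
        cursor = entry["__CURSOR"]
    return failures, cursor
```

On restart the service would persist `new_cursor` and pass it back as
`last_cursor`, so failures that happened while it was down are still
picked up exactly once.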

Does that make sense?

Lennart

--
Lennart Poettering, Red Hat
Jérémy Rosen
2018-10-08 07:58:36 UTC
Permalink
> This all makes me wonder whether a different approach to all of this
> wouldn't be better: maybe we should just consider this a logging
> problem: let's make sure we log a recognizable log message (i.e. a
> structured journal message with a well-defined MESSAGE_ID=) whenever a
> service fails. With that in place it should be relatively easy to
> write a system service that can run during regular system uptime and
> can look in the journal for all failures, including getting live
> notifications when something happens. Moreover, this resolves the
> problems during early and late boot: the "cursor" logic of the journal
> allows such a service to know exactly which failures it already
> processed and which ones are still left, and it can process all
> failures that took place while it was not running.
>
> Does that make sense?

Could this be generalized to "a structured message whenever a unit
changes state", or would that be too verbose?

I'm asking because that would be very useful for post-mortem
diagnostics, startup timings and that sort of thing...
> Lennart
>

Lennart Poettering
2018-10-08 08:08:37 UTC
Permalink
On Mo, 08.10.18 09:58, Jérémy Rosen (***@smile.fr) wrote:

>
> > This all makes me wonder whether a different approach to all of this
> > wouldn't be better: maybe we should just consider this a logging
> > problem: let's make sure we log a recognizable log message (i.e. a
> > structured journal message with a well-defined MESSAGE_ID=) whenever a
> > service fails. With that in place it should be relatively easy to
> > write a system service that can run during regular system uptime and
> > can look in the journal for all failures, including getting live
> > notifications when something happens. Moreover, this resolves the
> > problems during early and late boot: the "cursor" logic of the journal
> > allows such a service to know exactly which failures it already
> > processed and which ones are still left, and it can process all
> > failures that took place while it was not running.
> >
> > Does that make sense?
>
> Could this be generalized to "a structured message whenever a unit
> changes state", or would that be too verbose?

We have that already but only in debug logging mode (systemd-analyze
log-level debug). It's a bit too much noise to turn on by default otherwise...

Lennart

--
Lennart Poettering, Red Hat
Krunal Patel
2018-10-08 09:16:31 UTC
Permalink
Hi,

I just found the root cause for this service not starting. It was a simple parent-folder permission issue: in this case /opt/apps/sdc was set to root:root. I changed it to sdc:sdc, and that allowed the actual sdc_home inside /opt/SP/apps to start the service without an error code. Can you suggest why this was not an issue on one server, while on the other 5 servers I had to change the ownership? The systemd version was the same on all 6 servers. The only difference I found was switched-root. Just FYI, I am a complete beginner with RHEL 7 and systemd troubleshooting.

Thanks,

Krunal.


________________________________
From: systemd-devel <systemd-devel-***@lists.freedesktop.org> on behalf of Lennart Poettering <***@poettering.net>
Sent: Monday, October 8, 2018 1:38:37 PM
To: Jérémy Rosen
Cc: systemd-***@lists.freedesktop.org
Subject: Re: [systemd-devel] Default on failure dependencies

On Mo, 08.10.18 09:58, Jérémy Rosen (***@smile.fr) wrote:

>
> > This all makes me wonder whether a different approach to all of this
> > wouldn't be better: maybe we should just consider this a logging
> > problem: let's make sure we log a recognizable log message (i.e. a
> > structured journal message with a well-defined MESSAGE_ID=) whenever a
> > service fails. With that in place it should be relatively easy to
> > write a system service that can run during regular system uptime and
> > can look in the journal for all failures, including getting live
> > notifications when something happens. Moreover, this resolves the
> > problems during early and late boot: the "cursor" logic of the journal
> > allows such a service to know exactly which failures it already
> > processed and which ones are still left, and it can process all
> > failures that took place while it was not running.
> >
> > Does that make sense?
>
> Could this be generalized to "a structured message whenever a unit
> changes state", or would that be too verbose?

We have that already but only in debug logging mode (systemd-analyze
log-level debug). It's a bit too much noise to turn on by default otherwise...

Lennart

--
Lennart Poettering, Red Hat
Baudouin Feildel
2018-10-09 07:40:25 UTC
Permalink
On 6 October 2018 14:22, "Lennart Poettering" <***@poettering.net> wrote:

> On Sa, 15.09.18 22:32, Baudouin Feildel (***@feildel.fr) wrote:
>
> (Sorry for not responding more timely, I have been travelling and am
> still catching up with all the email)

No problem, we are all busy with tons of email these days... Thank you
for taking the time to understand the need and give a complete answer.

>> Hello there,
>>
>> A few weeks ago I opened the following issue in the systemd repository:
>> https://github.com/systemd/systemd/issues/9373. Seeing no traction from
>> the existing systemd developers,
>
> Hmm, so, I figure we should first have a discussion whether this really is
> desirable, because I must say I am not too sure about that.
>
> So far we are very conservative when it comes to options that are
> supposed to affect all units at once, as that tends to create various
> problems that are not obvious to solve. For example, if every service
> gets this kind of dep, what about the units that these deps are
> supposed to start, do you create a cyclic dep there?
>
> Moreover, I figure the services pulled in like this are usually going
> to be late boot processes, but this means failures during early boot
> would result in a large number of queued services that need to be
> dispatched during late boot.
>
> Moreover what happens if a service fails multiple times during early
> boot (for example because Restart= is used)? What happens with these
> failures, are the earlier ones dropped?
>
> Also, what happens for services that fail during shutdown, would these
> also pull in new units? But if they do, then this would result in
> cyclic operations if the service to run is a regular service,
> i.e. needs all basic system stuff up: we are shutting down, but in
> order to process everything that happened, we then need to start
> services that reverse the shut down process as they require certain
> stuff to be up...
>
> In general, there's the "philosophical incompatibility": stuff that
> is supposed to process failures in the service dependency logic,
> should probably not be part of the service dependency logic itself.

I completely agree with your analysis; I was worried about consequences I
could not foresee. You have now shown a lot of potential trouble, and I
agree that if we can find another solution to the easy service-monitoring
problem, that would be better.

> This all makes me wonder whether a different approach to all of this
> wouldn't be better: maybe we should just consider this a logging
> problem: let's make sure we log a recognizable log message (i.e. a
> structured journal message with a well-defined MESSAGE_ID=) whenever a
> service fails. With that in place it should be relatively easy to
> write a system service that can run during regular system uptime and
> can look in the journal for all failures, including getting live
> notifications when something happens. Moreover, this resolves the
> problems during early and late boot: the "cursor" logic of the journal
> allows such a service to know exactly which failures it already
> processed and which ones are still left, and it can process all
> failures that took place while it was not running.
>
> Does that make sense?
>
> Lennart
>
> --
> Lennart Poettering, Red Hat

Your proposal makes sense. I will try to have a proof of concept in
November; October is already full of work.

I was thinking about another solution, but I am not sure we have the
tooling available in systemd for it. The idea is that PID 1 could fire
some kind of event available to any service. One could then write a
service that subscribes to this event and, upon receiving it, does
whatever it wants. Does systemd have such an event system? Maybe this
could be done over D-Bus, but would that be sustainable?
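For what it's worth, something close to this exists already: PID 1 emits
D-Bus signals on org.freedesktop.systemd1 (once a client has called the
Manager's Subscribe() method), e.g. JobRemoved, which carries the job
result such as "failed". A minimal sketch of the match rule such a
subscriber would register — the helper function is illustrative, but the
match-rule syntax, interface and signal names are real:

```python
# Sketch: build the D-Bus match rule a monitoring service would add
# (via AddMatch, through whatever D-Bus binding it uses) to receive
# systemd's JobRemoved signal. Only the rule string is built here;
# actually subscribing requires a bus connection.

def systemd_signal_match(member):
    """Compose a D-Bus match rule for a signal on the systemd1 Manager."""
    parts = {
        "type": "signal",
        "sender": "org.freedesktop.systemd1",
        "path": "/org/freedesktop/systemd1",
        "interface": "org.freedesktop.systemd1.Manager",
        "member": member,
    }
    return ",".join("%s='%s'" % kv for kv in parts.items())
```

A subscriber would then filter the received JobRemoved signals on their
result string to react only to failed jobs.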

Regards
Baudouin Feildel