Discussion:
Realtime scheduling with CONFIG_RT_GROUP_SCHED=y
Lars Kellogg-Stedman
2017-07-06 14:56:48 UTC
I'm running on a kernel with CONFIG_RT_GROUP_SCHED=y. I understand that
this is counter to the recommendation in the README ("We recommend to turn
off Real-Time group scheduling in the kernel when using systemd...."), but
I don't have control over the kernel configuration.

On this system, it appears that starting "docker" (docker-ce-17.06.0.ce-1)
results in the creation of new cpu cgroups that for some reason apply to
systemd services. That is, after starting docker,
/sys/fs/cgroup/cpu/system.slice exists when previously it didn't.

Once this happens, a service that attempts to set realtime scheduling
(SCHED_RR) via sched_setscheduler() will fail, presumably because the
cgroup has no realtime budget in cpu.rt_runtime_us.

In older versions of systemd one could handle this using the directives
described in
https://www.freedesktop.org/wiki/Software/systemd/MyServiceCantGetRealtime/,
but unfortunately that document, despite being the number 1 search result
for pretty much anything involving "systemd" and "realtime", is obsolete
and those directives no longer exist.

Is there a way to make this work correctly with modern versions of
systemd? I've hacked around it for now by creating
/etc/systemd/system/myservice.service.d/realtime.conf that moves the
service back to the root cgroup and then uses chrt to set the scheduling
policy:

[Service]
ExecStartPost=/bin/cgclassify -g cpu:/ $MAINPID
ExecStartPost=/bin/chrt -r -p 99 $MAINPID

...and while that works, it seems really ugly. I've attempted to set
CPUSchedulingPolicy=rr in the unit, but that simply results in systemd
failing to start the service and logging "Failed at step SETSCHEDULER
spawning...".

Is there a better way of addressing this?
--
Lars Kellogg-Stedman <***@redhat.com>
Lennart Poettering
2017-07-10 08:26:21 UTC
Hmm, by default, systemd should not be adding anything to the "cpu"
hierarchy unless at least one service sets CPUShares=, CPUAccounting= or
a related setting, or the system-wide DefaultCPUAccounting= is set.
Unfortunately, there is currently no nice tool to track down why a
cgroup was created...

Generally, RT group scheduling is not usable unless you explicitly
assign an RT budget to each cgroup that wants to have RT, and you
manually make sure you never hand out more RT budget than is
available. Because that's really nasty and no good defaults can be
picked for this mode, we don't support it.
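
The "never hand out more than possible" rule can be checked by hand. The following is a minimal sketch, not something systemd provides: the function name rt_budget_ok is made up for illustration, and it assumes every group uses the same cpu.rt_period_us and holds a non-negative cpu.rt_runtime_us. It sums the budgets of the direct children of a cgroup directory and compares the total with the parent's budget.

```shell
#!/bin/bash
# rt_budget_ok DIR  (hypothetical helper)
# Succeeds if the direct children of cgroup directory DIR together claim
# no more RT runtime than DIR itself holds.
# Assumption: all groups share the same cpu.rt_period_us, so the runtime
# values can be compared directly.
rt_budget_ok() {
    local dir=$1 parent total=0 f value
    parent=$(cat "$dir/cpu.rt_runtime_us") || return 2
    for f in "$dir"/*/cpu.rt_runtime_us; do
        [ -e "$f" ] || continue        # no children: the glob did not match
        value=$(cat "$f")
        total=$((total + value))
    done
    echo "parent=$parent children=$total"
    [ "$total" -le "$parent" ]
}
```

On a cgroup v1 system this might be called as rt_budget_ok /sys/fs/cgroup/cpu,cpuacct/system.slice; a non-zero exit status means the children are overcommitted relative to the parent.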

If you ignore this and try to make it work locally YMMV. What you
could do is drop in ExecStartPre= lines into the relevant services
that echo an RT budget into the relevant cgroup files in the "cpu"
hierarchy, possibly propagating these to the parent cgroups. To figure
out the right cgroup path to echo this into you'd have to query
/proc/self/cgroup...
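
Querying /proc/self/cgroup could look roughly like this. It is a sketch under assumptions: the function name cpu_cgroup_path is invented here, and it expects the cgroup v1 line format "hierarchy-ID:controller-list:path". It looks for the "cpu" controller anywhere in the controller list, so it does not matter whether the kernel reports "cpu,cpuacct" or "cpuacct,cpu".

```shell
#!/bin/bash
# cpu_cgroup_path [FILE]  (hypothetical helper)
# Print the cgroup path of the v1 "cpu" controller as listed in FILE,
# which defaults to /proc/self/cgroup.
cpu_cgroup_path() {
    local file=${1:-/proc/self/cgroup}
    awk -F: '{
        n = split($2, ctl, ",")
        for (i = 1; i <= n; i++)
            if (ctl[i] == "cpu") {
                # Print everything after the second colon, so that a path
                # containing ":" is not truncated.
                print substr($0, length($1) + length($2) + 3)
                exit
            }
    }' "$file"
}
```

The printed path is relative to the mount point of the hierarchy, so the budget file would be found at that mount point followed by the path and /cpu.rt_runtime_us.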

Yeah, it's nasty, but at the moment a more automatic, and friendlier
exposure of the RT budget logic is not planned, as the kernel APIs are
just impossible to use with automatic management...

Lennart
--
Lennart Poettering, Red Hat
Lars Kellogg-Stedman
2017-07-12 15:41:55 UTC
Post by Lars Kellogg-Stedman
In older versions of systemd one could handle this using the directives
described in
https://www.freedesktop.org/wiki/Software/systemd/MyServiceCantGetRealtime/,
but unfortunately that document, despite being the number 1 search result
for pretty much anything involving "systemd" and "realtime", is obsolete
and those directives no longer exist.
Is there a way to make this work correctly with modern versions of
systemd? I've hacked around it for now by creating
/etc/systemd/system/myservice.service.d/realtime.conf that moves the
service back to the root cgroup and then uses chrt to set the scheduling
It looks like systemd sets up cgroups before calling ExecStartPre, which
means I can emulate the behavior of those obsolete directives by running:

ExecStartPre=/bin/sh -c 'echo 550000 > /sys/fs/cgroup/cpu,cpuacct/cpu.rt_runtime_us'
ExecStartPre=/bin/sh -c 'echo 200000 > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us'
ExecStartPre=/bin/sh -c 'echo 200000 > /sys/fs/cgroup/cpu,cpuacct/system.slice/myservice.service/cpu.rt_runtime_us'

In an environment where CONFIG_RT_GROUP_SCHED is set, is this the best way
of addressing the problem?
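
For a sense of scale, the budgets above can be turned into a share of each scheduling period. This is only a sketch: the function name rt_share_percent is made up, and it assumes cpu.rt_period_us is still at the kernel default of 1000000 (one second) unless a second argument says otherwise.

```shell
#!/bin/bash
# rt_share_percent RUNTIME_US [PERIOD_US]  (hypothetical helper)
# Print the percentage of each period that a cpu.rt_runtime_us value
# represents, using integer arithmetic (the result is rounded down).
# Assumption: PERIOD_US defaults to 1000000, the kernel default for
# cpu.rt_period_us.
rt_share_percent() {
    local runtime=$1 period=${2:-1000000}
    echo $(( runtime * 100 / period ))
}
```

Under that assumption, rt_share_percent 200000 prints 20 and rt_share_percent 550000 prints 55, that is, 20% and 55% of each one-second period.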
--
Lars Kellogg-Stedman <***@redhat.com>
Lennart Poettering
2017-07-17 08:14:20 UTC
Yeah, this would probably work, but you should really set
CPUAccounting=1 too, as an indirect way to ensure your unit appears in
the "cpu"/"cpuacct" cgroup hierarchy in the first place.

And I'd probably turn this into a proper shell script, that
dynamically reads the path from /proc/self/cgroup and then propagates
things up properly.
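
Put together, the two suggestions might end up in a drop-in like the one this sketch writes. Everything named here is an assumption for illustration: the function write_rt_dropin, the helper path /usr/local/sbin/set-rt-budget (a script that would take the budget as its first argument), and the file name realtime.conf. Only CPUAccounting= and ExecStartPre= come from the discussion itself.

```shell
#!/bin/bash
# write_rt_dropin UNIT BUDGET_US [SCRIPT] [SYSTEMD_DIR]  (hypothetical helper)
# Create UNIT.d/realtime.conf, which enables CPU accounting for UNIT and
# runs a budget-setting script before the service starts.
# SCRIPT defaults to a made-up path; SYSTEMD_DIR defaults to
# /etc/systemd/system.
write_rt_dropin() {
    local unit=$1 budget=$2
    local script=${3:-/usr/local/sbin/set-rt-budget}
    local dir="${4:-/etc/systemd/system}/$unit.d"
    mkdir -p "$dir" || return
    cat > "$dir/realtime.conf" <<EOF
[Service]
CPUAccounting=yes
ExecStartPre=$script $budget
EOF
}
```

After something like write_rt_dropin myservice.service 200000, a systemctl daemon-reload is needed before systemd picks up the new drop-in.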

Lennart
--
Lennart Poettering, Red Hat
Lars Kellogg-Stedman
2017-07-20 14:42:42 UTC
Post by Lennart Poettering
And I'd probably turn this into a proper shell script, that
dynamically reads the path from /proc/self/cgroup and then propagates
things up properly.
Lennart,

Thanks for the information. In case anyone comes across this thread and
wonders "what might that shell script look like?", the following seems to
work:

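#!/bin/bash
# Usage: <script> RT_RUNTIME_US [CGROUP_PATH]
# Writes RT_RUNTIME_US to cpu.rt_runtime_us in every cgroup from the top of
# the cpu,cpuacct hierarchy down to CGROUP_PATH (default: our own cgroup).

desired_rt_runtime_us=$1
# The controller pair is listed as "cpuacct,cpu" on some kernels and as
# "cpu,cpuacct" on others, so accept both spellings.
mygroup=${2:-$(awk -F: '$2 == "cpuacct,cpu" || $2 == "cpu,cpuacct" {print $3}' /proc/self/cgroup)}

# Nothing to do without a positive budget, without a cgroup, or when we
# are already in the root cgroup.
[[ $desired_rt_runtime_us -gt 0 ]] || exit
[[ $mygroup ]] || exit
[[ $mygroup = / ]] && exit

echo "${0##*/}: setting cpu.rt_runtime_us for $mygroup" >&2

# Walk from the outermost ancestor inward: a parent needs its budget
# before a child can be given one.
cgpath=
IFS=/ read -ra cgroups <<< "${mygroup:1}"
for cg in "${cgroups[@]}"; do
    cgpath="${cgpath}/${cg}"
    echo "${0##*/}: $desired_rt_runtime_us -> /sys/fs/cgroup/cpu,cpuacct${cgpath}" >&2
    echo "$desired_rt_runtime_us" > "/sys/fs/cgroup/cpu,cpuacct${cgpath}/cpu.rt_runtime_us"
done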
--
Lars Kellogg-Stedman <***@redhat.com>