Discussion:
Setting Environment Configuration (Affinity) for Slices
Chris Bell
2015-10-19 15:58:44 UTC
Hi all,

Is there a way to set a CPU affinity for an entire slice? Say, for
example, I have system-webhosted.slice, but I only want the services
running within system-webhosted.slice to run on cores 5-8. I can set
this individually per service (CPUAffinity= in the systemd.exec man
page), but nothing there indicates that I can do this for slices. Also,
the systemd.slice man page says slice units only accept resource control
directives, not environment config directives. Is there any way I can
set an environment config directive for an entire slice? Or do I need
to do it per-service?
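
For reference, the per-service route I'm using now looks roughly like
this (the unit name, drop-in file name, and CPU list are just an
example):

    # /etc/systemd/system/httpd.service.d/50-affinity.conf
    [Service]
    CPUAffinity=5 6 7 8

followed by 'systemctl daemon-reload' and a restart of the service.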

Alternatively, is there a way (and this sounds way too hacky) to
hierarchically order a slice under a service? So, basically I can start
some dummy service with environment configs, and the slice will be a
child of that service and all units in that slice will inherit the
environment configs from the parent of the slice?

/
└─system.slice
  └─dummy-slice-wrapper.service
    └─system-webhosted.slice
      ├─httpd.service
      └─postgresql.service

Where dummy-slice-wrapper.service has its affinity set to 5-8 and has
system-webhosted.slice as a child. Then the services inside the slice
(httpd, postgresql) would also have their affinity locked to 5-8, without
having to specify it with an override for each service.

I've noticed that systemd-nspawn@<machine>.service has a child
'system.slice', though I don't know if that setup can enforce what I'd
like it to.

Is there any way to do this with the current setup?

Thanks in advance!!

--Chris
Lennart Poettering
2015-10-19 16:54:33 UTC
Post by Chris Bell
Hi all,
Is there a way to set an affinity for an entire slice?
No, there is not. But it's definitely our intention to add this, and
expose the "cpuset" cgroup controller this way. Unfortunately the
"cpuset" controller of the kernel currently exposes really awful
behaviour, hence we have not exposed its functionality this way.

However, I just had a long chat with Tejun Heo about this, and we came
to the conclusion that it's probably safe to expose a minimal subset of
cpuset now, and reuse the existing CPUAffinity= service setting for
that: right now, it only affects the main process of a service at fork
time (and all child processes forked off from that, recursively), by
using sched_setaffinity(). Our idea would be to propagate it into the
"cpuset.cpus" field too, so that the setting is first passed to
sched_setaffinity(), and then also written to the cpuset
hierarchy. This should be pretty safe, and allow us to make this
available in slices too. It would result in a slight change of
behaviour though, as making adjustments to cpuset would mean that
daemons cannot extend their affinity with sched_setaffinity() above
what was set with cpuset anymore. But I think this is OK.
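
To illustrate the idea (purely hypothetical at this point, the exact
option name and semantics are not settled), a slice unit could then
carry something like:

    # system-webhosted.slice -- hypothetical, assuming CPUAffinity=
    # becomes allowed for slice units as sketched above
    [Slice]
    CPUAffinity=5 6 7 8

with the listed CPUs also written to the slice's "cpuset.cpus".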
Post by Chris Bell
Say, for example, I have system-webhosted.slice, but I only want the
services running within system-webhosted.slice to run on cores
5-8. I can set this individually per service (systemd.exec man
page), but it does not indicate that I can do this for slices. Also,
the systemd.slice man page says it only accepts resource control
directives, not environment config directives. Is there any way I
can set an environment config directive for an entire slice? Or do I
need to do it per-service?
The latter. Slices are really about resource control, and an env var
really isn't a resource control knob.
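
Setting env vars per service is just a drop-in, for example (the unit
and variable names are picked arbitrarily here):

    # /etc/systemd/system/httpd.service.d/50-env.conf
    [Service]
    Environment=SOME_VAR=value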
Post by Chris Bell
Alternatively, is there a way (and this sounds way too hacky) to
hierarchically order a slice under a service? So, basically I can start some
dummy service with environment configs, and the slice will be a child of
that service and all units in that slice will inherit the environment
configs from the parent of the slice?
Nope. Slices are the inner nodes of the resource control tree, and
services/scopes are the leaves. That's how they are defined.
Post by Chris Bell
though I don't know if that setup can enforce what I'd like it to.
Well, that's because nspawn is a delegation unit that encapsulates a
completely new cgroup hierarchy of its own, managed by a new systemd
instance.
Post by Chris Bell
Is there any way to do this with the current setup?
I am not sure I understand what you want to do with the env vars,
precisely. What kind of env vars do you intend to set?

Lennart
--
Lennart Poettering, Red Hat
Chris Bell
2015-10-19 17:16:04 UTC
Post by Lennart Poettering
Post by Chris Bell
Is there a way to set an affinity for an entire slice?
No, there is not. But it's definitely our intention to add this, and
expose the "cpuset" cgroup controller this way. Unfortunately the
"cpuset" controller of the kernel currently exposes really awful
behaviour, hence we have not exposed its functionality this way.
However, I just had a long chat with Tejun Heo about this, and we came
to the conclusion that it's probably safe to expose a minimal subset of
cpuset now, and reuse the existing CPUAffinity= service setting for
that: right now, it only affects the main process of a service at fork
time (and all child processes forked off from that, recursively), by
using sched_setaffinity(). Our idea would be to propagate it into the
"cpuset.cpus" field too, so that the setting is first passed to
sched_setaffinity(), and then also written to the cpuset
hierarchy. This should be pretty safe, and allow us to make this
available in slices too. It would result in a slight change of
behaviour though, as making adjustments to cpuset would mean that
daemons cannot extend their affinity with sched_setaffinity() above
what was set with cpuset anymore. But I think this is OK.
So, there's a good chance of a subset of cpuset-related options at the
slice level relatively soon, but full capabilities will have to wait
until the kernel's cgroup support is improved?
Post by Lennart Poettering
Post by Chris Bell
Is there any way I can set an environment config directive for
an entire slice? Or do I need to do it per-service?
The latter. Slices are really about resource control, and an env var
really isn't a resource control knob.
Post by Chris Bell
Alternatively, is there a way (and this sounds way too hacky) to
hierarchically order a slice under a service?
Nope. Slices are the inner nodes of the resource control tree, and
services/scopes are the leaves. That's how they are defined.
Post by Chris Bell
'system.slice'
though I don't know if that setup can enforce what I'd like it to.
Well, that's because nspawn is a delegation unit that encapsulates a
completely new cgroup hierarchy of its own, managed by a new systemd
instance.
Aha, that makes sense.
Post by Lennart Poettering
I am not sure I understand what you want to do with the env vars,
precisely. What kind of env vars do you intend to set?
Basically, I have a number of services that may or may not be running at
any given time, based on the whims of the users. All of these services
are hosted services of some type, and occasionally they have been known
to eat all CPU cores, lagging everything else. I'm working on setting up
CPU shares and other resource controls to try and keep resources
available for immediate execution of system processes, services, etc.
I'd prefer to do this with affinity; assign critical processes to CPUs
0-1, and the rest limited to subsets of the available remaining CPUs. I
was hoping I could do this in one go by saying "everything in this
slice must run with this affinity." I can do it on a per-service
basis, but with a large number of services it gets tedious.
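
Right now the closest I can get is stamping out the same kind of
override for every unit, something along these lines (the unit list and
CPU set are just an example):

    for u in httpd.service postgresql.service dovecot.service; do
        mkdir -p /etc/systemd/system/$u.d
        printf '[Service]\nCPUAffinity=5 6 7 8\n' \
            > /etc/systemd/system/$u.d/50-affinity.conf
    done
    systemctl daemon-reload

which works, but doesn't scale well as services are added.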

I also think it would be convenient in some cases to be able to have the
'Nice' and 'Private{Network,Devices,etc}' directives apply to an entire
slice. That way I can use slices to control, manage, and group related
services. (Example: I'd like to manage postfix and dovecot together in
system-mail.slice. I'd like to be able to use the slice to set exec
options for both services. Then if I add another service to
system-mail.slice, it would also automatically be constrained by the
limits set in system-mail.slice.)

Basically, I think this would be a useful hierarchy level for more
coarse-grained service group management and configuration.

--Chris
Chris Bell
2015-10-19 17:19:42 UTC
Post by Chris Bell
Basically, I have a number of services that may or may not be running
at any given time, based on the whims of the users. All of these
services are hosted services of some type, and occasionally they have
been known to eat all CPU cores, lagging everything else. I'm working
on setting up CPU shares and other resource controls to try and keep
resources available for immediate execution of system processes,
services, etc. I'd prefer to do this with affinity; assign critical
processes to CPUs 0-1, and the rest limited to subsets of the
available remaining CPUs. I was hoping I could do this in one go by
saying "everything in this slice must run with this affinity." I
can do it on a per-service basis, but with a large number of services
it gets tedious.
I also think it would be convenient in some cases to be able to have
the 'Nice' and 'Private{Network,Devices,etc}' directives apply to an
entire slice. That way I can use slices to control, manage, and group
related services. (Example: I'd like to manage postfix and dovecot
together in system-mail.slice. I'd like to be able to use the slice to
set exec options for both services. Then if I add another service to
system-mail.slice, it would also automatically be constrained by the
limits set in system-mail.slice.)
Basically, I think this would be a useful hierarchy level for more
coarse-grained service group management and configuration.
Wait... Is what I'm looking for here a container?
Lennart Poettering
2015-10-19 17:24:51 UTC
Post by Lennart Poettering
However, I just had a long chat with Tejun Heo about this, and we came
to the conclusion that it's probably safe to expose a minimal subset of
cpuset now, and reuse the existing CPUAffinity= service setting for
that: right now, it only affects the main process of a service at fork
time (and all child processes forked off from that, recursively), by
using sched_setaffinity(). Our idea would be to propagate it into the
"cpuset.cpus" field too, so that the setting is first passed to
sched_setaffinity(), and then also written to the cpuset
hierarchy. This should be pretty safe, and allow us to make this
available in slices too. It would result in a slight change of
behaviour though, as making adjustments to cpuset would mean that
daemons cannot extend their affinity with sched_setaffinity() above
what was set with cpuset anymore. But I think this is OK.
So, there's a good chance of a subset of cpuset-related options at the slice
level relatively soon, but full capabilities will have to wait until the
kernel's cgroup support is improved?
Well, I am not sure what "full capabilities" really means here. Much of
the cpuset functionality appears to be little more than help for
writing shell scripts. That part is certainly nothing we want to
expose.

The other part is the NUMA memory node stuff, but supposedly that
should be dealt with automatically by the kernel and not need user
configuration. Hence it's nothing we really want to expose anytime
soon.
Post by Lennart Poettering
I am not sure I understand what you want to do with the env vars,
precisely. What kind of env vars do you intend to set?
Basically, I have a number of services that may or may not be running at any
given time, based on the whims of the users. All of these services are
hosted services of some type, and occasionally they have been known to eat
all CPU cores, lagging everything else. I'm working on setting up CPU shares
and other resource controls to try and keep resources available for
immediate execution of system processes, services, etc. I'd prefer to do
this with affinity; assign critical processes to CPUs 0-1, and the rest
limited to subsets of the available remaining CPUs. I was hoping I could do
this in one go by saying "everything in this slice must run with this
affinity." I can do it on a per-service basis, but with a large number of
services it gets tedious.
Well, sure, exposing the cpuset knobs as discussed above should make
this easy, and that's precisely what slices have been introduced for.

I was mostly wondering about the env var issue you raised...
I also think it would be convenient in some cases to be able to have the
'Nice' and 'Private{Network,Devices,etc}' directives apply to an entire
slice. That way I can use slices to control, manage, and group related
services. (Example: I'd like to manage postfix and dovecot together in
system-mail.slice. I'd like to be able to use the slice to set exec options
for both services. Then if I add another service to system-mail.slice, it
would also automatically be constrained by the limits set in
system-mail.slice.)
Use CPUShares= as per-slice/per-service/per-scope equivalent of
Nice=.
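
For a whole slice that would be, roughly (the slice name and weight are
picked arbitrarily here):

    # /etc/systemd/system/system-webhosted.slice.d/50-shares.conf
    [Slice]
    CPUShares=256

or equivalently "systemctl set-property system-webhosted.slice
CPUShares=256".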

PrivateXYZ= otoh is very specific to what a daemon does, it's a
sandboxing feature, and sandboxes must always be adjusted to the
individual daemons. I doubt that this is something to support as
anything but a service-specific knob.

Lennart
--
Lennart Poettering, Red Hat
Chris Bell
2015-10-19 17:37:27 UTC
Post by Lennart Poettering
Post by Lennart Poettering
However, I just had a long chat with Tejun Heo about this, and we came
to the conclusion that it's probably safe to expose a minimal subset of
cpuset now, and reuse the existing CPUAffinity= service setting for
that: right now, it only affects the main process of a service at fork
time (and all child processes forked off from that, recursively), by
using sched_setaffinity(). Our idea would be to propagate it into the
"cpuset.cpus" field too, so that the setting is first passed to
sched_setaffinity(), and then also written to the cpuset
hierarchy. This should be pretty safe, and allow us to make this
available in slices too. It would result in a slight change of
behaviour though, as making adjustments to cpuset would mean that
daemons cannot extend their affinity with sched_setaffinity() above
what was set with cpuset anymore. But I think this is OK.
So, there's a good chance of a subset of cpuset-related options at the slice
level relatively soon, but full capabilities will have to wait until the
kernel's cgroup support is improved?
Well, I am not sure what "full capabilities" really means here. Much of
the cpuset functionality appears to be little more than help for
writing shell scripts. That part is certainly nothing we want to
expose.
The other part is the NUMA memory node stuff, but supposedly that
should be dealt with automatically by the kernel and not need user
configuration. Hence it's nothing we really want to expose anytime
soon.
Ah, I misunderstood.
Post by Lennart Poettering
Post by Lennart Poettering
I am not sure I understand what you want to do with the env vars,
precisely. What kind of env vars do you intend to set?
Basically, I have a number of services that may or may not be running at any
given time, based on the whims of the users. All of these services are
hosted services of some type, and occasionally they have been known to eat
all CPU cores, lagging everything else. I'm working on setting up CPU shares
and other resource controls to try and keep resources available for
immediate execution of system processes, services, etc. I'd prefer to do
this with affinity; assign critical processes to CPUs 0-1, and the rest
limited to subsets of the available remaining CPUs. I was hoping I could do
this in one go by saying "everything in this slice must run with this
affinity." I can do it on a per-service basis, but with a large number of
services it gets tedious.
Well, sure, exposing the cpuset knobs as discussed above should make
this easy, and that's precisely what slices have been introduced for.
So I just have to wait for them to be introduced.
Post by Lennart Poettering
I was mostly wondering about the env var issue you raised...
I also think it would be convenient in some cases to be able to have the
'Nice' and 'Private{Network,Devices,etc}' directives apply to an entire
slice. That way I can use slices to control, manage, and group related
services. (Example: I'd like to manage postfix and dovecot together in
system-mail.slice. I'd like to be able to use the slice to set exec options
for both services. Then if I add another service to system-mail.slice, it
would also automatically be constrained by the limits set in
system-mail.slice.)
Use CPUShares= as per-slice/per-service/per-scope equivalent of
Nice=.
PrivateXYZ= otoh is very specific to what a daemon does, it's a
sandboxing feature, and sandboxes must always be adjusted to the
individual daemons. I doubt that this is something to support as
anything but a service-specific knob.
Lennart
Ok, so it seems like most of what I've been trying to implement is
available in some form, just not how I was expecting. I'll take another
look at the Resource Control directives and see how to adjust them for
my needs. It's not as direct as I was hoping, but they seem like they'll
do what I need.

If I have a set of services that really need to be finely controlled I
should probably just run them in a container, and set limits for the
container. Will that work as I am expecting? Will a systemd-nspawn
container respect CPUAffinity settings from the service override file?

Thanks again!!

--Chris
Lennart Poettering
2015-10-19 18:15:00 UTC
Post by Lennart Poettering
I was mostly wondering about the env var issue you raised...
Post by Chris Bell
I also think it would be convenient in some cases to be able to have the
'Nice' and 'Private{Network,Devices,etc}' directives apply to an entire
slice. That way I can use slices to control, manage, and group related
services. (Example: I'd like to manage postfix and dovecot together in
system-mail.slice. I'd like to be able to use the slice to set exec options
for both services. Then if I add another service to system-mail.slice, it
would also automatically be constrained by the limits set in
system-mail.slice.)
Use CPUShares= as per-slice/per-service/per-scope equivalent of
Nice=.
PrivateXYZ= otoh is very specific to what a daemon does, it's a
sandboxing feature, and sandboxes must always be adjusted to the
individual daemons. I doubt that this is something to support as
anything but a service-specific knob.
Lennart
Ok, so it seems like most of what I've been trying to implement is available
in some form, just not how I was expecting. I'll take another look at the
Resource Control directives and see how to adjust them for my needs. It's
not as direct as I was hoping, but they seem like they'll do what I need.
If I have a set of services that really need to be finely controlled I
should probably just run them in a container, and set limits for the
container. Will that work as I am expecting? Will a systemd-nspawn container
respect CPUAffinity settings from the service override file?
CPUAffinity= is generally inherited down the process tree. Hence yes,
this will work. But do note that processes may freely readjust their
own affinity using sched_setaffinity() at any time, and thus are free
to undo the setting. Once cpuset is hooked up with systemd as proposed,
this will no longer be possible. Also, if we hook up cpuset, then it is
easy to readjust the cpuset settings dynamically at runtime.
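
Concretely, something like this on the host should do it (the container
name and CPU list are just an example):

    # /etc/systemd/system/systemd-nspawn@webhosted.service.d/50-affinity.conf
    [Service]
    CPUAffinity=5 6 7 8

Everything the container's systemd instance spawns then inherits that
affinity, unless a process readjusts it itself.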

Lennart
--
Lennart Poettering, Red Hat
Chris Bell
2015-10-19 22:49:05 UTC
Post by Lennart Poettering
Post by Lennart Poettering
I was mostly wondering about the env var issue you raised...
Post by Chris Bell
I also think it would be convenient in some cases to be able to have the
'Nice' and 'Private{Network,Devices,etc}' directives apply to an entire
slice. That way I can use slices to control, manage, and group related
services. (Example: I'd like to manage postfix and dovecot together in
system-mail.slice. I'd like to be able to use the slice to set exec options
for both services. Then if I add another service to system-mail.slice, it
would also automatically be constrained by the limits set in
system-mail.slice.)
Use CPUShares= as per-slice/per-service/per-scope equivalent of
Nice=.
PrivateXYZ= otoh is very specific to what a daemon does, it's a
sandboxing feature, and sandboxes must always be adjusted to the
individual daemons. I doubt that this is something to support as
anything but a service-specific knob.
Lennart
Ok, so it seems like most of what I've been trying to implement is available
in some form, just not how I was expecting. I'll take another look at the
Resource Control directives and see how to adjust them for my needs. It's
not as direct as I was hoping, but they seem like they'll do what I need.
If I have a set of services that really need to be finely controlled I
should probably just run them in a container, and set limits for the
container. Will that work as I am expecting? Will a systemd-nspawn container
respect CPUAffinity settings from the service override file?
CPUAffinity= is generally inherited down the process tree. Hence yes,
this will work. But do note that processes may freely readjust their
own affinity using sched_setaffinity() at any time, and thus are free
to undo the setting. Once cpuset is hooked up with systemd as proposed,
this will no longer be possible. Also, if we hook up cpuset, then it is
easy to readjust the cpuset settings dynamically at runtime.
Lennart
Ok, I think my best bet is to readjust my resource management strategy.
I'm sure I can implement a solution with CPU shares and CPU usage
limits. It doesn't seem like CPUAffinity= can be enforced the way I had
hoped, anyway.
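
For the record, what I'll probably end up with is something along these
lines (the slice name and numbers are placeholders I still need to
tune):

    systemctl set-property system-webhosted.slice CPUShares=512 CPUQuota=300%

or the same settings in a [Slice] drop-in for system-webhosted.slice.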

Thanks for all the help!!

--Chris
