Discussion:
[systemd-devel] regarding to cgroup siblings mask
WaLyong Cho
2015-03-24 11:29:34 UTC
Permalink
Hi,

In recent systemd(from some month ago), when a unit has a mask for cpu
or blockio or memory, this mask is also propagated to siblings by
unit_get_target_mask().
According to some of comments, it seems intentional.

Could anyone explain why?

In our system, some of service have MemoryLimit= options. By this
options, all of other services also create its own cgroup in memory. In
mobile system(or some of other embedded system), this can be heavy load.

If this can be configurable, how about add a configuration for cgroup
mask propagation to siblings?

Thanks,
WaLyong
David Timothy Strauss
2015-03-26 20:33:28 UTC
Permalink
Post by WaLyong Cho
Could anyone explain why?
An admin using CPUShares= or a similar proportional CGroup controller
probably assumes that setting the shares to twice the default (for
example) increases the relative proportion of resources for that unit.
However, that is only true if other units competing for that resource
have the same controller(s) enabled so that the kernel knows to
balance the resources accordingly.

The code in systemd ensures that if any unit uses a proportional
CGroups controller in a slice, all other units in that same slice
enable that controller as well, usually with the default proportions.

unit_get_target_mask() is part of an optimization I added so that
initializing CGroups controllers for a given unit doesn't require
iterating through every other unit in a slice to figure out the
necessary controllers. It provides a bitmask indicating the
controllers in use by its siblings so the unit can enable, say,
CPUShares= if one of its siblings is doing so.
WaLyong Cho
2015-03-27 02:33:21 UTC
Permalink
Post by David Timothy Strauss
Post by WaLyong Cho
Could anyone explain why?
An admin using CPUShares= or a similar proportional CGroup controller
probably assumes that setting the shares to twice the default (for
example) increases the relative proportion of resources for that unit.
However, that is only true if other units competing for that resource
have the same controller(s) enabled so that the kernel knows to
balance the resources accordingly.
The code in systemd ensures that if any unit uses a proportional
CGroups controller in a slice, all other units in that same slice
enable that controller as well, usually with the default proportions.
Thanks, understood. But I think this propagation is needed only for
taking weight argument such like CPUShares=weight,
StartupCPUShares=weight, BlockIOWeight=weight,
StartupBlockIOWeight=weight, BlockIODeviceWeight=device weight. For
example, I don't think MemoryLimit= is not option of proportional. It
just only limit of its cgroup and does not race with other cgroup.

If I'm right, we need to modify unit_get_target_mask() to get only mask
for proportional properties.
Post by David Timothy Strauss
unit_get_target_mask() is part of an optimization I added so that
initializing CGroups controllers for a given unit doesn't require
iterating through every other unit in a slice to figure out the
necessary controllers. It provides a bitmask indicating the
controllers in use by its siblings so the unit can enable, say,
CPUShares= if one of its siblings is doing so.
_______________________________________________
systemd-devel mailing list
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
WaLyong Cho
2015-03-27 06:10:04 UTC
Permalink
Some of cgroup properties does not affect to sibling
cgroups. CPUShares and BlockIOWeight are only needed to be propagated.
---
src/core/cgroup.c | 29 ++++++++++++++++++++++++++++-
src/core/cgroup.h | 2 ++
2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/core/cgroup.c b/src/core/cgroup.c
index 6b8abb4..b4b9678 100644
--- a/src/core/cgroup.c
+++ b/src/core/cgroup.c
@@ -458,6 +458,23 @@ void cgroup_context_apply(CGroupContext *c, CGroupControllerMask mask, const cha
}
}

+CGroupControllerMask cgroup_context_get_proportional_mask(CGroupContext *c) {
+ CGroupControllerMask mask = 0;
+
+ /* Get only proportional mask */
+
+ if (c->cpu_shares != (unsigned long) -1 ||
+ c->startup_cpu_shares != (unsigned long) -1)
+ mask |= CGROUP_CPU;
+
+ if (c->blockio_weight != (unsigned long) -1 ||
+ c->startup_blockio_weight != (unsigned long) -1 ||
+ c->blockio_device_weights)
+ mask |= CGROUP_BLKIO;
+
+ return mask;
+}
+
CGroupControllerMask cgroup_context_get_mask(CGroupContext *c) {
CGroupControllerMask mask = 0;

@@ -487,6 +504,16 @@ CGroupControllerMask cgroup_context_get_mask(CGroupContext *c) {
return mask;
}

+CGroupControllerMask unit_get_cgroup_proportional_mask(Unit *u) {
+ CGroupContext *c;
+
+ c = unit_get_cgroup_context(u);
+ if (!c)
+ return 0;
+
+ return cgroup_context_get_proportional_mask(c);
+}
+
CGroupControllerMask unit_get_cgroup_mask(Unit *u) {
CGroupContext *c;

@@ -531,7 +558,7 @@ CGroupControllerMask unit_get_members_mask(Unit *u) {
continue;

u->cgroup_members_mask |=
- unit_get_cgroup_mask(member) |
+ unit_get_cgroup_proportional_mask(member) |
unit_get_members_mask(member);
}
}
diff --git a/src/core/cgroup.h b/src/core/cgroup.h
index 869ddae..2371158 100644
--- a/src/core/cgroup.h
+++ b/src/core/cgroup.h
@@ -98,12 +98,14 @@ void cgroup_context_done(CGroupContext *c);
void cgroup_context_dump(CGroupContext *c, FILE* f, const char *prefix);
void cgroup_context_apply(CGroupContext *c, CGroupControllerMask mask, const char *path, ManagerState state);

+CGroupControllerMask cgroup_context_get_proportional_mask(CGroupContext *c);
CGroupControllerMask cgroup_context_get_mask(CGroupContext *c);

void cgroup_context_free_device_allow(CGroupContext *c, CGroupDeviceAllow *a);
void cgroup_context_free_blockio_device_weight(CGroupContext *c, CGroupBlockIODeviceWeight *w);
void cgroup_context_free_blockio_device_bandwidth(CGroupContext *c, CGroupBlockIODeviceBandwidth *b);

+CGroupControllerMask unit_get_cgroup_proportional_mask(Unit *u);
CGroupControllerMask unit_get_cgroup_mask(Unit *u);
CGroupControllerMask unit_get_siblings_mask(Unit *u);
CGroupControllerMask unit_get_members_mask(Unit *u);
--
1.9.3
David Timothy Strauss
2015-03-28 00:25:53 UTC
Permalink
Post by WaLyong Cho
Thanks, understood. But I think this propagation is needed only for
taking weight argument such like CPUShares=weight,
StartupCPUShares=weight, BlockIOWeight=weight,
StartupBlockIOWeight=weight, BlockIODeviceWeight=device weight. For
example, I don't think MemoryLimit= is not option of proportional. It
just only limit of its cgroup and does not race with other cgroup.
If I'm right, we need to modify unit_get_target_mask() to get only mask
for proportional properties.
I'm pretty sure we already do that. If not, it's a bug.
WaLyong Cho
2015-03-28 02:56:21 UTC
Permalink
Post by David Timothy Strauss
Post by WaLyong Cho
Thanks, understood. But I think this propagation is needed only for
taking weight argument such like CPUShares=weight,
StartupCPUShares=weight, BlockIOWeight=weight,
StartupBlockIOWeight=weight, BlockIODeviceWeight=device weight. For
example, I don't think MemoryLimit= is not option of proportional. It
just only limit of its cgroup and does not race with other cgroup.
If I'm right, we need to modify unit_get_target_mask() to get only mask
for proportional properties.
I'm pretty sure we already do that. If not, it's a bug.
Hmm, it seems not. When I added MemoryLimit= option to just one service,
cgroups for every unit were generated on memory cgroup.
Post by David Timothy Strauss
_______________________________________________
systemd-devel mailing list
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
David Timothy Strauss
2015-03-30 20:13:50 UTC
Permalink
Post by WaLyong Cho
Hmm, it seems not. When I added MemoryLimit= option to just one service,
cgroups for every unit were generated on memory cgroup.
It looks like memory_limit and cpu_quota_per_sec_usec both have this
potential issue. The other four controllers managed this way are
clearly proportional (using weights or shares).
WaLyong Cho
2015-03-31 04:47:15 UTC
Permalink
Post by David Timothy Strauss
Post by WaLyong Cho
Hmm, it seems not. When I added MemoryLimit= option to just one service,
cgroups for every unit were generated on memory cgroup.
It looks like memory_limit and cpu_quota_per_sec_usec both have this
potential issue. The other four controllers managed this way are
clearly proportional (using weights or shares).
Then, is this patch able to be a solution?

http://lists.freedesktop.org/archives/systemd-devel/2015-March/029885.html
David Timothy Strauss
2015-03-26 20:34:07 UTC
Permalink
Post by WaLyong Cho
If this can be configurable, how about add a configuration for cgroup
mask propagation to siblings?
I believe the way to prevent propagation is to separate the units into
different slices.
Lennart Poettering
2015-04-08 16:48:20 UTC
Permalink
Post by WaLyong Cho
Hi,
In recent systemd(from some month ago), when a unit has a mask for cpu
or blockio or memory, this mask is also propagated to siblings by
unit_get_target_mask().
According to some of comments, it seems intentional.
Could anyone explain why?
I added this after talking to Tejun. It's basically what the kernel
will effectively do in "sane behaviour" mode too.

Basically, the kernel really wants to avoid having to compare
individual processes against groups, because behaviour is unclear
then. They want to comapre groups against groups and processes against
processes, but not groups against processes, since in many cases
behaviour is very unclear then. To avoid the ambighities this creates
entirely the answer is to not allow this, and hence always propagate
all controller memberships not only towards the root, but also sideways.
Post by WaLyong Cho
In our system, some of service have MemoryLimit= options. By this
options, all of other services also create its own cgroup in memory. In
mobile system(or some of other embedded system), this can be heavy load.
If this can be configurable, how about add a configuration for cgroup
mask propagation to siblings?
The option for this will go away in the kernel too eventually, and we
shouldn't intrdouce an option for behaviour we already know now we
cannot support for long.

I hope this makes some sense,

Lennart
--
Lennart Poettering, Red Hat
WaLyong Cho
2015-04-09 05:10:11 UTC
Permalink
Post by Lennart Poettering
Post by WaLyong Cho
Hi,
In recent systemd(from some month ago), when a unit has a mask for cpu
or blockio or memory, this mask is also propagated to siblings by
unit_get_target_mask().
According to some of comments, it seems intentional.
Could anyone explain why?
I added this after talking to Tejun. It's basically what the kernel
will effectively do in "sane behaviour" mode too.
Basically, the kernel really wants to avoid having to compare
individual processes against groups, because behaviour is unclear
then. They want to comapre groups against groups and processes against
processes, but not groups against processes, since in many cases
behaviour is very unclear then. To avoid the ambighities this creates
entirely the answer is to not allow this, and hence always propagate
all controller memberships not only towards the root, but also sideways.
OK, understood but I'm not sure the compare group properties came under
for all of cgroup subsystem. In case of CPUShares=, BlockIOWeight= such
like take weight as value, should be. But in case of CPUQuota=,
MemoryLimit=, it just a problems of own. Those are not racing with other
groups. So for this kind of properties do not need to be propagated to
siblings.

I'm not sure this make sense. But if yes, below patch will only
propagate proportional or relative properties.
http://lists.freedesktop.org/archives/systemd-devel/2015-March/029885.html


Thanks,
WaLyong
Lennart Poettering
2015-04-09 08:02:34 UTC
Permalink
Post by WaLyong Cho
Post by Lennart Poettering
Post by WaLyong Cho
In recent systemd(from some month ago), when a unit has a mask for cpu
or blockio or memory, this mask is also propagated to siblings by
unit_get_target_mask().
According to some of comments, it seems intentional.
Could anyone explain why?
I added this after talking to Tejun. It's basically what the kernel
will effectively do in "sane behaviour" mode too.
Basically, the kernel really wants to avoid having to compare
individual processes against groups, because behaviour is unclear
then. They want to comapre groups against groups and processes against
processes, but not groups against processes, since in many cases
behaviour is very unclear then. To avoid the ambighities this creates
entirely the answer is to not allow this, and hence always propagate
all controller memberships not only towards the root, but also sideways.
OK, understood but I'm not sure the compare group properties came under
for all of cgroup subsystem. In case of CPUShares=, BlockIOWeight= such
like take weight as value, should be. But in case of CPUQuota=,
MemoryLimit=, it just a problems of own. Those are not racing with other
groups. So for this kind of properties do not need to be propagated to
siblings.
I'm not sure this make sense. But if yes, below patch will only
propagate proportional or relative properties.
http://lists.freedesktop.org/archives/systemd-devel/2015-March/029885.html
Well, it's a matter of clean design really. Sure, if a controller
doesn't have proportional props it's not that important, but it is a
much simpler design to disallow processes and cgroups on the same
level in the tree, and it's going to be what the "sane behaviour"
cgroup rework of the kernel does...

Lennart
--
Lennart Poettering, Red Hat
Loading...