Discussion:
[systemd-devel] How does hybrid cgroup setup work?
Umut Tezduyar Lindskog
2017-10-28 20:01:18 UTC
Permalink
Hello,

I am trying to have the cpu controller on v1 and the memory controller on
unified (v2), but it is not working the way I expect. When I enable
CONFIG_MEMCG, the memory cgroup shows up in /proc/cgroups and
mount_cgroup_controllers() mounts it very early. Once it is
mounted in v1, it becomes unavailable in v2.

Did I misunderstand how hybrid works?
Umut
Lennart Poettering
2017-11-08 20:25:05 UTC
Permalink
Post by Umut Tezduyar Lindskog
Hello,
I am trying to have the cpu controller on v1 and the memory controller on
unified (v2), but it is not working the way I expect. When I enable
CONFIG_MEMCG, the memory cgroup shows up in /proc/cgroups and
mount_cgroup_controllers() mounts it very early. Once it is
mounted in v1, it becomes unavailable in v2.
Did I misunderstand how hybrid works?
hybrid means that v2 is used only for tracking services (which is a
good thing, since it provides safe notification of when a cgroup
exits), but not for any of the controllers. That means hybrid mode is
mostly compatible with pure v1, except that there's yet another
hierarchy (the v2 one) and systemd uses it for its own purposes.
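A minimal sketch of what that looks like from a process's point of view: under hybrid, /proc/self/cgroup lists the v1 entries (the named systemd hierarchy plus the controllers) alongside a single v2 entry with hierarchy ID 0 and an empty controller field. The sample content and service name below are made up for illustration, not captured from a real system.

```python
def split_cgroup_entries(content):
    """Split /proc/self/cgroup content into (v1_entries, v2_entries).

    The v2 (unified) entry has hierarchy ID 0 and an empty controller
    field; v1 entries carry a controller list or a named hierarchy.
    """
    v1, v2 = [], []
    for line in content.strip().splitlines():
        hier_id, controllers, path = line.split(":", 2)
        (v2 if hier_id == "0" and controllers == "" else v1).append(
            (controllers, path))
    return v1, v2

# Illustrative hybrid-mode content (the service name is hypothetical):
sample = (
    "3:name=systemd:/system.slice/foo.service\n"
    "2:cpu:/system.slice/foo.service\n"
    "1:memory:/system.slice/foo.service\n"
    "0::/system.slice/foo.service\n"
)
v1_entries, v2_entries = split_cgroup_entries(sample)
# systemd tracks the service via the single v2 entry; resource
# control still happens on the v1 controller entries.
```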

Lennart
--
Lennart Poettering, Red Hat
Umut Tezduyar Lindskog
2017-11-10 10:11:05 UTC
Permalink
On Wed, Nov 8, 2017 at 9:25 PM, Lennart Poettering
Post by Lennart Poettering
Post by Umut Tezduyar Lindskog
Hello,
I am trying to have the cpu controller on v1 and the memory controller on
unified (v2), but it is not working the way I expect. When I enable
CONFIG_MEMCG, the memory cgroup shows up in /proc/cgroups and
mount_cgroup_controllers() mounts it very early. Once it is
mounted in v1, it becomes unavailable in v2.
Did I misunderstand how hybrid works?
hybrid means that v2 is used only for tracking services (which is a
good thing, since it provides safe notification of when a cgroup
exits), but not for any of the controllers. That means hybrid mode is
mostly compatible with pure v1, except that there's yet another
hierarchy (the v2 one) and systemd uses it for its own purposes.
Is there a technical reason why we cannot have a broader hybrid, with
some controllers taken from v1 and some from v2?

We would like to keep using the cpu controller on v1 until the cpu
controller lands in v2, but meanwhile take advantage of the
improvements in the v2 memory controller.

UMUT
Post by Lennart Poettering
Lennart
--
Lennart Poettering, Red Hat
Lennart Poettering
2017-11-10 11:16:11 UTC
Permalink
Post by Umut Tezduyar Lindskog
On Wed, Nov 8, 2017 at 9:25 PM, Lennart Poettering
Post by Lennart Poettering
Post by Umut Tezduyar Lindskog
Hello,
I am trying to have the cpu controller on v1 and the memory controller on
unified (v2), but it is not working the way I expect. When I enable
CONFIG_MEMCG, the memory cgroup shows up in /proc/cgroups and
mount_cgroup_controllers() mounts it very early. Once it is
mounted in v1, it becomes unavailable in v2.
Did I misunderstand how hybrid works?
hybrid means that v2 is used only for tracking services (which is a
good thing, since it provides safe notification of when a cgroup
exits), but not for any of the controllers. That means hybrid mode is
mostly compatible with pure v1, except that there's yet another
hierarchy (the v2 one) and systemd uses it for its own purposes.
Is there a technical reason why we cannot have a broader hybrid, with
some controllers taken from v1 and some from v2?
We would like to keep using cpu v1 until cpu v2 is accepted but
meanwhile take advantage of improvements on memory v2.
Well, right now, you have three types of setups:

1) legacy:

/sys/fs/cgroup/ → tmpfs instance
/sys/fs/cgroup/memory/ → memory controller hierarchy
/sys/fs/cgroup/cpu/ → cpu controller hierarchy

/sys/fs/cgroup/systemd/ → named hierarchy, which systemd uses to
manage its stuff (and which is mostly internal to systemd)

2) unified:

/sys/fs/cgroup/ → the unified hierarchy (i.e., note that this is one
dir further up, mounted in place of the tmpfs)

3) hybrid:

/sys/fs/cgroup/ → tmpfs
/sys/fs/cgroup/memory/ → memory controller hierarchy
/sys/fs/cgroup/cpu/ → cpu controller hierarchy

/sys/fs/cgroup/systemd/ → named hierarchy, for compatibility
/sys/fs/cgroup/unified/ → the unified hierarchy with no
controllers, which systemd uses to manage its stuff (and which is
mostly internal to systemd)
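The three layouts above can be told apart mechanically by looking at what is mounted where. Here is a small sketch of that logic; the mount data is synthetic, but on a real system it could come from /proc/self/mountinfo or `findmnt -t cgroup,cgroup2`:

```python
def cgroup_setup(mounts):
    """Classify a mount table as 'legacy', 'hybrid', or 'unified'.

    mounts: iterable of (mountpoint, fstype) pairs.
    """
    fstype = dict(mounts)
    if fstype.get("/sys/fs/cgroup") == "cgroup2":
        return "unified"   # v2 mounted directly on /sys/fs/cgroup
    if fstype.get("/sys/fs/cgroup/unified") == "cgroup2":
        return "hybrid"    # tmpfs + v1 controllers + extra v2 mount
    return "legacy"        # tmpfs + v1 controllers, no v2 anywhere

# Synthetic example tables for the three setups described above:
legacy_mounts = [
    ("/sys/fs/cgroup", "tmpfs"),
    ("/sys/fs/cgroup/memory", "cgroup"),
    ("/sys/fs/cgroup/cpu", "cgroup"),
    ("/sys/fs/cgroup/systemd", "cgroup"),
]
unified_mounts = [
    ("/sys/fs/cgroup", "cgroup2"),
]
hybrid_mounts = legacy_mounts + [
    ("/sys/fs/cgroup/unified", "cgroup2"),
]
```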

Now, supporting #1 and #2 definitely makes sense, as one is the old
setup and one the future setup. #3 is in most ways compatible with #1,
as the only change is the addition of one more hierarchy, while all
the old controller hierarchies remain in the same place. Or in
other words: systemd's private hierarchy changes, but everything else
stays at the exact same location.

Now, what you propose is similar neither to #1 nor to #3 (as the
controllers would be moved in part to the unified hierarchy), nor
to #2 (as the unified dir would not be /sys/fs/cgroup,
since then you'd have no place to mount the legacy controllers).

Hence, adding what you propose would only be a temporary stop-gap
that, at the moment of adding, would already clearly be on its way
out, and I am very conservative about adding support now for something
we know will not survive for long...

And then there's also the big issue: the cgroup code is complex enough
given that we need to support three different setups. I'd really
prefer if we didn't add even more to that. In fact, I am really
looking forward to the day we can drop all cgroup support besides the
unified one from our tree. We could delete *so much* code then! And
there's only one thing hackers prefer over writing code: deleting code... ;-)

Lennart
--
Lennart Poettering, Red Hat
Lennart Poettering
2017-11-13 09:10:55 UTC
Permalink
Post by Umut Tezduyar Lindskog
Post by Lennart Poettering
And then there's also the big issue: the cgroup code is complex enough
given that we need to support three different setups. I'd really
prefer if we'd not add even more to that. In fact, I am really looking
forward for the day we can drop all cgroup support besides the unified
one from our tree. We could delete *so much* code then! And there's
only one thing hackers prefer over writing code: deleting code... ;-)
I guess that day will come when all the controllers move to v2. What
do you know about the plans for moving all the control groups?
Are they all going to be moved? If so, will they all provide
roughly equivalent functionality? For example, I have noticed that I
cannot find an equivalent of "memory.max_usage_in_bytes" in the v2
memory controller. I am not sure everyone will happily jump
to the #2 (unified) setup any time soon, or whether they will still
want to use some parts of v1 in a unified fashion in the meantime.
As I understood Tejun, the major controllers will all be moved, but
some will be dropped entirely, for example the "devices" one
(since what it does is not precisely resource management but access
management, and it is mostly redundant, as seccomp plus carefully
picking how /dev is put together has the same effect).

The memory controller should already be fully moved over; systemd's
MemoryMax= should do the right thing.
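As a sketch of what that looks like in practice on the unified hierarchy: a unit with MemoryMax= set (the service name and binary path below are made up) gets its limit applied via the v2 memory.max attribute of its cgroup, per systemd.resource-control(5).

```ini
# mem-limited.service (hypothetical unit)
# On the unified hierarchy, systemd translates MemoryMax= into the
# unit cgroup's "memory.max" attribute.
[Service]
ExecStart=/usr/bin/my-daemon
MemoryMax=512M
```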

Lennart
--
Lennart Poettering, Red Hat