[systemd-devel] Is it meant to be possible to set IO[Read|Write]BandwidthMax on a slice ?
Hadrien Grasland
2021-04-08 10:24:56 UTC
Hi everyone,

In a scenario where running benchmarks on dedicated hardware is not
possible, I'm trying to momentarily cap the I/O bandwidth used by
interactive user sessions while benchmarks are running, in order to
improve the stability of said benchmark's I/O performance.

In the following discussion, I'll focus on capping the read bandwidth of
/dev/sda for the sake of keeping my examples short, but if I can get
this to work, the idea would be to cap the read and write bandwidth of
all storage devices.

From
https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html,
I understand that I should be able to achieve the intended goal by...

* Running something like `systemctl set-property --runtime user.slice
IOReadBandwidthMax='/dev/sda 1M'` before the benchmark
* Running something like `systemctl set-property --runtime user.slice
IOReadBandwidthMax=` after the benchmark.
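
A minimal sketch of that set-then-reset sequence, assuming the cap actually takes effect on the chosen unit. The helper names `set_property_argv` and `io_read_cap` are hypothetical; only the `systemctl set-property --runtime` invocation comes from the commands above.

```python
import subprocess
from contextlib import contextmanager


def set_property_argv(unit, prop, value=""):
    """Build the systemctl command line; an empty value resets the property."""
    return ["systemctl", "set-property", "--runtime", unit, f"{prop}={value}"]


@contextmanager
def io_read_cap(unit, device, limit, run=subprocess.run):
    """Cap the read bandwidth of `device` for `unit`, then clear the cap on exit."""
    run(set_property_argv(unit, "IOReadBandwidthMax", f"{device} {limit}"), check=True)
    try:
        yield
    finally:
        # Runs even if the benchmark raises, so the cap never outlives it.
        run(set_property_argv(unit, "IOReadBandwidthMax"), check=True)
```

Usage would look like `with io_read_cap("user.slice", "/dev/sda", "1M"): run_benchmark()`, where `run_benchmark` stands for whatever launches the benchmark.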

However, this is not effective, as can be checked by running `hdparm
-t`, which still observes the full disk bandwidth.

I have tried the following variants:

* `systemd-run -p IOReadBandwidthMax='/dev/sda 1M' -t bash` works
(hdparm only sees 1 MB/s within the resulting shell)
* `systemd-run -t bash` followed by `systemctl set-property --runtime
<new unit> IOReadBandwidthMax='/dev/sda 1M'` also works (hdparm only
sees 1 MB/s).
* More specifically targeting individual users' slices
(user-<uid>.slice) doesn't work.

This looks like a cgroups or systemd bug to me, but I thought I would
cross-check with you before reporting this to my distribution's
bugtracker (my distro packages systemd 246, which is just below your
minimal version criterion for upstream bug reports).

Should I be able to set I/O bandwidth caps on a top-level slice like
user.slice, or is it expected that I can only do it on individual services?

Cheers,
Hadrien
Lennart Poettering
2021-04-08 14:11:23 UTC
Post by Hadrien Grasland
Hi everyone,
In a scenario where running benchmarks on dedicated hardware is not
possible, I'm trying to momentarily cap the I/O bandwidth used by
interactive user sessions while benchmarks are running, in order to improve
the stability of said benchmark's I/O performance.
Is this on cgroupsv1 or cgroupsv2?

IIRC there was some issue that the block io controller wasn't fully
recursive on cgroupsv1. It should work on cgroupsv2.

Lennart

--
Lennart Poettering, Berlin
Hadrien Grasland
2021-04-08 15:19:33 UTC
Post by Lennart Poettering
Post by Hadrien Grasland
Hi everyone,
In a scenario where running benchmarks on dedicated hardware is not
possible, I'm trying to momentarily cap the I/O bandwidth used by
interactive user sessions while benchmarks are running, in order to improve
the stability of said benchmark's I/O performance.
Is this on cgroupsv1 or cgroupsv2?
IIRC there was some issue that the block io controller wasn't fully
recursive on cgroupsv1. It should work on cgroupsv2.
This is on a hybrid cgroup configuration. I (perhaps mistakenly) assumed
that modern systemd (v246) will use the cgroups v2 hierarchy in that
case, even though cgroups v1 is still exposed for compatibility with
older apps.

Hadrien
Mantas Mikulėnas
2021-04-08 15:27:29 UTC
Post by Hadrien Grasland
Post by Lennart Poettering
Post by Hadrien Grasland
Hi everyone,
In a scenario where running benchmarks on dedicated hardware is not
possible, I'm trying to momentarily cap the I/O bandwidth used by
interactive user sessions while benchmarks are running, in order to improve
the stability of said benchmark's I/O performance.
Is this on cgroupsv1 or cgroupsv2?
IIRC there was some issue that the block io controller wasn't fully
recursive on cgroupsv1. It should work on cgroupsv2.
This is on a hybrid cgroup configuration. I (perhaps mistakenly) assumed
that modern systemd (v246) will use the cgroups v2 hierarchy in that
case, even though cgroups v1 is still exposed for compatibility with
older apps.
If e.g. the io controller is exposed through cgroups v1, as far as I know
it cannot be simultaneously used through cgroups v2, and vice versa.

(Hmm, wasn't there an option to choose which controllers to assign to v1
and which ones to v2?)
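
One way to see where the block I/O controller is attached for a given process is to read `/proc/<pid>/cgroup`, where each line has the form `hierarchy-ID:controller-list:path` and the v2 entry has hierarchy ID 0 with an empty controller list. The sketch below assumes that format; the function name and the sample text are illustrative and were not captured from the system discussed in this thread.

```python
def v1_hierarchies_with(controller, proc_cgroup_text):
    """Return the cgroup v1 paths on which `controller` is attached."""
    paths = []
    for line in proc_cgroup_text.splitlines():
        if not line.strip():
            continue
        hier_id, controllers, path = line.split(":", 2)
        # Hierarchy ID 0 is the v2 (unified) entry; everything else is v1.
        if hier_id != "0" and controller in controllers.split(","):
            paths.append(path)
    return paths


# Illustrative sample, shaped like a hybrid setup.
SAMPLE = """\
11:blkio:/user.slice
5:cpu,cpuacct:/user.slice
1:name=systemd:/user.slice/user-1000.slice/session-3.scope
0::/user.slice/user-1000.slice/session-3.scope
"""

blkio_paths = v1_hierarchies_with("blkio", SAMPLE)
```

A non-empty result for `blkio` (the v1 name of the controller that v2 calls `io`) means block I/O is being managed through a v1 hierarchy.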
--
Mantas Mikulėnas
Hadrien Grasland
2021-04-08 15:48:20 UTC
Post by Mantas Mikulėnas
Post by Hadrien Grasland
Post by Lennart Poettering
Post by Hadrien Grasland
Hi everyone,
In a scenario where running benchmarks on dedicated hardware is not
possible, I'm trying to momentarily cap the I/O bandwidth used by
interactive user sessions while benchmarks are running, in order to improve
the stability of said benchmark's I/O performance.
Is this on cgroupsv1 or cgroupsv2?
IIRC there was some issue that the block io controller wasn't fully
recursive on cgroupsv1. It should work on cgroupsv2.
This is on a hybrid cgroup configuration. I (perhaps mistakenly) assumed
that modern systemd (v246) will use the cgroups v2 hierarchy in that
case, even though cgroups v1 is still exposed for compatibility with
older apps.
If e.g. the io controller is exposed through cgroups v1, as far as I
know it cannot be simultaneously used through cgroups v2, and vice versa.
(Hmm, wasn't there an option to choose which controllers to assign to
v1 and which ones to v2?)
Ah yes indeed, I missed this important bit when I last read
https://systemd.io/CGROUP_DELEGATION/ :

"Note that in [hybrid] mode the unified hierarchy won’t have controllers
attached, the controllers are all mounted as separate hierarchies as in
legacy mode, i.e. `/sys/fs/cgroup/unified/` is purely and exclusively
about core cgroup v2 functionality and not about resource management."

So unless your parenthesized comment is correct (which would mean that
the above doc is out of date), I must use pure unified (cgroups v2) mode
to do what I want with I/O. Except I can't switch to it, because that
would break Slurm's cgroups support, which I need for a couple of other
things: it seems Slurm only supports v1 at the moment
(https://groups.google.com/g/slurm-users/c/z57-Z3Tz0Oc?pli=1). Hmmm...

I'd appreciate any confirmation or refutation of the ability to adjust
controller assignment in a fine-grained way; otherwise it seems I'll need
to live without I/O caps until Slurm implements cgroup v2 support.

Hadrien
Lennart Poettering
2021-04-08 16:08:50 UTC
Post by Hadrien Grasland
Post by Lennart Poettering
Post by Hadrien Grasland
Hi everyone,
In a scenario where running benchmarks on dedicated hardware is not
possible, I'm trying to momentarily cap the I/O bandwidth used by
interactive user sessions while benchmarks are running, in order to improve
the stability of said benchmark's I/O performance.
Is this on cgroupsv1 or cgroupsv2?
IIRC there was some issue that the block io controller wasn't fully
recursive on cgroupsv1. It should work on cgroupsv2.
This is on a hybrid cgroup configuration. I (perhaps mistakenly) assumed
that modern systemd (v246) will use the cgroups v2 hierarchy in that case,
even though cgroups v1 is still exposed for compatibility with older apps.
No, hybrid mode means all controllers operate in cgroupsv1 mode. It's
just that the controller-less cgroupsv2 hierarchy is also mounted.
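
The three layouts can be told apart from the mount table. This is a rough sketch that assumes the mount points described in https://systemd.io/CGROUP_DELEGATION/ (`/sys/fs/cgroup` as cgroup2 in unified mode, `/sys/fs/cgroup/unified` as cgroup2 in hybrid mode); it is not systemd's own detection code, and `cgroup_layout` is a hypothetical name.

```python
def cgroup_layout(mounts_text):
    """Classify the cgroup setup from text in /proc/mounts format."""
    fstype_at = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, filesystem type, options, ...
            fstype_at[fields[1]] = fields[2]
    if fstype_at.get("/sys/fs/cgroup") == "cgroup2":
        return "unified"
    if fstype_at.get("/sys/fs/cgroup/unified") == "cgroup2":
        # v2 is mounted, but the controllers live on separate v1 hierarchies.
        return "hybrid"
    if "cgroup" in fstype_at.values():
        return "legacy"
    return "unknown"
```

On a live system this could be called as `cgroup_layout(open("/proc/mounts").read())`.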

Hybrid mode was a mistake; we should never have added it. It's just
a massive maintenance burden for little gain.

Lennart

--
Lennart Poettering, Berlin
Hadrien Grasland
2021-04-08 16:23:12 UTC
Post by Lennart Poettering
Post by Hadrien Grasland
Post by Lennart Poettering
Post by Hadrien Grasland
Hi everyone,
In a scenario where running benchmarks on dedicated hardware is not
possible, I'm trying to momentarily cap the I/O bandwidth used by
interactive user sessions while benchmarks are running, in order to improve
the stability of said benchmark's I/O performance.
Is this on cgroupsv1 or cgroupsv2?
IIRC there was some issue that the block io controller wasn't fully
recursive on cgroupsv1. It should work on cgroupsv2.
This is on a hybrid cgroup configuration. I (perhaps mistakenly) assumed
that modern systemd (v246) will use the cgroups v2 hierarchy in that case,
even though cgroups v1 is still exposed for compatibility with older apps.
No, hybrid mode means all controllers operate in cgroupsv1 mode. It's
just that the controller-less cgroupsv2 hierarchy is also mounted.
Hybrid mode was a mistake; we should never have added it. It's just
a massive maintenance burden for little gain.
I see, thanks for the clarification.

I also checked that setting block I/O caps on a session-xyz.scope (which
is the last systemd hierarchy node above the interactive processes that
I want to cap) does indeed work, which is consistent with your earlier
recollection that the cgroups v1 block I/O controller is not fully
recursive.

This gives me an idea for a dirty workaround to handle my use case
until Slurm gets its cgroups v2 support in shape: "just" write a
program that parses the output of systemd-cgls, enumerates every
last-level unit below user.slice, and runs systemctl set-property on
each of them. That's both ugly and TOCTOU-racy (it won't handle newly
spawned units, and must account for units that go away), but might work
well enough for my purposes...
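
A rough sketch of such a program follows. Instead of parsing systemd-cgls output, it walks the cgroup directory tree, which carries the same hierarchy. It assumes that cgroup directory names match unit names and that scopes and services are the last-level units (anything below them is not descended into). The function names and the path in the usage note are assumptions.

```python
import os


def leaf_units(slice_dir):
    """Yield the names of scope and service units found below `slice_dir`."""
    try:
        entries = sorted(os.scandir(slice_dir), key=lambda e: e.name)
    except FileNotFoundError:
        return  # the slice went away while we were walking (TOCTOU)
    for entry in entries:
        if not entry.is_dir(follow_symlinks=False):
            continue
        if entry.name.endswith(".slice"):
            # Slices only group other units, so keep descending.
            yield from leaf_units(entry.path)
        elif entry.name.endswith((".scope", ".service")):
            # Last-level unit: report it and do not look inside.
            yield entry.name


def cap_commands(units, device, limit):
    """Build one `systemctl set-property` command line per unit."""
    return [
        ["systemctl", "set-property", "--runtime", unit,
         f"IOReadBandwidthMax={device} {limit}"]
        for unit in units
    ]
```

On a hybrid system this might be driven by `cap_commands(leaf_units("/sys/fs/cgroup/systemd/user.slice"), "/dev/sda", "1M")`, running each command with `subprocess.run` and tolerating failures for units that have exited in the meantime. Units spawned after the walk are still missed, so the race noted above remains.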

Hadrien
