Discussion:
We are working on trying to scale up to > 1000 containers.
Daniel J Walsh
2013-06-18 13:11:05 UTC
One concern we have is what will happen to systemd if we start 1000 services
at boot.

systemctl start httpd_sandbox.target

For example.

Is there anything we can do to throttle the start of so many unit files,
or would systemd do something itself?

This will probably cause libvirt problems also.
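
For reference, the kind of setup described here could be modelled as a
template unit pulled in by a target; the unit names below (other than
httpd_sandbox.target) and the ExecStart= helper are assumptions for
illustration, not details from this thread:

  # /etc/systemd/system/httpd_sandbox@.service
  [Unit]
  Description=Sandboxed httpd instance %i
  PartOf=httpd_sandbox.target

  [Service]
  # hypothetical wrapper that launches one sandboxed httpd container
  ExecStart=/usr/local/bin/run-httpd-sandbox %i

  [Install]
  WantedBy=httpd_sandbox.target

  # enable one instance per container, then start them all in one go
  for i in $(seq 1 1000); do systemctl enable httpd_sandbox@$i.service; done
  systemctl start httpd_sandbox.target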
David Strauss
2013-06-18 21:17:37 UTC
We have machines with thousands of containers on them. The key for us
was understanding that we didn't need thousands of containers to run
after start-up; we needed thousands of containers to be *accessible*
after start-up. The vast majority of our containers use socket
activation or "resurrection" (a sort of
try-to-do-something-and-start-the-corresponding-service-on-failure
path we use).
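
A minimal sketch of the socket-activation pattern being described, with
the unit names, port, and launcher command as assumptions:

  # web-sandbox.socket
  [Socket]
  ListenStream=8080

  [Install]
  WantedBy=sockets.target

  # web-sandbox.service
  [Service]
  # hypothetical launcher; it must take over the listening socket
  # that systemd passes in (sd_listen_fds)
  ExecStart=/usr/local/bin/start-web-sandbox

At boot only the cheap .socket units are active; the heavy .service unit
is started on the first incoming connection, so thousands of containers
can be reachable without thousands of start-up jobs.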

That said, I'd be interested in allowing caps on parallelism in systemd.
Lennart Poettering
2013-06-20 19:23:30 UTC
Post by Daniel J Walsh
One concern we have is what will happen to systemd if we start 1000 services
at boot.
systemctl start httpd_sandbox.target
For example.
Is there anything we can do to throttle the start of so many unit files,
or would systemd do something itself?
So, we have rate limits on some things. We maintain per-service
rate limits, and a rate limit in the main event loop. However, that's
really just a last-resort thing. Basically, if the event loop spins more
often than 50,000 times per second we will just totally block execution
for 1s. So things get awfully slow when we do too much stuff, which
ensures we don't consume 100% CPU forever, and that's all.
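
If the per-service rate limit mentioned here is the start limit, it is
exposed in unit files as StartLimitInterval= and StartLimitBurst=; the
values below are just an example of tightening it so a unit refuses to
be started more than a few times in a window:

  [Service]
  StartLimitInterval=30s
  StartLimitBurst=5
  ExecStart=/usr/bin/some-daemon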

I have no experience with running this many services on a machine. I am
sure we can add various bits here and there to make sure things scale
nicely for this. But for that I'd really like some performance data
first, i.e. what actually really happens with the current code.

Also, let me get this right: this is about not overloading the kernel
with starting up too many processes at the same time? Is this really a
problem? I figured our kernel these days wouldn't have much problems
with loads like this...

We have a queue of jobs we need to execute. These jobs basically map to
processes we start. We could certainly add something that throttles
dispatching of this queue if we dispatch too many of them in a short
time. With such an approach we'd continue to run the main event loop as
normal, but simply pause processing of the job queue for a while.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Kok, Auke-jan H
2013-06-20 19:29:49 UTC
On Thu, Jun 20, 2013 at 12:23 PM, Lennart Poettering wrote:
Post by Lennart Poettering
Post by Daniel J Walsh
One concern we have is what will happen to systemd if we start 1000 services
at boot.
systemctl start httpd_sandbox.target
For example.
Is there anything we can do to throttle the start of so many unit files,
or would systemd do something itself?
So, we have rate limits on some things. We maintain per-service
rate limits, and a rate limit in the main event loop. However, that's
really just a last-resort thing. Basically, if the event loop spins more
often than 50,000 times per second we will just totally block execution
for 1s. So things get awfully slow when we do too much stuff, which
ensures we don't consume 100% CPU forever, and that's all.
I have no experience with running this many services on a machine. I am
sure we can add various bits here and there to make sure things scale
nicely for this. But for that I'd really like some performance data
first, i.e. what actually really happens with the current code.
I'd be very interested to see at least a bootchart and systemd-analyze
plot of this... please post them to share!
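
For anyone reproducing this, the relevant commands are:

  systemd-analyze time             # overall kernel/userspace start-up time
  systemd-analyze blame            # units sorted by how long they took to start
  systemd-analyze plot > boot.svg  # SVG timeline of all unit start-ups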

Auke
David Strauss
2013-06-24 16:55:10 UTC
In our case, the issue wasn't kernel process creation; it was the CPU
and I/O overhead of service start-up. At some point, the system gets
dominated by context-switching, and throughput suffers. For example,
you certainly wouldn't want the box to go into swap because of
start-up allocation spikes.
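
One way to bound the damage from such spikes, independent of throttling
start-up itself, is to put resource limits on the container services;
the numbers below are purely illustrative:

  [Service]
  CPUShares=256      # lower CPU weight relative to the default of 1024
  MemoryLimit=512M   # cap how much memory one container's start-up can consume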
David Strauss
2013-06-24 16:55:52 UTC
For example, you certainly wouldn't want the box to go into swap because of
start-up allocation spikes.
I should clarify: that's not a context-switching example. It's just
another case where throttling might help.

--
David Strauss
| ***@davidstrauss.net
| +1 512 577 5827 [mobile]
