[systemd-devel] Setting up network interfaces for containers with --private-network

Discussion:

Spencer Baugh

2015-04-20 19:25:45 UTC

Hi,

Currently, I can manually set up (or set up with a script) a veth, then
move it in to a systemd-nspawn container with
--network-interface. However, if the container tries to restart (or
exits and needs to be restarted), the network namespace of the container
is destroyed and therefore so is the veth that that namespace
contains. Thus, if the interface isn't recreated by some external force
in the time between stopping and restarting, the next invocation of
systemd-nspawn --network-interface=someif will fail.

To state the problem again more concretely, if I create a veth, assign
one end of the veth to a container started with systemd-nspawn
--network-interface=veth0, then run "reboot" inside the container, the
container will shut down and fail to come back up, as veth0 is
destroyed.

Perhaps, I could hack up some shell script to wrap system-nspawn and
make sure that whenever I run it, an appropriate network interface is
created before actually running systemd-nspawn
--network-interface=someif, but I don't think that's really the best
approach.

I could use --network-bridge on a bridge and get the veth made
automatically, as long as all I wanted to do was add the other end of
the veth to a bridge, and it would be remade whenever the container
restarted. But I think this might be the wrong place for this magic to
live; it's not very configurable and seems rather strange to have
systemd-nspawn doing ad-hoc networking tasks like this.

I'm curious about how this should be done; it seems very important for
serious use of containers. Perhaps networkd could be used to do
something here? What is the best practice?

Thanks,
Spencer Baugh

Lennart Poettering

2015-04-20 23:43:46 UTC

Permalink

Post by Spencer Baugh
Hi,
Currently, I can manually set up (or set up with a script) a veth, then
move it in to a systemd-nspawn container with
--network-interface. However, if the container tries to restart (or
exits and needs to be restarted), the network namespace of the container
is destroyed and therefore so is the veth that that namespace
contains. Thus, if the interface isn't recreated by some external force
in the time between stopping and restarting, the next invocation of
systemd-nspawn --network-interface=someif will fail.
To state the problem again more concretely, if I create a veth, assign
one end of the veth to a container started with systemd-nspawn
--network-interface=veth0, then run "reboot" inside the container, the
container will shut down and fail to come back up, as veth0 is
destroyed.
Perhaps, I could hack up some shell script to wrap system-nspawn and
make sure that whenever I run it, an appropriate network interface is
created before actually running systemd-nspawn
--network-interface=someif, but I don't think that's really the best
approach.
I could use --network-bridge on a bridge and get the veth made
automatically, as long as all I wanted to do was add the other end of
the veth to a bridge, and it would be remade whenever the container
restarted. But I think this might be the wrong place for this magic to
live; it's not very configurable and seems rather strange to have
systemd-nspawn doing ad-hoc networking tasks like this.
I'm curious about how this should be done; it seems very important for
serious use of containers. Perhaps networkd could be used to do
something here? What is the best practice?

So far I'd recommend running networkd on the host and in the
container. If you run it on the host, then it will automatically
configure the hos side of each of nspawn's veth links with a new IP
range, and be a DHCP server on it, as well as do IP
masquerading. Connectivity will hence "just work", if you use networkd
in most cases.

Of course, if you want to manually set up things, without networkd or
an equivalent service, then a lot of things will be more manual: one
way could be to add some script to ExecStartPre= of the service to set
things up for you each time you start the container.

Another option could be to use write a service that sets up the
interface, uses PrivateNetwork= and then use JoinsNamespaceOf= on the
container service towards that service, and turn off nspawn's own
private networking switch. That way PID1 would already set up the
joint namespace for your container, and ensure it is set up properly
by your setup service. And as long as either the setup service or the
container is running the network namespace will stay referenced.

Lennart

--
Lennart Poettering, Red Hat

Spencer Baugh

2015-04-21 02:50:49 UTC

Permalink

Post by Lennart Poettering
So far I'd recommend running networkd on the host and in the
container. If you run it on the host, then it will automatically
configure the hos side of each of nspawn's veth links with a new IP
range, and be a DHCP server on it, as well as do IP
masquerading. Connectivity will hence "just work", if you use networkd
in most cases.

This is in the case where I use --network-bridge, right? Because
otherwise there is no veth to be automatically configured.

Yes, in that case, it is of course very simple, but it is not at all
configurable. I have one thing and one thing only that I want to
configure: The IP address that a given container receives. This seems
like a reasonable thing to want to configure; ultimately there have to
be fixed IP addresses somewhere, and I have a use for containers that
would benefit from having fixed IP addresses.

The way I currently fix the IP address that the container receives is by
fixing the MAC address of the veth; since I am using IPv6 and radvd, the
IP address is deterministically generated from the MAC address. So it
would be helpful if there was a way to do fix the MAC address in
nspawn. Would you accept a patch to add an option to nspawn to specify a
MAC address for the veth? Or is there a better way to go about this?

Post by Lennart Poettering
Of course, if you want to manually set up things, without networkd or
an equivalent service, then a lot of things will be more manual: one
way could be to add some script to ExecStartPre= of the service to set
things up for you each time you start the container.
Another option could be to use write a service that sets up the
interface, uses PrivateNetwork= and then use JoinsNamespaceOf= on the
container service towards that service, and turn off nspawn's own
private networking switch. That way PID1 would already set up the
joint namespace for your container, and ensure it is set up properly
by your setup service. And as long as either the setup service or the
container is running the network namespace will stay referenced.

Hmm, that is an interesting approach... it would be nice to be able to
have networkd be setting up the interface here, though.

I am interested in using networkd to do these things, but at the moment
it doesn't seem to have the required level of power.

Lennart Poettering

2015-04-21 11:02:17 UTC