Discussion:
systemd-nspawn and IPv6
Kai Krakow
2015-04-26 14:50:37 UTC
Hello!

I've successfully created a Gentoo container on top of a Gentoo host.
I can start the container with machinectl. I can also log in using
SSH. So mission almost accomplished (it should become a template for
easy vserver cloning).

But from within the IPv6-capable container I cannot reach the IPv6
outside world. Name resolution via IPv6 fails, as does pinging IPv6
hosts. It looks like systemd-nspawn only sets up IPv4 routes beyond my
gateway boundary; IPv6 does not work.

I may be missing kernel options or some setup. But before poking around
blindly, I'd like to ask if there's a known problem with systemd-nspawn or
known configuration caveats.

Here's the service file (modified to bind the portage and src tree):

# /etc/systemd/system/systemd-***@gentoo\x2dcontainer\x2dbase.service
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.

[Unit]
Description=Container %I
Documentation=man:systemd-nspawn(1)
PartOf=machines.target
Before=machines.target

[Service]
ExecStart=/usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --network-veth --machine=%I --bind=/usr/portage --bind-ro=/usr/src
KillMode=mixed
Type=notify
RestartForceExitStatus=133
SuccessExitStatus=133
Delegate=yes
MemoryLimit=4G

[Install]
WantedBy=machines.target
--
Replies to list only preferred.
Lennart Poettering
2015-04-27 14:01:54 UTC
Post by Kai Krakow
Hello!
I've successfully created a Gentoo container on top of a Gentoo host.
I can start the container with machinectl. I can also log in using
SSH. So mission almost accomplished (it should become a template for
easy vserver cloning).
But from within the IPv6-capable container I cannot reach the IPv6
outside world. Name resolution via IPv6 fails, as does pinging IPv6
hosts. It looks like systemd-nspawn only sets up IPv4 routes beyond my
gateway boundary; IPv6 does not work.
Well, networkd on the host automatically sets up IPv4 masquerading for
each container. We simply don't do anything equivalent for IPv6
currently.

Ideally we wouldn't have to do NAT for IPv6 to make this work, and
would instead pass on some IPv6 subnet acquired from the uplink,
without NAT, to each container. But we currently don't have the
infrastructure for that in networkd, and I am not even sure how this
could really work; my IPv6-fu is a bit too limited...

Or maybe we should do IPv6 NAT after all, under the logic that
containers are just an implementation detail of the local host rather
than something to be made visible to the outside world. However, code
for this does not exist either.

Or in other words: IPv6 currently needs some manual networking setup
on the host.

Lennart
--
Lennart Poettering, Red Hat
Dimitri John Ledkov
2015-04-27 14:44:45 UTC
Post by Lennart Poettering
Post by Kai Krakow
Hello!
I've successfully created a Gentoo container on top of a Gentoo host.
I can start the container with machinectl. I can also log in using
SSH. So mission almost accomplished (it should become a template for
easy vserver cloning).
But from within the IPv6-capable container I cannot reach the IPv6
outside world. Name resolution via IPv6 fails, as does pinging IPv6
hosts. It looks like systemd-nspawn only sets up IPv4 routes beyond my
gateway boundary; IPv6 does not work.
Well, networkd on the host automatically sets up IPv4 masquerading for
each container. We simply don't do anything equivalent for IPv6
currently.
Ideally we wouldn't have to do NAT for IPv6 to make this work, and
would instead pass on some IPv6 subnet acquired from the uplink,
without NAT, to each container. But we currently don't have the
infrastructure for that in networkd, and I am not even sure how this
could really work; my IPv6-fu is a bit too limited...
Or maybe we should do IPv6 NAT after all, under the logic that
containers are just an implementation detail of the local host rather
than something to be made visible to the outside world. However, code
for this does not exist either.
Or in other words: IPv6 currently needs some manual networking setup
on the host.
One should roll the dice and generate a unique local address /48
prefix, then use that to set up local addressing, ideally with
autoconfiguration (e.g. derive a fake MAC from the container UUID and
auto-assign an IPv6 address from the host's ULA prefix).

For giggles see http://unique-local-ipv6.com/
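Dimitri's dice roll can be sketched in shell. This is a simplified illustration in the spirit of RFC 4193, not the full algorithm (which hashes a timestamp and an EUI-64 identifier); ula_prefix is a made-up helper name:

```shell
#!/usr/bin/env bash
# Roll the dice: build a ULA /48 prefix (fdxx:xxxx:xxxx::/48) from
# 40 random bits, in the spirit of RFC 4193. Simplified sketch only.
ula_prefix() {
    # 5 random bytes -> 10 lowercase hex digits
    local h
    h=$(head -c 5 /dev/urandom | od -An -tx1 | tr -d ' \n')
    # fd (ULA, locally assigned) followed by the 40-bit global ID
    echo "fd${h:0:2}:${h:2:4}:${h:6:4}::/48"
}

ula_prefix
```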
--
Regards,

Dimitri.
Pura Vida!

https://clearlinux.org
Open Source Technology Center
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ.
Lennart Poettering
2015-04-27 14:56:18 UTC
Post by Dimitri John Ledkov
Post by Lennart Poettering
Well, networkd on the host automatically sets up IPv4 masquerading for
each container. We simply don't do anything equivalent for IPv6
currently.
Ideally we wouldn't have to do NAT for IPv6 to make this work, and
would instead pass on some IPv6 subnet acquired from the uplink,
without NAT, to each container. But we currently don't have the
infrastructure for that in networkd, and I am not even sure how this
could really work; my IPv6-fu is a bit too limited...
Or maybe we should do IPv6 NAT after all, under the logic that
containers are just an implementation detail of the local host rather
than something to be made visible to the outside world. However, code
for this does not exist either.
Or in other words: IPv6 currently needs some manual networking setup
on the host.
One should roll the dice and generate a unique local address /48
prefix, then use that to set up local addressing, ideally with
autoconfiguration (e.g. derive a fake MAC from the container UUID and
auto-assign an IPv6 address from the host's ULA prefix).
Well, would that enable automatic, correct routing between the
container and the host's external network? That's kinda what this all
is about...

Lennart
--
Lennart Poettering, Red Hat
Tomasz Torcz
2015-04-27 14:59:51 UTC
Post by Lennart Poettering
Post by Dimitri John Ledkov
Post by Lennart Poettering
Well, networkd on the host automatically sets up IPv4 masquerading for
each container. We simply don't do anything equivalent for IPv6
currently.
Ideally we wouldn't have to do NAT for IPv6 to make this work, and
would instead pass on some IPv6 subnet acquired from the uplink,
without NAT, to each container. But we currently don't have the
infrastructure for that in networkd, and I am not even sure how this
could really work; my IPv6-fu is a bit too limited...
Or maybe we should do IPv6 NAT after all, under the logic that
containers are just an implementation detail of the local host rather
than something to be made visible to the outside world. However, code
for this does not exist either.
Or in other words: IPv6 currently needs some manual networking setup
on the host.
One should roll the dice and generate a unique local address /48
prefix, then use that to set up local addressing, ideally with
autoconfiguration (e.g. derive a fake MAC from the container UUID and
auto-assign an IPv6 address from the host's ULA prefix).
Well, would that enable automatic, correct routing between the
container and the host's external network? That's kinda what this all
is about...
If you have radvd running, it should. By the way, speaking of NAT in
the context of IPv6 is heresy.
--
Tomasz Torcz "God, root, what's the difference?"
xmpp: ***@chrome.pl "God is more forgiving."
Kai Krakow
2015-04-27 18:17:03 UTC
Post by Tomasz Torcz
Post by Lennart Poettering
Well, would that enable automatic, correct routing between the
container and the host's external network? That's kinda what this all
is about...
If you have radvd running, it should. By the way, speaking of NAT in
the context of IPv6 is heresy.
Why? Its purpose here is not to save addresses (we have plenty in
IPv6); its purpose is security and containment. The services provided
by the container - at least in my project - are meant to be seen as
services of the host (as Lennart pointed out as a possible application
in another post). I don't want the containers to be addressable or
routable from the outside in. And relying on a firewall to compensate
is just security by obscurity: one configuration problem and your
firewall is gone and the container is publicly reachable.

The whole story would be different if I were to set up port forwarding
afterwards to make services from the containers available - but that
won't be the case.

Each container has to be in its own private network (or grouped into a
private network with selected other containers). Only gateway services
on the host system (like a web proxy) are allowed to talk to the
containers.
--
Replies to list only preferred.
Lennart Poettering
2015-04-27 19:45:01 UTC
Post by Kai Krakow
Post by Tomasz Torcz
Post by Lennart Poettering
Well, would that enable automatic, correct routing between the
container and the host's external network? That's kinda what this all
is about...
If you have radvd running, it should. By the way, speaking of NAT in
the context of IPv6 is heresy.
Why? Its purpose here is not to save addresses (we have plenty in
IPv6); its purpose is security and containment. The services provided
by the container - at least in my project - are meant to be seen as
services of the host (as Lennart pointed out as a possible application
in another post). I don't want the containers to be addressable or
routable from the outside in. And relying on a firewall to compensate
is just security by obscurity: one configuration problem and your
firewall is gone and the container is publicly reachable.
The whole story would be different if I were to set up port forwarding
afterwards to make services from the containers available - but that
won't be the case.
Sidenote: systemd-nspawn already covers that for IPv4: use the --port=
switch (or -p).
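For illustration, a minimal invocation (machine name and ports are made up; --port only applies together with private networking such as --network-veth):

```shell
# Forward host TCP port 8080 to port 80 inside the container.
# Requires private networking; the forwarding is IPv4-only.
systemd-nspawn --boot --machine=mycontainer --network-veth --port=tcp:8080:80
```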

Lennart
--
Lennart Poettering, Red Hat
Kai Krakow
2015-04-27 21:06:46 UTC
Post by Lennart Poettering
Post by Kai Krakow
Post by Tomasz Torcz
Post by Lennart Poettering
Well, would that enable automatic, correct routing between the
container and the host's external network? That's kinda what this all
is about...
If you have radvd running, it should. By the way, speaking of NAT in
the context of IPv6 is heresy.
Why? Its purpose here is not to save addresses (we have plenty in
IPv6); its purpose is security and containment. The services provided
by the container - at least in my project - are meant to be seen as
services of the host (as Lennart pointed out as a possible application
in another post). I don't want the containers to be addressable or
routable from the outside in. And relying on a firewall to compensate
is just security by obscurity: one configuration problem and your
firewall is gone and the container is publicly reachable.
The whole story would be different if I were to set up port forwarding
afterwards to make services from the containers available - but that
won't be the case.
Sidenote: systemd-nspawn already covers that for IPv4: use the --port=
switch (or -p).
Yes, I know... And I will certainly find a use-case for that. :-)

But the general design of my project is to put containers behind a
reverse proxy like nginx or varnish, set up some caching and WAF
rules, and dynamically point incoming web requests to the right
container servicing the right environment. :-)

I will probably pull performance data through such a port forwarding.
But for now the testbed is only my desktop system; some months will
pass before deploying this on a broader basis, it will certainly not
start with IPv6 support (though that will be kept in mind), and I
still have a lot of ideas to try out.

I won't even need IPv6 to pass into the host from external networks,
because a proxy will sit in between. But it would be nice if
containers could use IPv6 from the inside without having to worry
about packets passing in through a public routing rule. I don't like
pulling up a firewall before everything is settled, tested, and
secured. A firewall is only a last-resort barrier. The same holds true
for stuff like fail2ban or denyhosts.

For the time being, I should simply turn off IPv6 inside the
container. However, I haven't figured out how to prevent
systemd-networkd inside the container from configuring it.
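For what it's worth, one hedged guess at such a config: overriding the container's stock host0 setup from inside. The file name and option availability depend on the networkd version, so treat this as a sketch to verify against systemd.network(5):

```ini
# /etc/systemd/network/80-container-host0.network (inside the container;
# overrides the stock file of the same name shipped with systemd)
[Match]
Virtualization=container
Name=host0

[Network]
DHCP=ipv4
# Restrict link-local addressing to IPv4 so no IPv6 link-local
# address is configured on host0.
LinkLocalAddressing=ipv4
```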
--
Replies to list only preferred.
Dimitri John Ledkov
2015-04-27 15:09:28 UTC
Post by Lennart Poettering
Post by Dimitri John Ledkov
Post by Lennart Poettering
Well, networkd on the host automatically sets up IPv4 masquerading for
each container. We simply don't do anything equivalent for IPv6
currently.
Ideally we wouldn't have to do NAT for IPv6 to make this work, and
would instead pass on some IPv6 subnet acquired from the uplink,
without NAT, to each container. But we currently don't have the
infrastructure for that in networkd, and I am not even sure how this
could really work; my IPv6-fu is a bit too limited...
Or maybe we should do IPv6 NAT after all, under the logic that
containers are just an implementation detail of the local host rather
than something to be made visible to the outside world. However, code
for this does not exist either.
Or in other words: IPv6 currently needs some manual networking setup
on the host.
One should roll the dice and generate a unique local address /48
prefix, then use that to set up local addressing, ideally with
autoconfiguration (e.g. derive a fake MAC from the container UUID and
auto-assign an IPv6 address from the host's ULA prefix).
Well, would that enable automatic, correct routing between the
container and the host's external network? That's kinda what this all
is about...
Yes... that is, the host needs to be assigned a subnet and an IP from
the /48, and the containers routed via that host IP.

Or "simply" (aka "expensively") run radvd on the host for the
containers to do all of that (route & ULA prefix advertisement, and
therefore complete auto-configuration).
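The "expensive" radvd option could look like this minimal fragment (interface name and ULA prefix are made-up placeholders; consult radvd.conf(5) before use):

```
# /etc/radvd.conf - advertise a ULA prefix on the container's veth
interface ve-mycontainer
{
    AdvSendAdvert on;
    prefix fd42:1122:3344::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
};
```

The container would then autoconfigure an address from fd42:1122:3344::/64 and a default route via the host side of the veth.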
--
Regards,

Dimitri.
Pura Vida!

https://clearlinux.org
Open Source Technology Center
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ.
Kai Krakow
2015-04-27 18:25:29 UTC
Post by Lennart Poettering
Post by Dimitri John Ledkov
Post by Lennart Poettering
Well, networkd on the host automatically sets up IPv4 masquerading for
each container. We simply don't do anything equivalent for IPv6
currently.
Ideally we wouldn't have to do NAT for IPv6 to make this work, and
would instead pass on some IPv6 subnet acquired from the uplink,
without NAT, to each container. But we currently don't have the
infrastructure for that in networkd, and I am not even sure how this
could really work; my IPv6-fu is a bit too limited...
Or maybe we should do IPv6 NAT after all, under the logic that
containers are just an implementation detail of the local host rather
than something to be made visible to the outside world. However, code
for this does not exist either.
Or in other words: IPv6 currently needs some manual networking setup
on the host.
One should roll the dice and generate a unique local address /48
prefix, then use that to set up local addressing, ideally with
autoconfiguration (e.g. derive a fake MAC from the container UUID and
auto-assign an IPv6 address from the host's ULA prefix).
Well, would that enable automatic, correct routing between the
container and the host's external network? That's kinda what this all
is about...
My IPv6-fu is in apprentice mode, too. But my first guess would be:
no. Local addresses are not routed, AFAIK. So I need a global scope
address (and for my use-case I don't want that), or it has to go
through NAT.

You said you don't set up IPv6 masquerading yet. My first guess was
that I may have forgotten to enable IPv6 NAT support in the kernel.
I'll check that. Along with that, I'm eager to read about a proper,
official solution within systemd-nspawn here.
--
Replies to list only preferred.
Jörg Thalheim
2015-04-29 06:15:33 UTC
Post by Lennart Poettering
Well, would that enable automatic, correct routing between the
container and the host's external network? That's kinda what this all
is about...
Lennart
In case we know which interface provides the external network, it is
also possible to use proxy NDP to give containers routable IPs:

sysctl -w net.ipv6.conf.<if>.proxy_ndp=1
ip -6 neigh add proxy <ip> dev <if>

where <if> is the external interface and <ip> is the container IP.
Proxy NDP will reply with a Neighbor Advertisement on the interface in
question if somebody has sent a Neighbor Solicitation message for an
added IP (similar to ARP requests/responses).

To give a container an IP from the subnet advertised on the external
interface, it would be necessary to proxy router advertisements
between the external interface and the bridge (or veth pair). AFAIK
there is no such proxy for router advertisements, so it would be
necessary to bridge the external interface with the bridge (or the
host side of the veth pair), which would break the isolation between
the external and internal networks. (Maybe somebody has a better
solution on how to get an IP via router advertisement.)

The cool thing about having one routable IP per container is that you
no longer have conflicts over port numbers and can actually bind port
80 multiple times, for example. About security concerns: why not
whitelist the ports forwarded from outside with the --port parameter
of systemd-nspawn and block everything else? The only thing port
forwarding hides is the real destination IP; an address does not
expose whether it is assigned to the host, a container, or even a
completely different host that uses this host as a router. Also,
private addresses would still require NATing traffic from the
container to the external network, which requires a lot of nasty
protocol hacks (IPsec, FTP, SIP, ...).

Ideally, nspawn could rely on a service which would either forward
router advertisements from an external interface or fall back to
private addresses in case the host does not have an external network
configured.

About ULA addresses (fd00::/8): couldn't these be generated using the
machine-id of a container (does every container have a machine-id? is
the MAC address stable for containers)? We have 128 - 8 = 120 bits
free; however, it is recommended not to use the whole ULA address
space but to limit it to a prefix. What also always works is using
link-local addresses (not for the containers, because link-local
addresses are awkward to type since they always need to include the
interface, but they could be used for the host part of the default
gateway).
Alexander E. Patrakov
2015-04-29 07:05:33 UTC
Post by Jörg Thalheim
Post by Lennart Poettering
Well, would that enable automatic, correct routing between the
container and the host's external network? That's kinda what this all
is about...
Lennart
In case we know which interface provides the external network, it is
also possible to use proxy NDP to give containers routable IPs:
sysctl -w net.ipv6.conf.<if>.proxy_ndp=1
ip -6 neigh add proxy <ip> dev <if>
where <if> is the external interface and <ip> is the container IP.
Proxy NDP will reply with a Neighbor Advertisement on the interface in
question if somebody has sent a Neighbor Solicitation message for an
added IP (similar to ARP requests/responses).
To give a container an IP from the subnet advertised on the external
interface, it would be necessary to proxy router advertisements
between the external interface and the bridge (or veth pair). AFAIK
there is no such proxy for router advertisements, so it would be
necessary to bridge the external interface with the bridge (or the
host side of the veth pair), which would break the isolation between
the external and internal networks. (Maybe somebody has a better
solution on how to get an IP via router advertisement.)
Such a proxy exists; it is part of odhcpd, which is used in OpenWRT.

https://github.com/sbyx/odhcpd
--
Alexander E. Patrakov
Lennart Poettering
2015-04-29 09:36:54 UTC
Post by Jörg Thalheim
About ULA addresses (fd00::/8): couldn't these be generated using
the machine-id of a container (does every container have a
machine-id? is the MAC address stable for containers)?
We generate the MAC addresses for containers from hashes of the
container name. They are hence stable as long as the container name is
stable.
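The scheme can be approximated in shell as a sketch; mac_from_name is a made-up helper and this is not systemd's actual hash function, but it shows the stable-derivation idea, with the locally-administered bit set and the multicast bit cleared in the first octet:

```shell
#!/usr/bin/env bash
# Derive a stable, locally-administered unicast MAC from a container
# name. Illustrative sketch only - systemd's real derivation differs.
mac_from_name() {
    local h b1
    h=$(printf '%s' "$1" | sha256sum | cut -c1-12)   # 48 hash bits
    b1=$(( (0x${h:0:2} & 0xfe) | 0x02 ))             # unicast, locally administered
    printf '%02x:%s:%s:%s:%s:%s\n' \
        "$b1" "${h:2:2}" "${h:4:2}" "${h:6:2}" "${h:8:2}" "${h:10:2}"
}

mac_from_name mycontainer
```

Since the input is only the name, the same name always yields the same MAC, which is what makes the addresses stable across container restarts.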

Lennart
--
Lennart Poettering, Red Hat
Kai Krakow
2015-04-27 18:08:08 UTC
Post by Lennart Poettering
Post by Kai Krakow
Hello!
I've successfully created a Gentoo container on top of a Gentoo host.
I can start the container with machinectl. I can also log in using
SSH. So mission almost accomplished (it should become a template for
easy vserver cloning).
But from within the IPv6-capable container I cannot reach the IPv6
outside world. Name resolution via IPv6 fails, as does pinging IPv6
hosts. It looks like systemd-nspawn only sets up IPv4 routes beyond my
gateway boundary; IPv6 does not work.
Well, networkd on the host automatically sets up IPv4 masquerading for
each container. We simply don't do anything equivalent for IPv6
currently.
So it was a good idea to ask before poking around... ;-)
Post by Lennart Poettering
Ideally we wouldn't have to do NAT for IPv6 to make this work, and
would instead pass on some IPv6 subnet acquired from the uplink,
without NAT, to each container. But we currently don't have the
infrastructure for that in networkd, and I am not even sure how this
could really work; my IPv6-fu is a bit too limited...
Or maybe we should do IPv6 NAT after all, under the logic that
containers are just an implementation detail of the local host rather
than something to be made visible to the outside world. However, code
for this does not exist either.
Well, my expectation would be to have NAT for IPv6 here. Why should
IPv4 private addresses be NATed by default but not IPv6 private
addresses?

The obvious expectation would be that "it just works." If I wanted
routable IPv4, I'd configure that. If I wanted routable IPv6, I'd do
that, too. But it'd be pretty surprising to have IPv4 NAT but public
IPv6 access because radvd propagated a routable address. That could
also become a surprise security problem.

So I suggest that, by default, both protocols behave the same.

For my project, IPv6 is currently not a requirement, but it is planned
as a future improvement. I just wanted to test it out. So for now I
could fall back to switching off IPv6 in the container, though it's
also not obvious how to do that. It's probably done by putting some
config in /etc/systemd/network within the container.
Post by Lennart Poettering
Or in other words: IPv6 currently needs some manual networking setup
on the host.
Or there... Any pointers?

Thanks,
Kai
--
Replies to list only preferred.
Lennart Poettering
2015-04-27 19:42:55 UTC
Post by Kai Krakow
Post by Lennart Poettering
Or in other words: IPv6 currently needs some manual networking setup
on the host.
Or there... Any pointers?
Not really. You have to set up IPv6 masquerading with ip6tables, and
ensure the containers get IPv6 addresses that are stable enough that
you can refer to them from the ip6tables rules...
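A sketch of that manual setup, under stated assumptions: external interface eth0, host-side veth ve-mycontainer, a made-up ULA prefix fd42:1122:3344::/64, and a kernel with IPv6 NAT support (3.7+ with CONFIG_NF_NAT_IPV6):

```shell
# Enable IPv6 forwarding and masquerade the container's ULA subnet
# out of the external interface.
sysctl -w net.ipv6.conf.all.forwarding=1
ip -6 addr add fd42:1122:3344::1/64 dev ve-mycontainer
ip6tables -t nat -A POSTROUTING -s fd42:1122:3344::/64 -o eth0 -j MASQUERADE
```

Inside the container, host0 would then need a matching address (e.g. fd42:1122:3344::2/64) and a default route via fd42:1122:3344::1, which keeps the addresses stable enough for the ip6tables rule to refer to.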

Lennart
--
Lennart Poettering, Red Hat
Kai Krakow
2015-04-27 23:17:43 UTC
Post by Lennart Poettering
Post by Kai Krakow
Post by Lennart Poettering
Or in other words: IPv6 currently needs some manual networking setup
on the host.
Or there... Any pointers?
Not really. You have to set up ipv6 masquerading with ip6tables. And
ensure the containers get ipv6 addresses that are stable enough that
you can refer to them from the ip6tables rules...
Somehow I thought I would be smart by adding this ExecStartPost script
(OTOH it's probably just time for bed):

#!/bin/bash
IFNAME=${1:0:14} # %I is passed here
if [ -n "$IFNAME" ]; then
    IP=$(ip -6 addr show dev "$IFNAME" scope global | awk '/inet6/ { print $2 }')
    /sbin/sysctl net.ipv6.conf.$IFNAME.forwarding=1
    # masquerade everything leaving the container's global address
    [ -z "$IP" ] || /sbin/ip6tables -t nat -I POSTROUTING --source "$IP" --dest ::/0 -j MASQUERADE
fi
exit 0

and adding Address=::0/126 to the [Network] section of ve-* devices...

But somehow it does not work. If I run it manually after starting the
container, it does its job. Of course, inside the container, the
counterpart address won't be assigned (that works for DHCPv4 only).

If I modify the script to use scope link instead of global, it also
works - but that won't route anyway.

I suppose that when ExecStartPost runs, the link is just not ready
yet. An IP address fc00::... will be added to the interface, though.
So at least that works.
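If the race is indeed the link not being ready, one hedged workaround is to poll for the address instead of reading it once; wait_for_global_ip is a made-up helper and this sketch is untested against real units:

```shell
#!/usr/bin/env bash
# Poll up to ~10 seconds for a global IPv6 address on the veth, as a
# workaround for ExecStartPost firing before the link is fully up.
wait_for_global_ip() {
    local ifname=$1 ip
    for _ in $(seq 1 20); do
        ip=$(ip -6 addr show dev "$ifname" scope global | awk '/inet6/ { print $2 }')
        if [ -n "$ip" ]; then
            echo "$ip"
            return 0
        fi
        sleep 0.5
    done
    return 1
}
```

The rest of the script would then call `IP=$(wait_for_global_ip "$IFNAME")` instead of reading the address directly.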
--
Replies to list only preferred.