Discussion:
[systemd-devel] systemd-nspawn create container under unprivileged user
Vasiliy Tolstov
2015-02-04 22:03:47 UTC
Hello!
Is it possible to create a container as a regular user? Or, what capabilities
do I need to add to create a container without using root?
--
Vasiliy Tolstov,
e-mail: ***@selfip.ru
jabber: ***@selfip.ru
Alban Crequy
2015-02-05 09:44:05 UTC
[reposting - sorry I forgot to Cc the mailing list]
Post by Vasiliy Tolstov
Hello!
Is it possible to create a container as a regular user? Or, what capabilities
do I need to add to create a container without using root?
Hello,

Manual page namespaces(7):

Creation of new namespaces using clone(2) and unshare(2) in most cases
requires the CAP_SYS_ADMIN capability. User namespaces are the
exception: since Linux 3.8, no privilege is required to create a user
namespace.

systemd-nspawn uses: src/nspawn/nspawn.c:

pid = raw_clone(SIGCHLD|CLONE_NEWNS|
(arg_share_system ? 0 : CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWUTS)|
(arg_private_network ? CLONE_NEWNET : 0), NULL);

So you need to have CAP_SYS_ADMIN to use systemd-nspawn.
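
To make the flag selection concrete, here is a minimal sketch that computes the same clone flags from the two options. The function name nspawn_clone_flags is hypothetical; only the flag expression itself comes from the nspawn.c excerpt above. Every CLONE_NEW* flag used here needs CAP_SYS_ADMIN, which is why the call fails for an unprivileged caller.

```c
#include <linux/sched.h>   /* CLONE_NEW* constants from the kernel UAPI headers */
#include <signal.h>        /* SIGCHLD */
#include <stdbool.h>

/* Hypothetical helper: mirrors the flag expression quoted from nspawn.c. */
unsigned long nspawn_clone_flags(bool share_system, bool private_network) {
    /* A new mount namespace is always requested. */
    unsigned long flags = SIGCHLD | CLONE_NEWNS;

    if (!share_system)
        flags |= CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWUTS;
    if (private_network)
        flags |= CLONE_NEWNET;

    return flags;
}
```

Note that CLONE_NEWUSER, the one flag an unprivileged process may use on its own, is absent from this set.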


If you want to try user namespaces, it is something that is still
moving... Manual page user_namespaces(7):

Starting in Linux 3.8, unprivileged processes can create
user namespaces, and mount, PID, IPC, network, and UTS
namespaces can be created with just the CAP_SYS_ADMIN
capability in the caller's user namespace.

But this is not true on most Linux distributions, as they disable
unprivileged user namespaces and require CAP_SYS_ADMIN anyway. See for
example the Debian patch:
http://anonscm.debian.org/viewvc/kernel/dists/trunk/linux/debian/patches/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch?revision=20773&view=markup
which adds a sysctl that root has to enable first:
echo 1 > /proc/sys/kernel/unprivileged_userns_clone
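
A program can check that knob before trying to create a user namespace. The following is a minimal sketch, not part of any existing tool: read_int_file and the USERNS_CLONE_KNOB macro are names made up for illustration. The file only exists on kernels carrying the distribution patch, so a missing file is reported as -ENOENT; how to interpret that case is left to the caller.

```c
#include <errno.h>
#include <stdio.h>

/* Knob added by the Debian patch linked above; absent on unpatched kernels. */
#define USERNS_CLONE_KNOB "/proc/sys/kernel/unprivileged_userns_clone"

/* Reads one integer from `path`. Returns 0 on success, -errno on failure. */
int read_int_file(const char *path, int *value) {
    FILE *f = fopen(path, "r");
    if (!f)
        return -errno;

    /* The sysctl file contains a single decimal number. */
    int r = (fscanf(f, "%d", value) == 1) ? 0 : -EINVAL;
    fclose(f);
    return r;
}
```

A caller would use read_int_file(USERNS_CLONE_KNOB, &v) and treat v == 0 as "unprivileged user namespaces are switched off".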

Additionally, the program userns_child_exec.c included in the manual page
user_namespaces(7) does not yet work as is: since the kernel changes
made to fix CVE-2014-8989, it also needs to write to /proc/[pid]/setgroups
before it can set up the GID map.
See:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=66d2f338ee4c449396b6f99f5e75cd18eb6df272
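
The extra step amounts to writing the string "deny" to the setgroups file before gid_map is written. Below is a minimal sketch under that assumption; write_string_file and deny_setgroups are hypothetical helper names, not functions from userns_child_exec.c.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Writes the string `s` to an existing file. Returns 0 or -errno. */
int write_string_file(const char *path, const char *s) {
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(s);
    ssize_t n = write(fd, s, len);
    /* Compute the result before close() can overwrite errno. */
    int r = (n == (ssize_t) len) ? 0 : (n < 0 ? -errno : -EIO);
    close(fd);
    return r;
}

/* Has to run before an unprivileged process writes gid_map. */
int deny_setgroups(const char *setgroups_path) {
    int r = write_string_file(setgroups_path, "deny");
    /* Kernels that predate the setgroups file do not need this step. */
    return r == -ENOENT ? 0 : r;
}
```

The path would be /proc/self/setgroups, or /proc/<pid>/setgroups when the parent sets up the maps for a child.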

Cheers,
Alban
Vasiliy Tolstov
2015-02-05 11:48:46 UTC
Post by Alban Crequy
Creation of new namespaces using clone(2) and unshare(2) in most cases
requires the CAP_SYS_ADMIN capability. User namespaces are the
exception: since Linux 3.8, no privilege is required to create a user
namespace.
So, as I understand it, I can't create a full-featured container with networking
as a non-root user (one that does not have CAP_SYS_ADMIN).
--
Vasiliy Tolstov,
e-mail: ***@selfip.ru
jabber: ***@selfip.ru
Alban Crequy
2015-02-05 22:38:54 UTC
Post by Alban Crequy
Creation of new namespaces using clone(2) and unshare(2) in most cases
requires the CAP_SYS_ADMIN capability. User namespaces are the
exception: since Linux 3.8, no privilege is required to create a user
namespace.
So, as I understand it, I can't create a full-featured container with networking
as a non-root user (one that does not have CAP_SYS_ADMIN).
Capabilities like CAP_SYS_ADMIN no longer have a global meaning; they
refer to operations a process can perform *in its current user namespace*. An
unprivileged process (uid!=0, without CAP_SYS_ADMIN) can enter a new user
namespace and get uid=0 and CAP_SYS_ADMIN for operations inside that user
namespace, but it will still have uid!=0 and no CAP_SYS_ADMIN for
operations in the parent user namespace.
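
Whether an unprivileged process may actually do this depends on the kernel and distribution, so it can be useful to probe for it. The sketch below forks a child that attempts to create a user namespace and reports the outcome; probe_unprivileged_userns is a hypothetical name. It uses the raw unshare system call with the constant from the kernel headers, which is equivalent to calling unshare(CLONE_NEWUSER).

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEWUSER from the kernel UAPI headers */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that tries to create a user namespace.
 * Returns 0 if that worked, a positive errno value if the call failed,
 * 128 + signal number if the child was killed, or -1 if fork/waitpid failed.
 */
int probe_unprivileged_userns(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the namespace, if created, disappears when we exit. */
        if (syscall(SYS_unshare, (unsigned long) CLONE_NEWUSER) < 0)
            _exit(errno);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return 128 + WTERMSIG(status);
}
```

A result of EPERM would be consistent with a distribution kernel that disables unprivileged user namespaces.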

user_namespaces(7) contains userns_child_exec.c and it creates a fully
featured container with network without being root. (I attached a
patched version I was testing)

# # Because I'm using the kernel patched by my distribution
# echo 1 > /proc/sys/kernel/unprivileged_userns_clone

$ gcc -o userns_child_exec userns_child_exec.c -lcap

Here it seems to work:

***@alban:~$ ls -l /tmp/userns_child_exec
-rwxr-xr-x 1 alban alban 14488 Feb 5 23:24 /tmp/userns_child_exec
***@alban:~$ id -u
1000
***@alban:~$ ip link # ---> will show lo, eth0, wlan0...
***@alban:~$ /tmp/userns_child_exec -p -m -U -M '0 1000 1' -G '0 1000 1' -n bash
About to exec bash
***@alban:~# id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
***@alban:~# ip link # ---> only lo visible in this namespace
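
The -M and -G arguments in the session above are ID map lines of the form "inside outside count": '0 1000 1' maps ID 0 inside the namespace to ID 1000 outside, for a range of one ID. A minimal sketch of producing such a line for /proc/<pid>/uid_map or gid_map follows; format_id_map is a hypothetical helper, not taken from userns_child_exec.c.

```c
#include <stdio.h>

/*
 * Formats one map line: "<inside> <outside> <count>\n".
 * Returns the number of characters written, or -1 if `buf` is too small.
 */
int format_id_map(char *buf, size_t size,
                  unsigned inside, unsigned outside, unsigned count) {
    int n = snprintf(buf, size, "%u %u %u\n", inside, outside, count);
    return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

This explains the id output in the session: uid 1000 on the host appears as uid 0 inside, and unmapped groups show up as 65534(nogroup).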

Cheers,
Alban
Lennart Poettering
2015-02-10 11:56:45 UTC
Post by Vasiliy Tolstov
Post by Alban Crequy
Creation of new namespaces using clone(2) and unshare(2) in most cases
requires the CAP_SYS_ADMIN capability. User namespaces are the
exception: since Linux 3.8, no privilege is required to create a user
namespace.
So, as I understand it, I can't create a full-featured container with networking
as a non-root user (one that does not have CAP_SYS_ADMIN).
Unprivileged containers are unlikely to ever support that. Creating a
network interface on the host will necessarily require privileges. Hence, if
you want "full network" support (by which I assume you mean veth
links and the like), you are generally out of luck...

You can run nspawn containers without CAP_SYS_ADMIN via nspawn's
--drop-capability=CAP_SYS_ADMIN switch. However, YMMV, as the code you
run inside the container must be OK with not having those
permissions, and systemd, at least until very recently, didn't like that at
all...
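
Code running inside such a container can find out whether a capability is still available by querying its capability bounding set. The sketch below assumes that the switch works by removing the capability from the bounding set; cap_in_bounding_set is a hypothetical helper name, while prctl(PR_CAPBSET_READ) is the standard kernel interface.

```c
#include <linux/capability.h>  /* CAP_SYS_ADMIN and friends */
#include <sys/prctl.h>         /* prctl(), PR_CAPBSET_READ */

/* Returns 1 if `cap` is in the bounding set, 0 if it was dropped, -1 on error. */
int cap_in_bounding_set(int cap) {
    return prctl(PR_CAPBSET_READ, (unsigned long) cap, 0UL, 0UL, 0UL);
}
```

A payload could call cap_in_bounding_set(CAP_SYS_ADMIN) early and skip operations such as mounting when the result is 0.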

Lennart
--
Lennart Poettering, Red Hat
Lennart Poettering
2015-02-10 11:52:34 UTC
Post by Vasiliy Tolstov
Hello!
Is it possible to create a container as a regular user? Or, what capabilities
do I need to add to create a container without using root?
Invoking containers without privileges is not supported by nspawn, and
this is unlikely to change, as I fail to see any strong usecase for
this...

If somebody can enlighten me about the use case for allowing
containers to be run by unprivileged users, I'd be willing to change
my mind though...

Note that, to my knowledge, support for unprivileged containers has
been disabled in the kernels of many distros, including Fedora's,
since it's basically one giant security hole.

Note that many of machinectl's commands involve polkit checks, which
means it's easy to open them up for unprivileged clients. However,
in that case the containers would be forked off and maintained
privileged, only the clients will be unprivileged...

LXC supports unprivileged containers, though; this might be an option
for you.

Lennart
--
Lennart Poettering, Red Hat
Djalal Harouni
2015-02-11 12:53:40 UTC
Post by Lennart Poettering
Invoking containers without privileges is not supported by nspawn, and
this is unlikely to change, as I fail to see any strong usecase for
this...
If somebody can enlighten me about the usecase for allowing
containers to be run by unprivileged users, I'd be willing to change
my mind though...
A quick argument against it, IOW just wait and see!

As an unprivileged user we don't have CAP_SYS_MODULE set, but inside an
unprivileged container we are root, and a call to cap_get_flag() on
CAP_SYS_MODULE will return CAP_SET! In reality this is not true:
we don't have CAP_SYS_MODULE... This will confuse programs running
inside containers, and we'll have to add more code paths for this special
case... And it's not only CAP_SYS_MODULE; perhaps there are other cases...
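
The effect described here can be observed by asking the kernel for the effective capability set. The sketch below uses the raw capget(2) system call rather than libcap's cap_get_flag(), purely to avoid a library dependency; has_effective_cap is a hypothetical helper name. The answer is relative to the caller's user namespace, so root inside an unprivileged user namespace gets 1 for CAP_SYS_MODULE even though module loading is checked against the initial user namespace.

```c
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if `cap` is in the caller's effective set, 0 if not, -1 on error. */
int has_effective_cap(int cap) {
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,  /* 0 means the calling thread */
    };
    /* Version 3 of the interface uses two 32-bit words per set. */
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = {{0}};

    if (cap < 0 || cap > CAP_LAST_CAP)
        return -1;
    if (syscall(SYS_capget, &hdr, data) < 0)
        return -1;

    return (data[cap / 32].effective >> (cap % 32)) & 1u;
}
```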
--
Djalal Harouni
http://opendz.org
Lennart Poettering
2015-02-11 16:06:56 UTC
Post by Djalal Harouni
A quick argument against it, IOW just wait and see!
As unprivileged we don't have CAP_SYS_MODULE set, but inside
unprivileged containers we are root, and a call to cap_get_flag() on
CAP_SYS_MODULE will return CAP_SET! but hey in reality this is not true,
we don't have CAP_SYS_MODULE... this will confuse programs running
inside containers, we'll have to add more code paths for this special
case... and not only CAP_SYS_MODULE, perhaps there are other cases...
Well, but we could drop CAP_SYS_MODULE both before and after setting
up the userns, so that the cap is missing for the PID both inside and
outside of it...
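
One way to drop a capability is to remove it from the bounding set with prctl(PR_CAPBSET_DROP). This is only a sketch of a single drop step, and drop_bounding_cap is a hypothetical name; whether nspawn would do it exactly this way is an assumption. The step would have to be repeated after the user namespace is set up, because a process starts out with a complete set of capabilities in a newly created user namespace.

```c
#include <errno.h>
#include <linux/capability.h>
#include <sys/prctl.h>

/*
 * Removes `cap` from the calling thread's bounding set.
 * Returns 0 on success or -errno (-EPERM when CAP_SETPCAP is missing).
 */
int drop_bounding_cap(int cap) {
    if (prctl(PR_CAPBSET_DROP, (unsigned long) cap, 0UL, 0UL, 0UL) < 0)
        return -errno;
    return 0;
}
```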

Lennart
--
Lennart Poettering, Red Hat
Djalal Harouni
2015-02-11 16:53:59 UTC
Post by Lennart Poettering
Well, but we could drop CAP_SYS_MODULE both before and after setting
up the userns, so that the cap is missing for the PID both inside and
outside of it...
Indeed, yes, but there are still other obscure cases, like CAP_SYS_ADMIN:
even if you have it, you won't be able to mount file systems like btrfs
and others, as only a subset of virtual filesystems supports unprivileged
user mounting... Yes, we could drop it too, and it seems that systemd was
adapted recently to work in this situation, but what about other code?
Or what if you want to do some sort of system replication inside a container...

I guess we'll end up trying to find out whether this is the real capability or the
diminished version... or whether we are inside a userns...
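
One heuristic for the last question is to look at /proc/self/uid_map: in the initial user namespace it consists of the single identity line "0 0 4294967295". The sketch below checks one line against that pattern; uid_map_line_indicates_userns is a hypothetical name, and the check is only a heuristic, since a user namespace with a full identity mapping would look the same.

```c
#include <stdio.h>

/*
 * Returns 0 if `line` is the identity mapping of the initial user
 * namespace, 1 if it is any other mapping, -1 if it cannot be parsed.
 */
int uid_map_line_indicates_userns(const char *line) {
    unsigned long inside, outside, count;

    if (sscanf(line, "%lu %lu %lu", &inside, &outside, &count) != 3)
        return -1;
    return !(inside == 0 && outside == 0 && count == 4294967295UL);
}
```

A program would read the first line of /proc/self/uid_map and pass it to this function before deciding how far to trust its capabilities.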
--
Djalal Harouni
http://opendz.org
Lennart Poettering
2015-02-11 17:00:02 UTC
Post by Djalal Harouni
Indeed, yes, but there are still other obscure cases, like CAP_SYS_ADMIN:
even if you have it, you won't be able to mount file systems like btrfs
and others, as only a subset of virtual filesystems supports unprivileged
user mounting... Yes, we could drop it too, and it seems that systemd was
adapted recently to work in this situation, but what about other code?
Or what if you want to do some sort of system replication inside a
container...
Well, some mounting is allowed if you have CAP_SYS_ADMIN, so we can
pass this out, I figure...

Note that the inability to mount btrfs shouldn't be too limiting,
since we don't expose physical devices in nspawn anyway, and what you
don't have you cannot mount anyway...

Lennart
--
Lennart Poettering, Red Hat