Discussion:
Unable to run systemd in an LXC / cgroup container.
Michael H. Warfield
2012-10-21 21:25:05 UTC
Hello,

This is being directed to the systemd-devel community but I'm cc'ing the
lxc-users community and the Fedora community on this for their input as
well. I know it's not always good to cross post between multiple lists
but this is of interest to all three communities who may have valuable
input.

I'm new to this particular list, just having joined after tracking a
problem down to some systemd internals...

Several people over the last year or two on the lxc-users list have been
discussing trying to run certain distros (notably Fedora 16 and above,
recent Arch Linux and possibly others) in LXC containers, virtualizing
entire servers this way. This is very similar to Virtuozzo / OpenVZ, only
it's using the native Linux cgroups for the containers (my primary reason
for dumping OpenVZ was to avoid their custom patched kernels). These recent
distros have switched to systemd for the main init process and this has
proven to be disastrous for those of us using LXC and trying to install
or update our containers.

To put it bluntly, it doesn't work and causes all sorts of problems on
the host.

To summarize the problem... The LXC startup binary sets up various
things for /dev and /dev/pts for the container to run properly and this
works perfectly fine for SystemV start-up scripts and/or Upstart.
Unfortunately, systemd mounts devtmpfs on /dev and devpts
on /dev/pts, which then breaks things horribly. This is because the
kernel currently lacks namespaces for devices and won't have them for
some time to come (they are still in design). When devtmpfs gets mounted
over top of /dev in the container, it then hijacks the host's console tty
and several other devices which had been set up through bind mounts by
LXC and should have been LEFT ALONE.

Yes! I recognize that this problem with devtmpfs and lack of namespaces
is a potential security problem anyways that could (and does) cause
serious container-to-host problems. We're just not going to get that
fixed right away in the linux cgroups and namespaces.

How do we work around this problem in systemd where it has hard coded
mounts in the binary that we can't override or configure? Or is it
there and I'm just missing it trying to examine the sources? That's how
I found where the problem lay.

Regards,
Mike
--
Michael H. Warfield (AI4NB) | (770) 985-6132 | ***@WittsEnd.com
/\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it!
Kay Sievers
2012-10-22 00:53:19 UTC
Post by Michael H. Warfield
How do we work around this problem in systemd where it has hard coded
mounts in the binary that we can't override or configure? Or is it
there and I'm just missing it trying to examine the sources? That's how
I found where the problem lay.
As a first step, this probably explains most of it:
http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface

Kay
Michael H. Warfield
2012-10-22 02:06:53 UTC
Post by Kay Sievers
http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface
A very long ways, yeah. That looks like it could be just what we've
been looking for. Just gotta figure out how to set that environment
variable but that's up to a couple of others to comment on in the
lxc-users list. Then we'll see where we go from there.

Many thanks!
Post by Kay Sievers
Kay
Regards,
Mike
Michael H. Warfield
2012-10-22 13:04:34 UTC
Post by Michael H. Warfield
Post by Kay Sievers
Post by Michael H. Warfield
http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface
A very long ways, yeah. That looks like it could be just what we've
been looking for. Just gotta figure out how to set that environment
variable but that's up to a couple of others to comment on in the
lxc-users list. Then we'll see where we go from there.
Many thanks!
Post by Kay Sievers
Kay
Regards,
Mike
I've just performed a very quick check on my Arch Linux system here.
# cat /proc/1/environ
TERM=linux
RD_TIMESTAMP=
# cat /proc/1/environ
STY=623.systemd-lithium
TERM=screen
TERMCAP=SC|screen|VT 100/ANSI X3.64
virtual terminal:\
:DO=\E[%dB:LE=\E[%dD:RI=\E[%dC:UP=\E[%dA:bs:bt=\E[Z:\
:cd=\E[J:ce=\E[K:cl=\E[H\E[J:cm=\E[%i%d;%dH:ct=\E[3g:\
:do=^J:nd=\E[C:pt:rc=\E8:rs=\Ec:sc=\E7:st=\EH:up=\EM:\
:le=^H:bl=^G:cr=^M:it#8:ho=\E[H:nw=\EE:ta=^I:is=\E)0:\
:li#24:co#80:am:xn:xv:LP:sr=\EM:al=\E[L:AL=\E[%dL:\
:cs=\E[%i%d;%dr:dl=\E[M:DL=\E[%dM:dc=\E[P:DC=\E[%dP:\
:ke=\E[?1l\E>:vi=\E[?25l:ve=\E[34h\E[?25h:vs=\E[34l:\
:ti=\E[?1049h:te=\E[?1049l:k0=\E[10~:k1=\EOP:k2=\EOQ:\
:k3=\EOR:k4=\EOS:k5=\E[15~:k6=\E[17~:k7=\E[18~:\
:k8=\E[19~:k9=\E[20~:k;=\E[21~:F1=\E[23~:F2=\E[24~:\
:kI=\E[2~:kD=\E[3~:ku=\EOA:kd=\EOB:kr=\EOC:kl=\EOD:
WINDOW=0
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LANG=en_GB.UTF-8
container=lxc
So it looks like that "container" environment variable is already set on
PID1
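
For reference, the entries in /proc/<pid>/environ are separated by NUL
bytes, which is why they run together when the file is simply cat'ed to a
terminal. The following is a minimal Python sketch of how one might split
the block and look up the "container" variable. The function names
parse_environ and container_type are illustrative only and are not part of
systemd or LXC.

```python
def parse_environ(raw: bytes) -> dict:
    """Split a NUL-separated environment block into a name -> value dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def container_type(pid: int = 1):
    """Return the value of `container` for a PID, or None if it is unset
    or the environ file cannot be read (reading PID 1 usually needs root)."""
    try:
        with open(f"/proc/{pid}/environ", "rb") as fh:
            return parse_environ(fh.read()).get("container")
    except OSError:
        return None
```

Calling container_type(1) as root inside the container should return the
same value that shows up at the end of the output above, assuming the
container manager set the variable for PID 1.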
Yeah, I saw that myself last night. I tested that out and it's still not
working here if I use systemd (although it doesn't seem to be grabbing the
host console now), but upstart fires right up and I see that container
variable set. There looked to be a number of mounts listed on that wiki
page. Maybe something is missing. Right now it just hangs trying to start
the container and, when I subsequently try to shut the container down, it
results in a hung resource and it can't delete the cgroups directory
because it's busy. The only thing I did was change the link to /sbin/init
from upstart to systemd and it's now dead and I'll have to reboot the host
to free the resource. :-P
Regards,
John
Regards,
Mike
Lennart Poettering
2012-10-22 14:11:59 UTC
Post by Michael H. Warfield
Hello,
This is being directed to the systemd-devel community but I'm cc'ing the
lxc-users community and the Fedora community on this for their input as
well. I know it's not always good to cross post between multiple lists
but this is of interest to all three communities who may have valuable
input.
I'm new to this particular list, just having joined after tracking a
problem down to some systemd internals...
Several people over the last year or two on the lxc-users list have been
discussing trying to run certain distros (notably Fedora 16 and above,
recent Arch Linux and possibly others) in LXC containers, virtualizing
entire servers this way. This is very similar to Virtuozzo / OpenVZ, only
it's using the native Linux cgroups for the containers (my primary reason
for dumping OpenVZ was to avoid their custom patched kernels). These recent
distros have switched to systemd for the main init process and this has
proven to be disastrous for those of us using LXC and trying to install
or update our containers.
Note that it is explicitly our intention to make running systemd inside
of containers as smooth as possible. The notes Kay linked summarize what
the container manager needs to do for best integration.
Post by Michael H. Warfield
To summarize the problem... The LXC startup binary sets up various
things for /dev and /dev/pts for the container to run properly and this
works perfectly fine for SystemV start-up scripts and/or Upstart.
Unfortunately, systemd mounts devtmpfs on /dev and devpts
on /dev/pts, which then breaks things horribly. This is because the
kernel currently lacks namespaces for devices and won't have them for
some time to come (they are still in design). When devtmpfs gets mounted
over top of /dev in the container, it then hijacks the host's console tty
and several other devices which had been set up through bind mounts by
LXC and should have been LEFT ALONE.
Please initialize a minimal tmpfs on /dev. systemd will then work fine.
Post by Michael H. Warfield
Yes! I recognize that this problem with devtmpfs and lack of namespaces
is a potential security problem anyways that could (and does) cause
serious container-to-host problems. We're just not going to get that
fixed right away in the linux cgroups and namespaces.
No, devtmpfs really doesn't need updating, containers simply shouldn't
use it.
Post by Michael H. Warfield
How do we work around this problem in systemd where it has hard coded
mounts in the binary that we can't override or configure? Or is it
there and I'm just missing it trying to examine the sources? That's how
I found where the problem lay.
systemd will make use of pre-existing mounts if they exist, and only
mount something new if they don't exist.
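
To illustrate the rule just described (use what is already mounted, mount
only what is missing), here is a small Python sketch that answers "is
something already mounted at this path?" from the text of
/proc/self/mountinfo. It is not systemd's code; the helper names
mount_points and needs_mount are made up for this example, and the parser
ignores the octal escapes the kernel uses for unusual characters in paths.

```python
def mount_points(mountinfo_text: str) -> dict:
    """Map each mount point to its filesystem type.

    Each mountinfo line looks like:
      ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTIONS [optional...] - FSTYPE SOURCE SUPEROPTS
    """
    result = {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # malformed or empty line
        sep = fields.index("-")
        # A later mount on the same path replaces the earlier entry.
        result[fields[4]] = fields[sep + 1]
    return result


def needs_mount(path: str, mountinfo_text: str) -> bool:
    """True if nothing is mounted at `path`, so an init that follows the
    rule above would go ahead and mount its own default there."""
    return path not in mount_points(mountinfo_text)
```

With a container manager that has already placed a tmpfs on /dev,
needs_mount("/dev", open("/proc/self/mountinfo").read()) would be False,
which is the situation in which devtmpfs is left alone.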

Note that there are reports that LXC has issues with the fact that newer
systemd enables shared mount propagation for all mounts by default (this
should actually be beneficial for containers as this ensures that new
mounts appear in the containers). LXC when run on such a system fails as
soon as it tries to use pivot_root(), as that is incompatible with
shared mount propagation. This needs fixing in LXC: it should use MS_MOVE
or MS_BIND to place the new root dir in / instead. A short term
work-around is to simply remount the root tree to private before
invoking LXC.
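
A rough Python sketch of the short-term work-around, under the assumption
that the host uses util-linux mount (whose --make-rprivate option changes
propagation recursively): list the mounts that mountinfo tags as shared,
and if there are any, run the remount before starting LXC. The names
shared_mounts and MAKE_PRIVATE_CMD are illustrative, and the command is
only defined here, not executed.

```python
def shared_mounts(mountinfo_text: str) -> list:
    """Return the mount points whose optional fields carry a shared:N tag."""
    shared = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue  # malformed or empty line
        # Optional fields sit between the mount options and the "-" separator.
        optional = fields[6:fields.index("-")]
        if any(tag.startswith("shared:") for tag in optional):
            shared.append(fields[4])
    return shared


# Command a wrapper could run (as root) before invoking LXC, e.g. via
# subprocess.run(MAKE_PRIVATE_CMD, check=True). Not run by this sketch.
MAKE_PRIVATE_CMD = ["mount", "--make-rprivate", "/"]
```

An empty result from shared_mounts(open("/proc/self/mountinfo").read())
suggests the host is not using shared propagation and the remount is
unnecessary.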

Lennart
--
Lennart Poettering - Red Hat, Inc.
Michael H. Warfield
2012-10-22 15:48:41 UTC
Post by Lennart Poettering
Please initialize a minimal tmpfs on /dev. systemd will then work fine.
My containers have a reasonable /dev that works with Upstart just fine
but it is not on tmpfs. Is mounting tmpfs on /dev and recreating
that minimal /dev required?
Post by Lennart Poettering
Post by Michael H. Warfield
Yes! I recognize that this problem with devtmpfs and lack of namespaces
is a potential security problem anyways that could (and does) cause
serious container-to-host problems. We're just not going to get that
fixed right away in the linux cgroups and namespaces.
No, devtmpfs really doesn't need updating, containers simply shouldn't
use it.
Ok, yeah. That seems to be at the heart of the problem we're trying to
solve.
Post by Lennart Poettering
Post by Michael H. Warfield
How do we work around this problem in systemd where it has hard coded
mounts in the binary that we can't override or configure? Or is it
there and I'm just missing it trying to examine the sources? That's how
I found where the problem lay.
systemd will make use of pre-existing mounts if they exist, and only
mount something new if they don't exist.
So you're saying that, if we have something mounted on /dev, that's what
prevents systemd from mounting devtmpfs on /dev? That could be
problematical. Tested out a couple of options there that didn't work.
That's going to take some effort.
Post by Lennart Poettering
Note that there are reports that LXC has issues with the fact that newer
systemd enables shared mount propagation for all mounts by default (this
should actually be beneficial for containers as this ensures that new
mounts appear in the containers). LXC when run on such a system fails as
soon as it tries to use pivot_root(), as that is incompatible with
shared mount propagation. This needs fixing in LXC: it should use MS_MOVE
or MS_BIND to place the new root dir in / instead. A short term
work-around is to simply remount the root tree to private before
invoking LXC.
But, I have systemd running on my host system (F17) and containers with
sysvinit or upstart inits are all starting just fine. That sounds like
it should impact all containers as pivot_root() is issued before systemd
in the container is started. Or am I missing something here? That
sounds like a problem for Serge and others to investigate further. I'll
see about trying that workaround though.
Post by Lennart Poettering
Lennart
--
Lennart Poettering - Red Hat, Inc.
Regards,
Mike
Lennart Poettering
2012-10-22 20:50:19 UTC
Post by Michael H. Warfield
Post by Lennart Poettering
Please initialize a minimal tmpfs on /dev. systemd will then work fine.
My containers have a reasonable /dev that work with Upstart just fine
but they are not on tmpfs. Is mounting tmpfs on /dev and recreating
that minimal /dev required?
Well, it can be any kind of mount really. Just needs to be a mount. And
the idea is to use tmpfs for this.

What /dev are you currently using? It's probably not a good idea to
reuse the host's /dev, since it contains so many device nodes that
should not be accessible/visible to the container.
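
As a sketch of what "a minimal /dev" could look like in practice, the
Python below creates a handful of character device nodes in a directory
that the container manager has already mounted as tmpfs. The device list
is an assumption chosen for illustration (it is not taken from this thread
or from the ContainerInterface page); the major/minor numbers are the
standard Linux ones. Creating the nodes needs root, so the function
defaults to a dry run that only returns the plan.

```python
import os
import stat

# (name, major, minor, mode) -- an illustrative minimal set, not an official list.
MINIMAL_DEVICES = [
    ("null", 1, 3, 0o666),
    ("zero", 1, 5, 0o666),
    ("full", 1, 7, 0o666),
    ("random", 1, 8, 0o666),
    ("urandom", 1, 9, 0o666),
    ("tty", 5, 0, 0o666),
]


def populate_dev(dev_dir: str, dry_run: bool = True) -> list:
    """Return the (path, major, minor) nodes for dev_dir; create them
    with mknod when dry_run is False (requires root privileges)."""
    plan = []
    for name, major, minor, mode in MINIMAL_DEVICES:
        path = os.path.join(dev_dir, name)
        plan.append((path, major, minor))
        if not dry_run:
            # The resulting permissions are still subject to the umask.
            os.mknod(path, stat.S_IFCHR | mode, os.makedev(major, minor))
    return plan
```

The console and pts entries are deliberately left out here, since per the
discussion above LXC sets those up itself through bind mounts.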
Post by Michael H. Warfield
Post by Lennart Poettering
systemd will make use of pre-existing mounts if they exist, and only
mount something new if they don't exist.
So you're saying that, if we have something mounted on /dev, that's what
prevents systemd from mounting devtmpfs on /dev?
Yes.
Post by Michael H. Warfield
But, I have systemd running on my host system (F17) and containers with
sysvinit or upstart inits are all starting just fine. That sounds like
it should impact all containers as pivot_root() is issued before systemd
in the container is started. Or am I missing something here? That
sounds like a problem for Serge and others to investigate further. I'll
see about trying that workaround though.
The "shared" issue is F18, and it's about running LXC on a systemd
system, not about running systemd inside of LXC.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Michael H. Warfield
2012-10-22 20:59:22 UTC
Post by Lennart Poettering
Post by Michael H. Warfield
Post by Lennart Poettering
Please initialize a minimal tmpfs on /dev. systemd will then work fine.
My containers have a reasonable /dev that work with Upstart just fine
but they are not on tmpfs. Is mounting tmpfs on /dev and recreating
that minimal /dev required?
Well, it can be any kind of mount really. Just needs to be a mount. And
the idea is to use tmpfs for this.
What /dev are you currently using? It's probably not a good idea to
reuse the hosts' /dev, since it contains so many device nodes that
should not be accessible/visible to the container.
Got it. And that explains the problems we're seeing but also what I'm
seeing in some libvirt-lxc related pages, which is a separate and
distinct project in spite of the similarities in the name...

http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes

Unfortunately, in our case, merely getting a mount in there is a
complication in that it also has to be populated but, at least, we
understand the problem set now.
Post by Lennart Poettering
Post by Michael H. Warfield
Post by Lennart Poettering
systemd will make use of pre-existing mounts if they exist, and only
mount something new if they don't exist.
So you're saying that, if we have something mounted on /dev, that's what
prevents systemd from mounting devtmpfs on /dev?
Yes.
Post by Michael H. Warfield
But, I have systemd running on my host system (F17) and containers with
sysvinit or upstart inits are all starting just fine. That sounds like
it should impact all containers as pivot_root() is issued before systemd
in the container is started. Or am I missing something here? That
sounds like a problem for Serge and others to investigate further. I'll
see about trying that workaround though.
The "shared" issue is F18, and it's about running LXC on a systemd
system, not about running systemd inside of LXC.
Whew! I'll deal with F18 when I need to deal with F18. That explains
why my F17 hosts are running and gives Serge and others a chance to
address this, forewarned. Thanks for that info.
Post by Lennart Poettering
Lennart
--
Lennart Poettering - Red Hat, Inc.
Regards,
Mike
Michael H. Warfield
2012-10-25 15:59:10 UTC
Sorry for taking a few days to get back on this. I was delivering a
guest lecture up at Fordham University last Tuesday so I was out of
pocket a couple of days or I would have responded sooner...
Post by Michael H. Warfield
Post by Lennart Poettering
Post by Michael H. Warfield
Post by Lennart Poettering
Please initialize a minimal tmpfs on /dev. systemd will then work fine.
My containers have a reasonable /dev that work with Upstart just fine
but they are not on tmpfs. Is mounting tmpfs on /dev and recreating
that minimal /dev required?
Well, it can be any kind of mount really. Just needs to be a mount. And
the idea is to use tmpfs for this.
What /dev are you currently using? It's probably not a good idea to
reuse the hosts' /dev, since it contains so many device nodes that
should not be accessible/visible to the container.
Got it. And that explains the problems we're seeing but also what I'm
seeing in some libvirt-lxc related pages, which is a separate and
distinct project in spite of the similarities in the name...
http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes
Unfortunately, in our case, merely getting a mount in there is a
complication in that it also has to be populated but, at least, we
understand the problem set now.
Ok... Serge and I were corresponding on the lxc-users list and he had a
suggestion that worked but that I consider to be a sub-optimal
workaround. Ironically, it was to mount devtmpfs on /dev. We don't
(currently) have a method to auto-populate a tmpfs mount with the needed
devices and this provided it. It does have a problem that makes me
uncomfortable in that the container now has visibility into the
host's /dev. I'm a security expert and I'm not comfortable with
that "solution" even with the controls we have. We can control access
but still, I'm not happy with that.

I now have a container that starts with systemd running more or less
properly. We do have some problems with the convention that has been
set up, however.

When running in this mode, you run on the console and you don't spawn
gettys on the ttys. There seems to be a problem with this.

In this mode, if I manually start the container in a terminal window,
that eventually results in a login prompt there. Under sysvinit and
upstart I don't get that and can detach.

If I run lxc-console (which attaches to one of the vtys) it gives me
nothing. Under sysvinit and upstart I get vty login prompts because
they have started getty on those vtys. This is important in case
network access has not started for one reason or another and the
container was started detached in the background.

If I start lxc-start in detached mode (-d -o {logfile}) lxc-start
redirects the system console to the log file and goes daemon. In this
case, the systemd container hangs and never starts.

I SUSPECT the hang condition has something to do with systemd trying to
start an interactive console on /dev/console, which sysvinit and
upstart do not do. Maybe we have to do something different with the
redirects in this case, but it's not working consistently with the other
packages. systemd should also start appropriate gettys on those vtys if
they are configured. Maybe start the gettys if the tty? devices exist, up
to a configured limit (and don't restart them if they immediately fail),
and obviously don't start them if they don't exist. It then gives up
control over that process. Also, don't start a login on /dev/console if
you DO start a getty? That would make your behavior congruent with that
of the other two systems.
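
The getty policy proposed above can be sketched in a few lines of Python.
This is purely a hypothetical illustration of the suggestion (start a
getty on tty1..ttyN only where the device node exists, up to a configured
limit); it does not describe what systemd, sysvinit or upstart actually
do, and the function name ttys_for_getty and its parameters are invented
for the example.

```python
import os


def ttys_for_getty(dev_dir: str = "/dev", limit: int = 4, exists=os.path.exists) -> list:
    """Return the ttyN paths (N = 1..limit) that exist and would get a getty.

    `exists` is injectable so the policy can be checked without real devices.
    """
    selected = []
    for n in range(1, limit + 1):
        path = os.path.join(dev_dir, f"tty{n}")
        if exists(path):
            selected.append(path)
    return selected
```

The "don't restart if it immediately fails" part of the proposal is a
supervision concern and is not modelled here.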

I've got some more problems relating to shutting down containers, some
of which may be related to mounting tmpfs on /run, to which /var/run is
symlinked. We're doing halt / restart detection by monitoring utmp
in that directory but it looks like utmp isn't even in that directory
anymore and mounting tmpfs on it was always problematic. We may have
to have a more generic method to detect when a container has shut down
or is restarting in that case. I'm also finding we end up with dangling
resources where we can't remove the cgroup directories after a halt and
that creates a serious problem I have to investigate further. I'm not sure
if it's a host problem running on F17 or if it's something to do with
running systemd in a container but I cannot shut down this particular
container and subsequently restart it without restarting the entire host.
"Not good" is an understatement.

Regards,
Mike
Serge Hallyn
2012-10-25 16:19:54 UTC
Permalink
Post by Michael H. Warfield
Sorry for taking a few days to get back on this. I was delivering a
guest lecture up at Fordham University last Tuesday so I was out of
pocket a couple of days or I would have responded sooner...
Post by Michael H. Warfield
Post by Lennart Poettering
Post by Michael H. Warfield
Post by Lennart Poettering
Post by Michael H. Warfield
To summarize the problem... The LXC startup binary sets up various
things for /dev and /dev/pts for the container to run properly and this
works perfectly fine for SystemV start-up scripts and/or Upstart.
Unfortunately, systemd has mounts of devtmpfs on /dev and devpts
on /dev/pts which then break things horribly. This is because the
kernel currently lacks namespaces for devices and won't for some time to
come (in design). When devtmpfs gets mounted over top of /dev in the
container, it then hijacks the hosts console tty and several other
devices which had been set up through bind mounts by LXC and should have
been LEFT ALONE.
Please initialize a minimal tmpfs on /dev. systemd will then work fine.
My containers have a reasonable /dev that work with Upstart just fine
but they are not on tmpfs. Is mounting tmpfs on /dev and recreating
that minimal /dev required?
Well, it can be any kind of mount really. Just needs to be a mount. And
the idea is to use tmpfs for this.
What /dev are you currently using? It's probably not a good idea to
reuse the hosts' /dev, since it contains so many device nodes that
should not be accessible/visible to the container.
Got it. And that explains the problems we're seeing but also what I'm
seeing in some libvirt-lxc related pages, which is a separate and
distinct project in spite of the similarities in the name...
http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes
Unfortunately, in our case, merely getting a mount in there is a
complication in that it also has to be populated but, at least, we
understand the problem set now.
Ok... Serge and I were corresponding on the lxc-users list and he had a
suggestion that worked but I consider to be a bit of a sub-optimal
workaround. Ironically, it was to mount devtmpfs on /dev. We don't
Oh, sorry - I take back that suggestion :)

Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
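
A rough sketch of the first half of such a hook, written in Python purely for illustration (a real hook would more likely be a short shell script). The function names, the mount options, and the idea that the hook is handed the container's rootfs path as a plain argument are all assumptions; nothing here is taken from the LXC sources. By default it only builds the command, so nothing is mounted unless it is run as root with dry_run=False.

```python
import subprocess


def tmpfs_dev_mount_cmd(rootfs, size="500k"):
    """Build the mount(8) argv that puts a small tmpfs on <rootfs>/dev.

    rootfs and size are illustrative parameters, not LXC settings.
    """
    target = rootfs.rstrip("/") + "/dev"
    options = "mode=755,nosuid,size=" + size
    return ["mount", "-t", "tmpfs", "-o", options, "tmpfs", target]


def run_hook(rootfs, dry_run=True):
    """Return the command; execute it only when dry_run is False (needs root)."""
    cmd = tmpfs_dev_mount_cmd(rootfs)
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd
```

Once the tmpfs is in place, the hook would still have to create the device nodes and directories the container expects before init starts.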

Or, if everyone is going to need it, we could just add a 'lxc.populatedevs = 1'
option which does that without needing a hook.

devtmpfs should not be used in containers :)

-serge
Michael H. Warfield
2012-10-25 16:39:12 UTC
Permalink
Post by Serge Hallyn
Post by Michael H. Warfield
Sorry for taking a few days to get back on this. I was delivering a
guest lecture up at Fordham University last Tuesday so I was out of
pocket a couple of days or I would have responded sooner...
Post by Michael H. Warfield
Post by Lennart Poettering
Post by Michael H. Warfield
Post by Lennart Poettering
Post by Michael H. Warfield
To summarize the problem... The LXC startup binary sets up various
things for /dev and /dev/pts for the container to run properly and this
works perfectly fine for SystemV start-up scripts and/or Upstart.
Unfortunately, systemd has mounts of devtmpfs on /dev and devpts
on /dev/pts which then break things horribly. This is because the
kernel currently lacks namespaces for devices and won't for some time to
come (in design). When devtmpfs gets mounted over top of /dev in the
container, it then hijacks the hosts console tty and several other
devices which had been set up through bind mounts by LXC and should have
been LEFT ALONE.
Please initialize a minimal tmpfs on /dev. systemd will then work fine.
My containers have a reasonable /dev that work with Upstart just fine
but they are not on tmpfs. Is mounting tmpfs on /dev and recreating
that minimal /dev required?
Well, it can be any kind of mount really. Just needs to be a mount. And
the idea is to use tmpfs for this.
What /dev are you currently using? It's probably not a good idea to
reuse the hosts' /dev, since it contains so many device nodes that
should not be accessible/visible to the container.
Got it. And that explains the problems we're seeing but also what I'm
seeing in some libvirt-lxc related pages, which is a separate and
distinct project in spite of the similarities in the name...
http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes
Unfortunately, in our case, merely getting a mount in there is a
complication in that it also has to be populated but, at least, we
understand the problem set now.
Ok... Serge and I were corresponding on the lxc-users list and he had a
suggestion that worked but I consider to be a bit of a sub-optimal
workaround. Ironically, it was to mount devtmpfs on /dev. We don't
Oh, sorry - I take back that suggestion :)
Well, it worked (sort of) and reinforced what the problem was and where
the solution lay so there's no need to be sorry for it. We learned and
we know why it's not the right solution. This is good. We made a lot
of progress on this just in the last week. This is very good.
Post by Serge Hallyn
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ah, now that is interesting. I haven't looked at that before. I need
to explore that further.
Post by Serge Hallyn
Or, if everyone is going to need it, we could just add a 'lxc.populatedevs = 1'
option which does that without needing a hook.
Eventually, with Fedora (and later RHEL / CentOS / SL), Arch Linux, and
others going to systemd, I think this is going to be needed sooner than
later.
Post by Serge Hallyn
devtmpfs should not be used in containers :)
Concur!
Post by Serge Hallyn
-serge
Regards,
Mike
Michael H. Warfield
2012-10-25 17:23:32 UTC
Permalink
Hey Serge,
Post by Serge Hallyn
Post by Michael H. Warfield
Sorry for taking a few days to get back on this. I was delivering a
guest lecture up at Fordham University last Tuesday so I was out of
pocket a couple of days or I would have responded sooner...
Post by Michael H. Warfield
Post by Lennart Poettering
Post by Michael H. Warfield
Post by Lennart Poettering
Post by Michael H. Warfield
To summarize the problem... The LXC startup binary sets up various
things for /dev and /dev/pts for the container to run properly and this
works perfectly fine for SystemV start-up scripts and/or Upstart.
Unfortunately, systemd has mounts of devtmpfs on /dev and devpts
on /dev/pts which then break things horribly. This is because the
kernel currently lacks namespaces for devices and won't for some time to
come (in design). When devtmpfs gets mounted over top of /dev in the
container, it then hijacks the hosts console tty and several other
devices which had been set up through bind mounts by LXC and should have
been LEFT ALONE.
Please initialize a minimal tmpfs on /dev. systemd will then work fine.
My containers have a reasonable /dev that work with Upstart just fine
but they are not on tmpfs. Is mounting tmpfs on /dev and recreating
that minimal /dev required?
Well, it can be any kind of mount really. Just needs to be a mount. And
the idea is to use tmpfs for this.
What /dev are you currently using? It's probably not a good idea to
reuse the hosts' /dev, since it contains so many device nodes that
should not be accessible/visible to the container.
Got it. And that explains the problems we're seeing but also what I'm
seeing in some libvirt-lxc related pages, which is a separate and
distinct project in spite of the similarities in the name...
http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes
Unfortunately, in our case, merely getting a mount in there is a
complication in that it also has to be populated but, at least, we
understand the problem set now.
Ok... Serge and I were corresponding on the lxc-users list and he had a
suggestion that worked but I consider to be a bit of a sub-optimal
workaround. Ironically, it was to mount devtmpfs on /dev. We don't
Oh, sorry - I take back that suggestion :)
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ok... I've done a cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how might I use / implement it? Some examples would
probably suffice. Is there a required release version of lxc-utils?
Post by Serge Hallyn
Or, if everyone is going to need it, we could just add a 'lxc.populatedevs = 1'
option which does that without needing a hook.
devtmpfs should not be used in containers :)
-serge
Regards,
Mike
Michael H. Warfield
2012-10-25 17:35:11 UTC
Permalink
Post by Michael H. Warfield
Hey Serge,
...
Post by Michael H. Warfield
Post by Serge Hallyn
Oh, sorry - I take back that suggestion :)
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a require release version of lxc-utils?
I think I found what I needed in the changelog here:

http://www.mail-archive.com/lxc-***@lists.sourceforge.net/msg01490.html

I'll play with it and report back.
Post by Michael H. Warfield
Post by Serge Hallyn
Or, if everyone is going to need it, we could just add a 'lxc.populatedevs = 1'
option which does that without needing a hook.
devtmpfs should not be used in containers :)
-serge
Regards,
Mike
Serge Hallyn
2012-10-25 19:02:21 UTC
Permalink
Post by Michael H. Warfield
Post by Michael H. Warfield
Hey Serge,
...
Post by Michael H. Warfield
Post by Serge Hallyn
Oh, sorry - I take back that suggestion :)
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a require release version of lxc-utils?
I'll play with it and report back.
Also the "Lifecycle management hooks" section in
https://help.ubuntu.com/12.10/serverguide/lxc.html

Note that I'm thinking that having lxc-start guess how to fill in /dev
is wrong, because different distros and even different releases of the
same distros have different expectations. For instance ubuntu lucid
wants /dev/shm to be a directory, while precise+ wants a symlink. So
somehow the template should get involved, be it by adding a hook, or
simply specifying a configuration file which lxc uses internally to
decide how to create /dev.
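
One way to picture the "configuration file" variant is a small per-release table that the template supplies and lxc applies. The sketch below is hypothetical: the table name, its layout, and the function are invented for illustration, and the symlink target /run/shm is an assumption about what precise expects rather than something stated here.

```python
import os

# Hypothetical per-release policy a template could hand to lxc.
# "dir" = create a directory, "symlink" = link to the given target.
DEV_SHM_POLICY = {
    "lucid": ("dir", None),
    "precise": ("symlink", "/run/shm"),  # target is an assumption
}


def create_dev_shm(dev_dir, release):
    """Create <dev_dir>/shm according to the policy for `release`."""
    kind, target = DEV_SHM_POLICY[release]
    path = os.path.join(dev_dir, "shm")
    if kind == "dir":
        os.mkdir(path, 0o1777)
        os.chmod(path, 0o1777)  # mkdir mode is subject to umask
    else:
        os.symlink(target, path)
    return path
```

The point of keeping the policy as data is that a new release only needs a new table entry, not a change to lxc-start itself.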

Personally I'd prefer if /dev were always populated by the templates,
and containers (i.e. userspace) didn't mount a fresh tmpfs for /dev.
But that does complicate userspace, and we've seen it in debian/ubuntu
as well (i.e. at certain package upgrades which rely on /dev being
cleared after a reboot).

-serge
Michael H. Warfield
2012-10-25 19:42:54 UTC
Permalink
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
Hey Serge,
...
Post by Michael H. Warfield
Post by Serge Hallyn
Oh, sorry - I take back that suggestion :)
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a require release version of lxc-utils?
I'll play with it and report back.
Also the "Lifecycle management hooks" section in
https://help.ubuntu.com/12.10/serverguide/lxc.html
This isn't working...

Based on what was in both of those articles, I added this entry to
another container (Plover) to test...

lxc.hook.mount = /var/lib/lxc/Plover/mount

When I run "lxc-start -n Plover", I see this:

[***@forest ~]# lxc-start -n Plover
lxc-start: unknow key lxc.hook.mount
lxc-start: failed to read configuration file

I'm running the latest rc...

[***@forest ~]# rpm -qa | grep lxc
lxc-0.8.0.rc2-1.fc16.x86_64
lxc-libs-0.8.0.rc2-1.fc16.x86_64
lxc-doc-0.8.0.rc2-1.fc16.x86_64

Is it something in git that hasn't made it to a release yet?
Post by Serge Hallyn
Note that I'm thinking that having lxc-start guess how to fill in /dev
is wrong, because different distros and even different releases of the
same distros have different expectations. For instance ubuntu lucid
wants /dev/shm to be a directory, while precise+ wants a symlink. So
somehow the template should get involved, be it by adding a hook, or
simply specifying a configuration file which lxc uses internally to
decide how to create /dev.
I agree this needs to be by some sort of convention or template that we
can adjust.
Post by Serge Hallyn
Personally I'd prefer if /dev were always populated by the templates,
and containers (i.e. userspace) didn't mount a fresh tmpfs for /dev.
But that does complicate userspace, and we've seen it in debian/ubuntu
as well (i.e. at certain package upgrades which rely on /dev being
cleared after a reboot).
-serge
Regards,
Mike
Michael H. Warfield
2012-10-26 02:59:26 UTC
Permalink
Post by Michael H. Warfield
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
Hey Serge,
...
Post by Michael H. Warfield
Post by Serge Hallyn
Oh, sorry - I take back that suggestion :)
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a require release version of lxc-utils?
I'll play with it and report back.
Also the "Lifecycle management hooks" section in
https://help.ubuntu.com/12.10/serverguide/lxc.html
This isn't working...
Based on what was in both of those articles, I added this entry to
another container (Plover) to test...
lxc.hook.mount = /var/lib/lxc/Plover/mount
lxc-start: unknow key lxc.hook.mount
lxc-start: failed to read configuration file
I'm running the latest rc...
lxc-0.8.0.rc2-1.fc16.x86_64
lxc-libs-0.8.0.rc2-1.fc16.x86_64
lxc-doc-0.8.0.rc2-1.fc16.x86_64
Is it something in git that hasn't made it to a release yet?
nm... I see it. It's in git and hasn't made it to a release. I'm
working on a git build to test now. If this is something that solves
some of this, we need to move things along here and get these things
moved out. According to git, 0.8.0rc2 was 7 months ago? What are the
showstoppers here?
Post by Michael H. Warfield
Post by Serge Hallyn
Note that I'm thinking that having lxc-start guess how to fill in /dev
is wrong, because different distros and even different releases of the
same distros have different expectations. For instance ubuntu lucid
wants /dev/shm to be a directory, while precise+ wants a symlink. So
somehow the template should get involved, be it by adding a hook, or
simply specifying a configuration file which lxc uses internally to
decide how to create /dev.
I agree this needs to be by some sort of convention or template that we
can adjust.
Post by Serge Hallyn
Personally I'd prefer if /dev were always populated by the templates,
and containers (i.e. userspace) didn't mount a fresh tmpfs for /dev.
But that does complicate userspace, and we've seen it in debian/ubuntu
as well (i.e. at certain package upgrades which rely on /dev being
cleared after a reboot).
-serge
Regards,
Mike
Mike
Michael H. Warfield
2012-10-26 13:32:17 UTC
Permalink
Adding in the lxc-devel list.
Post by Michael H. Warfield
Post by Michael H. Warfield
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
Hey Serge,
...
Post by Michael H. Warfield
Post by Serge Hallyn
Oh, sorry - I take back that suggestion :)
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a require release version of lxc-utils?
I'll play with it and report back.
Also the "Lifecycle management hooks" section in
https://help.ubuntu.com/12.10/serverguide/lxc.html
This isn't working...
Based on what was in both of those articles, I added this entry to
another container (Plover) to test...
lxc.hook.mount = /var/lib/lxc/Plover/mount
lxc-start: unknow key lxc.hook.mount
lxc-start: failed to read configuration file
I'm running the latest rc...
lxc-0.8.0.rc2-1.fc16.x86_64
lxc-libs-0.8.0.rc2-1.fc16.x86_64
lxc-doc-0.8.0.rc2-1.fc16.x86_64
Is it something in git that hasn't made it to a release yet?
nm... I see it. It's in git and hasn't made it to a release. I'm
working on a git build to test now. If this is something that solves
some of this, we need to move things along here and get these things
moved out. According to git, 0.8.0rc2 was 7 months ago? What's the
show stoppers here?
While the git repo says 7 months ago, the date stamp on the
lxc-0.8.0-rc2 tarball is from July 10, so about 3-1/2 months ago.
Sounds like we've accumulated some features (like the hooks) that we
needed months ago to deal with this systemd debacle. How
close are we to either 0.8.0rc3 or 0.8.0? Any blockers or are we just
waiting on some more features?
Post by Michael H. Warfield
Post by Michael H. Warfield
Post by Serge Hallyn
Note that I'm thinking that having lxc-start guess how to fill in /dev
is wrong, because different distros and even different releases of the
same distros have different expectations. For instance ubuntu lucid
wants /dev/shm to be a directory, while precise+ wants a symlink. So
somehow the template should get involved, be it by adding a hook, or
simply specifying a configuration file which lxc uses internally to
decide how to create /dev.
I agree this needs to be by some sort of convention or template that we
can adjust.
Post by Serge Hallyn
Personally I'd prefer if /dev were always populated by the templates,
and containers (i.e. userspace) didn't mount a fresh tmpfs for /dev.
But that does complicate userspace, and we've seen it in debian/ubuntu
as well (i.e. at certain package upgrades which rely on /dev being
cleared after a reboot).
-serge
Regards,
Mike
Serge Hallyn
2012-10-26 14:07:28 UTC
Permalink
Post by Michael H. Warfield
Adding in the lxc-devel list.
Post by Michael H. Warfield
Post by Michael H. Warfield
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
Hey Serge,
...
Post by Michael H. Warfield
Post by Serge Hallyn
Oh, sorry - I take back that suggestion :)
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a require release version of lxc-utils?
I'll play with it and report back.
Also the "Lifecycle management hooks" section in
https://help.ubuntu.com/12.10/serverguide/lxc.html
This isn't working...
Based on what was in both of those articles, I added this entry to
another container (Plover) to test...
lxc.hook.mount = /var/lib/lxc/Plover/mount
lxc-start: unknow key lxc.hook.mount
lxc-start: failed to read configuration file
I'm running the latest rc...
lxc-0.8.0.rc2-1.fc16.x86_64
lxc-libs-0.8.0.rc2-1.fc16.x86_64
lxc-doc-0.8.0.rc2-1.fc16.x86_64
Is it something in git that hasn't made it to a release yet?
nm... I see it. It's in git and hasn't made it to a release. I'm
working on a git build to test now. If this is something that solves
some of this, we need to move things along here and get these things
moved out. According to git, 0.8.0rc2 was 7 months ago? What's the
show stoppers here?
While the git repo says 7 months ago, the date stamp on the
lxc-0.8.0-rc2 tarball is from July 10, so about 3-1/2 months ago.
Sounds like we've accumulated some features (like the hooks) we are
going to need like months ago to deal with this systemd debacle. How
close are we to either 0.8.0rc3 or 0.8.0? Any blockers or are we just
waiting on some more features?
Daniel has simply been too busy. Stéphane has made a new branch which
cherrypicks 50 bugfixes for 0.8.0, with the remaining patches (about
twice as many) left for 0.9.0. I'm hoping we get 0.8.0 next week :)
Michael H. Warfield
2012-10-26 14:58:25 UTC
Permalink
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
nm... I see it. It's in git and hasn't made it to a release. I'm
working on a git build to test now. If this is something that solves
some of this, we need to move things along here and get these things
moved out. According to git, 0.8.0rc2 was 7 months ago? What's the
show stoppers here?
While the git repo says 7 months ago, the date stamp on the
lxc-0.8.0-rc2 tarball is from July 10, so about 3-1/2 months ago.
Sounds like we've accumulated some features (like the hooks) we are
going to need like months ago to deal with this systemd debacle. How
close are we to either 0.8.0rc3 or 0.8.0? Any blockers or are we just
waiting on some more features?
Daniel has simply been too busy.
Don't I know THAT feeling all too well. Over on the Samba Team (where
I'm the chief security consultant on the team) we're all too busy with
juggling our domain and our web cert. On top of that, I've got my day
job (of course). On top of that, I've got about six other OpenSource
projects I'm juggling (including this one). On top of that, I've got a
consulting customer that's going through fits. And the beat goes on.

I'll test out things as fast as I can. I need this. This suddenly got
very interesting as soon as we had a thread to pick at on the systemd
ball of yarn.
Post by Serge Hallyn
Stéphane has made a new branch which
cherrypicks 50 bugfixes for 0.8.0, with the remaining patches (about
twice as many) left for 0.9.0. I'm hoping we get 0.8.0 next week :)
I'm hoping the hook patches are in that cherry picked basket. We really
need them if that's what it takes to make this work. Looking forward to
it. :-)=)

I'm going to look further into the hang we get when /dev/console is
redirected to a log. That's not good and may need to be resolved soon as
well. I can live with losing the vtys, although I disagree with Stéphan's
arguments. They (systemd) are behaving significantly differently from
sysvinit and upstart, and they claim they want to be transparent? Not.
No matter. We need to make that work properly as well, whether we agree
with them or not.

Regards,
Mike
Michael H. Warfield
2012-10-27 16:45:24 UTC
Permalink
Post by Michael H. Warfield
Adding in the lxc-devel list.
Post by Michael H. Warfield
Post by Michael H. Warfield
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
Hey Serge,
...
Post by Michael H. Warfield
Post by Serge Hallyn
Oh, sorry - I take back that suggestion :)
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a require release version of lxc-utils?
I'll play with it and report back.
Also the "Lifecycle management hooks" section in
https://help.ubuntu.com/12.10/serverguide/lxc.html
This isn't working...
Based on what was in both of those articles, I added this entry to
another container (Plover) to test...
lxc.hook.mount = /var/lib/lxc/Plover/mount
lxc-start: unknow key lxc.hook.mount
lxc-start: failed to read configuration file
I'm running the latest rc...
lxc-0.8.0.rc2-1.fc16.x86_64
lxc-libs-0.8.0.rc2-1.fc16.x86_64
lxc-doc-0.8.0.rc2-1.fc16.x86_64
Is it something in git that hasn't made it to a release yet?
nm... I see it. It's in git and hasn't made it to a release. I'm
working on a git build to test now. If this is something that solves
some of this, we need to move things along here and get these things
moved out. According to git, 0.8.0rc2 was 7 months ago? What's the
show stoppers here?
While the git repo says 7 months ago, the date stamp on the
lxc-0.8.0-rc2 tarball is from July 10, so about 3-1/2 months ago.
Sounds like we've accumulated some features (like the hooks) we are
going to need like months ago to deal with this systemd debacle. How
close are we to either 0.8.0rc3 or 0.8.0? Any blockers or are we just
waiting on some more features?
Daniel has simply been too busy. Stéphane has made a new branch which
cherrypicks 50 bugfixes for 0.8.0, with the remaining patches (about
twice as many) left for 0.9.0. I'm hoping we get 0.8.0 next week :)
Trying to build latest from git. This is not good...

checking sys/apparmor.h usability... no
checking sys/apparmor.h presence... no
checking for sys/apparmor.h... no
configure: error: You must install the AppArmor development package in
order to compile lxc

What am I supposed to do on Fedora, where we don't have that package? Is
it available in another repo somewhere? I'm looking and not finding.

Regards,
Mike
Michael H. Warfield
2012-10-27 16:53:22 UTC
Permalink
Post by Michael H. Warfield
Post by Michael H. Warfield
Adding in the lxc-devel list.
Post by Michael H. Warfield
Post by Michael H. Warfield
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
Hey Serge,
...
Post by Michael H. Warfield
Post by Serge Hallyn
Oh, sorry - I take back that suggestion :)
Note that we have mount hooks, so templates could install a mount hook to
mount a tmpfs onto /dev and populate it.
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a require release version of lxc-utils?
I'll play with it and report back.
Also the "Lifecycle management hooks" section in
https://help.ubuntu.com/12.10/serverguide/lxc.html
This isn't working...
Based on what was in both of those articles, I added this entry to
another container (Plover) to test...
lxc.hook.mount = /var/lib/lxc/Plover/mount
lxc-start: unknow key lxc.hook.mount
lxc-start: failed to read configuration file
I'm running the latest rc...
lxc-0.8.0.rc2-1.fc16.x86_64
lxc-libs-0.8.0.rc2-1.fc16.x86_64
lxc-doc-0.8.0.rc2-1.fc16.x86_64
Is it something in git that hasn't made it to a release yet?
nm... I see it. It's in git and hasn't made it to a release. I'm
working on a git build to test now. If this is something that solves
some of this, we need to move things along here and get these things
moved out. According to git, 0.8.0rc2 was 7 months ago? What's the
show stoppers here?
While the git repo says 7 months ago, the date stamp on the
lxc-0.8.0-rc2 tarball is from July 10, so about 3-1/2 months ago.
Sounds like we've accumulated some features (like the hooks) we are
going to need like months ago to deal with this systemd debacle. How
close are we to either 0.8.0rc3 or 0.8.0? Any blockers or are we just
waiting on some more features?
Daniel has simply been too busy. Stéphane has made a new branch which
cherrypicks 50 bugfixes for 0.8.0, with the remaining patches (about
twice as many) left for 0.9.0. I'm hoping we get 0.8.0 next week :)
Trying to build latest from git. This is not good...
checking sys/apparmor.h usability... no
checking sys/apparmor.h presence... no
checking for sys/apparmor.h... no
configure: error: You must install the AppArmor development package in
order to compile lxc
What am I suppose to do on Fedora where we don't have that package? Is
it available in another repo somewhere? I'm looking and not finding.
nm... I see that --enable-apparmor defaults to on. I just had to
add the --disable-apparmor option. Sorry for the noise.
Post by Michael H. Warfield
Regards,
Mike
Mike
--
Michael H. Warfield (AI4NB) | (770) 985-6132 | ***@WittsEnd.com
/\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it!
Lennart Poettering
2012-10-25 21:45:30 UTC
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a required release version of lxc-utils?
I'll play with it and report back.
Also the "Lifecycle management hooks" section in
https://help.ubuntu.com/12.10/serverguide/lxc.html
Note that I'm thinking that having lxc-start guess how to fill in /dev
is wrong, because different distros and even different releases of the
same distros have different expectations. For instance ubuntu lucid
wants /dev/shm to be a directory, while precise+ wants a symlink. So
somehow the template should get involved, be it by adding a hook, or
simply specifying a configuration file which lxc uses internally to
decide how to create /dev.
/dev/shm can be created/mounted/symlinked by the OS in the
container. This is nothing LXC should care about.

My recommendation for LXC would be to unconditionally pre-mount /dev as
tmpfs, and add exactly the device nodes /dev/null, /dev/zero, /dev/full,
/dev/urandom, /dev/random, /dev/tty, /dev/ptmx to it. That is the
minimal set you need to boot a machine. All further
submounts/symlinks/dirs can be created by the OS boot logic in the
container.
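
A minimal sketch of the layout recommended above, assuming the container
manager has already mounted a tmpfs on the container's /dev. This is not
LXC's, libvirt-lxc's or nspawn's actual code; DEV_NODES and populate_dev
are hypothetical names. The major/minor pairs are the standard Linux
character-device numbers, and creating real nodes needs root.

```python
import os
import stat

# Minimal device set from the recommendation above, with the standard
# Linux character-device (major, minor) numbers.
DEV_NODES = {
    "null": (1, 3),
    "zero": (1, 5),
    "full": (1, 7),
    "random": (1, 8),
    "urandom": (1, 9),
    "tty": (5, 0),
    "ptmx": (5, 2),
}


def populate_dev(dev_dir, mknod=os.mknod):
    """Create the minimal character devices inside dev_dir.

    dev_dir is assumed to be an already mounted tmpfs (for example
    <rootfs>/dev). The mknod parameter exists only so the logic can be
    exercised without root; the real os.mknod also applies the umask.
    """
    created = []
    for name, (major, minor) in sorted(DEV_NODES.items()):
        path = os.path.join(dev_dir, name)
        mknod(path, stat.S_IFCHR | 0o666, os.makedev(major, minor))
        created.append(path)
    return created
```

Everything else (/dev/shm, /dev/pts, symlinks) is left to the OS inside
the container, as suggested above.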

That's what libvirt-lxc and nspawn do, and is what we defined in:

http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface

It would be good if LXC would do the same in order to minimize the
manual user configuration necessary.

Lennart
--
Lennart Poettering - Red Hat, Inc.
Serge Hallyn
2012-10-26 13:38:22 UTC
Post by Lennart Poettering
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
Ok... I've done some cursory search and turned up nothing but some
comments about "pre mount hooks". Where is the documentation about this
feature and how I might use / implement it? Some examples would
probably suffice. Is there a required release version of lxc-utils?
I'll play with it and report back.
Also the "Lifecycle management hooks" section in
https://help.ubuntu.com/12.10/serverguide/lxc.html
Note that I'm thinking that having lxc-start guess how to fill in /dev
is wrong, because different distros and even different releases of the
same distros have different expectations. For instance ubuntu lucid
wants /dev/shm to be a directory, while precise+ wants a symlink. So
somehow the template should get involved, be it by adding a hook, or
simply specifying a configuration file which lxc uses internally to
decide how to create /dev.
/dev/shm can be created/mounted/symlinked by the OS in the
container. This is nothing LXC should care about.
My recommendation for LXC would be to unconditionally pre-mount /dev as
tmpfs, and add exactly the device nodes /dev/null, /dev/zero, /dev/full,
/dev/urandom, /dev/random, /dev/tty, /dev/ptmx to it. That is the
minimal set you need to boot a machine. All further
submounts/symlinks/dirs can be created by the OS boot logic in the
container.
I'm thinking we'll do that, optionally. Templates (including fedora
and ubuntu) can simply always set the option to mount and fill /dev.
Others (like busybox and mini-sshd) won't.
Post by Lennart Poettering
http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface
It would be good if LXC would do the same in order to minimize the
manual user configuration necessary.
Lennart
Agreed it simplifies things for full system containers with modern distros.

thanks,
-serge
Lennart Poettering
2012-10-25 21:38:09 UTC
Post by Michael H. Warfield
Post by Michael H. Warfield
http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes
Unfortunately, in our case, merely getting a mount in there is a
complication in that it also has to be populated but, at least, we
understand the problem set now.
Ok... Serge and I were corresponding on the lxc-users list and he had a
suggestion that worked but I consider to be a bit of a sub-optimal
workaround. Ironically, it was to mount devtmpfs on /dev. We don't
(currently) have a method to auto-populate a tmpfs mount with the needed
devices and this provided it. It does have a problem that makes me
uncomfortable in that the container now has visibility into the
host's /dev system. I'm a security expert and I'm not comfortable with
that "solution" even with the controls we have. We can control access
but still, not happy with that.
That's a pretty bad idea: access control to the device nodes in devtmpfs
is controlled by the host's udev instance. That means if your group/user
lists in the container and the host differ you have already lost. Also access
control in udev is dynamic, due to stuff like uaccess and similar. You
really don't want to have that in the container, i.e. where devices
change ownership all the time with UIDs/GIDs that make no sense at all
in the container.

In general I think it's a good idea not to expose any "real" devices to
the container, but only the "virtual" ones that are programming
APIs. That means: no /dev/sda, or /dev/ttyS0, but /dev/null, /dev/zero,
/dev/random, /dev/urandom. And creating the latter in a tmpfs is quite
simple.
Post by Michael H. Warfield
If I run lxc-console (which attaches to one of the vtys) it gives me
nothing. Under sysvinit and upstart I get vty login prompts because
they have started getty on those vtys. This is important in case
network access has not started for one reason or another and the
container was started detached in the background.
The getty behaviour of systemd in containers is documented here:

http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface

If LXC mounts ptys on top of the VT devices that's a really bad idea
too, since /dev/tty1 and friends expose a number of APIs beyond the mere
tty device that you cannot emulate with that. It includes files in /sys,
as well as /dev/vcs and /dev/vcsa, various ioctls, and so on. Heck, even
the most superficial of things, the $TERM variable will be
incorrect. LXC shouldn't do that.

LXC really shouldn't pretend a pty is a VT tty; it's not.
Michael H. Warfield
2012-10-26 00:03:46 UTC
Post by Michael H. Warfield
I've got some more problems relating to shutting down containers, some
of which may be related to mounting tmpfs on /run to which /var/run is
symlinked. We're doing halt / restart detection by monitoring utmp
in that directory but it looks like utmp isn't even in that directory
anymore and mounting tmpfs on it was always problematical. We may have
to have a more generic method to detect when a container has shut down
or is restarting in that case.
I can't parse this. The system call reboot() is virtualized for
containers just fine and the container manager (i.e. LXC) can check for
that easily.
The problem we have had was with differentiating between reboot and halt
to either shut the container down cold or restart it. You say
"easily" and yet we never came up with an "easy" solution and monitored
utmp instead for the next runlevel change. What is your "easy" solution
for that problem?
Lennart
--
Lennart Poettering - Red Hat, Inc.
Regards,
Mike
Serge Hallyn
2012-10-26 01:30:48 UTC
Post by Michael H. Warfield
Post by Michael H. Warfield
I've got some more problems relating to shutting down containers, some
of which may be related to mounting tmpfs on /run to which /var/run is
symlinked. We're doing halt / restart detection by monitoring utmp
in that directory but it looks like utmp isn't even in that directory
anymore and mounting tmpfs on it was always problematical. We may have
to have a more generic method to detect when a container has shut down
or is restarting in that case.
I can't parse this. The system call reboot() is virtualized for
containers just fine and the container manager (i.e. LXC) can check for
that easily.
The problem we have had was with differentiating between reboot and halt
to either shut the container down cold or restart it. You say
"easily" and yet we never came up with an "easy" solution and monitored
utmp instead for the next runlevel change. What is your "easy" solution
for that problem?
I think you're on older kernels, where we had to resort to that. Pretty
recently Daniel Lezcano's patch was finally accepted upstream, which lets
a container call reboot() and lets the parent of init tell whether it
called reboot or shutdown by looking at WTERMSIG(status).
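
As a rough illustration of that check (a sketch, not LXC's code): per
reboot(2), when init in a non-initial PID namespace calls reboot(), the
kernel kills it with SIGHUP for a restart request and with SIGINT for
halt or power-off, and the parent sees that signal in the wait status.
The name classify_init_exit is made up for this example.

```python
import os
import signal


def classify_init_exit(status):
    """Map the wait status of a container's init to an action."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig == signal.SIGHUP:
            return "reboot"   # init asked for a restart
        if sig == signal.SIGINT:
            return "halt"     # init asked for halt / power-off
        return "killed"       # some other signal ended init
    return "exited"           # init returned or called exit()
```

The status would come from something like os.waitpid() on the
container's init; no utmp monitoring is involved.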

-serge
Michael H. Warfield
2012-10-26 02:07:37 UTC
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
I've got some more problems relating to shutting down containers, some
of which may be related to mounting tmpfs on /run to which /var/run is
symlinked. We're doing halt / restart detection by monitoring utmp
in that directory but it looks like utmp isn't even in that directory
anymore and mounting tmpfs on it was always problematical. We may have
to have a more generic method to detect when a container has shut down
or is restarting in that case.
I can't parse this. The system call reboot() is virtualized for
containers just fine and the container manager (i.e. LXC) can check for
that easily.
The problem we have had was with differentiating between reboot and halt
to either shut the container down cold or restart it. You say
"easily" and yet we never came up with an "easy" solution and monitored
utmp instead for the next runlevel change. What is your "easy" solution
for that problem?
I think you're on older kernels, where we had to resort to that. Pretty
recently Daniel Lezcano's patch was finally accepted upstream, which lets
a container call reboot() and lets the parent of init tell whether it
called reboot or shutdown by looking at WTERMSIG(status).
Now THAT is wonderful news! I hadn't realized that had been accepted.
So we no longer need to rely on the old utmp kludge?
Post by Serge Hallyn
-serge
Regards,
Mike
Serge Hallyn
2012-10-26 13:12:35 UTC
Post by Michael H. Warfield
Post by Serge Hallyn
Post by Michael H. Warfield
Post by Michael H. Warfield
I've got some more problems relating to shutting down containers, some
of which may be related to mounting tmpfs on /run to which /var/run is
symlinked. We're doing halt / restart detection by monitoring utmp
in that directory but it looks like utmp isn't even in that directory
anymore and mounting tmpfs on it was always problematical. We may have
to have a more generic method to detect when a container has shut down
or is restarting in that case.
I can't parse this. The system call reboot() is virtualized for
containers just fine and the container manager (i.e. LXC) can check for
that easily.
The problem we have had was with differentiating between reboot and halt
to either shut the container down cold or restart it. You say
"easily" and yet we never came up with an "easy" solution and monitored
utmp instead for the next runlevel change. What is your "easy" solution
for that problem?
I think you're on older kernels, where we had to resort to that. Pretty
recently Daniel Lezcano's patch was finally accepted upstream, which lets
a container call reboot() and lets the parent of init tell whether it
called reboot or shutdown by looking at WTERMSIG(status).
Now THAT is wonderful news! I hadn't realized that had been accepted.
So we no longer need to rely on the old utmp kludge?
Yup :) It was very liberating, in terms of what containers can do with
mounting.
Michael H. Warfield
2012-10-26 15:58:33 UTC
Post by Lennart Poettering
Post by Michael H. Warfield
Post by Michael H. Warfield
http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes
Unfortunately, in our case, merely getting a mount in there is a
complication in that it also has to be populated but, at least, we
understand the problem set now.
Ok... Serge and I were corresponding on the lxc-users list and he had a
suggestion that worked but I consider to be a bit of a sub-optimal
workaround. Ironically, it was to mount devtmpfs on /dev. We don't
(currently) have a method to auto-populate a tmpfs mount with the needed
devices and this provided it. It does have a problem that makes me
uncomfortable in that the container now has visibility into the
host's /dev system. I'm a security expert and I'm not comfortable with
that "solution" even with the controls we have. We can control access
but still, not happy with that.
That's a pretty bad idea: access control to the device nodes in devtmpfs
is controlled by the host's udev instance. That means if your group/user
lists in the container and the host differ you have already lost. Also access
control in udev is dynamic, due to stuff like uaccess and similar. You
really don't want to have that in the container, i.e. where devices
change ownership all the time with UIDs/GIDs that make no sense at all
in the container.
Concur.
Post by Lennart Poettering
In general I think it's a good idea not to expose any "real" devices to
the container, but only the "virtual" ones that are programming
APIs. That means: no /dev/sda, or /dev/ttyS0, but /dev/null, /dev/zero,
/dev/random, /dev/urandom. And creating the latter in a tmpfs is quite
simple.
Post by Michael H. Warfield
If I run lxc-console (which attaches to one of the vtys) it gives me
nothing. Under sysvinit and upstart I get vty login prompts because
they have started getty on those vtys. This is important in case
network access has not started for one reason or another and the
container was started detached in the background.
http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface
Sorry. This is unacceptable. We need some way that these will be
active and you will be consistent with other containers.
Post by Lennart Poettering
If LXC mounts ptys on top of the VT devices that's a really bad idea
too, since /dev/tty1 and friends expose a number of APIs beyond the mere
tty device that you cannot emulate with that. It includes files in /sys,
as well as /dev/vcs and /dev/vcsa, various ioctls, and so on. Heck, even
the most superficial of things, the $TERM variable will be
incorrect. LXC shouldn't do that.
REGARDLESS. I'm in this situation now testing what I thought was a hang
condition (which is proving to be something else). I started a
container detached redirecting the console to a file (a parameter I was
missing) and the log to another file (which I had been doing). But, for
some reason, sshd is not starting up. I have no way to attach to the
bloody console of the container and I have no getty's on a vty I can
attach to using lxc-console and I can't remote access a container which,
for all other intents and purposes, appears to be running fine.
Parameterize this bloody thing so we can have control over it.
Michael H. Warfield
2012-10-26 17:18:05 UTC
Post by Michael H. Warfield
Post by Lennart Poettering
Post by Michael H. Warfield
Post by Michael H. Warfield
http://wiki.1tux.org/wiki/Lxc/Installation#Additional_notes
Unfortunately, in our case, merely getting a mount in there is a
complication in that it also has to be populated but, at least, we
understand the problem set now.
Ok... Serge and I were corresponding on the lxc-users list and he had a
suggestion that worked but I consider to be a bit of a sub-optimal
workaround. Ironically, it was to mount devtmpfs on /dev. We don't
(currently) have a method to auto-populate a tmpfs mount with the needed
devices and this provided it. It does have a problem that makes me
uncomfortable in that the container now has visibility into the
host's /dev system. I'm a security expert and I'm not comfortable with
that "solution" even with the controls we have. We can control access
but still, not happy with that.
That's a pretty bad idea: access control to the device nodes in devtmpfs
is controlled by the host's udev instance. That means if your group/user
lists in the container and the host differ you have already lost. Also access
control in udev is dynamic, due to stuff like uaccess and similar. You
really don't want to have that in the container, i.e. where devices
change ownership all the time with UIDs/GIDs that make no sense at all
in the container.
Concur.
Post by Lennart Poettering
In general I think it's a good idea not to expose any "real" devices to
the container, but only the "virtual" ones that are programming
APIs. That means: no /dev/sda, or /dev/ttyS0, but /dev/null, /dev/zero,
/dev/random, /dev/urandom. And creating the latter in a tmpfs is quite
simple.
Post by Michael H. Warfield
If I run lxc-console (which attaches to one of the vtys) it gives me
nothing. Under sysvinit and upstart I get vty login prompts because
they have started getty on those vtys. This is important in case
network access has not started for one reason or another and the
container was started detached in the background.
http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface
Sorry. This is unacceptable. We need some way that these will be
active and you will be consistent with other containers.
Post by Lennart Poettering
If LXC mounts ptys on top of the VT devices that's a really bad idea
too, since /dev/tty1 and friends expose a number of APIs beyond the mere
tty device that you cannot emulate with that. It includes files in /sys,
as well as /dev/vcs and /dev/vcsa, various ioctls, and so on. Heck, even
the most superficial of things, the $TERM variable will be
incorrect. LXC shouldn't do that.
REGARDLESS. I'm in this situation now testing what I thought was a hang
condition (which is proving to be something else). I started a
container detached redirecting the console to a file (a parameter I was
missing) and the log to another file (which I had been doing). But, for
some reason, sshd is not starting up. I have no way to attach to the
bloody console of the container and I have no getty's on a vty I can
attach to using lxc-console and I can't remote access a container which,
for all other intents and purposes, appears to be running fine.
Parameterize this bloody thing so we can have control over it.
Here's another weirdism that's in your camp...

The reason that sshd did not start was because the network did not start
(IPv6 was up but IPv4 was not and the startup of several services failed
as a consequence). Trying to restart the network manually resulted in
this:

[***@alcove mhw]# ifdown eth0
./network-functions: line 237: cd: /var/run/netreport: No such file or directory
[***@alcove mhw]# ifup eth0
./network-functions: line 237: cd: /var/run/netreport: No such file or directory
[***@alcove mhw]# ls /var/run/
dbus messagebus.pid rpcbind.sock systemd user
log mount syslogd.pid udev

What the hell is this? /var/run is symlinked to /run and is mounted
with a tmpfs.

So I created that directory and could ifup the network and start
sshd. So I did a little check on the run levels... Hmmm... F17
container (Alcove) in an F17 host (Forest). WHAT is going ON here? Is
this why the network didn't start?

[***@forest mhw]# runlevel
N 5

[***@alcove mhw]# runlevel
unknown

[***@alcove mhw]# chkconfig

Note: This output shows SysV services only and does not include native
systemd services. SysV configuration data might be overridden by native
systemd configuration.

modules_dep 0:off 1:off 2:on 3:on 4:on 5:on 6:off
netconsole 0:off 1:off 2:off 3:off 4:off 5:off 6:off
network 0:off 1:off 2:off 3:on 4:off 5:off 6:off


Mike
Colin Guthrie
2012-10-27 18:44:37 UTC
'Twas brillig, and Michael H. Warfield at 26/10/12 18:18 did gyre and
Post by Michael H. Warfield
What the hell is this? /var/run is symlinked to /run and is mounted
with a tmpfs.
Yup, that's how /var/run and /run are being handled these days.

It provides a consistent space to pass info from the initrd over to the
main system and has various other uses also.

If you want to ensure files are created in this folder, just drop a
config file in to /usr/lib/tmpfiles.d/ in the package in question. See
man systemd-tmpfiles for more info.
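
For the missing /var/run/netreport directory, a fragment could look
roughly like the line this sketch renders. The helper name
tmpfiles_dir_entry, the file name netreport.conf and the 0775 mode are
assumptions made for illustration; only the column layout (type, path,
mode, user, group, age) comes from the tmpfiles.d format.

```python
def tmpfiles_dir_entry(path, mode="0755", user="root", group="root"):
    """Render one tmpfiles.d 'd' line: create the directory at boot."""
    # Columns: Type Path Mode UID GID Age ("-" means no age-based cleanup).
    return f"d {path} {mode} {user} {group} -\n"


if __name__ == "__main__":
    # Hypothetical content for /usr/lib/tmpfiles.d/netreport.conf
    print(tmpfiles_dir_entry("/run/netreport", mode="0775"), end="")
```

Since /var/run is a symlink to /run, the entry names the /run path.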

Could be some packages are not fully upgraded to this concept in F17. As
a non-fedora user, I can't really comment on that specifically.

Col
--
Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
Tribalogic Limited http://www.tribalogic.net/
Open Source:
Mageia Contributor http://www.mageia.org/
PulseAudio Hacker http://www.pulseaudio.org/
Trac Hacker http://trac.edgewall.org/
Michael H. Warfield
2012-10-27 19:37:54 UTC
Post by Colin Guthrie
'Twas brillig, and Michael H. Warfield at 26/10/12 18:18 did gyre and
Post by Michael H. Warfield
What the hell is this? /var/run is symlinked to /run and is mounted
with a tmpfs.
Yup, that's how /var/run and /run are being handled these days.
It provides a consistent space to pass info from the initrd over to the
main system and has various other uses also.
Interesting. I hadn't considered that aspect of it before. Very
interesting.
Post by Colin Guthrie
If you want to ensure files are created in this folder, just drop a
config file in to /usr/lib/tmpfiles.d/ in the package in question. See
man systemd-tmpfiles for more info.
NOW THAT is something else I needed to know about! Thank you very very
much! Learned something new. This whole thing has been a massive
learning experience getting this container kick started.
Post by Colin Guthrie
Could be some packages are not fully upgraded to this concept in F17. As
a non-fedora user, I can't really comment on that specifically.
As it turns out, the kernel has had some of our patches applied that I
wasn't aware of vis-a-vis reboot/halt and this should no longer be an
issue. I'm still struggling with the tmpfs on /dev thing and have run
into a catch-22 with regards to that. I can mount tmpfs on /dev just
fine and can populate it just fine in a post mount hook but, then, we're
trying to mount a devpts file system on /dev/pts before we've had a
chance to populate it and it's then crashing on the mount. Sigh... I
think that's going to now have to wait for Serge or Daniel to comment
on.
Post by Colin Guthrie
Col
--
Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/
Tribalogic Limited http://www.tribalogic.net/
Mageia Contributor http://www.mageia.org/
PulseAudio Hacker http://www.pulseaudio.org/
Trac Hacker http://trac.edgewall.org/
Regards,
Mike
Michael H. Warfield
2012-10-26 16:11:42 UTC
Post by Michael H. Warfield
I SUSPECT the hang condition is something to do with systemd trying to
start an interactive console on /dev/console, which sysvinit and
upstart do not do.
Yes, this is documented, please see the link I already posted, and which
I linked above a second time.
This may have been my fault. I was using the -o option to lxc-start
(output logfile) and failed to specify the -c (console output redirect)
option. It seems to fire up nicely (albeit with other problems) with
that additional option. Continuing my research.
Post by Michael H. Warfield
I've got some more problems relating to shutting down containers, some
of which may be related to mounting tmpfs on /run to which /var/run is
symlinked. We're doing halt / restart detection by monitoring utmp
in that directory but it looks like utmp isn't even in that directory
anymore and mounting tmpfs on it was always problematical. We may have
to have a more generic method to detect when a container has shut down
or is restarting in that case.
I can't parse this. The system call reboot() is virtualized for
containers just fine and the container manager (i.e. LXC) can check for
that easily.
Apparently, in recent kernels, we can. Unfortunately, I'm still finding
that I can not restart a container I have previously halted. I have no
problem with sysvinit and upstart systems on this host, so it is a
container problem peculiar to systemd containers. Continuing to
research that problem.
Lennart
--
Lennart Poettering - Red Hat, Inc.
Regards,
Mike
Michael H. Warfield
2012-10-26 21:02:00 UTC
Post by Michael H. Warfield
Post by Michael H. Warfield
I SUSPECT the hang condition is something to do with systemd trying to
start an interactive console on /dev/console, which sysvinit and
upstart do not do.
Yes, this is documented, please see the link I already posted, and which
I linked above a second time.
This may have been my fault. I was using the -o option to lxc-start
(output logfile) and failed to specify the -c (console output redirect)
option. It seems to fire up nicely (albeit with other problems) with
that additional option. Continuing my research.
Confirming. Using the -c option for the console file works.
Unfortunately, with no gettys on the ttys (so lxc-console does not
work), no way to connect to that console redirect, and the network
failing to start, I'm still trying to figure out just what is face
planting in a container I can not access. :-/=/ Punch out the punch
list one PUNCH at a time here.
Post by Michael H. Warfield
Post by Michael H. Warfield
I've got some more problems relating to shutting down containers, some
of which may be related to mounting tmpfs on /run to which /var/run is
symlinked. We're doing halt / restart detection by monitoring utmp
in that directory but it looks like utmp isn't even in that directory
anymore and mounting tmpfs on it was always problematical. We may have
to have a more generic method to detect when a container has shut down
or is restarting in that case.
I can't parse this. The system call reboot() is virtualized for
containers just fine and the container manager (i.e. LXC) can check for
that easily.
Apparently, in recent kernels, we can. Unfortunately, I'm still finding
that I can not restart a container I have previously halted. I have no
problem with sysvinit and upstart systems on this host, so it is a
container problem peculiar to systemd containers. Continuing to
research that problem.
Lennart
--
Lennart Poettering - Red Hat, Inc.
Regards,
Mike
Michael H. Warfield
2012-10-26 16:36:48 UTC
Post by Michael H. Warfield
I've got some more problems relating to shutting down containers, some
of which may be related to mounting tmpfs on /run to which /var/run is
symlinked. We're doing halt / restart detection by monitoring utmp
in that directory but it looks like utmp isn't even in that directory
anymore and mounting tmpfs on it was always problematical. We may have
to have a more generic method to detect when a container has shut down
or is restarting in that case.
I can't parse this. The system call reboot() is virtualized for
containers just fine and the container manager (i.e. LXC) can check for
that easily.
I strongly suspect that the condition I'm dealing with (not being able
to restart the container) is an artifact of the devtmpfs kludge. I'm
seeing some errors relating to /dev/loop* busy that seems to be related
to the hung resources resulting in the inability to remove the zombie
container. Disregard until I can get further information following a
switch to a template based setup.
Lennart
Regards,
Mike
Michael H. Warfield
2012-11-06 16:07:10 UTC
Post by Lennart Poettering
Note that there are reports that LXC has issues with the fact that newer
systemd enables shared mount propagation for all mounts by default (this
should actually be beneficial for containers as this ensures that new
mounts appear in the containers). LXC when run on such a system fails as
soon as it tries to use pivot_root(), as that is incompatible with
shared mount propagation. This needs fixing in LXC: it should use MS_MOVE
or MS_BIND to place the new root dir in / instead. A short term
work-around is to simply remount the root tree to private before
invoking LXC.
In another thread, Serge had some heartburn over this shared mount
propagation which then rang a bell in my head about past problems we
have seen.
Post by Lennart Poettering
...
This was from another thread with the systemd guys.
Post by Lennart Poettering
Note that there are reports that LXC has issues with the fact that newer
systemd enables shared mount propagation for all mounts by default (this
should actually be beneficial for containers as this ensures that new
mounts appear in the containers). LXC when run on such a system fails
MS_SLAVE does this as well. MS_SHARED means container mounts also
propagate into the host, which is less desirable in most cases.
Here's where we've seen some problems in the past. It's not just mounts
that are propagated but remounts as well. The problem arose that some
of us had our containers on a separate partition. When we would shut a
container down, that container tried to remount its file systems ro
which then propagated back into the host causing the host's file system
to be ro (doesn't happen if you are running on the host's root fs for
the containers) and from there across into the other containers.

Are you using MS_SHARED or MS_SLAVE for this? If you are using
MS_SHARED, you create a potential security problem where actions in
the container can bleed into the state of the host and into other
containers. That's highly undesirable. If a mount in a container
propagates back into the host and is then reflected to another container
sharing that same mount tree (I have shared partitions specific to that
sort of thing), does that create an information disclosure situation if
one container mounts a new file system and the other container sees the
new mount? I don't know if the mount propagation would reflect back up
the shared tree or not but I have certainly seen remounts do this. I
don't see that as desirable. Maybe I'm misunderstanding how this is
supposed to work but I intend to test out those scenarios when I have a
chance. I do know that, when testing that ro problem, I was able to
remount a partition ro in one container and it would switch in the host
and the other container and I could then remount it rw in the other
container and have it propagate back. Not good.

Can you offer any clarity on this?
Post by Lennart Poettering
Post by Lennart Poettering
as
soon as it tries to use pivot_root(), as that is incompatible with
shared mount propagation. This needs fixing in LXC: it should use MS_MOVE
or MS_BIND to place the new root dir in / instead. A short term
Actually not quite sure how this would work. It should be possible
to set up a set of conditions to work around this, but the kernel
checks at do_pivotroot are pretty harsh - mnt->mnt_parent of both
the new root and current root have to be not shared. So perhaps
we actually first chroot into a dir whose parent is non-shared,
then pivot_root from there? :)
(Simple chroot in place of pivot_root still does not suffice, not
only because of chroot escapes, but also different results in
/proc/pid/mountinfo and friends)
Comments on Serge's points?

At this point, we see where this will become problematical in Fedora 18,
but it appears to already be problematical in NixOS, which another user
is running and which contains systemd 195 in the host.

We've had problems with chroot in the past due to chroot escapes and
other problems years ago as Serge mentioned.
Post by Lennart Poettering
Lennart
--
Lennart Poettering - Red Hat, Inc.
Regards,
Mike
--
Michael H. Warfield (AI4NB) | (770) 985-6132 | ***@WittsEnd.com
/\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it!
Lennart Poettering
2012-11-09 23:52:13 UTC
Permalink
Post by Michael H. Warfield
Here's where we've seen some problems in the past. It's not just mounts
that are propagated but remounts as well. The problem arose that some
of us had our containers on a separate partition. When we would shut a
container down, that container tried to remount its file systems ro
which then propagated back into the host causing the host's file system
to be ro (doesn't happen if you are running on the host's root fs for
the containers) and from there across into the other containers.
Are you using MS_SHARED or MS_SLAVE for this? If you are using
MS_SHARED, you create a potential security problem where actions in
the container can bleed into the state of the host and into other
containers. That's highly undesirable.
The root namespace is MS_SHARED, and nspawn and libvirt-lxc containers
are MS_SLAVE. That ensures mounts from the host propagate to the
containers but not vice versa.
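
A minimal sketch of one way to check which propagation type a given mount
actually has, assuming you read lines from /proc/<pid>/mountinfo yourself.
The function name classify_mountinfo_line and the PROP_* constants are
hypothetical, made up for this illustration. It relies only on the
"shared:N" and "master:N" tags that the kernel puts in the optional
fields of each mountinfo line.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>

/* Propagation bits returned below (hypothetical names for this sketch). */
enum {
    PROP_SHARED = 1 << 0,  /* "shared:N" tag: mount is in a peer group     */
    PROP_SLAVE  = 1 << 1   /* "master:N" tag: mount receives from a master */
};

/*
 * Scan one line of /proc/<pid>/mountinfo and report which propagation
 * tags appear in the optional fields, i.e. the fields between the mount
 * options (6th field) and the lone "-" separator.
 * A return value of 0 means neither tag was found.
 */
int classify_mountinfo_line(const char *line)
{
    char buf[4096];
    char *save = NULL;
    char *tok;
    int field = 0;
    int flags = 0;

    assert(line != NULL);

    /* Work on a private copy because strtok_r() modifies its input. */
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (tok = strtok_r(buf, " \n", &save); tok != NULL;
         tok = strtok_r(NULL, " \n", &save)) {
        field++;
        if (field <= 6)
            continue;               /* skip the six fixed leading fields */
        if (strcmp(tok, "-") == 0)
            break;                  /* end of the optional fields */
        if (strncmp(tok, "shared:", 7) == 0)
            flags |= PROP_SHARED;
        else if (strncmp(tok, "master:", 7) == 0)
            flags |= PROP_SLAVE;
    }
    return flags;
}
```

With the setup described above, one would expect lines read on the host
to carry PROP_SHARED and the corresponding lines read inside a container
to carry PROP_SLAVE. A mount can carry both bits if it is a slave that
has itself been made shared again.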
Post by Michael H. Warfield
Post by Lennart Poettering
Post by Lennart Poettering
as
soon as it tries to use pivot_root(), as that is incompatible with
shared mount propagation. This needs fixing in LXC: it should use MS_MOVE
or MS_BIND to place the new root dir in / instead. A short term
Actually not quite sure how this would work. It should be possible
to set up a set of conditions to work around this, but the kernel
checks at do_pivotroot are pretty harsh - mnt->mnt_parent of both
the new root and current root have to be not shared. So perhaps
we actually first chroot into a dir whose parent is non-shared,
then pivot_root from there? :)
(Simple chroot in place of pivot_root still does not suffice, not
only because of chroot escapes, but also different results in
/proc/pid/mountinfo and friends)
Comments on Serge's points?
Don't use pivot_root. Instead use MS_MOVE to move the container root to
/.
Post by Michael H. Warfield
At this point, we see where this will become problematical in Fedora 18
but appears to already be problematical in NixOS that another user is
running and which contains systemd 195 in the host.
There's nothing really problematical with this. LXC should stop using
pivot_root, and use MS_MOVE instead.
Post by Michael H. Warfield
We've had problems with chroot in the past due to chroot escapes and
other problems years ago as Serge mentioned.
chroot() is not useful for this. You should invoke chroot() once, to
fix chroot after adjusting the namespace, but that's not the call that
actually shifts the namespace around. That should be done with MS_MOVE.

The code should look like this:

http://cgit.freedesktop.org/systemd/systemd/tree/src/nspawn/nspawn.c#n1264
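
As a rough illustration of the sequence described above (MS_MOVE to shift
the namespace, then a single chroot() to fix up the process root), here
is a minimal sketch. It is not a copy of the nspawn code at that link.
The function name switch_root_ms_move is hypothetical, and the sketch
assumes the caller is privileged, already runs in its own mount namespace
(for example after unshare(CLONE_NEWNS)), has already marked that
namespace MS_SLAVE or private so the move is permitted, and that new_root
is itself a mount point.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mount.h>
#include <unistd.h>

/*
 * Make new_root the root directory of the calling process using MS_MOVE
 * plus one chroot(), rather than pivot_root().
 * Returns 0 on success, or -1 with errno set by the call that failed.
 */
int switch_root_ms_move(const char *new_root)
{
    assert(new_root != NULL);

    /* Enter the new root first so "." refers to it after the move. */
    if (chdir(new_root) < 0)
        return -1;

    /* Move the mount at new_root on top of "/" in this namespace. */
    if (mount(new_root, "/", NULL, MS_MOVE, NULL) < 0)
        return -1;

    /* Fix up the process root so it matches the moved mount. */
    if (chroot(".") < 0)
        return -1;

    /* Reset the working directory relative to the new root. */
    if (chdir("/") < 0)
        return -1;

    return 0;
}
```

The function bails out at the first failing step, so a bad path is
rejected by chdir() before any mount is touched. If the parent mount of
new_root still has shared propagation, the MS_MOVE step should be
expected to fail with EINVAL, which is why the assumption about MS_SLAVE
above matters.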

Lennart
--
Lennart Poettering - Red Hat, Inc.