Discussion:
[PATCH] timesyncd: tighten unit file
(too old to reply)
Topi Miettinen
2015-01-25 10:23:02 UTC
Permalink
There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.

No new privileges are needed, especially no setuid fixups are expected.

Device policy can be closed, timesyncd does not access any devices.

Timesyncd only needs write access to /var/lib/systemd/clock. There's no
need to access /boot nor most API filesystems.

Install a system call filter, generated with 'strace -c'.

Only one additional process is needed.

Mounts should not propagate back, so set the MountFlags to slave
(actually default since we use e.g. PrivateTmp).
---
units/systemd-timesyncd.service.in | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/units/systemd-timesyncd.service.in b/units/systemd-timesyncd.service.in
index 39edafc..ef09f05 100644
--- a/units/systemd-timesyncd.service.in
+++ b/units/systemd-timesyncd.service.in
@@ -22,12 +22,21 @@ Type=notify
Restart=always
RestartSec=0
ExecStart=@rootlibexecdir@/systemd-timesyncd
-CapabilityBoundingSet=CAP_SYS_TIME CAP_SETUID CAP_SETGID CAP_SETPCAP CAP_CHOWN CAP_DAC_OVERRIDE CAP_FOWNER
+CapabilityBoundingSet=CAP_SYS_TIME CAP_SETUID CAP_SETGID CAP_SETPCAP
+NoNewPrivileges=true
+SecureBits=no-setuid-fixup no-setuid-fixup-locked
PrivateTmp=yes
PrivateDevices=yes
+DevicePolicy=closed
ProtectSystem=full
ProtectHome=yes
+InaccessibleDirectories=/dev/pts /dev/shm /dev/mqueue /dev/hugepages /boot /sys
+ReadOnlyDirectories=/
+ReadWriteDirectories=/var/lib/systemd/clock
WatchdogSec=1min
+SystemCallFilter=recvfrom clock_gettime prctl read open close stat fstat poll lseek mmap mprotect munmap brk rt_sigaction rt_sigprocmask ioctl access madvise socket connect sendto sendmsg recvmsg bind getsockname socketpair setsockopt getsockopt clone fcntl umask getrlimit setgroups setresuid setresgid capget capset arch_prctl gettid futex set_tid_address epoll_wait epoll_ctl inotify_add_watch set_robust_list utimensat timerfd_create timerfd_settime signalfd4 epoll_create1 inotify_init1 clock_adjtime sendmmsg
+LimitNPROC=2
+MountFlags=slave

[Install]
WantedBy=sysinit.target
--
2.1.4
Lennart Poettering
2015-01-27 01:54:11 UTC
Permalink
Post by Topi Miettinen
There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.
Hmm, that's not true, is it? load_clock_timestamp() is invoked before
we drop privs in the daemon. And it certainly calls fchmod() and
fchown(), so that it can later on still access the clock touch file.

What am I missing?
Post by Topi Miettinen
No new privileges are needed, especially no setuid fixups are
expected.
Yeah, we should probably set this for most of our daemons, not just
timesyncd.

I am not entirely sure how NoNewPrivileges= and "no-setuid-fixup"
actually relate to each other. I.e. what does one do that the other
doesn't? And if they do different things, isn't NoNewPrivileges= kinda
a superset of no-setuid-fixup, and thus makes setting the latter
redundant?

Reading through Documentation/prctl/no_new_privs.txt in the
kernel sources I found this paragraph:

"Be careful, though: LSMs might also not tighten constraints on
exec in no_new_privs mode. (This means that setting up a
general-purpose service launcher to set no_new_privs before
execing daemons may interfere with LSM-based sandboxing.)"

This kinda suggests that SELinux and friends might not like
NoNewPrivileges=, hence I am not entirely sure it is really the right
option for us.

Hence I am wondering if "no-setuid-fixup" (and -locked) might be the
better option for us to enable everywhere?
Post by Topi Miettinen
Device policy can be closed, timesyncd does not access any devices.
Note that we set PrivateDevices=yes already, which implies
DevicePolicy=closed.
Post by Topi Miettinen
Timesyncd only needs write access to /var/lib/systemd/clock. There's no
need to access /boot nor most API filesystems.
Well, /boot is already covered by ProtectSystem=full actually (but
this wasn't documented, admittedly. Updated the man page now in gate.)

I am a bit concerned about listing by default all these directories in
the unit file. I mean, the reason why ProtectSystem=, ProtectHome= and
suchlike exists is precisely to break this down to a few booleans (or
boolean-like enums), instead of listing all directories in all unit
files all the time. I mean, it's great if people do this on their
local systems, but to make this palatable generically upstream I
created ProtectSystem= and ProtectHome= to hide the directory lists
away.

The problem with these lists is that we need to be really conservative
with them, since many of the libs we use (including glibc) use
different interface "behind our back", and thus writing generally
valid rules, that work on all archs, on all distros, on all
glibc/libary versions is hard. This is what SELinux policy does, but I
*really* didn't want to create another implementation of such a
finegrained policy, if you follow what I mean.
Post by Topi Miettinen
Install a system call filter, generated with 'strace -c'.
Hmm, I am a bit concerned about this part, I am not sure we can really
do this upstream. Different glibc and other library versions we use
invoke different syscalls. Moreover, different archs use different
syscalls, too. This makes it really difficult putting together syscall
filters upsteram that work generically, on all downstream versions.

I think SystemCallFilter= is something that end-users can use on their
systems, but I am not sure we can use this upstream. :-(
Post by Topi Miettinen
Only one additional process is needed.
Hmm, LimitNPROC=2 unfortunately doesn't do what we want it to do:
since the daemon drops the to the systemd-timesync user on its own,
PID 1 invokes it as root, not as the the systemd-timesync user, which
makes the excercise pointless... We should probably handle this better
though, and warn, instead of silently accepting it. Added that to the
TODO list now.

I have now changed the timesyncd code to do the right thing on its own
after dropping privs. During runtime RLIMIT_NPROC=2 should now be set.
Post by Topi Miettinen
Mounts should not propagate back, so set the MountFlags to slave
(actually default since we use e.g. PrivateTmp).
This is implied by PrivateTmp=, indeed, hence unnecessary.

Lennart
--
Lennart Poettering, Red Hat
Topi Miettinen
2015-01-27 19:47:49 UTC
Permalink
Post by Lennart Poettering
Post by Topi Miettinen
There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.
Hmm, that's not true, is it? load_clock_timestamp() is invoked before
we drop privs in the daemon. And it certainly calls fchmod() and
fchown(), so that it can later on still access the clock touch file.
What am I missing?
It works for me because the file exists already with correct owner and
permissions. I'd put CAP_CHOWN and CAP_FOWNER back for now.

Maybe the clock file could reside in a separate subdirectory, then the
directory owner and permissions could be set up already at installation?
Post by Lennart Poettering
Post by Topi Miettinen
No new privileges are needed, especially no setuid fixups are
expected.
Yeah, we should probably set this for most of our daemons, not just
timesyncd.
I am not entirely sure how NoNewPrivileges= and "no-setuid-fixup"
actually relate to each other. I.e. what does one do that the other
doesn't? And if they do different things, isn't NoNewPrivileges= kinda
a superset of no-setuid-fixup, and thus makes setting the latter
redundant?
I think NoNewPrivileges (prctl(PR_SET_NO_NEW_PRIVS)) prevents any kind
of uid/gid changes, while no-setuid-fixup does not restrict uid change
but does not elevate the capability set.
Post by Lennart Poettering
Reading through Documentation/prctl/no_new_privs.txt in the
"Be careful, though: LSMs might also not tighten constraints on
exec in no_new_privs mode. (This means that setting up a
general-purpose service launcher to set no_new_privs before
execing daemons may interfere with LSM-based sandboxing.)"
This kinda suggests that SELinux and friends might not like
NoNewPrivileges=, hence I am not entirely sure it is really the right
option for us.
Hence I am wondering if "no-setuid-fixup" (and -locked) might be the
better option for us to enable everywhere?
I don't think timesyncd is executing any helper programs, especially
set-uid or with file system capabilities, so both should work fine.
Post by Lennart Poettering
Post by Topi Miettinen
Device policy can be closed, timesyncd does not access any devices.
Note that we set PrivateDevices=yes already, which implies
DevicePolicy=closed.
Post by Topi Miettinen
Timesyncd only needs write access to /var/lib/systemd/clock. There's no
need to access /boot nor most API filesystems.
Well, /boot is already covered by ProtectSystem=full actually (but
this wasn't documented, admittedly. Updated the man page now in gate.)
I am a bit concerned about listing by default all these directories in
the unit file. I mean, the reason why ProtectSystem=, ProtectHome= and
suchlike exists is precisely to break this down to a few booleans (or
boolean-like enums), instead of listing all directories in all unit
files all the time. I mean, it's great if people do this on their
local systems, but to make this palatable generically upstream I
created ProtectSystem= and ProtectHome= to hide the directory lists
away.
The problem with these lists is that we need to be really conservative
with them, since many of the libs we use (including glibc) use
different interface "behind our back", and thus writing generally
valid rules, that work on all archs, on all distros, on all
glibc/libary versions is hard. This is what SELinux policy does, but I
*really* didn't want to create another implementation of such a
finegrained policy, if you follow what I mean.
I'm not sure. Shouldn't we then ship a SELinux policy file then? Would
you be interested in AppArmor profile for timesyncd, I have one? Also,
if a distro uses weird SELinux policies or setuid helpers at every
possible opportunity, shouldn't they have some responsibility of fixing
their setup?
Post by Lennart Poettering
Post by Topi Miettinen
Install a system call filter, generated with 'strace -c'.
Hmm, I am a bit concerned about this part, I am not sure we can really
do this upstream. Different glibc and other library versions we use
invoke different syscalls. Moreover, different archs use different
syscalls, too. This makes it really difficult putting together syscall
filters upsteram that work generically, on all downstream versions.
I think SystemCallFilter= is something that end-users can use on their
systems, but I am not sure we can use this upstream. :-(
Right, this was generated on x86_64.
Post by Lennart Poettering
Post by Topi Miettinen
Only one additional process is needed.
since the daemon drops the to the systemd-timesync user on its own,
PID 1 invokes it as root, not as the the systemd-timesync user, which
makes the excercise pointless... We should probably handle this better
though, and warn, instead of silently accepting it. Added that to the
TODO list now.
This probably applies to other limits too?
Post by Lennart Poettering
I have now changed the timesyncd code to do the right thing on its own
after dropping privs. During runtime RLIMIT_NPROC=2 should now be set.
Post by Topi Miettinen
Mounts should not propagate back, so set the MountFlags to slave
(actually default since we use e.g. PrivateTmp).
This is implied by PrivateTmp=, indeed, hence unnecessary.
Lennart
Lennart Poettering
2015-01-27 21:16:22 UTC
Permalink
Post by Topi Miettinen
Post by Lennart Poettering
Post by Topi Miettinen
There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.
Hmm, that's not true, is it? load_clock_timestamp() is invoked before
we drop privs in the daemon. And it certainly calls fchmod() and
fchown(), so that it can later on still access the clock touch file.
What am I missing?
It works for me because the file exists already with correct owner and
permissions. I'd put CAP_CHOWN and CAP_FOWNER back for now.
Well, but this also means CAP_DAC_OVERRIDE is required, as otherwise
we won't be able to access the file while we are still root, and have
not dropped privileges anymore.

Note that timesyncd actually drops all remaining caps when changng
to the "systemd-timesync" user, with the exception of
CAP_SYS_TIME. This means that the caps configured in the unit file
don't matter too much as they only are in effect for the short time
when timesyncd is initializing and hasn't dropped all the remaining
caps except for CAP_SYS_TIME yet.
Post by Topi Miettinen
Maybe the clock file could reside in a separate subdirectory, then the
directory owner and permissions could be set up already at
installation?
Well, to enable stateless systems I think it is a good idea to write
services in a way that they can rebuild what they need in /var on
their own, should it be missing, so that /var can be flushed out and
things will just work when the machine is then booted again.

All our daemons are written in a style where /var is reconstructed
implicitly if it is missing, and timesyncd really should work the same
way.
Post by Topi Miettinen
Post by Lennart Poettering
I am not entirely sure how NoNewPrivileges= and "no-setuid-fixup"
actually relate to each other. I.e. what does one do that the other
doesn't? And if they do different things, isn't NoNewPrivileges= kinda
a superset of no-setuid-fixup, and thus makes setting the latter
redundant?
I think NoNewPrivileges (prctl(PR_SET_NO_NEW_PRIVS)) prevents any kind
of uid/gid changes, while no-setuid-fixup does not restrict uid change
but does not elevate the capability set.
The PR_SET_NO_NEW_PRIVS docs say explicitly it only applies to
execve().

To be honest, I am still very puzzled about this. The docs are awful
about this...
Post by Topi Miettinen
Post by Lennart Poettering
Reading through Documentation/prctl/no_new_privs.txt in the
"Be careful, though: LSMs might also not tighten constraints on
exec in no_new_privs mode. (This means that setting up a
general-purpose service launcher to set no_new_privs before
execing daemons may interfere with LSM-based sandboxing.)"
This kinda suggests that SELinux and friends might not like
NoNewPrivileges=, hence I am not entirely sure it is really the right
option for us.
Hence I am wondering if "no-setuid-fixup" (and -locked) might be the
better option for us to enable everywhere?
I don't think timesyncd is executing any helper programs, especially
set-uid or with file system capabilities, so both should work fine.
Well, the way I read the paragraph above if we set PR_SET_NO_NEW_PRIVS
after forking, but before execing systemd-timesyncd, and that binary
has an selinux label on it, that the selinux label would be ignore,
and things would continue to run with the same label as we forked
off. This pretty much renders SELinux usesless, since all services
where we set the bit for would then run as "init_t"... and that's
something we really shouldn't do.
Post by Topi Miettinen
Post by Lennart Poettering
The problem with these lists is that we need to be really conservative
with them, since many of the libs we use (including glibc) use
different interface "behind our back", and thus writing generally
valid rules, that work on all archs, on all distros, on all
glibc/libary versions is hard. This is what SELinux policy does, but I
*really* didn't want to create another implementation of such a
finegrained policy, if you follow what I mean.
I'm not sure. Shouldn't we then ship a SELinux policy file then? Would
you be interested in AppArmor profile for timesyncd, I have one? Also,
if a distro uses weird SELinux policies or setuid helpers at every
possible opportunity, shouldn't they have some responsibility of fixing
their setup?
Well, SELinux policy is managed in a central selinux policy database
that is shipped in one big RPM. My selinux-fu is not good enough to
maintain the policy file in systemd, and i am not sure this even is
generic enough to be able to ship the same selinux policy that works
across all distros that do selinux.

If Apparmor policies are standardized and stand-alone enough, and
there's a clear way to install them, and you are willing to look after
it, then yes, I'd merge a patch that adds apparmor profiles to systemd
upstream.
Post by Topi Miettinen
Post by Lennart Poettering
since the daemon drops the to the systemd-timesync user on its own,
PID 1 invokes it as root, not as the the systemd-timesync user, which
makes the excercise pointless... We should probably handle this better
though, and warn, instead of silently accepting it. Added that to the
TODO list now.
This probably applies to other limits too?
RLIMIT_NPROC is kinda special. While the other resource limits are
per-process, this one is actual per-user. This means that if we apply
the other limits to the process we forked off, where we will invoke
execve() in next, then this will stay locl to that process and be
properly inherited into the final daemon.

Or in other words: the other resource limits are exposed nicely in
systemd, and are already useful. It's just RLIMIT_NPROC that is a bit
useless currently...

Lennart
--
Lennart Poettering, Red Hat
Topi Miettinen
2015-01-27 21:58:20 UTC
Permalink
Post by Lennart Poettering
Post by Topi Miettinen
Post by Lennart Poettering
Post by Topi Miettinen
There's no need for CAP_CHOWN, CAP_DAC_OVERRIDE or CAP_FOWNER.
Hmm, that's not true, is it? load_clock_timestamp() is invoked before
we drop privs in the daemon. And it certainly calls fchmod() and
fchown(), so that it can later on still access the clock touch file.
What am I missing?
It works for me because the file exists already with correct owner and
permissions. I'd put CAP_CHOWN and CAP_FOWNER back for now.
Well, but this also means CAP_DAC_OVERRIDE is required, as otherwise
we won't be able to access the file while we are still root, and have
not dropped privileges anymore.
Note that timesyncd actually drops all remaining caps when changng
to the "systemd-timesync" user, with the exception of
CAP_SYS_TIME. This means that the caps configured in the unit file
don't matter too much as they only are in effect for the short time
when timesyncd is initializing and hasn't dropped all the remaining
caps except for CAP_SYS_TIME yet.
Post by Topi Miettinen
Maybe the clock file could reside in a separate subdirectory, then the
directory owner and permissions could be set up already at
installation?
Well, to enable stateless systems I think it is a good idea to write
services in a way that they can rebuild what they need in /var on
their own, should it be missing, so that /var can be flushed out and
things will just work when the machine is then booted again.
All our daemons are written in a style where /var is reconstructed
implicitly if it is missing, and timesyncd really should work the same
way.
Nice, but the directories could be created with tmpfiles.d then?
Post by Lennart Poettering
Post by Topi Miettinen
Post by Lennart Poettering
I am not entirely sure how NoNewPrivileges= and "no-setuid-fixup"
actually relate to each other. I.e. what does one do that the other
doesn't? And if they do different things, isn't NoNewPrivileges= kinda
a superset of no-setuid-fixup, and thus makes setting the latter
redundant?
I think NoNewPrivileges (prctl(PR_SET_NO_NEW_PRIVS)) prevents any kind
of uid/gid changes, while no-setuid-fixup does not restrict uid change
but does not elevate the capability set.
The PR_SET_NO_NEW_PRIVS docs say explicitly it only applies to
execve().
To be honest, I am still very puzzled about this. The docs are awful
about this...
Post by Topi Miettinen
Post by Lennart Poettering
Reading through Documentation/prctl/no_new_privs.txt in the
"Be careful, though: LSMs might also not tighten constraints on
exec in no_new_privs mode. (This means that setting up a
general-purpose service launcher to set no_new_privs before
execing daemons may interfere with LSM-based sandboxing.)"
This kinda suggests that SELinux and friends might not like
NoNewPrivileges=, hence I am not entirely sure it is really the right
option for us.
Hence I am wondering if "no-setuid-fixup" (and -locked) might be the
better option for us to enable everywhere?
I don't think timesyncd is executing any helper programs, especially
set-uid or with file system capabilities, so both should work fine.
Well, the way I read the paragraph above if we set PR_SET_NO_NEW_PRIVS
after forking, but before execing systemd-timesyncd, and that binary
has an selinux label on it, that the selinux label would be ignore,
and things would continue to run with the same label as we forked
off. This pretty much renders SELinux usesless, since all services
where we set the bit for would then run as "init_t"... and that's
something we really shouldn't do.
seccomp_filter.txt on the other hand says that
"Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
run with CAP_SYS_ADMIN privileges in its namespace. If these are not
true, -EACCES will be returned. This requirement ensures that filter
programs cannot be applied to child processes with greater privileges
than the task that installed them."

Does this mean that SELinux and seccomp are mutually exclusive? Awful
design if so.
Post by Lennart Poettering
Post by Topi Miettinen
Post by Lennart Poettering
The problem with these lists is that we need to be really conservative
with them, since many of the libs we use (including glibc) use
different interface "behind our back", and thus writing generally
valid rules, that work on all archs, on all distros, on all
glibc/libary versions is hard. This is what SELinux policy does, but I
*really* didn't want to create another implementation of such a
finegrained policy, if you follow what I mean.
I'm not sure. Shouldn't we then ship a SELinux policy file then? Would
you be interested in AppArmor profile for timesyncd, I have one? Also,
if a distro uses weird SELinux policies or setuid helpers at every
possible opportunity, shouldn't they have some responsibility of fixing
their setup?
Well, SELinux policy is managed in a central selinux policy database
that is shipped in one big RPM. My selinux-fu is not good enough to
maintain the policy file in systemd, and i am not sure this even is
generic enough to be able to ship the same selinux policy that works
across all distros that do selinux.
If Apparmor policies are standardized and stand-alone enough, and
there's a clear way to install them, and you are willing to look after
it, then yes, I'd merge a patch that adds apparmor profiles to systemd
upstream.
Well, I'm no expert on AppArmor policies. This is what I have:

#include <tunables/global>

/lib/systemd/systemd-timesyncd {
#include <abstractions/nameservice>

capability setgid,
capability setuid,
capability sys_time,
capability setpcap,

/etc/ld.so.cache r,
/etc/systemd/timesyncd.conf r,
/lib/systemd/systemd-timesyncd mr,
/lib/x86_64-linux-gnu/libattr.so.* mr,
/lib/x86_64-linux-gnu/libc-*.so mr,
/lib/x86_64-linux-gnu/libcap.so.* mr,
/lib/x86_64-linux-gnu/libm-*.so mr,
/lib/x86_64-linux-gnu/libnsl-*.so mr,
/lib/x86_64-linux-gnu/libpthread-*.so mr,
/lib/x86_64-linux-gnu/libresolv-*.so mr,
/proc/cmdline r,
/proc/sys/kernel/random/boot_id r,
/run/systemd/netif/state r,
/var/lib/systemd/clock w,
}

The x86_64-linux-gnu part looks x86-64 and Debian-specific to me.
Post by Lennart Poettering
Post by Topi Miettinen
Post by Lennart Poettering
since the daemon drops the to the systemd-timesync user on its own,
PID 1 invokes it as root, not as the the systemd-timesync user, which
makes the excercise pointless... We should probably handle this better
though, and warn, instead of silently accepting it. Added that to the
TODO list now.
This probably applies to other limits too?
RLIMIT_NPROC is kinda special. While the other resource limits are
per-process, this one is actual per-user. This means that if we apply
the other limits to the process we forked off, where we will invoke
execve() in next, then this will stay locl to that process and be
properly inherited into the final daemon.
Or in other words: the other resource limits are exposed nicely in
systemd, and are already useful. It's just RLIMIT_NPROC that is a bit
useless currently...
Lennart
Lennart Poettering
2015-01-27 22:19:36 UTC
Permalink
Post by Topi Miettinen
Post by Lennart Poettering
Well, to enable stateless systems I think it is a good idea to write
services in a way that they can rebuild what they need in /var on
their own, should it be missing, so that /var can be flushed out and
things will just work when the machine is then booted again.
All our daemons are written in a style where /var is reconstructed
implicitly if it is missing, and timesyncd really should work the same
way.
Nice, but the directories could be created with tmpfiles.d then?
Well, we could. But I really like robust deamons that just work on
their own, I must say...
Post by Topi Miettinen
Post by Lennart Poettering
Well, the way I read the paragraph above if we set PR_SET_NO_NEW_PRIVS
after forking, but before execing systemd-timesyncd, and that binary
has an selinux label on it, that the selinux label would be ignore,
and things would continue to run with the same label as we forked
off. This pretty much renders SELinux usesless, since all services
where we set the bit for would then run as "init_t"... and that's
something we really shouldn't do.
seccomp_filter.txt on the other hand says that
"Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
run with CAP_SYS_ADMIN privileges in its namespace. If these are not
true, -EACCES will be returned. This requirement ensures that filter
programs cannot be applied to child processes with greater privileges
than the task that installed them."
Does this mean that SELinux and seccomp are mutually exclusive? Awful
design if so.
Well, no it doesn't mean that. If systemd sets up a seccomp filter it
has CAP_SYS_ADMIN, hence all is good. And it can leave
PR_SET_NO_NEW_PRIVS off, and thus not break selinux.
Post by Topi Miettinen
Post by Lennart Poettering
If Apparmor policies are standardized and stand-alone enough, and
there's a clear way to install them, and you are willing to look after
it, then yes, I'd merge a patch that adds apparmor profiles to systemd
upstream.
#include <tunables/global>
/lib/systemd/systemd-timesyncd {
#include <abstractions/nameservice>
capability setgid,
capability setuid,
capability sys_time,
capability setpcap,
this is missing the three fs related caps at least...
Post by Topi Miettinen
/etc/ld.so.cache r,
/etc/systemd/timesyncd.conf r,
/lib/systemd/systemd-timesyncd mr,
/lib/x86_64-linux-gnu/libattr.so.* mr,
/lib/x86_64-linux-gnu/libc-*.so mr,
/lib/x86_64-linux-gnu/libcap.so.* mr,
/lib/x86_64-linux-gnu/libm-*.so mr,
/lib/x86_64-linux-gnu/libnsl-*.so mr,
/lib/x86_64-linux-gnu/libpthread-*.so mr,
/lib/x86_64-linux-gnu/libresolv-*.so mr,
/proc/cmdline r,
/proc/sys/kernel/random/boot_id r,
/run/systemd/netif/state r,
This is not sufficient, as it also wants to read the networkd
confguration, to get NTP servers from it, if networkd is running.

Anyway, if this is cleaned up, and overlooked by somebody who knows
apparmor, and is properly installed by automake, i'd take a patch for
this. However, as i have no clue of Apparmor and can't test this I'd
fully rely on contributions for this one.

Lennart
--
Lennart Poettering, Red Hat
Topi Miettinen
2015-02-01 09:28:07 UTC
Permalink
Post by Lennart Poettering
Post by Topi Miettinen
Post by Lennart Poettering
Well, the way I read the paragraph above if we set PR_SET_NO_NEW_PRIVS
after forking, but before execing systemd-timesyncd, and that binary
has an selinux label on it, that the selinux label would be ignore,
and things would continue to run with the same label as we forked
off. This pretty much renders SELinux usesless, since all services
where we set the bit for would then run as "init_t"... and that's
something we really shouldn't do.
seccomp_filter.txt on the other hand says that
"Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
run with CAP_SYS_ADMIN privileges in its namespace. If these are not
true, -EACCES will be returned. This requirement ensures that filter
programs cannot be applied to child processes with greater privileges
than the task that installed them."
Does this mean that SELinux and seccomp are mutually exclusive? Awful
design if so.
Well, no it doesn't mean that. If systemd sets up a seccomp filter it
has CAP_SYS_ADMIN, hence all is good. And it can leave
PR_SET_NO_NEW_PRIVS off, and thus not break selinux.
So in conclusion, only the SecureBits part would be interesting. Looking
at the unit files, which of them expect to run setuid helpers? I looked
at the unit files and briefly at the corresponding programs (also to get
to know them better), so I offer my guesses below:

units/console-getty.service.m4.in

getty launches user shell, so no.

units/console-shell.service.m4.in

sulogin launches root shell, so no.

units/container-***@.service.m4.in

getty launches user shell, so no.

units/debug-shell.service.in

sushell launches root shell, so no.

units/emergency.service.in

sulogin launches root shell, so no.

units/***@.service.m4

getty launches user shell, so no.

units/halt-local.service.in

Local scripts could launch anything, so no.

units/initrd-cleanup.service.in

Only calls systemctl, so yes.

units/initrd-parse-etc.service.in

Only calls systemctl, so yes.

units/initrd-switch-root.service.in

Only calls systemctl, so yes.

units/initrd-udevadm-cleanup-db.service.in

Only calls udevadm, so yes.

units/kmod-static-nodes.service.in

kmod is not part of systemd, but I wouldn't expect it to use setuid
helpers, so yes?

units/ldconfig.service

ldconfig is not part of systemd, but I wouldn't expect it to use setuid
helpers, so yes?

units/quotaon.service.in

quotaon is not part of systemd, but I wouldn't expect it to use setuid
helpers, so yes?

units/rc-local.service.in

Local scripts could launch anything, so no.

units/rescue.service.in

sulogin launches root shell, so no.

units/serial-***@.service.m4

getty launches user shell, so no.

units/systemd-ask-password-console.service.in

Only calls systemd-ask-password-agent, so yes.

units/systemd-ask-password-wall.service.in

Only calls systemd-ask-password-agent, so yes.

units/systemd-***@.service.in

Only calls systemd-backlight, so yes.

units/systemd-binfmt.service.in

Only calls systemd-binfmt, so yes.

units/systemd-bootchart.service.in

Only calls systemd-bootchart, so yes.

units/systemd-bus-proxyd.service.m4.in

Only calls systemd-bus-proxyd, so yes.

units/systemd-firstboot.service.in

Only calls systemd-firstboot. It doesn't seem to use pam_unix (which
could launch setuid helper unix_chkpwd) for checking passwords, but
internal implementation, so yes.

units/systemd-fsck-root.service.in

Only calls systemd-fsck, so yes.

units/systemd-***@.service.in

Only calls systemd-fsck, so yes.

units/systemd-halt.service.in

Only calls systemctl, so yes.

units/systemd-hibernate-***@.service.in

Only calls systemd-hibernate-resume, so yes.

units/systemd-hibernate.service.in

Only calls systemd-sleep, so yes.

units/systemd-hostnamed.service.in

Only calls systemd-hostnamed, so yes.

units/systemd-hwdb-update.service.in

Only calls systemd-hwdb, so yes.

units/systemd-hybrid-sleep.service.in

Only calls systemd-sleep, so yes.

units/systemd-importd.service.in

Calls systemd-importd, which in turn executes several helpers,
especially gpg which could be setuid to lock memory, so no.

units/systemd-initctl.service.in

Only calls systemd-initctl, so yes.

units/systemd-journal-catalog-update.service.in

Only calls systemd-journalctl, so yes.

units/systemd-journald.service.in

Only calls systemd-journald, so yes.

units/systemd-journal-flush.service.in

Only calls systemd-journalctl, so yes.

units/systemd-journal-gatewayd.service.in

Only calls systemd-journal-gatewayd, so yes.

units/systemd-journal-remote.service.in

Calls systemd-journal-remote, which in turn uses microhttpd that uses
several external libs, maybe no?

units/systemd-journal-upload.service.in

Only calls systemd-journal-upload (uses curl), so yes.

units/systemd-kexec.service.in

Only calls systemctl, so yes.

units/systemd-localed.service.in

Only calls systemd-localed, so yes.

units/systemd-logind.service.in

Calls systemd-logind, which can start user services, so no?

units/systemd-machined.service.in

Launches machine instances, but that is done via dbus, so yes?

units/systemd-machine-id-commit.service.in

Only calls systemd-machine-id-commit, so yes.

units/systemd-modules-load.service.in

Calls systemd-modules-load which uses libkmod, so yes?

units/systemd-networkd.service.in

Only calls systemd-networkd, so yes? Setting the bits could be done in
drop_privileges() if this is OK for other users too (bus-proxyd,
resolved, journald, and timesyncd).

units/systemd-networkd-wait-online.service.in

Only calls systemd-networkd-wait-online, so yes.

units/systemd-***@.service.in

Can launch anything, so no.

units/systemd-poweroff.service.in

Only calls systemctl, so yes.

units/systemd-quotacheck.service.in

quotacheck is not part of systemd, but I wouldn't expect it to use
setuid helpers, so yes?

units/systemd-random-seed.service.in

Only calls systemd-random-seed, so yes.

units/systemd-reboot.service.in

Only calls systemctl, so yes.

units/systemd-remount-fs.service.in

Launches /bin/mount which usually is setuid. As it is executed as root
with full capabilities, it should be OK.

units/systemd-resolved.service.in

Only calls systemd-resolved, so yes.

units/systemd-***@.service.in

Only calls systemd-rfkill, so yes.

units/systemd-shutdownd.service.in

Only calls systemctl, so yes.

units/systemd-suspend.service.in

Only calls systemd-sleep, so yes.

units/systemd-sysctl.service.in

Only calls systemd-sysctl, so yes.

units/systemd-sysusers.service.in

Calls systemd-sysusers. NSS helpers could in theory be setuid, so no.

units/systemd-timedated.service.in

Only calls systemd-timedated, so yes.

units/systemd-timesyncd.service.in

Only calls systemd-timesyncd, so yes.

units/systemd-tmpfiles-clean.service.in

Only calls systemd-tmpfiles, so yes.

units/systemd-tmpfiles-setup-dev.service.in

Only calls systemd-tmpfiles, so yes.

units/systemd-tmpfiles-setup.service.in

Only calls systemd-tmpfiles, so yes.

units/systemd-udevd.service.in

udevd helpers can be anything, so no.

units/systemd-udev-settle.service.in

Only calls udevadm, so yes.

units/systemd-udev-trigger.service.in

Only calls udevadm, so yes.

units/systemd-update-done.service.in

systemd-update-done uses SELinux libs which could do anything, so no?

units/systemd-update-utmp-runlevel.service.in

systemd-update-utmp uses audit libs which could do anything, so no?

units/systemd-update-utmp.service.in

systemd-update-utmp uses audit libs which could do anything, so no?

units/systemd-user-sessions.service.in

Only calls systemd-user-sessions, so yes.

units/systemd-vconsole-setup.service.in

systemd-vconsole-setup uses loadkeys and setfont which are not part of
systemd, but they are unlikely to be setuid, so yes.

units/***@.service.m4.in

Calls systemd --user, which will start user specified units which can be
setuid, so no.

-Topi
Lennart Poettering
2015-02-02 20:43:59 UTC
Permalink
Post by Topi Miettinen
Post by Lennart Poettering
Post by Topi Miettinen
Post by Lennart Poettering
Well, the way I read the paragraph above if we set PR_SET_NO_NEW_PRIVS
after forking, but before execing systemd-timesyncd, and that binary
has an selinux label on it, that the selinux label would be ignore,
and things would continue to run with the same label as we forked
off. This pretty much renders SELinux usesless, since all services
where we set the bit for would then run as "init_t"... and that's
something we really shouldn't do.
seccomp_filter.txt on the other hand says that
"Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
run with CAP_SYS_ADMIN privileges in its namespace. If these are not
true, -EACCES will be returned. This requirement ensures that filter
programs cannot be applied to child processes with greater privileges
than the task that installed them."
Does this mean that SELinux and seccomp are mutually exclusive? Awful
design if so.
Well, no it doesn't mean that. If systemd sets up a seccomp filter it
has CAP_SYS_ADMIN, hence all is good. And it can leave
PR_SET_NO_NEW_PRIVS off, and thus not break selinux.
So in conclusion, only the SecureBits part would be interesting. Looking
at the unit files, which of them expect to run setuid helpers? I looked
at the unit files and briefly at the corresponding programs (also to get
In general, I would only bother with setting the securebits for
long-running services. The stuff that only sets up something and
quickly exits is probably not worth the efforts...

Also, anything that invokes generic user code (such as the gettys or
logind) can't really be locked down, since we should not restrict what
the users can do.

I'd also not bother with debugging/profiling tools such as bootchart,
since they aren't part of normal code paths.
Post by Topi Miettinen
units/systemd-hostnamed.service.in
units/systemd-importd.service.in
Calls systemd-importd, which in turn executes several helpers,
especially gpg which could be setuid to lock memory, so no.
Actually, importd already drops most caps on its own. It even takes
most caps away from gpg... I think it is safe to use securebits on this.
Post by Topi Miettinen
units/systemd-journald.service.in
units/systemd-journal-gatewayd.service.in
units/systemd-journal-remote.service.in
units/systemd-journal-upload.service.in
units/systemd-localed.service.in
units/systemd-logind.service.in
units/systemd-machined.service.in
units/systemd-networkd.service.in
units/systemd-resolved.service.in
units/systemd-timedated.service.in
units/systemd-timesyncd.service.in
I think adding the securebits stuff should be safe for all of these...

Lennart
--
Lennart Poettering, Red Hat
Cameron Norman
2015-01-27 23:12:08 UTC
Permalink
Post by Topi Miettinen
#include <tunables/global>
/lib/systemd/systemd-timesyncd {
I am not sure how that would be done, but this needs to handle
timesyncd being in /usr/lib/systemd as well as /lib.

Also, adding `flags=(complain)` just before the curly brace puts the
profile into complain mode by default.
Post by Topi Miettinen
#include <abstractions/nameservice>
capability setgid,
capability setuid,
capability sys_time,
capability setpcap,
/etc/ld.so.cache r,
/etc/systemd/timesyncd.conf r,
/lib/systemd/systemd-timesyncd mr,
/lib/x86_64-linux-gnu/libattr.so.* mr,
/lib/x86_64-linux-gnu/libc-*.so mr,
/lib/x86_64-linux-gnu/libcap.so.* mr,
/lib/x86_64-linux-gnu/libm-*.so mr,
/lib/x86_64-linux-gnu/libnsl-*.so mr,
/lib/x86_64-linux-gnu/libpthread-*.so mr,
/lib/x86_64-linux-gnu/libresolv-*.so mr,
Use the variable `@{multiarch}` in place of `x86...`. Also, it is
probably desirable to add `{,usr/}` between the / and lib in these
lines for distros like Arch that have made the /usr merge.
Post by Topi Miettinen
/proc/cmdline r,
/proc/sys/kernel/random/boot_id r,
@{PROC} rather than /proc, so `@{PROC/cmdline r,`.
Post by Topi Miettinen
/run/systemd/netif/state r,
I have seen compatibility for pre-/run distros (i.e. adding `{,var/}`
before the run but after the slash), but probably not relevant for a
systemd daemon.
Post by Topi Miettinen
/var/lib/systemd/clock w,
}
Then post to ***@lists.ubuntu.com asking for a review.

Lennart: if you really want to test the profile, you just need to spin
up an OpenSuSE, Ubuntu, or Debian VM (on debian you need to install
and enable apparmor, which takes a short while).

Cheers,
--
Cameron Norman
Lennart Poettering
2015-01-28 00:18:21 UTC
Permalink
Post by Cameron Norman
Lennart: if you really want to test the profile, you just need to spin
up an OpenSuSE, Ubuntu, or Debian VM (on debian you need to install
and enable apparmor, which takes a short while).
Well, I have no personal interest in AppArmor, and I really have
enough stuff to do. If AA shall be supported in systemd, then I am
happy to merge stuff for it, if it is reviewed properly, but I am not
the one to test it. Sorry...

Lennart
--
Lennart Poettering, Red Hat
Cristian Rodríguez
2015-02-01 14:21:17 UTC
Permalink
Post by Lennart Poettering
Post by Cameron Norman
Lennart: if you really want to test the profile, you just need to spin
up an OpenSuSE, Ubuntu, or Debian VM (on debian you need to install
and enable apparmor, which takes a short while).
Well, I have no personal interest in AppArmor, and I really have
enough stuff to do. If AA shall be supported in systemd, then I am
happy to merge stuff for it, if it is reviewed properly, but I am not
the one to test it. Sorry...
Hi Lennart:

To make apparmor work, only the initial "policy loading" bits (like
selinux, smack..etc) needs to be implemented..currently it is done by
some really ugly init script.Appamor policies however, just like the
case of selinux have to go somewhere else, namely the apparmor upstream
repository.
Lennart Poettering
2015-02-02 20:45:05 UTC
Permalink
Post by Lennart Poettering
Post by Cameron Norman
Lennart: if you really want to test the profile, you just need to spin
up an OpenSuSE, Ubuntu, or Debian VM (on debian you need to install
and enable apparmor, which takes a short while).
Well, I have no personal interest in AppArmor, and I really have
enough stuff to do. If AA shall be supported in systemd, then I am
happy to merge stuff for it, if it is reviewed properly, but I am not
the one to test it. Sorry...
To make apparmor work, only the initial "policy loading" bits (like selinux,
smack..etc) needs to be implemented..currently it is done by some really
ugly init script.
Would be happy to take a patch for that.
Appamor policies however, just like the case of selinux
have to go somewhere else, namely the apparmor upstream repository.
OK, if it's customary to ship apparmor profiles in a centralized
distro-wide policy package, then we shouldn't include this in
systemd. Same case as for the SELinux policy I figure...

Lennart
--
Lennart Poettering, Red Hat
Cameron Norman
2015-01-27 22:58:10 UTC
Permalink
On Tue, Jan 27, 2015 at 1:16 PM, Lennart Poettering
Post by Lennart Poettering
Post by Topi Miettinen
I'm not sure. Shouldn't we then ship a SELinux policy file then? Would
you be interested in AppArmor profile for timesyncd, I have one? Also,
if a distro uses weird SELinux policies or setuid helpers at every
possible opportunity, shouldn't they have some responsibility of fixing
their setup?
Well, SELinux policy is managed in a central selinux policy database
that is shipped in one big RPM. My selinux-fu is not good enough to
maintain the policy file in systemd, and i am not sure this even is
generic enough to be able to ship the same selinux policy that works
across all distros that do selinux.
If Apparmor policies are standardized and stand-alone enough, and
there's a clear way to install them, and you are willing to look after
it, then yes, I'd merge a patch that adds apparmor profiles to systemd
upstream.
A good idea would be to set the apparmor profile(s) to warn-only mode
for some period of time, and then let distros patch (this would be a
one liner) that to be in enforce mode if they want to test it out.

One possible issue is that AppArmor profiles are installed in /etc.
Will that be a problem WRT the whole stateless system initiative, or
is it an acceptable exception to the "only comments in /etc" rule?

Cheers,
--
Cameron
Lennart Poettering
2015-01-28 00:15:08 UTC
Permalink
Post by Cameron Norman
On Tue, Jan 27, 2015 at 1:16 PM, Lennart Poettering
Post by Lennart Poettering
Post by Topi Miettinen
I'm not sure. Shouldn't we then ship a SELinux policy file then? Would
you be interested in AppArmor profile for timesyncd, I have one? Also,
if a distro uses weird SELinux policies or setuid helpers at every
possible opportunity, shouldn't they have some responsibility of fixing
their setup?
Well, SELinux policy is managed in a central selinux policy database
that is shipped in one big RPM. My selinux-fu is not good enough to
maintain the policy file in systemd, and i am not sure this even is
generic enough to be able to ship the same selinux policy that works
across all distros that do selinux.
If Apparmor policies are standardized and stand-alone enough, and
there's a clear way to install them, and you are willing to look after
it, then yes, I'd merge a patch that adds apparmor profiles to systemd
upstream.
A good idea would be to set the apparmor profile(s) to warn-only mode
for some period of time, and then let distros patch (this would be a
one liner) that to be in enforce mode if they want to test it out.
One possible issue is that AppArmor profiles are installed in /etc.
Will that be a problem WRT the whole stateless system initiative, or
is it an acceptable exception to the "only comments in /etc" rule?
Well, there's support for copying data from /usr to /etc on first boot
using tmpfile's "C" lines. However, that's supposed to be used only
as temporarily glue. Ideally all softare would work fine without /etc
around and do the right thing on its own. Also, apparmor probably
should operate before tmpfiles has run.

So yeah, apparmor working like that is not compatible withe stateless
systems that shall be able to boot up without /etc around.

Lennart
--
Lennart Poettering, Red Hat
Loading...