this blog post, https://brauner.github.io/2019/02/12/privileged-containers.html, highlights something commonly misunderstood: Docker's --privileged is /not/ the same as a "privileged container" in general container parlance, and i wish they'd change the naming!
"privileged container" means UID 0 in the container behaves the same as UID 0 outside of it, i.e. there is no user namespace "mapping" IDs in the container to different IDs on the host. user namespaces are /not/ enabled by default in Docker, so Docker containers are almost always "privileged" in this sense
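To make the ID "mapping" concrete, here is a minimal sketch that parses the three-column format the kernel exposes in /proc/[pid]/uid_map (ID inside the namespace, ID outside, range length). The function names and the `0 100000 65536` example mapping are illustrative assumptions, not something from the thread, and the identity-map check is only a heuristic for "no remapping is in effect".

```python
from typing import List, Optional, Tuple

# One entry per uid_map line: (first ID inside ns, first ID outside ns, range length)
IdMap = List[Tuple[int, int, int]]


def parse_id_map(text: str) -> IdMap:
    """Parse the contents of a uid_map or gid_map file."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(f) for f in fields)
        entries.append((inside, outside, length))
    return entries


def map_to_parent(id_map: IdMap, inside_id: int) -> Optional[int]:
    """Translate an ID seen inside the namespace to the ID outside it."""
    for inside, outside, length in id_map:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None  # unmapped


def is_identity_map(id_map: IdMap) -> bool:
    """Heuristic: the initial user namespace shows a single full-range identity entry."""
    return id_map == [(0, 0, 4294967295)]


if __name__ == "__main__":
    try:
        with open("/proc/self/uid_map") as f:
            current = parse_id_map(f.read())
    except OSError:
        current = []  # not on Linux, or /proc is unavailable
    print("uid_map:", current)
    print("identity-mapped:", is_identity_map(current))
```

With the hypothetical mapping `0 100000 65536`, UID 0 in the container is UID 100000 on the host; with the identity map, UID 0 in the container is simply UID 0 on the host, which is the "privileged" case described above.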
note that this is a somewhat reductive explanation of user namespaces. user namespaces are a broader feature, e.g. they change how capabilities are evaluated as well. see this talk from @brau_ner: https://www.youtube.com/watch?v=-PZNF8XDNn8
.@brau_ner https://youtu.be/-PZNF8XDNn8?t=223 notes 'no real privilege separation for most ns'. http://man7.org/linux/man-pages/man2/setns.2.html says a thread must have CAP_SYS_ADMIN w/r/t the user ns to pop out of e.g. a network ns. what about when there is no user ns? setns needs an fd for the target ns, so maybe you can't get a handle?
if you have pointers to kernel src i'd love them! the general pattern is 'you can do X if you have Y cap in Z user ns' (http://man7.org/linux/man-pages/man7/user_namespaces.7.html: 'Holding CAP_SYS_ADMIN within the user namespace'), but i'm not sure what the kernel does w/out a user ns (Docker). do you just have to hold the cap?
answering my own question here... 'Each process is a member of exactly one user namespace.' (http://man7.org/linux/man-pages/man7/user_namespaces.7.html) and 'A Linux system starts out with a single namespace of each type,' (https://en.wikipedia.org/wiki/Linux_namespaces), so the checks still occur even without a separate user ns. you still need CAP_SYS_ADMIN to traverse with setns.
Replying to @randohacker @brau_ner
the entire system starts out in the init_user_ns, which is an ancestor of all other user namespaces
and the capability check helper for actions that affect the entire system, capable(...), is actually defined based on ns_capable(&init_user_ns, ...)
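A toy model may help show why this answers the "no user ns" question. The sketch below is a simplified Python rendering of the namespace walk the kernel performs for ns_capable(); the class names, the UIDs, and the capability sets are assumptions for illustration, and real kernel credentials carry much more state (LSM hooks, several capability sets, and so on).

```python
class UserNS:
    """A user namespace: a parent link, a creator (owner) UID, and a nesting level."""

    def __init__(self, name, parent=None, owner_uid=0):
        self.name = name
        self.parent = parent
        self.owner_uid = owner_uid
        self.level = 0 if parent is None else parent.level + 1


class Cred:
    """Simplified task credentials: one user namespace, an euid, an effective cap set."""

    def __init__(self, user_ns, euid, effective):
        self.user_ns = user_ns
        self.euid = euid
        self.effective = frozenset(effective)


INIT_USER_NS = UserNS("init_user_ns")  # ancestor of every other user namespace


def ns_capable(cred, target_ns, cap):
    """Does cred hold cap with respect to target_ns? (simplified model)"""
    ns = target_ns
    while True:
        if ns is cred.user_ns:
            # Same namespace: it comes down to actually holding the capability.
            return cap in cred.effective
        if ns.level <= cred.user_ns.level:
            # Target is not a descendant of the task's namespace: no privilege.
            return False
        if ns.parent is cred.user_ns and ns.owner_uid == cred.euid:
            # The creator of a namespace has all capabilities in it.
            return True
        ns = ns.parent  # privilege over a parent implies privilege over the child


def capable(cred, cap):
    """System-wide check, defined against the initial user namespace."""
    return ns_capable(cred, INIT_USER_NS, cap)
```

In this model a Docker container without user namespaces is just a task in init_user_ns with a reduced effective set, so capable() degenerates to "do you hold the cap?". A root user inside a child user namespace fails capable() no matter which capabilities it holds there, and it also fails ns_capable() against a sibling namespace.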
an LXC-specific question. unprivileged containers are spawned with the CLONE_NEWUSER flag (https://github.com/lxc/lxc/blob/d0b950440a8e5f9984520ab8c88e22a37a5469bc/src/lxc/start.c#L1755), which puts them in a new user ns. even if idmaps overlap (no security.idmap.isolated), a cap in one container can't be used in another, right?
you can't use capabilities on objects that aren't inside your user namespace; but for inodes, which aren't really associated with a specific namespace, the rule is that an inode is contained by every user namespace that maps its UID and GID, see capable_wrt_inode_uidgid()
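The inode rule can be sketched as follows. This is a simplified Python model of the idea behind capable_wrt_inode_uidgid(), not the kernel code: it assumes a single level of nesting (so the "outside" IDs in the map are host IDs), ignores idmapped mounts, and takes the "caller holds the cap in its own user ns" result as a plain boolean. All names and ID ranges are hypothetical.

```python
from typing import List, Tuple

# (first ID inside ns, first host ID, range length), as in uid_map / gid_map
IdMap = List[Tuple[int, int, int]]


def has_mapping(id_map: IdMap, host_id: int) -> bool:
    """True if the host-side ID is visible (mapped) inside the namespace."""
    return any(outside <= host_id < outside + length
               for _inside, outside, length in id_map)


def capable_wrt_inode(uid_map: IdMap, gid_map: IdMap, holds_cap_in_own_ns: bool,
                      inode_uid: int, inode_gid: int) -> bool:
    """A capability applies to an inode only if the caller's user namespace
    maps both the inode's owner UID and its GID."""
    return (holds_cap_in_own_ns
            and has_mapping(uid_map, inode_uid)
            and has_mapping(gid_map, inode_gid))


# Two hypothetical containers whose idmaps overlap (no isolation between them).
container_a = [(0, 100000, 65536)]
container_b = [(0, 100000, 65536)]
```

Under this model a file owned by host UID/GID 100500 is "contained" by both container_a and container_b, so a capability such as CAP_CHOWN held in either namespace applies to it, while a file owned by host root (0:0) is outside both.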
on top of that, you can setuid() to UIDs that are mapped in your namespace, which allows you to then access some types of objects