RootlessKit: Linux-native fakeroot using user namespaces
RootlessKit is a Linux-native implementation of "fake root" using user_namespaces(7).
The purpose of RootlessKit is to run Docker and Kubernetes as an unprivileged user (known as "Rootless mode"), so as to protect the real root on the host from potential container-breakout attacks.
What RootlessKit actually does
RootlessKit creates user_namespaces(7) and mount_namespaces(7), and executes newuidmap(1)/newgidmap(1) to map the sub-ID ranges configured in subuid(5) and subgid(5) into the user namespace.
RootlessKit also supports isolating network_namespaces(7) with userspace NAT using "slirp".
Kernel-mode NAT using SUID-enabled lxc-user-nic(1) is also experimentally supported.
Similar projects
Tools based on LD_PRELOAD (not sufficient for running rootless containers, and lacking support for static binaries):
Tools based on ptrace(2) (not sufficient for running rootless containers, and slow):
Tools based on user_namespaces(7) (as in RootlessKit, but without support for --copy-up, --net, ...):
Projects using RootlessKit
Container engines:
Container image builders:
- BuildKit: Next-generation docker build backend
Kubernetes distributions:
- Usernetes: Docker & Kubernetes, installable under a non-root user's $HOME.
- k3s: Lightweight Kubernetes
Setup
Run make && sudo make install.
The following binaries will be installed:
- /usr/local/bin/rootlesskit
- /usr/local/bin/rootlessctl
- /usr/local/bin/rootlesskit-docker-proxy (can be safely removed if you do not use Docker)
Requirements
subuid
- newuidmap and newgidmap need to be installed on the host. These commands are provided by the uidmap package on Debian and Ubuntu, and by shadow-utils on Fedora and RHEL.
- /etc/subuid and /etc/subgid should contain at least 65536 sub-IDs, e.g. penguin:231072:65536. These files are automatically configured on most distributions.
$ id -u
1001
$ whoami
penguin
$ grep "^$(whoami):" /etc/subuid
penguin:231072:65536
$ grep "^$(whoami):" /etc/subgid
penguin:231072:65536
See also https://rootlesscontaine.rs/getting-started/common/subuid/
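The sub-ID requirement above can be checked with a small shell function. This is a minimal sketch, not part of RootlessKit: the name check_subids is hypothetical, it only matches entries keyed by the exact name you pass (subuid(5) also allows numeric UIDs), and it requires a single range of at least 65536 IDs rather than summing several ranges.

```shell
#!/bin/sh
# check_subids FILE USER [MIN]
# Hypothetical helper (not shipped with RootlessKit): succeeds if FILE, in the
# "name:start:count" format of /etc/subuid and /etc/subgid, has an entry for
# USER whose count is at least MIN (default 65536).
check_subids() {
  awk -F: -v user="$2" -v min="${3:-65536}" '
    $1 == user && $3 + 0 >= min + 0 { found = 1 }
    END { exit !found }
  ' "$1"
}

# Example (uses the real files on the host):
#   check_subids /etc/subuid "$(whoami)" && check_subids /etc/subgid "$(whoami)" && echo "sub-IDs OK"
```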
sysctl
Some distros require setting up sysctl:
- Debian (excluding Ubuntu) and Arch:
sudo sh -c "echo 1 > /proc/sys/kernel/unprivileged_userns_clone"
- RHEL/CentOS 7 (excluding RHEL/CentOS 8):
sudo sh -c "echo 28633 > /proc/sys/user/max_user_namespaces"
To persist sysctl configurations, edit /etc/sysctl.conf or add a file under /etc/sysctl.d.
See also https://rootlesscontaine.rs/getting-started/common/sysctl/
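Each /proc/sys path above corresponds to a sysctl key with the slashes replaced by dots (for example kernel.unprivileged_userns_clone), which is the form used in /etc/sysctl.d. A minimal sketch of persisting one of them, assuming a root shell and a drop-in file named 99-rootless.conf (an arbitrary, hypothetical choice):

```shell
# persist_sysctl KEY VALUE [DIR]
# Hypothetical helper: writes "KEY = VALUE" to DIR/99-rootless.conf
# (DIR defaults to /etc/sysctl.d), replacing any previous contents of that file.
persist_sysctl() {
  dir=${3:-/etc/sysctl.d}
  mkdir -p "$dir" &&
    printf '%s = %s\n' "$1" "$2" > "$dir/99-rootless.conf"
}

# Example, as root on Debian (excluding Ubuntu) or Arch:
#   persist_sysctl kernel.unprivileged_userns_clone 1 && sysctl --system
# Example, as root on RHEL/CentOS 7:
#   persist_sysctl user.max_user_namespaces 28633 && sysctl --system
```

sysctl --system reloads the settings from all system configuration files, so the value takes effect without a reboot.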
Usage
Inside rootlesskit bash, your UID is mapped to 0 but it is not the real root:
(host)$ rootlesskit bash
(rootlesskit)# id
uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
(rootlesskit)# ls -l /etc/shadow
-rw-r----- 1 nobody nogroup 1050 Aug 21 19:02 /etc/shadow
(rootlesskit)# cat /etc/shadow
cat: /etc/shadow: Permission denied
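The mapping behind this can be inspected through /proc/self/uid_map, where each line reads "ID inside the namespace, ID outside the namespace, length" (see user_namespaces(7)). With the penguin example above, the map inside RootlessKit would presumably send 0 to 1001 (one ID) and 1 onward to the 65536 sub-IDs starting at 231072. Host UID 0 is not mapped at all, which is why /etc/shadow shows up as owned by nobody, the kernel's overflow ID 65534. The hypothetical helper below translates a namespace ID into the host ID using such a map; it is an illustration, not a RootlessKit tool.

```shell
# host_id_for ID [MAPFILE]
# Hypothetical helper: prints the host-side ID for ID as seen inside the user
# namespace, using a uid_map/gid_map file (default: /proc/self/uid_map).
# Each map line is "inside-start outside-start count". Fails if ID is unmapped.
host_id_for() {
  awk -v id="$1" '
    id + 0 >= $1 && id + 0 < $1 + $3 { print $2 + (id - $1); found = 1; exit }
    END { exit !found }
  ' "${2:-/proc/self/uid_map}"
}

# Example, inside "rootlesskit bash":
#   cat /proc/self/uid_map   # lists the ranges set up via newuidmap
#   host_id_for 0            # prints your host UID (1001 for penguin)
```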
Environment variables are kept untouched:
(host)$ rootlesskit bash
(rootlesskit)# echo $USER
penguin
(rootlesskit)# echo $HOME
/home/penguin
(rootlesskit)# echo $XDG_RUNTIME_DIR
/run/user/1001
Filesystems can be isolated from the host with --copy-up:
(host)$ rootlesskit --copy-up=/etc bash
(rootlesskit)# rm /etc/resolv.conf
(rootlesskit)# vi /etc/resolv.conf
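As far as we understand the default tmpfs+symlink copy-up mode (see ./docs/mount.md for the authoritative description), RootlessKit mounts a tmpfs over the directory and fills it with symlinks pointing to the original entries. Deleting /etc/resolv.conf then removes only a symlink on the tmpfs, and the file written by vi lives on the tmpfs, so the host file is untouched. The sketch below imitates that idea with two plain directories and no mounts; simulate_copy_up is a hypothetical name, and this is not how RootlessKit itself is implemented.

```shell
# simulate_copy_up ORIG NEW
# Hypothetical illustration: populates NEW with one symlink per entry of ORIG
# (including dotfiles). ORIG should be an absolute path so the links resolve.
simulate_copy_up() {
  mkdir -p "$2" || return 1
  for entry in "$1"/* "$1"/.[!.]* "$1"/..?*; do
    # Skip glob patterns that matched nothing.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    ln -s "$entry" "$2/${entry##*/}" || return 1
  done
}

# Example:
#   orig=$(mktemp -d); new=$(mktemp -d)/etc
#   echo 'nameserver 10.0.2.3' > "$orig/resolv.conf"
#   simulate_copy_up "$orig" "$new"
#   rm "$new/resolv.conf"                            # removes only the symlink
#   echo 'nameserver 192.0.2.1' > "$new/resolv.conf" # new file in NEW only
#   cat "$orig/resolv.conf"                          # still "nameserver 10.0.2.3"
```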
You can even create network namespaces with Slirp:
(host)$ rootlesskit --copy-up=/etc --copy-up=/run --net=slirp4netns --disable-host-loopback bash
(rootlesskit)# ip netns add foo
...
Full CLI options
$ rootlesskit --help
NAME:
rootlesskit - Linux-native fakeroot using user namespaces
USAGE:
rootlesskit [global options] [arguments...]
VERSION:
1.1.0
DESCRIPTION:
RootlessKit is a Linux-native implementation of "fake root" using user_namespaces(7).
Web site: https://github.com/rootless-containers/rootlesskit
Examples:
# spawn a shell with a new user namespace and a mount namespace
rootlesskit bash
# make /etc writable
rootlesskit --copy-up=/etc bash
# set mount propagation to rslave
rootlesskit --propagation=rslave bash
# create a network namespace with slirp4netns, and expose 80/tcp on the namespace as 8080/tcp on the host
rootlesskit --copy-up=/etc --net=slirp4netns --disable-host-loopback --port-driver=builtin -p 127.0.0.1:8080:80/tcp bash
Note: RootlessKit requires /etc/subuid and /etc/subgid to be configured by the real root user.
See https://rootlesscontaine.rs/getting-started/common/ .
OPTIONS:
Misc:
--debug debug mode (default: false)
--help, -h show help (default: false)
--version, -v print the version (default: false)
Mount:
--copy-up value [ --copy-up value ] mount a filesystem and copy-up the contents. e.g. "--copy-up=/etc" (typically required for non-host network)
--copy-up-mode value copy-up mode [tmpfs+symlink]
--propagation value mount propagation [rprivate, rslave]
Network:
--net value network driver [host, slirp4netns, vpnkit, lxc-user-nic(experimental)]
--mtu value MTU for non-host network (default: 65520 for slirp4netns, 1500 for others) (default: 0)
--cidr value CIDR for slirp4netns network (default: 10.0.2.0/24)
--ifname value Network interface name (default: tap0 for slirp4netns and vpnkit, eth0 for lxc-user-nic)
--disable-host-loopback prohibit connecting to 127.0.0.1:* on the host namespace (default: false)
--ipv6 enable IPv6 routing. Unrelated to port forwarding. Only supported for slirp4netns. (experimental) (default: false)
Network [lxc-user-nic]:
--lxc-user-nic-binary value path of lxc-user-nic binary for --net=lxc-user-nic
--lxc-user-nic-bridge value lxc-user-nic bridge name
Network [slirp4netns]:
--slirp4netns-binary value path of slirp4netns binary for --net=slirp4netns
--slirp4netns-sandbox value enable slirp4netns sandbox (experimental) [auto, true, false] (the default is planned to be "auto" in future)
--slirp4netns-seccomp value enable slirp4netns seccomp (experimental) [auto, true, false] (the default is planned to be "auto" in future)
Network [vpnkit]:
--vpnkit-binary value path of VPNKit binary for --net=vpnkit
Port:
--port-driver value port driver for non-host network. [none, builtin, slirp4netns]
--publish value, -p value [ --publish value, -p value ] publish ports. e.g. "127.0.0.1:8080:80/tcp"
Process:
--pidns create a PID namespace (default: false)
--cgroupns create a cgroup namespace (default: false)
--utsns create a UTS namespace (default: false)
--ipcns create an IPC namespace (default: false)
--reaper value enable process reaper. Requires --pidns. [auto,true,false]
--evacuate-cgroup2 value evacuate processes into the specified subgroup. Requires --pidns and --cgroupns
State:
--state-dir value state directory
SubID:
--subid-source value the source of the subids. "dynamic" executes /usr/bin/getsubids. "static" reads /etc/{subuid,subgid}. [auto,dynamic,static]
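The --publish spec in the help text above has the form HOST_IP:HOST_PORT:CHILD_PORT/PROTO; in the example, 127.0.0.1:8080:80/tcp exposes port 80/tcp of the child network namespace as 127.0.0.1:8080 on the host. A minimal shell sketch that splits such a spec into its fields, for example to validate arguments in a wrapper script. The helper name is hypothetical, and it handles only this three-field form; RootlessKit's own parser may accept other forms.

```shell
# parse_publish SPEC
# Hypothetical helper: splits HOST_IP:HOST_PORT:CHILD_PORT/PROTO and prints
# "HOST_IP HOST_PORT CHILD_PORT PROTO". Fails on other forms.
parse_publish() {
  spec=$1
  case $spec in
    *:*:*/*) ;;
    *) echo "unsupported publish spec: $spec" >&2; return 1 ;;
  esac
  proto=${spec##*/}      # text after the last "/"
  rest=${spec%/*}        # text before the last "/"
  child_port=${rest##*:} # last ":"-separated field
  rest=${rest%:*}
  host_port=${rest##*:}
  host_ip=${rest%:*}     # everything before the host port
  echo "$host_ip $host_port $child_port $proto"
}

# Example:
#   parse_publish 127.0.0.1:8080:80/tcp   # prints "127.0.0.1 8080 80 tcp"
```

Splitting from the right (last "/", then the last two ":") keeps the host IP intact even if it contains colons itself.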
State directory
The following files will be created in the state directory, which can be specified with --state-dir:
- lock: lock file
- child_pid: decimal PID text that can be used for nsenter(1).
- api.sock: REST API socket. See ./docs/api.md and ./docs/port.md.
If --state-dir is not specified, RootlessKit creates a temporary state directory under /tmp and removes it on exit.
Undocumented files are subject to change.
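Since child_pid holds the PID of RootlessKit's child process, a script on the host can join its namespaces with nsenter(1) from util-linux. A minimal sketch, assuming the state directory was set with --state-dir; read_child_pid is a hypothetical helper name, /path/to/state-dir is a placeholder, and which namespaces to enter depends on the options used (for example, only a non-host --net driver creates a separate network namespace).

```shell
# read_child_pid STATE_DIR
# Hypothetical helper: prints the PID stored in STATE_DIR/child_pid and fails
# if the file is missing or does not contain a decimal number.
read_child_pid() {
  pid=$(cat "$1/child_pid") || return 1
  case $pid in
    ''|*[!0-9]*) echo "invalid child_pid: $pid" >&2; return 1 ;;
  esac
  echo "$pid"
}

# Example: enter the user and mount namespaces (add -n for a non-host --net driver):
#   nsenter -U --preserve-credentials -m -t "$(read_child_pid /path/to/state-dir)" sh
```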
Environment variables
The following environment variables will be set for the child process:
- ROOTLESSKIT_STATE_DIR (since v0.3.0): absolute path to the state dir
- ROOTLESSKIT_PARENT_EUID (since v0.8.0): effective UID of the parent (host-side) process
- ROOTLESSKIT_PARENT_EGID (since v0.8.0): effective GID of the parent (host-side) process
Undocumented environment variables are subject to change.
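Inside the child, id -u reports 0, so these variables are a convenient way to learn the invoking user's host-side identity and to locate files such as api.sock. A minimal sketch of a helper that refuses to run outside RootlessKit; the function name is hypothetical, and treating an unset ROOTLESSKIT_STATE_DIR as "not under RootlessKit" is an assumption based on the list above.

```shell
# rootlesskit_info
# Hypothetical helper: prints the documented RootlessKit variables, or fails
# when ROOTLESSKIT_STATE_DIR is unset (assumed to mean "not under RootlessKit").
rootlesskit_info() {
  if [ -z "${ROOTLESSKIT_STATE_DIR:-}" ]; then
    echo "not running under RootlessKit" >&2
    return 1
  fi
  echo "state dir:   $ROOTLESSKIT_STATE_DIR"
  echo "API socket:  $ROOTLESSKIT_STATE_DIR/api.sock"
  echo "parent EUID: ${ROOTLESSKIT_PARENT_EUID:-unset}"
  echo "parent EGID: ${ROOTLESSKIT_PARENT_EGID:-unset}"
}

# Example, inside "rootlesskit bash":
#   rootlesskit_info
```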
Additional documents
- ./docs/network.md: Networking (--net, --mtu, --cidr, --disable-host-loopback, --slirp4netns-*, ...)
- ./docs/port.md: Port forwarding (--port-driver, -p, ...)
- ./docs/mount.md: Mount (--propagation, ...)
- ./docs/process.md: Process (--pidns, --reaper, --cgroupns, --evacuate-cgroup2, ...)
- ./docs/api.md: REST API
- ./docs/subid.md: Sub UIDs and sub GIDs