2017-12-04

The Advent of Void: Day 4: containers

Today we introduce containers, a small suite of tools to create and run programs in “containers” using Linux namespaces.

The containers package leverages the same kernel features as Docker and LXC, but those pieces of software are large and provide many features not directly related to container technology itself. containers only does the bare minimum.

containers comes with three tools: contain, pseudo and inject.

contain handles the details of running init (or /bin/sh if no init is specified). Inside its container, contain is the computer, much like the lxc or docker daemons.

pseudo uses CLONE_NEWUSER to create a user namespace, which makes it possible to map UIDs and GIDs inside the namespace to real UIDs and GIDs outside the namespace.
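To make the mapping concrete, here is a minimal sketch of how a single mapping entry translates an ID. The kernel exposes these entries in /proc/PID/uid_map, one per line, as "inside-start outside-start length". The helper name map_uid and the example entry "0 1000 1" are assumptions for illustration (UID 1000 matches the id output shown later in this post); the map that pseudo actually writes may differ.

```sh
# Translate a UID seen inside a user namespace to the UID on the host,
# given one uid_map entry: "<inside-start> <outside-start> <length>".
# map_uid is a hypothetical helper, not part of the containers package.
map_uid() {
    inside=$1
    set -- $2                      # split the entry into its three fields
    in_start=$1 out_start=$2 length=$3
    if [ "$inside" -ge "$in_start" ] && [ "$inside" -lt $((in_start + length)) ]; then
        echo $((out_start + inside - in_start))
    else
        echo "uid $inside is not covered by this entry" >&2
        return 1
    fi
}

map_uid 0 "0 1000 1"               # root inside the namespace is UID 1000 outside
```

With such an entry, a file created by "root" inside the namespace is stored on disk as owned by UID 1000, which is why an unprivileged user can populate a root filesystem this way.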

inject is used to run a new program in a running container, for example to spawn a shell and configure the container from inside.

To set up a small Void container, we run xbps-install with pseudo to populate a directory with a base system. Files that are usually owned by root are owned by our user outside the namespace; inside the namespace they appear to be owned by root.

$ pseudo xbps-install -R https://repo.voidlinux.eu/current/ -MSr /tmp/void base-voidstrap
[*] Updating `https://repo.voidlinux.eu/current//armv7l-repodata' ...
armv7l-repodata: 1140KB [avg rate: 9126MB/s]
`https://repo.voidlinux.eu/current/' repository has been RSA signed by "Void Linux"
Fingerprint: 60:ae:0c:d6:f0:95:17:80:bc:93:46:7a:89:af:a3:2d
Do you want to import this public key? [Y/n] y

Name            Action    Version           New version            Download size
xbps-triggers   install   -                 0.102_3                8116B
base-files      install   -                 0.139_9                51KB
ncurses-base    install   -                 6.0_2                  23KB
glibc           install   -                 2.26_3                 6187KB
[...]
Do you want to continue? [Y/n]
[*] Downloading binary packages
[...]
[*] Verifying package integrity
[...]
[*] Running transaction tasks
[...]
[*] Configuring unpacked packages
[...]
92 downloaded, 92 installed, 0 updated, 92 configured, 0 removed.

With the populated root directory we can already use contain to run a shell inside our container.

$ id
uid=1000([...]) gid=1000([...]) groups=[...]
$ contain /tmp/void/
# id
uid=0(root) gid=0(root) groups=0(root)

Before we “boot” the Void container, we disable the agetty services: we don’t need them, and since the ttys are not available they would just restart in a loop.

$ rm /tmp/void/etc/runit/runsvdir/default/agetty*
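If you prepare several container roots, the same step can be wrapped in a small function. This is only a sketch: disable_agetty is a hypothetical helper name, and it assumes the services are enabled as entries in etc/runit/runsvdir/default, as in the command above.

```sh
# Remove every agetty service entry from the default runsvdir of a container root.
# disable_agetty is a hypothetical helper; $1 is the container root directory.
disable_agetty() {
    root=$1
    for svc in "$root"/etc/runit/runsvdir/default/agetty*; do
        # Skip the unexpanded pattern when nothing matches; -L also catches
        # symlinks whose target does not exist on the host.
        [ -e "$svc" ] || [ -L "$svc" ] || continue
        echo "disabling ${svc##*/}"
        rm -- "$svc"
    done
}
```

Calling disable_agetty /tmp/void has the same effect as the rm command above and leaves all other services untouched.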

Now we can copy our host’s /etc/resolv.conf into the container’s root to simplify the setup. To share the host network with the container without extra privileges, we can use the -n flag. By default contain runs a shell inside the container; to “boot” the container we specify /bin/init.

$ cp /etc/resolv.conf /tmp/void/etc/
$ contain -n /tmp/void/ /bin/init
- runit: $Id: 25da3b86f7bed4038b8a039d2f8e8c9bbcf0822b $: booting.
- runit: enter stage: /etc/runit/1
=> Welcome to Void!
=> Mounting pseudo-filesystems...
mount: /sys: permission denied.
mount: /sys/kernel/security: mount point does not exist.
=> Initializing random seed...
=> Setting up loopback interface...
RTNETLINK answers: Operation not permitted
=> Setting up hostname to 'void-live'...
/etc/runit/1: 13: /etc/runit/core-services/05-misc.sh: cannot create /proc/sys/kernel/hostname: Permission denied
=> Loading sysctl(8) settings...
* Applying /usr/lib/sysctl.d/void.conf ...
sysctl: permission denied on key 'kernel.core_uses_pid'
sysctl: permission denied on key 'fs.protected_hardlinks'
sysctl: permission denied on key 'fs.protected_symlinks'
sysctl: permission denied on key 'kernel.kptr_restrict'
sysctl: permission denied on key 'kernel.dmesg_restrict'
sysctl: permission denied on key 'kernel.perf_event_paranoid'
sysctl: cannot stat /proc/sys/kernel/kexec_load_disabled: No such file or directory
sysctl: cannot stat /proc/sys/kernel/yama/ptrace_scope: No such file or directory
* Applying /etc/sysctl.conf ...
install: cannot change ownership of '/run/utmp': Invalid argument
dmesg: read kernel buffer failed: Operation not permitted
=> Initialization complete, running stage 2...
- runit: leave stage: /etc/runit/1
- runit: enter stage: /etc/runit/2
runsvchdir: default: current.

While the container is running, we can use the inject tool to enter the namespace.

$ sudo inject $(pgrep contain) /bin/bash
bash-4.4# id
uid=0(root) gid=0(root) groups=0(root)
bash-4.4# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether de:ad:be:ef:de:ad brd ff:ff:ff:ff:ff:ff
bash-4.4# sv s /var/service/*
run: /var/service/udevd: (pid 41) 61s
bash-4.4# xbps-install -Su
[*] Updating `https://repo.voidlinux.eu/current/armv7l-repodata' ...
bash-4.4# ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        43  0.0  0.2   3136  2556 ?        S    16:43   0:00 /bin/bash
root        48  0.0  0.1   2776  1604 ?        R+   16:45   0:00  \_ ps auxf
root         1  0.0  0.0    668     4 console  Ss+  16:42   0:00 runit
root        34  0.0  0.0   1788   180 ?        Ss   16:42   0:00 runsvdir -P /run/runit/runsvdir/current log: ......
root        40  0.0  0.0   1648   192 ?        Ss   16:42   0:00  \_ runsv udevd
root        41  0.0  0.2  10476  2060 ?        S    16:42   0:00      \_ udevd

The functionality of contain can be further extended with two flags: -i to run a program inside of the namespace and -o to run a program outside of the namespace.

As an example, we can use the -i flag to bind mount a directory into the namespace:

$ contain -i "mkdir data; mount --bind $(pwd) data" -n /tmp/void/ /bin/init
[...]
$ sudo inject $(pgrep contain) /bin/bash
bash-4.4# ls /data/
hello  world
bash-4.4#
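The command string passed to -i above can also be assembled by a small helper on the host, which makes it easier to reuse with different directories. This is a sketch under assumptions: bind_cmd is a hypothetical name, it uses mkdir -p instead of the plain mkdir from the example, and it only wraps the paths in single quotes, so paths that themselves contain a single quote are not handled.

```sh
# Build the command string for contain's -i flag.
# bind_cmd is a hypothetical helper, not part of the containers package.
# $1: host directory to share, $2: mount point as used in the example above.
bind_cmd() {
    printf "mkdir -p '%s'; mount --bind '%s' '%s'\n" "$2" "$1" "$2"
}

# Print the command for the current directory.
bind_cmd "$PWD" data

# Intended use, mirroring the example above:
#   contain -i "$(bind_cmd "$PWD" data)" -n /tmp/void/ /bin/init
```

As in the original command, the host path is expanded by the outer shell before contain starts, so the command that runs for the container only ever sees the final absolute path.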

Instead of passing shell commands to -i and -o, you can create scripts to prepare your containers. The scripts can mount filesystems, set up virtual network interfaces and more.