lxkns

Go package and module, version v0.15.5
Warning: This package is not in the latest version of its module.

Published: Oct 8, 2020 License: Apache-2.0 Imports: 18 Imported by: 0

README

Linux kernel Namespaces


lxkns is a Golang package for discovering Linux kernel namespaces. In every nook and cranny of your Linux hosts. This package also features marshalling and unmarshalling namespace discovery results to and from JSON – which is especially useful to separate the super-privileged scanner from non-root frontends: run namespace discoveries as a containerized service.

In addition, lxkns comes with a set of unique CLI namespace discovery tools and also helps Go programs with switching namespaces.

And all that is tested with Go 1.13-1.15, with support even for the new time namespaces.

🔎 Comprehensive Namespace Discovery

When compared to most well-known and openly available CLI tools, such as lsns, the lxkns package detects namespaces even in places of a running Linux system other tools typically do not consider. In particular:

  1. from the procfs filesystem in /proc/[PID]/ns/* – as lsns and other tools do.
  2. bind-mounted namespaces, via /proc/[PID]/mountinfo. Our discovery method even finds bind-mounted namespaces in mount namespaces other than the current one in which the discovery starts.
  3. file descriptor-referenced namespaces, via /proc/[PID]/fd/*.
  4. intermediate hierarchical user and PID namespaces, via NS_GET_PARENT (man 2 ioctl_ns).
  5. user namespaces owning non-user namespaces, via NS_GET_USERNS (man 2 ioctl_ns).

tool    /proc/[PID]/ns/* ①   bind mounts ②   /proc/[PID]/fd/* ③   hierarchy ④   owning user namespaces ⑤
lsns    ✓
lxkns   ✓                    ✓               ✓                    ✓             ✓

Applications can control the extent to which an lxkns discovery tries to ferret out namespaces from the nooks and crannies of Linux hosts.

Some discovery methods are more expensive than others, especially the discovery of bind-mounted namespaces in other mount namespaces. The reason is that the Go runtime runs multiple threads, while Linux does not allow multi-threaded processes to switch mount namespaces. To work around this constraint, lxkns must fork and immediately re-execute the process it is used in. Applications that want to use such advanced discovery methods must thus call reexec.CheckAction() as early as possible in their main() function. For this, you need to import "github.com/thediveo/gons/reexec".

In addition, lxkns also discovers some control group information for the processes attached to namespaces. In particular, the discovery relates processes to the control groups created for the "cpu" v1 controller type. To a limited extent, the names of these control groups reflect the partitioning of processes using Linux kernel namespaces. For instance, processes in Docker containers show control group names of the form docker/<id>, where <id> is the usual 64 hex character string. Plain containerd container processes show up with <namespace>/<id> control group names.
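To illustrate where such names come from, here is a minimal sketch, not lxkns' own code. It picks the "cpu" v1 controller's path out of the contents of a /proc/[PID]/cgroup file, assuming the cgroup v1 line format hierarchy-ID:controller-list:cgroup-path as documented in cgroups(7). The cpuCgroupName function and the sample data (with a shortened, made-up container ID) are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cpuCgroupName returns the path of the "cpu" v1 controller found in the
// contents of a /proc/[PID]/cgroup file, relative to the cgroup root (that
// is, without the leading "/"). It returns "" if there is no such controller.
func cpuCgroupName(procCgroup string) string {
	scanner := bufio.NewScanner(strings.NewReader(procCgroup))
	for scanner.Scan() {
		// Each line has the form "hierarchy-ID:controller-list:cgroup-path".
		fields := strings.SplitN(scanner.Text(), ":", 3)
		if len(fields) != 3 {
			continue
		}
		for _, controller := range strings.Split(fields[1], ",") {
			if controller == "cpu" {
				return strings.TrimPrefix(fields[2], "/")
			}
		}
	}
	return ""
}

func main() {
	// Made-up sample contents for a process inside a Docker container.
	sample := "12:pids:/docker/0123abcd\n" +
		"4:cpu,cpuacct:/docker/0123abcd\n" +
		"1:name=systemd:/docker/0123abcd\n"
	fmt.Println(cpuCgroupName(sample)) // docker/0123abcd
}
```

Stripping the leading "/" mirrors how lxkns presents these names relative to the cgroup root.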

🧰 lxkns Tools

But lxkns is more than "just" a Golang package. It also features...

  • lxkns discovery service exposing namespace discovery information via a simple REST API. Of course, our service is built with, guess what, lxkns.
  • CLI tools, also built on top of lxkns (we do eat our own dog food).
🐋 lxkns REST Service

To give the containerized lxkns discovery service a test drive (requires Docker and docker-compose to be installed), you can play around with our "Linux namespaces" React app:

  1. make deploy,
  2. and then navigate to http://localhost:5010. The lxkns web app should load automatically and then display the discovery results. The app bar controls show tooltips when hovering over them.
    • ☰ opens the drawer, where you can navigate to different namespace views. In particular, an "all" namespaces view along the user namespace hierarchy, as well as per-type views which focus on a specific type of namespace each, with the attached processes, organized by their cgroup hierarchy.
    • > collapses all namespace nodes, except for top-level nodes (initial user and PID namespaces, all other namespaces).
    • v expands all namespace nodes.
    • ⟳ manual refresh, whenever you want; displays a progress indicator in case of slow refreshes.
    • 🔄 opens a pop-up menu to change the refresh interval or switch off automatic refresh. Your refresh setting will be stored in your browser's local storage.
Obligatory Eye Candy

The lxkns web app offers several different views onto the Linux kernel namespaces. To navigate between them, click on the "hamburger" icon to open the drawer or swipe from the left on touch-enabled devices.

[Screenshot: lxkns app navigation]

See all network namespaces with the "leader" processes attached to them. Please note the cgroup paths, which help us in identifying Docker containers.

[Screenshot: lxkns app network namespaces]

There's a neat feature in the lxkns app: if we look more closely at the PID namespaces we notice that one of our Docker containers (the one with the lxkns service) doesn't use its own PID namespace, but instead is attached to the initial PID namespace.

[Screenshot: lxkns app PID namespaces]

Besides the namespace-type specific views, there's the all-in-one view, which is organized along the hierarchy of user namespaces. The rationale here is that in the Linux kernel architecture, user namespaces own all other namespaces.

[Screenshot: lxkns app all namespaces]

lxkns Service Container Deployment

Some deployment notes about the lxkns service container:

  • read-only: the lxkns service can be used on a read-only container filesystem.
  • non-root: the holy grail of container hardening ... wait till you get to see our capabilities 😏
  • unprivileged: because that doesn't mean in-capable 😈
  • capabilities: not much to see here, just CAP_SYS_ADMIN, CAP_SYS_CHROOT, CAP_SYS_PTRACE, and CAP_DAC_READ_SEARCH.
🖥️ CLI Tools

To build and install all CLI tools:

  • system install: simply run make install to install the tools into your system, defaults to /usr/local/bin.
  • local install: go install ./cmd/... ./examples/lsallns installs only into $GOPATH/bin.

The tools:

  • lsuns: shows all user namespaces in your Linux host, in a neat hierarchy. Moreover, it can also show the non-user namespaces "owned" by user namespaces. This ownership information is important with respect to capabilities and processes switching namespaces using setns() (man 2 setns).

  • lspidns: shows all PID namespaces in your Linux host, in a neat hierarchy. Optionally, the owning user namespaces can be shown interleaved with the PID namespace hierarchy.

  • pidtree: shows either the process hierarchy within the PID namespace hierarchy or a single branch only.

  • nscaps: determines a process' capabilities in a namespace, and then displays the owning user namespace hierarchy (or hierarchies) of the process and target namespace, together with the current process and namespace capabilities.

  • dumpns: runs a namespace (and process) discovery and then dumps the results as JSON.

lsuns

In its simplest form, lsuns shows the hierarchy of user namespaces.

$ sudo lsuns
user:[4026531837] process "systemd" (1) created by UID 0 ("root")
├─ user:[4026532454] process "unshare" (98171) controlled by "user.slice" created by UID 1000 ("harald")
└─ user:[4026532517] process "upowerd" (96159) controlled by "system.slice/upower.service" created by UID 0 ("root")

Note: lsuns not only shows the user namespaces with their IDs and hierarchy. It also shows the "most senior" process attached to the particular user namespace, as well as the user "owning" the user namespace. The "most senior" process is the top-most process in the process tree still attached to a (user) namespace. If there are multiple top-most processes, such as init(1) and kthreadd(2), the older process is chosen (or the one with the lowest PID, as in the case of the same-age init and kthreadd).

The control group name ("controlled by ...") is the name of the v1 "cpu" control sub-group controlling a particular most senior process. This name is relative to the root of the control group filesystem (such as /sys/fs/cgroup). The root is left out in order to reduce clutter.
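The "most senior" selection rule described above can be sketched as follows. This is a minimal illustration rather than lsuns' actual code. The proc type and its StartTime field (for instance, the start time in clock ticks from /proc/[PID]/stat) are assumptions made for this example, and the input is expected to already contain only the top-most processes.

```go
package main

import "fmt"

// proc is a hypothetical, minimal process representation for this sketch.
type proc struct {
	PID       int
	Name      string
	StartTime uint64 // process start time, such as clock ticks since boot
}

// mostSenior picks the "most senior" among the top-most processes attached
// to a namespace: the oldest one, or the one with the lowest PID when
// several processes share the same start time. It returns nil if there is
// no process at all.
func mostSenior(topmost []*proc) *proc {
	var senior *proc
	for _, p := range topmost {
		if senior == nil ||
			p.StartTime < senior.StartTime ||
			(p.StartTime == senior.StartTime && p.PID < senior.PID) {
			senior = p
		}
	}
	return senior
}

func main() {
	// init(1) and kthreadd(2) typically share the same start time.
	procs := []*proc{
		{PID: 2, Name: "kthreadd", StartTime: 0},
		{PID: 1, Name: "systemd", StartTime: 0},
	}
	fmt.Println(mostSenior(procs).Name) // systemd
}
```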

Showing Owned (Non-User) Namespaces

It gets more interesting with the -d (details) flag: lsuns then additionally displays all non-user namespaces owned by the user namespaces. In Linux-kernel namespace parlance, "owning" refers to the relationship between a newly created namespace and the user namespace that was active at the time the new namespace was created. For convenience, lsuns sorts the owned namespaces first alphabetically by type, and second numerically by namespace IDs.

$ sudo lsuns -d
user:[4026531837] process "systemd" (1) created by UID 0 ("root")
│  ⋄─ cgroup:[4026531835] process "systemd" (1)
│  ⋄─ ipc:[4026531839] process "systemd" (1)
│  ⋄─ ipc:[4026532332] process "systemd" (5492) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a"
│  ⋄─ ipc:[4026532397] process "sleep" (6025) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"
│  ⋄─ mnt:[4026531840] process "systemd" (1)
│  ⋄─ mnt:[4026531860] process "kdevtmpfs" (33)
│  ⋄─ mnt:[4026532184] process "systemd-udevd" (946) controlled by "system.slice/systemd-udevd.service"
│  ⋄─ mnt:[4026532245] process "haveged" (1688) controlled by "system.slice/haveged.service"
│  ⋄─ mnt:[4026532246] process "systemd-timesyn" (1689) controlled by "system.slice/systemd-timesyncd.service"
│  ⋄─ mnt:[4026532248] process "systemd-network" (1709) controlled by "system.slice/systemd-networkd.service"
│  ⋄─ mnt:[4026532267] process "systemd-resolve" (1711) controlled by "system.slice/systemd-resolved.service"
│  ⋄─ mnt:[4026532268] process "NetworkManager" (1757) controlled by "system.slice/NetworkManager.service"
│  ⋄─ mnt:[4026532269] bind-mounted at "/run/snapd/ns/lxd.mnt"
│  ⋄─ mnt:[4026532325] process "irqbalance" (1761) controlled by "system.slice/irqbalance.service"
│  ⋄─ mnt:[4026532326] process "systemd-logind" (1779) controlled by "system.slice/systemd-logind.service"
│  ⋄─ mnt:[4026532327] process "ModemManager" (1840) controlled by "system.slice/ModemManager.service"
│  ⋄─ mnt:[4026532330] process "systemd" (5492) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a"
│  ⋄─ mnt:[4026532388] process "bluetoothd" (2239) controlled by "system.slice/bluetooth.service"
│  ⋄─ mnt:[4026532395] process "sleep" (6025) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"
│  ⋄─ mnt:[4026532513] process "colord" (83614) controlled by "system.slice/colord.service"
│  ⋄─ mnt:[4026532516] process "upowerd" (96159) controlled by "system.slice/upower.service"
│  ⋄─ net:[4026531905] process "systemd" (1)
│  ⋄─ net:[4026532191] process "haveged" (1688) controlled by "system.slice/haveged.service"
│  ⋄─ net:[4026532274] process "rtkit-daemon" (2211) controlled by "system.slice/rtkit-daemon.service"
│  ⋄─ net:[4026532335] process "systemd" (5492) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a"
│  ⋄─ net:[4026532400] process "sleep" (6025) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"
│  ⋄─ pid:[4026531836] process "systemd" (1)
│  ⋄─ pid:[4026532333] process "systemd" (5492) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a"
│  ⋄─ pid:[4026532398] process "sleep" (6025) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"
│  ⋄─ uts:[4026531838] process "systemd" (1)
│  ⋄─ uts:[4026532185] process "systemd-udevd" (946) controlled by "system.slice/systemd-udevd.service"
│  ⋄─ uts:[4026532247] process "systemd-timesyn" (1689) controlled by "system.slice/systemd-timesyncd.service"
│  ⋄─ uts:[4026532324] process "systemd-logind" (1779) controlled by "system.slice/systemd-logind.service"
│  ⋄─ uts:[4026532331] process "systemd" (5492) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a"
│  ⋄─ uts:[4026532396] process "sleep" (6025) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"
├─ user:[4026532454] process "unshare" (98171) controlled by "user.slice" created by UID 1000 ("harald")
│     ⋄─ mnt:[4026532455] process "unshare" (98171) controlled by "user.slice"
│     ⋄─ mnt:[4026532457] process "unshare" (98172) controlled by "user.slice"
│     ⋄─ pid:[4026532456] process "unshare" (98172) controlled by "user.slice"
│     ⋄─ pid:[4026532458] process "bash" (98173) controlled by "user.slice"
└─ user:[4026532517] process "upowerd" (96159) controlled by "system.slice/upower.service" created by UID 0 ("root")
lspidns

On its surface, lspidns might appear to be lsuns' twin, just for PID namespaces.

pid:[4026531836] process "systemd" (1)
├─ pid:[4026532333] process "systemd" (5492) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a"
│  └─ pid:[4026532398] process "sleep" (6025) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"
└─ pid:[4026532456] process "unshare" (99577) controlled by "user.slice"
   └─ pid:[4026532459] process "unshare" (99578) controlled by "user.slice"
      └─ pid:[4026532460] process "bash" (99579) controlled by "user.slice"

Nota Bene: if you look closely at the control group names of the PID namespace processes, then you might notice that there is an outer Docker container with an inner container. This inner container happens to be a containerd container in the "default" namespace.

User-PID Hierarchy

But hidden beneath the surface lies the -u flag; "u" as in user namespace. Now what do user namespaces have to do with PID namespaces? Like other non-user namespaces, PID namespaces are also owned by user namespaces. -u tells lspidns to show a "synthesized" hierarchy where owning user namespaces and owned PID namespaces are laid out in a single tree.

user:[4026531837] process "systemd" (1) created by UID 0 ("root")
└─ pid:[4026531836] process "systemd" (1)
   ├─ pid:[4026532333] process "systemd" (5492) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a"
   │  └─ pid:[4026532398] process "sleep" (6025) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"
   └─ user:[4026532454] process "unshare" (99576) controlled by "user.slice" created by UID 1000 ("harald")
      └─ pid:[4026532456] process "unshare" (99577) controlled by "user.slice"
         └─ user:[4026532457] process "unshare" (99577) controlled by "user.slice" created by UID 1000 ("harald")
            └─ pid:[4026532459] process "unshare" (99578) controlled by "user.slice"
               └─ pid:[4026532460] process "bash" (99579) controlled by "user.slice"

Note: this tree-like representation is possible because the capabilities rules for user and PID namespaces forbid user namespaces criss-crossing PID namespaces and vice versa.

pidtree

pidtree shows either the process hierarchy within the PID namespace hierarchy or a single branch only. It additionally shows translated PIDs, which are valid only inside the PID namespace the processes are joined to. For instance, in "containerd" (5709/72) below, the PID namespace-local PID is 72, while inside the initial (root) PID namespace the PID is 5709 instead.

$ sudo pidtree
pid:[4026531836], owned by UID 0 ("root")
├─ "systemd" (1)
│  ├─ "systemd-journal" (910) controlled by "system.slice/systemd-journald.service"
│  ├─ "systemd-udevd" (946) controlled by "system.slice/systemd-udevd.service"
...
│  ├─ "containerd" (1836) controlled by "system.slice/containerd.service"
│  │  └─ "containerd-shim" (5472) controlled by "system.slice/containerd.service"
│  │     └─ pid:[4026532333], owned by UID 0 ("root")
│  │        ├─ "systemd" (5492/1) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a"
│  │        │  ├─ "systemd-journal" (5642/66) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/system.slice/systemd-journald.service"
│  │        │  ├─ "containerd" (5709/72) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/system.slice/containerd.service"
│  │        │  ├─ "setup.sh" (5712/73) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/system.slice/testing.service"
│  │        │  │  └─ "ctr" (5978/107) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/system.slice/testing.service"
│  │        │  └─ "containerd-shim" (5999/126) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/system.slice/containerd.service"
│  │        │     └─ pid:[4026532398], owned by UID 0 ("root")
│  │        │        ├─ "sleep" (6025/146/1) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"
│  │        │        └─ "sh" (6427/235/7) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"

Alternatively, it can show just a single branch down to a PID inside a specific PID namespace.

$ sudo pidtree -n pid:[4026532398] -p 7
pid:[4026531836], owned by UID 0 ("root")
└─ "systemd" (1)
   └─ "containerd" (1836) controlled by "system.slice/containerd.service"
      └─ "containerd-shim" (5472) controlled by "system.slice/containerd.service"
         └─ pid:[4026532333], owned by UID 0 ("root")
            └─ "systemd" (5492/1) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a"
               └─ "containerd-shim" (5999/126) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/system.slice/containerd.service"
                  └─ pid:[4026532398], owned by UID 0 ("root")
                     └─ "sh" (6427/235/7) controlled by "docker/c8bf69d0651425244f472e89677177e3d488274f1d242c62a50a82f35feb8c4a/default/sleepy"

Please see also the pidtree command documentation.
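
On Linux 4.1 and later, the NSpid field of /proc/[PID]/status is one way to obtain translated PIDs like those shown by pidtree. This field lists a process' PID in each PID namespace it is visible in, starting with the PID namespace of the procfs mount and ending with the process' own PID namespace. Whether pidtree itself relies on NSpid is an assumption here. The following minimal sketch merely formats this field in pidtree's "6025/146/1" notation, using a hypothetical translatedPIDs helper and a hand-written status excerpt.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// translatedPIDs extracts the NSpid field from the contents of a
// /proc/[PID]/status file and joins the PIDs with "/", outermost PID
// namespace first. It returns "" if there is no NSpid field.
func translatedPIDs(status string) string {
	scanner := bufio.NewScanner(strings.NewReader(status))
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "NSpid:") {
			return strings.Join(strings.Fields(strings.TrimPrefix(line, "NSpid:")), "/")
		}
	}
	return ""
}

func main() {
	// Hand-written status excerpt matching the "sleep" process shown above.
	status := "Name:\tsleep\nPid:\t6025\nNSpid:\t6025\t146\t1\n"
	fmt.Println(translatedPIDs(status)) // 6025/146/1
}
```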

nscaps

nscaps calculates a process' capabilities in another namespace, based on the owning user namespace hierarchy. It then displays the user namespace hierarchies of both the process and the target namespace. This gives a visual reference for how the process and the target namespace relate to each other.

Examples like the one below will give unsuspecting security "experts" a series of fits, despite this example being perfectly secure.

⛛ user:[4026531837] process "systemd" (129419)
├─ process "nscaps" (210373)
│     ⋄─ (no capabilities)
└─ ✓ user:[4026532342] process "unshare" (176744)
   └─ target net:[4026532353] process "unshare" (176744)
         ⋄─ cap_audit_control    cap_audit_read       cap_audit_write      cap_block_suspend
         ⋄─ cap_chown            cap_dac_override     cap_dac_read_search  cap_fowner
         [...]
         ⋄─ cap_syslog           cap_wake_alarm

...it's secure, because our superpower process can't do anything outside its realm. But the horror on the faces of security experts will be priceless.

⛔ user:[4026531837] process "systemd" (211474)
├─ ⛛ user:[4026532468] process "unshare" (219837)
│  └─ process "unshare" (219837)
│        ⋄─ cap_audit_control    cap_audit_read       cap_audit_write      cap_block_suspend
│        ⋄─ cap_chown            cap_dac_override     cap_dac_read_search  cap_fowner
│        ⋄─ cap_fsetid           cap_ipc_lock         cap_ipc_owner        cap_kill
│        ⋄─ cap_lease            cap_linux_immutable  cap_mac_admin        cap_mac_override
│        ⋄─ cap_mknod            cap_net_admin        cap_net_bind_service cap_net_broadcast
│        ⋄─ cap_net_raw          cap_setfcap          cap_setgid           cap_setpcap
│        ⋄─ cap_setuid           cap_sys_admin        cap_sys_boot         cap_sys_chroot
│        ⋄─ cap_sys_module       cap_sys_nice         cap_sys_pacct        cap_sys_ptrace
│        ⋄─ cap_sys_rawio        cap_sys_resource     cap_sys_time         cap_sys_tty_config
│        ⋄─ cap_syslog           cap_wake_alarm
└─ target net:[4026531905] process "systemd" (211474)
      ⋄─ (no capabilities)

Please see also the nscaps command documentation.

dumpns

The lxkns namespace discovery information can also be easily made available to your own scripts, et cetera. Without having to integrate the Go package, simply run the dumpns CLI binary: it dumps fresh discovery results as JSON.

$ dumpns
{
  "namespaces": {
    "4026531840": {
      "nsid": 4026531840,
      "type": "mnt",
      "owner": 4026531837,
      "reference": "/proc/2849/ns/mnt",
      "leaders": [
        2849,
        2770,
        2662,
        2847
      ]
    },
    "4026531835": {
      "nsid": 4026531835,
      "type": "cgroup",
      "owner": 4026531837,
      "reference": "/proc/2849/ns/cgroup",
      "leaders": [
        2849,
...

Package Usage

For the really gory stuff, take a look at the examples/ and cmd/ directories. 😁

🔎 Discovery

The following example code runs a full namespace discovery using Discover(FullDiscovery) and then prints all namespaces found, sorted by their type, then by their ID.

package main

import (
    "fmt"
    "github.com/thediveo/gons/reexec"
    "github.com/thediveo/lxkns"
    "github.com/thediveo/lxkns/model"
)

func main() {
    reexec.CheckAction() // must be called before a full discovery
    result := lxkns.Discover(lxkns.FullDiscovery)
    for nsidx := model.MountNS; nsidx < model.NamespaceTypesCount; nsidx++ {
        for _, ns := range result.SortedNamespaces(nsidx) {
            fmt.Println(ns.String())
        }
    }
}
📑 Marshalling and Unmarshalling

lxkns supports un/marshalling discovery results from/to JSON; this handles both the namespace and the process information.

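package main

import (
    "encoding/json"

    "github.com/thediveo/gons/reexec"
    "github.com/thediveo/lxkns"
    apitypes "github.com/thediveo/lxkns/api/types"
)

func main() {
    reexec.CheckAction() // only needed for discovery, not for unmarshalling

    // Wrap the discovery result so that it can be marshalled into JSON.
    b, err := json.Marshal(apitypes.NewDiscoveryResult(lxkns.Discover(lxkns.FullDiscovery)))
    if err != nil {
        panic(err)
    }

    // Unmarshal into a fresh wrapper, then unwrap the discovery result.
    dr := apitypes.NewDiscoveryResult(nil)
    if err := json.Unmarshal(b, &dr); err != nil {
        panic(err)
    }
    result := (*lxkns.DiscoveryResult)(dr)
    _ = result // ...work with the unmarshalled discovery result here
}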

Note: discovery results need to be "wrapped" in order to be un/marshal-able.

🔧 Tinkering

make targets:

  • test: builds and runs all tests inside a container; the tests are run twice, once as root and once as a non-root user.
  • deploy and undeploy: builds and starts, or stops, the containerized lxkns discovery service.
  • coverage: runs a full coverage on all tests in the module, once as root, once as non-root, resulting in a single coverage.html.
  • clean: removes coverage files, as well as any top-level CLI tool binaries that happened to end up there instead of ${GOPATH}/bin.
  • install: builds and installs the binaries into ${GOPATH}/bin, then installs these binaries into /usr/local/bin.

lxkns is Copyright 2020 Harald Albrecht, and licensed under the Apache License, Version 2.0.

Documentation

Overview

Package lxkns discovers Linux kernel namespaces of types cgroup, ipc, mount, net, pid, time, user, and uts. This package discovers namespaces not only when processes have joined them, but also when namespaces have been bind-mounted, or are only referenced by process file descriptors or within a hierarchy (PID and user only).

In case of PID and user namespaces, lxkns additionally discovers their hierarchies, except when running on a really ancient kernel before 4.9. Furthermore, for user namespaces the owning user ID and the owned namespaces are discovered too. Time namespaces require Linux kernel 5.6 or later.

And finally, lxkns relates namespaces to the "leading" (or "root") processes joined to them; this relationship is basically derived for convenience from the process tree hierarchy. The kernel itself doesn't define any special relationship between namespaces and processes except for the "attachment" of processes joining namespaces.

API users of the lxkns package can control the namespace discovery process in several aspects to suit their needs: which namespace types to discover, and which places to search for namespaces.

Discovery

A namespace discovery is just a single call to function lxkns.Discover(). Additionally, there's a one-time support function call to reexec.CheckAction() required as early as possible in main().

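package main

import (
    "github.com/thediveo/gons/reexec"
    "github.com/thediveo/lxkns"
)

func main() {
    // Must be called as early as possible, before any other work in main().
    reexec.CheckAction()
    // ...your application's setup...
    allns := lxkns.Discover(lxkns.FullDiscovery)
    // ...work with the discovery results in allns...
    _ = allns
}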

Technical note: to discover namespaces in some locations, such as bind-mounted namespaces, lxkns needs to fork the process it is used in, in order to switch the forked copy into other mount namespaces for further discovery. To make this mechanism as painless as possible, processes using lxkns need to call reexec.CheckAction() as early as possible from their main().

Basics of the lxkns Information Model

Not totally unexpectedly, the lxkns discovery information model at its most basic level comprises ... namespaces. In the previous code snippet, the information model returned is stored in the "allns" variable for further processing. The result organizes the namespaces found by type. For instance, the following code snippet prints all namespaces, sorted first by type and then by namespace identifier:

// Iterate over all types of Linux-kernel namespaces, then over all
// namespaces of a given type...
for nsidx := range allns.Namespaces {
    for _, ns := range allns.SortedNamespaces(lxkns.NamespaceTypeIndex(nsidx)) {
        println(ns.Type().Name(), ns.ID().Ino)
    }
}

Because namespaces have no order defined, the discovery results "list" the namespaces in per-type maps, indexed by namespace identifiers. For convenience, SortedNamespaces() returns the namespaces of a specific type in a slice instead of a map, sorted numerically by the namespace identifiers (that is, sorting by inode numbers, ignoring dev IDs at this time).

Technically, these namespace identifiers are tuples consisting of 64bit unsigned inode numbers and (~64bit) device ID numbers, and come from the special "nsfs" namespace filesystem integrated with the Linux kernel. And before someone tries: nope, the nsfs cannot be mounted; it doesn't even appear in the kernel's list of filesystems (/proc/filesystems).

Unprivileged Discovery and How To Not Panic

While it is possible to discover namespaces without root privileges, this won't return the full set of namespaces in a Linux host. The reason is that while an unprivileged discovery is allowed to see some basic information about all processes in the system, it is not allowed to query the namespaces such privileged processes are joined to. In addition, an unprivileged discovery may turn up namespaces (for instance, when bind-mounted) for which the identifier is discovered, but further information, such as the parent or child namespaces for PID and user namespaces, is undiscoverable.

Users of the lxkns information model thus must be prepared to handle incomplete information yielded by unprivileged lxkns.Discover() calls. In particular, applications must be prepared to handle:

  • more than a single "initial" namespace per type of namespace,
  • PID and user namespaces without a parent namespace,
  • namespaces without owning user namespaces,
  • processes not related to any namespace.

In consequence, always check interface values and pointers for nil values like a pro. You can find many examples in the sources for the "lsuns", "lspidns", and "pidtree" CLI tools (inside the cmd sub-package).

In-Capabilities

It is possible to run full discoveries without being root, when the discovery process is given the following effective capabilities:

  • CAP_SYS_PTRACE: no joking here, that's what's needed for reading namespace references from /proc/[PID]/ns/*
  • CAP_SYS_CHROOT: for mount namespace switching
  • CAP_SYS_ADMIN: for mount namespace switching
  • CAP_DAC_READ_SEARCH: for reading details of bind-mounted namespaces

Considering that CAP_SYS_PTRACE in particular is essential, there's probably not much difference to "just be root" in the end, unless you want to show off your "capabilities capabilities".

Namespace Hierarchies

PID and user namespaces form separate and independent namespace hierarchies. This parent-child hierarchy is exposed through the lxkns.Hierarchy interface of the discovered namespaces.

Please note that lxkns often represents namespaces using the lxkns.Namespace interface when the specific type of namespace doesn't matter. In case of PID and user namespaces, an lxkns.Namespace can be "converted" into an interface value of type lxkns.Hierarchy using a type assertion, in order to access the particular namespace hierarchy.

// If it's a PID or user namespace, then we can turn a "Namespace"
// into a "Hierarchy" in order to access hierarchy information.
if hns, ok := ns.(lxkns.Hierarchy); ok {
    if hns.Parent() != nil {
        ...
    }
    for _, childns := range hns.Children() {
        ...
    }
}

Ownership

User namespaces play the central role in controlling the access of processes to other namespaces, as well as the capabilities processes gain when allowed to join user namespaces. A comprehensive discussion of the rules and their ramifications is beyond the scope of this package documentation. For starters, please refer to the man page for user_namespaces(7), http://man7.org/linux/man-pages/man7/user_namespaces.7.html.

The controlling role of user namespaces shows up in the discovery information model as owner-owned relationships: user namespaces own non-user namespaces, and these owned non-user namespaces are the "ownings". In case you are now scratching your head "why the Gopher" the owned namespaces are referred to as "ownings": welcome to the wonderful Gopher world of "er"-ers, where interface method naming conventions create wonderful identifier art.

If a namespace interface value represents a user-type namespace, then it can be "converted" into an lxkns.Ownership interface value using a type assertion. This interface discloses which namespaces are owned by a particular user namespace. Please note that this does not include child user namespaces; use Hierarchy.Children() instead.

// Get the user namespace -owned-> namespaces relationships.
if owns, ok := ns.(lxkns.Ownership); ok {
    for _, ownedns := range owns.Ownings() {
        ...
    }
}

In the opposite direction, the owner of a namespace can be directly queried via the lxkns.Namespace interface (again, only for non-user namespaces):

// Get the namespace -owned by-> user namespace relationship.
ownerns := ns.Owner()

When asking a user namespace for its owner, the parent user namespace is returned in accordance with the Linux ioctl()s for discovering the ownership of namespaces.

Namespaces and Processes

The lxkns discovery information model also relates processes to namespaces, and vice versa. After all, processes are probably the main source for discovering namespaces.

For this reason, the discovery results (in "allns" in case of the above discovery code example) not only list the namespaces found, but also a snapshot of the process tree at discovery time (please relax now, as this is a snapshot of the "tree", not of all the processes themselves).

// Get the init(1) process representation.
initprocess := allns.Processes[lxkns.PIDType(1)]
for _, childprocess := range initprocess.Children() {
    // ... work with childprocess ...
}
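Because the discovery results hold a snapshot of the process tree, walking it is a plain recursive traversal. The sketch below uses a hypothetical proc type with a Children slice in place of the lxkns process representation (whose Children() method the example above uses); it visits parents before their children and prints each process indented by its depth.

```go
package main

import (
	"fmt"
	"strings"
)

// proc is a hypothetical, simplified process node: a PID, a name, and
// the child processes as captured in a process tree snapshot.
type proc struct {
	PID      int
	Name     string
	Children []*proc
}

// walk visits p and all its descendants depth-first (pre-order),
// calling visit with each process and its depth below the start process.
func walk(p *proc, depth int, visit func(*proc, int)) {
	visit(p, depth)
	for _, child := range p.Children {
		walk(child, depth+1, visit)
	}
}

func main() {
	initproc := &proc{PID: 1, Name: "init", Children: []*proc{
		{PID: 42, Name: "sshd", Children: []*proc{{PID: 4711, Name: "bash"}}},
		{PID: 99, Name: "containerd"},
	}}
	walk(initproc, 0, func(p *proc, depth int) {
		fmt.Printf("%s%d %s\n", strings.Repeat("  ", depth), p.PID, p.Name)
	})
}
```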

Please note that the process tree information is provided for convenience; in many use cases it's not a replacement for the famous gopsutil package. However, the process tree information shows which namespaces are used by (or "joined by") which particular processes.

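// Show all namespaces joined by a specific process, such as init(1).
// Entries may be nil for namespace types that were not discovered, for
// instance time namespaces on kernels before Linux 5.6.
for nsidx := lxkns.MountNS; nsidx < lxkns.NamespaceTypesCount; nsidx++ {
    if ns := initprocess.Namespaces[nsidx]; ns != nil {
        fmt.Println(ns.String())
    }
}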

Given a specific namespace, it's also possible to find the processes joined to it. However, the lxkns information model optimizes this relationship on the assumption that many use cases do not need the list of all processes joined to a namespace, but only the so-called "leader" process or processes.

A leader process of namespace X is the topmost process in the process tree among all processes joined to namespace X. A namespace may well have more than one leader process joined to it. An example is a container with its own processes joined to the container namespaces, plus an additional "visiting" process also joined to one or several namespaces of this container. The lxkns information model correctly handles and represents such system states.

// Show the leader processes joined to the initial user namespace.
for _, leader := range initprocess.Namespaces[lxkns.UserNS].Leaders() {
    // ... work with leader ...
}
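The leader definition translates into a simple rule: a process joined to namespace X is a leader of X if its parent is not joined to X, or if it has no parent in the snapshot. The sketch below applies this rule to a hypothetical, flattened process table; the proc type and the leaders() helper are illustrative assumptions, not how lxkns computes Leaders() internally.

```go
package main

import (
	"fmt"
	"sort"
)

// proc is a hypothetical process table entry: its PID, its parent PID, and
// the ID of the namespace (of one particular type) it is joined to.
type proc struct {
	PID, PPID int
	NsID      uint64
}

// leaders returns the PIDs of all processes joined to namespace nsid whose
// parent is not joined to the same namespace: the topmost process of each
// subtree of processes joined to nsid.
func leaders(table map[int]proc, nsid uint64) []int {
	var pids []int
	for _, p := range table {
		if p.NsID != nsid {
			continue
		}
		parent, ok := table[p.PPID]
		if !ok || parent.NsID != nsid {
			pids = append(pids, p.PID)
		}
	}
	sort.Ints(pids) // map iteration order is random
	return pids
}

func main() {
	table := map[int]proc{
		1:  {PID: 1, PPID: 0, NsID: 100},
		50: {PID: 50, PPID: 1, NsID: 200},  // container's initial process
		51: {PID: 51, PPID: 50, NsID: 200}, // child inside the container
		60: {PID: 60, PPID: 1, NsID: 100},
		61: {PID: 61, PPID: 60, NsID: 200}, // "visiting" process
	}
	fmt.Println(leaders(table, 200)) // [50 61]
}
```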

Architecture

Please find more details about the lxkns information model in the architectural documents: https://github.com/TheDiveO/lxkns/blob/master/docs/architecture.md.

Constants

const SemVersion = "0.15.5"

SemVersion is the semantic version string of the lxkns module.

Variables

var FullDiscovery = DiscoverOpts{}

FullDiscovery sets the discovery options to a full and thus extensive discovery process. This is the preferred option setup for most use cases, unless you know exactly what you're doing and want to fine-tune the discovery process by selectively switching off certain discovery elements.

var NoDiscovery = DiscoverOpts{
	SkipProcs:      true,
	SkipTasks:      true,
	SkipFds:        true,
	SkipBindmounts: true,
	SkipHierarchy:  true,
	SkipOwnership:  true,
}

NoDiscovery sets the discovery options to not discover anything. Use this option set as a starting point when only a few chosen discovery methods are to be enabled.
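Since DiscoverOpts is a plain struct, assigning NoDiscovery to a variable yields an independent copy: selectively enabling discovery methods on that copy leaves the package-level NoDiscovery untouched. The sketch mirrors the documented boolean fields in a hypothetical local type so that it runs stand-alone; in a real program you would start from lxkns.NoDiscovery and pass the modified copy to lxkns.Discover().

```go
package main

import "fmt"

// discoverOpts is a local mirror of the boolean fields documented for
// lxkns.DiscoverOpts; it only exists to keep this sketch self-contained.
type discoverOpts struct {
	SkipProcs      bool
	SkipTasks      bool
	SkipFds        bool
	SkipBindmounts bool
	SkipHierarchy  bool
	SkipOwnership  bool
}

// noDiscovery mirrors lxkns.NoDiscovery: every discovery method switched off.
var noDiscovery = discoverOpts{
	SkipProcs: true, SkipTasks: true, SkipFds: true,
	SkipBindmounts: true, SkipHierarchy: true, SkipOwnership: true,
}

func main() {
	// Start from "nothing" and enable only process scanning plus the
	// PID/user namespace hierarchy.
	opts := noDiscovery // struct assignment copies all fields
	opts.SkipProcs = false
	opts.SkipHierarchy = false

	fmt.Println(opts.SkipProcs, opts.SkipHierarchy, opts.SkipFds) // false false true
	fmt.Println(noDiscovery.SkipProcs)                           // true: original unchanged
}
```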

Functions

func NSpid added in v0.9.0

func NSpid(proc *model.Process) (pids []model.PIDType)

NSpid returns the list of namespaced PIDs for the process proc, based on information from the /proc filesystem (the "NSpid:" field in particular). NSpid only returns the list of PIDs, but not the corresponding PID namespaces; this is because the Linux kernel doesn't give us the namespace information as part of the process status. Instead, a caller (such as NewPIDMap) needs to combine a namespaced PIDs list with the hierarchy of PID namespaces to calculate the correct namespacing.
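The "NSpid:" line in /proc/[PID]/status lists a process's PIDs as whitespace-separated numbers: first the PID in the PID namespace of the procfs instance being read, last the PID in the process's own PID namespace. Here is a minimal parsing sketch that operates on the status text instead of reading /proc, so that it runs anywhere; parseNSpid is a hypothetical helper, not the lxkns implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseNSpid extracts the PIDs from the "NSpid:" line of a
// /proc/[PID]/status text. The first PID belongs to the PID namespace of
// the procfs instance, the last one to the process's own (leaf) PID
// namespace. It returns nil if there is no NSpid line.
func parseNSpid(status string) ([]int, error) {
	scanner := bufio.NewScanner(strings.NewReader(status))
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		fields := strings.Fields(strings.TrimPrefix(line, "NSpid:"))
		pids := make([]int, 0, len(fields))
		for _, f := range fields {
			pid, err := strconv.Atoi(f)
			if err != nil {
				return nil, fmt.Errorf("invalid NSpid entry %q: %w", f, err)
			}
			pids = append(pids, pid)
		}
		return pids, nil
	}
	return nil, scanner.Err()
}

func main() {
	// Excerpt of a status file for a process inside a container.
	status := "Name:\tsleep\nPid:\t4711\nNSpid:\t4711\t42\n"
	pids, err := parseNSpid(status)
	fmt.Println(pids, err) // [4711 42] <nil>
}
```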

func ReexecIntoAction added in v0.9.0

func ReexecIntoAction(actionname string, namespaces []model.Namespace, result interface{}) (err error)

ReexecIntoAction forks and then re-executes this process in order to run a specific action (indicated by actionname) in a set of (different) Linux kernel namespaces. The stdout result of running the action is then deserialized as JSON into the specified result element.

func ReexecIntoActionEnv added in v0.9.0

func ReexecIntoActionEnv(actionname string, namespaces []model.Namespace, envvars []string, result interface{}) (err error)

ReexecIntoActionEnv forks and then re-executes this process in order to run a specific action (indicated by actionname) in a set of (different) Linux kernel namespaces. It also passes the additional environment variables specified in envvars. The stdout result of running the action is then deserialized as JSON into the specified result element.

func SortChildNamespaces added in v0.9.0

func SortChildNamespaces(nslist []model.Hierarchy) []model.Hierarchy

SortChildNamespaces returns a sorted copy of a list of hierarchical namespaces. The namespaces are sorted by their namespace ids in ascending order. Please note that the list itself is flat, but this function can only be used on hierarchical namespaces (PID, user).

func SortNamespaces added in v0.9.0

func SortNamespaces(nslist []model.Namespace) []model.Namespace

SortNamespaces returns a sorted copy of a list of namespaces. The namespaces are sorted by their namespace ids in ascending order.

func SortedNamespaces added in v0.9.0

func SortedNamespaces(nsmap model.NamespaceMap) []model.Namespace

SortedNamespaces returns the namespaces from a map as a sorted list.
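Returning a sorted copy, rather than sorting in place, leaves the caller's list untouched. A minimal sketch of both variants, using a hypothetical namespace type reduced to its ID; sortNamespaces and sortedNamespaces only mirror the documented behavior and are not the lxkns code. For the map variant, the sketch assumes the same ascending-ID order that SortNamespaces documents.

```go
package main

import (
	"fmt"
	"sort"
)

// namespace is a hypothetical stand-in reduced to a namespace ID
// (an inode number on the nsfs filesystem).
type namespace struct{ ID uint64 }

// sortNamespaces returns a copy of nslist sorted by ascending ID,
// leaving nslist itself unmodified.
func sortNamespaces(nslist []namespace) []namespace {
	sorted := make([]namespace, len(nslist))
	copy(sorted, nslist)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })
	return sorted
}

// sortedNamespaces collects the values of a map keyed by ID and returns
// them as a list sorted by ascending ID (assumed order, see above).
func sortedNamespaces(nsmap map[uint64]namespace) []namespace {
	list := make([]namespace, 0, len(nsmap))
	for _, ns := range nsmap {
		list = append(list, ns)
	}
	sort.Slice(list, func(i, j int) bool { return list[i].ID < list[j].ID })
	return list
}

func main() {
	orig := []namespace{{4026531993}, {4026531836}, {4026532200}}
	fmt.Println(sortNamespaces(orig)) // [{4026531836} {4026531993} {4026532200}]
	fmt.Println(orig)                 // [{4026531993} {4026531836} {4026532200}]

	m := map[uint64]namespace{7: {7}, 3: {3}, 5: {5}}
	fmt.Println(sortedNamespaces(m)) // [{3} {5} {7}]
}
```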

Types

type BindmountedNamespaceInfo added in v0.11.0

type BindmountedNamespaceInfo struct {
	ID        species.NamespaceID   `json:"id"`
	Type      species.NamespaceType `json:"type"`
	Path      string                `json:"path"`
	OwnernsID species.NamespaceID   `json:"ownernsid"`
	Log       []string              `json:"log"` // not strictly necessary, yet very helpful.
}

BindmountedNamespaceInfo describes a bind-mounted namespace in some (other) mount namespace, including the owning user namespace ID, so we can later correctly set up the ownership relations in the discovery results.

type DiscoverOpts added in v0.9.0

type DiscoverOpts struct {
	// The types of namespaces to discover: this is an OR'ed combination of
	// Linux kernel namespace constants, such as CLONE_NEWNS, CLONE_NEWNET, et
	// cetera. If zero, defaults to discovering all namespaces.
	NamespaceTypes species.NamespaceType `json:"-"`

	// Where to scan (or not scan) for signs of namespaces?
	SkipProcs      bool `json:"skipped-procs"`      // Don't scan processes.
	SkipTasks      bool `json:"skipped-tasks"`      // Don't scan threads, a.k.a. tasks.
	SkipFds        bool `json:"skipped-fds"`        // Don't scan process file descriptors for references to namespaces.
	SkipBindmounts bool `json:"skipped-bindmounts"` // Don't scan for bind-mounted namespaces.
	SkipHierarchy  bool `json:"skipped-hierarchy"`  // Don't discover the hierarchy of PID and user namespaces.
	SkipOwnership  bool `json:"skipped-ownership"`  // Don't discover the ownership of non-user namespaces.
}

DiscoverOpts gives control over the extent of, and thus the time and resources spent on, discovering Linux kernel namespaces, the relationships between them, and their relationships with processes.
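The NamespaceTypes field follows the usual bit-mask pattern: OR together the CLONE_NEW* flags of the wanted namespace types, with zero meaning "all types". The sketch defines the flag values locally (they match the Linux kernel's clone(2) flags) and uses a hypothetical wants() helper; it only illustrates the mask semantics and is not the lxkns filtering code.

```go
package main

import "fmt"

// Linux clone(2) flags identifying namespace types, as defined by the kernel.
const (
	CLONE_NEWTIME   = 0x00000080
	CLONE_NEWNS     = 0x00020000
	CLONE_NEWCGROUP = 0x02000000
	CLONE_NEWUTS    = 0x04000000
	CLONE_NEWIPC    = 0x08000000
	CLONE_NEWUSER   = 0x10000000
	CLONE_NEWPID    = 0x20000000
	CLONE_NEWNET    = 0x40000000
)

// allTypes is the OR of all namespace type flags.
const allTypes = CLONE_NEWTIME | CLONE_NEWNS | CLONE_NEWCGROUP | CLONE_NEWUTS |
	CLONE_NEWIPC | CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNET

// wants reports whether namespaces of type nstype are to be discovered,
// given the mask; a zero mask selects all namespace types.
func wants(mask, nstype uint64) bool {
	if mask == 0 {
		mask = allTypes
	}
	return mask&nstype != 0
}

func main() {
	mask := uint64(CLONE_NEWNS | CLONE_NEWNET)
	fmt.Println(wants(mask, CLONE_NEWNET), wants(mask, CLONE_NEWPID)) // true false
	fmt.Println(wants(0, CLONE_NEWPID))                               // true
}
```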

type DiscoveryResult added in v0.9.0

type DiscoveryResult struct {
	Options           DiscoverOpts        // options used during discovery.
	Namespaces        model.AllNamespaces // all discovered namespaces, subject to filtering according to Options.
	InitialNamespaces model.NamespacesSet // the 7 initial namespaces.
	UserNSRoots       []model.Namespace   // the topmost user namespace(s) in the hierarchy
	PIDNSRoots        []model.Namespace   // the topmost PID namespace(s) in the hierarchy
	Processes         model.ProcessTable  // processes checked for namespaces.
}

DiscoveryResult stores the results of a tour through Linux processes and kernel namespaces.

func Discover added in v0.9.0

func Discover(opts DiscoverOpts) *DiscoveryResult

Discover returns the Linux kernel namespaces found, based on the discovery options specified in the call. The discovery results also specify the initial namespaces, as well as the process table/tree on which the discovery is at least partly based.

func (*DiscoveryResult) SortedNamespaces added in v0.9.0

func (dr *DiscoveryResult) SortedNamespaces(nsidx model.NamespaceTypeIndex) []model.Namespace

SortedNamespaces returns a sorted list of discovered namespaces of the specified type. The namespaces are sorted by their identifier, which is an inode number (on the special "nsfs" filesystem), ignoring a namespace's device ID.

type NamespacedPID added in v0.9.0

type NamespacedPID struct {
	PIDNS model.Namespace // PID namespace ID for PID.
	PID   model.PIDType   // PID within PID namespace (of ID).
}

NamespacedPID is a PID in the context of a specific PID namespace.

type NamespacedPIDs added in v0.9.0

type NamespacedPIDs []NamespacedPID

NamespacedPIDs is a list of PIDs for the same process, but in different PID namespaces. The order of the list is undefined.

func (NamespacedPIDs) PIDs added in v0.9.0

func (ns NamespacedPIDs) PIDs() []model.PIDType

PIDs just returns the different PIDs assigned to a single process in different PID namespaces, without the namespaces. This is a convenience function for those lazy cases where just the PID list is wanted, but no PID namespace details.

type PIDMap added in v0.9.0

type PIDMap map[NamespacedPID]NamespacedPIDs

PIDMap maps a single namespaced PID to the list of PIDs for this process in different PID namespaces. Further PIDMap methods then allow simple translation of PIDs between different PID namespaces.

func NewPIDMap added in v0.9.0

func NewPIDMap(result *DiscoveryResult) PIDMap

NewPIDMap returns a new PID map based on the specified discovery results and further information gathered from the /proc filesystem.

func (PIDMap) NamespacedPIDs added in v0.9.0

func (pidmap PIDMap) NamespacedPIDs(pid model.PIDType, from model.Namespace) NamespacedPIDs

NamespacedPIDs returns, for a specific namespaced PID, the list of all PIDs the corresponding process has been given in different PID namespaces. Returns nil if the PID doesn't exist in the specified PID namespace. The list is ordered from the topmost PID namespace down to the leaf PID namespace the process is actually joined to.

func (PIDMap) Translate added in v0.9.0

func (pidmap PIDMap) Translate(pid model.PIDType, from model.Namespace, to model.Namespace) model.PIDType

Translate translates a PID "pid" in PID namespace "from" to the corresponding PID in PID namespace "to". Returns 0 if PID "pid" does not exist in PID namespace "from", or if PID namespace "to" is neither a parent nor a child of PID namespace "from".
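The idea behind a PID map and its translation can be modeled compactly: for every process, pair each entry of its NSpid list with the PID namespace at the same level, ordered from the topmost PID namespace down to the process's own, and then index every (namespace, PID) pair to that whole list. Translating then means looking up the list via the "from" pair and picking the entry for the "to" namespace, yielding 0 if either is absent. The types below, with PID namespaces reduced to hypothetical names, and the assumption that each NSpid list starts at the topmost PID namespace (as when read from the initial PID namespace's procfs), are simplifications for this sketch, not the lxkns implementation.

```go
package main

import "fmt"

// namespacedPID pairs a PID with the (hypothetically named) PID namespace
// it is valid in.
type namespacedPID struct {
	PIDNS string
	PID   int
}

// pidMap maps any namespaced PID of a process to all its namespaced PIDs,
// ordered from the topmost PID namespace down to the leaf.
type pidMap map[namespacedPID][]namespacedPID

// add registers one process, given its NSpid list and the chain of PID
// namespaces from the topmost down to the process's own.
func (m pidMap) add(nspids []int, chain []string) {
	if len(nspids) != len(chain) {
		return // inconsistent input; ignored in this sketch
	}
	list := make([]namespacedPID, len(nspids))
	for i, pid := range nspids {
		list[i] = namespacedPID{PIDNS: chain[i], PID: pid}
	}
	for _, npid := range list {
		m[npid] = list
	}
}

// translate returns the PID of process pid (in namespace from) as seen in
// namespace to, or 0 if pid is unknown in from or to is not on the
// process's PID namespace chain.
func (m pidMap) translate(pid int, from, to string) int {
	for _, npid := range m[namespacedPID{PIDNS: from, PID: pid}] {
		if npid.PIDNS == to {
			return npid.PID
		}
	}
	return 0
}

func main() {
	m := pidMap{}
	m.add([]int{1}, []string{"host"})
	m.add([]int{4711, 1}, []string{"host", "container"})
	m.add([]int{4712, 7}, []string{"host", "container"})

	fmt.Println(m.translate(4712, "host", "container")) // 7
	fmt.Println(m.translate(1, "container", "host"))    // 4711
	fmt.Println(m.translate(7, "host", "container"))    // 0: no PID 7 in "host"
}
```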

Directories

Path Synopsis
api
types
Package types defines the common types for (un)marshalling elements of the lxkns information model from/to JSON.
cmd
dumpns
dumpns runs a namespace (and process) discovery and then dumps the results as JSON.
internal/pkg/cli
Package cli handles registering CLI flags via a plug-in mechanism.
internal/pkg/filter
Package filter provides CLI-controlled filtering of namespaces by type.
internal/pkg/style
Package style styles text output of the CLI commands with foreground and background colors, as well as different text styles (bold, italics, ...).
internal/test/getstdout
Package getstdout captures os.Stdout and os.Stderr while executing a specified function, returning the captured output afterwards.
lspidns
lspidns lists the tree of PID namespaces, optionally with their owning user namespaces.
lsuns
lsuns lists the tree of user namespaces, optionally with the other namespaces they own.
nscaps
nscaps determines a process' capabilities in some namespace.
pidtree
pidtree displays a tree (or only a single branch) of processes together with their PID namespaces, and additionally also shows the local PIDs of processes (where applicable).
examples
internal
log
Package log allows consumers of the lxkns module to forward logging originating in the lxkns module to whatever logger module they prefer.
logrus
Package logrus enables logging within the lxkns module and directs all logging output to the sirupsen/logrus logging module.
Package model defines the core of lxkns information model: Linux kernel namespaces and processes, and how they relate to each other.
Package nstest provides testing support in the context of Linux kernel namespaces.
gmodel
Package gmodel provides Gomega matches for lxkns model elements.
ops
Package ops provides a Golang-idiomatic API to the query and switching operations on Linux-kernel namespaces, hiding ioctl()s and syscalls.
internal/opener
Package opener provides access to the file descriptors of namespace references.
portable
Package portable provides so-called "portable" namespace references with validation and "locking" (keeping the referenced namespace open and thus alive).
relations
Package relations gives access to properties of and relationships between Linux-kernel namespaces, such as type and ID of a namespace, its owning user namespace, parent namespace in case of hierarchical namespaces, et cetera.
Package species defines the type constants and type names of the 8 Linux kernel namespace types ("species").
