
Pantavisor Architecture

What is it?

Pantavisor is a container-based init system that turns a static-firmware Linux device into a multi-function one. With Pantavisor, it becomes straightforward to control the lifecycle of your device firmware, covering not only management of your base OS but also extended features through a rich applications and services system based on micro containers.

Pantavisor is remotely controlled through an instance of Pantahub, which is the operational controller running in the cloud. More details of Pantahub can be found in its own documentation.

Software Specs

Pantavisor is written in pure C, with emphasis on targeting the deeply embedded Linux ecosystem. It is meant to be a single-binary init system that boots directly from the kernel and becomes the first process to run, which then brings up the rest of the system as a set of well-defined micro containers.

We aim to bring a full set of container functionality into the single binary, while keeping its size as small as possible so it can cover the low end of the market. Depending on the functions built in, the size varies, but a fully functional system puts the Pantavisor binary at around 350 kilobytes (as a compressed initial ramdisk).

Pantavisor uses pure Linux container technology, wrapping parts of the LXC suite as a library around the basic building blocks of containers. Because LXC is also a pure C project, we are able to keep the overall footprint of Pantavisor quite small.

Licensing

Pantavisor is MIT licensed, which means tons of flexibility for commercial projects, without the headaches of a pure GPL codebase. The container runtime plugin system makes a clear separation between the provider library and Pantavisor itself.

Architecture

Pantavisor enabled systems define their running software as a set of containers. This basically means that any complex Linux system can be broken down into a set of containers, including the base OS itself. Because we are able to do this, it becomes straightforward to manage the lifecycle of each component separately.

We define each application or service as a container, together with all of the associated objects needed to start it. In our case, this is:

  • Root filesystem of the container
  • Configuration file for liblxc
  • Pantavisor helper metadata
  • Extra volumes and storage configuration

Then, on a multi-service system, we have several of these definitions, which together make up the full running system. We call this the Pantavisor State Format.

Trails and Steps

The core idea behind designing a system as a set of containers held in a single definition document is that the document can be revisioned. Two concepts make it possible for devices to be fully defined in a revisioned system:

  • Step: This is a single device revision, one version of the definition document which outlines all the objects, containers and system assets that are part of a specific firmware revision.

  • Trail: This is a revisioned list of Steps, ordered historically. A single device will always have an associated trail with it, which it “follows”. Think of it as a live representation of what the software on the device has been and will be at all times.

Each step contains the State, which is the Json document that outlines the objects and containers that will be run. This single document is the blueprint and soul of the device: whenever it boots up, all systems are brought up from it, and because it is a single Json document, verifying the consistency of the system is straightforward.

Pantavisor State Format

When a device boots and Pantavisor comes up, it sets up prerequisites such as backing storage, and then attempts to run the set of containers that define the system. To do this, it follows a definition format for the software/firmware of the system. This is the current Pantavisor State Format:

{
  "#spec": "pantavisor-service-system@1",
  "README.md": "519e5dc940967bf75201d5234c29cbec4ec69e730ba7b692aac2922787b14576",
  "alpine-base-device/lxc.container.conf": "c1867cddc38f1397638401a1022fc3d81883b9d2101dbac166d6d38bb2ba194f",
  "alpine-base-device/root.squashfs": "93c9f4a6adf4c7e1ef0924d4b78dffc29accb0691cde831013bb00bda09fd39b",
  "alpine-base-device/root.squashfs.docker-digest": "efe74c53f27ef3490e25df5cc1e490a0431aa86abebef39a355030791e8f73df",
  "alpine-base-device/run.json": {
    "#spec": "service-manifest-run@1",
    "config": "lxc.container.conf",
    "name": "alpine-base-device",
    "root-volume": "root.squashfs",
    "storage": {
      "lxc-overlay": {
        "persistence": "permanent"
      }
    },
    "type": "lxc",
    "volumes": []
  },
  "alpine-base-device/src.json": {
    "#spec": "service-manifest-src@1",
    "config": {
      "Entrypoint": "/sbin/init"
    },
    "docker_digest": "sha256:66aac2d1e56e5fb45617d5b5c65d2fbb4e4d1c0e2d676db949a84ef1c3b7b5cc",
    "docker_name": "registry.gitlab.com/pantacor/pv-platforms/alpine-base",
    "docker_tag": "ARM32V6",
    "name": "alpine-base-device",
    "persistence": {
      "lxc-overlay": "permanent"
    },
    "template": "builtin-lxc-docker"
  },
  "bsp/firmware.squashfs": "dbb6f487f18a6a87b44cd20a4a3c2a3a9cefa007764c468dbd4aec34faaef94e",
  "bsp/kernel.img": "9def55991ee1f5cb2f0f1da12d188db1137fcfaa1eaf4be28fc6a453a661ad2d",
  "bsp/modules.squashfs": "92dcdb6bc3876643bed6aa2b21caa5b276bc1cd9a6163a33c5f90b0f3b6db6b1",
  "bsp/pantavisor.cpio.xz4": "f7c0a36ba791e1a6c435ce58713f74658d5018512cd96037c40a6fb394716755",
  "bsp/run.json": {
    "#spec": "bsp-manifest-run@1",
    "addons": [],
    "firmware": "firmware.squashfs",
    "initrd": "pantavisor.cpio.xz4",
    "linux": "kernel.img",
    "modules": "modules.squashfs"
  },
  "bsp/src.json": {
    "#spec": "bsp-manifest-src@1",
    "firmware": "firmware.squashfs",
    "modules": "modules.squashfs",
    "name": "RPi3 Pantavisor(TM) BSP",
    "platform": "arm-rpi3",
    "pvr": "https://pvr.pantahub.com/pantabsps/arm-rpi3"
  },
  "network-mapping.json": {},
  "ph-vpn/lxc.container.conf": "355b0f70c0a303da8feed67e77bf50aa90418bbf35ec649fd2dbcfc6f63d7a10",
  "ph-vpn/root.squashfs": "38f750de0d39cf199b164f0720342bc2d295bf9d42337b74edb406a9dddee129",
  "ph-vpn/root.squashfs.docker-digest": "5a282c74d8a82f8864ea515e425d1a3f930e918fbab7cf29bde014debad6f526",
  "ph-vpn/run.json": {
    "#spec": "service-manifest-run@1",
    "config": "lxc.container.conf",
    "name": "ph-vpn",
    "root-volume": "root.squashfs",
    "storage": {
      "docker-docker--volume-test": {
        "persistence": "revision"
      },
      "lxc-overlay": {
        "persistence": "permanent"
      }
    },
    "type": "lxc",
    "volumes": []
  },
  "ph-vpn/src.json": {
    "#spec": "service-manifest-src@1",
    "config": {
      "Entrypoint": "/sbin/init"
    },
    "docker_digest": "sha256:f9bf499acc67878ce491bd473723c4b3cd43d5c3172f5e862e6e556a116bd537",
    "docker_name": "registry.gitlab.com/pantacor/pv-platforms/ph-vpn",
    "docker_tag": "ARM32V6-eff894854ed734f772ffbf510b782a33c600f61f",
    "name": "ph-vpn",
    "persistence": {
      "lxc-overlay": "permanent"
    },
    "template": "builtin-lxc-docker"
  },
  "pv-avahi/lxc.container.conf": "103d53c34dd5a960393ed4d7e78b46dff79c22608692ce1dce26cf90395d421e",
  "pv-avahi/root.squashfs": "34a8c2319e9cc18a76c094e4a4a2de28af5a66a2bf12929d9e8502ddf37e3532",
  "pv-avahi/root.squashfs.docker-digest": "5c97f17732a971465cda5210c6bc918fe1c234c5cc460615a1d1af24b2188530",
  "pv-avahi/run.json": {
    "#spec": "service-manifest-run@1",
    "config": "lxc.container.conf",
    "name": "pv-avahi",
    "root-volume": "root.squashfs",
    "storage": {
      "lxc-overlay": {
        "persistence": "permanent"
      }
    },
    "type": "lxc",
    "volumes": []
  },
  "pv-avahi/src.json": {
    "#spec": "service-manifest-src@1",
    "config": {
      "Entrypoint": "/sbin/init"
    },
    "docker_digest": "sha256:e0642b8bfb88ece396fc679baecf9a4d510bea4492e834dc3bd8e38c713130f7",
    "docker_name": "registry.gitlab.com/pantacor/pv-platforms/pv-avahi",
    "docker_tag": "ARM32V6",
    "name": "pv-avahi",
    "persistence": {
      "lxc-overlay": "permanent"
    },
    "template": "builtin-lxc-docker"
  },
  "release-log.txt": "f0491a2096790b25131015b8a0dcc70ee957e3f62b6d58cf2d31e363a9d75d43",
  "storage-mapping.json": {}
}

As you can see, the Json-based format is a flat representation of a set of either binary objects (files on disk) or inline Json documents. Each “platform” or “service” is self-contained in its own namespaced keys, with a set of documents that make it possible for Pantavisor and Pantahub to do their jobs:

  • run.json: the base Json document that Pantavisor interprets when it needs to run a service. All service rules, such as entry point, networking, and storage, are held here and translated for the container runtime in question when the container executes.

  • src.json: the base Json document that Pantahub and PVR use to assemble containers on demand from repository sources such as a Docker registry or Docker Hub itself.

  • root.squashfs: always named like this; the root filesystem of the container in question. It is mounted with the relevant overlays and storage locations in order to run the container.

Thanks to the namespaced, tiered format, we can have multiple containers defined in this document and running in parallel on the system without limitations.

Everything else that is not a Json document is a binary object. Binary objects are uniquely referenced by their sha256 digest, which lets Pantavisor fetch the relevant objects from the Pantahub object storage. In the simplest case, for platforms, the only binary object is the root.squashfs image file.
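An object on disk can be checked against its key in the state document with any standard sha256 tool; a minimal sketch (the file name and path are illustrative):

```shell
# Objects in the state document are referenced by the sha256 of their
# contents; recomputing the digest locally verifies an on-disk object.
printf 'hello' > /tmp/object.bin
sha256sum /tmp/object.bin | cut -d' ' -f1
# -> 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
```

The same digest that appears as the value in the state document (for example the one next to root.squashfs) is what Pantavisor uses to address the object in Pantahub's storage.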

In addition to services, Pantavisor also defines the BSP (board support package) of the device. The bsp section is similar to the others, the main difference being that it holds a handful of different assets.

At the bare minimum, the bsp provides a kernel image as well as the Pantavisor initrd itself. In addition to this, it can also provide a modules squashfs image and a firmware (loadable binaries) squashfs image. All of these are managed by Pantavisor and mounted in the right way to present the relevant assets to the running containers.

Kernel and BSP

Pantavisor is in charge of the lifecycle of the kernel, modules and firmware -- in addition to application containers.

Due to the nature of Pantavisor-enabled systems, each version of the running software makes the kernel and the initrd available in a specific (versioned) location for the bootloader to find. In the case of u-boot, this means loading the right assets from the filesystem and booting them, while giving the Pantavisor init hints about the version that is currently being run (through the kernel command line).

Pantavisor makes the kernel and the initrd available in the following locations:

/$storage/trails/$revision/.pv/.pv-kernel
/$storage/trails/$revision/.pv/.pv-initrd
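With u-boot, for example, a boot script could load these assets directly. The following is a hypothetical sketch; the storage device, load addresses, environment variable names and the command-line hint are all assumptions, not Pantavisor's actual integration:

```
# pv_rev is assumed to hold the revision selected by the try/fail logic
load mmc 0:1 ${kernel_addr_r}  /trails/${pv_rev}/.pv/.pv-kernel
load mmc 0:1 ${ramdisk_addr_r} /trails/${pv_rev}/.pv/.pv-initrd
# pass the revision hint to the Pantavisor init via the kernel command line
setenv bootargs "${bootargs} pv_rev=${pv_rev}"
bootz ${kernel_addr_r} ${ramdisk_addr_r}:${filesize} ${fdt_addr}
```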

Due to this, bootloaders can easily find the assets if the requested revision number is known. If for some reason that fails, the base (0) revision can always be booted to get the process back in sync.

In addition to the kernel and initrd (Pantavisor), the system can also provide a set of modules and firmware files inside squashfs filesystem images. These are mounted and mapped to the relevant versioned locations under /lib/modules and /lib/firmware, and made available for the platforms to use.

Platforms and Services

In a Pantavisor system, everything is either a BSP or a Platform. We also call platforms Services because they provide a whole set of functional features inside of a single container.

At the bare minimum, a system has at least one platform, which is what would be considered the rootfs in a legacy system. The main platform is usually in charge of bringing up hardware, dealing with network, and making sure that Pantavisor itself can reach Pantahub. In a situation where the device can reliably gain networking and reach Pantahub, it can always be remotely managed.

Platforms are container definitions. We have our own high-level container definition format, which is part of the Pantavisor State Format defined above. Here we configure some aspects of the container to be run, including not only where to find the rootfs but also how to set up storage volumes for it.

There are no limitations as to the type of platforms that can be run on the system, nor a limitation on the number. The constraints are defined by the hardware itself and its own limits as to how much can be run concurrently on it. For resource control, we leverage the container runtime’s own features such as cgroups and security management from LXC.

All files needed to run a given platform for a given revision can be found under the following directory structure:

/$storage/trails/$revision/$platform_name/

At a minimum, root.squashfs is needed, as well as an LXC configuration file and the Json metadata for Pantavisor (run.json).
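For example, the alpine-base-device platform from the state document above, at revision 0, would be laid out roughly like this (the exact storage mount point is an assumption):

```
/storage/trails/0/alpine-base-device/
├── lxc.container.conf    # liblxc configuration
├── root.squashfs         # container root filesystem
├── run.json              # Pantavisor runtime metadata
└── src.json              # source metadata used by Pantahub/PVR
```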

Volumes

In addition to Platforms, another important feature of Pantavisor is Volumes. Volumes describe any storage entity, be it a read-only backing store or a logical writable storage location that Platforms can use for their own runtime needs.

The most basic Volume which is always present for a platform is the root filesystem (root.squashfs) from above. In addition to this, a platform may define several more auxiliary image volumes through the image-volumes metadata key in the run.json control file.

The platform can also define writable storage locations with three different persistence options:

  • Permanent: Not tied to any revision; a single-location writable backing store for general data storage.

  • Revision: As the name implies, it is a writable storage location which is pegged to the revision of the platform in question. In an update, a new revision storage is presented.

  • Boot: A pure throwaway tmpfs-backed storage location for scratch data generated at runtime.

The platform may define several of these and use them however it wants. The only required volume is lxc-overlay, which is used in our LXC configuration template to provide the backing storage for the container. By default this is of Permanent persistence, but it can be changed as needed. There is no hard requirement for a container to have any persistent storage; it could very well run with just a tmpfs-backed volume that resets on reboot.
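A run.json storage section mixing the three persistence options might look like the fragment below; lxc-overlay and the persistence values come from the state document above, while the other volume names are purely illustrative:

```json
"storage": {
  "lxc-overlay": { "persistence": "permanent" },
  "app-cache":   { "persistence": "revision" },
  "scratch":     { "persistence": "boot" }
}
```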

Storage

Pantavisor aims to do a lightweight abstraction of the backing storage structure. There is usually one filesystem where everything runs, but the topology of this disk is not completely exposed to the running containers. Instead, the choices of volume storage as per the previous section define how the different options of storage are provided.

Pantavisor defines which block device and partition should be used for the single backing storage. There is support for all the normal filesystems and usual block devices, and we also support UBIFS and JFFS2 backing storage.

Bootloader Integration

Out of the box, Pantavisor supports integration with the most common bootloaders for both the embedded and the more feature-complete system use cases: u-boot and grub. Technically, integration can be extended to any other custom-made bootloader, as there are just a handful of things that need to be provided, mainly:

  • Boot scripting support to trigger try/fail decisions
  • Filesystem read support (for any of ext, ubifs, jffs2, fat)
  • Environment storage to save try/fail hint flags
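The try/fail decision can be sketched as a small piece of bootloader scripting; the following is a hypothetical u-boot fragment, with variable names that are assumptions rather than Pantavisor's actual integration:

```
# pv_try holds a one-shot candidate revision; pv_boot the last known-good one
if env exists pv_try; then
    setenv pv_rev ${pv_try}
    env delete pv_try
    saveenv                      # if this boot fails, the next boot falls back
else
    setenv pv_rev ${pv_boot}
fi
```

On a successful boot, the system would promote the tried revision to pv_boot; an interrupted or failed boot leaves pv_boot untouched, so the next cycle returns to the known-good revision.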

Monitoring

Through its connection to Pantahub, the device can expose lots of information about itself. By default, all Pantavisor logs are pushed up to Pantahub and available for remote introspection. Log sources are not only Pantavisor itself but also the running containers, which can be configured to export output from both the main logging system as well as separate processes within.

In addition to this, other operational data is available through our data sources. Aggregation of data yields information about the quality of connectivity, bandwidth, network traffic, source IPs, etc.

Metadata

Pantavisor also exposes an interface to deal with metadata locally on the device and with Pantahub. Through Pantahub, the user can push metadata in the form of key-value pairs down to the device, which can then be seen by the platforms through a simple file interface.

The other way is also possible, with the device being able to push device metadata up to Pantahub whenever it wants. This can be used to automatically publish information such as network setup, IP addresses, geolocation -- but also to implement handshake protocols, as the metadata interface is a two-way asynchronous channel, a device can present metadata on Pantahub and have a secondary service consume this and post a response on the user metadata.
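Conceptually, both directions carry flat key-value pairs; a hypothetical exchange might look like the following (the key names are purely illustrative, not part of any documented schema):

```
user-meta   (Pantahub → device):  "provision.wifi-ssid"  = "factory-net"
device-meta (device → Pantahub):  "interfaces.wlan0.ip"  = "192.168.1.20"
```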

Updates and Pantahub control

Pantavisor processes revision updates (new steps) through a reliable, transactional mechanism. Because Pantavisor states are installed on disk in a revisioned manner, it is relatively simple to install all the needed assets and fully verify them before attempting to start such a step. If for some reason any of the verifications fail, it is easy to discard the whole update and start over.

By default, Pantavisor polls the Pantahub trail that it follows, checking for NEW steps that have not been processed before. Most of the time this quick poll results in a nothing-new outcome, which sends Pantavisor back to idling until its next poll cycle.

In the case of an update being available, the first thing Pantavisor does is lock the update on the cloud by indicating that it has seen it and scheduled it for processing. After this point, the update on Pantahub can no longer be cancelled or deleted.

Due to the flat object architecture, all Pantavisor needs to do is get the State document, unpack the Json files on disk, and then download the relevant objects from the objects/ API as per their referenced SHAs. With all of this on disk, the system can be instructed to tear its state down and try to bring up the new state.

Our transactional update process means that any new state run attempt must undergo a set of verifications before it can be considered clean; at a minimum, this means making sure that all containers started correctly and did not exit within a given timeframe, and that after an update the device can still reach a Pantahub instance.
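The whole flow described in this section can be summarized as pseudocode (a sketch of the sequence only, not the actual implementation):

```
loop every poll interval:
    step = pantahub.get_new_step(trail)
    if step is none: continue                # nothing new, keep idling
    pantahub.mark_in_progress(step)          # lock: no longer cancellable
    state = download(step.state_document)
    for sha in binary_objects(state):
        fetch_from_objects_api(sha)
        verify_sha256(sha)                   # discard the update on mismatch
    install_under(/storage/trails/step.revision)
    tear_down_current_state(); start(state)
    if containers_ran_ok() and pantahub_reachable():
        pantahub.mark_done(step)             # transaction committed
    else:
        roll_back_to_previous_revision()
        pantahub.mark_error(step)
```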

Copyright (C) Pantacor Ltd 2019

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

A copy of this license can be found in the LICENSE file, or at: https://www.gnu.org/licenses/fdl-1.3.en.html