In this blog post, we're going to build on our last post, Container Security: Understanding the Hierarchy of Runtimes, by taking a closer look at Kubernetes runtimes and how container shims work!
But first, we need to take a moment to make sure we understand the functional differences and symbiosis between Kubernetes and Docker. For many security professionals (not just those from a non-technical background), this distinction gets brushed aside, but a clear understanding is essential.
At a high level, Kubernetes and Docker serve different purposes and operate in different places within the overall hierarchy of container operations. Kubernetes can run without Docker, and Docker can run without Kubernetes; however, they often work together, each leveraged for its intended purpose. Docker is a technology for automating the deployment of containers. Kubernetes is orchestration software that provides an API to manage how and where those containers run.
In a broad sense, Docker runs on nodes, and Kubernetes runs clusters of nodes. To run containers in pods, Kubernetes uses runtimes. Considering what we know about runtimes and how they are defined, Docker can be considered a runtime for Kubernetes, and is a high-level runtime as defined in our last post.
CRI and the Kubelet
Before we get into the runtimes themselves, we need to define two items and gain at least a contextual understanding of each in order to move forward: CRI and the kubelet.
To avoid hard-wiring any one runtime choice into Kubernetes, the community defined an interface, along with the functions a runtime would be expected to perform within Kubernetes. This is the Container Runtime Interface (CRI).
CRI standardizes what is expected of a compatible runtime. Three fundamental expectations are that the runtime:
- Can start and stop pods
- Can support container operation calls: Start, Stop, Kill, and Delete
- Can manage images, including pulling them from a registry
CRI connects the kubelet to container runtimes. The kubelet is the primary agent that runs on each worker node and ensures that the containers in a pod are running. When it starts, the kubelet uses CRI to work with whatever runtime is present on that node. Fundamentally, the kubelet needs the runtime to:
- Provide image management
- Prepare the environment to instantiate the container
- Prepare the network for the pod
CRI works like another API layer that lets us swap runtimes in and out rather than fixing one to the kubelet. Think of CRI as the tendon between Kubernetes and a runtime that lets a pod work in a Kubernetes cluster.
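To make that swappability concrete, here is a minimal Python sketch of the idea. The real CRI is a gRPC API (the kubelet's RuntimeService and ImageService), so the class, method, and image names below are illustrative stand-ins, not the actual interface:

```python
from abc import ABC, abstractmethod

# Illustrative stand-in for the CRI contract; the real interface is gRPC,
# and these method names only loosely mirror some of its RPCs.
class ContainerRuntime(ABC):
    @abstractmethod
    def pull_image(self, image_ref):        # image management
        ...

    @abstractmethod
    def run_pod_sandbox(self, pod_config):  # prepare the pod environment
        ...

    @abstractmethod
    def start_container(self, container_id):
        ...

    @abstractmethod
    def stop_container(self, container_id):
        ...

# A kubelet-like component can drive ANY runtime satisfying the contract,
# which is what makes runtimes swappable on a node.
def launch_pod(runtime, pod_config, image_ref):
    runtime.pull_image(image_ref)
    return runtime.run_pod_sandbox(pod_config)

# Dummy runtime used to show that launch_pod never cares which
# implementation (CRI-O, containerd, ...) sits behind the interface.
class FakeRuntime(ContainerRuntime):
    def __init__(self):
        self.calls = []

    def pull_image(self, image_ref):
        self.calls.append("pull_image")

    def run_pod_sandbox(self, pod_config):
        self.calls.append("run_pod_sandbox")
        return "pod-1"

    def start_container(self, container_id):
        self.calls.append("start_container")

    def stop_container(self, container_id):
        self.calls.append("stop_container")

runtime = FakeRuntime()
pod_id = launch_pod(runtime, pod_config={}, image_ref="example.com/app:latest")
```

Swapping CRI-O for containerd on a node amounts to pointing the kubelet at a different implementation of this same contract.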
While lesser-known and experimental runtimes are being used for specific purposes as the applications of containerized deployments grow, in this post we will look at the three most commonly observed high-level runtimes working with Kubernetes: CRI-O, Docker, and containerd.
CRI-O

CRI-O is a Kubernetes-specific, high-level runtime. It can run as a lightweight alternative to (or in conjunction with) Docker as a runtime for Kubernetes. CRI-O can work with any OCI-compliant low-level runtime to run pods, manage images, and pull them from registries. We most commonly see CRI-O paired with runC as its low-level runtime.
CRI-O’s individual components can be found and researched further on their respective GitHub repositories:
- OCI-compatible runtime
- Networking (CNI)
- Container monitoring (conmon)
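As a rough sketch of how those components cooperate when CRI-O starts a pod, the flow looks something like the following. The function names are invented for illustration (this is not CRI-O's actual code), and the exact ordering of real CRI-O internals differs in detail:

```python
# Records the order in which the sketched components are exercised.
calls = []

def pull_image(image_ref):
    # image management: fetch the image from a registry
    calls.append("pull")

def cni_setup(pod_id):
    # networking: delegate pod network setup to a CNI plugin
    calls.append("cni")

def oci_runtime_start(pod_id):
    # any OCI-compatible low-level runtime, most commonly runC
    calls.append("runtime")

def conmon_launch(pod_id):
    # conmon monitors the container and invokes the OCI runtime
    calls.append("conmon")
    oci_runtime_start(pod_id)

def crio_run_pod(pod_id, image_ref):
    pull_image(image_ref)
    cni_setup(pod_id)
    conmon_launch(pod_id)

crio_run_pod("pod-1", "quay.io/example/app:latest")
```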
Docker

Docker was originally developed as a monolithic daemon (dockerd) that was broken up as it evolved. The runtime features present in early versions were split out into two components: containerd (a high-level runtime daemon) and runC (a low-level runtime). Dockerd retains the features for building images, while containerd manages and runs containers.
While Docker can be, and commonly is, used without Kubernetes, it remains a primary Kubernetes runtime. With Kubernetes as the orchestrator, current versions of Docker package, build, and run containers. It was among the first widely adopted open source container platforms, and it remains a time-tested staple for developers and engineers with an ever-growing multitude of use cases.
containerd

As mentioned in our last post, containerd is a high-level runtime and daemon that can be thought of as an API faceplate of sorts for other runtimes in the stack. containerd provides mechanisms for building container platforms and exposes an API that remote applications use for monitoring and delegation. containerd is installed automatically with Docker, but it can also be installed and used independently.
containerd is operationally more focused than Docker. It can't build images, but it is designed to be easily embeddable. In the operational hierarchy, containerd calls the containerd-shim, which in turn calls runC to run the container image.
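That delegation chain can be sketched with ordinary processes. In this toy Python model (the function names are assumptions for illustration, not containerd's real code), the "runC" step creates the container process and then returns immediately, mirroring how the real runC exits once the container has started:

```python
import subprocess
import sys

def runc_create_and_start(cmd):
    # stand-in for runC: create the container process, then return;
    # real runC does not stay resident after the container starts
    return subprocess.Popen(cmd)

def containerd_shim(cmd):
    # stand-in for containerd-shim: ask the runtime to start the
    # container, then hold on to it as its long-lived parent
    return runc_create_and_start(cmd)

def containerd(cmd):
    # stand-in for containerd: delegate each container to its own shim
    return containerd_shim(cmd)

# "deploy" a trivial container and wait for it to finish
container = containerd([sys.executable, "-c", "print('container ran')"])
container.wait()
```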
The Container Shim

While not a runtime itself, there is one more link in the chain that needs to be discussed to fully understand how a static image becomes a deployed container: the container "shim." The shim sits between the container manager and the runtime to facilitate communication and head off integration problems that may arise. It is what allows for daemon-less containers: the shim becomes the parent of the container's processes, which eliminates long-running runtime processes for containers. The shim's process and the container's processes are bound tightly together, but both are fully separated from the container manager's process. The easiest way to spot the shim is to inspect the process tree on a Linux host with a running Docker container; it will appear as containerd-shim, the parent process of the container itself.
The shim allows a number of actions to take place, including the following:
- It allows the runtime (runC, for example) to exit after the container is started. Without this, we would still be left with long-running runtime processes.
- It keeps STDIO open for the container if Docker or containerd fails. Without it, the container would exit.
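We can see the "shim as parent" idea with a small Python experiment. This is a loose analogy rather than real shim code: a manager process starts a shim process, the shim starts the "container," and the container's parent PID turns out to be the shim's, not the manager's:

```python
import os
import subprocess
import sys
import textwrap

# The "shim": starts the container and stays alive as its parent.
# (Analogy only; a real shim would drive runC, not subprocess.Popen.)
shim_script = textwrap.dedent("""
    import os, subprocess, sys
    container = subprocess.Popen(
        [sys.executable, "-c", "import os; print(os.getppid())"]
    )
    container.wait()
""")

# The "manager" (think containerd or dockerd): it starts the shim, but it
# is NOT the container's parent, so it could die or restart without taking
# the container down with it.
shim = subprocess.Popen(
    [sys.executable, "-c", shim_script],
    stdout=subprocess.PIPE,
    text=True,
)
container_ppid, _ = shim.communicate()

# the container reported its parent PID: it is the shim, not the manager
assert int(container_ppid.strip()) == shim.pid
assert shim.pid != os.getpid()
```

This mirrors what the process tree shows on a real host: the container's process hangs off the shim, while dockerd and containerd sit off to the side.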
In Closing . . .
As stated before, understanding the process a container goes through to get from image to live deployment, and the runtimes that make that happen, is critical to designing, deploying, and maintaining a secure containerized environment. Making sense of noteworthy container exploits (like CVE-2019-5736, a runC vulnerability that allowed a full breakout back up to the host) requires a fundamental understanding of these components' intended functionality and where they fit within the overall hierarchy of processes that bring an image to life.
Now that we’ve taken a dive into the world of runtimes, let’s go a little deeper in our next post to better understand how malicious actors have exploited containers recently. This will help us understand where we are today, and how we can position ourselves in the strongest stance possible against future exploitation.