Source Code Analysis — Understanding CNI’s usage from kubelet, container runtime
This is the third note on learning Kubernetes networking.
- Deep Dive into Kubernetes Network Model and Communication
- Understanding the Container Network Interface (CNI)
- Source code analysis: how kubelet and container runtime work with CNI
- Learning Kubernetes VXLAN network from Flannel
- Kubernetes network learning with Cilium and eBPF
In the previous article, by reading the CNI specification, we learned the operations and flow of network configuration. Besides
CNI_COMMAND, three other parameters are almost always provided with each operation:
CNI_CONTAINERID, CNI_NETNS, and CNI_IFNAME. All of these come from the container runtime. This article analyzes how CNI is used by walking through the source code of Kubernetes and Containerd.
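To make the parameters concrete, here is a minimal Go sketch (not containerd's actual code) of the environment a runtime assembles for one CNI operation; the container ID, netns path, and plugin directory are illustrative values.

```go
package main

import "fmt"

// buildCNIEnv assembles the environment variables a runtime passes to a
// CNI plugin for one operation. All values here are illustrative.
func buildCNIEnv(command, containerID, netns, ifname string) []string {
	return []string{
		"CNI_COMMAND=" + command,         // ADD, DEL, CHECK, VERSION
		"CNI_CONTAINERID=" + containerID, // ID of the sandbox container
		"CNI_NETNS=" + netns,             // path to the network namespace
		"CNI_IFNAME=" + ifname,           // interface to create in the netns
		"CNI_PATH=/opt/cni/bin",          // where plugin binaries are searched
	}
}

func main() {
	env := buildCNIEnv("ADD", "607c5530b6d8",
		"/run/netns/cni-607c5530-b6d8-ba57-420e-a467d7b10c56", "eth0")
	for _, e := range env {
		fmt.Println(e)
	}
}
```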
The Kubernetes source code referenced here is from the branch
release-1.24, and the Containerd source is from its corresponding release branch.
Creating a Pod
In the previous kubelet source code analysis, it was mentioned that
Kubelet#syncLoop() continuously watches for changes from files, the apiserver, and HTTP, and updates pod status accordingly. That article stopped there, because the work that follows, creating and running the sandbox and the other containers, is handed over to the container runtime.
kubelet encapsulates the requests for creating and running the sandbox and containers, calls the Container Runtime Interface (CRI, to be studied in a future article), and delegates the actual work to the container runtime.
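The division of labor can be illustrated with a drastically simplified slice of the CRI RuntimeService. The real interface is gRPC-generated (k8s.io/cri-api) with richer request types; the names below mirror it, but the signatures, the fakeRuntime stand-in, and syncPod are simplifications for illustration only.

```go
package main

import "fmt"

// RuntimeService is a trimmed sketch of the CRI runtime interface.
type RuntimeService interface {
	RunPodSandbox(config string) (sandboxID string, err error)
	CreateContainer(sandboxID, config string) (containerID string, err error)
	StartContainer(containerID string) error
}

// fakeRuntime stands in for containerd's CRI plugin.
type fakeRuntime struct{ nextID int }

func (r *fakeRuntime) RunPodSandbox(config string) (string, error) {
	r.nextID++
	return fmt.Sprintf("sandbox-%d", r.nextID), nil
}

func (r *fakeRuntime) CreateContainer(sandboxID, config string) (string, error) {
	r.nextID++
	return fmt.Sprintf("ctr-%d", r.nextID), nil
}

func (r *fakeRuntime) StartContainer(containerID string) error { return nil }

// syncPod mimics kubelet's ordering: the sandbox is created first, then
// the remaining containers are created and started inside it.
func syncPod(rt RuntimeService, containers []string) ([]string, error) {
	sandboxID, err := rt.RunPodSandbox("pod-config")
	if err != nil {
		return nil, err
	}
	ids := []string{sandboxID}
	for _, c := range containers {
		id, err := rt.CreateContainer(sandboxID, c)
		if err != nil {
			return nil, err
		}
		if err := rt.StartContainer(id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	ids, _ := syncPod(&fakeRuntime{}, []string{"init", "app"})
	fmt.Println(ids)
}
```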
Reference Source Code
Recall from the first article of this series that, when we listed namespaces on the node, the process holding the pod's network namespace was /pause:

```
$ lsns -t net
        NS TYPE NPROCS    PID USER  NETNSID NSFS                                                COMMAND
4026531992 net     126      1 root  unassigned                                                  /lib/systemd/systemd --system --deserialize 31
4026532247 net       1  83224 uuidd unassigned                                                  /usr/sbin/uuidd --socket-activation
4026532317 net       4 129820 65535          0 /run/netns/cni-607c5530-b6d8-ba57-420e-a467d7b10c56 /pause
```
When Kubernetes creates a pod, it first creates the sandbox container (from the
pause image, which executes
/pause and sleeps on startup). Kubernetes allows multiple containers in a pod; the sandbox container creates and holds the network namespace, and the other containers in the pod join it. Because the pause image is so simple, it is very unlikely to crash and cause the network namespace to be torn down. The sandbox container also plays a crucial role in the pod's PID namespace: it runs as PID 1 in that process tree, the other container processes are parented under it, and when they become orphans it can reap them.
Creating Sandbox Container
RuntimeServiceServer defines the service interface that the runtime exposes. Besides the operations for managing sandboxes and containers, there are also streaming operations, namely the familiar
exec, attach, and portforward. For streaming-related content, you can refer to the earlier article 《Source Code Analysis of the Working Principle of kubectl port-forward》.
Let’s look at the container-related part.
criService implements the
RuntimeServiceServer interface. A request to create a sandbox container arrives over the CRI UDS (Unix domain socket) interface and enters
criService#RunPodSandbox(), which is responsible for creating and running the sandbox container and ensuring that its status is normal.
- The container runtime first initializes the container object and generates the necessary parameters.
- It then creates the pod's network namespace and generates the necessary parameters.
- Next it calls the CNI interface to configure the pod's network: creating a network interface, allocating an IP address, creating a veth pair, setting up routes, and so on. These operations are implemented by the specific network plugin, and implementations differ between plugins; once the specification is understood, the network configuration is not hard to follow. The CNI flow is:
  1. Read the network configuration
  2. Find the plugin binaries
  3. Execute the binaries
  4. Return the result to the container runtime
  Steps 2 and 3 may be executed multiple times, once per plugin in the configured chain.
- Finally, the sandbox container itself is created. This process depends on the operating system and calls the corresponding OS-specific method to complete the creation of the container.
If you are learning containers from scratch, I recommend reading Ivan Velichko’s 《Learning Containers From The Bottom Up》
Reference Source Code:
Creating Other Containers
Next come the other containers in the pod: ephemeral containers, init containers, and ordinary containers. When these containers are created, they are added to the network namespace of the sandbox. This is not expanded on here; the detailed logic can be found in containerd's source code.
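Concretely, joining the sandbox's network namespace is expressed in the new container's OCI runtime spec: instead of creating a fresh network namespace, the spec points at the sandbox's netns file. Below is a trimmed, illustrative fragment of such a spec (only the namespace section is shown; the path reuses the netns file seen in the lsns output earlier):

```json
{
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "ipc" },
      { "type": "mount" },
      { "type": "network", "path": "/run/netns/cni-607c5530-b6d8-ba57-420e-a467d7b10c56" }
    ]
  }
}
```

A namespace entry with a "path" tells the low-level runtime (e.g. runc) to join that existing namespace rather than create a new one.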
Reference Source Code
Following the last article's introduction to the CNI specification, this article covered how CNI is used, how it interacts with the container runtime, and the pod creation process.
Different CNI plugins implement different network functionality. In the next article, I will take Flannel as an example to understand a CNI implementation and the Kubernetes VXLAN network.
Why Flannel? Because k3s, one of the development environments I use most often, uses Flannel by default. Another development environment, k8e, uses Cilium by default, and Cilium's CNI is also covered later in this series.