Source Code Analysis — A Comprehensive Understanding of Kubelet

Addo Zhang
Jun 13, 2023


This article primarily delves into a source code analysis of the kubelet’s functions, key components, and its booting process, summarizing the working principle of kubelet.

Kubelet Introduction

From the official architecture diagram, it’s quite easy to locate the kubelet.

The description of kubelet’s function is visible when executing kubelet -h:

  • The kubelet is the key “node agent” that runs on each Node. It registers the Node with the apiserver using one of the following: the host’s hostname; a flag that overrides the hostname; or logic specified by a cloud provider.
  • The kubelet works in terms of a PodSpec, a YAML or JSON object that describes a Pod. The kubelet receives a set of PodSpecs through various mechanisms (primarily the apiserver) and ensures that the containers described in them are running and healthy.

In addition to the PodSpec provided by the apiserver, it can also be provided via:

  • Files
  • HTTP endpoints
  • HTTP servers

In essence, the kubelet reports Node information and manages (creates and destroys) Pods. These tasks sound straightforward, but they are far from it, and each one deserves a long discussion of its own. For example, a Node’s computing resources are not limited to the traditional CPU, memory, and disk; they can also be extended to cover resources such as GPUs. Likewise, a Pod consists not only of containers but also of the network and security policies attached to it.

Kubelet Architecture

Key Components

The architecture of the kubelet comprises numerous components, some of the most important ones include:

PLEG

Short for Pod Lifecycle Event Generator. It generates lifecycle events (ContainerStarted, ContainerDied, ContainerRemoved, ContainerChanged) for Pods.

It maintains a Pod cache, regularly obtains Pod information through ContainerRuntime, compares it with the information in the cache to generate the above events, and writes these events into the channel it maintains.
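The compare-and-emit step can be illustrated with a small Go sketch. It is a simplification under stated assumptions, not the kubelet’s actual code: the types ContainerState and LifecycleEvent and the function relist are hypothetical names chosen for this example, and the real PLEG tracks more states and works per Pod.

```go
package main

import "sort"

// ContainerState is a simplified stand-in for the runtime's view of a container.
type ContainerState string

const (
	StateRunning     ContainerState = "running"
	StateExited      ContainerState = "exited"
	StateNonExistent ContainerState = "non-existent"
)

// LifecycleEvent is a hypothetical, trimmed-down event type.
type LifecycleEvent struct {
	ContainerID string
	Type        string // ContainerStarted, ContainerDied, or ContainerRemoved
}

// relist compares the cached states with a fresh snapshot from the runtime
// and returns one event per container whose state changed.
// Both maps are keyed by container ID.
func relist(cache, current map[string]ContainerState) []LifecycleEvent {
	ids := make(map[string]struct{})
	for id := range cache {
		ids[id] = struct{}{}
	}
	for id := range current {
		ids[id] = struct{}{}
	}
	sorted := make([]string, 0, len(ids))
	for id := range ids {
		sorted = append(sorted, id)
	}
	sort.Strings(sorted) // deterministic event order for the example

	var events []LifecycleEvent
	for _, id := range sorted {
		oldState, seen := cache[id]
		if !seen {
			oldState = StateNonExistent
		}
		newState, present := current[id]
		if !present {
			newState = StateNonExistent
		}
		if oldState == newState {
			continue // nothing changed, no event
		}
		switch newState {
		case StateRunning:
			events = append(events, LifecycleEvent{ContainerID: id, Type: "ContainerStarted"})
		case StateExited:
			events = append(events, LifecycleEvent{ContainerID: id, Type: "ContainerDied"})
		case StateNonExistent:
			events = append(events, LifecycleEvent{ContainerID: id, Type: "ContainerRemoved"})
		}
	}
	return events
}
```

In this sketch the caller would replace the cache with the new snapshot after each call and push the returned events into a channel for the sync loop to consume.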

PodWorkers

PodWorkers handle the synchronization of the Pods referenced by events. The core method managePodLoop() indirectly calls kubelet.syncPod() to synchronize a Pod, which performs the following steps:

  • If the Pod is being created, record the latency since it was first seen.
  • Generate the API status of the Pod, i.e., v1.PodStatus, by converting the runtime status into the API status.
  • Record the time it takes for the Pod to go from pending to running.
  • Update the status of the Pod in StatusManager.
  • Kill Pods that shouldn’t be running.
  • If the network plugin is not ready, start only Pods that use the host network.
  • For a static Pod, create its Mirror Pod if the Mirror Pod does not exist yet.
  • Create the file system directories for the Pod: Pod directory, volume directory, plugin directory.
  • Mount volumes for the Pod using VolumeManager.
  • Get image pull secrets.
  • Call the SyncPod() method of the container runtime.
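The “one worker per Pod” idea can be sketched in Go as follows. This is a minimal illustration, not the kubelet’s implementation: the podWorkers type here, its fields, and the string-typed updates are assumptions made for the example, and syncPod is just a callback standing in for kubelet.syncPod().

```go
package main

import "sync"

// podWorkers is a hypothetical, minimal version of the idea:
// one goroutine and one update channel per Pod UID.
type podWorkers struct {
	mu      sync.Mutex
	updates map[string]chan string // Pod UID -> pending updates
	wg      sync.WaitGroup
	syncPod func(uid, update string) // stand-in for kubelet.syncPod()
}

func newPodWorkers(syncPod func(uid, update string)) *podWorkers {
	return &podWorkers{updates: make(map[string]chan string), syncPod: syncPod}
}

// UpdatePod hands an update to the Pod's worker and starts the worker on first use.
func (p *podWorkers) UpdatePod(uid, update string) {
	p.mu.Lock()
	ch, ok := p.updates[uid]
	if !ok {
		ch = make(chan string, 16)
		p.updates[uid] = ch
		p.wg.Add(1)
		go p.managePodLoop(uid, ch)
	}
	p.mu.Unlock()
	ch <- update
}

// managePodLoop processes the updates of a single Pod strictly in order.
func (p *podWorkers) managePodLoop(uid string, ch <-chan string) {
	defer p.wg.Done()
	for update := range ch {
		p.syncPod(uid, update)
	}
}

// Stop closes every channel and waits for the workers to drain.
// It must only be called after all UpdatePod calls have returned.
func (p *podWorkers) Stop() {
	p.mu.Lock()
	for _, ch := range p.updates {
		close(ch)
	}
	p.mu.Unlock()
	p.wg.Wait()
}
```

The design choice worth noting is that updates for the same Pod are serialized, while different Pods are synchronized concurrently, so a slow Pod does not block the others.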

PodManager

It stores the desired state of Pods and provides access to the Pods that the kubelet receives from all of its configuration sources.

StatsProvider

It provides statistics for nodes and containers, with two implementations: cAdvisor and CRI.

ContainerRuntime

As the name suggests, it interacts with high-level container runtimes that comply with the CRI specification.

Deps.PodConfig

PodConfig is a configuration multiplexer, which combines many Pod configuration sources into a single, consistent structure and sequentially passes incremental change notifications to listeners.

The configuration sources include: Files, apiserver, HTTP.
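The multiplexing idea can be sketched as a fan-in of several channels into one. This is only an illustration under assumptions: PodUpdate and merge are hypothetical names, Pods are reduced to plain strings, and the real PodConfig additionally merges state so that listeners see consistent, incremental changes.

```go
package main

import "sync"

// PodUpdate is a hypothetical, trimmed-down change notification.
type PodUpdate struct {
	Source string // "file", "api", or "http"
	Op     string // e.g. "ADD", "UPDATE", "REMOVE"
	Pod    string // Pod name, standing in for a full Pod object
}

// merge fans several source channels into one channel and tags each
// update with the name of the source it came from. The returned channel
// is closed once every source channel has been closed.
func merge(sources map[string]<-chan PodUpdate) <-chan PodUpdate {
	out := make(chan PodUpdate)
	var wg sync.WaitGroup
	for name, src := range sources {
		wg.Add(1)
		go func(name string, src <-chan PodUpdate) {
			defer wg.Done()
			for u := range src {
				u.Source = name
				out <- u
			}
		}(name, src)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```

A listener then only has to range over the single merged channel, regardless of how many sources feed it.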

#syncLoop

It receives Pod change notifications from PodConfig, periodic sync ticks, events from PLEG, and events from ProbeManager, and synchronizes each Pod to its desired state.

PodAdmitHandlers

PodAdmitHandlers are a series of handlers that are called during the Pod admission process. Examples include the eviction admit handler (when the node is under memory pressure, Pods with a QoS class of BestEffort are not admitted) and the shutdown admit handler (when the node is shutting down, new Pods are rejected).
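A chain of admit handlers can be sketched like this. The types Pod, AdmitResult, and AdmitHandler below are simplified, hypothetical stand-ins defined for the example; the kubelet’s own handler interface carries more context than a single Pod.

```go
package main

// Pod is a hypothetical, trimmed-down Pod used only for this sketch.
type Pod struct {
	Name string
	QoS  string // "Guaranteed", "Burstable", or "BestEffort"
}

// AdmitResult says whether a Pod may run on the node and, if not, why.
type AdmitResult struct {
	Admit  bool
	Reason string
}

// AdmitHandler is one check in the admission chain.
type AdmitHandler func(pod Pod) AdmitResult

// memoryPressureHandler rejects BestEffort Pods while the node is under memory pressure.
func memoryPressureHandler(underPressure func() bool) AdmitHandler {
	return func(pod Pod) AdmitResult {
		if underPressure() && pod.QoS == "BestEffort" {
			return AdmitResult{Admit: false, Reason: "node has memory pressure"}
		}
		return AdmitResult{Admit: true}
	}
}

// shutdownHandler rejects every Pod while the node is shutting down.
func shutdownHandler(shuttingDown func() bool) AdmitHandler {
	return func(pod Pod) AdmitResult {
		if shuttingDown() {
			return AdmitResult{Admit: false, Reason: "node is shutting down"}
		}
		return AdmitResult{Admit: true}
	}
}

// canAdmitPod runs the handlers in order and stops at the first rejection.
func canAdmitPod(pod Pod, handlers []AdmitHandler) AdmitResult {
	for _, h := range handlers {
		if r := h(pod); !r.Admit {
			return r
		}
	}
	return AdmitResult{Admit: true}
}
```

Each handler stays small and independent, and a new admission rule is added by appending one more handler to the slice.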

OOMWatcher

OOMWatcher retrieves the OOM records of containers from the system logs, wraps them in events, and records them.

VolumeManager

The VolumeManager runs a set of asynchronous loops, determining which volumes need to be attached/mounted/unmounted/detached and performing the operations based on the pods scheduled on this node.
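The core of such a loop is a comparison between a desired state and an actual state. The Go sketch below shows one reconcile pass under simplifying assumptions: volumes are plain names, only mount and unmount are modeled, and the function name reconcile is hypothetical.

```go
package main

import "sort"

// reconcile compares the volumes that should be mounted (desired) with the
// volumes that are mounted right now (actual) and returns the work needed
// to make the two converge. Both maps are used as sets of volume names.
func reconcile(desired, actual map[string]bool) (toMount, toUnmount []string) {
	for v := range desired {
		if !actual[v] {
			toMount = append(toMount, v)
		}
	}
	for v := range actual {
		if !desired[v] {
			toUnmount = append(toUnmount, v)
		}
	}
	// Sort so that the result does not depend on map iteration order.
	sort.Strings(toMount)
	sort.Strings(toUnmount)
	return toMount, toUnmount
}
```

Running a pass like this repeatedly makes the operation idempotent: once actual matches desired, both returned slices are empty and nothing happens.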

CertificateManager

CertificateManager handles certificate rotation.

ProbeManager

ProbeManager includes three types of Probes and provides probe result caching and channels.

  • LivenessManager
  • ReadinessManager
  • StartupManager
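Result caching combined with a notification channel can be sketched as follows. The resultsManager type and its methods are hypothetical and much simpler than the kubelet’s; in particular, results are reduced to a boolean per container ID, and the assumption here is that only changed results are published.

```go
package main

import "sync"

// Update is a hypothetical notification about a changed probe result.
type Update struct {
	ContainerID string
	Healthy     bool
}

// resultsManager caches the latest probe result per container and
// publishes an Update whenever a result changes.
type resultsManager struct {
	mu      sync.RWMutex
	cache   map[string]bool
	updates chan Update
}

func newResultsManager() *resultsManager {
	return &resultsManager{
		cache:   make(map[string]bool),
		updates: make(chan Update, 20),
	}
}

// Get returns the cached result and whether one exists for the container.
func (m *resultsManager) Get(id string) (healthy, found bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	healthy, found = m.cache[id]
	return healthy, found
}

// Set stores the result and notifies listeners only if it differs from the
// cached value. The send blocks when the channel buffer is full.
func (m *resultsManager) Set(id string, healthy bool) {
	m.mu.Lock()
	prev, found := m.cache[id]
	changed := !found || prev != healthy
	m.cache[id] = healthy
	m.mu.Unlock()
	if changed {
		m.updates <- Update{ContainerID: id, Healthy: healthy}
	}
}

// Updates exposes the channel that a sync loop would read from.
func (m *resultsManager) Updates() <-chan Update { return m.updates }
```

Publishing only on change keeps the consumer from re-syncing a Pod every time a probe repeats the same answer.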

EvictionManager

EvictionManager monitors the resource usage of the Node and evicts Pods according to the eviction rules to release resources and relieve pressure on the Node.
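A rough sketch of the two decisions involved, “is the node under pressure?” and “which Pod goes first?”, is shown below. It is an assumption-laden illustration: PodStats, underPressure, and rankForEviction are hypothetical names, only memory is considered, and the ordering (usage above requests first, then lower priority, then larger overage) is a simplified reading of the documented eviction order rather than the kubelet’s exact logic.

```go
package main

import "sort"

// PodStats is a hypothetical summary of one Pod's memory figures.
type PodStats struct {
	Name       string
	Priority   int
	UsageMiB   int
	RequestMiB int
}

// underPressure reports whether available memory has dropped below the threshold.
func underPressure(availableMiB, thresholdMiB int) bool {
	return availableMiB < thresholdMiB
}

// rankForEviction returns a copy of pods ordered from first to last eviction
// candidate: Pods using more than they requested come first, then Pods with
// lower priority, then Pods with the largest usage above their request.
func rankForEviction(pods []PodStats) []PodStats {
	ranked := append([]PodStats(nil), pods...)
	sort.SliceStable(ranked, func(i, j int) bool {
		a, b := ranked[i], ranked[j]
		aOver, bOver := a.UsageMiB > a.RequestMiB, b.UsageMiB > b.RequestMiB
		if aOver != bOver {
			return aOver
		}
		if a.Priority != b.Priority {
			return a.Priority < b.Priority
		}
		return a.UsageMiB-a.RequestMiB > b.UsageMiB-b.RequestMiB
	})
	return ranked
}
```

In this sketch a Pod that stays within its request is only considered after every Pod that exceeds its request, which mirrors why BestEffort Pods (no requests at all) tend to be evicted first.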

PluginManager

The PluginManager runs a set of asynchronous loops that determine which plugins need to be registered or unregistered on this node and then perform those operations. Examples of such plugins are CSI drivers and device plugins (Device Plugin).

CSI

Container Storage Interface, a standard interface that storage vendors implement to provide their storage drivers.

Device Manager Plugin (Device Plugin)

Kubernetes provides a device plugin framework that you can use to expose system hardware resources to the kubelet. Vendors can implement device plugins, which you can deploy manually or as a DaemonSet, without having to customize the code of Kubernetes itself. Target devices include GPUs, high-performance NICs, FPGAs, InfiniBand adapters, and other similar computing resources that may require vendor-specific initialization and setup.

Kubelet’s Booting Process

To analyze the booting process of the kubelet, we can start with how the kubelet runs. Log in to any Node and you can easily find the kubelet process. Since it runs as a systemd service, you can also check its status through systemctl.

Kubelet Boot Command

The kubelet’s boot command (in a minikube environment):

$ ps -aux | grep '/kubelet' | grep -v grep
root 4917 2.6 0.3 1857652 106152 ? Ssl 01:34 13:05 /var/lib/minikube/binaries/v1.21.0/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=docker --hostname-override=1.21.0 --kubeconfig=/etc/kubernetes/kubelet.conf --node-ip=192.168.64.5

or

$ systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; disabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2021-06-13 01:34:42 UTC; 11h ago
     Docs: http://kubernetes.io/docs/
 Main PID: 4917 (kubelet)
    Tasks: 15 (limit: 38314)
   Memory: 39.4M
   CGroup: /system.slice/kubelet.service
           └─4917 /var/lib/minikube/binaries/v1.21.0/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=docker --hostname-override=1.21.0 --kubeconfig=/etc/kubernetes/kubelet.conf --node-ip=192.168.64

Source Code Analysis

Clone the code from the git@github.com:kubernetes/kubernetes.git repository and check out the latest release-1.21 branch.

  • The main method at cmd/kubelet/kubelet.go:35 is the entry point of the program.
    • Calls the NewKubeletCommand method to create the command.
    • Executes the command.
      • The Run method at cmd/kubelet/app/server.go:434.
        • Calls the RunKubelet method.
          • Calls the createAndInitKubelet method to create and initialize the kubelet.
            • The NewMainKubelet method at pkg/kubelet/kubelet.go creates the various components of the kubelet. There are dozens of components; see the kubelet architecture.
            • Calls the BirthCry method to emit the Starting event.
            • Calls the StartGarbageCollection method to start ContainerGC and ImageGC.
          • Calls the startKubelet method (which makes heavy use of goroutines and channels).
            • goroutine: kubelet.Run()
              • Initializes modules:
                • Metrics-related
                • Creates file system directories
                • Creates container log directories
                • Starts ImageGCManager
                • Starts ServerCertificateManager
                • Starts OOMWatcher
                • Starts ResourceAnalyzer
              • goroutine: VolumeManager.Run() begins to process the mounting and unmounting of Pod volumes.
              • goroutine: status update fastStatusUpdateOnce() (updates Pod CIDR -> updates ContainerRuntime status -> updates Node status).
              • goroutine: NodeLeaseController.Run() updates the node lease.
              • goroutine: podKiller.PerformPodKillingWork kills Pods that have not been properly processed.
              • StatusManager.Start() begins to update Pod status to the apiserver.
              • RuntimeClassManager.Start()
              • PLEG.Start() continually gets the status of Pods/containers from ContainerRuntime, compares it with the kubelet’s local cache, and generates the corresponding events.
              • syncLoop() is the key step. It continuously watches and processes changes from files, the apiserver, and HTTP, including the addition, update, graceful deletion, non-graceful deletion, and reconciliation of Pods.
        • Starts the server and exposes the /healthz endpoint.
        • Notifies systemd that the kubelet service has started.

Kubelet’s Operating Principles

  1. Changes to Pod configuration coming from static files, the apiserver, and HTTP requests are sent to kubelet.syncLoop.
  2. PLEG periodically fetches the status of Pods on the node via the container runtime, compares it with the Pod information in its cache, packages the differences into events, and writes them to the PLEG’s channel.
  3. Work queues periodically check which Pods are due for synchronization.
  4. ProbeManager publishes probe results for Pods to its channels.
  5. Points 1–4 all feed into syncLoopIteration, which obtains the corresponding Pods from their channels, stores the Pod information in PodManager, and then dispatches the Pods to PodWorkers to complete a series of synchronization tasks.
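Points 1–4 can be pictured as one select statement over several channels. The sketch below is a minimal illustration, not the kubelet’s code: the function mirrors the name syncLoopIteration, but its parameters are hypothetical, Pods are reduced to strings, and dispatch stands in for handing work to a PodWorker.

```go
package main

// syncLoopIteration waits for one event from any source and dispatches it.
// It returns false when the configuration channel has been closed, which
// tells the caller to stop looping.
func syncLoopIteration(
	configCh <-chan string, // Pod changes from files, the apiserver, and HTTP
	plegCh <-chan string, // Pod lifecycle events
	syncCh <-chan struct{}, // periodic sync ticks
	probeCh <-chan string, // probe result updates
	dispatch func(source, pod string), // stand-in for handing work to a PodWorker
) bool {
	select {
	case pod, ok := <-configCh:
		if !ok {
			return false
		}
		dispatch("config", pod)
	case pod := <-plegCh:
		dispatch("pleg", pod)
	case <-syncCh:
		dispatch("sync", "")
	case pod := <-probeCh:
		dispatch("probe", pod)
	}
	return true
}

// syncLoop keeps calling syncLoopIteration until it reports that it is done.
func syncLoop(
	configCh <-chan string,
	plegCh <-chan string,
	syncCh <-chan struct{},
	probeCh <-chan string,
	dispatch func(source, pod string),
) {
	for syncLoopIteration(configCh, plegCh, syncCh, probeCh, dispatch) {
	}
}
```

Handling one event per iteration keeps the loop simple: whichever source has something ready is served, and no source needs to know about the others.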

Conclusion

The discussion of the kubelet start-up flow ends here. Despite its complexity, the flow can be traced step by step. As long as you understand the kubelet’s position and role in Kubernetes, its workflow is easy to follow.

In the future, we will delve deeper into the creation and startup process of Pods.


Addo Zhang

CNCF Ambassador | LF APAC OpenSource Evangelist | Microsoft MVP | SA and Evangelist at https://flomesh.io | Programmer | Blogger | Mazda Lover | Ex-BBer