Exploring eBPF Implementation through Linux Source Code

Addo Zhang
Apr 1, 2024

Last year, I delved into eBPF and shared several eBPF-related learning notes focusing on its applications. To prepare for my upcoming article, I’ve decided to start with the Linux source code this time, aiming for a deeper understanding of how eBPF works. Thus, this piece is another learning note. If you’re intrigued by the workings of eBPF, feel free to join me on this journey. Any feedback on the article is highly appreciated.

I won’t be going into an extensive introduction to eBPF here. For that, you can refer to my other article, Accelerate network packets transmission with eBPF, and Tracing packets datapath in Kubernetes network to get a basic understanding of eBPF and its applications in network acceleration.

Moving forward, we will use the program bpf_sockops from eBPF sockops as an example, in conjunction with the Linux v6.8 source code to explore the workings of eBPF.

BPF Program Operations

In the load.sh script, the loading and attaching operations of the program are completed. The following commands use bpftool to perform the loading and attaching of the BPF program, respectively.

# Load
sudo bpftool prog load bpf_sockops.o "/sys/fs/bpf/bpf_sockop"
# Attach
sudo bpftool cgroup attach "/sys/fs/cgroup/unified/" sock_ops pinned "/sys/fs/bpf/bpf_sockop"

Here, bpftool is a command-line tool that wraps the bpf() system call and is used to manage and manipulate BPF programs and maps.

Loading

sudo bpftool prog load bpf_sockops.o "/sys/fs/bpf/bpf_sockop"

The command bpftool prog load loads bpf_sockops.o into the kernel and pins the loaded program at the path /sys/fs/bpf/bpf_sockop.

bpftool loads the BPF program by calling bpf() with the command BPF_PROG_LOAD and passing in the loading options bpf_prog_load_opts:

syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr))
  • syscall bpf() is the bpf system call.
  • __sys_bpf dispatches the bpf command BPF_PROG_LOAD.
  • bpf_prog_load checks permissions and the license, allocates memory for the program, initializes it, runs the verifier, creates a file descriptor (fd), etc.
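To make the load path more concrete, here is a minimal sketch of what a raw BPF_PROG_LOAD call can look like from user space, without bpftool or libbpf. The helper names fill_load_attr and load_trivial_sockops are hypothetical, and the two-instruction program (r0 = 1; exit) is only a stand-in for bpf_sockops.o. Running the load itself needs sufficient privileges.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill the BPF_PROG_LOAD attributes for a sock_ops program. */
void fill_load_attr(union bpf_attr *attr, const struct bpf_insn *insns,
                    uint32_t insn_cnt, const char *license)
{
    memset(attr, 0, sizeof(*attr));               /* unused fields must be zero */
    attr->prog_type = BPF_PROG_TYPE_SOCK_OPS;     /* program type checked by the kernel */
    attr->insns     = (uint64_t)(uintptr_t)insns; /* pointer to the bytecode */
    attr->insn_cnt  = insn_cnt;                   /* number of instructions */
    attr->license   = (uint64_t)(uintptr_t)license;
}

/* Hypothetical helper: load a trivial program.
 * Returns a program fd on success, or -1 with errno set. */
int load_trivial_sockops(void)
{
    struct bpf_insn prog[2];
    union bpf_attr attr;

    memset(prog, 0, sizeof(prog));
    prog[0].code    = BPF_ALU64 | BPF_MOV | BPF_K; /* r0 = 1 */
    prog[0].dst_reg = BPF_REG_0;
    prog[0].imm     = 1;
    prog[1].code    = BPF_JMP | BPF_EXIT;          /* exit */

    fill_load_attr(&attr, prog, 2, "GPL");
    return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```

The returned fd is what keeps the program alive; bpftool additionally pins it to the BPF filesystem so that it survives after the tool exits.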

Once the program is successfully loaded, it can then be attached.

Attaching

sudo bpftool cgroup attach "/sys/fs/cgroup/unified/" sock_ops pinned "/sys/fs/bpf/bpf_sockop"

The command bpftool cgroup attach attaches the loaded program (pinned to the filesystem at /sys/fs/bpf/bpf_sockop) to the cgroup /sys/fs/cgroup/unified/, with the attachment type sock_ops. This sock_ops is defined by the libbpf library used by bpftool and also serves as an ELF section name. It corresponds to the BPF program type BPF_PROG_TYPE_SOCK_OPS, and the attachment type is BPF_CGROUP_SOCK_OPS.

In eBPF programming, ELF (Executable and Linkable Format) files are used to store compiled eBPF programs and related data. An ELF file consists of multiple sections, each containing different types of information, such as program code, symbol tables, debug information, etc.

Put together: the sock_ops type in libbpf maps to the BPF program type BPF_PROG_TYPE_SOCK_OPS and the attach type BPF_CGROUP_SOCK_OPS, and these correspond to the section (__section) named sockops in the program bpf_sockops.c.
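The mapping from section name to program type and attach type can be pictured as a small lookup table. The sketch below is a simplified illustration, not libbpf's actual table: the struct sec_def layout and the function find_sec_def are hypothetical, and only three entries are listed.

```c
#include <linux/bpf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a section-name table. */
struct sec_def {
    const char *sec;                  /* ELF section name in the object file */
    enum bpf_prog_type prog_type;     /* program type used at load time */
    enum bpf_attach_type attach_type; /* attach type used at attach time */
};

static const struct sec_def sec_defs[] = {
    { "sockops",         BPF_PROG_TYPE_SOCK_OPS,         BPF_CGROUP_SOCK_OPS },
    { "sk_msg",          BPF_PROG_TYPE_SK_MSG,           BPF_SK_MSG_VERDICT },
    { "cgroup/connect4", BPF_PROG_TYPE_CGROUP_SOCK_ADDR, BPF_CGROUP_INET4_CONNECT },
};

/* Look up a section name; returns NULL if the name is not in the table. */
const struct sec_def *find_sec_def(const char *sec)
{
    size_t i;

    for (i = 0; i < sizeof(sec_defs) / sizeof(sec_defs[0]); i++) {
        if (strcmp(sec_defs[i].sec, sec) == 0)
            return &sec_defs[i];
    }
    return NULL;
}
```

With a table like this, a loader only needs the section name found in the ELF file to decide which program type to request and where the program may later be attached.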

About the sock_ops attach point:

sock_ops typically refers to a series of functions and operations in the Linux kernel that handle socket operations.

sock_ops can include a range of operations, such as creating sockets, binding sockets to specific addresses and ports, listening for connection requests from other sockets, accepting connection requests, sending and receiving data, and closing sockets, among others. These operations are usually provided through a set of predefined APIs, such as the POSIX socket API, which defines a series of functions like socket(), bind(), listen(), accept(), send(), recv(), and close(), for application programs to call.

This time, bpftool performs the BPF_PROG_ATTACH operation via the bpf() system call, passing in the attachment options bpf_prog_attach_opts to complete the process.

syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr))
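As a rough sketch of the attach step without bpftool: the pinned program is first reopened with BPF_OBJ_GET, the cgroup directory is opened to obtain a target fd, and both are passed to BPF_PROG_ATTACH. The helper names fill_attach_attr and attach_pinned_sockops are hypothetical, error handling is minimal, and the call needs sufficient privileges.

```c
#include <fcntl.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill the BPF_PROG_ATTACH attributes for sock_ops. */
void fill_attach_attr(union bpf_attr *attr, int cgroup_fd, int prog_fd)
{
    memset(attr, 0, sizeof(*attr));           /* unused fields must be zero */
    attr->target_fd     = (uint32_t)cgroup_fd; /* fd of the cgroup directory */
    attr->attach_bpf_fd = (uint32_t)prog_fd;   /* fd of the loaded program */
    attr->attach_type   = BPF_CGROUP_SOCK_OPS;
}

/* Hypothetical helper: attach a pinned sock_ops program to a cgroup.
 * Returns 0 on success, or -1 with errno set. */
int attach_pinned_sockops(const char *pin_path, const char *cgroup_path)
{
    union bpf_attr attr;
    int prog_fd, cgroup_fd, ret;

    /* Reopen the pinned program to get a file descriptor for it. */
    memset(&attr, 0, sizeof(attr));
    attr.pathname = (uint64_t)(uintptr_t)pin_path;
    prog_fd = (int)syscall(__NR_bpf, BPF_OBJ_GET, &attr, sizeof(attr));
    if (prog_fd < 0)
        return -1;

    cgroup_fd = open(cgroup_path, O_RDONLY);
    if (cgroup_fd < 0) {
        close(prog_fd);
        return -1;
    }

    fill_attach_attr(&attr, cgroup_fd, prog_fd);
    ret = (int)syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));

    close(cgroup_fd);
    close(prog_fd);
    return ret;
}
```

For the example in this article, the call would be attach_pinned_sockops("/sys/fs/bpf/bpf_sockop", "/sys/fs/cgroup/unified/").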

cgroup_bpf_enabled_key is a counter kept per cgroup BPF attach type; attaching a program increments the entry for its attach type.

Note: this counter is consulted at runtime.
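The idea behind such a counter can be shown with a small user-space analogy: a hook only does any work when at least one program is attached for its type. This is not kernel code; every name below (enabled_count, on_attach, on_detach, hook_enabled, run_hook) is hypothetical.

```c
#include <stdbool.h>

/* User-space analogy only; these are not kernel symbols. */
enum { ATTACH_SOCK_OPS, ATTACH_INET_INGRESS, ATTACH_TYPE_MAX };

/* One counter per attach type, all starting at zero. */
static int enabled_count[ATTACH_TYPE_MAX];

/* Attaching a program bumps the counter for its attach type. */
void on_attach(int atype)
{
    enabled_count[atype]++;
}

/* Detaching a program drops the counter again. */
void on_detach(int atype)
{
    if (enabled_count[atype] > 0)
        enabled_count[atype]--;
}

/* Cheap check made at the hook before any program is run. */
bool hook_enabled(int atype)
{
    return enabled_count[atype] > 0;
}

/* Returns 1 if attached programs would be run, 0 if the hook is skipped. */
int run_hook(int atype)
{
    if (!hook_enabled(atype))
        return 0; /* nothing attached for this type: take the fast path */
    return 1;     /* here the attached programs would be executed */
}
```

The point of the design is that sockets pay almost nothing for hooks that nobody has attached to.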

With this, we have successfully attached the program to the cgroup’s sock_ops.

Socket Operations (sock_ops)

Socket operations are numerous, and here we take the server-side accept operation during the connection establishment process as an example.

Starting with the system call accept:

BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB is the operator for the server side, where socket.accept() takes a connection request and connection establishment completes. It is one of many sock_ops operators. These operators can be viewed as events (see the Event-driven part): the execution of programs is event-driven. For example:

  • The operator for completing the three-way handshake from the client side is BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB;
  • The operator for a socket entering the listen state is BPF_SOCK_OPS_TCP_LISTEN_CB;
  • The operator for data acknowledgment is BPF_SOCK_OPS_DATA_ACK_CB;
  • The operator for TCP state changes is BPF_SOCK_OPS_STATE_CB.
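Treating operators as events means a handler usually branches on the operator value. The sketch below shows that dispatch in plain user-space C, using the constants from linux/bpf.h; the function sock_op_event is hypothetical and covers only a subset of the operators.

```c
#include <linux/bpf.h>

/* Hypothetical helper: map a sock_ops operator to a short event description. */
const char *sock_op_event(int op)
{
    switch (op) {
    case BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB:
        return "active established (client side)";
    case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
        return "passive established (server side)";
    case BPF_SOCK_OPS_TCP_LISTEN_CB:
        return "listen";
    case BPF_SOCK_OPS_STATE_CB:
        return "state change";
    default:
        return "other"; /* operators this sketch does not handle */
    }
}
```

A real sock_ops program follows the same shape: it reads the operator from its context and acts only on the events it cares about.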

The execution of the BPF programs follows from these events and is not elaborated further here. For those interested, more analysis is available here (Implementation part).
