From LB Ingress to ZTM — A New Approach to Cluster Service Exposure
On December 28 (last Saturday), I was honored to participate in the “Cloud Native + AI Meetup Guangzhou” jointly organized by the KubeSphere community and the Higress community. At the event, I delivered a presentation on “From LB Ingress to ZTM: A New Approach to Exposing Cluster Services.” Here, I will share the content of my talk with a slightly adjusted title.
The Need for Exposing Cluster Services
The need to expose cluster services arises from the virtualization and network isolation of Kubernetes services. As we know, Kubernetes Pods are dynamic: they may be deleted and rebuilt frequently and rescheduled to different nodes, with their IP addresses changing accordingly. Kubernetes uses Services to provide a stable interface for accessing Pods, abstracting away this churn in the underlying Pods.
Services provide Pods with a stable DNS name and virtual IP address without relying on the Pods’ temporary IPs. Therefore, for internal communication within a cluster, accessing through the Service’s ClusterIP works without any issues.
However, a Service’s ClusterIP is only accessible within the cluster, and cannot be accessed externally. The Service DNS name can be resolved only within the cluster. This network isolation, serving as a network protection mechanism, ensures that access to Pods and Services is limited to within the cluster.
In practical use, however, we often need to expose services outside the cluster for external users. At this point, additional components are needed to enable external exposure of cluster services. In more advanced scenarios, such as multi-cluster or multi-cloud, a more flexible and dynamic way of exposing cluster services is even more important.
Methods for Exposing Cluster Services
Kubernetes provides several abstractions for exposing cluster services. For external access, there are LoadBalancer, NodePort, and the more advanced Ingress (which may gradually be replaced by the Gateway API; the two are similar in implementation, so below “Ingress” refers generally to Kubernetes’ high-level traffic-entry management).
Each approach works differently, with its own advantages and disadvantages, suitable for various scenarios.
LoadBalancer
LoadBalancer is one of the common ways to expose cluster services. In cloud environments, using a cloud provider’s load balancer can distribute traffic to multiple Pods in the cluster. In on-premises environments, open-source load balancers can be used. Our discussion here focuses mainly on open-source load balancers.
kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx LoadBalancer 10.43.177.15 192.168.2.1 80:32186/TCP 2m18s
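For reference, a result like the one above could come from a manifest along these lines (a minimal sketch; the Service name, selector, and ports are assumptions that simply mirror the output shown):

apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer        # ask the LB implementation to allocate an external IP
  selector:
    app: nginx              # assumed label on the backend Pods
  ports:
  - port: 80                # port exposed on the external IP
    targetPort: 80          # container port on the Pods
    protocol: TCP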
In open-source service load balancers, there are two common implementation types. The first is exemplified by MetalLB, QingCloud’s OpenELB, kube-vip, and PureLB. These implementations are essentially similar: each runs a controller Pod on every node in the cluster. This controller Pod watches for changes to Services of type LoadBalancer, configures a VIP for them, and binds the VIP to the node’s NIC. Finally, the VIP is announced to the external network through either ARP (Layer 2) or BGP (Layer 3).
If multiple routes exist to a particular VIP, Equal-Cost Multi-Path (ECMP) routing is commonly used to distribute traffic across multiple nodes for load balancing.
The second implementation type is relatively simpler, such as K3s’s ServiceLB (formerly Klipper LB), which is based on iptables and hostNetwork. When it detects that a Service of type LoadBalancer has been created, ServiceLB creates a DaemonSet for that Service. The DaemonSet runs a proxy container on each node, using the hostNetwork mode and a hostPort for the Service’s port.
The entry point is each node’s IP. When the proxy container receives traffic, it forwards it to the Service’s clusterIP via iptables, and finally to the Pods.
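Conceptually, the DaemonSet that ServiceLB creates for the nginx Service above looks roughly like the following. This is a simplified sketch rather than the exact manifest ServiceLB generates; the names are illustrative and several fields are omitted.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: svclb-nginx               # ServiceLB derives the real name from the Service
spec:
  selector:
    matchLabels:
      app: svclb-nginx
  template:
    metadata:
      labels:
        app: svclb-nginx
    spec:
      hostNetwork: true           # proxy pods use the node's network, as described above
      containers:
      - name: lb-tcp-80
        image: rancher/klipper-lb # lightweight proxy image used by K3s ServiceLB
        ports:
        - containerPort: 80
          hostPort: 80            # every node listens on the Service's port
        # inside, the proxy sets up iptables rules that forward traffic arriving
        # on this port to the Service's ClusterIP (10.43.177.15:80 in the example)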
The advantage of this approach is that it is lightweight, does not depend on a cloud provider’s load balancer, has low cost, and is easy to use, making it suitable for constrained network or resource environments. However, it also has obvious drawbacks: all nodes listen on the same port, it lacks advanced features such as TLS termination, and traffic forwarding relies on node-level networking, which can become a bottleneck under high concurrency.
NodePort
If LoadBalancer is the more advanced approach, then NodePort is the most straightforward. NodePort is a Service type that opens a port from a fixed range (30000–32767 by default) on each node in the cluster, forwarding traffic on that port to the backend Pods.
For each Service of type NodePort, Kubernetes assigns a port from that range. The kube-proxy running on each node sets up iptables rules based on Service and Pod information. Thus, when traffic arrives at the node port on any node, iptables matches it to the corresponding Service and forwards it to the backend Pods.
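As an illustration, a NodePort Service could be declared like this (a minimal sketch; the name, selector, and the explicit nodePort value are assumptions). Once applied, the service is reachable at any node’s IP on port 30080, e.g. curl http://<node-ip>:30080.

apiVersion: v1
kind: Service
metadata:
  name: nginx-nodeport
spec:
  type: NodePort
  selector:
    app: nginx              # assumed label on the backend Pods
  ports:
  - port: 80                # ClusterIP port
    targetPort: 80          # container port on the Pods
    nodePort: 30080         # optional; picked from 30000-32767 if omitted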
In terms of scenarios, NodePort is suitable for small-scale clusters or testing environments that do not require load balancing; directly exposing the node’s IP and port is enough. Its drawback is the lack of automated load balancing, the exposure of node IPs, and relatively weak security.
Ingress
Ingress is a built-in traffic management object in Kubernetes that defines HTTP or HTTPS routing rules. Ingress is implemented by an Ingress Controller, which dynamically updates the load balancer configuration based on Ingress resources to distribute traffic. Over time, Ingress may be replaced by the Gateway API, which offers more flexible extensibility, but the two are similar in practice: both rely on a proxy and a controller.
Here, we use the more familiar Ingress as an example:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
spec:
  rules:
  - http:
      paths:
      - backend:
          service:
            name: foo
            port:
              number: 8080
        path: /foo
        pathType: Prefix
      - backend:
          service:
            name: bar
            port:
              number: 8080
        path: /bar
        pathType: Prefix
Ingress is designed to act as a single entry point for the cluster, managing multiple services through domain- or path-based routing. Unlike LoadBalancer or NodePort, Ingress operates at Layer 7 of the OSI model, supporting protocols like HTTP and HTTPS. It provides advanced traffic management features such as TLS offloading, rate limiting, canary releases, circuit breaking, and redirection. It can also implement more flexible traffic control through complex routing rules.
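As a quick illustration, with the example Ingress above applied and the Ingress Controller reachable at some address (the address below is a placeholder), requests are routed by path:

curl http://<ingress-address>/foo    # matched by path /foo, forwarded to Service foo:8080
curl http://<ingress-address>/bar    # matched by path /bar, forwarded to Service bar:8080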
In terms of scenarios, Ingress is suitable for cases requiring complex routing rules and HTTPS support. Its drawbacks include relatively complex configuration and the need for an Ingress Controller. Moreover, the Ingress proxy itself still needs to be exposed externally, requiring an additional LoadBalancer.
Having introduced these approaches, we can now see if there is an alternative way to expose cluster services.
Zero-Exposure Network: ZTM
ZTM (Zero Trust Mesh) is an open-source, decentralized, HTTP/2-based tunneling network infrastructure that provides a zero-exposure network, designed for scenarios such as remote work and IoT edge networks.
ZTM can run on any existing IP network, including but not limited to LAN, the public internet, container networks, and so on. ZTM provides the essential network foundation for securing application communications, including connectivity, port-based access control, mTLS-encrypted tunnels, certificate-based identity, access control, load balancing, and other fundamental networking and security capabilities.
ZTM can run on a variety of devices and supports multiple CPU architectures such as x86, ARM, MIPS, RISC-V, and LoongArch, as well as operating systems like Linux, Windows, macOS, FreeBSD, and Android.
ZTM Architecture
ZTM’s architecture is straightforward:
- ZTM Agent: Deployed on devices that need to join the zero-trust network, such as personal computers, servers, or edge devices. It initiates an encrypted tunnel, securely forwarding device traffic to the ZTM Hub.
- ZTM Hub: A distributed access point that establishes an encrypted tunnel with each Agent, forwarding requests coming from the Agents, enabling multi-point access and high availability.
By connecting to the ZTM Hub, Agents can form a secure zero-trust network over any existing network — LAN, internet, container network, etc. — ensuring secure communication among devices.
In ZTM terminology, this network is called a Mesh. A Mesh is a logical network formed by multiple Agents connected through encrypted tunnels. An Agent can join multiple Meshes to connect with multiple networks.
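To give a rough sense of how a Mesh is bootstrapped, the flow looks something like the following. The subcommands and flags here are written from memory and should be treated as assumptions; consult ztm help or the ZTM documentation for the authoritative syntax. The host names, user name, and mesh/endpoint names are placeholders.

# On a machine reachable by all parties: start a Hub
ztm start hub --listen 0.0.0.0:8888 --names hub.example.com:8888

# Issue a permit (certificate-based identity) for a new user
ztm invite office --bootstrap hub.example.com:8888 > office-permit.json

# On each device that should join the zero-trust network: start an Agent and join the Mesh
ztm start agent
ztm join my-mesh --as ep-office --permit office-permit.json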
ZTM Features
- Zero Firewall: No firewall configuration is required along the path, simplifying management.
- Zero Port: No open ports, so there is nothing for port scans to find.
- Zero Ops: No virtual network adapters, no routes, no firewalls needed on endpoints.
- Zero Privilege: The Agent runs in user space without privileged access, making it more secure.
- Zero Routing: Access based on service discovery, eliminating complicated and error-prone routing configurations.
- Zero Trust: Certificate-based identity recognition, trusted devices, and end-to-end identity and access control.
ZTM App
The zt-app framework is an application development framework built on ZTM, offering a standardized development interface that makes it easier for developers to build decentralized applications. Its design goals are “simple, easy-to-use, secure,” providing convenient development tools.
ZTM is written in PipyJS, a customized JavaScript designed for Pipy. Developers can conveniently build ZTM Apps using PipyJS.
Within ZTM, several key Apps are built-in:
- zt-tunnel: Establishes secure TCP/UDP tunnels between endpoints.
- zt-proxy: SOCKS/HTTP proxy that receives traffic from one endpoint and forwards it to another.
- zt-script: Executes scripts remotely at endpoints.
- zt-terminal: Provides remote shell access to endpoints.
Tunnel (zt-tunnel)
A zero-trust tunnel eliminates the constraints of physical distance and network isolation, allowing you to access devices from anywhere. zt-tunnel features two core concepts:
- Outbound: The tunnel’s exit point, located in the target device’s network.
- Inbound: The tunnel’s entry point, located in the accessing party’s network.
For example, consider a scenario where we have two isolated networks that are connected into a ZTM network. In our office network, there are two devices: a Windows device supporting Remote Desktop and a Linux device supporting SSH. We call these remote devices.
When we need to access these two devices from outside, we simply create two Outbounds (named tcp/rdp and tcp/ssh-vm) on the ZTM Agent in the office network, each pointing to one of the remote devices.
In the external network’s ZTM Agent, we create two Inbounds, each pointing to one of the Outbounds above and specifying a port. When we need to access the remote devices, we only need to access the local Agent’s address and the specified inbound port, for instance, curl http://192.168.1.30:8080.
If we need to access another network’s devices, we just run a ZTM Agent in that network, connect it to the same ZTM Hub, and create the corresponding Outbounds and Inbounds for those devices.
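Expressed with the ztm CLI, the setup above might look roughly like this. The zt-tunnel subcommand shape and flag names are assumptions based on the Outbound/Inbound names used in this section, and the device addresses are placeholders; verify against ztm tunnel help.

# On the office-network Agent: tunnel exits pointing at the remote devices
ztm tunnel open outbound tcp/rdp --targets 192.168.0.10:3389     # Windows Remote Desktop host (placeholder IP)
ztm tunnel open outbound tcp/ssh-vm --targets 192.168.0.11:22    # Linux SSH host (placeholder IP)

# On the external Agent: tunnel entrances bound to local ports
ztm tunnel open inbound tcp/rdp --listen 0.0.0.0:13389
ztm tunnel open inbound tcp/ssh-vm --listen 0.0.0.0:2222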
Proxy (zt-proxy)
Using zt-tunnel for device access requires creating Outbounds and Inbounds for each service, which may be cumbersome. zt-proxy can simplify this process by enabling you to access devices through a proxy.
zt-proxy can provide an HTTP or SOCKS proxy to the accessing party. In the target network’s ZTM Agent, simply add the target addresses for the proxy (either IP subnets or domain names with wildcards). Then, the external user only needs the target IP address or domain name and can access the devices via the proxy.
By using IP subnets or domain names, device additions can be batched, making this approach suitable for scenarios with multiple devices. With the proxy, you can access remote devices as if you were in the same LAN.
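For instance, once the target network’s Agent has been configured with a proxy target such as the subnet 10.0.8.0/24 (all addresses here are placeholders), an external user can reach a device in that subnet directly through the local Agent’s proxy port:

# 192.168.1.30:8000 is the assumed HTTP proxy address exposed by the local Agent
curl -x http://192.168.1.30:8000 http://10.0.8.12/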
Having reviewed the basics of ZTM, we can now see how to use it to expose cluster services.
Using ZTM to Expose Cluster Services
Some might already have guessed that ZTM can be used to connect networks both inside and outside the cluster — or even multiple clusters.
Leveraging the ZTM Hub, we can deploy ZTM Agents both inside and outside the cluster to establish secure connectivity. Even across two isolated networks, we can access the cluster’s internal services through the HTTP/2 tunnel provided by ZTM.
Exposing Cluster Services via zt-tunnel
In this scenario, we use ZTM’s mesh network to connect isolated networks inside and outside the cluster. Within the cluster, we can create an Outbound via zt-tunnel for any service that needs to be exposed externally, then create an Inbound on the external ZTM Agent. This way, we can access the cluster’s internal Service through the ZTM tunnel.
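Assuming an Agent is already running inside the cluster and has joined the same Mesh, the tunnel for a service might be set up like this (same caveat as earlier: the subcommands and flags are assumptions, and svc-a, its namespace, and the ports are placeholders):

# On the in-cluster Agent: an Outbound pointing at the Service's in-cluster DNS name
ztm tunnel open outbound tcp/svc-a --targets svc-a.default.svc.cluster.local:8080

# On the external Agent: an Inbound listening on a local port
ztm tunnel open inbound tcp/svc-a --listen 0.0.0.0:8080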
Once the tunnel’s Inbound and Outbound are configured, we can access the cluster’s Service A by using the Agent’s IP and the configured tunnel Inbound port, for example, curl http://192.168.1.30:8080.
If there are other services to be accessed, simply create additional Outbounds and Inbounds for them.
Exposing Cluster Services via zt-proxy
If you need to expose many services, zt-proxy can simplify the process. Once the mesh network is connected, you can configure zt-proxy in the cluster to target the cluster’s internal DNS, such as *.svc.cluster.local, thereby exposing all services in the cluster as proxy targets, or limiting the scope to just the services in a particular namespace.
Then, you can access the cluster’s internal services by using the external Agent’s proxy, for example:
curl -x http://192.168.1.30:8080 http://svc-a.svc:8080
By controlling the DNS names of the services to be exposed, you can manage the range of services exposed.
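For example, the proxy’s target domain can be scoped like this (the demo namespace is illustrative):

*.svc.cluster.local          # expose every Service in the cluster through the proxy
*.demo.svc.cluster.local     # expose only the Services in the "demo" namespace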
Demo
To demonstrate how ZTM exposes cluster services, here is a simple demo.
In this demo, two completely isolated clusters are created using K3d, and then they are connected using ZTM. After that, it shows how to use both ZTM tunnels and proxies to achieve cross-cluster service access.
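For reference, the two isolated clusters in the demo can be created with k3d roughly like this (the cluster names are illustrative):

k3d cluster create cluster-1    # first cluster, on its own Docker network
k3d cluster create cluster-2    # second cluster, isolated from the first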
Feel free to try it out if you’re interested.
Comparison of Solutions
After introducing various ways to expose cluster services and explaining ZTM, we can compare these approaches.
| Feature | NodePort | LoadBalancer | ZTM |
| --- | --- | --- | --- |
| Technical principle | iptables DNAT | BGP/ARP | HTTP/2 reverse tunnel |
| Access method | Node IP + fixed port | Cloud/open-source LB + external IP | Tunnel, proxy |
| Network requirements | Node must have a fixed or reachable IP | External IP address, BGP/ARP network environment | No direct network exposure needed |
| Applicable scenarios | Small clusters/test environments needing quick exposure | Public cloud clusters needing highly available, stable external access | Zero-exposure security scenarios, remote and cross-cluster access |
| Ease of use | Simple and direct, little configuration | Automated, relying on cloud platforms or open-source load balancers | Requires deploying ZTM Agent and Hub |
More Use Cases for ZTM
Remote Command Execution (zt-terminal)
zt-terminal is a zero-trust remote command-line tool that uses “device + certificate” for identity and access control. Once authorized, one device can run shell commands on another device.
On the remote network side, configure the user who can run commands:
ztm terminal config --add-user zt-user1
Then, on the local network side, you can access the shell on the specified device:
ztm terminal open ep-office
Distributed File Sharing (zt-cloud)
Within the mesh network, you can use zt-cloud for distributed file sharing. Files published on one Agent can be broadcast throughout the mesh network to other Agents, enabling file distribution.
Other Scenarios
Below are some other application scenarios for ZTM; there are many more awaiting exploration. If you’re interested, feel free to contact me privately to join the ZTM user group.
- Intranet penetration
- Remote work
- Secure access to cloud resources
- Remote debugging
- Multi-cluster networking
- Remote device access
- File sharing
Conclusion
In this talk, we discussed the need to expose cluster services and common methods for doing so in Kubernetes. We then introduced the basics of ZTM, its features, and how to use ZTM to expose cluster services.
I hope this talk gives you an understanding of ZTM’s fundamental concepts and features, as well as how to use ZTM to expose cluster services.