Exploring Distributed Tracing Collaboration between Service Mesh and OpenTelemetry

Addo Zhang
4 min read · Dec 9, 2023

The previous article, How to Install Otel Probes Non-Intrusively in k8s, showed how to achieve non-intrusive distributed tracing, although not every language supports it yet: Go’s eBPF-based instrumentation, for example, has stringent kernel requirements.

After that article was published, a reader questioned the “non-intrusive” label for the javaagent, so a clarification is in order. “Non-intrusive” here means that the functionality is achieved without modifying the application’s business logic code. It is transparent to the application, which lets developers focus on business development. Because no code changes are needed, it is also easier to integrate and simpler to maintain, and it keeps features consistent across languages and frameworks.

The Java Agent loads at JVM startup and alters bytecode at runtime to inject tracing code, rather than modifying the application’s source code.
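
As a rough illustration of what loading at JVM startup looks like, the agent is attached with the JVM's -javaagent flag, either directly on the command line or through the JAVA_TOOL_OPTIONS environment variable, which the JVM reads when it starts. The jar paths below are hypothetical placeholders, and the snippet only prints the resulting commands instead of launching a JVM.

```shell
# Hypothetical paths: adjust to where the agent jar and the application live.
agent_jar="/otel/opentelemetry-javaagent.jar"
app_jar="app.jar"

# Option 1: pass the agent explicitly on the command line.
java_cmd="java -javaagent:${agent_jar} -jar ${app_jar}"

# Option 2: let the JVM pick the flag up from the environment, so the
# application's own start command (and its source code) stays untouched.
export JAVA_TOOL_OPTIONS="-javaagent:${agent_jar}"

echo "explicit flag : ${java_cmd}"
echo "via env var   : JAVA_TOOL_OPTIONS=${JAVA_TOOL_OPTIONS}"
```

In both cases the application jar is unchanged; only the way the JVM is started differs.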

Background

Distributed Tracing

Distributed tracing is a key technology for monitoring and diagnosing microservices request flows and an essential part of observability, offering deep insights into complex interactions and performance issues within a microservices architecture. It manages complexity by providing a clear view of service-to-service request chains, aiding in identifying performance bottlenecks, optimizing resource allocation, swiftly pinpointing and resolving faults, and enhancing overall system reliability.

Non-Intrusive Distributed Tracing in Service Mesh

Here it is again, non-intrusiveness! The proxy in a service mesh automatically handles all inbound and outbound network traffic, so it can capture, record, and analyze the details of requests and responses between services, such as request timing, duration, status codes, and other metadata. This approach is transparent to the application itself and is more thorough than the Java Agent’s runtime bytecode modification.

A prerequisite is that the application passes the trace context along in its requests, so that the spans generated and reported by the sidecar proxies can be linked into a single trace instead of a broken chain.

The mesh’s non-intrusive distributed tracing does display the request chain, but each span only represents information from the sidecar proxy; what happens inside the application remains invisible.

Following the previous article, today we will explore the integration of Service Mesh FSM with OpenTelemetry to implement full-path distributed tracing of applications and the mesh.

Demonstration

Architecture

Environment Configuration

For the installation of Jaeger, cert-manager, and Otel operator, refer to the previous article.

Configuring Instrumentation

Next, we configure the installation and settings of the probe. Detailed configuration instructions can be found in the Instrumentation API documentation.

According to the FSM Distributed Tracing Documentation, FSM supports the Zipkin protocol, so in propagators we use b3multi, employing the B3 multi-header format to pass the following information in request headers:

  • x-b3-traceid
  • x-b3-spanid
  • x-b3-parentspanid
  • x-b3-sampled
  • x-b3-flags
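
To make the header format concrete, the following sketch generates a fresh trace ID and span ID and prints the matching B3 multi headers. It assumes the encoding from the B3 propagation specification (a 128-bit trace ID and a 64-bit span ID as lowercase hex, and x-b3-sampled set to 1 to request recording); the helper name rand_hex is made up for this illustration.

```shell
# rand_hex N: print N random bytes as 2*N lowercase hex characters.
rand_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

trace_id="$(rand_hex 16)"   # 128-bit trace ID -> 32 hex characters
span_id="$(rand_hex 8)"     # 64-bit span ID   -> 16 hex characters

# A root span has no parent, so x-b3-parentspanid is omitted here.
b3_headers="x-b3-traceid: ${trace_id}
x-b3-spanid: ${span_id}
x-b3-sampled: 1"

printf '%s\n' "$b3_headers"
```

Passing each printed line to curl with -H should start a trace under a known ID, which can make that trace easier to find later in the tracing backend.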

This time we use the sample namespace.

kubectl create namespace sample
kubectl apply -n sample -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: instrumentation-sample
spec:
  propagators:
    - b3multi
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  env:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: otel-collector.default:4318
EOF

Configuring OpenTelemetry Collector

For detailed configuration of the Otel collector, refer to the official documentation.

  • Receivers: configure otlp to receive traces from the applications, and zipkin, listening on 0.0.0.0:9411, to receive the spans reported by the sidecars.
  • Exporters: configure Jaeger’s OTLP endpoint, jaeger.default:4317.
  • Pipeline services: use otlp and zipkin as the trace receivers and send the output to Jaeger through the otlp/jaeger exporter.

kubectl apply -f - <<EOF
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      zipkin:
        endpoint: "0.0.0.0:9411"
    exporters:
      otlp/jaeger:
        endpoint: "jaeger.default:4317"
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp, zipkin]
          exporters: [otlp/jaeger]
EOF

Installing Service Mesh FSM

We install FSM via its CLI, downloading v1.1.4, the latest official release at the time of writing.

system=$(uname -s | tr '[:upper:]' '[:lower:]')
arch=$(uname -m | sed -E 's/x86_/amd/' | sed -E 's/aarch/arm/')
release=v1.1.4
curl -L https://github.com/flomesh-io/fsm/releases/download/$release/fsm-$release-$system-$arch.tar.gz | tar -vxzf -
./$system-$arch/fsm version

During installation, enable distributed tracing and point the address to the Otel Collector’s zipkin receiver, with the endpoint /api/v2/spans.

fsm install \
  --set=fsm.tracing.enable=true \
  --set=fsm.tracing.address=otel-collector.default \
  --set=fsm.tracing.port=9411 \
  --set=fsm.tracing.endpoint=/api/v2/spans
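
One way to sanity-check this address is to hand-craft a span in the Zipkin v2 JSON format and post it to the same endpoint. The sketch below only builds and prints such a payload; the service name manual-check and the span name smoke-test are invented for the example, and the IDs are fixed dummy values.

```shell
# Build a minimal Zipkin v2 span list. In the Zipkin v2 API, timestamp is
# in microseconds since the epoch and duration is in microseconds.
now_us="$(date +%s)000000"

span_json="$(cat <<JSON
[{"traceId":"463ac35c9f6413ad48485a3953bb6124","id":"a2fb4a1d1a96d312","name":"smoke-test","timestamp":${now_us},"duration":1000,"localEndpoint":{"serviceName":"manual-check"}}]
JSON
)"

printf '%s\n' "$span_json"
```

Assuming port 9411 of the otel-collector service is forwarded to localhost, the payload could then be sent with curl -X POST -H 'Content-Type: application/json' -d "$span_json" http://localhost:9411/api/v2/spans; a Zipkin-compatible receiver normally answers with 202 Accepted.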

Deploying the Example Application

Add the sample namespace to the service mesh and deploy the application.

fsm namespace add sample
kubectl apply -n sample -f https://raw.githubusercontent.com/addozhang/http-sample/main/manifests/service-v1.yaml

Verify that each application pod has a sidecar injected (READY shows 2/2) and is running normally.

kubectl get po -n sample
NAME                         READY   STATUS    RESTARTS   AGE
service-c-66bf9dcc7b-pdj8p   2/2     Running   0          38s
service-b-586cfc5ccd-k9qrs   2/2     Running   0          37s
service-a-7cf7bc5bcc-tgjzz   2/2     Running   0          37s

Testing

pod_name="$(kubectl get pod -n sample -l app=service-a -o jsonpath='{.items[0].metadata.name}')"
kubectl port-forward -n sample "$pod_name" 8080:8080 &
sleep 2  # give the port-forward a moment to establish before sending the request
curl localhost:8080

After sending the request, forward the Jaeger UI port and open http://localhost:16686 in a browser.

jaeger_pod="$(kubectl get pod -l app=jaeger -o jsonpath='{.items[0].metadata.name}')"
kubectl port-forward $jaeger_pod 16686:16686 &
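
For a quick check without the browser, the HTTP endpoints that the Jaeger UI itself calls can be queried with curl. The sketch below only builds the URLs; it assumes the port-forward above is active, that the application reports under the service name service-a, and that the /api/services and /api/traces endpoints behave as in current Jaeger releases (they back the UI and are not a formally stable API). The helper name jaeger_traces_url is made up for the illustration.

```shell
jaeger_base="http://localhost:16686"   # matches the port-forward above

# jaeger_traces_url SERVICE [LIMIT]: URL listing recent traces of one service.
jaeger_traces_url() {
  printf '%s/api/traces?service=%s&limit=%s\n' "$jaeger_base" "$1" "${2:-5}"
}

echo "services: ${jaeger_base}/api/services"
echo "traces  : $(jaeger_traces_url service-a)"
```

Running curl -s "$(jaeger_traces_url service-a)" should then return the recent traces as JSON, provided the service name matches what was actually reported.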

In the Jaeger UI, we can see the tracing timeline is richer, including span data from both the application and the sidecar proxy.
