OpenTelemetry Operator for Kubernetes in Practice

Posted by Pu Ming on Friday, February 24, 2023

1. Environment

Kubernetes 1.23.1, a single cluster with two nodes: 192.168.19.210 / 192.168.19.211

Ubuntu 20.04

Microservice language: Python 3.7

2. Walkthrough

2.1 Deploying the Jaeger tracing backend

Deploy the all-in-one Jaeger. The YAML is as follows:

apiVersion: v1
kind: Service
metadata:
  name: jaeger-all-in-one
  namespace: opentelemetry
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  ports:
  - name: collector
    port: 14250
    protocol: TCP
    targetPort: 14250
  selector:
    component: otel-collector
---
apiVersion: v1
kind: Service
metadata:
  name: jaeger-all-in-one-ui
  namespace: opentelemetry
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  ports:
  - name: jaeger
    port: 16686
    protocol: TCP
    targetPort: 16686
    nodePort: 30086
  selector:
    component: otel-collector
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-all-in-one
  namespace: opentelemetry
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector
  template:
    metadata:
      labels:
        app: opentelemetry
        component: otel-collector
    spec:
      containers:
      - image: jaegertracing/all-in-one:1.35
        name: jaeger
        ports:
        - containerPort: 16686
        - containerPort: 14268
        - containerPort: 14250 

Check the status after deployment:

root@apisec-node-master:/home/nsfocus/puming/otel_operator/app# kubectl get pod,svc -n opentelemetry
NAME                                     READY   STATUS    RESTARTS   AGE
pod/jaeger-all-in-one-6475d5bf77-xm78p   1/1     Running   0          2d
 
NAME                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
service/jaeger-all-in-one      ClusterIP   10.102.246.238   <none>        14250/TCP         2d
service/jaeger-all-in-one-ui   NodePort    10.108.73.154    <none>        16686:30086/TCP   2d

2.2 Deploying the OpenTelemetry Operator

2.2.1 Deploying cert-manager

The OpenTelemetry Operator is an implementation of the Kubernetes Operator pattern. It manages OpenTelemetry Collector instances and uses the OpenTelemetry instrumentation libraries to auto-inject instrumentation into microservice Pods. cert-manager must be installed first:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

Check the cert-manager installation status:

root@apisec-node-master:/home/nsfocus/puming/otel_operator/app# kubectl get all -n cert-manager
NAME                                           READY   STATUS    RESTARTS        AGE
pod/cert-manager-5b65cb968c-pl9ng              1/1     Running   2 (7d18h ago)   8d
pod/cert-manager-cainjector-56b88bcdf7-hp2f6   1/1     Running   2 (7d18h ago)   8d
pod/cert-manager-webhook-c784c79c7-6brgr       1/1     Running   1 (7d18h ago)   8d
 
NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cert-manager           ClusterIP   10.97.77.33     <none>        9402/TCP   8d
service/cert-manager-webhook   ClusterIP   10.103.21.107   <none>        443/TCP    8d
 
NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cert-manager              1/1     1            1           8d
deployment.apps/cert-manager-cainjector   1/1     1            1           8d
deployment.apps/cert-manager-webhook      1/1     1            1           8d
 
NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/cert-manager-5b65cb968c              1         1         1       8d
replicaset.apps/cert-manager-cainjector-56b88bcdf7   1         1         1       8d
replicaset.apps/cert-manager-webhook-c784c79c7       1         1         1       8d

2.2.2 Deploying an operator version that matches the Kubernetes version

Because the OpenTelemetry Operator builds on the Kubernetes operator pattern, the installed version must be compatible with the Kubernetes version in the environment. The OpenTelemetry Operator GitHub project provides a compatibility matrix; see https://github.com/open-telemetry/opentelemetry-operator#opentelemetry-operator-vs-kubernetes-vs-cert-manager

The version installed here is 0.69.0. Note that the cert-manager version from the previous step must also be compatible with both Kubernetes and the OpenTelemetry Operator.

kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

Check the running status after deployment:

root@apisec-node-master:/home/nsfocus/puming/otel_operator/app# kubectl get all -n opentelemetry-operator-system
NAME                                                             READY   STATUS    RESTARTS        AGE
pod/opentelemetry-operator-controller-manager-676796b79f-9nr9m   2/2     Running   1 (7d18h ago)   8d
 
NAME                                                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/opentelemetry-operator-controller-manager-metrics-service   ClusterIP   10.96.5.88       <none>        8443/TCP   8d
service/opentelemetry-operator-webhook-service                      ClusterIP   10.102.197.132   <none>        443/TCP    8d
 
NAME                                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/opentelemetry-operator-controller-manager   1/1     1            1           8d
 
NAME                                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/opentelemetry-operator-controller-manager-676796b79f   1         1         1       8d

2.3 Deploying the microservice

The OpenTelemetry Operator currently supports auto-instrumentation for four languages: Node.js, .NET, Python, and Java. This walkthrough uses Python; note that the operator's Python instrumentation requires Python 3.6 or later (https://opentelemetry.io/docs/instrumentation/python/#version-support).

A simple Flask service was set up.

Dockerfile:

FROM python:3.7
WORKDIR /usr/src/app
COPY app.py /usr/src/app/
COPY requirements.txt /usr/src/app/
 
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y inetutils-ping telnet
RUN pip install -r /usr/src/app/requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
 
EXPOSE 3000
CMD ["flask", "run","-h","0.0.0.0","-p","3000" ]

app.py:

from random import randint
from flask import Flask, request
import requests
 
app = Flask(__name__)
 
@app.route("/rolldice")
def roll_dice():
    # The rolldice API also calls another microservice in the cluster, addressed by its DNS name
    r = requests.get("http://express-server2.application.svc.cluster.local:3001/api/v1/get_commodity/")
    return str(do_roll())
 
def do_roll():
    return randint(1, 6)

requirements.txt:

flask 
requests

app.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: express-server
  namespace: application
  labels:
    app: application
    component: express-server
spec:
  ports:
    - name: express
      port: 3000
      protocol: TCP
      targetPort: 3000
      nodePort: 30099
  selector:
    component: express-server
  type: NodePort
 
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: express-server
  namespace: application
  labels:
    app: application
    component: express-server
spec:
  selector:
    matchLabels:
      app: application
      component: express-server
  replicas: 1
  template:
    metadata:
      labels:
        app: application
        component: express-server
    spec:
      containers:
        - name: express-server
          ports:
            - containerPort: 3000
          image: registry.ic.intra.nsfocus.com/test/example-app:latest
          imagePullPolicy: IfNotPresent

Build the image:

docker build -t registry.ic.intra.nsfocus.com/test/example-app:latest . --no-cache

Deploy the microservice:

kubectl apply -f app.yaml

Check the running status:

root@apisec-node-master:/home/nsfocus/puming/otel_operator/app# kubectl get all -n application
NAME                                   READY   STATUS    RESTARTS   AGE
pod/express-server-8c5bdd7c7-x6grb     1/1     Running   0          175m
 
NAME                      TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/express-server    NodePort   10.111.123.124   <none>        3000:30099/TCP   175m
 
NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/express-server    1/1     1            1           175m
 
NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/express-server-8c5bdd7c7     1         1         1       175m

2.4 Deploying the OpenTelemetryCollector CRD

Deploy an OpenTelemetry Collector to gather the microservices' observability data, process and filter it, and export it to a backend service, here the Jaeger instance deployed at the start. Through the OpenTelemetry Operator we can define a resource of kind OpenTelemetryCollector; the deployment YAML is as follows:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  mode: deployment # three modes are available: sidecar, daemonset, deployment; the default is deployment
  config: |
    receivers: # the collector exposes ports 4317 (gRPC) and 4318 (HTTP) to receive the microservices' observability data; OTLP over http/protobuf is used here
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    exporters: # after the receivers accept data, the processors stage (configured below) can filter and reshape it before it is sent to the backend, here Jaeger; endpoint is the backend address (a Kubernetes FQDN also works); the connection is not encrypted
      jaeger:
        endpoint: 10.102.246.238:14250
        tls:
          insecure: true
      logging:
        verbosity: detailed
    processors: # a key/value pair is inserted into the resource attributes here, for testing
      batch:
      resource:
        attributes:
          - key: test.key
            value: "test-value"
            action: insert
    extensions: # extensions add capabilities outside the core pipeline (receivers, processors, exporters); built-ins include health_check, zpages, pprof, memory_ballast, and auth-related ones such as basic, oauth2, oidc; health_check and zpages are enabled here
      health_check:
      zpages:
        endpoint: :55679 # zpages, introduced by the OpenCensus standard, serves an HTTP endpoint with live data for debugging the collector's components; the default is localhost:55679
    service: # wires the components defined above into the collector's runtime configuration, including extensions and pipelines
      telemetry:
        logs:
          level: "debug" # debug-level logging for the collector's own telemetry, useful while troubleshooting
      extensions: [zpages, health_check] # enable the extensions configured above
      pipelines: # a traces pipeline; if a metrics backend such as Prometheus is available, a corresponding metrics pipeline can be added
        traces:
          receivers: [otlp] # receive observability data over OTLP as described above
          processors: [batch, resource] # batch groups telemetry before export; resource rewrites resource attributes as configured above
          exporters: [logging, jaeger] # finally export the data to both the log output and the Jaeger backend
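The resource processor above uses the insert action, which only adds test.key when it is not already present. A minimal Python sketch of the insert/update/upsert semantics (illustrative, not the collector's actual code):

```python
def apply_resource_action(attributes: dict, key: str, value: str, action: str) -> dict:
    """Apply one resource-processor attribute action to a copy of the
    resource attributes: insert only adds a missing key, update only
    overwrites an existing one, upsert does both."""
    out = dict(attributes)
    if action == "insert":
        out.setdefault(key, value)
    elif action == "update":
        if key in out:
            out[key] = value
    elif action == "upsert":
        out[key] = value
    return out
```

With the configuration above, a span whose resource already carries test.key keeps its original value; only spans missing the key get "test-value" added.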

Deploy the collector and then check its status:

kubectl apply -f collector.yaml

Since deployment mode is used, the cluster runs a single collector instance:

root@apisec-node-master:/home/nsfocus/puming/otel_operator/app# kubectl get pod,svc
NAME                                               READY   STATUS    RESTARTS        AGE
...
pod/otel-collector-collector-c974987cf-gpbx5       1/1     Running   0               4h19m
...
 
NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
...
service/otel-collector-collector              ClusterIP   10.105.49.46     <none>        4317/TCP,4318/TCP   4h19m
service/otel-collector-collector-headless     ClusterIP   None             <none>        4317/TCP,4318/TCP   4h19m
service/otel-collector-collector-monitoring   ClusterIP   10.99.57.224     <none>        8888/TCP            4h19m
...

2.5 Deploying the Instrumentation CRD

Deploy the Instrumentation CRD resource, which configures the language-specific instrumentation (Python, Node.js, .NET, Java). The YAML is as follows:

# instrumentation.yml
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: python-instrumentation # the name of this Instrumentation, referenced later when enabling auto-injection
  namespace: application # this Instrumentation only takes effect in the application namespace
spec:
  exporter:
    endpoint: http://otel-collector-collector.default.svc.cluster.local:4318 # export the instrumented data to the collector deployed in the previous step
  propagators:
  - tracecontext
  - baggage
  - b3
  sampler: # sampling strategy; can be tuned
    type: always_on
  python: # if no image is specified here, the official image is used
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
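The tracecontext propagator listed above carries trace identity between services in the W3C traceparent HTTP header. A simplified sketch of its format (illustrative, not the SDK implementation):

```python
def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Build a W3C traceparent header: version "00", a 16-byte trace-id,
    an 8-byte parent span-id, and a 1-byte trace-flags field."""
    flags = 0x01 if sampled else 0x00
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"

def parse_traceparent(header: str) -> tuple:
    """Extract (trace_id, span_id, sampled) from a traceparent header."""
    version, trace_id, span_id, flags = header.split("-")
    if version != "00":
        raise ValueError("unsupported traceparent version")
    return int(trace_id, 16), int(span_id, 16), bool(int(flags, 16) & 0x01)
```

The b3 propagator serves the same purpose with Zipkin-style X-B3-* headers; enabling several propagators lets the service interoperate with peers that use either convention.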

The official per-language instrumentation images can be trimmed and customized; for Python, the Dockerfile and even requirements.txt can be modified. See https://github.com/open-telemetry/opentelemetry-operator/tree/main/autoinstrumentation/python

Check the status after deployment:

root@apisec-node-master:/home/nsfocus/puming/otel_operator/app# kubectl get Instrumentation -n application
NAME                     AGE     ENDPOINT                                                         SAMPLER     SAMPLER ARG
python-instrumentation   4h28m   http://otel-collector-collector.default.svc.cluster.local:4318   always_on
root@apisec-node-master:/home/nsfocus/puming/otel_operator/app# kubectl get Instrumentation python-instrumentation -o yaml -n application
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  annotations:
    instrumentation.opentelemetry.io/default-auto-instrumentation-dotnet-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:0.5.0
    instrumentation.opentelemetry.io/default-auto-instrumentation-java-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.22.1
    instrumentation.opentelemetry.io/default-auto-instrumentation-nodejs-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.34.0
    instrumentation.opentelemetry.io/default-auto-instrumentation-python-image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.36b0
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"opentelemetry.io/v1alpha1","kind":"Instrumentation","metadata":{"annotations":{},"name":"python-instrumentation","namespace":"application"},"spec":{"exporter":{"endpoint":"http://otel-collector-collector.default.svc.cluster.local:4318"},"propagators":["tracecontext","baggage","b3"],"python":{"image":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest"},"sampler":{"type":"always_on"}}}
  creationTimestamp: "2023-02-23T02:25:12Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: opentelemetry-operator
  name: python-instrumentation
  namespace: application
  resourceVersion: "1092354"
  uid: 62d2e006-f1ce-4fe0-9a6f-19b0f3bacf5c
spec:
  apacheHttpd:
    configPath: /usr/local/apache2/conf
    version: "2.4"
  dotnet:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:0.5.0
  exporter:
    endpoint: http://otel-collector-collector.default.svc.cluster.local:4318
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.22.1
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.34.0
  propagators:
  - tracecontext
  - baggage
  - b3
  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
  resource: {}
  sampler:
    type: always_on
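The sampler is set to always_on above, which exports every trace. A sketch of how the common sampler types decide, loosely following the SDK's deterministic trace-id-ratio approach (illustrative, not the SDK code):

```python
def should_sample(trace_id: int, sampler: str, ratio: float = 1.0) -> bool:
    """Decide whether to sample a trace. always_on/always_off are trivial;
    traceidratio deterministically keeps a fraction of traces by comparing
    the low 64 bits of the trace id against a bound derived from the ratio."""
    if sampler == "always_on":
        return True
    if sampler == "always_off":
        return False
    if sampler == "traceidratio":
        bound = int(ratio * (1 << 64))
        return (trace_id & ((1 << 64) - 1)) < bound
    raise ValueError(f"unknown sampler type: {sampler}")
```

Because the decision is a pure function of the trace id, every service in a call chain that uses the same ratio keeps or drops the same traces, which is what makes ratio-based sampling usable across service boundaries.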

2.6 Enabling auto-instrumentation injection

To sample data from a microservice, the instrumentation must first be injected. This is enabled by adding an annotation to the workload YAML, as shown below:

apiVersion: apps/v1
kind: Deployment
metadata:
 ...
spec:
  ...
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-python: "python-instrumentation"
...

The application then has to be restarted (kubectl apply) before the instrumentation is injected into the microservice. After the restart, the injection can be inspected with kubectl describe pod:

root@apisec-node-master:/home/nsfocus/puming/otel_operator/app# kubectl describe pod express-server-8c5bdd7c7-x6grb -n application
Name:         express-server-8c5bdd7c7-x6grb
Namespace:    application
Priority:     0
Node:         apisec-node-worker/192.168.19.211
Start Time:   Thu, 23 Feb 2023 03:21:44 +0000
Labels:       app=application
              component=express-server
              pod-template-hash=8c5bdd7c7
Annotations:  instrumentation.opentelemetry.io/inject-python: python-instrumentation
Status:       Running
IP:           10.244.1.122
IPs:
  IP:           10.244.1.122
Controlled By:  ReplicaSet/express-server-8c5bdd7c7
Init Containers: # the instrumentation is implanted via an init container
  opentelemetry-auto-instrumentation:
    Container ID:  docker://1f53a3eb8b08c3b504b1cb4187d048c9302ff125be44e33294cc516851ab1b8e
    Image:         ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
    Image ID:      docker-pullable://ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python@sha256:b7cf74d710b0c33b9b65aaa64e82f0cef962398561700c34b9a47b5ec7068aa4
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -a
      /autoinstrumentation/.
      /otel-auto-instrumentation/
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 23 Feb 2023 03:21:51 +0000
      Finished:     Thu, 23 Feb 2023 03:21:51 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /otel-auto-instrumentation from opentelemetry-auto-instrumentation (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n8cgd (ro)
Containers:
  express-server:
    Container ID:   docker://9981853e867d432647dd26f0618ad2b4f57d86fcdfc3c98d6f85322a1040907e
    Image:          registry.ic.intra.nsfocus.com/test/example-app:latest
    Image ID:       docker-pullable://registry.ic.intra.nsfocus.com/test/example-app@sha256:18e1f61a1ecce9a22abe6f2ff6da074da4b3ef24808375c7dd51eea2b6e83109
    Port:           3000/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 23 Feb 2023 03:21:57 +0000
    Ready:          True
    Restart Count:  0
    Environment: # environment variables added for the SDK
      PYTHONPATH:                           /otel-auto-instrumentation/opentelemetry/instrumentation/auto_instrumentation:/otel-auto-instrumentation
      OTEL_TRACES_EXPORTER:                 otlp
      OTEL_EXPORTER_OTLP_TRACES_PROTOCOL:   http/protobuf
      OTEL_METRICS_EXPORTER:                otlp
      OTEL_EXPORTER_OTLP_METRICS_PROTOCOL:  http/protobuf
      OTEL_SERVICE_NAME:                    express-server
      OTEL_EXPORTER_OTLP_ENDPOINT:          http://otel-collector-collector.default.svc.cluster.local:4318
      OTEL_RESOURCE_ATTRIBUTES_POD_NAME:    express-server-8c5bdd7c7-x6grb (v1:metadata.name)
      OTEL_RESOURCE_ATTRIBUTES_NODE_NAME:    (v1:spec.nodeName)
      OTEL_PROPAGATORS:                     tracecontext,baggage,b3
      OTEL_TRACES_SAMPLER:                  always_on
      OTEL_RESOURCE_ATTRIBUTES:             k8s.container.name=express-server,k8s.deployment.name=express-server,k8s.namespace.name=application,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.replicaset.name=express-server-8c5bdd7c7
    Mounts:
      /otel-auto-instrumentation from opentelemetry-auto-instrumentation (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n8cgd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-n8cgd:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
  opentelemetry-auto-instrumentation:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:   <unset>
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
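The injected OTEL_RESOURCE_ATTRIBUTES variable shown above is a comma-separated key=value list that the SDK turns into resource attributes on every exported span. A sketch of that parsing:

```python
def parse_resource_attributes(value: str) -> dict:
    """Parse the comma-separated key=value list carried by
    OTEL_RESOURCE_ATTRIBUTES into a dict of resource attributes."""
    attrs = {}
    for pair in value.split(","):
        key, sep, val = pair.partition("=")
        if sep:  # skip empty or malformed entries
            attrs[key.strip()] = val.strip()
    return attrs
```

These attributes (pod name, namespace, node, and so on) are what let Jaeger attribute each span to the correct Kubernetes workload.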

2.7 Visualizing traces and the microservice topology

There are two ways to observe the results:

  1. Via the collector logs

Send a request to the microservice and watch the collector logs:

devops@devops-node1:~$ curl -v 192.168.19.210:30099/rolldice
*   Trying 192.168.19.210:30099...
* TCP_NODELAY set
* Connected to 192.168.19.210 (192.168.19.210) port 30099 (#0)
> GET /rolldice HTTP/1.1
> Host: 192.168.19.210:30099
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: Werkzeug/2.2.3 Python/3.7.16
< Date: Thu, 23 Feb 2023 07:04:53 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 1
< Connection: close
<
* Closing connection 0
5 # the response body
 
root@apisec-node-master:/home/nsfocus/puming/otel_operator/app# kubectl logs otel-collector-collector-c974987cf-gpbx5
2023-02-23T02:20:02.376Z        info    service/telemetry.go:92 Setting up own telemetry...
2023-02-23T02:20:02.376Z        info    service/telemetry.go:118        Serving Prometheus metrics      {"address": ":8888", "level": "Basic"}
2023-02-23T02:20:02.376Z        debug   extension/extension.go:150      Beta component. May change in the future.       {"kind": "extension", "name": "zpages"}
2023-02-23T02:20:02.376Z        debug   extension/extension.go:150      Beta component. May change in the future.       {"kind": "extension", "name": "health_check"}
2023-02-23T02:20:02.376Z        info    exporter/exporter.go:290        Development component. May change in the future.        {"kind": "exporter", "data_type": "traces", "name": "logging"}
2023-02-23T02:20:02.376Z        debug   exporter/exporter.go:288        Beta component. May change in the future.       {"kind": "exporter", "data_type": "traces", "name": "jaeger"}
2023-02-23T02:20:02.377Z        debug   processor/processor.go:302      Beta component. May change in the future.       {"kind": "processor", "name": "resource", "pipeline": "traces"}
2023-02-23T02:20:02.377Z        debug   processor/processor.go:302      Stable component.       {"kind": "processor", "name": "batch", "pipeline": "traces"}
2023-02-23T02:20:02.377Z        debug   receiver/receiver.go:309        Stable component.       {"kind": "receiver", "name": "otlp", "pipeline": "traces"}
2023-02-23T02:20:02.379Z        info    service/service.go:123  Starting otelcol...     {"Version": "0.69.0", "NumCPU": 16}
2023-02-23T02:20:02.379Z        info    extensions/extensions.go:42     Starting extensions...
2023-02-23T02:20:02.379Z        info    extensions/extensions.go:45     Extension is starting...        {"kind": "extension", "name": "health_check"}
...
2023-02-23T02:20:02.380Z        info    service/pipelines.go:89 Starting processors...
...
2023-02-23T02:20:02.381Z        info    otlpreceiver@v0.69.0/otlp.go:94 Starting GRPC server    {"kind": "receiver", "name": "otlp", "pipeline": "traces", "endpoint": "0.0.0.0:4317"}
2023-02-23T02:20:02.381Z        info    otlpreceiver@v0.69.0/otlp.go:112        Starting HTTP server    {"kind": "receiver", "name": "otlp", "pipeline": "traces", "endpoint": "0.0.0.0:4318"}
2023-02-23T02:20:02.381Z        info    service/service.go:140  Everything is ready. Begin running and processing data.
2023-02-23T02:20:03.381Z        info    jaegerexporter@v0.69.0/exporter.go:181  State of the connection with the Jaeger Collector backend       {"kind": "exporter", "data_type": "traces", "name": "jaeger", "state": "READY"}
2023-02-23T02:26:56.231Z        info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "logging", "#spans": 4} # new trace data has arrived
...
  2. Via the Jaeger UI

Service call chain details

image
Figure 1. Service call chain details (1)

Service call chain details

image
Figure 2. Service call chain details (2)

Service call topology

image
Figure 3. Call chain topology

3. How auto-instrumentation injection works, and caveats

3.1 Injection mechanism

The OpenTelemetry Operator implements a mutating admission webhook that runs whenever a Pod is created or updated. The hook mutates the Pod resource to inject an init container whose job is to copy the OpenTelemetry SDK into the business container; which SDK is injected depends on the application's language.
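A hypothetical sketch of that mutation (the operator itself is written in Go; this only illustrates the shape of the change, with names matching the describe output in section 2.6):

```python
import copy

# Assumed image and mount path, matching the describe output in section 2.6
AUTO_INSTR_IMAGE = ("ghcr.io/open-telemetry/opentelemetry-operator/"
                    "autoinstrumentation-python:latest")
MOUNT_PATH = "/otel-auto-instrumentation"

def inject_python_instrumentation(pod_spec: dict) -> dict:
    """Return a mutated copy of a Pod spec with the auto-instrumentation
    init container, shared volume, and PYTHONPATH wired in."""
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("volumes", []).append(
        {"name": "opentelemetry-auto-instrumentation", "emptyDir": {}})
    spec.setdefault("initContainers", []).append({
        "name": "opentelemetry-auto-instrumentation",
        "image": AUTO_INSTR_IMAGE,
        # copy the SDK into the shared emptyDir volume
        "command": ["cp", "-a", "/autoinstrumentation/.", MOUNT_PATH + "/"],
        "volumeMounts": [{"name": "opentelemetry-auto-instrumentation",
                          "mountPath": MOUNT_PATH}],
    })
    for container in spec.get("containers", []):
        # the business container mounts the same volume and picks the SDK up
        # through PYTHONPATH, so the application image needs no rebuild
        container.setdefault("volumeMounts", []).append(
            {"name": "opentelemetry-auto-instrumentation",
             "mountPath": MOUNT_PATH})
        container.setdefault("env", []).append(
            {"name": "PYTHONPATH",
             "value": f"{MOUNT_PATH}/opentelemetry/instrumentation/"
                      f"auto_instrumentation:{MOUNT_PATH}"})
    return spec
```

The key design point is that the SDK lives in an emptyDir volume populated by the init container, so instrumentation is added without touching the application image.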

3.2 Caveats

Note that OpenTelemetry does not support every framework in every language, but the mainstream ones are generally covered; in the Python SDK, for example, django and flask are both supported, as the package list shows:

root@express-server-8c5bdd7c7-x6grb:/usr/src/app# pip list
Package                                       Version
--------------------------------------------- ---------
asgiref                                       3.6.0
backoff                                       2.2.1
certifi                                       2022.12.7
charset-normalizer                            3.0.1
click                                         8.1.3
Deprecated                                    1.2.13
Flask                                         2.2.3
googleapis-common-protos                      1.58.0
idna                                          3.4
importlib-metadata                            6.0.0
itsdangerous                                  2.1.2
Jinja2                                        3.1.2
MarkupSafe                                    2.1.2
opentelemetry-api                             1.15.0
opentelemetry-distro                          0.36b0
opentelemetry-exporter-otlp-proto-http        1.15.0
opentelemetry-instrumentation                 0.36b0
opentelemetry-instrumentation-aio-pika        0.36b0
opentelemetry-instrumentation-aiohttp-client  0.36b0
opentelemetry-instrumentation-aiopg           0.36b0
opentelemetry-instrumentation-asgi            0.36b0
opentelemetry-instrumentation-asyncpg         0.36b0
opentelemetry-instrumentation-boto            0.36b0
opentelemetry-instrumentation-boto3sqs        0.36b0
opentelemetry-instrumentation-botocore        0.36b0
opentelemetry-instrumentation-celery          0.36b0
opentelemetry-instrumentation-confluent-kafka 0.36b0
opentelemetry-instrumentation-dbapi           0.36b0
opentelemetry-instrumentation-django          0.36b0
opentelemetry-instrumentation-elasticsearch   0.36b0
opentelemetry-instrumentation-falcon          0.36b0
opentelemetry-instrumentation-fastapi         0.36b0
opentelemetry-instrumentation-flask           0.36b0
opentelemetry-instrumentation-grpc            0.36b0
opentelemetry-instrumentation-httpx           0.36b0
opentelemetry-instrumentation-jinja2          0.36b0
opentelemetry-instrumentation-kafka-python    0.36b0
opentelemetry-instrumentation-logging         0.36b0
opentelemetry-instrumentation-mysql           0.36b0
opentelemetry-instrumentation-pika            0.36b0
opentelemetry-instrumentation-psycopg2        0.36b0
opentelemetry-instrumentation-pymemcache      0.36b0
opentelemetry-instrumentation-pymongo         0.36b0
opentelemetry-instrumentation-pymysql         0.36b0
opentelemetry-instrumentation-pyramid         0.36b0
opentelemetry-instrumentation-redis           0.36b0
opentelemetry-instrumentation-requests        0.36b0
opentelemetry-instrumentation-sklearn         0.36b0
opentelemetry-instrumentation-sqlalchemy      0.36b0
opentelemetry-instrumentation-sqlite3         0.36b0
opentelemetry-instrumentation-starlette       0.36b0
opentelemetry-instrumentation-tornado         0.36b0
opentelemetry-instrumentation-tortoiseorm     0.36b0
opentelemetry-instrumentation-urllib          0.36b0
opentelemetry-instrumentation-urllib3         0.36b0
opentelemetry-instrumentation-wsgi            0.36b0
opentelemetry-propagator-aws-xray             1.0.1
opentelemetry-propagator-b3                   1.15.0
opentelemetry-propagator-jaeger               1.15.0
opentelemetry-propagator-ot-trace             0.36b0
opentelemetry-proto                           1.15.0
opentelemetry-sdk                             1.15.0
opentelemetry-semantic-conventions            0.36b0
opentelemetry-util-http                       0.36b0
packaging                                     23.0
pip                                           22.0.4
protobuf                                      4.21.12
requests                                      2.28.2
setuptools                                    67.0.0
typing_extensions                             4.4.0
urllib3                                       1.26.14
Werkzeug                                      2.2.3
wheel                                         0.38.4
wrapt                                         1.14.1
zipp                                          3.14.0

Also note that auto-injection requires restarting the application; the instrumentation cannot currently be loaded dynamically.

4. References

https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md

https://cert-manager.io/docs/installation/supported-releases/

https://cert-manager.io/docs/installation/

https://cert-manager.io/docs/installation/kubectl/

https://cert-manager.io/docs/installation/verify/

https://medium.com/opentelemetry/using-opentelemetry-auto-instrumentation-agents-in-kubernetes-869ec0f42377

https://www.aspecto.io/blog/opentelemetry-operator/#Auto-Instrumentation-with-OpenTelemetry-Operator

https://github.com/open-telemetry/opentelemetry-operator/tree/main/autoinstrumentation/python

https://ithelp.ithome.com.tw/articles/10291371

https://isitobservable.io/open-telemetry/how-to-observe-your-kubernetes-cluster-with-opentelemetry

Instrumentation CRD API config: https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#instrumentationspec

Kubernetes attribute configuration: https://opentelemetry.io/docs/reference/specification/resource/semantic_conventions/k8s/
