自定义 Insight 组件调度策略¶
当部署可观测平台 Insight 到 Kubernetes 环境时,正确的资源管理和优化至关重要。 Insight 包含多个核心组件,如 Prometheus、OpenTelemetry、FluentBit、Vector、Elasticsearch 等, 这些组件在运行过程中可能因为资源占用问题对集群内其他 Pod 的性能产生负面影响。 为了有效地管理资源并优化集群的运行,节点亲和性成为一项重要的配置选项。
本文将重点探讨如何通过污点和节点亲和性的配置策略,使得每个组件能够在适当的节点上运行, 并避免资源竞争或争用,从而确保整个 Kubernetes 集群的稳定性和高效性。
通过污点为 Insight 配置专有节点¶
由于 Insight Agent 包含了 DaemonSet 组件,所以本节所述的配置方式是让除了 Insight DameonSet 之外的其余组件均运行在专有节点上。
该方式是通过为专有节点添加污点(taint),并配合污点容忍度(tolerations)来实现的。 更多细节可以参考 Kubernetes 官方文档。
可以参考如下命令为节点添加及移除污点:
# 添加污点
kubectl taint nodes worker1 node.daocloud.io=insight-only:NoSchedule
# 移除污点
kubectl taint nodes worker1 node.daocloud.io:NoSchedule-
有以下两种途径让 Insight 组件调度至专有节点:
1. 为每个组件添加污点容忍度¶
针对 insight-server 和 insight-agent 两个 Chart 分别进行配置:
server:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
ui:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
runbook:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
# mysql:
victoria-metrics-k8s-stack:
  victoria-metrics-operator:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
  vmcluster:
    spec:
      vmstorage:
        tolerations:
          - key: "node.daocloud.io"
            operator: "Equal"
            value: "insight-only"
            effect: "NoSchedule"
      vmselect:
        tolerations:
          - key: "node.daocloud.io"
            operator: "Equal"
            value: "insight-only"
            effect: "NoSchedule"
      vminsert:
        tolerations:
          - key: "node.daocloud.io"
            operator: "Equal"
            value: "insight-only"
            effect: "NoSchedule"
  vmalert:
    spec:
      tolerations:
        - key: "node.daocloud.io"
          operator: "Equal"
          value: "insight-only"
          effect: "NoSchedule"
  alertmanager:
    spec:
      tolerations:
        - key: "node.daocloud.io"
          operator: "Equal"
          value: "insight-only"
          effect: "NoSchedule"
jaeger:
  collector:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
  query:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
opentelemetry-collector-aggregator:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
opentelemetry-collector:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
grafana-operator:
  operator:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
  grafana:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
kibana:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
elastic-alert:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
vector:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      tolerations:
        - key: "node.daocloud.io"
          operator: "Equal"
          value: "insight-only"
          effect: "NoSchedule"
  prometheus-node-exporter:
    tolerations:
      - effect: NoSchedule
        operator: Exists
  prometheusOperator:
    tolerations:
      - key: "node.daocloud.io"
        operator: "Equal"
        value: "insight-only"
        effect: "NoSchedule"
kube-state-metrics:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
opentelemetry-operator:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
opentelemetry-collector:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
tailing-sidecar-operator:
  operator:
    tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
opentelemetry-kubernetes-collector:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
prometheus-blackbox-exporter:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule"
etcd-exporter:
  tolerations:
    - key: "node.daocloud.io"
      operator: "Equal"
      value: "insight-only"
      effect: "NoSchedule" 
2. 通过命名空间级别配置¶
让 insight-system 命名空间的 Pod 都容忍 node.daocloud.io=insight-only 污点。
- 
调整 apiserver的配置文件/etc/kubernetes/manifests/kube-apiserver.yaml,放开PodTolerationRestriction,PodNodeSelector, 参考下图: 
- 
给 insight-system命名空间增加注解:
重启 insight-system 命名空间下面的组件即可正常容忍 insight-system 下的 Pod 调度。
为节点添加 Label 和节点亲和性来管理组件调度¶
Info
节点亲和性概念上类似于 nodeSelector,它使你可以根据节点上的 标签(label) 来约束 Pod 可以调度到哪些节点上。 
节点亲和性有两种:
- requiredDuringSchedulingIgnoredDuringExecution:调度器只有在规则被满足的时候才能执行调度。此功能类似于 nodeSelector, 但其语法表达能力更强。
- preferredDuringSchedulingIgnoredDuringExecution:调度器会尝试寻找满足对应规则的节点。如果找不到匹配的节点,调度器仍然会调度该 Pod。
更过细节请参考 kubernetes 官方文档。
为了实现不同用户对 Insight 组件调度的灵活需求,Insight 分别提供了较为细粒度的 Label 来实现不同组件的调度策略,以下是标签与组件的关系说明:
| 标签 Key | 标签 Value | 说明 | 
|---|---|---|
| node.daocloud.io/insight-any | 任意值,推荐用 true | 代表 Insight 所有组件优先考虑带了该标签的节点 | 
| node.daocloud.io/insight-prometheus | 任意值,推荐用 true | 特指 Prometheus 组件 | 
| node.daocloud.io/insight-vmstorage | 任意值,推荐用 true | 特指 VictoriaMetrics vmstorage 组件 | 
| node.daocloud.io/insight-vector | 任意值,推荐用 true | 特指 Vector 组件 | 
| node.daocloud.io/insight-otel-col | 任意值,推荐用 true | 特指 OpenTelemetry 组件 | 
可以参考如下命令为节点添加及移除标签:
# 为 node8 添加标签,先将 insight-prometheus 调度到 node8 
kubectl label nodes node8 node.daocloud.io/insight-prometheus=true
# 移除 node8 的 node.daocloud.io/insight-prometheus 标签
kubectl label nodes node8 node.daocloud.io/insight-prometheus-
以下是 insight-prometheus 组件在部署时默认的亲和性偏好:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - preference:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: DoesNotExist
      weight: 1
    - preference:
        matchExpressions:
        - key: node.daocloud.io/insight-prometheus # (1)!
          operator: Exists
      weight: 2
    - preference:
        matchExpressions:
        - key: node.daocloud.io/insight-any
          operator: Exists
      weight: 3
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                    - insight-agent-kube-prometh-prometheus
- 先将 insight-prometheus 调度到带有 node.daocloud.io/insight-prometheus 标签的节点