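Prometheus-operator fails to scrape k8s metrics

Fixing Prometheus-operator failures when scraping kubelet metrics

Problem description
After deploying with the Prometheus-operator chart, the kubelet scrape job on some clusters fails to pull metrics and reports a 500 error.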
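
Possible cause
The nodes in the k8s cluster have not enabled the permissions that allow the kubelet to be scraped. Judging by the fix, the kubelet on those nodes is not running with --authentication-token-webhook=true.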

Solution
Reference link
On every node, edit the kubelet configuration file at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf. Remember to back up the file before changing it. The commands are:

    KUBEADM_SYSTEMD_CONF=/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
    # Drop any line that sets cadvisor-port=0
    sed -e "/cadvisor-port=0/d" -i "$KUBEADM_SYSTEMD_CONF"
    # Add the token webhook flag only if it is not already present
    if ! grep -q "authentication-token-webhook=true" "$KUBEADM_SYSTEMD_CONF"; then
      sed -e "s/--authorization-mode=Webhook/--authentication-token-webhook=true --authorization-mode=Webhook/" -i "$KUBEADM_SYSTEMD_CONF"
    fi
    # Reload systemd units and restart the kubelet so the new flags take effect
    systemctl daemon-reload
    systemctl restart kubelet
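
The backup recommended before editing could be scripted roughly as follows. This is only a sketch: the function name backup_conf and the timestamped .bak suffix are assumptions made for illustration, not part of the original procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original article): copy a config file to a
# timestamped backup beside it and print the path of the backup.
backup_conf() {
  local conf="$1"
  local backup
  backup="${conf}.bak.$(date +%Y%m%d%H%M%S)"
  # -p keeps mode, ownership and timestamps of the original file
  cp -p "$conf" "$backup" && printf '%s\n' "$backup"
}

# On a node, before running the edit commands (path taken from the article):
#   backup_conf /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
```

Because the backup name does not end in .conf, systemd should not load it as a drop-in. To roll back, copy the backup over the original file and rerun systemctl daemon-reload and systemctl restart kubelet.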
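
A similar problem
Prometheus fails to scrape metrics from the kube-controller-manager and kube-scheduler components. This happens because both components are configured to bind to 127.0.0.1, so the metrics endpoints are reachable only from the local host. Changing the bind address to 0.0.0.0 in their manifests fixes it: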

    sed -e "s/- --address=127.0.0.1/- --address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-controller-manager.yaml
    sed -e "s/- --address=127.0.0.1/- --address=0.0.0.0/" -i /etc/kubernetes/manifests/kube-scheduler.yaml
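
To confirm that the substitution actually matched, the same sed expression can be wrapped in a small function that prints the resulting flag line. This is a minimal sketch; rebind_manifest is a hypothetical name, and it assumes the manifest uses the --address flag exactly as shown above. Some Kubernetes versions may spell the flag differently (for example --bind-address), so check the manifest first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original article): apply the article's sed
# substitution to one manifest, then print the address flag line for review.
rebind_manifest() {
  local manifest="$1"
  sed -e "s/- --address=127.0.0.1/- --address=0.0.0.0/" -i "$manifest"
  # "--" stops option parsing so the pattern may start with dashes
  grep -n -- "--address=" "$manifest"
}

# On a control-plane node (paths taken from the article):
#   rebind_manifest /etc/kubernetes/manifests/kube-controller-manager.yaml
#   rebind_manifest /etc/kubernetes/manifests/kube-scheduler.yaml
```

If the function prints nothing, the manifest contains no --address flag and the substitution did not apply.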

Another approach
In practice, the cadvisor job that prometheus-operator ships for the kubelet does not collect labels on some nodes. It can be replaced with a kubernetes-cadvisor job:

    - job_name: kubernetes-cadvisor
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      metric_relabel_configs:
      - action: replace
        source_labels: [id]
        regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
        target_label: rkt_container_name
        replacement: '${2}-${1}'
      - action: replace
        source_labels: [id]
        regex: '^/system\.slice/(.+)\.service$'
        target_label: systemd_service_name
        replacement: '${1}'
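
Two of the relabel rules in this job can be checked by hand with a rough shell imitation. The function names, the node name node-1, and the sample cgroup id are made up for illustration. Prometheus evaluates its own regex engine, so the sed expression only approximates the rule for this simple pattern.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names) that mimic two rules from the job.

# Mirrors the __metrics_path__ rule: node name -> API server proxy path.
cadvisor_path() {
  printf '/api/v1/nodes/%s/proxy/metrics/cadvisor\n' "$1"
}

# Mirrors the systemd_service_name rule: cgroup id -> unit name.
# Prints nothing when the id does not match, just as the rule would leave
# the label unset.
systemd_service_name() {
  printf '%s\n' "$1" | sed -nE 's#^/system\.slice/(.+)\.service$#\1#p'
}

cadvisor_path node-1                               # /api/v1/nodes/node-1/proxy/metrics/cadvisor
systemd_service_name /system.slice/docker.service  # docker
```

With a real node name, kubectl get --raw "$(cadvisor_path <node>)" should return the same cAdvisor metrics that the job scrapes through the API server, provided the account in use is allowed to access the node proxy.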