Migrating Prometheus Storage to Ceph

Create a block-storage StorageClass in the Rook-Ceph cluster

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 4
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph
  fstype: xfs
reclaimPolicy: Retain
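One way to apply the manifest above and confirm that both objects were created is sketched below. The function name and the manifest file name are placeholders for illustration and are not part of the original setup.

```shell
# Apply the pool and StorageClass manifest, then confirm both objects exist.
# The manifest path is an argument; rook-ceph-block.yaml is only a placeholder name.
apply_block_storage() {
    local manifest="$1"
    kubectl apply -f "$manifest" || return 1
    # CephBlockPool is namespaced; StorageClass is cluster-scoped.
    kubectl -n rook-ceph get cephblockpool replicapool || return 1
    kubectl get storageclass rook-ceph-block
}

# Usage: apply_block_storage rook-ceph-block.yaml
```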

Change the storage of the Prometheus instance managed by Prometheus Operator

Edit the Prometheus resource and set the StorageClass of its storage to rook-ceph-block:

kubectl edit prometheuses.monitoring.coreos.com c2-monitor-prometheus-oper-prometheus -n admin

...
storage:
  volumeClaimTemplate:
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 30Gi
      storageClassName: rook-ceph-block

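If the change has no effect, delete the previously bound PVC and then restart the Prometheus Pod.

  • Before deleting the old PVC, note where the historical data is stored. If the volume's reclaim policy is Delete, back up the historical data first.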
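Whether a backup is needed can be checked before deleting anything. The sketch below looks up the PV bound to a PVC and prints its reclaim policy; the function name is hypothetical, and the PVC name in the usage line is the one that appears in this article.

```shell
# Print the reclaim policy of the PV bound to a PVC.
# With "Delete", removing the PVC also removes the volume and its data.
pv_reclaim_policy() {
    local ns="$1" pvc="$2" pv
    pv=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.volumeName}') || return 1
    [ -n "$pv" ] || { echo "PVC $pvc is not bound to a PV" >&2; return 1; }
    kubectl get pv "$pv" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
}

# Usage:
# pv_reclaim_policy admin prometheus-c2-monitor-prometheus-oper-prometheus-db-prometheus-c2-monitor-prometheus-oper-prometheus-0
```

If the result is Retain, the old volume should survive the PVC deletion; if it is Delete, copy the data elsewhere first.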

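Check that Prometheus restarted successfully

Once Prometheus has restarted normally and its underlying storage has switched to the Ceph-backed StorageClass, the historical data can be migrated into Prometheus.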

# kubectl get pod -n admin | grep prome
prometheus-c2-monitor-prometheus-oper-prometheus-0 3/3 Running 0 2h

# kubectl get pv -n admin | grep 30Gi
pvc-77d9ba2c-09ac-11ea-8b13-005056925f82 30Gi RWX Retain Bound admin/prometheus-c2-monitor-prometheus-oper-prometheus-db-prometheus-c2-monitor-prometheus-oper-prometheus-0 rook-ceph-block 3h
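The same check can be scripted instead of read off the grep output. This is a minimal sketch, assuming the PVC name shown above; the function name is made up for illustration.

```shell
# Succeed only if the PVC is Bound and uses the expected StorageClass.
pvc_uses_class() {
    local ns="$1" pvc="$2" want="$3" got
    got=$(kubectl get pvc "$pvc" -n "$ns" \
        -o jsonpath='{.status.phase}/{.spec.storageClassName}') || return 1
    [ "$got" = "Bound/$want" ]
}

# Usage:
# pvc_uses_class admin prometheus-c2-monitor-prometheus-oper-prometheus-db-prometheus-c2-monitor-prometheus-oper-prometheus-0 rook-ceph-block \
#     && echo "storage switched to Ceph"
```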

Migrating historical data

First, locate the old data directory:

[root@ prometheus-db]# ls
01DRPGYK0X5XKF4DDRQ09T5WPY 01DS1WW6WNN6EG3QCNRK87X3YJ 01DSDFNF35TRFPRGVR5Y9C05HN 01DSS2ES0G0D4WJQ2ZATW73S49 01DSWY1JG4099RSG4GFQF07MJ0 01DSXJMP7FDV0W3MPGWYHBR909 01DSY0C2WCV86QX8HZC10C20RA
01DRW3FJB9DEGMEKWEWP6TFMAK 01DS7P8SK48ZRAM6W2QE578QAP 01DSK921QVSVTZ03HA8JWZ9DR7 01DSV08CZJY5SNMHDSSCZP1N0X 01DSXJMMBPT9A4FH6QSFZ7ZFEP 01DSXSGBMCA55462EQ46FKGPGM wal

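Delete the wal directory from the historical data. Because wal is a directory, use rm -rf wal.

Use kubectl cp or docker cp to copy the historical data into the container: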

docker cp prometheus-db/. ca58c0c65cef:/prometheus
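For the kubectl cp route, a possible sketch is shown below. It copies only directories that contain a meta.json file (which every Prometheus TSDB block has), so wal is skipped even if it was not removed. The function name is hypothetical, and the container name prometheus is an assumption about how the operator names the container; adjust it if your Pod differs. kubectl cp also needs tar inside the target container.

```shell
# Copy every TSDB block directory (those containing meta.json) into the pod.
# Non-block entries such as wal are skipped.
copy_blocks_to_pod() {
    local src="$1" ns="$2" pod="$3" container="$4" dir name
    for dir in "$src"/*/; do
        [ -f "${dir}meta.json" ] || continue
        name=$(basename "$dir")
        # The target path must not already exist, otherwise the block
        # would be nested one level deeper.
        kubectl cp "${dir%/}" "$ns/$pod:/prometheus/$name" -c "$container" || return 1
    done
}

# Usage:
# copy_blocks_to_pod prometheus-db admin prometheus-c2-monitor-prometheus-oper-prometheus-0 prometheus
```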

After the copy finishes, restart the Prometheus Pod. The migration is complete once the Pod starts normally.

# Enter the Prometheus container and check that the migration completed
/prometheus $ ls
01DRPGYK0X5XKF4DDRQ09T5WPY 01DS1WW6WNN6EG3QCNRK87X3YJ 01DSDFNF35TRFPRGVR5Y9C05HN 01DSS2ES0G0D4WJQ2ZATW73S49 01DSWY1JG4099RSG4GFQF07MJ0 01DSYE3KTWWCGX0MSY4ZNKRMDJ wal
01DRW3FJB9DEGMEKWEWP6TFMAK 01DS7P8SK48ZRAM6W2QE578QAP 01DSK921QVSVTZ03HA8JWZ9DR7 01DSV08CZJY5SNMHDSSCZP1N0X 01DSXJMP7FDV0W3MPGWYHBR909 01DSYE3N4ZMK606TWRENR57GG8

/prometheus $ df -h
Filesystem Size Used Available Use% Mounted on
/dev/rbd0 30.0G 1.9G 28.1G 6% /prometheus

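What to do when bad data prevents Prometheus from starting

When the Prometheus container that mounts the storage is failing, the bad data cannot be fixed through Prometheus's own mount. Instead, manually create a separate Pod that mounts the affected PVC and delete the bad data from there.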

apiVersion: v1
kind: Pod
metadata:
  name: busy-box-test1
  namespace: admin
spec:
  restartPolicy: OnFailure
  containers:
  - name: busy-box-test1
    image: busybox
    volumeMounts:
    - name: busy-box-test-pv1
      mountPath: /mnt/busy-box
    command: ["sleep", "60000"]
  volumes:
  - name: busy-box-test-pv1
    persistentVolumeClaim:
      # The PVC that holds the bad data
      claimName: prometheus-c2-monitor-prometheus-oper-prometheus-db-prometheus-c2-monitor-prometheus-oper-prometheus-0
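Starting the helper Pod and looking at the mounted volume might go roughly as follows. The function name and the manifest file name are placeholders; the Pod name, namespace, and mount path come from the manifest above.

```shell
# Create the helper pod, wait until it is Ready, then list the mounted data.
# busy-box-test1.yaml is a placeholder name for the manifest above.
inspect_pvc_with_helper() {
    local manifest="$1"
    kubectl apply -f "$manifest" || return 1
    kubectl wait --for=condition=Ready pod/busy-box-test1 -n admin --timeout=120s || return 1
    kubectl exec -n admin busy-box-test1 -- ls /mnt/busy-box
}

# Usage: inspect_pvc_with_helper busy-box-test1.yaml
```

The block directories may sit in a prometheus-db subdirectory of the mount rather than at its top level. Once the bad data is removed, delete the helper Pod with kubectl delete pod busy-box-test1 -n admin so that it no longer holds the volume.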

If the PV's access mode is ReadWriteOnce, the busybox Pod may fail to start. In that case, shut down the other Pods that mount this PVC and it will start.
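To find out which Pods currently mount the PVC, something like the following sketch could be used. The function name is hypothetical; it relies only on kubectl get with a jsonpath template and awk.

```shell
# Print the names of pods in a namespace that mount the given PVC.
pods_using_pvc() {
    local ns="$1" pvc="$2"
    # One line per pod: the pod name followed by the claim names of its volumes.
    kubectl get pods -n "$ns" -o jsonpath='{range .items[*]}{.metadata.name}{range .spec.volumes[*]}{" "}{.persistentVolumeClaim.claimName}{end}{"\n"}{end}' |
        awk -v pvc="$pvc" '{ for (i = 2; i <= NF; i++) if ($i == pvc) { print $1; break } }'
}

# Usage:
# pods_using_pvc admin prometheus-c2-monitor-prometheus-oper-prometheus-db-prometheus-c2-monitor-prometheus-oper-prometheus-0
```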