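Remote read/write with prometheus-operator and Prometheus in Kubernetes
- Advantages of remote read/write
- Environment preparation
- Starting the monitoring-data database
- Prometheus configuration
  - Prometheus RBAC definitions
  - Defining the Prometheus configuration file as a ConfigMap
  - Deploying Prometheus
  - Deploy
- prometheus-operator configuration
  - adapter.yaml
  - Adding remoteRead and remoteWrite to the Prometheus resource of prometheus-operator
  - Deploy
- Verification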
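Advantages of remote read/write
- Long-term persistence of monitoring data
- Centralized management of the monitoring data from multiple Prometheus instances
- A single, unified external endpoint for monitoring data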
Environment preparation
Prepare two Kubernetes clusters. Deploy plain Prometheus in one and prometheus-operator in the other.

| docker server version | kubernetes version |
| --- | --- |
| 1.13.1 | v1.8.6 |
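Starting the monitoring-data database
The examples below use OpenTSDB as the database that stores Prometheus metrics. Note: newer versions of OpenTSDB do not accept tag keys or values that contain ":". In the following query, for example, the host value localhost:9090 is not supported:

```json
{
  "aggregator": "sum",
  "metric": "cpu",
  "tags": {
    "demo-name": "opentsdb-test",
    "host": "localhost:9090",
    "try-name": "bluebreezecf-sample"
  }
}
```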
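To keep testing simple, start OpenTSDB as a container (the stackexchange/bosun image bundles OpenTSDB):

```shell
# 4242: OpenTSDB HTTP API, 8070: Bosun web UI
docker run -d -p 4242:4242 -p 8070:8070 stackexchange/bosun:0.6.0-pre
```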
Prometheus configuration
Prometheus RBAC definitions

```yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus-colletor
rules:
  - apiGroups:
      - '*'
    resources:
      - '*'
    verbs:
      - '*'
  - nonResourceURLs:
      - '*'
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus-colletor
roleRef:
  kind: ClusterRole
  # must be the name of the ClusterRole defined above
  name: prometheus-colletor
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: prometheus-colletor
    # namespace Prometheus runs in (kube-system, like the ConfigMap below)
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-colletor
  namespace: kube-system
```
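The ClusterRole above grants every verb on every resource. That is convenient for a test, but it is far more than Prometheus needs for Kubernetes service discovery. The sketch below shows a narrower, read-only role modeled on the RBAC example in the Prometheus repository. Its rules cover the `endpoints` and `service` discovery roles used by the scrape jobs below, plus GET on `/metrics` for the apiserver job. Treat it as a starting point: after switching to it, check the Prometheus log for "forbidden" errors. It reuses the name `prometheus-colletor`, so it can be applied in place of the wildcard role.

```yaml
# Read-only permissions for Kubernetes service discovery (a sketch; adjust as needed).
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus-colletor
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions"]
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  # Needed to scrape https://<apiserver>/metrics
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
```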
Defining the Prometheus configuration file as a ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      scrape_timeout: 30s
      evaluation_interval: 30s
    # The adapter runs as a sidecar in the same pod, so it is reachable on localhost
    remote_write:
      - url: http://127.0.0.1:9201/write
    remote_read:
      - url: http://127.0.0.1:9201/read
    rule_files:
      - "/etc/prometheus-rules/*.rules"
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      - job_name: 'kubernetes-services'
        metrics_path: /probe
        params:
          module: [http_2xx]
        kubernetes_sd_configs:
          - role: service
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
            action: keep
            regex: true
          - source_labels: [__address__]
            target_label: __param_target
          - target_label: __address__
            # address of a blackbox exporter (not deployed in this article)
            replacement: blackbox
          - source_labels: [__param_target]
            target_label: instance
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            target_label: kubernetes_name

      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
```
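OpenTSDB rejects ":" in tag values, and the `instance` label from Kubernetes service discovery normally has the form `host:port`. It can therefore help to rewrite that label before samples leave Prometheus. The sketch below covers only the `remote_write` part of `prometheus.yml`. It assumes the adapter passes label values through unchanged; check whether the adapter you use already escapes such characters. The `go_.*` drop rule is an optional, illustrative way to reduce the volume sent to OpenTSDB.

```yaml
remote_write:
  - url: http://127.0.0.1:9201/write
    write_relabel_configs:
      # Rewrite instance="10.0.0.1:9100" to instance="10.0.0.1_9100" (default action: replace).
      - source_labels: [instance]
        regex: '(.+):(\d+)'
        target_label: instance
        replacement: '${1}_${2}'
      # Optional: do not send Go runtime metrics to remote storage.
      - source_labels: [__name__]
        regex: 'go_.*'
        action: drop
```

`write_relabel_configs` affects only the copy that is sent to remote storage. The local TSDB keeps the original `instance` value, so a query that merges local data with remote_read results can see both forms as separate series.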
Deploying Prometheus
The prometheus.yaml file below contains both prometheus-server and prometheus-adapter. In this setup the adapter acts as a converter. On write, it turns the samples Prometheus has collected into database records. On read, it turns database records back into Prometheus's data format. The official OpenTSDB adapter does not provide remote_read, so we extended the official OpenTSDB library to add remote_read support.

```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: prometheus
    app: prometheus-server
  name: prometheus
  # must match the namespace of the prometheus-config ConfigMap and the Service below
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: prometheus
        app: prometheus-server
    spec:
      serviceAccountName: prometheus-colletor
      containers:
        - name: prometheus
          image: prom/prometheus:v2.3.1
          imagePullPolicy: IfNotPresent
          command:
            - "/bin/prometheus"
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
            - "--storage.tsdb.retention=168h"
            # - "--alertmanager.url=http://127.0.0.1:9093"
          ports:
            - containerPort: 9090
              protocol: TCP
          volumeMounts:
            - name: data
              mountPath: "/prometheus"
            - name: config-volume
              mountPath: "/etc/prometheus"
            - name: alert-roles
              mountPath: "/etc/prometheus-rules"
            - name: local-timezone
              mountPath: "/etc/localtime"
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 500m
              memory: 2500Mi
        - name: prometheus-adapter
          image: registry.cn-hangzhou.aliyuncs.com/tder/prometheus-adapter:v2.3
          ports:
            # the port used by remote_write/remote_read in prometheus.yml
            - containerPort: 9201
              protocol: TCP
          args:
            # xx.xx.xx.xx is the IP of the host running the database
            - -opentsdb-url=http://xx.xx.xx.xx:4242
        # - name: alertmanager
        #   image: prom/alertmanager:v0.15.0
        #   imagePullPolicy: IfNotPresent
        #   args:
        #     - '--config.file=/etc/alertmanager/config.yml'
        #     - '--storage.path=/alertmanager'
        #   ports:
        #     - containerPort: 9093
        #       protocol: TCP
        #   volumeMounts:
        #     - name: alert-conf
        #       mountPath: /etc/alertmanager
        #     - name: local-timezone
        #       mountPath: "/etc/localtime"
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
      volumes:
        - name: data
          hostPath:
            path: /prometheus
        - name: alert-roles
          hostPath:
            path: /etc/prometheus-rules
        - name: alert-conf
          hostPath:
            path: /etc/alertmanager
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: local-timezone
          hostPath:
            path: /etc/localtime
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-server
  namespace: kube-system
spec:
  selector:
    app: prometheus-server
  ports:
    - protocol: TCP
      port: 9090
      targetPort: 9090
      nodePort: 32005
  type: NodePort
```
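Note: newer Prometheus images run as an unprivileged user and fail with a permission error on the storage directory. For testing, you can make the corresponding host directory readable and writable by everyone. For the hostPath above: chmod 777 /prometheus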
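Instead of making the host directory world-writable, an init container can hand ownership of `/prometheus` to the user the Prometheus image runs as. This sketch makes two assumptions: that user is `nobody` with UID/GID 65534 (as in the official prom/prometheus 2.x images), and the node can pull `busybox`. The container name `fix-data-permissions` is hypothetical. Add the fragment under `spec.template.spec` of the Deployment above, next to `containers:`.

```yaml
initContainers:
  - name: fix-data-permissions   # hypothetical name
    image: busybox
    # Give the TSDB directory on the host to UID/GID 65534 (nobody) before Prometheus starts.
    command: ["chown", "-R", "65534:65534", "/prometheus"]
    volumeMounts:
      - name: data            # the hostPath volume defined in the Deployment
        mountPath: /prometheus
```

The init container runs as root by default, which is what allows the `chown`. The Prometheus container itself keeps running unprivileged.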
Deploy

```shell
kubectl apply -f prometheus.yaml
```
prometheus-operator configuration
adapter.yaml

```yaml
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: prometheus-adapter
  namespace: monitoring
  labels:
    app: prometheus-adapter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-adapter
  template:
    metadata:
      labels:
        app: prometheus-adapter
    spec:
      containers:
        - name: adapter
          image: registry.cn-hangzhou.aliyuncs.com/tder/prometheus-adapter:v2.3
          args:
            # yy.yy.yy.yy is the IP of the host running the database
            - -opentsdb-url=http://yy.yy.yy.yy:4242
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-adapter
  namespace: monitoring
spec:
  selector:
    app: prometheus-adapter
  ports:
    - protocol: TCP
      port: 9201
      targetPort: 9201
      nodePort: 30902
  type: NodePort
```
Adding remoteRead and remoteWrite to the Prometheus resource of prometheus-operator
The file to modify is prometheus-prometheus.yaml:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
      - name: alertmanager-main
        namespace: monitoring
        port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    beta.kubernetes.io/os: linux
  replicas: 2
  # xx.xx.xx.xx: IP of a node in this cluster; 30902 is the adapter Service's nodePort
  remoteRead:
    - url: http://xx.xx.xx.xx:30902/read
  remoteWrite:
    - url: http://xx.xx.xx.xx:30902/write
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.3.1
```
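The remoteRead/remoteWrite URLs above go through a NodePort, so they depend on one specific node IP. The adapter Service from adapter.yaml lives in the same `monitoring` namespace of the same cluster, so the Prometheus resource could instead address it by its cluster DNS name. The sketch below shows only the changed fields and assumes cluster DNS (kube-dns) is running.

```yaml
spec:
  remoteRead:
    - url: http://prometheus-adapter.monitoring.svc:9201/read
  remoteWrite:
    - url: http://prometheus-adapter.monitoring.svc:9201/write
```

With this change the adapter Service could be a plain ClusterIP Service. The NodePort is then only needed if Prometheus instances outside this cluster write through the same adapter.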
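Note: at the time of writing, the official OpenTSDB remote_write adapter has problems under prometheus-operator. After it has run for a while, writes to the database start failing.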
Deploy

```shell
kubectl apply -f adapter.yaml
kubectl apply -f prometheus-prometheus.yaml
```
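Verification
write
Check the adapter logs to see whether data is being written to the database.
read
Open the Prometheus web UI. By default, the Prometheus managed by prometheus-operator is not reachable from outside the cluster. To verify, change the Prometheus Service yourself so that port 9090 is exposed externally.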
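One way to expose the operator-managed Prometheus for this check is an extra NodePort Service that selects the Prometheus pods. This leaves the existing Services untouched. The sketch assumes the pods carry the labels `app: prometheus` and `prometheus: k8s` and expose a container port named `web`, which is how prometheus-operator sets up a Prometheus resource named `k8s`. The Service name and `nodePort: 30900` are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-k8s-nodeport   # hypothetical name
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: prometheus
    prometheus: k8s
  ports:
    - name: web
      protocol: TCP
      port: 9090
      targetPort: web
      nodePort: 30900   # hypothetical; any free port in the cluster's NodePort range
```

After applying it, open `http://<node-ip>:30900`. Query a metric over a time range older than the local retention period. Any results for that range can only have come through remote_read.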