高级配置
前面我们一起学习了如何在 Prometheus Operator 下面自定义一个监控项,以及自定义报警规则的使用。那么我们还能够直接使用前面课程中的自动发现功能吗?如果在我们的 Kubernetes 集群中有了很多的 Service/Pod,那么我们都需要一个一个的去建立一个对应的 ServiceMonitor
或 PodMonitor
对象来进行监控吗?这样岂不是又变得麻烦起来了?
自动发现配置
为解决上面的问题,Prometheus Operator 为我们提供了一个额外的抓取配置来解决这个问题,我们可以通过添加额外的配置来进行服务发现进行自动监控。和前面自定义的方式一样,我们可以在 Prometheus Operator 当中去自动发现并监控具有 prometheus.io/scrape=true
这个 annotations
的 Service,之前我们定义的 Prometheus 的配置如下:
- job_name: 'endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs: # 指标采集之前或采集过程中去重新配置
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep # 保留具有 prometheus.io/scrape=true 这个注解的Service
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels:
[__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+) # RE2 正则规则,+是一次多多次,?是0次或1次,其中?:表示非匹配组(意思就是不获取匹配结果)
replacement: $1:$2
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
replacement: $1
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_service
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod
- source_labels: [__meta_kubernetes_node_name]
action: replace
target_label: kubernetes_node
如果你对上面这个配置还不是很熟悉的话,建议去查看下前面关于 Kubernetes 常用资源对象监控章节的介绍,要想自动发现集群中的 Service,就需要我们在 Service 的 annotation 区域添加 prometheus.io/scrape=true
的声明,将上面文件直接保存为 prometheus-additional.yaml
,然后通过这个文件创建一个对应的 Secret 对象:
$ kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret "additional-configs" created
然后我们需要在声明 prometheus 的资源对象文件中通过 additionalScrapeConfigs
属性添加上这个额外的配置:
# prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/instance: k8s
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.35.0
name: k8s
namespace: monitoring
spec:
alerting:
alertmanagers:
- apiVersion: v2
name: alertmanager-main
namespace: monitoring
port: web
enableFeatures: []
externalLabels: {}
image: quay.io/prometheus/prometheus:v2.35.0
nodeSelector:
kubernetes.io/os: linux
podMetadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/instance: k8s
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.35.0
podMonitorNamespaceSelector: {}
podMonitorSelector: {}
probeNamespaceSelector: {}
probeSelector: {}
replicas: 2
resources:
requests:
memory: 400Mi
ruleNamespaceSelector: {}
ruleSelector: {}
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
version: 2.35.0
additionalScrapeConfigs:
name: additional-configs
key: prometheus-additional.yaml
关于 additionalScrapeConfigs
属性的具体介绍,我们可以使用 kubectl explain
命令来了解详细信息:
$ kubectl explain prometheus.spec.additionalScrapeConfigs
KIND: Prometheus
VERSION: monitoring.coreos.com/v1
RESOURCE: additionalScrapeConfigs <Object>
DESCRIPTION:
AdditionalScrapeConfigs allows specifying a key of a Secret containing
additional Prometheus scrape configurations. Scrape configurations
specified are appended to the configurations generated by the Prometheus
Operator. Job configurations specified must have the form as specified in
the official Prometheus documentation:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config.
As scrape configs are appended, the user is responsible to make sure it is
valid. Note that using this feature may expose the possibility to break
upgrades of Prometheus. It is advised to review Prometheus release notes to
ensure that no incompatible scrape configs are going to break Prometheus
after the upgrade.
FIELDS:
key <string> -required-
The key of the secret to select from. Must be a valid secret key.
name <string>
Name of the referent. More info:
https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO: Add other useful fields. apiVersion, kind, uid?
optional <boolean>
Specify whether the Secret or its key must be defined
添加完成后,直接更新 prometheus 这个 CRD 资源对象即可:
$ kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com "k8s" configured
隔一小会儿,可以前往 Prometheus 的 Dashboard 中查看配置已经生效了:
但是我们切换到 targets 页面下面却并没有发现对应的监控任务,查看 Prometheus 的 Pod 日志:
$ kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
......
ts=2022-05-26T09:34:30.845Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" at the cluster scope"
ts=2022-05-26T09:34:30.845Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" at the cluster scope"
ts=2022-05-26T09:34:40.552Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"
ts=2022-05-26T09:34:40.552Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"
可以看到有很多错误日志出现,都是 xxx is forbidden
,这说明是 RBAC 权限的问题,通过 prometheus 资源对象的配置可以知道 Prometheus 绑定了一个名为 prometheus-k8s
的 ServiceAccount 对象,而这个对象绑定的是一个名为 prometheus-k8s
的 ClusterRole:
# prometheus-clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/instance: k8s
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.35.0
name: prometheus-k8s
rules:
- apiGroups:
- ''
resources:
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
上面的权限规则中我们可以看到明显没有对 Service 或者 Pod 的 list
权限,所以报错了,要解决这个问题,我们只需要添加上需要的权限即可:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
app.kubernetes.io/component: prometheus
app.kubernetes.io/instance: k8s
app.kubernetes.io/name: prometheus
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 2.35.0
name: prometheus-k8s
rules:
- apiGroups:
- ''
resources:
- nodes
- services
- endpoints
- pods
- nodes/proxy
verbs:
- get
- list
- watch
- apiGroups:
- ''
resources:
- configmaps
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
更新上面的 ClusterRole
这个资源对象,然后重建下 Prometheus 的所有 Pod,正常就可以看到 targets 页面下面有 endpoints
这个监控任务了:
这里发现的几个抓取目标是因为 Service 中都有 prometheus.io/scrape=true
这个 annotation。
数据持久化
上面我们在修改完权限的时候,重启了 Prometheus 的 Pod,如果我们仔细观察的话会发现我们之前采集的数据已经没有了,这是因为我们通过 prometheus 这个 CRD 创建的 Prometheus 并没有做数据的持久化,我们可以直接查看生成的 Prometheus Pod 的挂载情况就清楚了:
$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
volumeMounts:
- mountPath: /prometheus
name: prometheus-k8s-db
......
volumes:
......
- emptyDir: {}
name: prometheus-k8s-db
......
我们可以看到 Prometheus 的数据目录 /prometheus
实际上是通过 emptyDir
进行挂载的,我们知道 emptyDir 挂载的数据的生命周期和 Pod 生命周期一致的,所以如果 Pod 挂掉了,数据也就丢失了,这也就是为什么我们重建 Pod 后之前的数据就没有了的原因,对应线上的监控数据肯定需要做数据的持久化的,同样的 prometheus 这个 CRD 资源也为我们提供了数据持久化的配置方法,由于我们的 Prometheus 最终是通过 Statefulset 控制器进行部署的,所以我们这里通过 StorageClass
来做数据持久化,此外由于 Prometheus 本身对 NFS 存储没有做相关的支持,所以线上一定不要用 NFS 来做数据持久化,对于如何去为 prometheus 这个 CRD 对象配置存储数据,我们可以去查看官方文档 API,也可以用 kubectl explain
命令去了解:
$ kubectl explain prometheus.spec.storage
KIND: Prometheus
VERSION: monitoring.coreos.com/v1
RESOURCE: storage <Object>
DESCRIPTION:
Storage spec to specify how storage shall be used.
FIELDS:
disableMountSubPath <boolean>
Deprecated: subPath usage will be disabled by default in a future release,
this option will become unnecessary. DisableMountSubPath allows to remove
any subPath usage in volume mounts.
emptyDir <Object>
EmptyDirVolumeSource to be used by the Prometheus StatefulSets. If
specified, used in place of any volumeClaimTemplate. More info:
https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
ephemeral <Object>
EphemeralVolumeSource to be used by the Prometheus StatefulSets. This is a
beta field in k8s 1.21, for lower versions, starting with k8s 1.19, it
requires enabling the GenericEphemeralVolume feature gate. More info:
https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes
volumeClaimTemplate <Object>
A PVC spec to be used by the Prometheus StatefulSets.
所以我们在 prometheus 的 CRD 对象中通过 storage
属性配置 volumeClaimTemplate
对象即可:
# prometheus-prometheus.yaml
......
storage:
volumeClaimTemplate:
spec:
storageClassName: local-path
resources:
requests:
storage: 20Gi
然后更新 prometheus 这个 CRD 资源,更新完成后会自动生成两个 PVC 和 PV 资源对象:
$ kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com/k8s configured
$ kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-4c67d0a9-e97c-4820-9c66-340a7da3c53f 20Gi RWO local-path 4s
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-6f26e85c-01c6-483c-9d42-55166f77e5d0 20Gi RWO local-path 4s
$ kubectl get pv |grep monitoring
pvc-4c67d0a9-e97c-4820-9c66-340a7da3c53f 20Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-0 local-path 17s
pvc-6f26e85c-01c6-483c-9d42-55166f77e5d0 20Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-1 local-path 17s
现在我们再去看 Prometheus Pod 的数据目录就可以看到是关联到一个 PVC 对象上了:
$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
volumeMounts:
- mountPath: /prometheus
name: prometheus-k8s-db
......
volumes:
- name: prometheus-k8s-db
persistentVolumeClaim:
claimName: prometheus-k8s-db-prometheus-k8s-0
......
现在即使我们的 Pod 挂掉了,数据也不会丢失了。到这里 Prometheus Operator 的一些基本配置就算完成了,对于大型的监控集群还需要做一些其他配置,比如前面我们学习的使用 Thanos
和 VictorialMetrics
来做 Prometheus 集群的高可用以及数据远程存储,对于 Prometheus Operator 来说,要配置 Thanos
也比较简单,因为 prometheus 这个 CRD 对象本身也支持的。
关于 prometheus operator 中如何配置 thanos,可以查看官方文档的介绍:https://github.com/coreos/prometheus-operator/blob/master/Documentation/thanos.md。
但是 Prometheus Operator 没有提供对 VictorialMetrics
的支持,不过 VM Operator 可以识别 Prometheus Operator 的 ServiceMonitor
、PodMonitor
、PrometheusRule
和 Probe
对象,如果我们使用的是 Prometheus Operator,然后想使用 VM 来做监控数据的远程存储的话,那我们只有通过去配置 Prometheus 的 remote-write 了,同样 prometheus 这个 crd 对象中也支持配置远程存储。
$ kubectl explain prometheus.spec.remoteWrite
KIND: Prometheus
VERSION: monitoring.coreos.com/v1
RESOURCE: remoteWrite <[]Object>
DESCRIPTION:
remoteWrite is the list of remote write configurations.
RemoteWriteSpec defines the configuration to write samples from Prometheus
to a remote endpoint.
FIELDS:
authorization <Object>
Authorization section for remote write
basicAuth <Object>
BasicAuth for the URL.
bearerToken <string>
Bearer token for remote write.
bearerTokenFile <string>
File to read bearer token for remote write.
headers <map[string]string>
Custom HTTP headers to be sent along with each remote write request. Be
aware that headers that are set by Prometheus itself can't be overwritten.
Only valid in Prometheus versions 2.25.0 and newer.
metadataConfig <Object>
MetadataConfig configures the sending of series metadata to the remote
storage.
name <string>
The name of the remote write queue, it must be unique if specified. The
name is used in metrics and logging in order to differentiate queues. Only
valid in Prometheus versions 2.15.0 and newer.
oauth2 <Object>
OAuth2 for the URL. Only valid in Prometheus versions 2.27.0 and newer.
proxyUrl <string>
Optional ProxyURL.
queueConfig <Object>
QueueConfig allows tuning of the remote write queue parameters.
remoteTimeout <string>
Timeout for requests to the remote write endpoint.
sendExemplars <boolean>
Enables sending of exemplars over remote write. Note that exemplar-storage
itself must be enabled using the enableFeature option for exemplars to be
scraped in the first place. Only valid in Prometheus versions 2.27.0 and
newer.
sigv4 <Object>
Sigv4 allows to configures AWS's Signature Verification 4
tlsConfig <Object>
TLS Config to use for remote write.
url <string> -required-
The URL of the endpoint to send samples to.
writeRelabelConfigs <[]Object>
The list of remote write relabel configurations.
这里我们就以前面 VM Operator 章节的 VM 集群为例来作为 Prometheus 的远程存储,整个集群的状态如下所示:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
grafana-fc7c7d476-5wgdh 1/1 Running 4 (30m ago) 8d
vmagent-vmagent-demo-6cb84cfc55-6p4sv 2/2 Running 4 (30m ago) 6d6h
vminsert-vmcluster-demo-6bbd664f6f-dn82f 1/1 Running 2 (30m ago) 6d5h
vminsert-vmcluster-demo-6bbd664f6f-xnx2c 1/1 Running 2 (30m ago) 6d5h
vmselect-vmcluster-demo-0 1/1 Running 2 (30m ago) 6d5h
vmselect-vmcluster-demo-1 1/1 Running 2 (30m ago) 6d5h
vmstorage-vmcluster-demo-0 1/1 Running 2 (30m ago) 6d5h
vmstorage-vmcluster-demo-1 1/1 Running 2 (30m ago) 6d5h
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana ClusterIP 10.106.109.228 <none> 80/TCP 8d
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 61d
vmagent-vmagent-demo ClusterIP 10.106.129.4 <none> 8429/TCP 8d
vminsert-vmcluster-demo ClusterIP 10.97.108.211 <none> 8480/TCP 8d
vmselect-vmcluster-demo ClusterIP None <none> 8481/TCP 8d
vmstorage-vmcluster-demo ClusterIP None <none> 8482/TCP,8400/TCP,8401/TCP 8d
我们只需要将 vminsert
组件地址作为 Prometheus 的远程存储地址写入即可,在 prometheus-prometheus.yaml
文件中添加如下所示配置:
# prometheus-prometheus.yaml
......
remoteWrite:
- url: http://vminsert-vmcluster-demo.default:8480/insert/0/prometheus/
然后更新 prometheus 对象即可:
$ kubectl apply -f prometheus-prometheus.yaml
更新后 Prometheus 实例就会将数据远程写入到 VM 集群中去了,我们可以通过 vmselect
组件提供的 vmui
来验证数据是否接收到了:
图上我们查询的 node_load1{prometheus="monitoring/k8s", instance="node2"}
可以看到有两条一样的序列,这是因为有两个 Prometheus 实例都在将数据远程写入到 VM 中去,要想去重可以在 vmselect
与 vmstorage
组件中配置 -dedup.minScrapeInterval
参数,多复制因子模式下默认配置了该参数的。
这是因为 VM 去重机制的问题,我们需要将 Prometheus 两个实例的共同的额外标签清理掉才可以,只需要设置 replicaExternalLabelName
属性为空即可:
remoteWrite:
- url: http://vminsert-vmcluster-demo.default:8480/insert/0/prometheus/
replicaExternalLabelName: ''
更新后 Prometheus 全局配置中就会去掉默认的 prometheus_replica
标签了:
global:
scrape_interval: 30s
scrape_timeout: 10s
evaluation_interval: 30s
external_labels:
prometheus: monitoring/k8s
这个时候再去 vmui
中查看数据就已经去重了,只保留了一份数据:
关于 Prometheus Operator 的其他高级用法可以参考官方文档 https://prometheus-operator.dev 了解更多信息。