
Microservices Mode

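Earlier we covered two ways to deploy Loki: monolithic mode and read/write separation (simple scalable) mode. Once your log volume grows beyond the terabyte range per day, you will probably need to deploy Loki in microservices mode instead.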

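In microservices mode, each Loki component runs as a separate process, and each process is started with its target specified. Every component exposes a gRPC server for internal requests and an HTTP server for external API requests. The components are: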

  • ingester
  • distributor
  • query-frontend
  • query-scheduler
  • querier
  • index-gateway
  • ruler
  • compactor
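In microservices mode, every component in the list above is the same loki binary, started with a different `-target` flag. The sketch below prints the arguments for a given component. `loki_args` is a hypothetical helper. The config path `/etc/loki/config/config.yaml` is an assumption, so check the `args` of the Pods the Chart creates.

```shell
# Hypothetical helper: print the arguments a microservices-mode Loki process
# is started with. Only the component names from the list above are accepted.
# The config path is an assumption; verify it against your Pod spec.
LOKI_CONFIG_FILE=/etc/loki/config/config.yaml

loki_args() {
  case "$1" in
    ingester|distributor|query-frontend|query-scheduler|querier|index-gateway|ruler|compactor)
      printf '%s\n' "-config.file=$LOKI_CONFIG_FILE -target=$1"
      ;;
    *)
      echo "unknown component: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `loki_args querier` prints `-config.file=/etc/loki/config/config.yaml -target=querier`. The gateway used later in this article is rejected because it is a separate nginx Deployment, not a Loki target.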

[Figure: Loki microservices mode]

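Running components as separate microservices lets you scale out by increasing the number of instances of each microservice, and a customized cluster gives better observability into individual components. Microservices mode is the most efficient way to run Loki, but it is also the most complex to set up and maintain.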

Microservices mode is recommended for very large Loki clusters, and for clusters that need finer control over scaling and cluster operations.

Microservices mode is best suited to Kubernetes. It can be installed with either Jsonnet or a Helm Chart.

Installation

As before, we install Loki in microservices mode with a Helm Chart. Before installing, remember to delete the Loki services installed in the previous sections.

First, fetch the Chart for microservices mode:

$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm pull grafana/loki-distributed --untar --version 0.48.4
$ cd loki-distributed

The Chart supports the components shown in the table below. The ingester, distributor, querier, and query-frontend components are always installed; the others are optional.

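Component                 Optional?   Enabled by default?
gateway                   yes         yes
ingester                  n/a         n/a
distributor               n/a         n/a
querier                   n/a         n/a
query-frontend            n/a         n/a
table-manager             yes         no
compactor                 yes         no
ruler                     yes         no
index-gateway             yes         no
memcached-chunks          yes         no
memcached-frontend        yes         no
memcached-index-queries   yes         no
memcached-index-writes    yes         no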

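This Chart configures Loki in microservices mode and has been tested with boltdb-shipper and memberlist. Other storage and discovery options also work. However, the Chart does not support setting up Consul or Etcd for discovery; those must be configured separately. Instead, you can use memberlist, which does not need a separate key/value store. By default the Chart creates a headless Service for the memberlist, and the ingester, distributor, querier, and ruler are members of it.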

Installing MinIO

Here we use memberlist and boltdb-shipper, with MinIO as the storage backend. This Chart does not include MinIO, so we need to install it separately first:

$ helm repo add minio https://helm.min.io/
$ helm pull minio/minio --untar --version 8.0.10
$ cd minio

Create a values file like the following:

# ci/loki-values.yaml
accessKey: 'myaccessKey'
secretKey: 'mysecretKey'

persistence:
  enabled: true
  storageClass: 'local-path'
  accessMode: ReadWriteOnce
  size: 5Gi

service:
  type: NodePort
  port: 9000
  nodePort: 32000

resources:
  requests:
    memory: 1Gi

Install MinIO with the values file above:

$ helm upgrade --install minio -n logging -f ci/loki-values.yaml .
Release "minio" does not exist. Installing it now.
NAME: minio
LAST DEPLOYED: Sun Jun 19 16:56:28 2022
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Minio can be accessed via port 9000 on the following DNS name from within your cluster:
minio.logging.svc.cluster.local

To access Minio from localhost, run the below commands:

1. export POD_NAME=$(kubectl get pods --namespace logging -l "release=minio" -o jsonpath="{.items[0].metadata.name}")

2. kubectl port-forward $POD_NAME 9000 --namespace logging

Read more about port forwarding here: http://kubernetes.io/docs/user-guide/kubectl/kubectl_port-forward/

You can now access Minio server on http://localhost:9000. Follow the below steps to connect to Minio server with mc client:

1. Download the Minio mc client - https://docs.minio.io/docs/minio-client-quickstart-guide

2. Get the ACCESS_KEY=$(kubectl get secret minio -o jsonpath="{.data.accesskey}" | base64 --decode) and the SECRET_KEY=$(kubectl get secret minio -o jsonpath="{.data.secretkey}" | base64 --decode)

3. mc alias set minio-local http://localhost:9000 "$ACCESS_KEY" "$SECRET_KEY" --api s3v4

4. mc ls minio-local

Alternately, you can use your browser or the Minio SDK to access the server - https://docs.minio.io/categories/17

After installation, check the Pod and Service status:

$ kubectl get pods -n logging
NAME READY STATUS RESTARTS AGE
minio-548656f786-gctk9 1/1 Running 0 2m45s
$ kubectl get svc -n logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
minio NodePort 10.111.58.196 <none> 9000:32000/TCP 3h16m

MinIO is now reachable on NodePort 32000:

[Figure: MinIO web console]

Then remember to create a bucket named loki-data.

Installing Loki

With the object storage ready, we can install Loki in microservices mode. First create a values file like the following:

# ci/minio-values.yaml
loki:
  structuredConfig:
    ingester:
      max_transfer_retries: 0
      chunk_idle_period: 1h
      chunk_target_size: 1536000
      max_chunk_age: 1h
    storage_config:
      aws:
        endpoint: minio.logging.svc.cluster.local:9000
        insecure: true
        bucketnames: loki-data
        access_key_id: myaccessKey
        secret_access_key: mysecretKey
        s3forcepathstyle: true
      boltdb_shipper:
        shared_store: s3
    schema_config:
      configs:
        - from: 2022-06-21
          store: boltdb-shipper
          object_store: s3
          schema: v12
          index:
            prefix: loki_index_
            period: 24h

distributor:
  replicas: 2

ingester:
  replicas: 2
  persistence:
    enabled: true
    size: 1Gi
    storageClass: local-path

querier:
  replicas: 2
  persistence:
    enabled: true
    size: 1Gi
    storageClass: local-path

queryFrontend:
  replicas: 2

gateway:
  nginxConfig:
    httpSnippet: |-
      client_max_body_size 100M;
    serverSnippet: |-
      client_max_body_size 100M;

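The configuration above selectively overrides the defaults in the loki.config template. loki.structuredConfig lets you set most configuration parameters from outside the template. loki.config, loki.schemaConfig, and loki.storageConfig can also be combined with loki.structuredConfig, and values in loki.structuredConfig take precedence.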

Here loki.structuredConfig.storage_config.aws points Loki at MinIO for storing data. For high availability, each core component runs 2 replicas, and the ingester and querier get persistent storage.
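The three ingester settings in structuredConfig decide when an in-memory chunk is flushed to MinIO. A chunk is flushed in any of three cases:

- it has received no new entries for `chunk_idle_period`;
- it is older than `max_chunk_age`;
- it reaches `chunk_target_size` (1536000 bytes, about 1.5 MB compressed).

The sketch below is a simplified model of that decision, not Loki's actual code. `should_flush` is a hypothetical name, and the thresholds are copied from ci/minio-values.yaml.

```shell
# Simplified model of the ingester flush decision, using the thresholds from
# ci/minio-values.yaml. Times are in seconds, size in compressed bytes.
CHUNK_IDLE_PERIOD=3600    # chunk_idle_period: 1h
MAX_CHUNK_AGE=3600        # max_chunk_age: 1h
CHUNK_TARGET_SIZE=1536000 # chunk_target_size

# should_flush IDLE_SECONDS AGE_SECONDS SIZE_BYTES
# Prints the reason and returns 0 if the chunk would be flushed, otherwise returns 1.
should_flush() {
  if [ "$1" -ge "$CHUNK_IDLE_PERIOD" ]; then echo idle; return 0; fi
  if [ "$2" -ge "$MAX_CHUNK_AGE" ]; then echo max_age; return 0; fi
  if [ "$3" -ge "$CHUNK_TARGET_SIZE" ]; then echo full; return 0; fi
  return 1
}
```

With both periods set to 1h, chunks from a low-volume stream can stay in the ingesters for up to about an hour before they appear in the bucket.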

Now install everything in one step with the values file above:

$ helm upgrade --install loki -n logging -f ci/minio-values.yaml .
Release "loki" does not exist. Installing it now.
NAME: loki
LAST DEPLOYED: Tue Jun 21 16:20:10 2022
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
Welcome to Grafana Loki
Chart version: 0.48.4
Loki version: 2.5.0
***********************************************************************

Installed components:
* gateway
* ingester
* distributor
* querier
* query-frontend

This installs the gateway, ingester, distributor, querier, and query-frontend components. The corresponding Pods and Services look like this:

$ kubectl get pods -n logging
NAME READY STATUS RESTARTS AGE
loki-loki-distributed-distributor-5dfdd5bd78-nxdq8 1/1 Running 0 2m40s
loki-loki-distributed-distributor-5dfdd5bd78-rh4gz 1/1 Running 0 116s
loki-loki-distributed-gateway-6f4cfd898c-hpszv 1/1 Running 0 21m
loki-loki-distributed-ingester-0 1/1 Running 0 96s
loki-loki-distributed-ingester-1 1/1 Running 0 2m38s
loki-loki-distributed-querier-0 1/1 Running 0 2m2s
loki-loki-distributed-querier-1 1/1 Running 0 2m33s
loki-loki-distributed-query-frontend-6d9845cb5b-p4vns 1/1 Running 0 4s
loki-loki-distributed-query-frontend-6d9845cb5b-sq5hr 1/1 Running 0 2m40s
minio-548656f786-gctk9 1/1 Running 1 (123m ago) 47h
$ kubectl get svc -n logging
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
loki-loki-distributed-distributor ClusterIP 10.102.156.127 <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-gateway ClusterIP 10.111.73.138 <none> 80/TCP 22m
loki-loki-distributed-ingester ClusterIP 10.98.238.236 <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-ingester-headless ClusterIP None <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-memberlist ClusterIP None <none> 7946/TCP 22m
loki-loki-distributed-querier ClusterIP 10.101.117.137 <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-querier-headless ClusterIP None <none> 3100/TCP,9095/TCP 22m
loki-loki-distributed-query-frontend ClusterIP None <none> 3100/TCP,9095/TCP,9096/TCP 22m
minio NodePort 10.111.58.196 <none> 9000:32000/TCP 47h

The resulting Loki configuration file looks like this:

$ kubectl get cm -n logging loki-loki-distributed -o yaml
apiVersion: v1
data:
  config.yaml: |
    auth_enabled: false
    chunk_store_config:
      max_look_back_period: 0s
    compactor:
      shared_store: filesystem
    distributor:
      ring:
        kvstore:
          store: memberlist
    frontend:
      compress_responses: true
      log_queries_longer_than: 5s
      tail_proxy_url: http://loki-loki-distributed-querier:3100
    frontend_worker:
      frontend_address: loki-loki-distributed-query-frontend:9095
    ingester:
      chunk_block_size: 262144
      chunk_encoding: snappy
      chunk_idle_period: 1h
      chunk_retain_period: 1m
      chunk_target_size: 1536000
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      max_chunk_age: 1h
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal
    limits_config:
      enforce_metric_name: false
      max_cache_freshness_per_query: 10m
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      split_queries_by_interval: 15m
    memberlist:
      join_members:
      - loki-loki-distributed-memberlist
    query_range:
      align_queries_with_step: true
      cache_results: true
      max_retries: 5
      results_cache:
        cache:
          enable_fifocache: true
          fifocache:
            max_size_items: 1024
            validity: 24h
    ruler:
      alertmanager_url: https://alertmanager.xx
      external_url: https://alertmanager.xx
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
      storage:
        local:
          directory: /etc/loki/rules
        type: local
    schema_config:
      configs:
      - from: "2022-06-21"
        index:
          period: 24h
          prefix: loki_index_
        object_store: s3
        schema: v12
        store: boltdb-shipper
    server:
      http_listen_port: 3100
    storage_config:
      aws:
        access_key_id: myaccessKey
        bucketnames: loki-data
        endpoint: minio.logging.svc.cluster.local:9000
        insecure: true
        s3forcepathstyle: true
        secret_access_key: mysecretKey
      boltdb_shipper:
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 168h
        shared_store: s3
      filesystem:
        directory: /var/loki/chunks
    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s
kind: ConfigMap
# ......

As in the previous mode, a gateway component routes each request to the right component. It is simply an nginx service, configured as follows:

$ kubectl -n logging exec -it loki-loki-distributed-gateway-6f4cfd898c-hpszv -- cat /etc/nginx/nginx.conf
worker_processes 5; ## Default: 1
error_log /dev/stderr;
pid /tmp/nginx.pid;
worker_rlimit_nofile 8192;

events {
  worker_connections 4096; ## Default: 1024
}

http {
  client_body_temp_path /tmp/client_temp;
  proxy_temp_path /tmp/proxy_temp_path;
  fastcgi_temp_path /tmp/fastcgi_temp;
  uwsgi_temp_path /tmp/uwsgi_temp;
  scgi_temp_path /tmp/scgi_temp;

  default_type application/octet-stream;
  log_format main '$remote_addr - $remote_user [$time_local] $status '
    '"$request" $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for"';
  access_log /dev/stderr main;

  sendfile on;
  tcp_nopush on;
  resolver kube-dns.kube-system.svc.cluster.local;

  client_max_body_size 100M;

  server {
    listen 8080;

    location = / {
      return 200 'OK';
      auth_basic off;
    }

    location = /api/prom/push {
      proxy_pass http://loki-loki-distributed-distributor.logging.svc.cluster.local:3100$request_uri;
    }

    location = /api/prom/tail {
      proxy_pass http://loki-loki-distributed-querier.logging.svc.cluster.local:3100$request_uri;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
    }

    # Ruler
    location ~ /prometheus/api/v1/alerts.* {
      proxy_pass http://loki-loki-distributed-ruler.logging.svc.cluster.local:3100$request_uri;
    }
    location ~ /prometheus/api/v1/rules.* {
      proxy_pass http://loki-loki-distributed-ruler.logging.svc.cluster.local:3100$request_uri;
    }
    location ~ /api/prom/rules.* {
      proxy_pass http://loki-loki-distributed-ruler.logging.svc.cluster.local:3100$request_uri;
    }
    location ~ /api/prom/alerts.* {
      proxy_pass http://loki-loki-distributed-ruler.logging.svc.cluster.local:3100$request_uri;
    }

    location ~ /api/prom/.* {
      proxy_pass http://loki-loki-distributed-query-frontend.logging.svc.cluster.local:3100$request_uri;
    }

    location = /loki/api/v1/push {
      proxy_pass http://loki-loki-distributed-distributor.logging.svc.cluster.local:3100$request_uri;
    }

    location = /loki/api/v1/tail {
      proxy_pass http://loki-loki-distributed-querier.logging.svc.cluster.local:3100$request_uri;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";
    }

    location ~ /loki/api/.* {
      proxy_pass http://loki-loki-distributed-query-frontend.logging.svc.cluster.local:3100$request_uri;
    }

    client_max_body_size 100M;
  }
}
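The location rules above can be summarized as a small lookup. The hypothetical `gateway_upstream` function below mimics how nginx selects a location:

- exact-match (`location =`) blocks are checked first;
- then the regex (`location ~`) blocks are checked in the order they appear;
- matching uses the path only, ignoring the query string.

It prints the suffix of the target Service name after `loki-loki-distributed-`. It is only an aid for reading the config, not something the gateway runs.

```shell
# Mimic the gateway's nginx location selection for a request URI.
# Prints the target component, "gateway" for the local health check, or "none".
gateway_upstream() {
  local uri
  uri=${1%%[?]*}   # strip the query string; nginx matches locations on the path
  # 1. exact-match locations ("location =")
  case "$uri" in
    /) echo gateway; return ;;
    /api/prom/push|/loki/api/v1/push) echo distributor; return ;;
    /api/prom/tail|/loki/api/v1/tail) echo querier; return ;;
  esac
  # 2. regex locations ("location ~"), unanchored, in file order
  case "$uri" in
    */prometheus/api/v1/alerts*|*/prometheus/api/v1/rules*|*/api/prom/rules*|*/api/prom/alerts*)
      echo ruler ;;
    */api/prom/*|*/loki/api/*)
      echo query-frontend ;;
    *)
      echo none ;;
  esac
}
```

For example, `gateway_upstream /loki/api/v1/query_range` prints `query-frontend`, while `/loki/api/v1/push` goes to the distributor.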

This configuration shows that the push endpoints /api/prom/push and /loki/api/v1/push are forwarded to http://loki-loki-distributed-distributor.logging.svc.cluster.local:3100$request_uri, which is the distributor Service:

$ kubectl get pods -n logging -l app.kubernetes.io/component=distributor,app.kubernetes.io/instance=loki,app.kubernetes.io/name=loki-distributed
NAME READY STATUS RESTARTS AGE
loki-loki-distributed-distributor-5dfdd5bd78-nxdq8 1/1 Running 0 8m20s
loki-loki-distributed-distributor-5dfdd5bd78-rh4gz 1/1 Running 0 7m36s

So to write log data, we now send it to the gateway's push endpoint. To verify that the deployment works, we next install Promtail and Grafana to write and read data.
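To check the write path by hand, you can push a single line to the gateway yourself. The function below builds a request body in the JSON format accepted by /loki/api/v1/push: a `streams` array of label sets with `[timestamp_ns, line]` pairs. This is a minimal sketch with some limits:

- `push_payload` is a hypothetical helper.
- It assumes GNU date for nanosecond timestamps.
- It does no JSON escaping, so the job label and the line must not contain `"` or `\`.

```shell
# push_payload JOB LINE [TIMESTAMP_NS]
# Prints a Loki push request body with one stream labelled {job="JOB"}.
# TIMESTAMP_NS defaults to "now" (GNU date's %N gives nanoseconds).
push_payload() {
  local ts
  ts=${3:-$(date +%s%N)}
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' "$1" "$ts" "$2"
}

# Usage (the gateway Service listens on port 80 and forwards to nginx on 8080):
#   kubectl -n logging port-forward svc/loki-loki-distributed-gateway 8080:80
#   curl -s -o /dev/null -w '%{http_code}\n' -H 'Content-Type: application/json' \
#     --data "$(push_payload manual hello)" http://127.0.0.1:8080/loki/api/v1/push
```

A successful push returns HTTP 204. These are the same 204 responses that show up for promtail's requests in the gateway log.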

Installing Promtail

Fetch the promtail Chart and unpack it:

$ helm pull grafana/promtail --untar
$ cd promtail

Create a values file like the following:

# ci/minio-values.yaml
rbac:
  pspEnabled: false
config:
  clients:
    - url: http://loki-loki-distributed-gateway/loki/api/v1/push

Note that the Loki address configured in Promtail is http://loki-loki-distributed-gateway/loki/api/v1/push. Promtail sends log data to the gateway first, and the gateway forwards it to the distributor according to its location rules. Install Promtail with this values file:

$ helm upgrade --install promtail -n logging -f ci/minio-values.yaml .
Release "promtail" does not exist. Installing it now.
NAME: promtail
LAST DEPLOYED: Tue Jun 21 16:31:34 2022
NAMESPACE: logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
Welcome to Grafana Promtail
Chart version: 5.1.0
Promtail version: 2.5.0
***********************************************************************

Verify the application is working by running these commands:

* kubectl --namespace logging port-forward daemonset/promtail 3101
* curl http://127.0.0.1:3101/metrics

Once the installation succeeds, one promtail Pod runs on every node:

$ kubectl get pods -n logging -l app.kubernetes.io/name=promtail
NAME READY STATUS RESTARTS AGE
promtail-gbjzs 1/1 Running 0 38s
promtail-gjn5p 1/1 Running 0 38s
promtail-z6vhd 1/1 Running 0 38s

promtail now collects the logs of all containers on its node and pushes them to the gateway, which forwards them to the distributor. We can check the gateway logs:

$ kubectl logs -f loki-loki-distributed-gateway-6f4cfd898c-hpszv -n logging
10.244.2.26 - - [21/Jun/2022:08:41:24 +0000] 204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.244.2.1 - - [21/Jun/2022:08:41:24 +0000] 200 "GET / HTTP/1.1" 2 "-" "kube-probe/1.22" "-"
10.244.2.26 - - [21/Jun/2022:08:41:25 +0000] 204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
10.244.1.28 - - [21/Jun/2022:08:41:26 +0000] 204 "POST /loki/api/v1/push HTTP/1.1" 0 "-" "promtail/2.5.0" "-"
......
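With the `log_format main` defined in the gateway's nginx.conf, the status code is the sixth whitespace-separated field of each access-log line. The hypothetical `push_status_counts` helper below uses awk to count push requests by status. It is a quick way to confirm that writes are being accepted (204) rather than rejected.

```shell
# push_status_counts: read gateway access-log lines on stdin and print
# "STATUS COUNT" for POST requests to the two push endpoints.
# Fields per log_format main: $6 = status, $7 = "METHOD, $8 = path.
push_status_counts() {
  awk '$7 == "\"POST" && ($8 == "/api/prom/push" || $8 == "/loki/api/v1/push") { n[$6]++ }
       END { for (s in n) print s, n[s] }' | sort
}

# Usage:
#   kubectl -n logging logs deploy/loki-loki-distributed-gateway | push_status_counts
```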

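The gateway keeps receiving /loki/api/v1/push requests, which are sent by promtail. The distributor passes these logs on to the ingesters, which flush them as chunks to MinIO, so MinIO should now contain log data. During installation we exposed the MinIO service on NodePort 32000: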

[Figure: chunks in the loki-data bucket]

This shows that data is being written correctly.
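Besides the chunks, boltdb-shipper also uploads index files to the bucket, grouped by index table. With `prefix: loki_index_` and `period: 24h`, each day gets its own table, named after the number of whole periods since the Unix epoch. The sketch below computes the table name for a UTC date. `index_table_for` is a hypothetical helper, and it relies on GNU `date -d`.

```shell
# index_table_for YYYY-MM-DD [PREFIX] [PERIOD_SECONDS]
# Prints the periodic index table name for a UTC date:
# PREFIX + floor(unix_seconds / PERIOD_SECONDS).
index_table_for() {
  local secs
  secs=$(date -u -d "$1" +%s) || return 1
  echo "${2:-loki_index_}$(( secs / ${3:-86400} ))"
}
```

For the schema start date used here, `index_table_for 2022-06-21` prints `loki_index_19164`.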

Installing Grafana

Next, let's verify the read path by installing Grafana and connecting it to Loki:

$ helm pull grafana/grafana --untar
$ cd grafana

Create a values file like the following:

# ci/minio-values.yaml
service:
  type: NodePort
  nodePort: 32001
rbac:
  pspEnabled: false
persistence:
  enabled: true
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  size: 1Gi

Install Grafana with the values file above:

$ helm upgrade --install grafana -n logging -f ci/minio-values.yaml .
Release "grafana" does not exist. Installing it now.
NAME: grafana
LAST DEPLOYED: Tue Jun 21 16:47:54 2022
NAMESPACE: logging
STATUS: deployed
REVISION: 1
NOTES:
1. Get your 'admin' user password by running:

kubectl get secret --namespace logging grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:

grafana.logging.svc.cluster.local

Get the Grafana URL to visit by running these commands in the same shell:
export NODE_PORT=$(kubectl get --namespace logging -o jsonpath="{.spec.ports[0].nodePort}" services grafana)
export NODE_IP=$(kubectl get nodes --namespace logging -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT


3. Login with the password from step 1 and the username: admin

Get the login password with the command shown in the notes above:

$ kubectl get secret --namespace logging grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Then log in to Grafana with that password and the username admin:

[Figure: Grafana login page]

After logging in, add a data source in Grafana. Note that the URL must be the gateway address, http://loki-loki-distributed-gateway:

[Figure: Loki data source settings]
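Instead of using the UI, the data source can also be created through Grafana's HTTP API (POST /api/datasources). The sketch below only builds the JSON body.

- The data source name `Loki` is an arbitrary choice.
- `datasource_payload` is a hypothetical helper.
- The usage lines assume a port-forward to the grafana Service, plus the admin password retrieved above.

```shell
# datasource_payload NAME URL
# Prints a minimal Grafana data source definition for Loki,
# accessed through the Grafana server ("proxy" access mode).
datasource_payload() {
  printf '{"name":"%s","type":"loki","access":"proxy","url":"%s"}\n' "$1" "$2"
}

# Usage (assumes `kubectl -n logging port-forward svc/grafana 3000:80` is running
# and GRAFANA_PASSWORD holds the admin password):
#   curl -s -u "admin:$GRAFANA_PASSWORD" -H 'Content-Type: application/json' \
#     --data "$(datasource_payload Loki http://loki-loki-distributed-gateway)" \
#     http://127.0.0.1:3000/api/datasources
```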

After saving the data source, open the Explore page to filter logs. Here, for example, we follow the logs of the gateway application in real time:

[Figure: gateway logs in Grafana Explore]
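The same data can be read without Grafana by calling the query API through the gateway, which forwards /loki/api/v1/query_range to the query-frontend. `query_range` accepts `start` and `end` as Unix timestamps in nanoseconds. The hypothetical `last_minutes_ns` helper below computes them for a window ending now. The label selector in the usage comment is only an example; use whatever labels your Promtail attaches.

```shell
# last_minutes_ns MINUTES [NOW_SECONDS]
# Prints "START END" as Unix timestamps in nanoseconds covering the last MINUTES minutes.
last_minutes_ns() {
  local now
  now=${2:-$(date +%s)}
  echo "$(( (now - $1 * 60) * 1000000000 )) $(( now * 1000000000 ))"
}

# Usage (assumes `kubectl -n logging port-forward svc/loki-loki-distributed-gateway 8080:80`
# is running; the label selector is only an example):
#   window=$(last_minutes_ns 5)
#   curl -s -G http://127.0.0.1:8080/loki/api/v1/query_range \
#     --data-urlencode 'query={namespace="logging"}' \
#     --data-urlencode "start=${window% *}" --data-urlencode "end=${window#* }" \
#     --data-urlencode 'limit=20'
```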

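If you can see the latest log data, Loki has been deployed successfully in microservices mode. This mode is very flexible, because each component can be scaled up or down independently as needed, but it also adds considerable operational cost.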

Caching

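Loki is now running in microservices mode, but with very large log volumes we may need to enable caching to improve query performance. The grafana/loki-distributed Chart used here can configure Memcached instances for the various caches Loki uses, and all of these caches are configured the same way. However, the Chart's Memcached setup currently has some problems, so here we use Redis for caching instead.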

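First we need a working Redis service. Deploy it with the manifest below. The ServiceMonitor resource is a Prometheus Operator CRD, so it only applies if the Prometheus Operator is installed.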

# loki-redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: logging
spec:
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:4
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - containerPort: 6379
        - name: redis-exporter # only exposes Prometheus metrics for the redis container
          image: oliver006/redis_exporter:latest
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
          ports:
            - containerPort: 9121
---
kind: Service
apiVersion: v1
metadata:
  name: redis
  namespace: logging
  labels:
    app: redis
spec:
  selector:
    app: redis
  ports:
    - name: redis
      port: 6379
      targetPort: 6379
    - name: prom
      port: 9121
      targetPort: 9121
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis
  namespace: logging
spec:
  endpoints:
    - port: prom
  selector:
    matchLabels:
      app: redis

Once Redis is deployed, add it as a cache for the Loki cluster with a custom values file like the following:

# ci/cache-values.yaml
loki:
  config: |
    auth_enabled: false

    server:
      http_listen_port: 3100

    distributor:
      ring:
        kvstore:
          store: memberlist

    memberlist:
      join_members:
        - {{ include "loki.fullname" . }}-memberlist

    ingester:
      lifecycler:
        ring:
          kvstore:
            store: memberlist
          replication_factor: 1
      chunk_idle_period: 30m
      chunk_block_size: 262144
      chunk_retain_period: 1m
      max_transfer_retries: 0
      wal:
        dir: /var/loki/wal

    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h
      ingestion_burst_size_mb: 20
      ingestion_rate_mb: 10
      split_queries_by_interval: 24h

    schema_config: # same schema as the earlier install, so existing data in MinIO stays readable
      configs:
        - from: 2022-06-21
          store: boltdb-shipper
          object_store: s3
          schema: v12
          index:
            prefix: loki_index_
            period: 24h

    storage_config:
      aws:
        access_key_id: myaccessKey
        bucketnames: loki-data
        endpoint: minio.logging.svc.cluster.local:9000
        insecure: true
        s3forcepathstyle: true
        secret_access_key: mysecretKey
      boltdb_shipper:
        shared_store: s3
        active_index_directory: /var/loki/index
        cache_location: /var/loki/cache
        cache_ttl: 168h
      filesystem:
        directory: /var/loki/chunks
      index_queries_cache_config:
        redis:
          endpoint: redis:6379
          expiration: 1h

    chunk_store_config:
      max_look_back_period: 0
      chunk_cache_config:
        redis:
          endpoint: redis:6379
          expiration: 1h
      write_dedupe_cache_config:
        redis:
          endpoint: redis:6379
          expiration: 1h

    query_range:
      cache_results: true
      results_cache:
        cache:
          redis:
            endpoint: redis:6379
            expiration: 1h

    frontend_worker:
      frontend_address: {{ include "loki.queryFrontendFullname" . }}:9095

    frontend:
      log_queries_longer_than: 5s
      compress_responses: true

    ruler:
      storage:
        type: local
        local:
          directory: /etc/loki/rules
      ring:
        kvstore:
          store: memberlist
      rule_path: /tmp/loki/scratch
      alertmanager_url: http://alertmanager-main.monitoring.svc.cluster.local:9093 # Alertmanager address
      external_url: http://192.168.0.106:31918

distributor:
  replicas: 2

ingester: # the persistent volume keeps the WAL so it can be replayed after a restart
  replicas: 2
  persistence:
    enabled: true
    size: 1Gi
    storageClass: local-path

querier:
  replicas: 2
  persistence:
    enabled: true
    size: 1Gi
    storageClass: local-path

queryFrontend:
  replicas: 2

gateway: # nginx container that routes log write/read requests
  nginxConfig:
    httpSnippet: |-
      client_max_body_size 100M;
    serverSnippet: |-
      client_max_body_size 100M;

ruler:
  enabled: true
  kind: Deployment
  replicas: 1
  persistence:
    enabled: true
    size: 1Gi
    storageClass: local-path
  # -- Directories containing rules files
  directories:
    tenant_no:
      rules1.txt: |
        groups:
          - name: nginx-rate
            rules:
              - alert: LokiNginxRate
                expr: sum(rate({app="nginx"} |= "error" [1m])) by (job)
                  /
                  sum(rate({app="nginx"}[1m])) by (job)
                  > 0.01
                for: 1m
                labels:
                  severity: critical
                annotations:
                  summary: loki nginx rate
                  description: nginx error log ratio above 1%
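The LokiNginxRate rule divides the per-second rate of nginx lines containing "error" by the rate of all nginx lines, per job. Both rates use the same 1m window, so the window cancels out and the expression is simply the fraction of error lines. The alert fires when that fraction stays above 0.01 (1%) for 1m. A small awk sketch of the arithmetic follows; the helper name and the counts in the usage note are hypothetical.

```shell
# error_ratio_exceeds ERROR_LINES TOTAL_LINES [THRESHOLD]
# Returns 0 (alert condition true) when ERROR_LINES / TOTAL_LINES > THRESHOLD.
# A window with no lines at all never satisfies the condition.
error_ratio_exceeds() {
  awk -v e="$1" -v t="$2" -v th="${3:-0.01}" 'BEGIN { exit !(t > 0 && e / t > th) }'
}
```

For example, 2 error lines out of 100 in a minute satisfy the condition (0.02 > 0.01). Exactly 1 out of 100 does not, because the comparison is strict.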

In the values above we added caching to Loki. The core cache settings are:

......
query_range:
  results_cache:
    cache:
      redis:
        endpoint: redis:6379
        expiration: 1h
  cache_results: true
storage_config:
  index_queries_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h
chunk_store_config:
  chunk_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h
  write_dedupe_cache_config:
    redis:
      endpoint: redis:6379
      expiration: 1h
......

Upgrade the Loki release with these values:

$ helm upgrade --install loki -n logging -f ci/cache-values.yaml .

After the upgrade, the full set of Pods looks like this:

$ kubectl get pods -n logging
NAME READY STATUS RESTARTS AGE
grafana-55d8779dc6-mm5f2 1/1 Running 0 70m
loki-loki-distributed-distributor-77f8f99dbb-7snns 1/1 Running 0 38m
loki-loki-distributed-distributor-77f8f99dbb-rsn7l 1/1 Running 0 38m
loki-loki-distributed-gateway-6f4cfd898c-p9xxf 1/1 Running 2 (4h9m ago) 3d1h
loki-loki-distributed-ingester-0 1/1 Running 0 37m
loki-loki-distributed-ingester-1 1/1 Running 0 38m
loki-loki-distributed-querier-0 1/1 Running 0 38m
loki-loki-distributed-querier-1 1/1 Running 0 38m
loki-loki-distributed-query-frontend-78c4ccc99b-4t85w 1/1 Running 0 38m
loki-loki-distributed-query-frontend-78c4ccc99b-cgh8d 1/1 Running 0 38m
loki-loki-distributed-ruler-5654f8cf59-pjb8p 1/1 Running 0 38m
minio-548656f786-mjd4c 1/1 Running 3 (4h10m ago) 6d22h
promtail-ddz27 1/1 Running 1 (4h10m ago) 3d1h
promtail-lzr6v 1/1 Running 1 (4h10m ago) 3d1h
promtail-nldqx 1/1 Running 2 (4h4m ago) 3d1h
redis-7fb8ff6779-m26m6 2/2 Running 0 30m

Redis should now be in use. The figure below shows Redis's status, and the load on it is quite low.

[Figure: Redis status]
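To confirm that Loki is actually writing to Redis, look at the key count reported by `INFO keyspace`. Every Loki cache entry above is written with a 1h expiration, so the `expires` count should roughly track `keys`. The hypothetical `keyspace_keys` helper extracts the key count for db0 from that output. The usage comment assumes the Deployment and container names from loki-redis.yaml.

```shell
# keyspace_keys: read `redis-cli info keyspace` output on stdin and
# print the key count of db0 (Redis terminates INFO lines with \r\n).
keyspace_keys() {
  tr -d '\r' | sed -n 's/^db0:keys=\([0-9]*\),.*/\1/p'
}

# Usage:
#   kubectl -n logging exec deploy/redis -c redis -- redis-cli info keyspace | keyspace_keys
```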