The Four Golden Signals in Prometheus Monitoring, with Examples

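In monitoring system design, Google's Four Golden Signals are the key metrics for judging the health of a system. Prometheus supports collecting and querying all four. This article describes each signal and then shows how to monitor it with Prometheus.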

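1. Overview of the Four Golden Signals

1.1 Latency

Definition: the time it takes to process a request.

Why it matters: high latency can indicate degraded performance or insufficient resources.

Example: the response time of HTTP requests.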

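1.2 Traffic

Definition: the request volume or load placed on the system.

Why it matters: changes in traffic reflect how the system is being used and help identify anomalies and peaks.

Example: HTTP QPS (queries per second).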

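1.3 Errors

Definition: the proportion of requests that fail.

Why it matters: a high error rate can indicate a fault or a misconfiguration.

Example: the proportion of HTTP 5xx responses.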

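1.4 Saturation

Definition: how heavily the system's resources are being used.

Why it matters: high saturation can lead to degraded performance or a crash.

Example: CPU usage, memory usage.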
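To make the four definitions concrete, here is a minimal sketch that computes each signal by hand from a small, made-up request log. The log entries, the window length, and the CPU figures are all assumptions chosen for illustration; in practice Prometheus derives these values from scraped metrics rather than from raw logs.

```python
import math

# Hypothetical request log: (timestamp in seconds, duration in seconds, HTTP status).
requests = [
    (0.0, 0.12, 200),
    (0.5, 0.08, 200),
    (1.0, 0.30, 500),
    (1.5, 0.10, 200),
    (2.0, 0.95, 200),
    (2.5, 0.11, 503),
    (3.0, 0.09, 200),
    (3.5, 0.14, 200),
]
WINDOW_SECONDS = 4.0  # assumed observation window covering the log above


def percentile(values, q):
    """Nearest-rank percentile: the value at rank ceil(q * n) in sorted order."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


durations = [duration for _, duration, _ in requests]

# Latency: how long requests take (median and 99th percentile).
latency_p50 = percentile(durations, 0.50)
latency_p99 = percentile(durations, 0.99)

# Traffic: requests per second over the window.
traffic_qps = len(requests) / WINDOW_SECONDS

# Errors: share of requests that returned a 5xx status.
error_ratio = sum(1 for _, _, status in requests if status >= 500) / len(requests)

# Saturation: busy CPU time relative to available CPU time (assumed figures).
cpu_busy_seconds = 6.0
cpu_cores = 2
cpu_utilization = cpu_busy_seconds / (WINDOW_SECONDS * cpu_cores)

print(latency_p50, latency_p99, traffic_qps, error_ratio, cpu_utilization)
```

Note how a single slow request (0.95 s) dominates the 99th percentile while leaving the median untouched, which is why tail percentiles are usually tracked alongside averages.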

2. Implementing the Four Golden Signals in Prometheus

The examples below show how to collect and query each of the four signals in Prometheus.

2.1 Latency

Metric: HTTP request response time.

Collection: use a Prometheus histogram or summary metric type.

Example:

# Scrape job in the Prometheus configuration file (under scrape_configs)
- job_name: 'my_app'
  static_configs:
    - targets: ['localhost:8000']  # the port on which the application below serves metrics

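# Expose the metric from the application
from prometheus_client import start_http_server, Histogram
import time

REQUEST_TIME = Histogram('http_request_duration_seconds', 'HTTP request latency', ['method', 'endpoint'])

# A metric declared with labels needs label values before it can be observed
@REQUEST_TIME.labels(method='GET', endpoint='/').time()
def handle_request():
    time.sleep(0.1)  # simulate request processing time

if __name__ == '__main__':
    start_http_server(8000)  # serve /metrics on port 8000
    while True:
        handle_request()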

PromQL query:

# 99th-percentile request latency over the last minute
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[1m])) by (le))
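The quantile returned by histogram_quantile is an estimate, not an exact measurement: it finds the bucket that contains the requested rank and interpolates linearly inside it. The sketch below is a simplified Python rendering of that idea, not Prometheus's actual implementation, and the bucket boundaries and counts are made up for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket,
    mirroring the `le` label of a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan  # not enough data to estimate anything
    rank = q * buckets[-1][1]  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (upper - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = upper, count
    return math.nan


# Hypothetical cumulative bucket counts for http_request_duration_seconds.
example_buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (math.inf, 100)]
p99 = histogram_quantile(0.99, example_buckets)
p50 = histogram_quantile(0.50, example_buckets)
print(p50, p99)
```

Because of the interpolation, the accuracy of the result depends on how closely the bucket boundaries bracket the latencies you care about.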

2.2 Traffic

Metric: HTTP QPS.

Collection: use a Prometheus counter metric type.

Example:

# Expose the metric from the application
from prometheus_client import start_http_server, Counter
import time

REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])

def handle_request():
    REQUEST_COUNT.labels(method='GET', endpoint='/').inc()

if __name__ == '__main__':
    start_http_server(8000)  # serve /metrics on port 8000
    while True:
        handle_request()
        time.sleep(0.1)  # simulate one request every 100 ms instead of spinning the CPU

PromQL query:

# Requests per second, one series per method/endpoint combination; wrap in sum() for the total
rate(http_requests_total[1m])
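A counter only ever increases, so the raw value of http_requests_total says little on its own; rate() turns it into a per-second figure and compensates for counter resets, such as an application restart. The following is a deliberately simplified sketch of that calculation. The real rate() also extrapolates to the edges of the time window, which is omitted here, and the sample values are invented.

```python
def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example, a process
    restart), so the value after the reset counts as fresh increase.
    Timestamps are assumed to be strictly increasing.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes every 15 s; the counter resets between t=30 and t=45.
scrapes = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
qps = simple_rate(scrapes)
print(qps)
```

Without the reset handling, the drop from 160 to 10 would produce a large negative rate, which is one reason to apply rate() to counters instead of subtracting raw values.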

2.3 Errors

Metric: the proportion of HTTP 5xx responses.

Collection: use a Prometheus counter metric type.

Example:

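# Expose the metrics from the application
from prometheus_client import start_http_server, Counter
import random
import time

# The error-rate query below divides errors by total requests, so both counters are needed
REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
ERROR_COUNT = Counter('http_errors_total', 'Total HTTP errors', ['status_code'])

def handle_request():
    REQUEST_COUNT.labels(method='GET', endpoint='/').inc()
    if random.random() < 0.05:  # simulate a 5% failure rate
        ERROR_COUNT.labels(status_code='500').inc()

if __name__ == '__main__':
    start_http_server(8000)  # serve /metrics on port 8000
    while True:
        handle_request()
        time.sleep(0.1)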

PromQL query:

# Error rate: 5xx errors per second divided by total requests per second
sum(rate(http_errors_total{status_code=~"5.."}[1m])) / sum(rate(http_requests_total[1m]))
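Two details of the error-rate query are easy to miss: the regex matcher =~"5.." is fully anchored, so it matches exactly three-character codes starting with 5, and the division is undefined when there is no traffic. The sketch below mirrors both behaviours in plain Python. The function name and the per-status rates are assumptions made for illustration.

```python
import math
import re


def error_ratio(error_rates, total_rate, pattern="5.."):
    """Share of requests whose status code fully matches `pattern`.

    `error_rates` maps status-code strings to per-second error rates.
    """
    matched = sum(
        rate for code, rate in error_rates.items() if re.fullmatch(pattern, code)
    )
    if total_rate == 0:
        return math.nan  # no traffic: the ratio is undefined (PromQL yields NaN for 0 / 0)
    return matched / total_rate


# Hypothetical per-second rates; "404" and "5000" do not match the anchored pattern.
rates = {"500": 0.4, "503": 0.1, "404": 2.0, "5000": 1.0}
ratio = error_ratio(rates, total_rate=50.0)
print(ratio)
```

In a dashboard or alert, the no-traffic case is worth handling explicitly, since a NaN or missing value is easy to mistake for a healthy error rate of zero.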

2.4 Saturation

Metric: CPU usage, memory usage.

Collection: use Node Exporter to expose system resource metrics.

Example:

# Scrape job for Node Exporter in the Prometheus configuration file (under scrape_configs)
- job_name: 'node_exporter'
  static_configs:
    - targets: ['localhost:9100']

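PromQL queries:

# CPU usage (%): 100 minus the average idle percentage across cores, per instance
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

# Memory usage (%): share of total memory that is not available
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100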
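The CPU query works backwards from idle time: node_cpu_seconds_total{mode="idle"} is a per-core counter of idle seconds, so its rate is the fraction of each second a core spent idle. The arithmetic behind both queries can be sketched as follows. The function names and sample values are assumptions for illustration, and the sketch takes two raw samples where Prometheus would apply rate() over a window.

```python
def cpu_usage_percent(idle_start, idle_end, elapsed_seconds):
    """CPU usage from per-core idle-time counters sampled twice.

    `idle_start` and `idle_end` map a core label to cumulative idle seconds.
    """
    idle_fractions = [
        (idle_end[core] - idle_start[core]) / elapsed_seconds
        for core in idle_start
    ]
    avg_idle = sum(idle_fractions) / len(idle_fractions)  # avg by (instance)
    return 100 - avg_idle * 100


def memory_usage_percent(total_bytes, available_bytes):
    """Share of total memory that is not available, as a percentage."""
    return (total_bytes - available_bytes) / total_bytes * 100


# Hypothetical samples taken 60 s apart on a two-core machine.
cpu = cpu_usage_percent(
    idle_start={"0": 1000.0, "1": 2000.0},
    idle_end={"0": 1045.0, "1": 2030.0},
    elapsed_seconds=60.0,
)
mem = memory_usage_percent(total_bytes=8 * 1024**3, available_bytes=2 * 1024**3)
print(cpu, mem)
```

Here core 0 was idle 75% of the time and core 1 was idle 50% of the time, so the machine as a whole was 37.5% busy.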

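3. Complete Example

The following complete Prometheus configuration file scrapes both the application and Node Exporter, which together cover the four golden signals:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'my_app'
    static_configs:
      - targets: ['localhost:8000']  # must match the port passed to start_http_server

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']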

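4. Summary

Monitoring the four golden signals (latency, traffic, errors, and saturation) gives a broad view of a system's health and performance. Prometheus provides the metric types and a flexible query language (PromQL) needed to collect and query them. Pairing it with a visualization tool such as Grafana makes the data easier to read and helps you find and resolve problems quickly.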
