Installing the component
- Monitoring a Linux host requires the node-exporter component. Install it on each node to be monitored by running the following:
# download: https://prometheus.io/download/
mkdir -p /usr/local/prometheus/node-exporter
useradd -s /sbin/nologin -M prometheus
pkg=node_exporter-1.7.0.linux-amd64.tar.gz
tar xf ${pkg} -C /usr/local/prometheus/node-exporter --strip-components=1
chown -R prometheus:prometheus /usr/local/prometheus
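The commands above assume the tarball is already in the current directory. As a minimal sketch, the helper below builds the download URL for a given version, assuming the tarball is fetched from the project's GitHub releases page; the function name `node_exporter_url` is hypothetical and not part of the original steps.

```shell
# Build the release URL for a given node_exporter version and architecture.
# Assumption: the tarball comes from the project's GitHub releases page.
# node_exporter_url is a hypothetical helper name used for illustration.
node_exporter_url() {
    version="$1"
    arch="${2:-amd64}"   # defaults to amd64, matching the package used above
    printf 'https://github.com/prometheus/node_exporter/releases/download/v%s/node_exporter-%s.linux-%s.tar.gz\n' \
        "$version" "$version" "$arch"
}

# Usage (requires network access):
#   curl -fLO "$(node_exporter_url 1.7.0)"
```

Keeping the version in one place makes it easier to upgrade later without editing several commands.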
- Create the systemd unit file:
cat > /usr/lib/systemd/system/node_exporter.service <<'eof'
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
# Assumption: run as the prometheus user created during installation
User=prometheus
ExecStart=/usr/local/prometheus/node-exporter/node_exporter --web.listen-address=:9100
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
[Install]
WantedBy=multi-user.target
eof
- Start the service:
systemctl daemon-reload && systemctl enable node_exporter --now
systemctl status node_exporter
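Besides `systemctl status`, it can help to confirm that the exporter is actually serving metrics. The sketch below checks metrics text for a named metric; `has_metric` is a hypothetical helper, and the localhost URL in the usage line assumes the check runs on the monitored node itself.

```shell
# Return success if the metrics text on stdin contains a sample for the
# named metric. A sample line starts with the metric name followed by
# either a space (no labels) or "{" (labels).
# has_metric is a hypothetical helper name used for illustration.
has_metric() {
    grep -q "^$1[ {]"
}

# Usage (on the monitored node, assuming the default port 9100):
#   curl -s http://localhost:9100/metrics | has_metric node_memory_MemAvailable_bytes && echo ok
```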
Adding the monitoring target
- On the Prometheus server, add the scrape configuration:
scrape_configs:
  ...
  - job_name: "project-01"
    static_configs:
      - targets:
          - 10.0.0.68:9100
        labels:
          uat: webserver
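- Validate and reload the configuration (the /-/reload endpoint is only available when Prometheus is started with --web.enable-lifecycle):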
./promtool check config /usr/local/prometheus/prometheus.yml
curl -X POST http://10.0.0.67:9090/-/reload
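The two commands above can be chained so that a reload only happens when the check passes. This is a minimal sketch, not the author's script: `reload_prometheus` and the `PROMTOOL` and `CURL` override variables are hypothetical names introduced here.

```shell
# Validate the configuration, then ask Prometheus to reload it.
# PROMTOOL and CURL are hypothetical override variables; the defaults
# follow the commands used in the text.
reload_prometheus() {
    config="$1"   # path to prometheus.yml
    server="$2"   # host:port of the Prometheus server
    "${PROMTOOL:-./promtool}" check config "$config" || {
        echo "config check failed, not reloading" >&2
        return 1
    }
    # -f makes curl return non-zero on an HTTP error response
    "${CURL:-curl}" -fsS -X POST "http://${server}/-/reload"
}

# Usage:
#   reload_prometheus /usr/local/prometheus/prometheus.yml 10.0.0.67:9090
```

Stopping before the reload when validation fails avoids sending a request that Prometheus would reject anyway.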
Configuring alerts
- On the server node, add the alert rules for Linux hosts:
mkdir -p rules.d && vim rules.d/linux-alert.yml
--
groups:
  - name: node-exporter
    rules:
      - alert: LowMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }}: host is low on memory"
          description: "Available memory < 10%, current value: {{ $value }}"
      - alert: SystemDiskSpaceLow
        expr: (100 - (node_filesystem_free_bytes{device!="tmpfs"}/node_filesystem_size_bytes{device!="tmpfs"}) * 100) > 85
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }}: system disk is low on space"
          description: "Disk usage > 85%, current value: {{ $value }}"
      - alert: HostDown
        expr: up == 0
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} is offline and unreachable"
          description: "The host appears to be powered off, current value: {{ $value }}"
      - alert: HighCpuUsage
        expr: (1 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[2m])))) * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }}: CPU load is high"
          description: "CPU usage > 80%, current value: {{ $value }}"
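To sanity-check the memory threshold on a host, the same ratio can be computed locally. This is only a sketch: it assumes the exporter's `node_memory_MemAvailable_bytes` and `node_memory_MemTotal_bytes` reflect the `MemAvailable` and `MemTotal` fields of `/proc/meminfo`, and `mem_available_pct` is a hypothetical helper name.

```shell
# Print available memory as a percentage of total, reading
# /proc/meminfo-style text on stdin. Both fields are reported in kB, so
# the unit cancels out in the ratio.
# mem_available_pct is a hypothetical helper name used for illustration.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.2f\n", avail / total * 100 }'
}

# Usage (on the monitored node):
#   mem_available_pct < /proc/meminfo
```

A printed value below 10 corresponds to the condition the memory rule fires on once it has held for 2 minutes.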
- Reference the rule files in the main configuration:
vim prometheus.yml
--
rule_files:
  - rules.d/*.yml
- Validate and reload:
./promtool check config /usr/local/prometheus/prometheus.yml
curl -X POST http://10.0.0.67:9090/-/reload
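After the reload, one way to confirm that the rules were picked up is to query the server's `/api/v1/rules` endpoint and count the alerting rules it reports. The sketch below uses a plain `grep` rather than a JSON parser, so it is a rough check only; `count_alerting_rules` is a hypothetical helper name.

```shell
# Count alerting rules in a /api/v1/rules JSON response read from stdin.
# This matches the "type":"alerting" field of each rule object; it is a
# rough text match, not a full JSON parse.
# count_alerting_rules is a hypothetical helper name used for illustration.
count_alerting_rules() {
    grep -o '"type"[[:space:]]*:[[:space:]]*"alerting"' | wc -l | tr -d ' '
}

# Usage:
#   curl -s http://10.0.0.67:9090/api/v1/rules | count_alerting_rules
```

With the rule file from this section loaded and no other rule files present, the expected count would be 4.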