
Prometheus Suite

By 撒谎信, 2024-06-04, in Experience Summaries

node_exporter

Download (on the target host):

https://github.com/prometheus/node_exporter/releases

Extract the archive, then start it:

tar -xvf node*.tar.gz

Start:

nohup /opt/node_exporter/node_exporter --no-collector.softnet > /opt/node_exporter/node_exporter.log 2>&1 &


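Login URL: http://IP:9100

Username: prometheus

Password: set up as follows:

yum install httpd-tools -y
htpasswd -nBC 12 '' | tr -d ':\n'
Enter the password when prompted, then copy the generated bcrypt hash.
Check how node_exporter was started, to find the web configuration file it uses:
ps -ef | grep node_exporter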


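Add the hash to that configuration file; the generated hash looks like this:

$2y$12$y4PaNc0UM0Jzi07jJf6zcuRFyp2GlH6F5rUKcE.xk3Aug2khcqa7m

Edit the configuration file with vi.

In this example the password is 123456.

Reload the configuration:

ps -ef | grep prometheus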
kill -HUP PID
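The steps above assume node_exporter already reads a web configuration file, although the start command at the top of this section does not pass one. A minimal bash sketch of creating that file and restarting node_exporter with it follows. It assumes the exporter-toolkit file format (`basic_auth_users`) and the `--web.config.file` flag of recent node_exporter releases (check `node_exporter --help` for the flag name in your version); the file name `web-config.yml` and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: enable basic auth on node_exporter (paths as used in this article).

NE_DIR=/opt/node_exporter                  # install dir used in the start command above
WEB_CONFIG="$NE_DIR/web-config.yml"        # hypothetical file name

# write_web_config FILE USER BCRYPT_HASH
# Writes a web config with a single basic-auth user.
write_web_config() {
  local file=$1 user=$2 hash=$3
  # Single-quote the hash in YAML: bcrypt hashes contain '$' and '/'.
  printf 'basic_auth_users:\n  %s: '\''%s'\''\n' "$user" "$hash" > "$file"
}

# restart_node_exporter: stop the running instance and start it with the web config.
# Assumes the flag is --web.config.file (older 1.x releases used --web.config).
restart_node_exporter() {
  pkill -x node_exporter
  sleep 1
  nohup "$NE_DIR/node_exporter" --no-collector.softnet \
    --web.config.file="$WEB_CONFIG" > "$NE_DIR/node_exporter.log" 2>&1 &
}

# Example (htpasswd -b takes the password from the command line, no prompt):
#   hash=$(htpasswd -nbBC 12 '' '123456' | tr -d ':\n')
#   write_web_config "$WEB_CONFIG" prometheus "$hash"
#   restart_node_exporter
```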

Also change the Prometheus configuration so that it scrapes with the password:

  - job_name: 'webserver'
    basic_auth:
      username: prometheus
      password: '123456'
    static_configs:
    - targets: ['192.168.179.99:9100','192.168.179.102:9100']

opengauss_exporter

unzip opengauss_exporter_0.0.9_linux_amd64.zip


# ./opengauss_exporter --help
usage: opengauss_exporter [<flags>]

Flags:
  --help                        Show context-sensitive help (also try --help-long and --help-man).
  --version                     Show application version.
  --url=""                      openGauss database target url
  --config=""                   path to config dir or file.
  --constantLabels=""           A list of label=value separated by comma(,).
  --disable-cache               force not using cache
  --auto-discover-databases     Whether to discover the databases on a server dynamically.
  --exclude-databases="template0,template1"
                                A list of databases to remove when autoDiscoverDatabases is enabled
  --namespace="pg"              prefix of built-in metrics, (og) by default
  --web.listen-address=":9187"  Address to listen on for web interface and telemetry.
  --web.telemetry-path="/metrics"
                                Path under which to expose metrics.
  --time-to-string              convert database timestamp to date string.
  --dry-run                     dry run and print default configs and user config
  --disable-settings-metrics    Do not include pg_settings metrics.
  --explain                     explain server planned queries
  --parallel=5                  Specify the parallelism. the degree of parallelism is now useful query database thread
  --log.level="info"            Only log messages with the given severity or above. Valid levels: [debug, info, warn, error, fatal]
  --log.format="logger:stderr"  Set the log target and format. Example: "logger:syslog?appname=bob&local=7" or "logger:stdout?json=true"


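Parameter configuration:

Set password_encryption_type=1, which stores passwords in both sha256 and md5 form so that the md5 authentication configured below can be used.

Option 1: edit the postgresql.conf parameter file directly:

password_encryption_type=1

Reload the configuration:

gsql -p $port postgres -r -c "select pg_reload_conf();"

Option 2: use the cluster management tool:

gs_guc reload -I all -N all -c "password_encryption_type=1"

For a cluster, use option 2.

On a single instance, the parameter can also be appended to postgresql.conf in the data directory:

echo "password_encryption_type=1" >> postgresql.conf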

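Access control:

Add the IP address of the server running opengauss_exporter to the allowlist (pg_hba.conf) with the md5 authentication method.

pg_hba.conf is matched from top to bottom and the first matching entry applies, so if the exporter runs on the database server itself, the md5 entry must come before host all all 127.0.0.1/32 trust;

otherwise the connection fails with FATAL: Forbid remote connection with trust method!

Option 1: edit pg_hba.conf directly (no reload is needed):

host dbname opengauss_exporter x.x.x.x/32 md5

Option 2: use the management tool:

gs_guc reload -I all -N all -h "host all opengauss_exporter x.x.x.x/32 md5"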
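Since pg_hba.conf entries are checked from top to bottom and the first match wins, the ordering rule above can be verified mechanically. A small bash/awk sketch, assuming one entry per line in the `host database user address method` layout used here; it skips comments, does not evaluate address ranges, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# hba_md5_before_trust FILE USER
# Succeeds (exit 0) if the first "host ... USER ... md5" entry comes before the
# first "host ... 127.0.0.1/32 ... trust" entry in FILE (or no such trust entry exists).
hba_md5_before_trust() {
  awk -v user="$2" '
    /^[ \t]*#/ || NF == 0 { next }                                   # skip comments and blank lines
    $1 == "host" && $3 == user && $NF == "md5" && !md5 { md5 = NR }   # first md5 entry for USER
    $1 == "host" && $4 == "127.0.0.1/32" && $NF == "trust" && !trust { trust = NR }
    END { exit !(md5 && (!trust || md5 < trust)) }
  ' "$1"
}

# Usage:
#   hba_md5_before_trust "$PGDATA/pg_hba.conf" opengauss_exporter && echo "order OK"
```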


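Create the database user:

For openGauss 1.0.1, the user needs the SYSADMIN privilege;

for openGauss 1.1.0, it needs the MONADMIN privilege.

The password must satisfy the database's password complexity policy.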

CREATE USER opengauss_exporter WITH PASSWORD 'opengauss_exporter123' MONADMIN;
grant usage on schema dbe_perf to opengauss_exporter;
grant select on pg_stat_replication to opengauss_exporter;


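Create the database:

On Panwei (磐维), PostgreSQL compatibility is set when the instance is initialized:

pw_initdb -D /home/panweidb/data/panweidb --nodename panweidb -w panwei@123 --dbcompatibility=PG
create database ogexporter DBCOMPATIBILITY='PG'; -- this command does not work on Panwei (磐维)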

Set the environment variable on the monitoring host

Add the following to ~/.bashrc, or run it in the shell before each start of the exporter:

export DATA_SOURCE_NAME="host=x.x.x.x user=xxx password=xxx port=xxx dbname=xxx sslmode=disable"

or

export DATA_SOURCE_NAME="postgresql://username:password@hostname:port/dbname?sslmode=disable"

or, to monitor multiple instances, separate the DSNs with commas:

export DATA_SOURCE_NAME="postgresql://username:password@hostname:port/dbname?sslmode=disable,postgresql://username2:password2@hostname2:port2/dbname2?sslmode=disable"
Example used in this deployment:
export DATA_SOURCE_NAME="host=10.176.52.195 user=opengauss_exporter password=opengauss_exporter123 port=17700 dbname=test sslmode=disable"
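In the URL form, reserved characters in the password (for example `@`, `/`, `?` or `#`) should be percent-encoded as RFC 3986 requires; otherwise parsing can fail or pick up the wrong host. The keyword form (`host=... password=...`) avoids the issue. A minimal bash sketch that builds the URL-form DSN and joins several DSNs with commas; the function names are hypothetical and the encoder handles ASCII characters only.

```shell
#!/usr/bin/env bash
# urlencode STRING: percent-encode everything except RFC 3986 unreserved characters (ASCII only).
urlencode() {
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [[:alnum:].~_-]) out+=$c ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # "'c" makes printf use the character code
    esac
  done
  printf '%s' "$out"
}

# build_dsn HOST PORT USER PASSWORD DBNAME: URL-form DSN with sslmode=disable.
build_dsn() {
  printf 'postgresql://%s:%s@%s:%s/%s?sslmode=disable' \
    "$(urlencode "$3")" "$(urlencode "$4")" "$1" "$2" "$5"
}

# join_dsns DSN...: comma-separated list for monitoring several instances.
join_dsns() {
  local IFS=,
  printf '%s' "$*"
}

# Example with the values used above:
#   export DATA_SOURCE_NAME="$(build_dsn 10.176.52.195 17700 opengauss_exporter 'opengauss_exporter123' test)"
```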

Start:

nohup /opt/opengauss_exporter/opengauss_exporter --config="/opt/opengauss_exporter/default_all_20240303.yaml" --log.level=debug --auto-discover-databases --exclude-databases="template0,template1" --web.listen-address=":9187" --parallel=5 >> /opt/opengauss_exporter/opengauss_exporter.log 2>&1 &


Prometheus

Extract:

tar -zxvf prometheus-2.51.2.linux-amd64.tar.gz
cd prometheus-2.51.2.linux-amd64

Edit the configuration file (YAML indentation is strict; follow the existing indentation exactly):

vim prometheus.yml


scrape_configs:
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'Panwei'
    static_configs:
      - targets: ["IP:9187"]
        labels:
          instance: 'panwei_primy'

  - job_name: 'node'
    basic_auth:
      username: prometheus
      password: '123456'
    static_configs:
    - targets: ['localhost:9100']
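Tabs are not allowed as YAML indentation and are a common reason prometheus.yml fails to load. The authoritative check is `./promtool check config prometheus.yml` (promtool ships in the Prometheus tarball); the bash sketch below is only a quick pre-check for tab characters, and its function name is hypothetical.

```shell
#!/usr/bin/env bash
# yaml_tab_lines FILE: print "lineno:content" for every line containing a tab.
# Returns 0 if there are no tabs, 1 if tabs were found, 2 if FILE is unreadable.
yaml_tab_lines() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  if grep -n "$(printf '\t')" "$1"; then
    return 1   # tabs found; the offending lines were printed above
  fi
  return 0
}

# Usage:
#   yaml_tab_lines prometheus.yml && ./promtool check config prometheus.yml
```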


Start:

nohup /opt/prometheus/prometheus --web.listen-address=:9090 --config.file=/opt/prometheus/prometheus.yml --web.enable-lifecycle > /opt/prometheus/prometheus.log 2>&1 &

Check that Prometheus started:

curl http://IP:9090/metrics | wc -l


Reload the configuration:

ps -ef |grep prometheus
kill -HUP PID
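Two notes on reloading: Prometheus is started with `--web.enable-lifecycle`, so a POST to `/-/reload` also reloads the configuration; and `ps -ef | grep prometheus` also lists the grep process itself, while `pgrep -x` matches only the exact process name. A bash sketch combining both; the function name is hypothetical and `IP` is the article's placeholder.

```shell
#!/usr/bin/env bash
# reload_prometheus [URL]: reload the Prometheus configuration.
#  - with URL: POST to URL/-/reload (needs --web.enable-lifecycle)
#  - without:  send SIGHUP to processes named exactly "prometheus"
reload_prometheus() {
  if [ -n "${1:-}" ]; then
    curl -fsS -X POST "$1/-/reload"
    return   # returns curl's exit status
  fi
  local pids
  pids=$(pgrep -x prometheus) || { echo "no prometheus process found" >&2; return 1; }
  kill -HUP $pids   # unquoted on purpose: one PID per word
}

# Usage:
#   reload_prometheus http://IP:9090
#   reload_prometheus
```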

Grafana

Install:

yum install -y grafana-enterprise-10.4.2-1.x86_64.rpm

Create a database to store Grafana's own data:

create user grafanaer login encrypted password 'grafanaer@123';
create database grafana owner grafanaer;


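Edit pg_hba.conf of the database instance that stores Grafana's data:
vim /data/pg_data/pg_hba.conf

host    all   all    x.x.x.x/32         md5

Replace x.x.x.x with the Grafana server's IP; openGauss-based instances reject remote connections that use trust.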

Edit the Grafana configuration file:

vim /etc/grafana/grafana.ini

In the [database] section:

type = postgres
host = IP:17700
name = grafana
user = grafanaer
password = grafanaer@123

In the [feature_toggles] section:

autoMigrateOldPanels = true

Start:

systemctl daemon-reload
systemctl enable grafana-server.service
systemctl start grafana-server
systemctl status grafana-server

Open the web UI:

http://IP:3000

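Add a data source (type Prometheus, URL http://IP:9090).

After the connection test succeeds, click the Dashboards tab,

select Prometheus 2.0 Stats and click Import.

The dashboard is imported.

Then import the openGauss dashboards.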

Grafana troubleshooting:

Error: This panel requires Angular (deprecated)

Locate the configuration directory (/etc/grafana for the RPM installation):

cd /etc/grafana/
cp grafana.ini grafana.ini.bak
vi grafana.ini

Add the following to the [feature_toggles] section:

autoMigrateOldPanels = true

Restart the service:

systemctl disable grafana-server.service
systemctl stop grafana-server


systemctl daemon-reload
systemctl enable grafana-server.service
systemctl start grafana-server

If the browser reports too many redirects, check with df -h whether the disk is full.

Free up disk space, then restart the service.
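A quick way to find the full filesystem is to filter `df -P` output (the POSIX format, one line per filesystem). A bash/awk sketch; the function name and the 90% default threshold are assumptions, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# full_filesystems [THRESHOLD]: read `df -P` output on stdin and print
# "mountpoint use%" for each filesystem whose Capacity is >= THRESHOLD (default 90).
full_filesystems() {
  awk -v limit="${1:-90}" '
    NR == 1 { next }                      # skip the header line
    {
      use = $5; sub(/%$/, "", use)        # "95%" -> "95"
      if (use + 0 >= limit + 0) print $6, $5
    }
  '
}

# Usage:
#   df -P | full_filesystems 90
```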


Alertmanager installation and deployment

Download and extract:

https://github.com/prometheus/alertmanager/releases

 

Version 0.27.0 is used as the example here:

https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz

tar -zxvf alertmanager-0.27.0.linux-amd64.tar.gz -C /opt/
mv /opt/alertmanager-0.27.0.linux-amd64 /opt/alertmanager
cd /opt/alertmanager

Start:

nohup /opt/alertmanager/alertmanager --config.file=/opt/alertmanager/alertmanager.yml > /opt/alertmanager/alertmanager.log 2>&1 &
# or, if installed under /app and listening on port 9494:
nohup /app/alertmanager/alertmanager --web.listen-address=":9494" --config.file=/app/alertmanager/alertmanager.yml > /app/alertmanager/alertmanager.log 2>&1 &

To set the listening port explicitly (9093 is the default):

./alertmanager --config.file=alertmanager.yml --web.listen-address=":9093"

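Check in the browser

Open http://IP:9093; if the page loads, Alertmanager is running.

Add Alertmanager to Prometheus

Alertmanager listens on port 9093 by default. Add it to the Prometheus configuration so that Prometheus sends its alerts there:

Edit the Prometheus configuration file:

vi /opt/prometheus/prometheus.yml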


# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - IP:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yml"
  # - "first_rules.yml"
  # - "second_rules.yml"

 

Example rule file:

mkdir /opt/prometheus/rules   # relative rule_files paths are resolved against the directory of prometheus.yml
vim /opt/prometheus/rules/node_rules.yml

groups:
  - name: node_rule
    rules:
      - alert: server status
        expr: up == 0
        for: 10s
        labels:
          severity: critical
          service: node
        annotations:
          summary: "{{ $labels.instance }}: instance down"
          description: "{{ $labels.instance }}: instance down"

      - alert: Disk Usage
        expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 50
        for: 1m
        labels:
          severity: warning
          type: "service"
          service: node
          oid: "1.3.6.1.4.1.98789.0.1"
        annotations:
          summary: "Disk used too high"
          description: "Service {{ $labels.instance }}: {{ $value }}%"
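The Disk Usage expression computes 100 - free / size * 100 for each ext4/xfs filesystem. To preview on a host roughly what the rule evaluates to (for example before lowering the threshold for the test below), the same arithmetic can be applied to df's byte counts. A bash sketch assuming GNU coreutils df (`-B1`, `--output`, repeated `-t`); it takes size - used as the free bytes, which corresponds to node_filesystem_free_bytes (free blocks including those reserved for root). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# disk_usage_pct SIZE_BYTES FREE_BYTES: same formula as the "Disk Usage" rule,
# 100 - (free / size * 100), printed with two decimals.
disk_usage_pct() {
  awk -v size="$1" -v free="$2" 'BEGIN { printf "%.2f\n", 100 - (free / size * 100) }'
}

# rule_preview [THRESHOLD]: apply the formula to local ext4/xfs filesystems
# (GNU df assumed) and mark those above THRESHOLD (default 50, as in the rule).
rule_preview() {
  local limit=${1:-50} size used target pct
  df -B1 --output=size,used,target -t ext4 -t xfs | tail -n +2 |
    while read -r size used target; do
      pct=$(disk_usage_pct "$size" $(( size - used )))
      if awk -v p="$pct" -v l="$limit" 'BEGIN { exit !(p > l) }'; then
        echo "$target $pct% ABOVE $limit%"
      else
        echo "$target $pct%"
      fi
    done
}

# Usage:
#   rule_preview 50
#   rule_preview 5    # the lowered threshold used for testing
```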

Reload the configuration:

ps -ef |grep prometheus
kill -HUP PID

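Test:

Temporarily lower the alert threshold to check that alerting works,

for example make the disk rule fire when usage exceeds 5% (> 5 instead of > 50).

If the Disk Usage alert then shows up as firing, alerting is working.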

Alert notification configuration

Download snmp_notifier:

https://github.com/maxwo/snmp_notifier/releases/download/v1.5.0/snmp_notifier-1.5.0.linux-amd64.tar.gz

Extract and start snmp_notifier:

tar -zxvf snmp_notifier-1.5.0.linux-amd64.tar.gz -C /opt
mv /opt/snmp_notifier-1.5.0.linux-amd64 /opt/snmp_notifier
cd /opt/snmp_notifier
nohup /opt/snmp_notifier/snmp_notifier > /opt/snmp_notifier/snmp_notifier.log 2>&1 &
# or, if installed under /app:
nohup /app/snmp_notifier/snmp_notifier > /app/snmp_notifier/snmp_notifier.log 2>&1 &
netstat -nap |grep -i 9464

tcp6       0      0 :::9464                 :::*                    LISTEN      14502/snmp_notifier

Open:

http://IP:9464


Edit alertmanager.yml:

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'snmp_notifier'

receivers:
  - name: 'snmp_notifier'
    webhook_configs:
      - url: 'http://IP:9464/alerts'
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Start snmptrapd locally to receive the SNMP traps:

snmptrapd -m ALL -f -Of -Lo -c /opt/software/snmp_notifier-1.2.1/scripts/snmptrapd.conf

Email alerts

Only alertmanager.yml needs to be changed:

global:
  resolve_timeout: 5m
  smtp_from: 'nair@xxx.cn'
  smtp_smarthost: 'smtp.xxx.cn:587'
  smtp_auth_username: 'nair@xxx.cn'
  smtp_auth_password: 'xxxxxxx'
  smtp_hello: 'xxx.cn'

route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 1m
  receiver: 'email'
  routes:
  - receiver: email
    group_wait: 10s
    group_interval: 20s
    repeat_interval: 30s
    match_re:
      service: mysql|redis|postgres|node
      severity: critical

  - receiver: email
    group_wait: 1m
    group_interval: 1m
    repeat_interval: 1m
    match_re:
      service: mysql|redis|postgres|node
      severity: error

  - receiver: email
    group_wait: 1h
    group_interval: 1h
    repeat_interval: 1h
    match_re:
      service: mysql|redis|postgres|node
      severity: warning

receivers:
- name: 'email'
  email_configs:
  - to: 'xxx@xxxemail.cn'
    send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
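To check the routes without waiting for a real failure, a test alert can be posted to Alertmanager's v2 API (`POST /api/v2/alerts`); `amtool`, included in the Alertmanager tarball, offers the same. A bash sketch that builds the payload with labels matching the routes above; the alert name `TestAlert`, the instance value and the function names are hypothetical, and label values are inserted without JSON escaping.

```shell
#!/usr/bin/env bash
# test_alert_json ALERTNAME SEVERITY SERVICE INSTANCE: one-element JSON array for /api/v2/alerts.
# Values are inserted verbatim (no JSON escaping), so keep them to plain text.
test_alert_json() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","service":"%s","instance":"%s"},"annotations":{"summary":"test alert"}}]' \
    "$1" "$2" "$3" "$4"
}

# send_test_alert AM_URL ALERTNAME SEVERITY SERVICE INSTANCE: POST the alert to Alertmanager.
send_test_alert() {
  local url=$1; shift
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "$(test_alert_json "$@")" "$url/api/v2/alerts"
}

# Usage (severity=critical and service=node should match the first email route):
#   send_test_alert http://IP:9093 TestAlert critical node 192.168.179.99:9100
```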

 

 


Copyright notice: this article was published by 撒谎信; please credit the source when reposting.

Article link: https://www.yangwuyu.com/?id=4
