Ultimate Monitoring Project

The document outlines the steps to set up a DevOps monitoring project using Node Exporter, Prometheus, Alertmanager, and Blackbox Exporter across two virtual machines. It includes prerequisites, download and extraction instructions, service start commands, and configuration details for Prometheus and Alertmanager. Additionally, it provides alert rules for monitoring system performance and email notification settings for alert management.

Uploaded by faizan soudagar

DevOps Shack

Ultimate DevOps Monitoring Project


Prerequisites

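• Ensure wget and tar are installed on both VMs.
• Ensure you have permission to download, extract, and run these binaries.
• The commands below pin specific releases (Node Exporter 1.8.1, Prometheus 2.52.0, Alertmanager 0.27.0, Blackbox Exporter 0.25.0). To use a different release, change the version number in both the download URL and the extracted directory name.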
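All four downloads follow the same GitHub release URL pattern, so the version can be kept in one place. The sketch below is illustrative only: `require_tools` and `release_url` are hypothetical helper names (not part of any Prometheus tooling), and the URL builder assumes the linux-amd64 tarballs used throughout this guide.

```shell
# Hypothetical helpers for the download steps (not part of Prometheus tooling).

# Fail early if any required tool is missing from PATH.
require_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
}

# Build the GitHub release URL for a prometheus/* project.
# Assumes the linux-amd64 tarball naming used in this guide.
release_url() {
  local project=$1 version=$2
  echo "https://github.com/prometheus/${project}/releases/download/v${version}/${project}-${version}.linux-amd64.tar.gz"
}
```

For example, `require_tools wget tar && wget "$(release_url node_exporter 1.8.1)"` reproduces the first download step on VM-1.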

VM-1 (Node Exporter)

1. Download Node Exporter

wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz

2. Extract Node Exporter

tar xvfz node_exporter-1.8.1.linux-amd64.tar.gz

3. Start Node Exporter

cd node_exporter-1.8.1.linux-amd64
./node_exporter &
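Once Node Exporter is running, it serves metrics in the Prometheus text format on port 9100 under /metrics. The helpers below are a hypothetical sketch for checking that output: one counts `node_*` sample lines (the `# HELP` and `# TYPE` comment lines are skipped), the other lists the distinct metric names. They assume `curl` is installed for the usage example.

```shell
# Hypothetical helpers for inspecting Node Exporter output read from stdin.

# Count sample lines whose metric name starts with node_ (comments are skipped).
count_node_samples() {
  grep -c '^node_'
}

# Print the distinct node_* metric names, stripping labels and values.
node_metric_names() {
  awk '/^node_/ { name = $0; sub(/[{ ].*/, "", name); print name }' | sort -u
}
```

For example, `curl -s http://localhost:9100/metrics | count_node_samples` should print a non-zero count when the exporter is up, and piping into `node_metric_names` shows the exact metric names to use in alert rules.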

VM-2 (Prometheus, Alertmanager, Blackbox Exporter)

Prometheus

1. Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz

2. Extract Prometheus
tar xvfz prometheus-2.52.0.linux-amd64.tar.gz

3. Start Prometheus
cd prometheus-2.52.0.linux-amd64
./prometheus --config.file=prometheus.yml &
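Two checks can save debugging time here: `promtool`, which ships in the same tarball, validates prometheus.yml together with the rule files it references, and the /-/ready endpoint reports when the server is ready to serve traffic. The retry loop below is a minimal sketch; `wait_until` is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary assumptions.

```shell
# Hypothetical helper: run a command until it succeeds, up to ATTEMPTS times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
wait_until() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example usage from inside prometheus-2.52.0.linux-amd64:
#   ./promtool check config prometheus.yml
#   wait_until 10 2 curl -sf http://localhost:9090/-/ready
```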

Alertmanager

1. Download Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz

2. Extract Alertmanager
tar xvfz alertmanager-0.27.0.linux-amd64.tar.gz

3. Start Alertmanager
cd alertmanager-0.27.0.linux-amd64
./alertmanager --config.file=alertmanager.yml &

Blackbox Exporter

1. Download Blackbox Exporter

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz

2. Extract Blackbox Exporter

tar xvfz blackbox_exporter-0.25.0.linux-amd64.tar.gz

3. Start Blackbox Exporter

cd blackbox_exporter-0.25.0.linux-amd64
./blackbox_exporter &
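Blackbox Exporter can be tested on its own, before Prometheus is involved, by calling its /probe endpoint with a `target` and a `module`; the response is metrics text that includes `probe_success` (1 on success, 0 on failure). The helper below is a hypothetical sketch that reads that output and exits 0 only when the probe succeeded. It assumes the `http_2xx` module from the blackbox.yml bundled in the tarball.

```shell
# Hypothetical helper: read Blackbox Exporter /probe output from stdin and
# succeed only if it contains the sample "probe_success 1".
probe_succeeded() {
  awk '$1 == "probe_success" { found = 1; ok = ($2 == 1) }
       END { exit !(found && ok) }'
}
```

For example: `curl -s 'http://localhost:9115/probe?target=https://prometheus.io&module=http_2xx' | probe_succeeded && echo "probe OK"`.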

Notes:
• The & at the end of each start command runs the process in the background of the current shell. It can be stopped when you log out, so use nohup or a systemd service for anything longer-lived.
• Make sure prometheus.yml and alertmanager.yml are configured correctly before starting the services.
• Open the required ports in the firewall and security settings (for example, cloud security groups): 9090 for Prometheus, 9093 for Alertmanager, 9115 for Blackbox Exporter, and 9100 for Node Exporter.
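A minimal sketch for keeping the binaries running after logout uses `nohup`, a log file, and a PID file. `start_detached` and `stop_detached` are hypothetical helper names and the NAME.log / NAME.pid file layout is an assumption; a systemd unit remains the more robust choice for a long-running setup.

```shell
# Hypothetical helper: start a command detached from the terminal with nohup,
# send its output to NAME.log and record its PID in NAME.pid.
start_detached() {
  local name=$1
  shift
  nohup "$@" >"$name.log" 2>&1 &
  echo $! >"$name.pid"
}

# Hypothetical helper: stop a process started by start_detached.
stop_detached() {
  local name=$1
  if [ -f "$name.pid" ]; then
    kill "$(cat "$name.pid")" 2>/dev/null
    rm -f "$name.pid"
  fi
}
```

For example, from inside node_exporter-1.8.1.linux-amd64: `start_detached node_exporter ./node_exporter`, and later `stop_detached node_exporter`.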

Prometheus and Alertmanager Configuration

Prometheus Configuration (prometheus.yml)

Global Configuration
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

Alertmanager Configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093' # Alertmanager endpoint

Rule Files
rule_files:
  - "alert_rules.yml"     # Path to alert rules file
  # - "second_rules.yml"  # Additional rule files can be added here

Scrape Configuration

Prometheus Itself
scrape_configs:
  - job_name: "prometheus" # Job name for Prometheus
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"] # Target to scrape (Prometheus itself)

Node Exporter
  - job_name: "node_exporter" # Job name for Node Exporter
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["3.110.195.114:9100"] # Node Exporter on VM-1 (replace with your VM-1 IP)

Blackbox Exporter
  - job_name: 'blackbox' # Job name for Blackbox Exporter
    metrics_path: /probe # Path for blackbox probe
    params:
      module: [http_2xx] # Module that expects an HTTP 2xx response
    static_configs:
      - targets:
          - http://prometheus.io        # HTTP target
          - https://prometheus.io       # HTTPS target
          - http://3.110.195.114:8080/  # HTTP target with port 8080
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target   # Pass the listed URL as the ?target= parameter
      - source_labels: [__param_target]
        target_label: instance         # Keep the URL as the instance label
      - target_label: __address__
        replacement: 13.235.248.225:9115 # Blackbox Exporter address (VM-2)
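The three relabel rules turn each listed URL into a request to the Blackbox Exporter: the URL becomes the `target` query parameter, is kept as the `instance` label, and the scrape address is replaced with the exporter's. The function below is a hypothetical sketch that prints an unencoded equivalent of the URL Prometheus ends up requesting (Prometheus itself URL-encodes the parameters), which is also handy for testing one target by hand with curl. It assumes targets contain no `&`, `#`, or spaces.

```shell
# Hypothetical helper: print the (unencoded) URL Prometheus scrapes for one
# blackbox target after relabeling: metrics_path /probe, module and target as params.
blackbox_scrape_url() {
  local exporter=$1 module=$2 target=$3
  echo "http://${exporter}/probe?module=${module}&target=${target}"
}
```

For example, `curl -s "$(blackbox_scrape_url 13.235.248.225:9115 http_2xx https://prometheus.io)"` fetches the same probe result Prometheus records for that target.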

Alert Rules Configuration (alert_rules.yml)

Alert Rules Group

groups:
  - name: alert_rules # Name of the alert rules group
    rules:
      - alert: InstanceDown
        expr: up == 0 # Expression to detect instance down
        for: 1m
        labels:
          severity: "critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

      - alert: WebsiteDown
        expr: probe_success == 0 # Expression to detect website down
        for: 1m
        labels:
          severity: critical
        annotations:
          description: The website at {{ $labels.instance }} is down.
          summary: Website down

      - alert: HostOutOfMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 25 # Less than 25% memory available
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host out of memory (instance {{ $labels.instance }})"
          description: "Node memory is filling up (< 25% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

      - alert: HostOutOfDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} * 100) / node_filesystem_size_bytes{mountpoint="/"} < 50 # Less than 50% free on /
        for: 1s
        labels:
          severity: warning
        annotations:
          summary: "Host out of disk space (instance {{ $labels.instance }})"
          description: "Disk is almost full (< 50% left)\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

      - alert: HostHighCpuLoad
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{job="node_exporter",mode="idle"}[5m])) * 100) > 80 # CPU busy more than 80%
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Host high CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

      - alert: ServiceUnavailable
        expr: up{job="node_exporter"} == 0 # Expression to detect service unavailability
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Service Unavailable (instance {{ $labels.instance }})"
          description: "The service {{ $labels.job }} is not available\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

      - alert: HighMemoryUsage
        expr: (node_memory_Active_bytes / node_memory_MemTotal_bytes) * 100 > 90 # Expression to detect high memory usage
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "High Memory Usage (instance {{ $labels.instance }})"
          description: "Memory usage is > 90%\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"

      - alert: FileSystemFull
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10 # Expression to detect file system almost full
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "File System Almost Full (instance {{ $labels.instance }})"
          description: "File system has < 10% free space\n VALUE = {{ $value }}\n LABELS: {{ $labels }}"
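On Linux, Node Exporter derives its node_memory_* metrics from /proc/meminfo, so the HostOutOfMemory ratio can be sanity-checked on VM-1 without Prometheus. The sketch below computes MemAvailable / MemTotal * 100 from a meminfo-format file and compares it against the rule's threshold; `mem_available_pct` and `below_threshold` are hypothetical helpers, and the default path assumes a Linux host.

```shell
# Hypothetical helper: print MemAvailable as a percentage of MemTotal, the same
# ratio the HostOutOfMemory rule evaluates. Reads /proc/meminfo by default.
mem_available_pct() {
  awk '$1 == "MemTotal:" { total = $2 }
       $1 == "MemAvailable:" { avail = $2 }
       END {
         if (total == 0) exit 1
         printf "%.1f\n", avail * 100 / total
       }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: succeed when VALUE < THRESHOLD, mirroring "... * 100 < 25".
below_threshold() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v < t) }'
}
```

For example, `below_threshold "$(mem_available_pct)" 25 && echo "HostOutOfMemory would be pending"` reports the same condition the rule checks before its 5m `for` window starts.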

Alertmanager Configuration (alertmanager.yml)

Routing Configuration
route:
  group_by: ['alertname']         # Group alerts by alert name
  group_wait: 30s                 # Wait before sending the first notification for a new group
  group_interval: 5m              # Wait before notifying about new alerts added to an existing group
  repeat_interval: 1h             # Wait before re-sending a notification for a still-firing alert
  receiver: 'email-notifications' # Default receiver

receivers:
  - name: 'email-notifications' # Receiver name
    email_configs:
      - to: your_recipient_email          # Email recipient
        from: your_email                  # Email sender (typically the same Gmail account as auth_username)
        smarthost: smtp.gmail.com:587     # SMTP server
        auth_username: your_email         # SMTP auth username
        auth_identity: your_email         # SMTP auth identity
        auth_password: "your_app_password" # SMTP auth password (a Gmail app password; never publish it)
        send_resolved: true               # Send notifications for resolved alerts

Inhibition Rules
inhibit_rules:
  - source_match:
      severity: 'critical' # Source alert severity
    target_match:
      severity: 'warning'  # Target alert severity
    equal: ['alertname', 'dev', 'instance'] # Labels that must be equal on both alerts
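To confirm routing and email delivery without waiting for a real failure, a synthetic alert can be posted to Alertmanager's v2 API (POST /api/v2/alerts). The sketch below only builds the JSON body; `test_alert_payload` is a hypothetical helper, the alert name and labels are arbitrary, and values are inserted without JSON escaping, so it assumes they contain no quotes or backslashes.

```shell
# Hypothetical helper: build a minimal Alertmanager v2 alert list in JSON.
# Values are inserted verbatim (no JSON escaping), so keep them simple.
test_alert_payload() {
  local name=$1 severity=$2 instance=$3
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"}}]\n' \
    "$name" "$severity" "$instance"
}

# Example usage (assumes Alertmanager on localhost:9093 and curl installed):
#   test_alert_payload TestAlert warning vm-2 |
#     curl -s -X POST -H 'Content-Type: application/json' \
#       --data-binary @- http://localhost:9093/api/v2/alerts
```

With the routing above, the email for this alert should be sent after roughly the 30s group_wait.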
