Optimize Alerting System to Reduce Alert Fatigue

Sep 22, 2025 - Mid-Level

$60.00 Hourly

Overview:

We are a growing online media company with a complex infrastructure spanning multiple servers and microservices. We use a combination of Prometheus, Grafana, and an alerting tool.

The Challenge:

Our team is suffering from "alert fatigue." We receive an overwhelming number of alerts, many of which are non-critical or false positives. This makes it difficult to distinguish real emergencies from noise, causing us to miss important issues.

Problems Caused:

The constant stream of alerts is causing burnout and a lack of trust in our monitoring system. This leads to slow response times for real incidents, as critical alerts are often ignored amidst the noise.

Proposed Method:

We need a freelancer to audit our existing alerting rules. This involves reviewing our current configurations, identifying noisy alerts, and implementing smarter, more actionable rules. The solution should prioritize critical alerts and suppress irrelevant ones.
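As an illustration of what the "identifying noisy alerts" step of such an audit might look like, here is a minimal Python sketch. It assumes the alert history has already been exported into simple records; the `AlertEvent` shape, the `find_noisy_alerts` helper, and all thresholds are hypothetical choices for this example, not part of our current setup. It flags non-critical alerts that fire often and mostly resolve on their own within a few minutes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AlertEvent:
    """One firing of an alert (hypothetical record shape for this sketch)."""
    name: str
    severity: str            # e.g. "critical" or "warning"
    duration_seconds: float  # time from firing to resolution


def find_noisy_alerts(events, min_firings=5, short_seconds=300.0, short_ratio=0.8):
    """Return names of alerts that fire often and mostly resolve quickly.

    An alert is flagged when it fired at least `min_firings` times and at
    least `short_ratio` of those firings resolved in under `short_seconds`.
    Critical alerts are never flagged automatically; they need human review.
    """
    durations = defaultdict(list)
    severities = {}
    for event in events:
        durations[event.name].append(event.duration_seconds)
        severities[event.name] = event.severity

    noisy = []
    for name, values in durations.items():
        if severities[name] == "critical":
            continue
        short = sum(1 for d in values if d < short_seconds)
        if len(values) >= min_firings and short / len(values) >= short_ratio:
            noisy.append((name, len(values)))

    # Most frequent offenders first.
    noisy.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in noisy]


if __name__ == "__main__":
    # Made-up sample data, only to show the call.
    sample = (
        [AlertEvent("HighCPU", "warning", 60.0)] * 10
        + [AlertEvent("DiskFull", "critical", 30.0)] * 10
        + [AlertEvent("SlowQueries", "warning", 1800.0)] * 6
    )
    print(find_noisy_alerts(sample))
```

Alerts flagged this way are candidates for a longer `for:` duration, a higher threshold, or a lower severity in the rule definition; the final decision would still rest on a manual review of each rule.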

Required Skills:

Expertise in Prometheus and Grafana.

Strong understanding of alerting best practices.

Experience with PromQL for advanced queries.

Proficiency in a scripting language like Python or Bash.

Experience Required:

At least 3 years of experience in a DevOps or SRE role with a focus on monitoring and observability.

Delivery:

A refined set of alerting rules and a brief report on the changes made.

Support:

We require 1 week of post-delivery support to address any immediate issues with the new rules.

  • United Arab Emirates
  • Proposal: 0
  • Not Verified
  • Less than a month
  • Estimated Hours: 25
Ahmed Khan
Dubai, United Arab Emirates
Member since: Oct 26, 2024
Total Jobs: 10
Last seen: 1 week ago