Observability Platform Implementation (ELK Stack / Prometheus & Grafana)

Jul 29, 2025 - Senior

$5,500.00 Fixed

Basic Monitoring Setup:

Overview:

This package provides a foundational monitoring and logging solution for a single application or a small set of servers, enabling basic visibility into system health and performance. Ideal for startups or projects requiring quick setup.

Deliverables:

  • Installation and basic configuration of either ELK Stack (Elasticsearch, Logstash, Kibana) or Prometheus & Grafana.
  • Agent deployment for collecting logs and metrics from up to 5 servers/VMs.
  • Creation of 3-5 essential dashboards for key metrics (CPU, Memory, Disk, Network) and log overview.
  • Configuration of 3-5 basic alerts for critical issues (e.g., server down, high resource utilization).
  • Basic documentation for accessing dashboards and understanding alerts.
  • Full installation, configuration, and optimization of a scalable ELK Stack or Prometheus & Grafana cluster.
  • Agent deployment for collecting logs and metrics from up to 20 servers/VMs or a small Kubernetes cluster.
  • Implementation of advanced log parsing, filtering, and metric aggregation.
  • Development of 10-15 custom, actionable dashboards tailored to application and infrastructure needs.
  • Configuration of 10-15 advanced alerting rules with integration to notification channels (e.g., Slack, PagerDuty).
  • Basic distributed tracing setup (e.g., Jaeger/OpenTelemetry integration).
  • Detailed documentation including architecture, configuration, and operational runbooks.
  • One-day training session for your operations team.
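
For illustration only, the kind of check behind the basic alerts listed above (server down, high resource utilization) can be sketched in a few lines of Python. The HostSample fields and the threshold values are assumptions made for this sketch, not part of the project scope. In a real deployment these rules would more likely be expressed as Prometheus alerting rules or Kibana alerts than as custom code.

```python
from dataclasses import dataclass


@dataclass
class HostSample:
    """One metrics reading for a host (hypothetical shape, for illustration)."""
    host: str
    up: bool
    cpu_pct: float
    mem_pct: float
    disk_pct: float


# Hypothetical thresholds; real values would be agreed with the client.
THRESHOLDS = {"cpu_pct": 90.0, "mem_pct": 90.0, "disk_pct": 85.0}


def evaluate_alerts(sample: HostSample) -> list[str]:
    """Return the alert messages that fire for a single sample."""
    if not sample.up:
        # A host that is down cannot report usable resource metrics.
        return [f"{sample.host}: server down"]
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = getattr(sample, metric)
        if value > limit:
            alerts.append(
                f"{sample.host}: {metric} at {value:.0f}% (limit {limit:.0f}%)"
            )
    return alerts
```

A sample with the host up and CPU at 95% would yield a single CPU alert, while a host reported as down yields only the "server down" alert, so one outage does not also trigger a burst of resource warnings.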

Required Qualifications:

  • Minimum 3 years of hands-on experience implementing and managing large-scale monitoring and logging solutions, including at least 3 years specifically with either ELK Stack or Prometheus/Grafana.
  • Deep expertise in Elasticsearch cluster management, Logstash pipeline configuration, and Kibana dashboarding, OR in-depth knowledge of Prometheus metric collection, PromQL, Alertmanager, and Grafana dashboarding.
  • Strong understanding of logging best practices, metric collection strategies, and distributed tracing concepts.
  • Proficiency in Linux server administration and scripting (Bash, Python) for automation and data collection.
  • Experience with containerized environments (Docker, Kubernetes) and collecting metrics/logs from them.
  • Familiarity with cloud platforms (AWS, Azure, GCP) and their native monitoring services.
  • Excellent analytical skills to interpret complex data and identify root causes of issues.
  • Strong communication and collaboration skills to work with diverse technical teams.

Key Skills:

  • Observability
  • Monitoring
  • Logging
  • Alerting
  • ELK Stack
  • Elasticsearch
  • Logstash
  • Kibana
  • Prometheus
  • Grafana
  • Alertmanager
  • PromQL
  • Distributed Tracing (Jaeger, OpenTelemetry)
  • Metrics
  • Logs
  • Dashboards
  • Cloud Monitoring (CloudWatch, Azure Monitor, Google Cloud Monitoring)

Expectations for Support from Freelancer:

  • Responsiveness: Prompt communication and response to inquiries (within 24 hours on weekdays).
  • Availability: Willingness to be available for urgent issues or critical updates, potentially outside standard business hours, with prior arrangement.
  • Troubleshooting: Ability to quickly diagnose and resolve any post-implementation issues that may arise.
  • Documentation Updates: Keep documentation current with any changes or optimizations made during the support phase.
  • Advisory: Provide expert advice on future scaling, security enhancements, or new feature implementations.

Project Goals:

  • Enhanced Visibility: Provide real-time, actionable insights into system health, performance, and operational issues.
  • Proactive Problem Detection: Enable early identification of issues before they impact users.
  • Faster Incident Response: Streamline troubleshooting and reduce Mean Time To Resolution (MTTR).
  • Improved System Reliability: Support data-driven decisions for system optimization and stability.
  • Operational Efficiency: Reduce manual effort in monitoring and alerting.

Listing details:

  • India
  • Proposals: 0
  • Verified
  • Less than a month

Priya Nair (Inactive)
Maharashtra, India
Member since: Oct 26, 2024
Total jobs: 6
Last seen: 1 week ago