Establish a Formal Post-Incident Root Cause Analysis (RCA) Process

Sep 22, 2025 - Mid-Level

$55.00 Hourly

Overview:

We are a fintech startup whose platform must maintain high uptime. Our incident response works reasonably well, but we lack a structured process for learning from failures.

The Challenge:

We frequently experience service disruptions, but our post-incident analysis is informal and inconsistent. We do not have a standard procedure to conduct a thorough Root Cause Analysis (RCA), which means we often fix the symptom rather than the underlying problem.

Problems Caused:

This lack of a formal process leads to recurring incidents and a failure to improve our system's reliability over time. It prevents us from implementing long-term fixes and building a more resilient infrastructure.

Proposed Method:

The freelancer will be responsible for creating and documenting a formal RCA process. This includes developing a template for incident reports, defining a timeline for analysis, and establishing a clear chain of communication and accountability for follow-up actions.

Required Skills:

Experience in Incident Management and Site Reliability Engineering (SRE).

Knowledge of ITIL or other incident management frameworks.

Strong technical writing and communication skills.

Experience Required:

At least 3 years of experience in a role where you led or participated in post-incident reviews.

Delivery:

A comprehensive, documented RCA process including templates and guidelines.

Support:

We require 2 weeks of post-delivery support to help our team adopt the new process and answer questions.

  • Spain
  • Proposal: 0
  • Not Verified
  • Less than a month
  • Estimated Hours: 40
Maria Gomez (Inactive)
Madrid, Spain
Member since: Oct 26, 2024
Total jobs: 8
Last seen: 2 weeks ago