Role:

Sr. Site Reliability Engineer (SRE) - Unified Observability & AIOps

Location:

Austin, TX / Fort Mill, SC (Hybrid)

Job Type: Full Time

Role Summary

We are seeking a Senior SRE with strong expertise in Unified Observability, proactive detection, AIOps, and GenAI-driven operations to support complex, distributed financial services platforms. The role requires hands-on experience designing SLI/SLO-driven monitoring, dynamic thresholds, intelligent alerting, and AI/ML-based anomaly detection across multi-stream architectures.

Key Responsibilities

Observability & Reliability Engineering

Design and implement unified observability dashboards across metrics, logs, traces, events, and topology
Define and manage SLIs, SLOs, and error budgets aligned to business outcomes
Build actionable dashboards for operations, engineering, and leadership
Implement alerting strategies using static and dynamic thresholds

Proactive Detection & AIOps

Distributed Systems & Dependency Analysis

Tooling & Platforms

GenAI & LLM Enablement

Required Skills & Experience

✅ 15+ years in SRE / Production Engineering
✅ Strong Unified Observability background (not infra-only)
✅ Hands-on Dynatrace experience (metrics, traces, logs, Davis AI)
✅ SLI/SLO engineering experience in production systems
✅ Experience implementing dynamic thresholds and anomaly detection
✅ Knowledge of AI/ML concepts applied to Ops (AIOps)
✅ Distributed systems troubleshooting expertise
✅ Experience with Kafka or streaming data platforms

Differentiators (Highly Valued)
