Senior / Staff DevOps Engineer (Platform & Reliability)

Location: Remote (U.S. or Canada)

Company: Peerlogic

The Role

Peerlogic is hiring a Senior / Staff DevOps Engineer to own the platform, infrastructure, and reliability of a production system that spans application services, AI/ML workloads, and real-time voice infrastructure.

You are replacing a strong DevOps leader and not building from scratch. The system works. Your job is to make it exceptional.

This is not a support role.

This is not a ticket-driven role.

You will:

Own reliability end-to-end
Make architectural decisions with real consequences
Operate in ambiguity without waiting for direction

If you prefer clearly defined scopes, narrow ownership, or “assigned work,” this is not the role.

What You’ll Own

Platform & Infrastructure

End-to-end ownership of cloud + hybrid infrastructure (AWS, GCP, and physical environments)
Multi-region architecture targeting 99.999% uptime
Kubernetes clusters and container orchestration across all services
CI/CD pipelines (GitHub Actions): reliability, speed, and developer experience
Infrastructure as Code (Terraform, Ansible)

Reliability & Observability

Design and enforce SLOs, SLIs, and error budgets
Build a best-in-class observability stack (metrics, logs, traces)
Drive incident response, postmortems, and systemic fixes (not band-aids)
Reduce MTTR and eliminate repeat incidents

Data & Event Systems

Ownership of event-driven architecture (RabbitMQ or equivalent)
Ensure durability, replayability, and correctness of pipelines
Design and maintain backfill and recovery strategies
Improve debuggability of asynchronous systems

AI / ML Infrastructure

Operate and scale LLM-powered systems (Bedrock, SageMaker, or equivalent)
Manage inference workloads with a focus on latency, cost, and reliability
Build and maintain evaluation pipelines, dataset versioning, and reproducible ML workflows

Performance & Cost

Infrastructure cost efficiency across compute, storage, and LLM usage
Continuously optimize tradeoffs between performance, reliability, and cost

Security & Compliance

Own infrastructure posture for SOC 2 and HIPAA
Ensure secure handling of PHI (encryption, access controls, auditability)
Implement and enforce secrets management, IAM best practices, and network isolation
Work with compliance tooling (e.g., Sprinto)

What You Will NOT Own

SIP routing, dial plans, or telecom call flows
Carrier integrations or VoIP-specific logic

(You will collaborate closely with a dedicated VoIP Infrastructure Engineer where systems intersect.)

What We’re Looking For

Experience

5–10+ years in DevOps, SRE, or Infrastructure Engineering
Proven ownership of production systems at scale
Experience operating multi-region, high-availability systems

Technical Depth

Strong hands-on experience with:

Kubernetes, ECS, and containerized systems
Terraform and infrastructure as code
CI/CD systems (GitHub Actions preferred)
Networking fundamentals (TCP/IP, DNS, iptables, load balancing)

You should also:

Be comfortable writing code (Python, Go, or similar)
Have experience with real-time or low-latency systems
Understand event-driven architectures deeply

Mindset (this matters more than tools)

You take ownership beyond your “area”
You fix root causes, not symptoms
You make decisions with incomplete information
You care about systems, not just infrastructure

Our Stack (Partial)

AWS, GCP, Kubernetes
Python, Postgres
RabbitMQ / async pipelines
LLM systems (multi-agent, inference pipelines)
VoIP + EHR integrations (adjacent systems)

What Success Looks Like

Within 3–6 months:

Reliability improves measurably (fewer incidents, faster recovery)
Observability provides clear, actionable insights across systems
CI/CD becomes faster, safer, and more predictable
Event-driven systems are easier to debug and recover

Within 6–12 months:

Platform operates at or near five-nines (99.999%) reliability
Infrastructure scales cleanly across app, AI, and voice workloads
AI systems are cost-efficient and production-grade
Engineering velocity increases due to strong platform foundations

Team & Environment

~10-person engineering team
Reports directly to CTO
High-ownership, fast-moving startup
Expectation of after-hours ownership when needed

Compensation

$140K – $180K CAD base (flexible for Senior vs Staff)
Equity included
Will stretch for the right candidate

Why This Role Matters

Peerlogic sits at the intersection of healthcare, AI, and real-time communication.

This role ensures the platform is:

Fast enough for real-time interaction
Reliable enough for healthcare workflows
Scalable enough to support rapid growth
