DevOps Engineer

Peerlogic

Location

Remote

Salary

$140,000 - $180,000 CAD per year

Type

Full-time

Posted

Today

via LinkedIn

Job Description

Senior / Staff DevOps Engineer (Platform & Reliability)

Location: Remote (U.S. or Canada)

Company: Peerlogic

The Role

Peerlogic is hiring a Senior / Staff DevOps Engineer to own the platform, infrastructure, and reliability of a production system that spans application services, AI/ML workloads, and real-time voice infrastructure.

You are replacing a strong DevOps leader and not building from scratch. The system works. Your job is to make it exceptional.

This is not a support role.

This is not a ticket-driven role.

You will:

  • Own reliability end-to-end
  • Make architectural decisions with real consequences
  • Operate in ambiguity without waiting for direction

If you prefer clearly defined scopes, narrow ownership, or “assigned work,” this is not the role.

What You’ll Own

Platform & Infrastructure

  • End-to-end ownership of cloud + hybrid infrastructure (AWS, GCP, and physical environments)
  • Multi-region architecture targeting 99.999% uptime
  • Kubernetes clusters and container orchestration across all services
  • CI/CD pipelines (GitHub Actions): reliability, speed, and developer experience
  • Infrastructure as Code (Terraform, Ansible)

Reliability & Observability

  • Design and enforce SLOs, SLIs, and error budgets
  • Build a best-in-class observability stack (metrics, logs, traces)
  • Drive incident response, postmortems, and systemic fixes (not band-aids)
  • Reduce MTTR and eliminate repeat incidents

Data & Event Systems

  • Ownership of event-driven architecture (RabbitMQ or equivalent)
  • Ensure durability, replayability, and correctness of pipelines
  • Design and maintain backfill and recovery strategies
  • Improve debuggability of asynchronous systems

AI / ML Infrastructure

  • Operate and scale LLM-powered systems (Bedrock, SageMaker, or equivalent)
  • Manage inference workloads with a focus on:
      • Latency
      • Cost
      • Reliability
  • Build and maintain:
      • Evaluation pipelines
      • Dataset versioning
      • Reproducible ML workflows

Performance & Cost

  • Own infrastructure cost efficiency across:
      • Compute
      • Storage
      • LLM usage
  • Continuously optimize tradeoffs between:
      • Performance
      • Reliability
      • Cost

Security & Compliance

  • Own infrastructure posture for SOC 2 and HIPAA
  • Ensure secure handling of PHI (encryption, access controls, auditability)
  • Implement and enforce:
      • Secrets management
      • IAM best practices
      • Network isolation
  • Work with compliance tooling (e.g., Sprinto)

What You Will NOT Own

  • SIP routing, dial plans, or telecom call flows
  • Carrier integrations or VoIP-specific logic

(You will collaborate closely with a dedicated VoIP Infrastructure Engineer where systems intersect.)

What We’re Looking For

Experience

  • 5–10+ years in DevOps, SRE, or Infrastructure Engineering
  • Proven ownership of production systems at scale
  • Experience operating multi-region, high-availability systems

Technical Depth

Strong hands-on experience with:

  • Kubernetes, ECS, and containerized systems
  • Terraform and infrastructure as code
  • CI/CD systems (GitHub Actions preferred)
  • Networking fundamentals (TCP/IP, DNS, iptables, load balancing)

You should also:

  • Be comfortable writing code (Python, Go, or similar)
  • Have experience with real-time or low-latency systems
  • Understand event-driven architectures deeply

Mindset (this matters more than tools)

  • You take ownership beyond your “area”
  • You fix root causes, not symptoms
  • You make decisions with incomplete information
  • You care about systems, not just infrastructure

Our Stack (Partial)

  • AWS, GCP, Kubernetes
  • Python, Postgres
  • RabbitMQ / async pipelines
  • LLM systems (multi-agent, inference pipelines)
  • VoIP + EHR integrations (adjacent systems)

What Success Looks Like

Within 3–6 months:

  • Reliability improves measurably (fewer incidents, faster recovery)
  • Observability provides clear, actionable insights across systems
  • CI/CD becomes faster, safer, and more predictable
  • Event-driven systems are easier to debug and recover

Within 6–12 months:

  • Platform operates at or near 5-nines reliability
  • Infrastructure scales cleanly across app, AI, and voice workloads
  • AI systems are cost-efficient and production-grade
  • Engineering velocity increases due to strong platform foundations

Team & Environment

  • ~10-person engineering team
  • Reports directly to CTO
  • High-ownership, fast-moving startup
  • Expectation of after-hours ownership when needed

Compensation

  • $140K – $180K CAD base (flexible for Senior vs Staff)
  • Equity included
  • Will stretch for the right candidate

Why This Role Matters

Peerlogic sits at the intersection of healthcare, AI, and real-time communication.

This role ensures the platform is:

  • Fast enough for real-time interaction
  • Reliable enough for healthcare workflows
  • Scalable enough to support rapid growth
