
Senior DevOps Engineer

SCALIS

Location

Remote

Salary

Not specified

Type

Full-time

Posted

Today

via LinkedIn

Job Description

About SCALIS

SCALIS is building a modern Applicant Tracking System (ATS) and CRM platform that helps employers manage their entire hiring pipeline — from sourcing and job posting to candidate evaluation, interview scheduling, and offers. Our platform serves employers, job seekers, and recruiters with AI-powered features including candidate matching, sourcing agents, and intelligent automations.

About the Role

We're looking for a Senior DevOps Engineer to own and evolve the infrastructure, CI/CD pipelines, and developer experience across the SCALIS platform. You'll work with a microservices architecture spanning 10+ services, real-time data pipelines, and a GitOps-driven deployment model. This role is hands-on: you'll write Kubernetes manifests, tune CI workflows, debug production incidents, and make the lives of our product engineers dramatically better.

What You'll Do

  • Own the CI/CD platform: Maintain and improve GitHub Actions workflows across 10+ repositories (build, test, lint, deploy, migration, e2e). Optimize build times, reduce flakiness, and keep pipelines reliable.
  • Manage Kubernetes infrastructure: Operate EKS clusters across development, staging, and production environments. Write and maintain Kustomize overlays, Helm charts, deployments, HPAs, PDBs, and ingress configurations.
  • Drive GitOps deployments: Manage the ArgoCD-based deployment pipeline. Ensure image tag updates flow cleanly from CI through gitops-workloads to cluster sync, with automated rollback on failure.
  • Operate the data pipeline: Maintain the Change Data Capture pipeline that keeps our search layer in sync. Monitor connector health, troubleshoot replication lag, and ensure data consistency.
  • Manage databases: Oversee PostgreSQL on AWS RDS. Coordinate schema migration rollouts across environments, manage connection pooling, monitor query performance, and handle backup/restore procedures.
  • Maintain message infrastructure: Operate Kafka-compatible event streaming clusters. Manage topics, consumer groups, and schema evolution across services.
  • Improve observability: Expand Datadog integration for metrics, traces, and logs. Build dashboards, set up alerts, and establish SLOs for critical user paths.
  • Secure the platform: Manage AWS IAM roles, secrets, TLS certificates, and network policies. Ensure least-privilege access across services.
  • Improve developer experience: Maintain the Docker Compose-based local development environment (15+ services), optimize build caches, and reduce friction for engineers running the full stack locally.
  • Capacity plan and cost optimize: Right-size pod resources, manage reserved instances, and optimize ECR image storage and S3 costs.
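
To give a concrete flavor of the Kubernetes work described above, here is a minimal sketch of the kind of HPA and PDB manifests this role involves. The service name, replica counts, and thresholds are illustrative, not SCALIS's actual configuration:

```yaml
# Hypothetical HorizontalPodAutoscaler for an API service:
# scales between 3 and 12 replicas on average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# Companion PodDisruptionBudget so voluntary evictions
# (e.g. node drains during upgrades) keep at least two pods ready.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-gateway
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-gateway
```

Pairing an HPA with a PDB like this is what keeps rolling upgrades and cluster maintenance from causing user-visible downtime.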

What We're Looking For

Required:

  • 4+ years of experience in a DevOps, SRE, or Platform Engineering role
  • Strong Kubernetes experience (EKS preferred) — you've written deployments, HPAs, PDBs, and ingress configs from scratch, not just applied Helm defaults
  • GitHub Actions expertise — building reusable workflows, matrix strategies, self-hosted runners, caching, and artifact management
  • AWS proficiency — EC2/EKS, RDS (PostgreSQL), S3, ECR, IAM, Secrets Manager, CloudWatch, and VPC networking
  • Experience with GitOps tools, specifically ArgoCD and Kustomize
  • Hands-on Docker experience — writing multi-stage Dockerfiles, optimizing image sizes, and debugging container networking
  • Solid understanding of PostgreSQL operations — migrations, connection pooling, performance tuning, backup/recovery
  • Experience with event streaming platforms — Kafka or similar (topic management, consumer group monitoring, connector operations)
  • Comfortable with TypeScript/Node.js — enough to read application code, debug startup failures, and understand build tooling
  • Infrastructure-as-Code experience — Terraform, CDKTF, Pulumi, or equivalent
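
By "hands-on Docker experience" we mean comfort writing things like the multi-stage Dockerfile sketched below for a Node.js/TypeScript service. Base image, build scripts, and paths are illustrative assumptions, not our actual setup:

```dockerfile
# Stage 1: install all dependencies and compile TypeScript.
FROM node:20-slim AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: ship only production dependencies and compiled output,
# keeping the final image small and the attack surface minimal.
FROM node:20-slim
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
USER node
CMD ["node", "dist/index.js"]
```

Copying `package*.json` before the source keeps the dependency layer cached across code-only changes, which is exactly the kind of build-time optimization this role owns.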

Nice to Have:

  • Experience with Debezium or other CDC tools
  • OpenSearch/Elasticsearch operations (index management, cluster tuning, mapping configuration)
  • Cloudflare or similar edge/CDN management
  • Datadog or comparable observability platform experience
  • Familiarity with third-party SaaS integration webhook infrastructure
  • Redis operations for caching and job queue systems
  • Experience scaling platforms from startup to growth stage (we're moving fast)
  • KEDA or other event-driven autoscaling experience
