About SCALIS
SCALIS is building a modern Applicant Tracking System (ATS) and CRM platform that helps employers manage their entire hiring pipeline — from sourcing and job posting to candidate evaluation, interview scheduling, and offers. Our platform serves employers, job seekers, and recruiters with AI-powered features including candidate matching, sourcing agents, and intelligent automations.
About the Role
We're looking for a DevOps Engineer to own and evolve the infrastructure, CI/CD pipelines, and developer experience across the SCALIS platform. You'll work with a microservices architecture spanning 10+ services, real-time data pipelines, and a GitOps-driven deployment model. This role is hands-on — you'll be writing Kubernetes manifests, tuning CI workflows, debugging production incidents, and making the lives of our product engineers dramatically better.
What You'll Do
- Own the CI/CD platform: Maintain and improve GitHub Actions workflows across 10+ repositories (build, test, lint, deploy, migration, e2e). Optimize build times, reduce flakiness, and keep pipelines reliable.
- Manage Kubernetes infrastructure: Operate EKS clusters across development, staging, and production environments. Write and maintain Kustomize overlays, Helm charts, deployments, HPAs, PDBs, and ingress configurations.
- Drive GitOps deployments: Manage the ArgoCD-based deployment pipeline. Ensure image tag updates flow cleanly from CI through gitops-workloads to cluster sync, with automated rollback on failure.
- Operate the data pipeline: Maintain the Change Data Capture pipeline that keeps our search layer in sync. Monitor connector health, troubleshoot replication lag, and ensure data consistency.
- Manage databases: Oversee PostgreSQL on AWS RDS. Coordinate schema migration rollouts across environments, manage connection pooling, monitor query performance, and handle backup/restore procedures.
- Maintain message infrastructure: Operate Kafka-compatible event streaming clusters. Manage topics, consumer groups, and schema evolution across services.
- Improve observability: Expand Datadog integration for metrics, traces, and logs. Build dashboards, set up alerts, and establish SLOs for critical user paths.
- Secure the platform: Manage AWS IAM roles, secrets, TLS certificates, and network policies. Ensure least-privilege access across services.
- Improve developer experience: Maintain the Docker Compose-based local development environment (15+ services), optimize build caches, and reduce friction for engineers running the full stack locally.
- Plan capacity and optimize costs: Right-size pod resources, manage reserved instances, and optimize ECR image storage and S3 costs.
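To give a concrete (illustrative, not actual) sense of the Kubernetes work above — this is the kind of HPA and PDB manifest you'd be writing and tuning; the service name and thresholds here are hypothetical:

```yaml
# Hypothetical example — not a real SCALIS manifest.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: candidate-matching          # placeholder service name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: candidate-matching
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out when avg CPU exceeds 70%
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: candidate-matching
spec:
  minAvailable: 1                   # keep at least one pod up during node drains
  selector:
    matchLabels:
      app: candidate-matching
```

In this role you'd own manifests like these across environments via Kustomize overlays, rather than relying on Helm chart defaults.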
What We're Looking For
Required:
- 4+ years of experience in a DevOps, SRE, or Platform Engineering role
- Strong Kubernetes experience (EKS preferred) — you've written deployments, HPAs, PDBs, and ingress configs from scratch, not just applied Helm defaults
- GitHub Actions expertise — building reusable workflows, matrix strategies, self-hosted runners, caching, and artifact management
- AWS proficiency — EC2/EKS, RDS (PostgreSQL), S3, ECR, IAM, Secrets Manager, CloudWatch, and VPC networking
- Experience with GitOps tools, specifically ArgoCD and Kustomize
- Hands-on Docker experience — writing multi-stage Dockerfiles, optimizing image sizes, and debugging container networking
- Solid understanding of PostgreSQL operations — migrations, connection pooling, performance tuning, backup/recovery
- Experience with event streaming platforms — Kafka or similar (topic management, consumer group monitoring, connector operations)
- Comfortable with TypeScript/Node.js — enough to read application code, debug startup failures, and understand build tooling
- Infrastructure-as-Code experience — Terraform, CDKTF, Pulumi, or equivalent
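As a rough sketch of the GitHub Actions depth we mean (matrix strategies, dependency caching) — workflow and job names here are hypothetical, not taken from our repositories:

```yaml
# Hypothetical example workflow — names and versions are placeholders.
name: ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node: [18, 20]              # run the suite against both Node versions
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm                # restore the npm cache between runs
      - run: npm ci
      - run: npm test
```

If you've built reusable workflows like this, debugged their cache misses, and moved heavy jobs onto self-hosted runners, you're the kind of candidate we're looking for.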
Nice to Have:
- Experience with Debezium or other CDC tools
- OpenSearch/Elasticsearch operations (index management, cluster tuning, mapping configuration)
- Cloudflare or similar edge/CDN management
- Datadog or comparable observability platform experience
- Familiarity with webhook infrastructure for third-party SaaS integrations
- Redis operations for caching and job queue systems
- Experience scaling platforms from startup to growth stage (we're moving fast)
- KEDA or other event-driven autoscaling experience