About Our Company

Universal DX, Inc. is an international company with a highly experienced team focused on cracking cancer’s code. Through our multi-omics and bioinformatics models, we have figured out how to read the disease’s signals in blood with high accuracy to detect cancer in its earliest stages. Starting with a colorectal cancer screening liquid biopsy test, we are building a multi-cancer platform that can identify the unique DNA regions associated with different types of cancers.

The Opportunity

Universal DX is seeking an experienced Site Reliability Engineer to join our growing team. You will be a key technical leader responsible for the reliability, scalability, and operational excellence of our production platforms, with a strong focus on EKS/Kubernetes and cloud infrastructure.

You will be part of a team that is passionate about developing novel diagnostic tests for the early detection of cancers. Our company aims to do more than become one of the leaders in the industry: we want to have a huge positive impact on society by achieving the ambitious purpose of “making cancer a curable disease by detecting it earlier”.

How You’ll Contribute

Own the reliability, performance, and uptime of Kubernetes (EKS) clusters and shared platform services.
Define and monitor service-level indicators and objectives (SLIs/SLOs) that drive reliability decisions.
Lead incident response and root cause analysis, and implement long-term fixes.
Build and enhance observability (monitoring, logging, tracing) to surface issues before they impact customers.
Automate operational tasks and reduce toil using software engineering principles.
Plan and execute safe infrastructure changes, including cluster upgrades, scaling strategies, and safer rollout processes.
Collaborate with Dev, Cloud, and Security teams to improve operational practices and platform maturity.
Mentor and coach less senior engineers on reliability engineering best practices and tooling.

What You’ll Bring

At minimum, a bachelor’s degree in Computer Science, Computer Engineering, or another job-related field.
5+ years of experience in SRE, Reliability Engineering, DevOps, or Infrastructure roles with cloud systems.
Proven experience with Kubernetes (preferably EKS) and container orchestration at scale.
Strong skills in administering AWS and cloud-native infrastructure.
Expertise with observability tools such as Prometheus, Grafana, ELK, Datadog, etc.
Proficiency with Infrastructure as Code (Terraform, Helm, GitOps workflows).
Strong programming or scripting ability (e.g., Python, Go, Bash).
Excellent troubleshooting, communication, and collaboration skills.
Detail-oriented, self-motivated, and able to work independently in a fast-paced, dynamic environment.
Preferred: Experience leading SRE practices in a growing engineering organization.
Preferred: Familiarity with CI/CD systems (GitLab CI, Jenkins, ArgoCD, etc.).
Preferred: A passion for automation, reliability, and continuous improvement.
Preferred: AWS Certified Solutions Architect (Associate) or higher.

What We’ll Offer

22 days of PTO with the possibility to carry over 10 days to the following year.
Company Holidays, plus your Birthday off!
Company-sponsored benefit plans, including medical, dental, and vision insurance plus life, STD, and LTD coverage, and 401(k).
Flexible work schedule.
And more to come!

Why Now?

This is an exciting time to be at Universal DX. We are growing rapidly and launching our US operations by building up our team, setting up our lab and business operations, and establishing strategic partnerships.

We are looking for passionate changemakers to be a part of our journey during this period of expansion.
