Senior Software Engineer, HPC Scheduling

Location: Dallas, TX | Hybrid

Type: Direct Hire

Relocation: Available for non-local candidates

Compensation

Base salary: $170,000 – $250,000 + performance bonus

Benefits: 100% company-paid

OVERVIEW

GTN is seeking a Senior Software Engineer, HPC Scheduling, to help design, build, and maintain large-scale scheduling software that supports demanding HPC, AI, research, and production workloads.

This role sits on a highly technical engineering team responsible for developing distributed systems, backend services, APIs, tooling, and automation that keep a high-scale compute platform reliable, performant, and maintainable.

Much of the work centers on Armada, an open-source project built and maintained by the team, along with internal scheduling, orchestration, and platform services. The current codebase is primarily written in Go, but the team is open to strong backend engineers from any language background as long as they have experience building production software at scale and can ramp into Go.

This is a hands-on engineering role focused on writing clean, well-tested code, reviewing designs, solving complex distributed systems problems, and owning production-quality software.

The ideal candidate is a strong backend engineer with excellent coding fundamentals, experience building scalable services or distributed systems, and a practical understanding of how software runs in cloud, Linux, Kubernetes, and production infrastructure environments.

KEY RESPONSIBILITIES

Software Engineering & Platform Development

Design, write, test, and review high-quality production code, primarily in Go

Build and maintain scalable backend services, APIs, and distributed systems supporting high-demand workloads

Contribute to Armada and related internal scheduling, orchestration, and platform services

Develop tooling and automation that improves platform reliability, developer productivity, and operational efficiency

Apply strong software architecture principles to ensure systems are maintainable, correct, and scalable

Collaborate with senior engineers on technical design, code reviews, system improvements, and long-term platform direction

Backend Systems & Distributed Infrastructure

Build services that operate reliably across large-scale HPC and AI infrastructure environments

Design backend systems that support high-volume workloads, complex scheduling logic, and distributed execution patterns

Work with Kubernetes-based orchestration, containerized services, and modern deployment workflows

Develop and debug software in Linux environments using command-line and system-level tooling

Apply networking and systems fundamentals to troubleshoot, optimize, and improve platform performance

Independently diagnose and resolve complex issues across software and infrastructure layers

Data, Reliability & Operations

Manage and optimize data interactions across relational and non-relational data stores, with emphasis on PostgreSQL

Contribute to CI/CD pipelines, automated testing, observability, and engineering best practices

Use monitoring, logging, and runtime tools such as Prometheus, Grafana, or similar platforms

Think critically about correctness, edge cases, performance, scalability, and failure modes

Support production-quality engineering practices across testing, reliability, documentation, and maintainability

Stay current with emerging technologies and apply new approaches where they improve platform outcomes

REQUIRED EXPERIENCE

Strong backend software engineering fundamentals, including data structures, algorithms, system design, and maintainable code practices

Professional experience building backend services, APIs, distributed systems, platform services, or infrastructure software in production environments

Proficiency in Go, Java, C++, C#, Rust, Scala, Kotlin, Python, or another production backend language

Ability and willingness to ramp into Go-based codebases

Experience building software at scale, ideally in environments involving high throughput, distributed workloads, reliability requirements, or complex production systems

Familiarity with cloud environments such as AWS, GCP, or Azure

Experience with Linux-based development and debugging

Familiarity with Kubernetes, containers, or modern deployment pipelines

Experience with PostgreSQL or similar relational databases

Understanding of observability practices, including monitoring, logging, metrics, and alerting

Strong testing mindset with focus on correctness, reliability, edge cases, and failure scenarios

Ability to work independently, review code thoughtfully, and contribute in a collaborative engineering team

PREFERRED EXPERIENCE

Experience with HPC, AI infrastructure, batch scheduling, workload orchestration, or large-scale compute platforms

Hands-on experience with Kubernetes scheduling, multi-cluster systems, or distributed job orchestration

Experience building backend systems at significant scale in cloud, infrastructure, platform, fintech, adtech, data, developer tools, or similar high-demand environments

Contributions to open-source projects or experience working in open-source engineering environments

Experience with non-relational databases, message queues, event-driven systems, or high-throughput platforms

Familiarity with performance optimization, reliability engineering, or production platform operations

Prior experience with Go is helpful, but not required

IDEAL PROFILE

The ideal candidate is a hands-on backend software engineer who enjoys building systems that operate at scale. They write clean, tested code, understand distributed systems tradeoffs, and are comfortable working close to production infrastructure.

They do not need to come directly from an HPC background and do not need prior Go experience. The key requirements are strong backend engineering capability, experience building reliable software in scaled production environments, and an interest in solving complex scheduling, orchestration, and platform reliability challenges.

This person should be comfortable learning new technical domains, working with senior engineers, contributing to open-source and internal platforms, and owning production-quality systems that support demanding infrastructure workloads.

WHY THIS ROLE

Work on high-scale HPC and AI infrastructure supporting demanding production workloads

Contribute to Armada, an open-source scheduling platform

Join a senior, collaborative engineering team with real ownership over technical direction

Build software that directly impacts platform reliability, performance, and scalability

Opportunity for strong backend engineers from any language background to work on complex infrastructure software

Competitive compensation, performance bonus, relocation support, and 100% company-paid benefits
