Job Description
Senior Software Engineer, HPC Scheduling
Location: Dallas, TX | Hybrid
Type: Direct Hire
Relocation: Available for non-local candidates
Compensation
Base salary: $170,000 – $250,000 + performance bonus
Benefits: 100% company-paid
OVERVIEW
GTN is seeking a Senior Software Engineer, HPC Scheduling to help design, build, and maintain large-scale scheduling software that supports demanding HPC, AI, research, and production workloads.
This role sits on a highly technical engineering team responsible for developing distributed systems, backend services, APIs, tooling, and automation that keep a high-scale compute platform reliable, performant, and maintainable.
Much of the work centers around Armada, an open-source project built and maintained by the team, along with internal scheduling, orchestration, and platform services. The current codebase is primarily written in Go, but the team is open to strong backend engineers from any language background as long as they have experience building production software at scale and can ramp into Go.
This is a hands-on engineering role focused on writing clean, well-tested code, reviewing designs, solving complex distributed systems problems, and owning production-quality software.
The ideal candidate is a strong backend engineer with excellent coding fundamentals, experience building scalable services or distributed systems, and a practical understanding of how software runs in cloud, Linux, Kubernetes, and production infrastructure environments.
KEY RESPONSIBILITIES
Software Engineering & Platform Development
- Design, write, test, and review high-quality production code, primarily in Go
- Build and maintain scalable backend services, APIs, and distributed systems supporting high-demand workloads
- Contribute to Armada and related internal scheduling, orchestration, and platform services
- Develop tooling and automation that improves platform reliability, developer productivity, and operational efficiency
- Apply strong software architecture principles to ensure systems are maintainable, correct, and scalable
- Collaborate with senior engineers on technical design, code reviews, system improvements, and long-term platform direction
Backend Systems & Distributed Infrastructure
- Build services that operate reliably across large-scale HPC and AI infrastructure environments
- Design backend systems that support high-volume workloads, complex scheduling logic, and distributed execution patterns
- Work with Kubernetes-based orchestration, containerized services, and modern deployment workflows
- Develop and debug software in Linux environments using command-line and system-level tooling
- Apply networking and systems fundamentals to troubleshoot, optimize, and improve platform performance
- Independently diagnose and resolve complex issues across software and infrastructure layers
Data, Reliability & Operations
- Manage and optimize data interactions across relational and non-relational data stores, with emphasis on PostgreSQL
- Contribute to CI/CD pipelines, automated testing, observability, and engineering best practices
- Use monitoring, logging, and runtime tools such as Prometheus, Grafana, or similar platforms
- Think critically about correctness, edge cases, performance, scalability, and failure modes
- Support production-quality engineering practices across testing, reliability, documentation, and maintainability
- Stay current with emerging technologies and apply new approaches where they improve platform outcomes
REQUIRED EXPERIENCE
- Strong backend software engineering fundamentals, including data structures, algorithms, system design, and maintainable code practices
- Professional experience building backend services, APIs, distributed systems, platform services, or infrastructure software in production environments
- Proficiency in Go, Java, C++, C#, Rust, Scala, Kotlin, Python, or another production backend language
- Ability and willingness to ramp into Go-based codebases
- Experience building software at scale, ideally in environments involving high throughput, distributed workloads, reliability requirements, or complex production systems
- Familiarity with cloud environments such as AWS, GCP, or Azure
- Experience with Linux-based development and debugging
- Familiarity with Kubernetes, containers, or modern deployment pipelines
- Experience with PostgreSQL or similar relational databases
- Understanding of observability practices, including monitoring, logging, metrics, and alerting
- Strong testing mindset with focus on correctness, reliability, edge cases, and failure scenarios
- Ability to work independently, review code thoughtfully, and contribute in a collaborative engineering team
PREFERRED EXPERIENCE
- Experience with HPC, AI infrastructure, batch scheduling, workload orchestration, or large-scale compute platforms
- Hands-on experience with Kubernetes scheduling, multi-cluster systems, or distributed job orchestration
- Experience building backend systems at significant scale in cloud, infrastructure, platform, fintech, adtech, data, developer tools, or similar high-demand environments
- Contributions to open-source projects or experience working in open-source engineering environments
- Experience with non-relational databases, message queues, event-driven systems, or high-throughput platforms
- Familiarity with performance optimization, reliability engineering, or production platform operations
- Prior experience with Go is helpful, but not required
IDEAL PROFILE
The ideal candidate is a hands-on backend software engineer who enjoys building systems that operate at scale. They write clean, tested code, understand distributed systems tradeoffs, and are comfortable working close to production infrastructure.
They do not need to come directly from an HPC background and do not need prior Go experience. The key requirements are strong backend engineering capability, experience building reliable software in scaled production environments, and an interest in solving complex scheduling, orchestration, and platform reliability challenges.
This person should be comfortable learning new technical domains, working with senior engineers, contributing to open-source and internal platforms, and owning production-quality systems that support demanding infrastructure workloads.
WHY THIS ROLE
- Work on high-scale HPC and AI infrastructure supporting demanding production workloads
- Contribute to Armada, an open-source scheduling platform
- Join a senior, collaborative engineering team with real ownership over technical direction
- Build software that directly impacts platform reliability, performance, and scalability
- Bring strong backend engineering skills from any language background to complex infrastructure software
- Competitive compensation, performance bonus, relocation support, and 100% company-paid benefits