Senior Data Center Operations Engineer / Lead Hardware Engineer (GPU Infrastructure)

We’re working with a high-growth AI infrastructure provider building the compute, data centers, and power systems underpinning next-generation artificial intelligence.

This team is deploying and operating hyperscale, GPU-dense environments for some of the most advanced AI workloads globally. The environment is fast-paced, highly technical, and focused on delivering reliable, scalable infrastructure at speed.

Locations open: Abernathy, Barber Lake, Buffalo

The Role

As a Data Center Operations Engineer, you’ll take full ownership of onsite operations within a high-performance compute environment. You’ll be responsible for the deployment, maintenance, and reliability of GPU-based infrastructure, supporting critical AI workloads.

This is a hands-on role working close to the hardware, where you’ll act as a first responder for incidents, support ongoing scaling efforts, and ensure operational excellence across the data center. You must have GPU experience and have worked in a senior capacity.

Key Responsibilities

Install, deploy, and configure server and network hardware, with a focus on GPU-based systems
Troubleshoot and maintain GPU servers (e.g. H100, B200, GB200 or similar) in production environments
Perform hardware replacements (servers, components, networking gear) while maintaining accurate asset tracking
Support network troubleshooting, including cabling diagnostics (copper/fibre) and device-level issues
Act as an onsite incident responder, coordinating with remote engineering teams and SMEs
Own and resolve operational tickets, escalating where needed while maintaining high SLAs
Support 24/7 operations via shift patterns or on-call rotations
Collaborate with internal teams, vendors, and customers to support ongoing deployments and improvements

Requirements

Hands-on experience working with GPU servers in production environments (essential)
Exposure to NVIDIA-based systems such as H100, B200, A100, GB200 or similar
Strong experience in server hardware troubleshooting (POST, BIOS, PXE boot, IPMI, BMC, etc.)
Solid understanding of networking fundamentals (TCP/IP, Ethernet, switching, routing, cabling (copper & fibre))
Working knowledge of Linux systems administration
Experience operating in data center or hardware-intensive environments
Ability to work in fast-paced, high-availability environments with shifting priorities

Nice to Have

Experience in hyperscale or HPC environments
Background in electrical, mechanical, or related engineering disciplines
Experience working with vendors and managing hardware lifecycle projects
Strong communication skills and ability to collaborate across technical teams

Package

Competitive salary + equity
Pension / retirement plan
Private healthcare (including dental and vision where applicable)
Generous PTO
