Location
Texas, United States
Salary
Not specified
Type
Full-time
Posted
Today
Job Description
Senior Data Center Operations Engineer / Lead Hardware Engineer (GPU Infrastructure)
We’re working with a high-growth AI infrastructure provider building the compute, data centers, and power systems underpinning next-generation artificial intelligence.
This team is deploying and operating hyperscale, GPU-dense environments for some of the most advanced AI workloads globally. The environment is fast-paced, highly technical, and focused on delivering reliable, scalable infrastructure at speed.
Locations open: Abernathy, Barber Lake, Buffalo
The Role
As a Data Center Operations Engineer, you’ll take full ownership of onsite operations within a high-performance compute environment. You’ll be responsible for the deployment, maintenance, and reliability of GPU-based infrastructure, supporting critical AI workloads.
This is a hands-on role working close to the hardware, where you’ll act as a first responder for incidents, support ongoing scaling efforts, and ensure operational excellence across the data center. You must have GPU experience and have worked in a senior capacity.
Key Responsibilities
- Install, deploy, and configure server and network hardware, with a focus on GPU-based systems
- Troubleshoot and maintain GPU servers (e.g. H100, B200, GB200 or similar) in production environments
- Perform hardware replacements (servers, components, networking gear) while maintaining accurate asset tracking
- Support network troubleshooting, including cabling diagnostics (copper/fibre) and device-level issues
- Act as an onsite incident responder, coordinating with remote engineering teams and SMEs
- Own and resolve operational tickets, escalating where needed while meeting SLA targets
- Support 24/7 operations via shift patterns or on-call rotations
- Collaborate with internal teams, vendors, and customers to support ongoing deployments and improvements
Requirements
- Hands-on experience working with GPU servers in production environments (essential)
- Exposure to NVIDIA-based systems such as H100, B200, A100, GB200 or similar
- Strong experience in server hardware troubleshooting (POST, BIOS, PXE boot, IPMI, BMC, etc.)
- Solid understanding of networking fundamentals (TCP/IP, Ethernet, switching, routing, copper and fibre cabling)
- Working knowledge of Linux systems administration
- Experience operating in data center or hardware-intensive environments
- Ability to work in fast-paced, high-availability environments with shifting priorities
Nice to Have
- Experience in hyperscale or HPC environments
- Background in electrical, mechanical, or related engineering disciplines
- Experience working with vendors and managing hardware lifecycle projects
- Strong communication skills and ability to collaborate across technical teams
Package
- Competitive salary + equity
- Pension / retirement plan
- Private healthcare (including dental and vision where applicable)
- Generous PTO