SRE/Platform Engineer (OpenShift/Kubernetes) 4660

Tier4 Group

Location: Remote
Salary: Not specified
Type: Full-time
Posted: Today via LinkedIn

Job Description

Site Reliability Engineer (SRE) / Platform Engineer

Location: Reston, VA (Hybrid — 2 days onsite / 3 days remote)

Employment Type: Full-time

About the Organization

Join a mission-driven, national financial services organization at the heart of the U.S. housing finance ecosystem. This is a mid-sized, highly regulated enterprise operating at market scale—supporting platforms and analytics that enable trillions of dollars in annual economic activity. You’ll work in a modern tech environment with strong engineering partners, clear business impact, and a mandate for reliability, security, and continuous improvement.

The Role

Our client is hiring a hands-on SRE / Platform Engineer to operate, tune, and scale its OpenShift/Kubernetes platforms while bridging on-prem to Azure to power its analytics ecosystem. You’ll own reliability, automation, and observability across a hybrid estate, partnering closely with developers, data engineers, infrastructure operations, and security to deliver secure, performant platform services using modern DevSecOps practices.

Why This Role Stands Out

  • Hybrid impact: Operate critical OpenShift clusters and manage Azure services used by data and analytics teams.
  • Hybrid architecture: Help design and support the bridge from on-prem to cloud—migration, integration, and steady-state operations.
  • Real-world scale: Reliability work that directly supports high-volume financial market operations and enterprise analytics.
  • Automation-first: Lean into Terraform, Ansible, and GitOps to make reliability repeatable.

What You’ll Do in the First 180 Days

  • Operate, tune, and optimize OpenShift/Kubernetes clusters (scheduling, ingress, upgrades, quotas, policies).
  • Stand up and/or refine observability (Datadog, Prometheus, Grafana): dashboards, alerts, SLOs, runbooks.
  • Map the current hybrid topology and critical delivery pipelines; identify toil and prioritize automation (Terraform/Ansible).
  • Begin supporting Azure environments (compute, networking, storage, data services) used by analytics teams.
  • Drive GitOps-first workflows; harden CI/CD with ArgoCD/Jenkins/GitHub Actions and policy-as-code guardrails.
  • Implement or enhance platform services (Vault, Kafka/AMQ, ingress, service mesh) for dev and data teams.
  • Lead incident response and postmortems; institutionalize RCA, blameless learning, and continuous improvement.
  • Advance the hybrid service model—migrations, integrations, reliability/latency tuning, cost and performance optimization.
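
Since SLOs and error budgets figure in the observability work above, here is a minimal, hypothetical sketch of the arithmetic behind an error budget and a burn-rate alert. The function names and the 99.9% / 30-day figures are illustrative, not the client's actual targets:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability implied by an availability SLO
    over the rolling window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)

def burn_rate(observed_error_ratio: float, slo: float) -> float:
    """How fast the budget is being consumed relative to the allowed rate.
    A sustained burn rate above 1 exhausts the budget before the window ends."""
    return observed_error_ratio / (1.0 - slo)

# A 99.9% availability SLO over 30 days allows about 43.2 minutes of downtime.
budget = error_budget_minutes(0.999)
# A 0.5% observed error ratio against a 99.9% SLO burns budget 5x too fast —
# the kind of condition a burn-rate alert would page on.
rate = burn_rate(0.005, 0.999)
```

Dashboards and alert rules in Datadog or Prometheus typically encode exactly this ratio, paging on fast burn and ticketing on slow burn.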

Day-to-Day Responsibilities

  • Operate and optimize OpenShift/Kubernetes clusters, ingress (e.g., Nginx), and container networking/service mesh.
  • Manage Azure services (compute, VNet, storage, data services) supporting analytics workloads.
  • Build and maintain automated infrastructure with Terraform, Ansible, and GitOps workflows.
  • Implement and evolve observability (Datadog, Prometheus, Grafana): metrics, traces, logs, alerting, SLOs, runbooks.
  • Design, harden, and support delivery pipelines with ArgoCD/Jenkins/GitHub Actions.
  • Provide platform tooling and enablement for application developers, data engineers, and operations teams.
  • Ensure security and access management (HashiCorp Vault, secrets management, least privilege).
  • Lead incident response, coordinate cross-functional resolution, and drive corrective actions and platform improvements.
  • Script or develop tools in Bash, Python, or Go to eliminate toil and improve developer experience.
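
As a hedged illustration of the toil-reducing tooling the last bullet describes, a small Python helper might flag pods with excessive container restarts from `kubectl get pods -o json` output. The field names follow the Kubernetes Pod API; the pod names and restart threshold below are made up for the example:

```python
import json

def flag_restarting_pods(pod_list_json: str, threshold: int = 3) -> list[str]:
    """Return names of pods whose containers restarted more than `threshold`
    times, given the JSON emitted by `kubectl get pods -o json`."""
    pods = json.loads(pod_list_json)["items"]
    flagged = []
    for pod in pods:
        # Sum restartCount across all containers in the pod's status.
        restarts = sum(
            cs.get("restartCount", 0)
            for cs in pod.get("status", {}).get("containerStatuses", [])
        )
        if restarts > threshold:
            flagged.append(pod["metadata"]["name"])
    return flagged

# Illustrative input standing in for real kubectl output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "web-1"},
         "status": {"containerStatuses": [{"restartCount": 7}]}},
        {"metadata": {"name": "web-2"},
         "status": {"containerStatuses": [{"restartCount": 0}]}},
    ]
})
print(flag_restarting_pods(sample))  # ['web-1']
```

In practice a script like this would be wired into a cron job or CI step, turning a manual cluster health check into an automated report.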

Tech You’ll Work With

  • Kubernetes / OpenShift
  • Azure (compute, networking, storage, and data services)
  • Automation & IaC: Terraform, Ansible, GitOps
  • Observability: Datadog, Prometheus, Grafana
  • Networking & Ingress: Nginx, service meshes, container networking
  • Messaging: Kafka, AMQ
  • Secrets & Access: HashiCorp Vault
  • CI/CD: ArgoCD, Jenkins, GitHub Actions
  • Scripting/Coding: Bash, Python, Go

Must-Have Qualifications

  • 2+ years hands-on operating and managing Kubernetes and OpenShift clusters.
  • Strong experience with Microsoft Azure (compute, networking, storage, and data services).
  • Proven skills in automation and Infrastructure-as-Code (Terraform, Ansible, GitOps).
  • Proficiency with observability tooling (Datadog, Prometheus, Grafana).
  • Scripting/coding ability in Bash, Python, or Go.

Preferred / Stand-Out Skills

  • Experience bridging on-prem and cloud in a hybrid service model (migration, integration, optimization).
  • Expertise with Kafka/AMQ, HashiCorp Vault, and ArgoCD/Jenkins/GitHub Actions.
  • Background leading incident response and postmortems, with strong RCA and continuous improvement practices.

Work Model & Team

  • Hybrid: 2 days onsite in Reston, VA; 3 days remote.
  • You’ll be part of the IT organization, collaborating daily with developers, data engineers, infrastructure operations, and security.

How to Succeed Here

  • You’re a hands-on engineer who thrives in regulated, high-impact environments.
  • You favor automation over repetition, and observability over guesswork.
  • You collaborate openly, communicate clearly, and leave systems better than you found them.
