Location: Glasgow, Scotland, UK
Salary: Not specified
Type: Contract
Posted: Today
Job Description
We’re seeking an AWS Site Reliability Engineer (SRE) with strong incident operations experience to support and improve the reliability of cloud and data platform services across AWS and Snowflake.
This role is hands-on and operationally focused: proactive monitoring, rapid incident response, service restoration, root cause analysis, and automation to improve resilience and reduce MTTR.
What you’ll do
- Lead incident triage, coordination, and resolution for AWS and Snowflake services in production
- Monitor alerts, dashboards, and service health indicators, and respond to the issues they surface
- Perform root cause analysis (RCA) and drive post-incident remediation and continuous improvement
- Create, maintain, and improve runbooks, operational procedures, and on-call readiness
- Participate in and strengthen on-call rotations (including operational handovers)
- Automate repetitive operational tasks to reduce toil, improve reliability, and reduce MTTR
What you’ll bring (required)
- Strong knowledge of AWS, including EC2, S3, IAM, VPC, Lambda, and CloudWatch
- Experience with Snowflake administration and troubleshooting
- Familiarity with observability tooling such as CloudWatch, Datadog, Grafana, and/or Splunk
- Solid understanding of SRE principles: SLIs, SLOs, error budgets, and incident management
- Scripting and automation skills in Python, Bash, and/or Terraform