DevOps Engineer

Vumedi

Location: Oakland, CA
Salary: Not specified
Type: Full-time
Posted: Today via LinkedIn

Job Description

About Vumedi:

Vumedi is the largest video education platform for doctors worldwide, dedicated to advancing medical education through innovative video-based learning. Our mission is to empower healthcare professionals by providing them with access to the latest clinical knowledge and surgical techniques from experts around the globe. We curate a vast library of high-quality educational content, enabling users to enhance their skills, stay informed about industry trends, and improve patient outcomes. We are headquartered in Oakland, CA, and have additional offices in Minneapolis, MN, and Zagreb, Croatia.

We're hiring a Senior/Staff/Principal DevOps Engineer to lead the development of our digital platform and products at this critical stage of Vumedi's growth.

Why join Vumedi right now?

  • Build technology that matters in a fast-scaling Silicon Valley digital healthcare company: Your work directly impacts how doctors across the world learn and make decisions that save lives.
  • Grow as we grow: Be part of a company in an accelerated growth phase, where expanding teams, products, and markets create real opportunities for ownership, leadership, and career progression.
  • Build with AI: Work on applied LLM systems - from intelligent search to AI-driven content agents - and shape how AI transforms medical knowledge delivery.
  • Own your craft end-to-end: Take full responsibility for building systems that scale globally and power mission-critical workflows.
  • Collaborate globally: Join a world-class team of passionate engineers working on a modern tech stack, which will further drive your career development.
  • Have real product impact: Influence the direction of product development by collaborating closely with product and leadership teams.

About the role:

We are looking for a DevOps Engineer to join our engineering team and take ownership of our infrastructure, deployment processes, and overall platform reliability. You will work closely with backend and data teams to support a growing video and data platform used by millions of healthcare professionals worldwide.

In this role, you will focus on improving our CI/CD pipelines, system reliability, and developer experience, while helping scale our cloud infrastructure in a secure and cost-efficient way. You will work extensively with AWS services (compute, storage, networking, IAM, monitoring) and help ensure our systems are reliable, observable, and well-architected.

You'll also support and enable emerging AI/ML and LLM-powered systems used for large-scale medical content processing, helping build and operate the infrastructure required for these workloads. This includes improving data pipelines, optimizing resource usage, and ensuring production-grade reliability of AI-driven services.

This is a high-impact role with a broad scope: from supporting production systems and data pipelines to driving long-term improvements in how we build, deploy, and operate our platform, with strong ownership and autonomy in shaping DevOps practices.

What you will do:

  • Own and improve our infrastructure, CI/CD pipelines, and deployment processes across multiple environments
  • Work with AWS services (compute, storage, networking, IAM, monitoring) to ensure scalable, secure, and reliable systems
  • Collaborate closely with backend and data teams to support production systems, data pipelines, and overall platform reliability
  • Continuously improve developer experience by streamlining workflows, reducing friction, and enabling faster, safer deployments
  • Contribute to improving security practices, access control, and compliance of our infrastructure
  • Automate infrastructure and workflows using Python
  • Improve observability by implementing and maintaining monitoring, logging, and alerting systems
  • Troubleshoot production issues, participate in incident response, and implement long-term fixes to improve system stability
  • Identify and drive improvements in performance, scalability, and cost efficiency across the platform
  • Support and scale AI/ML and LLM-based systems, ensuring reliable infrastructure for data processing and content classification workloads

Who you are:

  • You have 5+ years of experience in DevOps, SRE, or infrastructure engineering, with a strong focus on cloud-native environments (preferably AWS)
  • You have managed cloud infrastructure (networking, IAM, compute, storage) with a strong understanding of security best practices and cost optimization
  • You have experience building and maintaining CI/CD pipelines to support rapid, reliable software delivery across multiple environments
  • You are comfortable writing Python for automation, scripting, and building internal tooling to improve infrastructure and developer workflows
  • You have a strong understanding of monitoring, logging, and observability (e.g., Datadog, Prometheus, CloudWatch), and you proactively identify and resolve issues
  • You are comfortable debugging production issues across systems and collaborating with engineering teams to resolve them
  • You are proactive, take ownership, and enjoy working in environments with high autonomy and evolving processes
  • You communicate clearly and collaborate effectively with engineers, product managers, and other stakeholders
  • You are curious and motivated to learn, especially in areas like AI/ML infrastructure and large-scale systems

Required Qualifications:

  • 5+ years of experience in DevOps, Site Reliability Engineering, or infrastructure-focused roles
  • Proven experience designing and operating scalable, reliable, and secure cloud infrastructure (preferably AWS) in production environments
  • Strong understanding of cloud security best practices (IAM, network security, secrets management), preferably within AWS
  • Proficiency in Python for automation, scripting, and tooling
  • Hands-on experience building and maintaining CI/CD pipelines
  • Experience with monitoring, logging, and alerting tools (e.g., Datadog, CloudWatch, Prometheus)
  • Experience working in a Linux-based environment
  • Ability to drive infrastructure and DevOps strategy, balancing scalability, reliability, and cost
  • Experience working cross-functionally and influencing engineering teams on best practices and architectural decisions
  • Strong ownership mindset with the ability to operate autonomously in ambiguous environments

Preferred Qualifications:

  • Experience supporting or scaling AI/ML or LLM-based systems in production
  • You have worked with containerized applications (Docker) and are familiar with orchestration concepts (Kubernetes or ECS is a plus)
  • You are familiar with Infrastructure as Code principles (e.g., Terraform) and have experience implementing Infrastructure as Code from scratch in existing environments
  • You have experience working with or supporting backend systems and data platforms (e.g., Postgres; Airflow is a plus)
  • Background in backend engineering or software development
  • Experience working in a fast-paced startup or scale-up environment
  • Experience leading and mentoring engineers, while contributing to team-wide best practices

This is a hybrid role, working 3 days a week (Monday, Wednesday, and Friday) in our Oakland office.
