
LLM Ops Engineer

Take2 Consulting, LLC

Location: Paramus, NJ

Salary: Not specified

Type: Full-time

Posted: Today via LinkedIn

Job Description

Key Responsibilities

  • Design and implement comprehensive monitoring and observability systems for all live AI agents — tracking response quality, latency, error rates, and conversation outcomes
  • Build and maintain evaluation frameworks to measure agent performance against defined benchmarks, including automated quality scoring and regression detection
  • Manage token usage, API costs, and resource allocation across all agents and LLM providers; provide regular cost reports and optimization recommendations
  • Develop and maintain conversation logging infrastructure for analysis, debugging, and compliance purposes
  • Implement hallucination detection, content safety filters, and guardrail systems to protect end users and maintain brand integrity
  • Create and manage alerting systems for agent failures, performance degradation, and anomalous behavior patterns
  • Build A/B testing and prompt versioning infrastructure to support the Prompt Architect in iterative agent improvement
  • Establish and maintain CI/CD pipelines for prompt deployments, ensuring changes are tested, staged, and rolled out safely
  • Develop dashboards and reporting tools that give leadership visibility into agent performance, ROI, and operational health
  • Collaborate with the AI/ML Engineer on infrastructure optimization and with the Solutions Engineer on production reliability

Required Qualifications

  • 3+ years of experience in DevOps, SRE, MLOps, or a similar operations-focused engineering role
  • Strong proficiency in Python and experience building monitoring/observability systems
  • Experience with logging and monitoring tools (Datadog, Grafana, Prometheus, CloudWatch, or similar)
  • Understanding of LLM APIs, token-based pricing models, and AI system architectures
  • Experience building evaluation frameworks, testing pipelines, or quality assurance systems for software products
  • Familiarity with CI/CD tools and deployment automation (GitHub Actions, Jenkins, or similar)
  • Strong analytical skills with the ability to identify patterns in data and translate them into actionable insights

Preferred Qualifications

  • Direct experience with LLMOps tooling (LangSmith, Weights & Biases, Humanloop, or similar)
  • Experience managing costs and optimizing resource usage for API-heavy systems
  • Background in building dashboards and data visualization (Metabase, Looker, custom solutions)
  • Familiarity with prompt engineering and understanding of how prompt changes affect model behavior
  • Experience with multi-agent systems or orchestration platform monitoring
  • Knowledge of AI safety, content moderation, and responsible AI deployment practices

What Success Looks Like

  • Within 30 days: Full monitoring and logging coverage for all active agents; baseline performance metrics established
  • Within 60 days: Cost optimization implemented saving 15%+ on token spend; automated alerting catching issues before users report them
  • Within 90 days: Evaluation framework live with automated quality scoring; prompt versioning and A/B testing infrastructure operational; leadership dashboard delivering weekly insights
