Location
Paramus, NJ
Salary
Not specified
Type
Full-time
Posted
Today
via LinkedIn
Job Description
Key Responsibilities
- Design and implement comprehensive monitoring and observability systems for all live AI agents, tracking response quality, latency, error rates, and conversation outcomes
- Build and maintain evaluation frameworks to measure agent performance against defined benchmarks, including automated quality scoring and regression detection
- Manage token usage, API costs, and resource allocation across all agents and LLM providers; provide regular cost reports and optimization recommendations
- Develop and maintain conversation logging infrastructure for analysis, debugging, and compliance purposes
- Implement hallucination detection, content safety filters, and guardrail systems to protect end users and maintain brand integrity
- Create and manage alerting systems for agent failures, performance degradation, and anomalous behavior patterns
- Build A/B testing and prompt versioning infrastructure to support the Prompt Architect in iterative agent improvement
- Establish and maintain CI/CD pipelines for prompt deployments, ensuring changes are tested, staged, and rolled out safely
- Develop dashboards and reporting tools that give leadership visibility into agent performance, ROI, and operational health
- Collaborate with the AI/ML Engineer on infrastructure optimization and with the Solutions Engineer on production reliability
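The monitoring and cost-tracking duties above can be sketched as a minimal in-memory instrumentation layer. Everything here is illustrative, not part of the posting: the `AgentMetrics` class, the per-1K-token prices, and the metric choices are assumptions showing the kind of data this role would collect.

```python
from dataclasses import dataclass, field

# Illustrative per-1K-token prices; real prices vary by provider and model.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}

@dataclass
class AgentMetrics:
    """Minimal per-agent tracker (a sketch, not a production observability system)."""
    calls: int = 0
    errors: int = 0
    latencies_ms: list = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, latency_ms: float, input_tokens: int,
               output_tokens: int, error: bool = False) -> None:
        """Record one agent call: latency, token counts, and whether it failed."""
        self.calls += 1
        self.errors += int(error)
        self.latencies_ms.append(latency_ms)
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def error_rate(self) -> float:
        return self.errors / self.calls if self.calls else 0.0

    @property
    def cost_usd(self) -> float:
        """Token spend under the assumed token-based pricing model."""
        return (self.input_tokens / 1000 * PRICE_PER_1K["input"]
                + self.output_tokens / 1000 * PRICE_PER_1K["output"])

    def p95_latency_ms(self) -> float:
        """Approximate 95th-percentile latency over recorded calls."""
        if not self.latencies_ms:
            return 0.0
        ordered = sorted(self.latencies_ms)
        return ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
```

In practice these counters would feed a time-series backend (Prometheus, Datadog, etc.) rather than stay in process memory; the sketch only shows which signals the responsibilities above imply.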
Required Qualifications
- 3+ years of experience in DevOps, SRE, MLOps, or a similar operations-focused engineering role
- Strong proficiency in Python and experience building monitoring/observability systems
- Experience with logging and monitoring tools (Datadog, Grafana, Prometheus, CloudWatch, or similar)
- Understanding of LLM APIs, token-based pricing models, and AI system architectures
- Experience building evaluation frameworks, testing pipelines, or quality assurance systems for software products
- Familiarity with CI/CD tools and deployment automation (GitHub Actions, Jenkins, or similar)
- Strong analytical skills with the ability to identify patterns in data and translate them into actionable insights
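The evaluation-framework and regression-detection requirements can be illustrated with a minimal check: compare a candidate prompt's mean eval score against a baseline and flag a drop. The `tolerance` threshold and the 0-to-1 scoring scale are assumptions made for this sketch.

```python
def detect_regression(baseline_scores: list[float],
                      candidate_scores: list[float],
                      tolerance: float = 0.02) -> bool:
    """Return True when the candidate's mean quality score falls more than
    `tolerance` below the baseline mean. Scores are assumed to lie in [0, 1];
    the 0.02 tolerance is an illustrative default, not a recommended value."""
    baseline_mean = sum(baseline_scores) / len(baseline_scores)
    candidate_mean = sum(candidate_scores) / len(candidate_scores)
    return (baseline_mean - candidate_mean) > tolerance
```

A production framework would also account for sample size and variance (e.g., a significance test) before blocking a deployment; the mean-difference check is only the simplest possible gate.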
Preferred Qualifications
- Direct experience with LLMOps tooling (LangSmith, Weights & Biases, Humanloop, or similar)
- Experience managing costs and optimizing resource usage for API-heavy systems
- Background in building dashboards and data visualization (Metabase, Looker, custom solutions)
- Familiarity with prompt engineering and understanding of how prompt changes affect model behavior
- Experience with multi-agent systems or orchestration platform monitoring
- Knowledge of AI safety, content moderation, and responsible AI deployment practices
What Success Looks Like
- Within 30 days: Full monitoring and logging coverage for all active agents; baseline performance metrics established
- Within 60 days: Cost optimization implemented saving 15%+ on token spend; automated alerting catching issues before users report them
- Within 90 days: Evaluation framework live with automated quality scoring; prompt versioning and A/B testing infrastructure operational; leadership dashboard delivering weekly insights
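The 90-day A/B testing goal can be sketched as deterministic variant assignment: hashing a stable user ID keeps each user on the same prompt version across sessions, which is what makes per-variant outcome comparisons valid. The function name and variant labels are hypothetical.

```python
import hashlib

def assign_variant(user_id: str,
                   variants: tuple[str, ...] = ("prompt_v1", "prompt_v2")) -> str:
    """Deterministically bucket a user into a prompt variant for A/B testing.
    Hash-based assignment is stable: the same user_id always maps to the
    same variant, with roughly even traffic across variants."""
    digest = int(hashlib.sha256(user_id.encode("utf-8")).hexdigest(), 16)
    return variants[digest % len(variants)]
```

Real prompt-versioning infrastructure would layer rollout percentages, holdout groups, and per-variant metric logging on top of this; the hash bucket is just the core assignment primitive.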