Location
California, United States
Salary
Not specified
Type
Full-time
Posted
Today
Job Description
Company Description
Jigsaw is an applied research lab building the environments where frontier models learn to act in the real world. We train foundation models and build systems for world generation, authoring, and evaluation, turning human expertise into high‑fidelity simulations that actually change model behavior.
We partner directly with top model labs to uncover agent failure modes in complex, real‑world workflows.
Our task environments are already in use across leading frontier labs. Jigsaw is backed by a16z Speedrun, with advisors and angels from Anthropic, SpaceX, and Stanford AI Lab.
Role Description
As a member of technical staff, you will shape the tasks, datasets, and rubrics that determine how frontier models practice and improve. You’ll collaborate closely with research teams at leading AI labs, exploring data collection strategies, probing model weaknesses, and defining the signals that tell us whether models are actually getting better.
In your day-to-day, you will design task environments that surface meaningful failure modes across domains like finance, code, and enterprise workflows. You’ll craft task specs and evaluation rubrics that can be converted into reward signals for RLHF and rubric-based RL, then iterate as new behaviors and failure modes emerge.
What You'll Do
- Design and refine tasks, environments, and evaluation rubrics that can be used as reward signals in RLHF and rubric-based RL setups
- Create and manage real-world and synthetic data pipelines for generating interaction data and trajectories from our environments
- Work with lab research teams to turn their training objectives into concrete task, data, and evaluation specifications
- Run experiments across environment variants, rubrics, and data slices, and analyze results to uncover model failure modes and improvement opportunities
What We Look For
- 1–4 years in software engineering, ML, or applied research
- Major plus: experience with RL environments, evaluation, or post-training pipelines, ideally at an RL infra company, eval org, or applied research lab
- You’ve built something real end‑to‑end: an environment, benchmark, open‑source project, tool, or study that shows how you think and execute
- You design and run experiments, moving quickly from idea to result and iterating on what you learn
- Former founder or early engineer at a startup
- Genuinely obsessed with improving model behavior, creative problem solving, and personal growth
Compensation Structure
- $120k–$200k+
- Competitive equity
- Performance Bonus