Location
California, United States
Salary
Not specified
Type
Full-time
Posted
Today
Job Description
Company Description
Jigsaw is an applied research lab building the environments where frontier models learn to act in the real world. We train foundation models and build systems for world generation, authoring, and evaluation, turning human expertise into high‑fidelity simulations that actually change model behavior.
We partner directly with top model labs to uncover agent failure modes in complex, real‑world workflows.
Our task environments are already in use across leading frontier labs. Jigsaw is backed by a16z Speedrun, with advisors and angels from Anthropic, SpaceX, and Stanford AI Lab.
Role Description
As a member of technical staff, you will shape the tasks, datasets, and rubrics that determine how frontier models practice and improve. You’ll collaborate closely with research teams at leading AI labs, exploring data collection strategies, probing model weaknesses, and defining the signals that tell us whether models are actually getting better.
In your day-to-day, you will design task environments that surface meaningful failure modes across domains like finance, code, and enterprise workflows. You’ll craft task specs and evaluation rubrics that can be converted into reward signals for RLHF and rubric-based RL, then iterate as new behaviors and failure modes emerge.
What You'll Do
- Design and refine tasks, environments, and evaluation rubrics that can be used as reward signals in RLHF and rubric-based RL setups
- Create and manage real-world and synthetic data pipelines for generating interaction data and trajectories from our environments
- Work with lab research teams to turn their training objectives into concrete task, data, and evaluation specifications
- Run experiments across environment variants, rubrics, and data slices, and analyze results to uncover model failure modes and improvement opportunities
What We Look For
- 1–4 years in software engineering, ML, or applied research
- Major plus: experience with RL environments, evaluation, or post-training pipelines, ideally at an RL infra company, eval org, or applied research lab
- You’ve built something real end‑to‑end: an environment, benchmark, open‑source project, tool, or study that shows how you think and execute
- You design and run experiments, moving quickly from idea to result and iterating on what you learn
- Former founder or early engineer at a startup
- Genuinely obsessed with improving model behavior, creative problem solving, and personal growth
Compensation Structure
- $120k–$200k+
- Competitive equity
- Performance Bonus