Inference Performance Engineer
Location: New York, NY
Salary: Not specified
Type: Full-time
Posted: Today
Job Description
About us
We're an inference cloud built for AI ASICs, generating tokens 5–7× faster than existing GPU infrastructure at a fraction of the price. We closed our oversubscribed seed round with $97M in compute allocation, and $200M in hardware financing is underway.
We believe that when something is important enough, no obstacle is insurmountable. If you thrive on extreme ownership, outsized impact, and relentless optimism, this is the place for you.
About the role
Build the inference runtime that powers our ASIC cloud, including batching, KV cache optimization, scheduling, and APIs. You'll work with experts from top manufacturers operating at the frontier of AI hardware design.
What you'll do
Build and improve the inference runtime that serves our ASIC hardware
Own scheduling, continuous batching, KV cache optimization, and prefill/decode separation
Optimize tokens/sec, TTFT, p99 latency, and cost per token
Collaborate with hardware and compiler teams to update kernels and operators
Maintain the OpenAI-compatible API surface
Run benchmarking and regression testing
What you'll need
BS in Computer Science or related field
3+ years of software engineering experience in Rust, Go, or Python
Solid fundamentals in concurrency, memory, and tail latency
Familiarity with modern LLM inference: transformers, attention, KV cache, batching, speculative decoding, quantization
Experience with model serving: vLLM, TGI, SGLang, TensorRT-LLM, llama.cpp, or custom runtimes
What we'd like
Experience with CUDA, ROCm, Triton, kernel-level work, or non-NVIDIA accelerators
Experience building and scaling an OpenAI-compatible API in production
What we offer
Competitive cash compensation
Generous stock options
100% paid medical, dental, and vision insurance for employees
Flexible PTO
Paid holidays
Equal Employment Opportunity
We're an Equal Opportunity Employer and do not discriminate on the basis of any protected status under applicable law.