Location
San Francisco, CA
Salary
Not specified
Type
Full-time
Posted
Today
via LinkedIn
Job Description
About Us
Velvet is a data research company building the datasets that power the next generation of multimodal AI. Founded by Lucas Mantovani (ex-Meta FAIR) and Lucas Tucker (ex-Adobe Infra), our mission is to make AI more human by producing high-quality audiovisual training data for frontier labs.
We're hiring a Research Scientist to develop and fine-tune models for video and audio processing and enhancement, and to conduct data-oriented research that pushes the boundaries of multimodal data quality.
What You'll Do
- Research, develop, and fine-tune models for audio and video enhancement — including denoising, super-resolution, speech restoration, and perceptual quality improvement — ensuring outputs meet the standards required for frontier model training.
- Experiment with novel architectures, training objectives, and data augmentation strategies to improve model performance across diverse and noisy real-world audiovisual data.
- Build evaluation frameworks and benchmarks to rigorously measure enhancement quality, guiding iterative model improvement.
- Collaborate with infrastructure and data pipeline engineers to integrate trained models into large-scale processing workflows that handle wide variation in speech, visual quality, and format.
What We're Looking For
- Strong research background in deep learning, with hands-on experience training and fine-tuning models for audio processing, video processing, or related domains.
- Proficiency in PyTorch, with experience designing and running experiments at scale.
- Solid understanding of signal processing fundamentals and how they inform model design for enhancement tasks.
- A publication track record or demonstrated research output in relevant areas (audio/speech enhancement, video restoration, generative models, multimodal learning).
- Ability to work effectively in an early-stage environment where scope is broad and priorities shift fast.
Even Better
- Prior work at a frontier AI lab or data company focused on multimodal data.
- Experience fine-tuning large pretrained models (diffusion models, autoencoders, or transformer-based architectures) for perceptual quality tasks.
- Familiarity with perceptual quality metrics and human evaluation methodologies for audio and video.
- A track record of working with datasets spanning tens of thousands of hours of audio or video.
You'll Thrive Here If
- You're excited by applied research with immediate, visible impact on data quality and downstream model performance.
- You move fluidly between reading papers, writing training loops, and analyzing failure cases.
- You hold yourself to a high bar for rigor — because you understand that model quality directly determines the value of the data we produce.