Research Scientist - Model Evaluation

Sanas

Sanas is pioneering the future of human communication by developing a real‑time speech AI platform capable of accent translation, noise cancellation, speech enhancement, and cross‑language communication.

About the Role

Progress in speech AI is only as meaningful as our ability to measure it. At Sanas, model quality spans dimensions that automated metrics struggle to capture: accent naturalness, perceptual clarity, speaker identity preservation, noise suppression without speech distortion, and translation fluency under real‑world disfluency. We are looking for a Research Scientist who can define what "better" actually means across all of Sanas's model families, build the evaluation infrastructure to measure it rigorously, and close the loop between research progress and real‑world impact. This role sits at the intersection of research, product, and infrastructure and directly shapes how every model team at Sanas measures progress.

Job Description

- Design and own evaluation frameworks across Sanas’s full model portfolio – Accent Translation, Noise Cancellation, Speech Enhancement, Language Translation, and more – ensuring each captures meaningful progress, not just benchmark performance.
- Develop novel quantitative metrics for subjective and perceptual qualities: accent similarity, naturalness, speaker identity preservation, intelligibility under noise, and translation fluency in spoken‑language domains.
- Build evaluation systems that bridge automated metrics and human judgment – designing listening studies, MOS/MUSHRA protocols, and preference tests that are statistically rigorous and operationally scalable.
- Define evaluation splits, test sets, and benchmark suites that accurately reflect production conditions – diverse accents, languages, noise environments, recording devices, and telephony codecs.

Evaluation Infrastructure & Tooling

- Build and maintain automated evaluation pipelines that run continuously against model checkpoints – surfacing regressions early and tracking quality trends across training runs.
- Develop reference‑based and reference‑free metrics calibrated to Sanas’s specific model tasks: SI‑SDR, PESQ, STOI, DNSMOS, speaker similarity, WER delta, COMET, and task‑specific custom metrics where off‑the‑shelf measures fall short.
- Instrument model quality monitoring in production – detecting degradation across language pairs, accent profiles, and acoustic conditions in live customer traffic.
- Build tooling that allows research scientists and ML engineers to run rigorous ablations, compare model versions, and understand quality trade‑offs without needing to design the evaluation from scratch each time.
- Design and operate human evaluation programs – listener panels, crowdsourced annotation, and expert evaluator workflows – that produce reliable signals on dimensions automated metrics cannot capture.
- Conduct research into evaluation methodology itself: when do automated metrics correlate with human perception, when do they diverge, and what does that tell us about model behavior?
- Partner directly with research scientists across model teams to translate open‑ended quality questions into concrete, measurable evaluation protocols.

Cross‑functional Impact

- Work closely with ML research, product, and customer success teams to ensure evaluation reflects what customers actually experience – not just what lab conditions optimize for.
- Feed evaluation insights back into data acquisition and model training priorities – identifying which failure modes require more data, architectural changes, or training procedure improvements.
- Communicate evaluation results clearly to both technical and non‑technical stakeholders, translating metric movements into product quality narratives that inform roadmap decisions.

Qualifications

- 4+ years of research or applied research experience in speech, audio, or NLP, with a demonstrated focus on evaluation methodology and quality measurement.
- Deep familiarity with speech and audio quality metrics – perceptual (MOS, MUSHRA, PESQ, STOI), signal‑level (SI‑SDR, SNR), and task‑specific (WER, speaker similarity, DNSMOS) – and an understanding of when each is and isn’t the right tool.
- Experience designing and running human evaluation studies – listener panels, crowdsourced annotation, inter‑annotator agreement analysis – with statistical rigor.
- Strong engineering skills: you can build production‑quality evaluation pipelines, not just run scripts. Proficiency in Python and PyTorch or equivalent.
- Creativity in defining novel quantitative metrics for subjective or behavioral qualities – you’ve identified gaps in existing evaluation approaches and built something better.
- Ability to take open‑ended research questions and translate them into concrete, measurable evaluation systems that run reliably at scale.
- Curiosity and rigor in equal measure – you’re as motivated by discovering the right way to measure progress as by the progress itself.

Bonus

- Experience evaluating models across multiple speech tasks – ASR, TTS, speech enhancement, speaker verification, or machine translation.
- Familiarity with real‑time or streaming model evaluation – latency‑quality trade‑offs, codec‑degraded audio, telephony channel conditions.
- Background in psychoacoustics or perceptual audio quality – understanding of how humans perceive speech naturalness, noise, and distortion.
- Experience with multilingual evaluation – cross‑lingual quality metrics, language‑specific annotation challenges, low‑resource language evaluation.
- Published research at INTERSPEECH, ICASSP, ACL, EMNLP, or equivalent venues on evaluation methodology, speech quality, or related topics.

Vacancy posted 16 hours ago

