Research Scientist (Model Evaluation)

Sanas

About the Role

Progress in speech AI is only as meaningful as our ability to measure it. At Sanas, model quality spans dimensions that automated metrics struggle to capture: accent naturalness, perceptual clarity, speaker identity preservation, noise suppression without speech distortion, translation fluency under real-world disfluency. We’re looking for a Research Scientist who can define what "better" actually means across all of Sanas’s model families, build the evaluation infrastructure to measure it rigorously, and close the loop between research progress and real-world impact. This role sits at the intersection of research, product, and infrastructure, and directly shapes how every model team at Sanas measures progress.

Job Description

Design and own evaluation frameworks across Sanas’s full model portfolio (Accent Translation, Noise Cancellation, Speech Enhancement, Language Translation, and more), ensuring each captures meaningful progress, not just benchmark performance.
Develop novel quantitative metrics for subjective and perceptual qualities: accent similarity, naturalness, speaker identity preservation, intelligibility under noise, and translation fluency in spoken-language domains.
Build evaluation systems that bridge automated metrics and human judgment, designing listening studies, MOS/MUSHRA protocols, and preference tests that are statistically rigorous and operationally scalable.
Define evaluation splits, test sets, and benchmark suites that accurately reflect production conditions: diverse accents, languages, noise environments, recording devices, and telephony codecs.

Evaluation infrastructure & tooling

Build and maintain automated evaluation pipelines that run continuously against model checkpoints, surfacing regressions early and tracking quality trends across training runs.
Develop reference-based and reference-free metrics calibrated to Sanas’s specific model tasks: SI-SDR, PESQ, STOI, DNSMOS, speaker similarity, WER delta, COMET, and task-specific custom metrics where off-the-shelf measures fall short.
Instrument model quality monitoring in production, detecting degradation across language pairs, accent profiles, and acoustic conditions in live customer traffic.
Build tooling that allows research scientists and ML engineers to run rigorous ablations, compare model versions, and understand quality tradeoffs without needing to design the evaluation from scratch each time.
Design and operate human evaluation programs (listener panels, crowdsourced annotation, and expert evaluator workflows) that produce reliable signal on dimensions automated metrics cannot capture.
Conduct research into evaluation methodology itself: when do automated metrics correlate with human perception, when do they diverge, and what does that tell us about model behavior?
Partner directly with research scientists across model teams to translate open-ended quality questions into concrete, measurable evaluation protocols.

Cross-functional impact

Work closely with ML research, product, and customer success teams to ensure evaluation reflects what customers actually experience, not just what lab conditions optimize for.
Feed evaluation insights back into data acquisition and model training priorities, identifying which failure modes require more data, architectural changes, or training procedure improvements.
Communicate evaluation results clearly to both technical and non-technical stakeholders, translating metric movements into product quality narratives that inform roadmap decisions.

Qualifications

4+ years of research or applied research experience in speech, audio, or NLP, with a demonstrated focus on evaluation methodology and quality measurement.
Deep familiarity with speech and audio quality metrics, including perceptual (MOS, MUSHRA, PESQ, STOI), signal-level (SI-SDR, SNR), and task-specific (WER, speaker similarity, DNSMOS), and an understanding of when each is and isn’t the right tool.
Experience designing and running human evaluation studies (listener panels, crowdsourced annotation, inter-annotator agreement analysis) with statistical rigor.
Strong engineering skills: you can build production-quality evaluation pipelines, not just run scripts. Proficiency in Python and PyTorch or equivalent.
Creativity in defining novel quantitative metrics for subjective or behavioral qualities: you’ve identified gaps in existing evaluation approaches and built something better.
Ability to take open-ended research questions and translate them into concrete, measurable evaluation systems that run reliably at scale.
Curiosity and rigor in equal measure: you’re as motivated by discovering the right way to measure progress as by the progress itself.

Bonus

Experience evaluating models across multiple speech tasks: ASR, TTS, speech enhancement, speaker verification, or machine translation.
Familiarity with real-time or streaming model evaluation: latency-quality tradeoffs, codec-degraded audio, telephony channel conditions.
Background in psychoacoustics or perceptual audio quality: understanding of how humans perceive speech naturalness, noise, and distortion.
Experience with multilingual evaluation: cross-lingual quality metrics, language-specific annotation challenges, low-resource language evaluation.
Published research at INTERSPEECH, ICASSP, ACL, EMNLP, or equivalent venues on evaluation methodology, speech quality, or related topics.

Vacancy posted 1 day ago

