
Research Scientist (Model Evaluation)

Sanas

About the Role

Progress in speech AI is only as meaningful as our ability to measure it. At Sanas, model quality spans dimensions that automated metrics struggle to capture: accent naturalness, perceptual clarity, speaker identity preservation, noise suppression without speech distortion, translation fluency under real-world disfluency. We’re looking for a Research Scientist who can define what "better" actually means across all of Sanas’s model families, build the evaluation infrastructure to measure it rigorously, and close the loop between research progress and real‑world impact. This role sits at the intersection of research, product, and infrastructure, and directly shapes how every model team at Sanas measures progress.

Job Description

  • Design and own evaluation frameworks across Sanas’s full model portfolio (Accent Translation, Noise Cancellation, Speech Enhancement, Language Translation, and more), ensuring each captures meaningful progress, not just benchmark performance.
  • Develop novel quantitative metrics for subjective and perceptual qualities: accent similarity, naturalness, speaker identity preservation, intelligibility under noise, and translation fluency in spoken‑language domains.
  • Build evaluation systems that bridge automated metrics and human judgment: designing listening studies, MOS/MUSHRA protocols, and preference tests that are statistically rigorous and operationally scalable.
  • Define evaluation splits, test sets, and benchmark suites that accurately reflect production conditions: diverse accents, languages, noise environments, recording devices, and telephony codecs.

Evaluation infrastructure & tooling

  • Build and maintain automated evaluation pipelines that run continuously against model checkpoints, surfacing regressions early and tracking quality trends across training runs.
  • Develop reference‑based and reference‑free metrics calibrated to Sanas’s specific model tasks: SI‑SDR, PESQ, STOI, DNSMOS, speaker similarity, WER delta, COMET, and task‑specific custom metrics where off‑the‑shelf measures fall short.
  • Instrument model quality monitoring in production, detecting degradation across language pairs, accent profiles, and acoustic conditions in live customer traffic.
  • Build tooling that allows research scientists and ML engineers to run rigorous ablations, compare model versions, and understand quality tradeoffs without needing to design the evaluation from scratch each time.
  • Design and operate human evaluation programs (listener panels, crowdsourced annotation, and expert evaluator workflows) that produce reliable signal on dimensions automated metrics cannot capture.
  • Conduct research into evaluation methodology itself: when do automated metrics correlate with human perception, when do they diverge, and what does that tell us about model behavior?
  • Partner directly with research scientists across model teams to translate open‑ended quality questions into concrete, measurable evaluation protocols.

Cross‑functional impact

  • Work closely with ML research, product, and customer success teams to ensure evaluation reflects what customers actually experience, not just what lab conditions optimize for.
  • Feed evaluation insights back into data acquisition and model training priorities, identifying which failure modes require more data, architectural changes, or training procedure improvements.
  • Communicate evaluation results clearly to both technical and non‑technical stakeholders, translating metric movements into product quality narratives that inform roadmap decisions.

Qualifications

  • 4+ years of research or applied research experience in speech, audio, or NLP, with a demonstrated focus on evaluation methodology and quality measurement.
  • Deep familiarity with speech and audio quality metrics, whether perceptual (MOS, MUSHRA, PESQ, STOI), signal‑level (SI‑SDR, SNR), or task‑specific (WER, speaker similarity, DNSMOS), and an understanding of when each is and isn’t the right tool.
  • Experience designing and running human evaluation studies (listener panels, crowdsourced annotation, inter‑annotator agreement analysis) with statistical rigor.
  • Strong engineering skills: you can build production‑quality evaluation pipelines, not just run scripts. Proficiency in Python and PyTorch or equivalent.
  • Creativity in defining novel quantitative metrics for subjective or behavioral qualities: you’ve identified gaps in existing evaluation approaches and built something better.
  • Ability to take open‑ended research questions and translate them into concrete, measurable evaluation systems that run reliably at scale.
  • Curiosity and rigor in equal measure: you’re as motivated by discovering the right way to measure progress as by the progress itself.

Bonus

  • Experience evaluating models across multiple speech tasks: ASR, TTS, speech enhancement, speaker verification, or machine translation.
  • Familiarity with real‑time or streaming model evaluation: latency‑quality tradeoffs, codec‑degraded audio, telephony channel conditions.
  • Background in psychoacoustics or perceptual audio quality: understanding of how humans perceive speech naturalness, noise, and distortion.
  • Experience with multilingual evaluation: cross‑lingual quality metrics, language‑specific annotation challenges, low‑resource language evaluation.
  • Published research at INTERSPEECH, ICASSP, ACL, EMNLP, or equivalent venues on evaluation methodology, speech quality, or related topics.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Research Scientist (Model Evaluation) in Palo Alto, CA vacancy
  • A leading AI speech technology company in Palo Alto seeks a Research Scientist to enhance evaluation methodologies for speech AI models. The role involves designing and implementing evaluation frameworks while using advanced metrics to measure model quality. Candidates... 
    Suggested

    Sanas

    Palo Alto, CA
    4 days ago
  •  ...Overview We build frontier foundation models that power intelligent experiences at Apple...  ...you're drawn to hard problems where the research and the product are inseparable, this is...  ...over a billion people. You will design evaluation systems where the outcome is not just a... 
    Suggested

    Broughton Group

    Cupertino, CA
    2 days ago
  • Broughton Group seeks an experienced AI Model Evaluator in Cupertino, California to design evaluation systems for Apple products. Your work will enhance the user experience for over a billion users and drive model improvements by providing actionable insights. The ideal... 
    Suggested

    Broughton Group

    Cupertino, CA
    2 days ago
  • Apple Inc. is seeking an expert to evaluate machine learning and deep learning models, playing a crucial role in creating robust evaluation frameworks. The ideal candidate will collaborate with multidisciplinary teams, utilizing statistical methods and Python expertise... 
    Suggested

    Apple Inc.

    Sunnyvale, CA
    5 days ago
  • $190k - $250k

     ...developing large-scale generative world models that learn to predict realistic,...  ...autonomous trucks. We are looking for a research scientist to lead the design and development of...  ...camera, LiDAR, and radar outputs Design evaluation frameworks that measure world model... 
    Suggested
    Temporary work
    Work at office
    Visa sponsorship
    Flexible hours

    Kodiak

    Mountain View, CA
    5 days ago
  • $207k - $300k

    Research Scientist, Evaluations, Security and Privacy, DeepMind DeepMind Mountain View, CA, USA ; San Francisco, CA, USA Apply X Applicants in San...  ...benchmarking frameworks for machine learning models. 2 years of experience in security and privacy. One or... 
    Full time

    Google Inc.

    Mountain View, CA
    4 days ago
  • $224k - $356.5k

     ...computing. As a Senior / Principal Deep Learning Engineer — Model Evaluation & AI Systems, you will play a meaningful role in crafting the...  ...unclear technical challenges and communicate effectively across research, engineering, and product teams. Ways to stand out from... 

    NVIDIA

    Santa Clara, CA
    2 days ago
  • $34 per hour

    Welo Global is seeking a Data Quality Associate based in Sunnyvale, CA. The role involves evaluating AI model outputs, providing structured feedback, and performing audits on data quality. Candidates should possess a university degree and have critical thinking, attention... 
    Full time

    Welo Global

    Sunnyvale, CA
    2 days ago
  •  ...state-of-the-art foundation world models that control our robots. Our...  ...made possible by our cutting edge research and end-to-end system design....  .... We're looking for Research Scientists and Research Engineers to build the data and evaluation foundations for our video action... 

    Rhoda ai

    Palo Alto, CA
    1 day ago
  • $175k - $350k

     ...Model Training Engineer At Inflection AI, our public benefit mission is to harness...  ...can iterate on the fun parts. Balance research curiosity with product pragmatism—you know...  ...curation, hyper-parameter search, evaluation, and rollout—using PyTorch, Torchtune, FSDP... 
    Full time

    Humanx

    Palo Alto, CA
    1 day ago
  •  ...Job Title: CW Research on Large Vehicle Data Model - Summer Intern (99W210) About Kyyba: Founded in 1998 and headquartered in Farmington...  ...multimodal reasoning capabilities Train and evaluate models on multimodal data across vehicle sensors, edge,... 
    Summer internship
    Visa sponsorship
    Work visa

    Kyyba

    Mountain View, CA
    6 hours ago
  • $180k

     ...teammates. ABOUT THE ROLE: As a multimodal engineer on the Imagine Model Team, you will develop cutting-edge AI experiences beyond text,...  ...studies, particularly for visual and audio data. Design evaluation frameworks, metrics, benchmarks, evals, and reward models... 
    Temporary work

    xAI

    Palo Alto, CA
    6 days ago
  • $150k

     ...their teammates. ABOUT THE ROLE: You will join the Grok Voice Model team to help build the world's best voice AI. We deliver smooth...  ...annotation workflows to enable high-quality model training and evaluation. Work on pre-training and post-training of speech-language... 
    Temporary work

    xAI

    Palo Alto, CA
    a month ago
  •  ...at Amazon's Delivery Foundation Model team, where you'll work alongside world-class scientists and engineers to pioneer the...  ...technical direction for specific research initiatives, ensuring robust performance...  ...and our extensive training and evaluation infrastructure. Guide and... 
    Worldwide

    Itlearn360

    Santa Clara, CA
    4 days ago
  • $165k - $185k

    Company Description The Bosch Research and Technology Center North America with offices in Sunnyvale, California, Pittsburgh, Pennsylvania...  ..., our AI research in Silicon Valley focuses on Foundation Models, Big Data Visual Analytics, Explainable AI (XAI), Natural Language... 
    Full time
    Work experience placement
    Worldwide

    Bosch Group

    Sunnyvale, CA
    23 hours ago
  • $160k - $240k

    Glean is seeking a Product Manager for Glean Model Hub in Mountain View, California. In this hybrid role, you will evaluate LLM models, define the product roadmap, and manage key customer relationships. The ideal candidate has over 4 years of product management experience... 

    Glean

    Mountain View, CA
    2 days ago
  • Member of Technical Staff — Diffusion Model About the Role RadixArk is seeking a Member...  ...and scalability. This role combines deep research thinking with strong engineering execution...  ...ideas into practical production systems Evaluate models using rigorous metrics and benchmarks... 
    Flexible hours

    RadixArk

    Palo Alto, CA
    3 days ago
  • $175k - $350k

     ...pioneering this future with human-centered AI models that unite emotional intelligence (EQ)...  ...can iterate on the fun parts. Balance research curiosity with product pragmatism—you...  ...dataset curation, hyper-parameter search, evaluation, and rollout—using PyTorch, Torchtune,... 

    Inflection AI

    Palo Alto, CA
    3 days ago
  •  ...exciting journey. The mission of the Waymo Research team is to develop machine learning solutions...  ..., learning from demonstration, generative modeling, Bayesian inference, hierarchical learning, and robust evaluation. Waymo interns work alongside leaders in... 
    Internship
    Summer internship
    Local area

    Waymo

    Mountain View, CA
    4 days ago
  • $176k - $253k

     ...At Toyota Research Institute (TRI), we're on a mission to improve...  ...infrastructure needed to train and evaluate these systems at scale....  ...We are looking for a Research Scientist to join us in building intelligent...  ...explore how large language models and agentic infrastructure... 
    Work experience placement
    Internship
    Local area
    Remote work
    Shift work

    Toyota Research Institute

    Los Altos, CA
    1 day ago
  • $176k - $253.5k

     ...At Toyota Research Institute (TRI), we're on a mission to improve...  ...this, we are developing novel models of human behavior that integrate...  ...looking for an AI Research Scientist, or Senior Machine Learning Research...  ...model training, fine-tuning, evaluation and benchmarking. This role... 
    Temporary work
    Local area
    Shift work

    Toyota Research Institute

    Los Altos, CA
    1 day ago
  • $160.36k - $240.54k

     ...Machine Learning Research Scientist: Generative Modeling for Planning Mountain View, California (HQ) Nuro is a self-driving technology company...  ..., prioritize work and develop solutions to solve them, evaluate your solution by deploying the models on to the NuroDriver... 

    Nuro

    Mountain View, CA
    6 hours ago
  • $281k - $356k

     ...Senior Staff Software Engineer, Model Post Training Waymo is an autonomous driving...  ...working alongside a world-class team of researchers and engineers to develop and advance the...  ...the technical bar for how Waymo trains, evaluates, and deploys LLM models in the autonomous... 
    Full time
    Remote work

    Waymo

    Mountain View, CA
    3 days ago
  • $109k - $147k

    2026 PhD Residency, Research Scientist, Functional Glass & Photonics (Early Stage Project) Internship...  ...: Utilize analytical tools to evaluate performance: spectroscopic, thermal, and...  ...aims to push the limits of science and modeling as we know them and to prove how... 
    Internship
    Flexible hours

    X Development, LLC

    Mountain View, CA
    2 days ago
  •  ...according to the order of listing. What you’ll do As a Research Scientist at Simular, you will: Shape the future of agentic AI...  ...human-agent interaction, and alignment (e.g. reward modeling, automated task evaluation, AI safety). Design and execute experiments end-to-... 

    Simular Inc.

    Palo Alto, CA
    2 days ago
  • $204k - $259k

     ...foster collaborations with other research teams in Alphabet. AI...  ...from demonstration, generative modeling, Bayesian inference, hierarchical learning, and robust evaluation. In this hybrid role, you will report to a Principal Scientist. You will: Participate in Waymo... 
    Temporary work
    Remote work

    Neura Market

    Mountain View, CA
    5 days ago
  • $207k - $300k

    Research Scientist, Gemini Retrieval and Agera, DeepMind Mountain View, CA, USA Required qualifications...  ...in the full lifecycle of research modeling, with a specific emphasis on ensuring...  ...Learning (RL) or automated evaluation systems. Ability to solve exceptionally... 
    Full time

    Google Inc.

    Mountain View, CA
    5 days ago
  • $176k - $253k

    At Toyota Research Institute (TRI), we’re on a mission to improve the...  ...needed to train and evaluate these systems at scale. The...  ...We are looking for a Research Scientist to join us in building intelligent...  ...to explore how large language models and agentic infrastructure can... 
    Work experience placement
    Internship
    Local area
    Shift work

    Toyota Research Institute

    Los Altos, CA
    3 days ago
  • $174k - $252k

    Senior Research Scientist, Google Research Mountain View, CA, USA; New York, NY, USA; +2 more Apply X Applicants in San Francisco: Qualified...  ...work by defining the data structure, framework, design, and evaluation metrics for research solution development and implementation... 
    Full time

    Google Inc.

    Mountain View, CA
    1 day ago
  • $183.83k - $275.98k

     ...leveraging the cutting edge of machine learning research to solve challenging real-world robotics...  ...working with and developing large models for perception and behavior, keeping up-...  .... Work with infrastructure, data and evaluation teams to build effective and efficient data... 

    Icehouseventures

    Mountain View, CA
    4 days ago
