Machine Learning Engineer - LLM Evaluation & Automation

Grid Dynamics Holdings

We are seeking a highly skilled Machine Learning Engineer who specializes in using Large Language Models (LLMs) for automated evaluation and quality assessment. In this role, you will design and build systems that automatically measure and improve the accuracy, relevance, and consistency of model outputs. You will lead initiatives to create evaluation pipelines, develop metrics, and deliver actionable insights for continuous improvement. This position requires strong technical expertise, analytical problem-solving abilities, and the capacity to manage projects across multiple cross-functional teams.

Essential functions

Responsibilities:

Design and implement automated systems and pipelines for evaluating LLM outputs.
Develop metrics and KPIs to measure output quality, accuracy, and consistency using LLM-based evaluations.
Collaborate with Engineering teams to create automated logic checks and validation tools.
Partner with Data Scientists to analyze evaluation results and optimize prompt and task structures.
Provide feedback loops to ensure evaluation guidelines align with LLM-based assessments.
Investigate how LLM-derived evaluations can enhance product reliability and user experience.
Recommend refinements to prompt engineering, evaluation strategies, and automation tools.
Stay informed on emerging trends in LLM evaluation, automated quality assessment, and AI toolchains.
Continuously improve and expand automated evaluation processes based on industry best practices.

Qualifications

5+ years of experience in ML engineering, NLP, or AI/ML automation.
Advanced degree (MS/PhD) in Statistics, Data Science, Computational Social Science, Quantitative Psychology, or a related field.
Hands-on experience in prompt engineering and designing LLM-based evaluation systems is preferred.
Strong understanding of machine learning principles, with a focus on NLP and advanced LLM capabilities (e.g., Chain-of-Thought, agentic workflows).
Expertise in building automated evaluation or QA pipelines.
Excellent analytical and problem-solving skills with experience in root cause and error pattern analysis.
Proven project management and cross-functional collaboration experience.
Excellent communication skills to convey complex insights to technical and non-technical audiences.
Detail-oriented mindset with a focus on evaluation metrics, prompt design, and automation.
Ability to quickly adapt to new business rules and evaluation guidelines across diverse product domains.
Strong programming skills in Python and SQL.
Experience with big data technologies like PySpark for data aggregation and sampling is a strong plus.

We offer

Opportunity to work on cutting-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package - medical insurance, vision, dental, etc.
Corporate social events
Professional development opportunities
Well-equipped office

About us

Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization, and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.

Vacancy posted 2 days ago
