Machine Learning Engineer - LLM Evaluation & Automation
Grid Dynamics Holdings
We are seeking a highly skilled Machine Learning Engineer who specializes in leveraging Large Language Models (LLMs) for automated evaluation and quality assessment. In this role, you will design and build systems that automatically measure and improve the accuracy, relevance, and consistency of model outputs. You will lead initiatives to create evaluation pipelines, develop metrics, and deliver actionable insights for continuous improvement. This position requires strong technical expertise, analytical problem-solving abilities, and the capacity to manage projects across multiple cross-functional teams.
Essential functions
Responsibilities:
- Design and implement automated systems and pipelines for evaluating LLM outputs.
- Develop metrics and KPIs to measure output quality, accuracy, and consistency using LLM-based evaluations.
- Collaborate with Engineering teams to create automated logic checks and validation tools.
- Partner with Data Scientists to analyze evaluation results and optimize prompt and task structures.
- Provide feedback loops to ensure evaluation guidelines align with LLM-based assessments.
- Investigate how LLM-derived evaluations can enhance product reliability and user experience.
- Recommend refinements to prompt engineering, evaluation strategies, and automation tools.
- Stay informed on emerging trends in LLM evaluation, automated quality assessment, and AI toolchains.
- Continuously improve and expand automated evaluation processes based on industry best practices.
Qualifications:
- 5+ years of experience in ML engineering, NLP, or AI/ML automation.
- Advanced degree (MS/PhD) in Statistics, Data Science, Computational Social Science, Quantitative Psychology, or a related field.
- Hands-on experience in prompt engineering and designing LLM-based evaluation systems is preferred.
- Strong understanding of machine learning principles with a focus on NLP and advanced LLM capabilities (e.g., Chain-of-Thought, agentic workflows).
- Expertise in building automated evaluation or QA pipelines.
- Excellent analytical and problem-solving skills with experience in root cause and error pattern analysis.
- Proven project management and cross-functional collaboration experience.
- Excellent communication skills to convey complex insights to technical and non-technical audiences.
- Detail-oriented mindset with a focus on evaluation metrics, prompt design, and automation.
- Ability to quickly adapt to new business rules and evaluation guidelines across diverse product domains.
- Strong programming skills in Python and SQL.
- Experience with big data technologies like PySpark for data aggregation and sampling is a strong plus.
We offer
- Opportunity to work on cutting-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package - medical insurance, vision, dental, etc.
- Corporate social events
- Professional development opportunities
- Well-equipped office
About us
Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization, and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.

