Machine Learning Engineer - LLM Evaluation & Automation
Grid Dynamics Holdings
We are seeking a highly skilled Machine Learning Engineer who specializes in leveraging Large Language Models (LLMs) for automated evaluation and quality assessment. In this role, you will design and build systems that automatically measure and improve the accuracy, relevance, and consistency of model outputs. You will lead initiatives to create evaluation pipelines, develop metrics, and deliver actionable insights for continuous improvements. This position requires strong technical expertise, analytical problem-solving abilities, and the capacity to manage projects across multiple cross-functional teams.
Essential functions
Responsibilities:
- Design and implement automated systems and pipelines for evaluating LLM outputs.
- Develop metrics and KPIs to measure output quality, accuracy, and consistency using LLM-based evaluations.
- Collaborate with Engineering teams to create automated logic checks and validation tools.
- Partner with Data Scientists to analyze evaluation results and optimize prompt and task structures.
- Provide feedback loops to ensure evaluation guidelines align with LLM-based assessments.
- Investigate how LLM-derived evaluations can enhance product reliability and user experience.
- Recommend refinements to prompt engineering, evaluation strategies, and automation tools.
- Stay informed on emerging trends in LLM evaluation, automated quality assessment, and AI toolchains.
- Continuously improve and expand automated evaluation processes based on industry best practices.
Qualifications
- 5+ years of experience in ML engineering, NLP, or AI/ML automation.
- Advanced degree (MS/PhD) in Statistics, Data Science, Computational Social Science, Quantitative Psychology, or a related field.
- Hands-on experience in prompt engineering and designing LLM-based evaluation systems is preferred.
- Strong understanding of machine learning principles, with a focus on NLP and advanced LLM capabilities (e.g., Chain-of-Thought, agentic workflows).
- Expertise in building automated evaluation or QA pipelines.
- Excellent analytical and problem-solving skills with experience in root cause and error pattern analysis.
- Proven project management and cross-functional collaboration experience.
- Excellent communication skills to convey complex insights to technical and non-technical audiences.
- Detail-oriented mindset with a focus on evaluation metrics, prompt design, and automation.
- Ability to quickly adapt to new business rules and evaluation guidelines across diverse product domains.
- Strong programming skills in Python and SQL.
- Experience with big data technologies such as PySpark for data aggregation and sampling is a strong plus.
We offer
- Opportunity to work on cutting-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package - medical insurance, vision, dental, etc.
- Corporate social events
- Professional development opportunities
- Well-equipped office
About us
Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization, and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.