Machine Learning Engineer - LLM Evaluation & Automation

Grid Dynamics Holdings

We are seeking a highly skilled Machine Learning Engineer who specializes in leveraging Large Language Models (LLMs) for automated evaluation and quality assessment. In this role, you will design and build systems that automatically measure and improve the accuracy, relevance, and consistency of model outputs. You will lead initiatives to create evaluation pipelines, develop metrics, and deliver actionable insights for continuous improvements. This position requires strong technical expertise, analytical problem-solving abilities, and the capacity to manage projects across multiple cross-functional teams.

Essential functions


Responsibilities:
  • Design and implement automated systems and pipelines for evaluating LLM outputs.
  • Develop metrics and KPIs to measure output quality, accuracy, and consistency using LLM-based evaluations.
  • Collaborate with Engineering teams to create automated logic checks and validation tools.
  • Partner with Data Scientists to analyze evaluation results and optimize prompt and task structures.
  • Provide feedback loops to ensure evaluation guidelines align with LLM-based assessments.
  • Investigate how LLM-derived evaluations can enhance product reliability and user experience.
  • Recommend refinements to prompt engineering, evaluation strategies, and automation tools.
  • Stay informed on emerging trends in LLM evaluation, automated quality assessment, and AI toolchains.
  • Continuously improve and expand automated evaluation processes based on industry best practices.
Qualifications
  • 5+ years of experience in ML engineering, NLP, or AI/ML automation.
  • Advanced degree (MS/PhD) in Statistics, Data Science, Computational Social Science, Quantitative Psychology, or a related field.
  • Hands-on experience in prompt engineering and in designing LLM-based evaluation systems is preferred.
  • Strong understanding of machine learning principles, with a focus on NLP and advanced LLM capabilities (e.g., Chain-of-Thought, agentic workflows).
  • Expertise in building automated evaluation or QA pipelines.
  • Excellent analytical and problem-solving skills with experience in root cause and error pattern analysis.
  • Proven project management and cross-functional collaboration experience.
  • Excellent communication skills to convey complex insights to technical and non-technical audiences.
  • Detail-oriented mindset with a focus on evaluation metrics, prompt design, and automation.
  • Ability to quickly adapt to new business rules and evaluation guidelines across diverse product domains.
  • Strong programming skills in Python and SQL.
  • Experience with big data technologies such as PySpark for data aggregation and sampling is a strong plus.
We offer
  • Opportunity to work on cutting-edge projects
  • Work with a highly motivated and dedicated team
  • Competitive salary
  • Flexible schedule
  • Benefits package - medical insurance, vision, dental, etc.
  • Corporate social events
  • Professional development opportunities
  • Well-equipped office

About us


Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.
Vacancy posted 4 days ago