Senior Software Engineer - AI Evaluation

Alignerr

What if your engineering skills could directly determine whether the world's most advanced AI systems are actually working? We're looking for Senior Software Engineers to design, build, and scale the evaluation infrastructure that measures AI performance — the critical layer between raw model output and real-world trust.

This is high-impact, technically challenging work at the intersection of software engineering and AI research. You'll build the tools, pipelines, and frameworks that help leading research teams understand what their models can do, where they fail, and how to make them better. If you love building robust systems and care deeply about quality and measurement, this role puts you at the center of the AI revolution.

Responsibilities

Design and build scalable evaluation pipelines and frameworks for assessing AI model performance across diverse tasks and domains
Develop automated testing harnesses, scoring systems, and benchmarking tools for large language models and other AI systems
Write clean, production-quality code to process, analyze, and visualize evaluation datasets at scale
Create and maintain APIs, dashboards, and internal tools that enable research teams to run, track, and compare evaluations efficiently
Collaborate with AI researchers and data scientists to translate evaluation methodologies into reliable, repeatable software
Identify edge cases, failure modes, and reliability issues in AI outputs through systematic engineering approaches
Optimize system performance, data processing speed, and infrastructure costs
Contribute to the architecture and technical direction of the evaluation platform
Write clear documentation and participate in code reviews to maintain high engineering standards

Requirements

5+ years of professional software engineering experience, with a track record of building and shipping production systems
Strong proficiency in Python — including experience with data processing libraries (pandas, NumPy) and web frameworks (FastAPI, Flask, or Django)
Solid understanding of software architecture, design patterns, and engineering best practices
Experience working with large datasets and building data pipelines
Comfortable with cloud infrastructure (AWS, GCP, or Azure) and containerized deployments
Familiarity with version control (Git), CI/CD workflows, and testing frameworks
Strong problem-solving skills and the ability to work through ambiguity independently
Excellent written communication skills — you can document your work clearly and collaborate asynchronously
Self-motivated and reliable when working independently in a remote environment

Nice to Have

Experience with ML/AI evaluation, benchmarking, or model testing
Familiarity with LLMs, prompt engineering, or AI safety and alignment concepts
Background in building developer tools, internal platforms, or data infrastructure
Experience with distributed systems, message queues, or workflow orchestration (Airflow, Prefect, etc.)
Knowledge of statistical methods for measuring and comparing model performance
Prior experience in a remote-first or async-first engineering culture
Contributions to open-source projects related to AI, ML, or evaluation tooling

Why Join

Work on cutting-edge AI evaluation projects alongside world-class research labs
Directly influence how AI quality and safety are measured at scale — your code shapes the standard
Fully remote and flexible — work when and where you're most productive
Freelance autonomy with access to deeply meaningful, technically stimulating work
Collaborate with a global team of engineers and researchers pushing the boundaries of AI
Exposure to the latest developments in AI research, model capabilities, and evaluation science
Potential for ongoing work and contract extension as the platform and project scope grow

Vacancy posted 6 hours ago

Similar jobs that could be interesting for you
Based on the Senior Software Engineer - AI Evaluation in United States vacancy

Contract Senior Software Engineer - AI Code Review & Evaluation
$50 - $150 per hour
A leading AI company is seeking a software engineer to review and evaluate model-generated code. This contract role requires several years of software engineering experience, particularly as a full-stack engineer at notable tech firms. You will assess code quality and...
Senior
Hourly pay
Contract work
Flexible hours
Turing
San Francisco, CA
1 day ago
Senior Software Engineer, EdTech Evaluations (Hybrid)
...Learning Commons in Redwood City, CA is seeking a Senior Software Engineer to design and build evaluation systems for educational technology products. As part... ...Evaluators team, you will work at the intersection of AI, learning science, and product development. The ideal...
Senior
Learning Commons
Redwood City, CA
3 days ago
Senior Software Engineer, Simulator Evaluation Scale & Validate Reality
...leading autonomous driving technology firm is seeking a Senior Software Engineer to architect evaluation methodologies for their simulator. The ideal candidate... ...design principles. You will work closely with AI research to ensure the simulator accurately represents...
Senior
Waymo
San Francisco, CA
3 days ago
Senior Software Engineer, Simulator Evaluation & Metrics
$204k - $259k
...leading autonomous driving technology company is seeking a Senior Software Engineer to architect evaluation methodologies for their hybrid simulator.... ...throughput data processing systems and partnering with AI research teams. The ideal candidate has over 5 years of...
Senior
Full time
Waymo
Mountain View, CA
3 days ago
Senior Software Engineer (Simulator Evaluation)
...Senior Software Engineer, Simulator Evaluation Mar 02, 2026 Waymo is an autonomous driving technology company with the mission to be the world's most trusted... ..., physical dynamics, and state-of-the-art Generative AI to create a training ground for the Waymo Driver. The...
Senior
Full time
Remote work
Waymo
San Francisco, CA
3 days ago
Senior Software Engineer, AI Data & LLM Evaluation (Remote)
...Kake is seeking a Senior Software Engineer to contribute to developing AI training data for a leading human data platform. This role involves working at the... ...experience in software engineering, with strong skills in evaluating AI-generated code and terminal-based workflows....
Senior
Remote work
KAKE
Poland, NY
3 days ago
Senior Software Engineer II - Applied AI and Evaluations (Remote Eligible)
$175k - $245k
...Senior Software Engineer II - Applied AI and Evaluations (Remote Eligible) -REMOTE, USA- For over 20 years, Smartsheet has helped people and teams achieve–well, anything. From seamless work management to smart, scalable solutions, we’ve always worked with flow. We’re...
Senior
Full time
Temporary work
Local area
Immediate start
Remote work
Smartsheet
Bellevue, WA
11 days ago
Senior Software Engineer, Autonomy Evaluation
$136k - $199.2k
...Autonomous Driving Software Architect General Motors is a global... ...About the Organization The Evaluation team builds and evolves the... ...results into clear feedback for engineering and leadership, and help... ...systems. Experience leveraging AI-assisted development and...
Senior
Remote work
Relocation
Relocation package
Flexible hours
General Motors
United States
23 hours ago
Senior Software Engineer - Agent Evaluation
$60 per hour
...Mindrift AI Coding Agent Evaluation Specialist Mindrift connects specialists with project-based AI... ...Not data labeling Not prompt engineering Not writing code from scratch - the... ...What we look for ~5+ years in software development ~ Core stack: Python (FastAPI...
Senior
Permanent employment
Temporary work
Remote work
Mind Rift
United States
4 days ago
Senior Software Engineer - LLM Trainer & Evaluator
...Kake is seeking a Senior Software Engineer to help develop and evaluate AI training data for an expert platform serving AI agents. This unique role requires strong software engineering expertise to create coding tasks, evaluate AI outputs, and contribute to AI model generation...
Senior
Remote work
KAKE
Peru, IL
3 days ago
(Senior) Software Engineer - Evaluation
$120k - $250k
...2016 in Silicon Valley, Pony.ai has quickly become a global leader... ..., and multi-dimensional evaluation. Design and implement high... ...Build and optimize downstream engineering workflows for Large Language... ...skills in C/C++, Python, and software design Strong foundation in...
Senior
Temporary work
pony.ai
Fremont, CA
15 days ago
Senior Software Engineer, Metrics and Evaluation - Autonomous Vehicles
$148k - $356.5k
Senior Software Engineer, Metrics and Evaluation - Autonomous Vehicles... ...tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world's largest...
Senior
Full time
Remote work
NVIDIA Corporation
Raleigh, NC
2 days ago
Senior Software Engineer, Evaluators, Learning Commons
$190k - $238k
...Senior Software Engineer, Evaluators, Learning Commons Redwood City, CA (Hybrid) Learning Commons aims to scale proven teaching and learning practices to benefit every learner by building AI infrastructure that better connects the way students learn to the tools they...
Senior
Work at office
Relocation package
3 days per week
Learning Commons
Redwood City, CA
3 days ago
Senior Software Engineer (Large Model Evaluation)
$204k - $259k
...+ U.S. states. The Large Model Evaluation team is at the nexus of Waymo’s AI ambition. With advancements in Large... ...looking for quantitatively‑minded engineers to research and propose new ways... ...experience in a heavily quantitative software engineering area ~ Experience...
Senior
Full time
Remote work
Waymo
San Francisco, CA
3 days ago
Remote Senior Software Engineer AI Evaluation & Benchmarks (Python)
$80 - $100 per hour
...Design and build the coding benchmarks and evaluation pipelines used to test frontier AI models on real software engineering work: Design coding benchmarks that evaluate... ...country of residence. Nice to have Senior or Lead-level profile with a history of technical...
Senior
Full time
Contract work
For contractors
Remote work
GrabJobs
United States
6 hours ago
Senior AI Software Engineer - Model Evaluation (f/m/d)
...Senior AI Engineer In Pre-training Evaluation Aleph Alpha Research's mission is to deliver category-defining AI innovation that enables open, accessible, and trustworthy deployment of GenAI in industrial applications. Our organization develops foundational models and...
Senior
Remote work
Relocation
Flexible hours
Aleph Alpha
United States
1 day ago
Senior Software Engineer, AI Evaluation Platform
$356.5k
NVIDIA Gruppe is seeking a Senior Software Engineer to develop the NeMo Platform, a product that enhances AI systems. You will design Python APIs and systems to monitor agent behaviors and improve performance efficiently. The ideal candidate will have strong Python skills...
Senior
NVIDIA Gruppe
Santa Clara, CA
4 days ago
Senior AI Research Engineer - RAG & GenAI Evaluation
Drata is seeking a Senior Applied Research Engineer to enhance the quality of AI systems through rigorous evaluation and experimentation. This role emphasizes applied research, focusing on information retrieval and reasoning strategies. The ideal candidate will bring 5+...
Senior
jobr.pro
San Francisco, CA
2 days ago
Senior AI Evaluation & Reliability Engineer
$70 - $80 per hour
...A leading AI solutions firm in Redwood City seeks a Senior Engineer specializing in AI Evaluation & Reliability. The role focuses on designing evaluation metrics, ensuring operational excellence for AI features, and requires substantial experience in machine learning systems...
Senior
Contract work
3 days per week
The Mice Groups Inc
Redwood City, CA
3 days ago
Senior AI Research Engineer: RAG, Evaluation & GenAI
Cacheflow is seeking a Senior Applied Research Engineer to enhance AI systems through rigorous experimentation and applied research. This research-focused... ...will design information access strategies, evaluate innovative methodologies, and collaborate closely with...
Senior
Flexible hours
Cacheflow
San Francisco, CA
1 day ago
Senior Software Engineer, Data Platform - Experimentation & Evaluation
$156k - $387.6k
...Senior Software Engineer, Data Platform - Experimentation & Evaluation Location: San Jose Employment Type: Regular Job Code: X9644 Responsibilities Team Introduction Our mission in experimentation and evaluation team is to build the next‑gen A/B testing platform, that...
Senior
Temporary work
Local area
Ellis Technologies, Inc.
San Jose, CA
4 days ago
Senior ML Tech Lead — Autograder Systems & Evaluation
...California, is seeking a Sr Machine Learning Engineer, Tech Lead for Autograder Systems. In this high... ...role, you will define the technical vision for evaluating model outputs and lead a team of MLEs to enhance generative AI features. Candidates should have a Master's or...
Senior
Apple Inc.
Cupertino, CA
23 hours ago
Senior Scala/Kotlin/OCaml Engineer for AI Evaluation
$120 per hour
Mercor is seeking expert software engineers skilled in Scala, Kotlin, or OCaml to evaluate advanced AI systems in specialized engineering domains. You'll apply your expertise to assess complex technical scenarios and influence the development of AI in key ecosystems. The...
Senior
Hourly pay
Mercor
Henderson, NV
2 days ago
Senior Conversational AI Evaluation Engineer
Blueface Ltd in Washington seeks an experienced AI Evaluator to design and develop evaluation pipelines for conversational AI. The role involves defining metrics, conducting experiments, and ensuring high-quality AI solutions. The ideal candidate will have 5-7 years of...
Senior
Blueface Ltd
Washington DC
2 days ago
Senior Scala/Kotlin/OCaml Engineer for AI Evaluation
$120 per hour
Mercor, a leading AI research organization, is seeking expert software engineers specialized in Scala, Kotlin, and OCaml. You'll evaluate complex technical tasks in real-world scenarios and provide structured assessments, influencing the performance of advanced AI systems...
Senior
Hourly pay
Flexible hours
Mercor
Lancaster, CA
2 days ago
Senior AI Evaluation Engineer — Metrics & Data Pipelines
$240k - $280k
A leading software monitoring company is seeking a Senior Software Engineer on its AI/ML team to build evaluation infrastructure for measuring the performance of AI systems. This role involves designing datasets, creating benchmarks, and ensuring AI features behave reliably...
Senior
Sentry
San Francisco, CA
1 day ago
Senior AI Engineer, Agentic Evaluation & V&V
$150k - $250k
...Senior AI Engineer, Agentic Evaluation & V&V At Slingshot Aerospace, we're on a mission to make space safer and more secure for everyone. Our work... ...operations will be powered by better data and smarter software. This role focuses on building and scaling evaluation...
Senior
Full time
Remote work
Slingshot Aerospace
United States
1 day ago
Senior AI Engineer: Agentic Evaluation & V&V for Autonomy
Slingshot Aerospace is looking for a Senior AI Engineer to focus on Agentic Evaluation and V&V. The role involves building evaluation frameworks and simulation... ...AI. Candidates must have 6+ years of experience in software or ML engineering, strong Python skills, and a...
Senior
Full time
Remote work
Slingshot Aerospace
Phoenix, AZ
23 hours ago
Senior Software Engineer, Data Platform - Experimentation & Evaluation
$156k - $387.6k
...Team Introduction Our mission in experimentation and evaluation team is to build the next-gen A/B testing platform, that empowers... ...to make bold hypotheses and cautious verification. As a software engineer in experimentation and evaluation team, you will have the opportunity...
Senior
Temporary work
Local area
Tik Tok
San Jose, CA
3 days ago
Senior Software Engineer for LLM Evaluation
$50 per hour
...This role focuses on creating advanced datasets for training and evaluating large language models, collaborating closely with researchers to enhance AI-driven coding solutions. As a Software Engineering evaluator, you will curate code examples, provide precise solutions...
Senior
Remote job
For contractors
Flexible hours
SaidGig
Remote
7 days ago