Senior Software Engineer - AI Evaluation
Alignerr
What if your engineering skills could directly determine whether the world's most advanced AI systems are actually working? We're looking for Senior Software Engineers to design, build, and scale the evaluation infrastructure that measures AI performance — the critical layer between raw model output and real-world trust.
This is high-impact, technically challenging work at the intersection of software engineering and AI research. You'll build the tools, pipelines, and frameworks that help leading research teams understand what their models can do, where they fail, and how to make them better. If you love building robust systems and care deeply about quality and measurement, this role puts you at the center of the AI revolution.
What You'll Do
- Design and build scalable evaluation pipelines and frameworks for assessing AI model performance across diverse tasks and domains
- Develop automated testing harnesses, scoring systems, and benchmarking tools for large language models and other AI systems
- Write clean, production-quality code to process, analyze, and visualize evaluation datasets at scale
- Create and maintain APIs, dashboards, and internal tools that enable research teams to run, track, and compare evaluations efficiently
- Collaborate with AI researchers and data scientists to translate evaluation methodologies into reliable, repeatable software
- Identify edge cases, failure modes, and reliability issues in AI outputs through systematic engineering approaches
- Optimize system performance, data processing speed, and infrastructure costs
- Contribute to the architecture and technical direction of the evaluation platform
- Write clear documentation and participate in code reviews to maintain high engineering standards
What We're Looking For
- 5+ years of professional software engineering experience, with a track record of building and shipping production systems
- Strong proficiency in Python — including experience with data processing libraries (pandas, NumPy) and web frameworks (FastAPI, Flask, or Django)
- Solid understanding of software architecture, design patterns, and engineering best practices
- Experience working with large datasets and building data pipelines
- Comfortable with cloud infrastructure (AWS, GCP, or Azure) and containerized deployments
- Familiarity with version control (Git), CI/CD workflows, and testing frameworks
- Strong problem-solving skills and the ability to work through ambiguity independently
- Excellent written communication skills — you can document your work clearly and collaborate asynchronously
- Self-motivated and reliable when working independently in a remote environment
Nice to Have
- Experience with ML/AI evaluation, benchmarking, or model testing
- Familiarity with LLMs, prompt engineering, or AI safety and alignment concepts
- Background in building developer tools, internal platforms, or data infrastructure
- Experience with distributed systems, message queues, or workflow orchestration (Airflow, Prefect, etc.)
- Knowledge of statistical methods for measuring and comparing model performance
- Prior experience in a remote-first or async-first engineering culture
- Contributions to open-source projects related to AI, ML, or evaluation tooling
Why Join
- Work on cutting-edge AI evaluation projects alongside world-class research labs
- Directly influence how AI quality and safety are measured at scale — your code shapes the standard
- Fully remote and flexible — work when and where you're most productive
- Freelance autonomy with access to deeply meaningful, technically stimulating work
- Collaborate with a global team of engineers and researchers pushing the boundaries of AI
- Exposure to the latest developments in AI research, model capabilities, and evaluation science
- Potential for ongoing work and contract extension as the platform and project scope grow
$50 - $150 per hour