Senior Software Engineer - AI Evaluation

Alignerr

What if your engineering skills could directly determine whether the world's most advanced AI systems are actually working? We're looking for Senior Software Engineers to design, build, and scale the evaluation infrastructure that measures AI performance: the critical layer between raw model output and real-world trust.

This is high-impact, technically challenging work at the intersection of software engineering and AI research. You'll build the tools, pipelines, and frameworks that help leading research teams understand what their models can do, where they fail, and how to make them better. If you love building robust systems and care deeply about quality and measurement, this role puts you at the center of the AI revolution.

Key Responsibilities

  • Design and build scalable evaluation pipelines and frameworks for assessing AI model performance across diverse tasks and domains
  • Develop automated testing harnesses, scoring systems, and benchmarking tools for large language models and other AI systems
  • Write clean, production-quality code to process, analyze, and visualize evaluation datasets at scale
  • Create and maintain APIs, dashboards, and internal tools that enable research teams to run, track, and compare evaluations efficiently
  • Collaborate with AI researchers and data scientists to translate evaluation methodologies into reliable, repeatable software
  • Identify edge cases, failure modes, and reliability issues in AI outputs through systematic engineering approaches
  • Optimize system performance, data processing speed, and infrastructure costs
  • Contribute to the architecture and technical direction of the evaluation platform
  • Write clear documentation and participate in code reviews to maintain high engineering standards

Requirements

  • 5+ years of professional software engineering experience, with a track record of building and shipping production systems
  • Strong proficiency in Python, including experience with data processing libraries (pandas, NumPy) and web frameworks (FastAPI, Flask, or Django)
  • Solid understanding of software architecture, design patterns, and engineering best practices
  • Experience working with large datasets and building data pipelines
  • Comfortable with cloud infrastructure (AWS, GCP, or Azure) and containerized deployments
  • Familiarity with version control (Git), CI/CD workflows, and testing frameworks
  • Strong problem-solving skills and the ability to work through ambiguity independently
  • Excellent written communication skills: you can document your work clearly and collaborate asynchronously
  • Self-motivated and reliable when working independently in a remote environment

Preferred Qualifications

  • Experience with ML/AI evaluation, benchmarking, or model testing
  • Familiarity with LLMs, prompt engineering, or AI safety and alignment concepts
  • Background in building developer tools, internal platforms, or data infrastructure
  • Experience with distributed systems, message queues, or workflow orchestration (Airflow, Prefect, etc.)
  • Knowledge of statistical methods for measuring and comparing model performance
  • Prior experience in a remote-first or async-first engineering culture
  • Contributions to open-source projects related to AI, ML, or evaluation tooling

What We Offer

  • Work on cutting-edge AI evaluation projects alongside world-class research labs
  • Directly influence how AI quality and safety are measured at scale; your code shapes the standard
  • Fully remote and flexible — work when and where you're most productive
  • Freelance autonomy with access to deeply meaningful, technically stimulating work
  • Collaborate with a global team of engineers and researchers pushing the boundaries of AI
  • Exposure to the latest developments in AI research, model capabilities, and evaluation science
  • Potential for ongoing work and contract extension as the platform and project scope grow
Vacancy posted 6 hours ago
Similar jobs that could be interesting for you
Based on the Senior Software Engineer - AI Evaluation vacancy in the United States
  • $50 - $150 per hour

    A leading AI company is seeking a software engineer to review and evaluate model-generated code. This contract role requires several years of software engineering experience, particularly as a full-stack engineer at notable tech firms. You will assess code quality and... 
    Senior
    Hourly pay
    Contract work
    Flexible hours

    Turing

    San Francisco, CA
    1 day ago
  •  ...Learning Commons in Redwood City, CA is seeking a Senior Software Engineer to design and build evaluation systems for educational technology products. As part...  ...Evaluators team, you will work at the intersection of AI, learning science, and product development. The ideal... 
    Senior

    Learning Commons

    Redwood City, CA
    3 days ago
  •  ...leading autonomous driving technology firm is seeking a Senior Software Engineer to architect evaluation methodologies for their simulator. The ideal candidate...  ...design principles. You will work closely with AI research to ensure the simulator accurately represents... 
    Senior

    Waymo

    San Francisco, CA
    3 days ago
  • $204k - $259k

     ...leading autonomous driving technology company is seeking a Senior Software Engineer to architect evaluation methodologies for their hybrid simulator....  ...throughput data processing systems and partnering with AI research teams. The ideal candidate has over 5 years of... 
    Senior
    Full time

    Waymo

    Mountain View, CA
    3 days ago
  •  ...Senior Software Engineer, Simulator Evaluation Mar 02, 2026 Waymo is an autonomous driving technology company with the mission to be the world's most trusted...  ..., physical dynamics, and state-of-the-art Generative AI to create a training ground for the Waymo Driver. The... 
    Senior
    Full time
    Remote work

    Waymo

    San Francisco, CA
    3 days ago
  •  ...Kake is seeking a Senior Software Engineer to contribute to developing AI training data for a leading human data platform. This role involves working at the...  ...experience in software engineering, with strong skills in evaluating AI-generated code and terminal-based workflows.... 
    Senior
    Remote work

    KAKE

    Poland, NY
    3 days ago
  • $175k - $245k

     ...Senior Software Engineer II - Applied AI and Evaluations (Remote Eligible) -REMOTE, USA- For over 20 years, Smartsheet has helped people and teams achieve–well, anything. From seamless work management to smart, scalable solutions, we’ve always worked with flow. We’re... 
    Senior
    Full time
    Temporary work
    Local area
    Immediate start
    Remote work

    Smartsheet

    Bellevue, WA
    11 days ago
  • $136k - $199.2k

     ...Autonomous Driving Software Architect General Motors is a global...  ...About the Organization The Evaluation team builds and evolves the...  ...results into clear feedback for engineering and leadership, and help...  ...systems. Experience leveraging AI-assisted development and... 
    Senior
    Remote work
    Relocation
    Relocation package
    Flexible hours

    General Motors

    United States
    23 hours ago
  • $60 per hour

     ...Mindrift AI Coding Agent Evaluation Specialist Mindrift connects specialists with project-based AI...  ...Not data labeling Not prompt engineering Not writing code from scratch - the...  ...What we look for ~5+ years in software development ~ Core stack: Python (FastAPI... 
    Senior
    Permanent employment
    Temporary work
    Remote work

    Mind Rift

    United States
    4 days ago
  •  ...Kake is seeking a Senior Software Engineer to help develop and evaluate AI training data for an expert platform serving AI agents. This unique role requires strong software engineering expertise to create coding tasks, evaluate AI outputs, and contribute to AI model generation... 
    Senior
    Remote work

    KAKE

    Peru, IL
    3 days ago
  • $120k - $250k

     ...2016 in Silicon Valley, Pony.ai has quickly become a global leader...  ..., and multi-dimensional evaluation. Design and implement high...  ...Build and optimize downstream engineering workflows for Large Language...  ...skills in C/C++, Python, and software design Strong foundation in... 
    Senior
    Temporary work

    pony.ai

    Fremont, CA
    15 days ago
  • $148k - $356.5k

    Senior Software Engineer, Metrics and Evaluation - Autonomous Vehicles...  ...tackle challenges no one else can solve. Our work in AI and digital twins is transforming the world's largest... 
    Senior
    Full time
    Remote work

    NVIDIA Corporation

    Raleigh, NC
    2 days ago
  • $190k - $238k

     ...Senior Software Engineer, Evaluators, Learning Commons Redwood City, CA (Hybrid) Learning Commons aims to scale proven teaching and learning practices to benefit every learner by building AI infrastructure that better connects the way students learn to the tools they... 
    Senior
    Work at office
    Relocation package
    3 days per week

    Learning Commons

    Redwood City, CA
    3 days ago
  • $204k - $259k

     ...+ U.S. states. The Large Model Evaluation team is at the nexus of Waymo’s AI ambition. With advancements in Large...  ...looking for quantitatively‑minded engineers to research and propose new ways...  ...experience in a heavily quantitative software engineering area ~ Experience... 
    Senior
    Full time
    Remote work

    Waymo

    San Francisco, CA
    3 days ago
  • $80 - $100 per hour

     ...Design and build the coding benchmarks and evaluation pipelines used to test frontier AI models on real software engineering work: Design coding benchmarks that evaluate...  ...country of residence. Nice to have Senior or Lead-level profile with a history of technical... 
    Senior
    Full time
    Contract work
    For contractors
    Remote work

    GrabJobs

    United States
    6 hours ago
  •  ...Senior AI Engineer In Pre-training Evaluation Aleph Alpha Research's mission is to deliver category-defining AI innovation that enables open, accessible, and trustworthy deployment of GenAI in industrial applications. Our organization develops foundational models and... 
    Senior
    Remote work
    Relocation
    Flexible hours

    Aleph Alpha

    United States
    1 day ago
  • $356.5k

    NVIDIA Gruppe is seeking a Senior Software Engineer to develop the NeMo Platform, a product that enhances AI systems. You will design Python APIs and systems to monitor agent behaviors and improve performance efficiently. The ideal candidate will have strong Python skills... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • Drata is seeking a Senior Applied Research Engineer to enhance the quality of AI systems through rigorous evaluation and experimentation. This role emphasizes applied research, focusing on information retrieval and reasoning strategies. The ideal candidate will bring 5+... 
    Senior

    jobr.pro

    San Francisco, CA
    2 days ago
  • $70 - $80 per hour

     ...A leading AI solutions firm in Redwood City seeks a Senior Engineer specializing in AI Evaluation & Reliability. The role focuses on designing evaluation metrics, ensuring operational excellence for AI features, and requires substantial experience in machine learning systems... 
    Senior
    Contract work
    3 days per week

    The Mice Groups Inc

    Redwood City, CA
    3 days ago
  • Cacheflow is seeking a Senior Applied Research Engineer to enhance AI systems through rigorous experimentation and applied research. This research-focused...  ...will design information access strategies, evaluate innovative methodologies, and collaborate closely with... 
    Senior
    Flexible hours

    Cacheflow

    San Francisco, CA
    1 day ago
  • $156k - $387.6k

     ...Senior Software Engineer, Data Platform - Experimentation & Evaluation Location: San Jose Employment Type: Regular Job Code: X9644 Responsibilities Team Introduction Our mission in experimentation and evaluation team is to build the next‑gen A/B testing platform, that... 
    Senior
    Temporary work
    Local area

    Ellis Technologies, Inc.

    San Jose, CA
    4 days ago
  •  ...California, is seeking a Sr Machine Learning Engineer, Tech Lead for Autograder Systems. In this high...  ...role, you will define the technical vision for evaluating model outputs and lead a team of MLEs to enhance generative AI features. Candidates should have a Master's or... 
    Senior

    Apple Inc.

    Cupertino, CA
    23 hours ago
  • $120 per hour

    Mercor is seeking expert software engineers skilled in Scala, Kotlin, or OCaml to evaluate advanced AI systems in specialized engineering domains. You'll apply your expertise to assess complex technical scenarios and influence the development of AI in key ecosystems. The... 
    Senior
    Hourly pay

    Mercor

    Henderson, NV
    2 days ago
  • Blueface Ltd in Washington seeks an experienced AI Evaluator to design and develop evaluation pipelines for conversational AI. The role involves defining metrics, conducting experiments, and ensuring high-quality AI solutions. The ideal candidate will have 5-7 years of... 
    Senior

    Blueface Ltd

    Washington DC
    2 days ago
  • $120 per hour

    Mercor, a leading AI research organization, is seeking expert software engineers specialized in Scala, Kotlin, and OCaml. You'll evaluate complex technical tasks in real-world scenarios and provide structured assessments, influencing the performance of advanced AI systems... 
    Senior
    Hourly pay
    Flexible hours

    Mercor

    Lancaster, CA
    2 days ago
  • $240k - $280k

    A leading software monitoring company is seeking a Senior Software Engineer on its AI/ML team to build evaluation infrastructure for measuring the performance of AI systems. This role involves designing datasets, creating benchmarks, and ensuring AI features behave reliably... 
    Senior

    Sentry

    San Francisco, CA
    1 day ago
  • $150k - $250k

     ...Senior AI Engineer, Agentic Evaluation & V&V At Slingshot Aerospace, we're on a mission to make space safer and more secure for everyone. Our work...  ...operations will be powered by better data and smarter software. This role focuses on building and scaling evaluation... 
    Senior
    Full time
    Remote work

    Slingshot Aerospace

    United States
    1 day ago
  • Slingshot Aerospace is looking for a Senior AI Engineer to focus on Agentic Evaluation and V&V. The role involves building evaluation frameworks and simulation...  ...AI. Candidates must have 6+ years of experience in software or ML engineering, strong Python skills, and a... 
    Senior
    Full time
    Remote work

    Slingshot Aerospace

    Phoenix, AZ
    23 hours ago
  • $156k - $387.6k

     ...Team Introduction Our mission in experimentation and evaluation team is to build the next-gen A/B testing platform, that empowers...  ...to make bold hypotheses and cautious verification. As a software engineer in experimentation and evaluation team, you will have the opportunity... 
    Senior
    Temporary work
    Local area

    Tik Tok

    San Jose, CA
    3 days ago
  • $50 per hour

     ...This role focuses on creating advanced datasets for training and evaluating large language models, collaborating closely with researchers to enhance AI-driven coding solutions. As a Software Engineering evaluator, you will curate code examples, provide precise solutions... 
    Senior
    Remote job
    For contractors
    Flexible hours

    SaidGig

    Remote
    7 days ago