AI Evaluation Engineer
$130k - $220k
Aurora Jobs ApS
This position is at Aurora Jobs ApS
The selection process will be fully managed by Aurora Jobs ApS.
--
**Location:** San Francisco, CA - Full-time, On-site
**Salary:** $130K - $220K
**Equity:** $60K - $120K options/year
**Visa Sponsorship:** Available on a case-by-case basis

**About Artificial Analysis**

Artificial Analysis is the leading independent AI benchmarking and insights company. They help engineers, enterprises, investors, media, and policymakers understand AI capabilities and make critical decisions about AI strategy. Their benchmarks are trusted by hundreds of thousands of users and are the go-to reference for leading AI labs including OpenAI, Google, Meta, NVIDIA, and Anthropic. Their work is also cited by the Wall Street Journal, Bloomberg, Financial Times, and The Economist. Their benchmarks do not just measure frontier AI; they actively shape it.

The company was founded in 2024, has raised $3.5M, and has a team of 30+ employees across San Francisco, Sydney, and Melbourne, with plans to double by mid-year. They are backed by Nat Friedman, Daniel Gross, Andrew Ng, Adam D’Angelo, Clem Delangue, and other leading figures in AI.

**About Aurora**

Aurora helps top engineers discover opportunities at some of the most ambitious startups worldwide. We work closely with companies to identify exceptional talent and match them with roles where they can have real impact. We are currently helping Artificial Analysis expand their team.

**Position: Member of Technical Staff**

**What This Role Actually Is**

This role is best described as an AI Evaluation Engineer / Technical Generalist. It is not a traditional software engineering role and not a pure ML research position. It combines AI benchmarking, technical product work, strategic analysis, customer-facing collaboration, and startup execution.

**About the Role**

Artificial Analysis is hiring Members of Technical Staff to design the evaluations that define how AI is measured, produce analysis that shapes how companies and the broader market understand AI, and work directly with the leading AI labs and enterprise customers who rely on their insights. You will develop new benchmarking methodologies, manage relationships with some of the most important AI labs and enterprise customers in the world, and help drive the product direction of the platform. The success bar for this role is becoming a world expert in modern AI technologies.

This is a unique combination of product, research, technical, and client-facing work suited to highly driven technical generalists who thrive on breadth and ownership. You will work directly with the founders and across the full team, with outsized impact on both the company and product.

**What You’ll Do**

* Design, structure, and execute projects evaluating AI systems and technologies. Develop new benchmarking methodologies and datasets that improve evaluation capabilities.
* Build reports, strategic analysis, and data visualizations that help enterprises shape their AI strategy.
* Collaborate with leading AI companies to benchmark agentic AI applications, foundation models, and hardware systems.
* Identify opportunities to improve the benchmarking platform and work with developers to ship those improvements.
* Operate in an AI-native workflow using cutting-edge AI tools to create leverage and maintain speed.
* Contribute to company strategy and lead major initiatives that support the company’s goal of becoming the leading AI benchmarking platform.

**Requirements**

* 2–10 years of experience in MBB consulting or technical roles such as TPM, ML Engineer, or similar.
* Strong analytical ability, technical fluency in modern AI systems, excellent communication skills, ownership mindset, and comfort operating in a fast-moving startup environment.

**Tech Stack**

* Python

**Why Join**

This is a rare chance to get a front-row seat to frontier AI and evaluate cutting-edge systems before they are publicly released. You’ll work directly with the world’s top AI labs while helping define what best-in-class AI performance actually means. The company is financially strong, fast-growing, highly connected, and offers strong ownership and visibility from day one.
--
$150k - $250k
...Distyl AI Job Posting Distyl is an applied AI technology company partnering with... ...At Distyl, we build AI systems using Evaluation-Driven Development, an approach where evaluation... ...behavior in production. AI Evaluation Engineers focus on designing and implementing the... (Suggested; Work at office; 3 days per week)
- ...A pioneering AI technology firm based in San Francisco is seeking an AI Engineer to own the evaluation infrastructure for AI agents. This role requires designing automated pipelines and building observability systems, ensuring agent performance meets enterprise standards... (Suggested; Remote work; Flexible hours)
$150k - $180k
...Location: Remote, located in the US. Type: Full-time. Department: Engineering. Reports to: Director Of Engineering. Responsibilities: Build and maintain infrastructure and tooling for the AI evaluations platform used by internal teams, including automated testing platform for... (Suggested; Full time; Remote work; Flexible hours)
- ...Ironclad, located in San Francisco, is seeking an AI Evaluation Engineer to join their team. This role involves analyzing datasets, designing feedback loops, and partnering closely with AI Engineers to improve model quality. Applicants should have 8+ years of experience... (Suggested; Contract work)
- A cutting-edge AI firm in San Francisco is seeking a Research Engineer to develop evaluation systems and benchmarking pipelines for language models. Candidates should have a strong background in applied research, coding skills, and familiarity with ML models. You will work... (Suggested)
- ...Repovive. Compensation: Not listed. Posted: April 25, 2026. Required Skills: AI evaluation, data pipelines, agent instrumentation. Requirements: Mid/Senior. Visa Sponsorship: Not mentioned. Relocation: Not mentioned. About the Role... (Relocation; Visa sponsorship)
$240k - $280k
A leading software monitoring company is seeking a Senior Software Engineer on its AI/ML team to build evaluation infrastructure for measuring the performance of AI systems. This role involves designing datasets, creating benchmarks, and ensuring AI features behave reliably...
$150k - $180k
...A health technology company is seeking a skilled infrastructure engineer to build and maintain AI evaluation tooling. The ideal candidate has over 5 years of experience in software engineering with a focus on backend systems and production-grade infrastructure. This role... (Remote work; Flexible hours)
- ...located in San Francisco is seeking an innovative Quality Engineer for their AI products. This role blends ops, strategy, and analytics to... ...leading labs, and ensure user satisfaction through effective evaluation baselines. Competitive salary and benefits offered, with a...
- B Capital seeks a talented individual for an AI Evaluation role in San Francisco. This position involves conducting critical comparative analysis, refining evaluation systems, and collaborating with various teams to enhance model capabilities. The ideal candidate will have...
- A fast-growing AI company seeks a Software Engineer to focus on Model Evaluation & Benchmarking. This role involves building evaluation systems for multimodal AI, ensuring reliable performance. The ideal candidate will possess strong Python programming skills, familiarity...
- ...Nerdleveltech is seeking a Software Engineering evaluator based in San Francisco, California. In this contractor role, you will create datasets for training and evaluating AI models by curating code examples and refining AI-generated solutions across various programming... (For contractors; 10 hours per week; Flexible hours)
- ...A technology firm in San Francisco is seeking a Research Engineer to enhance AI model quality. The ideal candidate will build benchmarks, datasets, and evaluation loops to ensure effective performance on critical tasks. This role requires strong programming skills and...
- Cacheflow is seeking a Senior Applied Research Engineer to enhance AI systems through rigorous experimentation and applied research. This research... ...The individual will design information access strategies, evaluate innovative methodologies, and collaborate closely with... (Flexible hours)
- Drata is seeking a Senior Applied Research Engineer to enhance the quality of AI systems through rigorous evaluation and experimentation. This role emphasizes applied research, focusing on information retrieval and reasoning strategies. The ideal candidate will bring 5+...
$214k - $300k
Monograph is seeking an engineer to build and improve AI evaluation systems aimed at increasing shipping quality for AI tools. You will enhance scalable eval runners, improve benchmarks, and ensure reliability in distributed systems. Strong engineering fundamentals and...
$192k - $237.1k
A leading compliance software company in San Francisco is seeking an Applied AI Engineer to innovate compliance automation through applied research and evaluation. This role emphasizes experimentation over production engineering, requiring strong skills in information retrieval...
$176k - $253k
Harper, an AI-native commercial insurance company in San Francisco, seeks a Senior Member of Technical Staff focused on AI quality evaluation. This role requires building evaluation systems to measure agent performance and ensure high standards in service. The ideal candidate... (Work at office)
$150k
Tzafon is seeking a skilled engineer to enhance their machine intelligence systems in San Francisco. As part of the team, you'll be responsible for building evaluation infrastructure, designing data pipelines, and implementing fine-tuning processes. Ideal candidates have...
- Xterra AI in San Francisco is hiring an AI Research Engineer to develop and build infrastructure for cutting-edge AI systems that tackle complex scientific... ...and domain experts to design agent infrastructures, evaluation frameworks, and data systems that ensure our products...
$80 - $120 per hour
Mercor is looking for an Engineering / Manufacturing / Technical Operations Evaluator to assess AI-generated artifacts. The role requires 5+ years of relevant experience, specifically in identifying errors and providing quality feedback to enhance AI outputs. This is a... (Remote job; Hourly pay; Contract work; Work at office)
- ...AI Engineer. Location: San Francisco or New York City. About Pathwork: Pathwork is redesigning life & health insurance jobs for... ...owning model orchestration, retrieval, real‑time inference, evaluation, and production infrastructure, and collaborating closely with... (Work at office)
- ...-moving environment. The Role We’re hiring a mid-level AI Engineer to help build and integrate AI-powered features into our platform... ...that improve user experience and automate processes Evaluate and compare AI models to determine fit for specific product use...
- ...Meet Eloquent AI At Eloquent AI, we’re building the next generation of AI Operators... ...alongside world-class talent in AI, engineering, and product as we redefine the future of... ...agents’ performance via user simulations and evaluations. Requirements ~3+ years of...
$150k - $350k
...About Collate: Collate is an AI document generation platform for life sciences. We... ...and founder of Lever. Our AI researchers, engineers, and designers have worked at Google, Nvidia... ..., you’ll define the standards for how we evaluate, and deploy models that directly impact...
$150k - $250k
...Max AI – Stripe for Healthcare. Max AI is the World’s first human-free, fully-autonomous... ...for over 10 years. And our Head of Engineering was one of the earliest engineers at Figma... ...Responsibilities: Build, experiment, and evaluate AI agents and ML models in the NLP domain...
- ...At Falconer, we’re transforming how engineers create, access, and share knowledge. We’re looking for a Founding AI Engineer to help us build an AI-powered knowledge platform... ...systems in Python and/or Node.js You can evaluate tradeoffs and propose the most appropriate... (Work experience placement; Work at office; Flexible hours)
$204k - $259k
...billions in simulation across 15+ U.S. states. Waymo's Release Evaluation org ensures that each version of the Waymo Driver is safe... ...objectives under resource constraints. Collaborate with other engineers, data scientists, statisticians and the leadership team to... (Remote work)
$170k - $216k
...billions in simulation across 15+ U.S. states. The Planner Evaluation team works on one of the key challenges in autonomous driving:... ...the car. We are looking for experienced data-minded software engineers and data scientists to help us improve how we characterize and... (Remote work)
$170k - $216k
...diverse set of sensors, enabling software engineers like you to develop multi-modal models... ...high-scale, mission-critical automation and evaluation frameworks that establish the "ultimate... ...~2+ years of experience in industrial AI applications involving the creation, maintenance... (Remote work)


