AI Evaluation Engineer

$130k - $220k

Full-time

Aurora Jobs ApS

This position is at Aurora Jobs ApS

The selection process will be fully managed by Aurora Jobs ApS.

**Location:** San Francisco, CA - Full-time, On-site

**Salary:** $130K - $220K

**Equity:** $60K - $120K options/year

**Visa Sponsorship:** Available on a case-by-case basis

**About Artificial Analysis**

Artificial Analysis is the leading independent AI benchmarking and insights company. They help engineers, enterprises, investors, media, and policymakers understand AI capabilities and make critical decisions about AI strategy. Their benchmarks are trusted by hundreds of thousands of users and are the go-to reference for leading AI labs including OpenAI, Google, Meta, NVIDIA, and Anthropic. Their work is also cited by the Wall Street Journal, Bloomberg, Financial Times, and The Economist. Their benchmarks do not just measure frontier AI; they actively shape it.

The company was founded in 2024, has raised $3.5M, and has a team of 30+ employees across San Francisco, Sydney, and Melbourne, with plans to double by mid-year. They are backed by Nat Friedman, Daniel Gross, Andrew Ng, Adam D’Angelo, Clem Delangue, and other leading figures in AI.

**About Aurora**

Aurora helps top engineers discover opportunities at some of the most ambitious startups worldwide. We work closely with companies to identify exceptional talent and match them with roles where they can have real impact. We are currently helping Artificial Analysis expand their team.

**Position: Member of Technical Staff**

**What This Role Actually Is**

This role is best described as an AI Evaluation Engineer / Technical Generalist. It is not a traditional software engineering role and not a pure ML research position. It combines AI benchmarking, technical product work, strategic analysis, customer-facing collaboration, and startup execution.

**About the Role**

Artificial Analysis is hiring Members of Technical Staff to design the evaluations that define how AI is measured, produce analysis that shapes how companies and the broader market understand AI, and work directly with the leading AI labs and enterprise customers who rely on their insights.

You will develop new benchmarking methodologies, manage relationships with some of the most important AI labs and enterprise customers in the world, and help drive the product direction of the platform. The success bar for this role is becoming a world expert in modern AI technologies.

This is a unique combination of product, research, technical, and client-facing work suited to highly driven technical generalists who thrive on breadth and ownership. You will work directly with the founders and across the full team, with outsized impact on both the company and product.

**What You’ll Do**

* Design, structure, and execute projects evaluating AI systems and technologies. Develop new benchmarking methodologies and datasets that improve evaluation capabilities.
* Build reports, strategic analysis, and data visualizations that help enterprises shape their AI strategy.
* Collaborate with leading AI companies to benchmark agentic AI applications, foundation models, and hardware systems.
* Identify opportunities to improve the benchmarking platform and work with developers to ship those improvements.
* Operate in an AI-native workflow using cutting-edge AI tools to create leverage and maintain speed.
* Contribute to company strategy and lead major initiatives that support the company’s goal of becoming the leading AI benchmarking platform.

**Requirements**

* 2–10 years of experience in MBB consulting or technical roles such as TPM, ML Engineer, or similar.
* Strong analytical ability, technical fluency in modern AI systems, excellent communication skills, ownership mindset, and comfort operating in a fast-moving startup environment.

**Tech Stack**

* Python

**Why Join**

This is a rare chance to get a front-row seat to frontier AI and evaluate cutting-edge systems before they are publicly released. You’ll work directly with the world’s top AI labs while helping define what best-in-class AI performance actually means. The company is financially strong, fast-growing, highly connected, and offers strong ownership and visibility from day one.


Vacancy posted 9 hours ago

