AI Evaluation Engineer for Coding Agents

Repovive, Inc.

Listing: repovive.com/jobs/69ed18d7682d4cf1d9e87166

Compensation: Not listed
Posted: April 25, 2026
Required Skills: AI evaluation, data pipelines, agent instrumentation
Requirements: Mid/Senior
Visa Sponsorship: Not mentioned
Relocation: Not mentioned

About the Role

Build evaluation and quality systems for Cursor's coding agents.

Interested in this role? Apply directly on Cursor's website.


Vacancy posted 20 hours ago

Similar jobs that could be interesting for you
Based on the AI Evaluation Engineer for Coding Agents in San Francisco, CA vacancy

Founding AI Engineer Agent Runtime
$100k - $150k
...Most AI systems generate text. We’re building one... ...makes decisions . Pareto Agent is a policy-driven... ...Role As our Founding AI Engineer - Agent Runtime , you will... ...are constrained, evaluated, and enforced by a deterministic... ...— you treat coding agents as an execution...
Suggested
Summer work
Work at office
Flexible hours
Pareto Agent
San Francisco, CA
21 hours ago
AI Systems Engineer, Codex Agents
...AI Systems Engineer - Codex Core Agents About The Team The Codex Core Agents team builds the agent harness... ...part of how models are trained and evaluated, making this one of the highest-... ...model outputs, use tools, execute code, and complete long-horizon tasks safely...
Suggested
OpenAI
San Francisco, CA
5 days ago
Applied AI Engineer, Codex Core Agent
...Applied AI Engineer The Codex Core Agent team builds the kernel of Codex. We own making the agent better... ...on agent behaviors across real-world coding tasks and long-horizon workflows.... ...that get better real-task data into evaluation and research. Work with product teams...
Suggested
OpenAI
San Francisco, CA
3 days ago
AI Agent Engineer - Editor Team
$180k - $215k
...looking for a seasoned Product Engineer to own and accelerate AI across Rive's editor.... ...things: Improving Rive's AI agent - We shipped an AI agent that... ...of design, animation, and code. Most AI tools focus on one... ..., APIs, and techniques and evaluate how they apply to Rive. Work...
Suggested
Full time
Work experience placement
Work at office
Remote work
Rive
San Francisco, CA
20 hours ago
AI Quality Engineer: Agent Evaluation & Metrics
Anysphere is seeking a Software Engineer for the Agent Quality team in San Francisco, CA. In this role... ...design and build infrastructure to evaluate and improve ML agents. Responsibilities... ...Ideal candidates will have experience in AI evaluations, data analysis, and solid software...
Suggested
Anysphere
San Francisco, CA
3 days ago
AI Systems Engineer, Codex Agents
AI Systems Engineer - Codex Core Agents Location San Francisco Employment Type Full time Department Applied... ...model outputs, use tools, execute code, and complete long-horizon tasks safely... ...development environments. Develop evaluation, experimentation, and debugging...
Full time
Work at office
Local area
Relocation package
Flexible hours
Slope
San Francisco, CA
4 days ago
Staff AI Engineer - Architecture Agent Systems (H-1B Eligible)
$140k - $225k
...Technical Staff — SketchPro.ai Location: San Francisco... ...grunt work through AI agents operating directly in... ...What You'll Own Agent engineering across context design,... ...broader AEC ecosystem Evaluation harnesses to determine... ...video calls Heavy daily coding agent usage (Claude Code...
Full time
H1b
Work at office
Visa sponsorship
David Joseph & Company
San Francisco, CA
4 days ago
AI Context & Harness Engineer Build the AI Coding Agent
$100k - $250k
...A pioneering AI software firm is seeking a Senior or Staff AI Context & Harness Engineer in San Francisco. This role involves building and maintaining AI coding agents, researching improved performance methods, and employing advanced context engineering techniques. Candidates...
Work at office
Remote work
Hercules
San Francisco, CA
21 hours ago
AI Security Engineer App & Agent Security
...Brain Co. in San Francisco is looking for a Security Engineer for Applications & AI. You will be responsible for integrating security practices into... ...of application security experience and proficiency in coding. Competitive salary, daily lunches, and strong team collaboration...
BRAIN CORP
San Francisco, CA
21 hours ago
Senior AI Agent Engineer - Open Models & Evaluation Systems
...foundation of useful, agentic AI. We are here to take a big swing at the most ambitious engineering challenge of our careers. Everyone... ...of an effective background agent. Let's design and build the rest... ...Qwen, Llama) performance. Claude Code is all about agent/harness...
Work at office
Immediate start
Sail Research
San Francisco, CA
4 days ago
Senior AI Research Engineer - RAG, Agents & Evaluation
Cacheflow is seeking a Senior Applied Research Engineer to enhance the effectiveness of our AI systems through focused research and experimentation. This role involves designing information retrieval strategies and collaborating with engineers to turn validated approaches...
Flexible hours
Cacheflow
San Francisco, CA
1 day ago
VP of AI Agent Engineering & Platform Innovation
...Turn/River in San Francisco is seeking a VP of Agent Engineering to enhance engineering capacity in portfolio... ...collaborate with CTOs and CPOs to build AI agents that streamline feature development, combining hands-on coding with strategic leadership. The ideal candidate...
TurnRiver.com
San Francisco, CA
20 hours ago
Software Engineer, Quantitative Evaluations
$170k - $216k
...Software Engineer, Quantitative Evaluations Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver... ...software changes and simulated outcomes. Champion code health and best practices in a large and complex code base...
Full time
Remote work
Waymo
San Francisco, CA
3 days ago
AI Evaluations Engineer - Healthcare
$150k - $180k
...AI Evaluations Engineer – Healthcare Location: Remote, located in the US Type: Full-time Department: Engineering Reports... ...testing platform for AI voice agents, debugging and observability tools.... ...production-grade infrastructure with code, including APIs, services, and data pipelines...
Remote work
Flexible hours
Ellipsis Health
San Francisco, CA
1 day ago
AI Engineer
...AI Engineer Opportunity at Goodfin Goodfin is an AI-native investment... ...and improve RAG pipelines, evaluations, and reliability mechanisms.... ...particularly around LLM ecosystems and agent frameworks. Who You Are... .... ~ Strong hands-on coding experience in Python and...
goodfin
San Francisco, CA
3 days ago
AI Prompt Engineer
...AI Prompt Engineer San Francisco, CA (On-Site M-F) Our client is an... ...call center and scheduling agents. About the Role As an... ...sub-agent architectures, and evaluation harnesses to iteratively improve... ...effectively using modern AI coding tools. ~ On-site...
latitude
San Francisco, CA
1 day ago
AI Forward Deployed Engineer
...their hybrid cloud and AI journeys. With support... ...an AI Forward Deployed Engineer, you will work with customers... ...and adoption. Evaluate Model Performance: Assess... ...developing or working with agent‐based AI solutions (e.g... ...engineering: Strong coding skills (ideally in Python...
Worldwide
IBM Computing
San Francisco, CA
4 days ago
Founding AI Engineer
...we’re transforming how engineers create, access, and share... ...looking for a Founding AI Engineer to help us... ...including architecture, coding, testing, and deploying... .../or Node.js You can evaluate tradeoffs and propose the... ...already built your own agents) You have fine-tuned...
Work experience placement
Work at office
Flexible hours
Falconer
San Francisco, CA
21 hours ago
Responsible AI Engineer
...Accenture’s Global Responsible AI team within the Global Data &... ...if you’re an experienced RAI Engineer with a Responsible AI background... ...practices. Detecting, evaluating, and applying relevant RAI dimensions... ...data preparation, design, coding, testing, deployment, and support...
Work experience placement
Live in
Work at office
Local area
Accenture
San Francisco, CA
2 days ago
Remote AI Engineer, Quality & Evaluation at Enterprise Scale
...A pioneering AI technology firm based in San Francisco is seeking an AI Engineer to own the evaluation infrastructure for AI agents. This role requires designing automated pipelines and building observability systems, ensuring agent performance meets enterprise standards...
Remote work
Flexible hours
Fieldguide.ai
San Francisco, CA
21 hours ago
Sr. AI GTM Engineer
$164.7k - $266k
...and implement end‑to‑end AI workflows that power... ...briefs into concrete, agent‑powered flows: from data... ...Partner with IT, Data Engineering, and platform teams to... ...language models (LLMs), coding assistants, or agentic... ...retrieval strategies, LLM evaluation frameworks, and common...
Work at office
Remote work
2 days per week
DocuSign
San Francisco, CA
21 hours ago
AI Engineer, Evaluation
$150k - $250k
...Distyl AI Job Posting Distyl is an applied AI... ...build AI systems using Evaluation-Driven Development —an... ...production. AI Evaluation Engineers focus on designing and... ...production Python code, build evaluation pipelines... ...inform prompt design, agent logic, model selection,...
Work at office
3 days per week
Distyl AI
San Francisco, CA
3 days ago
AI Software Engineer
...Are The Agentic AI Software Engineer - Cybersecurity Systems designs... ...on building and maintaining agent-based artificial... ...of autonomously generating code, conducting security analyses... ...cybersecurity use cases. Develop evaluation metrics for AI accuracy in threat...
Local area
Work from home
Bishop Fox
San Francisco, CA
4 days ago
Senior Software Engineer, AI Engineer
$170k - $210k
...Senior Software Engineer, AI Engineer Hybrid - SF Bay Area About... ...to be fluent with modern AI coding tools (Claude Code, Cursor, Copilot... ...healthcare-grade safety and evaluation. What You Will Do... ...design, retrieval, tool use, agents, evaluation, and production operations...
Work at office
Immediate start
Remote work
Shift work
2 days per week
Midi Health
San Francisco, CA
3 days ago
AI Evaluation Engineer: NLP for Contracts
...Ironclad Inc. is seeking an AI Evaluation Engineer to enhance contract management through AI. Located in San Francisco, the role involves analyzing datasets, designing feedback loops, and ensuring continuous improvement of ML systems. Ideal candidates will have a quantitative...
Contract work
Flexible hours
Ironclad Inc
San Francisco, CA
1 day ago
AI Benchmarking Engineer Evaluations & Failure Analysis
...A cutting-edge AI firm in San Francisco is seeking a Research Engineer to develop evaluation systems and benchmarking pipelines for language models. Candidates should have a strong background in applied research, coding skills, and familiarity with ML models. You will...
Mercor Inc
San Francisco, CA
20 hours ago
Sr. AI GTM Engineer
$164.7k - $266k
...campaigns to autonomous, AI‑driven customer journeys. As an AI GTM Engineer on the Growth team, you’... ...AI‑powered workflows, agents, and tooling that make Marketing... ...language models (LLMs), coding assistants, or agentic... ...strategies, LLM evaluation frameworks, and common failure...
Contract work
Work at office
Local area
Remote work
2 days per week
Unavailable
San Francisco, CA
21 hours ago
Forward Deployed AI Engineer
$115k - $200k
...About the job Forward Deployed AI Engineer Forward Deployed AI... ...-backed startup applying AI agents to billions of events daily to... ...in production. Strong coding skills in Python, Java, TypeScript... ...(precision, recall, evaluation). Understanding how LLM systems...
Work at office
Visa sponsorship
Jenn Nguyen and Friends
San Francisco, CA
5 days ago
AI Evaluation Engineer Data-Driven Contract Intelligence
...Ironclad, located in San Francisco, is seeking an AI Evaluation Engineer to join their team. This role involves analyzing datasets, designing feedback loops, and partnering closely with AI Engineers to improve model quality. Applicants should have 8+ years of experience...
Contract work
Ironclad Inc
San Francisco, CA
22 hours ago
AI Engineer, Production Agents
...re looking for a founding engineer focused on building production agents—someone who will push our... ...Agents on a New AI Platform This isn’t a typical... ...they can operate on real code and real systems. Make Agents... ...Collaborate Closely with Product & Evaluation: Work with PMs and...
Flexible hours
Guild.ai, Inc.
San Francisco, CA
21 hours ago
