AI Evaluation Engineer for Coding Agents
Repovive, Inc.
$ curl repovive.com/jobs/69ed18d7682d4cf1d9e87166

Compensation: Not listed
Posted: April 25, 2026
Required Skills: AI evaluation, data pipelines, agent instrumentation
Requirements: Mid/Senior
Visa Sponsorship: Not mentioned
Relocation: Not mentioned

About the Role
Build evaluation and quality systems for Cursor's coding agents.

Interested in this role? Apply directly on Cursor's website.

© 2025 Repovive, Inc. All rights reserved.

