Senior AI Quality Engineer — Agent Evaluation & Testing
$176k - $253k
Harper
Harper is seeking a Senior Member of Technical Staff, AI Quality, in San Francisco. Your main goal will be to turn agent quality into quantifiable metrics, ensuring high standards through robust evaluation processes. You'll build capability regression evaluation suites, design grading systems, and work directly with engineers to ensure our AI systems excel. Ideal candidates have 3–6 years of software experience, particularly in LLM and agent evaluations. Competitive compensation includes a base salary of $176,000–$253,000, with equity options and benefits like meals and a gym membership.
- Anysphere is seeking a Software Engineer for the Agent Quality team in San Francisco, CA. In this role, you... ...design and build infrastructure to evaluate and improve ML agents. Responsibilities... ...candidates will have experience in AI evaluations, data analysis, and solid...
Suggested
$96.8k - $306.4k
...Job Description The Senior Principal AI Agent / ML Software Engineer is a Senior Staff-level,... ..., memory, retrieval, evaluation, guardrails, and cloud services... ...eval suites, regression testing, experimentation, safety... ...to contribute high-quality production code, reviews...
Senior, Temporary work, Flexible hours
- OutSystems is seeking a Senior AI Quality Engineer in San Francisco to lead quality initiatives in an AI-powered environment. This role includes defining test strategies, mentoring engineers, and ensuring reliable product delivery. The ideal candidate has extensive experience...
Senior
$204k - $235k
OutSystems, Inc. is seeking a Senior AI Quality Engineer to lead quality management within their AI-integrated product ecosystem. In this role, you will define testing strategies, implement automation processes, and mentor junior engineers. The ideal candidate will have...
Senior
- Ellipsis Health is seeking a Forward Deployed QA Engineer to ensure the quality of its conversational AI product, Sage. The role requires expertise in software... ...will engage with clients to perform rigorous testing and analysis. This position is based in the San Francisco...
Senior, Remote job, Flexible hours
- OutSystems Inc. is looking for a Senior Quality Engineer in San Francisco to lead quality initiatives within our AI product ecosystem. You will define and implement comprehensive testing strategies, focusing on automation and AI metrics, ensuring the reliability and success...
Senior
- Cacheflow is seeking a Senior Applied Research Engineer to enhance the effectiveness of our AI systems through focused research and experimentation. This role involves designing information retrieval strategies and collaborating with engineers to turn validated approaches...
Senior, Flexible hours
$160k - $207k
...gets smarter as you build, with AI that learns your context to... ...Gartner in Application Security Testing and is trusted by leading... ....dev. About the role As an AI engineer, you’ll apply LLM technologies... ...powered solutions and rigorously evaluate the efficacy of different prompts...
Senior, Currently hiring, Local area, Remote work, Weekend work, 3 days per week
$200k - $290k
...is building production-ready AI agents that handle complex, real-world... ...at scale. As a Full-Stack Engineer on the Agent Engineering team... ...performance. Integrate and evaluate cutting-edge text and voice models... ..., maintainable code, strong testing practices, and thoughtful...
- black.ai is seeking a Senior Research Engineer in San Francisco to develop next-gen generative video and audio technology. This role significantly impacts... ...designing context for multi-turn sessions, building evaluation metrics, and closely collaborating with product and...
Senior
- Principal AI Engineer (LLM Agents & Orchestration) Role Title: Principal AI Engineer (LLM Agents & Orchestration) Focus... ...memory and context awareness across sessions. Evaluation & Observability: Establish a rigorous testing framework for non‑deterministic model outputs...
$194k - $239k
...Senior Agentic Ai Engineer Hover helps people design, improve, and... ...focused on building, testing, and improving production... ...of: multi-agent orchestration production AI systems evaluation and reliability engineering... ...to deliver high-quality AI experiences. Contribute...
Senior, Full time, For contractors, Work at office, Local area, Flexible hours
- ...re building an agentic AI caregiver advocate... ...coordinates across multiple sub-agents to get things done. It... ...over time. The AI engineering challenge: build an... ...-use frameworks, and evaluation infrastructure that... ...good enough. Develop testing infrastructure for multi...
Senior, Immediate start, Remote work, Flexible hours
- ...Senior AI Engineer Disney Entertainment and ESPN Product & Technology... ...You will create shared agents, initializers,... ...governance, observability, and evaluation, so teams can deliver high-quality AI solutions quickly—... ...in Python (libraries, testing, packaging) and...
Senior
$141.9k - $190.3k
...Senior Software Engineer - AI Core Engineering Disney Entertainment and... ...You will create shared agents, initializers,... ...governance, observability, and evaluation, so teams can deliver high-quality AI solutions quickly—... ...in Python (libraries, testing, packaging) and...
Senior
$240k - $280k
A leading software monitoring company is seeking a Senior Software Engineer on its AI/ML team to build evaluation infrastructure for measuring the performance of AI systems. This role involves designing datasets, creating benchmarks, and ensuring AI features behave reliably...
Senior
$170k - $210k
...Summary At the Innovation & Engineering Center California (IECC) we conduct... ...models in real vehicles, evaluate testability, and ensure... ...documentation skills. Desired Skills AI ethics: bias mitigation and... ...‑employment substance abuse testing...
Senior, Contract work, Overseas
- ...inventive research, design, and engineering. Our organization is very flat... ...As a Software Engineer on the Agent Quality team at Cursor, you’ll build the measurement, evaluation, and feedback-loop... ...Designing and building best-in‑class AI evaluation system: curated datasets...
- B Capital is seeking a highly skilled AI Platform Engineer to enhance their ML/AI platform that powers autonomous AI agents at scale. This pivotal role combines software engineering... ...agent harness infrastructure, implement evaluation frameworks, and ensure a seamless journey...
Senior
$137k - $188k
...reporting to the Forensic Engineering Manager, the Senior Compliance Engineer is a key... ...Essential Job Functions: Product Testing and Analysis Test consumer... ..., and multimedia framework evaluation. Analyze Android and iOS... ...gather intelligence. Use AI‑assisted tools to support product...
Senior, Full time, Work at office, Local area, Remote work, Worldwide
$216k - $270k
About Scale AI Scale AI is the data... ...Role Overview As a Senior Forward Deployed AI Engineer on our... ...configure AI models and agents within customer... ...sources Implement evaluation frameworks to... ...experimentation and A/B testing to improve model... ...the high‑quality data and full‑...
Senior, Full time
- Drata is seeking a Senior Applied Research Engineer to enhance the quality of AI systems through rigorous evaluation and experimentation. This role emphasizes applied research, focusing on information retrieval and reasoning strategies. The ideal candidate will bring 5+...
Senior
$130k - $160k
...Senior Quality Engineer – Design Assurance (Firmware / Electrical Systems) An innovative, well-... ...and assess quality, validation, and testing impacts. Develop and execute test... ...Experience developing test methodologies and evaluating the impact of design changes on...
Senior, Contract work
$124k - $280k
...expertise, and network to deliver quality results. You motivate and... ...through innovative, AI-driven solutions. As a Senior Manager, you will lead... ...strategy, transformation and engineering projects and teams Design... ...with team members. We evaluate these factors thoughtfully...
Senior, Full time, H1b
- ...at the intersection of AI, biology, chemistry, and large-scale engineering. Our goal is to translate... ...systems. The Role As a Senior AI/ML Engineer, you will... ...Do Design, train, and evaluate large-scale models, including... ...: clean code, testing, reproducibility, and debugging...
Senior, Remote work, Flexible hours
$155k
...About the Team The Quality Engineering team builds the shared testing infrastructure and... ...are looking for a Senior Software Engineer,... ...in implementing how AI reshapes quality engineering... ...that enable AI agents to validate the... ...Experience using or evaluating AI-powered engineering...
Senior, Contract work, Local area, Home office, Flexible hours, Shift work
- ...B consulting Industry. AI Transformation will be... ...will be delivered through agents. We built that. Our AI... ...for an AI Product Engineer who sits at the intersection... ...you prototype ideas, evaluate tradeoffs, and ship MVPs... ...prototype rapidly and test product directions with...
Senior, Work at office, Local area, Relocation package
$86.5k - $142.7k
...prototypes and builds modern, AI‑enabled applications and... ...‑of‑concept, and guiding engineering teams through complex... ...search, prompt orchestration, evaluation and guardrails. Author... ...and raise technical quality. Leverage AI coding and testing tools to accelerate development...
Senior, Summer holiday, Flexible hours
$120k - $140k
...manage User Acceptance Testing (UAT) deadlines,... ...maintaining rigorous quality standards. Ellipsis... ...Forward Deployed QA Engineer, you will occupy a critical... ...core conversational AI product, Sage, across... ...by configuring shadow agents. Prompt Evaluation & Optimization: Apply...
Senior, Full time, Remote work, Flexible hours
- Perplexity is seeking energetic engineers to join our highly driven Agents engineering team. The Agents team consists of AI/ML, backend, and full-... ...Ensure a high craft and quality bar, in both AI agent... ...reliability, code quality, AI evaluation, testing, and maintenance across...
Flexible hours


