Senior AI Agent Engineer - Open Models & Evaluation Systems
Sail Research
Sail is the foundation of useful, agentic AI. We are here to take a big swing at the most ambitious engineering challenge of our careers. Everyone working at Sail will become an expert; nothing less will do in our immensely competitive market.

Inference is just one piece of an effective background agent. Let's design and build the rest of the system that turns billions of tokens into the best possible answers.

What you’ll do
- Design custom evals for multi-turn, massively parallel agents.
- Build agent harnesses to improve open model (DeepSeek, Qwen, Llama) performance. Claude Code is all about agent/harness codesign; let's do the same for open source!
- Automate prompt optimization techniques like DSPy.

What we’re looking for
- Experience building AI agents.
- Familiarity with open source models.

Interview process
- Meet the CEO. This is the first step because we respect your time. Ask any question and get a definitive answer immediately.
- Meet the CTO, who will ask about your experience and share as much technical detail about Sail as you want to hear.
- Come in to Sail's SF office for an interview day. Meet the whole team, then you'll have 3-4 hours to work on a problem that closely simulates the work we do daily. It's an objectively scored task, so you'll have immediate feedback on how well your code is working - just like we do in production! AI assistance is highly encouraged, and we'll provide a laptop with all the best tools set up. Finish with a short presentation describing your process, learnings, and results.
- Offer. Once the team decides we want to work with you, we make a strong offer quickly and will be quite persistent over email/text/calls :)

Life at Sail
- We work out of a beautiful, sunny office in downtown San Francisco.
- All meals are on us (and actually great; SF is a food paradise and it would be a shame to eat only bowl slop).
- Everyone gets a Studio Display at their desk. We are serious about investing in anything that saves us time or energy.
- There are six different ways to make coffee or tea in the office.
- A friendly (hypoallergenic) black cat named Coco visits occasionally.

