Research Engineer: Build Self-Improving Agent Systems
Judgment Labs Inc.
Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM). Traditional observability focuses on logging exceptions and latency; our ABM surfaces behavioral anomalies such as instruction drift and context retrieval loss in production at scale. Hundreds of teams building autonomous agents rely on Judgment to understand how their systems behave after deployment. Instead of triaging incidents reactively, they cluster patterns across conversations and workflows, correlate regressions with specific interaction types, and pinpoint where reliability breaks down in their own use cases.

We’ve raised $30M+ across two rounds in the past five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, Kevin Hartz, and others.

The Role:
We are looking for Research Engineers to build AI systems that use agent interaction data to help us understand how agents behave, evaluate them at scale, and improve them through learning and feedback. Your research will not live on a whiteboard: you'll work directly with real-world agent data, apply frontier methods in production, and see your work ship straight into the product. By making agent behavior measurable and debuggable, your systems will support teams deploying agents in finance, legal, operations, and other high-stakes workflows. You will own projects end-to-end with significant autonomy and work closely with the team to build self-improving agent systems.
What You'll Do:
- Build systems to aggregate, index, and analyze large-scale agent interaction data and extract meaningful evaluation signals
- Develop agent-based systems for analyzing and evaluating complex, long-running behaviors
- Design and implement post-training and optimization workflows to improve agent behavior
- Build internal tools and infrastructure to support rapid experimentation, analysis, and training

What We're Looking For:
You identify with at least one of the following:
- You care about data quality, evaluation, and benchmarking, and are comfortable working hands-on with messy data
- You have experience building agent systems and working with them in real-world or production settings
- You have a strong background in reinforcement learning, agents, or machine learning fundamentals
- You are comfortable working across infrastructure and systems, spanning training, data pipelines, and model serving
- You are comfortable working across teams to translate research into product, balancing real-world customer constraints and tradeoffs
- You enjoy turning ambiguous problems into clear, well-designed plans

Why Judgment?
Agents can’t work without this. Today’s agents hallucinate, drift, and break in production. We’re building the infrastructure that fixes this: the monitoring layer that makes agents self-improving.

We’re wired to win. We're a team of fewer than 20, but we ship like 50+ on the daily. You'll be working with olympiad medalists, debate champions, and competitive athletes who bring that same intensity to company building.

Fast track to founding. Our engineers interface directly with customers, ship code into their environments, and use their feedback to dictate what’s next on the roadmap. Everyone on the team is either an ex-founder or a founder-to-be.

We make sure our people do their best work. If you deserve a spot on the team, money will never get in the way of it. Full benefits, Equinox, and a private chef to take care of you.
We sprint hard, but we play hard: ask us about our Smash/Mario Kart tournaments.