Staff Back End Engineer, Evals - Hazel AI

$275k - $325k

Altruist

About Altruist

Altruist is transforming the multi-trillion-dollar wealth management industry by building an AI platform for wealth professionals. We partner with financial advisors nationwide, empowering them to grow, optimize time and resources, and deliver superior outcomes for their clients. We're looking for exceptional talent to help us achieve our mission of making financial advice better, more affordable, and accessible to all. If you're passionate about challenging the status quo and want to do the most important work of your life, we'd love to meet you!

But first, our values

Kindness - Kindness doesn’t just equal niceness. We listen to understand. We embrace and encourage healthy debate and diverse perspectives. We approach conflict openly, honestly, and respectfully.

Brilliance - Humility is the skill we’re most proud of, and possessing a growth mindset is always top of mind. We take ownership of everything we touch, regularly using our unique superpowers to reach a common goal as a team. We succeed and fail as one.

Grit - When challenges arise, we stay laser focused on achieving our mission and finding a way forward, even when it’s hard. We are nimble and maintain a sense of urgency, swiftly adapting to change and overcoming obstacles.

About Hazel:

Hazel.ai is building the AI engine for wealth management that unlocks 10x growth, efficiency and value for financial advisors and their clients in a regulated industry. Since its launch last September, Hazel has organically and rapidly grown its user base. Hazel is a part of Altruist’s broader mission to make financial advice better, more affordable, and accessible to all.

This role is hybrid, with four in-office days per week at our San Francisco FiDi location.

The opportunity:

Architect our evaluation platform from first principles – the observability, scoring, golden datasets, verification agents, and CI/CD integration that define standards of quality. You'll work shoulder-to-shoulder with backend engineers, product managers, and a growing bench of subject matter experts, including practicing CFPs, CPAs, and tax planners, to translate fiduciary-grade requirements into automated quality signals.

Your impact:

Design and build Hazel's evals platform end-to-end – online scoring, offline benchmarks, regression suites, LLM-as-judge pipelines, and human-in-the-loop review workflows across every Hazel surface.

Build production observability and monitoring for AI quality: hallucination rates, factual accuracy, refusal behavior, latency, cost, and domain-specific quality signals across tax planning, financial planning, investment analysis, and operational AI workflows.

Architect data curation pipelines that turn real advisor interactions into evaluation datasets – with rigorous sampling strategies, labeling protocols, dataset versioning, and the privacy and consent controls required for regulated finance.

Build and steward Hazel's golden datasets in close partnership with SMEs and a network of practicing advisors, CFPs, and tax professionals – translating their tacit expertise into precise, measurable eval criteria.

Develop LLM verification agents that catch hallucinations, computational errors, and compliance violations before they ever reach an advisor or client.

Integrate evals into our deployment pipeline so that every prompt change, model swap, harness modification, or RAG pipeline tweak runs against regression and acceptance criteria before shipping – making evals a first-class deployment gate, not a quarterly audit.

Partner with the team building Hazel's model-agnostic orchestration harness to evaluate cross-model and cross-provider performance, surface tradeoffs, and inform routing decisions across Anthropic, OpenAI, and self-hosted models.

Define quality SLOs for each Hazel surface and build alerting that catches regressions in production before our customers do – especially for high-stakes flows like tax and financial planning.

Establish Hazel's eval methodology as a defensible competitive advantage – infrastructure good enough that model upgrades from frontier labs become accelerants for us, not threats.

What you bring:

8+ years of engineering experience, with at least 2 years focused on evaluation infrastructure, model quality, fine-tuning, or ML platform work for production systems.

Deep familiarity with evaluation and scoring methodologies for modern AI systems – RAG evaluation, document processing, fine-tuned model assessment, agentic and tool-use system evaluation, LLM-as-judge frameworks, and human evaluation protocols.

Experience designing and curating golden datasets – sampling strategies, inter-rater agreement, dataset versioning, and managing the long tail of edge cases.

Comfort working across the stack – data engineering (SQL, dbt, warehouses), backend integration (APIs, async pipelines, queues), and observability tooling.

Strong communication skills. You can translate fuzzy domain requirements from advisors and SMEs into precise, measurable, automatable eval criteria – and explain quality tradeoffs clearly to engineers, product managers, and leadership.

A bias toward shipping. You believe great evals enable speed, not just safety, and you build tools that engineers actually want to use.

Bonus Points:

Prior experience at an applied AI company building evals, model quality, or applied research infrastructure.

Experience evaluating multi-step agentic workflows, tool-use systems, or RAG pipelines in production.

Familiarity with frameworks like Braintrust, Langfuse, or similar, including a clear point of view on when to use which.

Background in regulated industries (financial services, healthcare, legal) where accuracy, auditability, and the cost of a wrong answer are unusually high.

Experience building human-in-the-loop labeling workflows, annotation tooling, or red-teaming programs.

Domain knowledge of wealth management, tax planning, or financial planning, or genuine excitement to learn it deeply alongside our SME bench.

San Francisco, CA salary range: $275,000 - $325,000 USD

What we bring

Attracting and retaining top-tier talent is a priority. We are proud of the culture we’ve built and are cognizant of the ever-changing professional landscape. Our dynamic offering of perks and benefits is tailored for you to feel your best while doing your best.

A hybrid work schedule for most positions to promote strong, in-person collaboration.

Stunning, amenity-filled office spaces in Culver City, CA, San Francisco, CA, and Dallas, TX. Our offices are intentionally designed for comfort, collaboration, and productivity.

Competitive pay and equity for eligible positions.

Premium healthcare, dental, and vision insurance plans (HMO and PPO).

401k savings plan with a 4% match and immediate vesting.

16-week paid parental leave after one year of employment.

Professional growth and development opportunities, including an employee mobility program and an annual L&D budget allocation for each employee.

Company perks program (includes discounts on pet insurance, fitness, cell phone plans, travel, etc.).

Financial guidance program (includes counseling on navigating debt, tracking personal spend, saving and planning goals, home-purchasing preparedness, etc.).

One month work from anywhere policy (with the exception of a few countries).

Total compensation includes a competitive benefits package, along with equity in the form of Stock Options (ISOs) for eligible roles. For salaried positions, a salary offer will be determined by a number of factors including experience, skill level, internal pay equity, geographic location, and other relevant business considerations. We review all employee pay and compensation programs regularly to ensure fair, equitable, and competitive pay.

At Altruist, we are committed to providing fair, equitable, and competitive compensation by leveraging market data to inform our pay bands. Base salaries will be reviewed at regular intervals throughout the year, typically in conjunction with performance review cycles. By evaluating compensation on a regular basis, we are able to reward high performance and ensure all employees have opportunities for growth.

Don’t meet every single requirement?

Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. At Altruist we are dedicated to building a diverse, inclusive, and authentic workplace, so if you’re excited about this role, but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.


Vacancy posted 3 days ago

