Staff Back End Engineer, Evals - Hazel AI
$275k - $325k
Altruist
About Altruist
Altruist is transforming the multi-trillion-dollar wealth management industry by building an AI platform for wealth professionals. We partner with financial advisors nationwide, empowering them to grow, optimize time and resources, and deliver superior outcomes for their clients.
We're looking for exceptional talent to help us achieve our mission of making financial advice better, more affordable, and accessible to all. If you're passionate about challenging the status quo and want to do the most important work of your life, we'd love to meet you!
But first, our values
Kindness - Kindness doesn’t just equal niceness. We listen to understand. We embrace and encourage healthy debate and diverse perspectives. We approach conflict openly, honestly, and respectfully.
Brilliance - Humility is the skill we’re most proud of, and possessing a growth mindset is always top of mind. We take ownership in everything we touch, regularly using our unique superpowers to reach a common goal as a team. We succeed and fail as one.
Grit - When challenges arise, we stay laser-focused on achieving our mission and finding a way forward, even when it’s hard. We are nimble and maintain a sense of urgency, swiftly adapting to change and overcoming obstacles.
About Hazel:
Hazel.ai is building the AI engine for wealth management that unlocks 10x growth, efficiency and value for financial advisors and their clients in a regulated industry. Since its launch last September, Hazel has organically and rapidly grown its user base.
Hazel is a part of Altruist’s broader mission to make financial advice better, more affordable, and accessible to all.
This role is hybrid, with four in-office days per week at our San Francisco FiDi location.
The opportunity:
Architect our evaluation platform from first principles – the observability, scoring, golden datasets, verification agents, and CI/CD integration that define standards of quality. You'll work shoulder-to-shoulder with backend engineers, product managers, and a growing bench of subject matter experts, including practicing CFPs, CPAs, and tax planners, to translate fiduciary-grade requirements into automated quality signals.
Your impact:
Design and build Hazel's evals platform end-to-end – online scoring, offline benchmarks, regression suites, LLM-as-judge pipelines, and human-in-the-loop review workflows across every Hazel surface.
Build production observability and monitoring for AI quality: hallucination rates, factual accuracy, refusal behavior, latency, cost, and domain-specific quality signals across tax planning, financial planning, investment analysis, and operational AI workflows.
Architect data curation pipelines that turn real advisor interactions into evaluation datasets – with rigorous sampling strategies, labeling protocols, dataset versioning, and the privacy and consent controls required for regulated finance.
Build and steward Hazel's golden datasets in close partnership with SMEs and a network of practicing advisors, CFPs, and tax professionals – translating their tacit expertise into precise, measurable eval criteria.
Develop LLM verification agents that catch hallucinations, computational errors, and compliance violations before they ever reach an advisor or client.
Integrate evals into our deployment pipeline so that every prompt change, model swap, harness modification, or RAG pipeline tweak runs against regression and acceptance criteria before shipping – making evals a first-class deployment gate, not a quarterly audit.
Partner with the team building Hazel's model-agnostic orchestration harness to evaluate cross-model and cross-provider performance, surface tradeoffs, and inform routing decisions across Anthropic, OpenAI, and self-hosted models.
Define quality SLOs for each Hazel surface and build alerting that catches regressions in production before our customers do – especially for high-stakes flows like tax and financial planning.
Establish Hazel's eval methodology as a defensible competitive advantage – infrastructure good enough that model upgrades from frontier labs become accelerants for us, not threats.
What you bring:
8+ years of engineering experience, with at least 2 years focused on evaluation infrastructure, model quality, fine-tuning, or ML platform work for production systems.
Deep familiarity with evaluation and scoring methodologies for modern AI systems – RAG evaluation, document processing, fine-tuned model assessment, agentic and tool-use system evaluation, LLM-as-judge frameworks, and human evaluation protocols.
Experience designing and curating golden datasets – sampling strategies, inter-rater agreement, dataset versioning, and managing the long tail of edge cases.
Comfort working across the stack – data engineering (SQL, dbt, warehouses), backend integration (APIs, async pipelines, queues), and observability tooling.
Strong communication skills. You can translate fuzzy domain requirements from advisors and SMEs into precise, measurable, automatable eval criteria – and explain quality tradeoffs clearly to engineers, product managers, and leadership.
A bias toward shipping. You believe great evals enable speed, not just safety, and you build tools that engineers actually want to use.
Bonus Points:
Prior experience at an applied AI company building evals, model quality, or applied research infrastructure.
Experience evaluating multi-step agentic workflows, tool-use systems, or RAG pipelines in production.
Familiarity with frameworks like Braintrust, Langfuse or similar — including a clear point of view on when to use which.
Background in regulated industries (financial services, healthcare, legal) where accuracy, auditability, and the cost of a wrong answer are unusually high.
Experience building human-in-the-loop labeling workflows, annotation tooling, or red-teaming programs.
Domain knowledge of wealth management, tax planning, or financial planning — or genuine excitement to learn it deeply alongside our SME bench.
San Francisco, CA salary range
$275,000—$325,000 USD
What we bring
Attracting and retaining top-tier talent is a priority. We are proud of the culture we’ve built and are cognizant of the ever-changing professional landscape. Our dynamic offering of perks and benefits is tailored to help you feel your best while doing your best.
A hybrid work schedule for most positions to promote strong, in-person collaboration.
Stunning, amenity-filled office spaces in Culver City, CA, San Francisco, CA, and Dallas, TX. Our offices are intentionally designed for comfort, collaboration, and productivity.
Competitive pay and equity for eligible positions.
Premium healthcare, dental, and vision insurance plans (HMO and PPO).
401k savings plan with a 4% match and immediate vesting.
16 weeks of paid parental leave after one year of employment.
Professional growth and development opportunities including an employee mobility program and an annual L&D budget allocation for each employee.
Company perks program (includes discounts on pet insurance, fitness, cell phone plans, travel, etc.).
Financial guidance program (includes counseling on navigating debt, tracking personal spend, saving and planning goals, home-purchasing preparedness, etc.).
One month work from anywhere policy (with the exception of a few countries).
Total compensation includes a competitive benefits package, along with equity in the form of Stock Options (ISOs) for eligible roles. For salaried positions, a salary offer will be determined by a number of factors including experience, skill level, internal pay equity, geographic location, and other relevant business considerations. We review all employee pay and compensation programs regularly to ensure fair, equitable, and competitive pay. At Altruist, we are committed to providing fair, equitable, and competitive compensation by leveraging market data to inform our pay bands. Base salaries will be reviewed at regular intervals throughout the year, typically in conjunction with performance review cycles. By evaluating compensation on a regular basis, we are able to reward high performance and ensure all employees have opportunities for growth.
Don’t meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. At Altruist we are dedicated to building a diverse, inclusive, and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.
Vacancy posted 3 days ago



