Staff Software Engineer - AI Agent Evaluations

$217.57k - $271k

Full-time

ID.me

COMPANY OVERVIEW

ID.me is the next-generation digital identity wallet that simplifies how individuals securely prove their identity online. Consumers can verify their identity with ID.me once and seamlessly log in across websites without having to create a new login and verify their identity again. Over 152 million users experience streamlined login and identity verification with ID.me at 20 federal agencies, 45 state government agencies, and 70+ healthcare organizations. More than 600 consumer brands use ID.me to verify communities and user segments to honor service and build more authentic relationships. ID.me’s technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST 800-63-3 IAL2 / AAL2 credential service provider by the Kantara Initiative. ID.me is committed to “No Identity Left Behind” to enable all people to have a secure digital identity.

ID.me has a full-time, in-office culture. Unless a specific job description explicitly states otherwise, all roles are on-site five days per week at one of our offices in McLean, VA; Mountain View, CA; New York City, NY; or Tampa, FL. Certain roles, such as field-based sales or other remote-by-design positions, may have different work arrangements as noted in their individual postings.

At ID.me, we embrace the thoughtful use of AI tools in our daily work, and there are even occasions where we leverage AI in our hiring process. However, during the interview process, we want to understand your individual skills and experiences. Therefore, we have guidelines on how AI can be appropriately used during your application and interviews.

ABOUT THE ROLE

This Staff Engineer role sits at the intersection of engineering, applied AI, testing, and developer experience. You will define and lead the discipline of testing AI agents, evaluating LLM behavior, and ensuring the reliability of agentic systems operating in production. It requires deep engineering rigor, original thinking about what "correctness" means for non-deterministic systems, and the ability to build eval infrastructure and developer tooling that the entire engineering org depends on.

The role calls for expertise in building and maintaining Retrieval-Augmented Generation (RAG) pipelines, with a deep focus on strategic data chunking and data quality enforcement. It also calls for experience establishing pre-retrieval data quality gates to optimize vector search accuracy, minimize retrieval-induced noise, and significantly reduce LLM hallucination rates in production-deployed agent systems.

You will establish quality standards for how ID.me ships AI-powered features safely, mentor engineers across teams on AI testing best practices, and partner directly with product and platform teams to embed quality into every stage of agent development.

WHAT YOU'LL DO

* Define AI Quality Standards: Own the framework for how ID.me evaluates, validates, and monitors AI agents, from prompt-based features to fully autonomous multi-step workflows.
* Build Eval Infrastructure: Design and maintain evaluation pipelines for LLM outputs, agent behavior, tool use, and multi-turn interactions across development, staging, and production environments.
* Production Observability for Agents: Instrument agentic systems for behavioral drift, regression, and failure modes that traditional metrics miss: latency, correctness, hallucination rate, tool misuse, and policy adherence.
* Agentic Test Strategy: Lead the design of test suites that handle non-determinism: red-teaming agents, golden dataset construction, LLM-as-judge pipelines, and property-based testing for AI outputs.
* Champion Developer Experience: Build the internal tooling, feedback loops, and testing workflows that make it fast and safe for engineers to develop and ship AI features with confidence. Reduce friction in the agent development inner loop: local testing, fast eval runs, and clear signal on regressions.
* Drive AI-First Engineering Culture: Raise the quality bar across the engineering org by establishing patterns, tooling, and education for how teams write, test, and deploy AI features responsibly.
* Cross-Team Collaboration: Partner with Security, Platform, Product, and AI/ML teams to embed quality gates into agent development workflows.
* Mentorship: Guide senior and mid-level engineers through evaluation design, observability strategy, and testing approaches specific to AI systems.

BASIC QUALIFICATIONS

* Bachelor's degree in Computer Science, Engineering, or equivalent experience
* 8+ years building and operating production software systems
* Demonstrated experience evaluating or testing LLM-powered features or autonomous agents in production
* Proficiency with AI-assisted development tools (Claude Code, Cursor, or equivalent); you build with AI every day
* Strong backend engineering fundamentals in Python, Java, Go, or equivalent
* Experience designing test infrastructure, CI/CD quality gates, or evaluation pipelines at scale
* Experience improving developer experience: building internal tooling, reducing toil, or accelerating engineering workflows
* Proven ability to lead cross-team technical initiatives and influence engineering standards
* Strong written and verbal communication across engineering, product, and leadership
* Experience building eval frameworks for LLM agents (e.g., correctness graders, LLM-as-judge, human-in-the-loop evals, benchmark dataset curation)
* Familiarity with agentic frameworks (Claude API / Anthropic SDK, BrainTrust, LangChain, LangGraph, CrewAI, or similar)
* Production monitoring experience for AI systems: behavioral drift detection, output sampling, shadow scoring
* Red-teaming or adversarial testing experience for AI models or agents

PREFERRED QUALIFICATIONS

* Background in identity verification, fraud detection, or regulated industries
* Familiarity with Anthropic's model evaluation methodology or similar published eval research
* Experience with observability tooling (Datadog, OpenTelemetry) applied to AI workloads
* Track record of building developer tooling or platforms that other teams adopt widely

The annual base salary listed does not include a company bonus, incentive for sales roles, equity, and benefits, which will be determined based on experience, skills, education, relevant training, geographic location, and role.

ID.me offers comprehensive medical, dental, vision, health savings account, flexible spending accounts (medical, limited purpose, dependent care, commuter benefit accounts), basic and voluntary life and AD&D insurance, 401(k) with company match, parental leave, ability to participate in unlimited paid time off subject to the terms and conditions of the PTO policy, including 8 company-wide holidays, short and long-term disability insurance, accident and critical illness insurance, referral bonus policy, employee assistance program, pet insurance, travel assistant program, wellbeing and childcare discounts, benefit advocates, and a learning and development benefit.

Final offers may vary from the amount listed based on qualifications, professional experiences, skills, education, relevant training, geographic location, and other job-related factors.

Mountain View, CA Pay Range

$217,565—$271,000 USD

ID.me maintains a work environment free from discrimination, where employees are treated with dignity and respect. All ID.me employees share in the responsibility for fulfilling our commitment to equal employment opportunity. ID.me does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. ID.me adheres to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline.

In addition, ID.me's policy is to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works. Upon request we will provide you with more information about such accommodations.

Please review our Privacy Policy, including our CCPA policy, at id.me/privacy. If you provide ID.me with any personally identifiable information, you confirm that you have read and agree to be bound by the terms and conditions set out in our Privacy Policy.

ID.me participates in E-Verify.

Vacancy posted 3 days ago

