
Staff Software Engineer - AI Agent Evaluations

$217.57k - $271k
Full-time

ID.me

COMPANY OVERVIEW

ID.me is the next-generation digital identity wallet that simplifies how individuals securely prove their identity online. Consumers can verify their identity with ID.me once and seamlessly log in across websites without having to create a new login and verify their identity again. Over 152 million users experience streamlined login and identity verification with ID.me at 20 federal agencies, 45 state government agencies, and 70+ healthcare organizations. More than 600 consumer brands use ID.me to verify communities and user segments to honor service and build more authentic relationships. ID.me’s technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST 800-63-3 IAL2 / AAL2 credential service provider by the Kantara Initiative. ID.me is committed to “No Identity Left Behind” to enable all people to have a secure digital identity.

ID.me is a full-time, in-office culture. Unless a specific job description explicitly states otherwise, all roles are on-site five days per week at one of our offices in McLean, VA; Mountain View, CA; New York City, NY; or Tampa, FL. Certain roles, such as field-based sales or other remote-by-design positions, may have different work arrangements as noted in their individual postings.

At ID.me, we embrace the thoughtful use of AI tools in our daily work, and there are even occasions where we leverage AI in our hiring process. However, during the interview process, we want to understand your individual skills and experiences. Therefore, we have guidelines on how AI can be appropriately used during your application and interviews.

ABOUT THE ROLE

This Staff Engineer role sits at the intersection of engineering, applied AI, testing, and developer experience. You will define and lead the discipline of testing AI agents, evaluating LLM behavior, and ensuring the reliability of agentic systems operating in production. It requires deep engineering rigor, original thinking about what "correctness" means for non-deterministic systems, and the ability to build eval infrastructure and developer tooling that the entire engineering org depends on. The role calls for expertise in building and maintaining Retrieval-Augmented Generation (RAG) pipelines, with a deep focus on strategic data chunking and data quality enforcement, and for experience establishing pre-retrieval data quality gates that optimize vector search accuracy, minimize retrieval-induced noise, and significantly reduce LLM hallucination rates in production-deployed agent systems. You will establish quality standards for how ID.me ships AI-powered features safely, mentor engineers across teams on AI testing best practices, and partner directly with product and platform teams to embed quality into every stage of agent development.

WHAT YOU'LL DO

  • Define AI Quality Standards: Own the framework for how ID.me evaluates, validates, and monitors AI agents, from prompt-based features to fully autonomous multi-step workflows.
  • Build Eval Infrastructure: Design and maintain evaluation pipelines for LLM outputs, agent behavior, tool use, and multi-turn interactions across development, staging, and production environments.
  • Production Observability for Agents: Instrument agentic systems for behavioral drift, regression, and failure modes that traditional metrics miss: latency, correctness, hallucination rate, tool misuse, and policy adherence.
  • Agentic Test Strategy: Lead the design of test suites that handle non-determinism: red-teaming agents, golden dataset construction, LLM-as-judge pipelines, and property-based testing for AI outputs.
  • Champion Developer Experience: Build the internal tooling, feedback loops, and testing workflows that make it fast and safe for engineers to develop and ship AI features with confidence. Reduce friction in the agent development inner loop: local testing, fast eval runs, and clear signal on regressions.
  • Drive AI-First Engineering Culture: Raise the quality bar across the engineering org by establishing patterns, tooling, and education for how teams write, test, and deploy AI features responsibly.
  • Cross-Team Collaboration: Partner with Security, Platform, Product, and AI/ML teams to embed quality gates into agent development workflows.
  • Mentorship: Guide senior and mid-level engineers through evaluation design, observability strategy, and testing approaches specific to AI systems.

BASIC QUALIFICATIONS

  • Bachelor's degree in Computer Science, Engineering, or equivalent experience
  • 8+ years building and operating production software systems
  • Demonstrated experience evaluating or testing LLM-powered features or autonomous agents in production
  • Proficiency with AI-assisted development tools (Claude Code, Cursor, or equivalent); you build with AI every day
  • Strong backend engineering fundamentals in Python, Java, Go, or equivalent
  • Experience designing test infrastructure, CI/CD quality gates, or evaluation pipelines at scale
  • Experience improving developer experience: building internal tooling, reducing toil, or accelerating engineering workflows
  • Proven ability to lead cross-team technical initiatives and influence engineering standards
  • Strong written and verbal communication across engineering, product, and leadership
  • Experience building eval frameworks for LLM agents (e.g., correctness graders, LLM-as-judge, human-in-the-loop evals, benchmark dataset curation)
  • Familiarity with agentic frameworks (Claude API / Anthropic SDK, BrainTrust, LangChain, LangGraph, CrewAI, or similar)
  • Production monitoring experience for AI systems: behavioral drift detection, output sampling, shadow scoring
  • Red-teaming or adversarial testing experience for AI models or agents

PREFERRED QUALIFICATIONS

  • Background in identity verification, fraud detection, or regulated industries
  • Familiarity with Anthropic's model evaluation methodology or similar published eval research
  • Experience with observability tooling (Datadog, OpenTelemetry) applied to AI workloads
  • Track record of building developer tooling or platforms that other teams adopt widely

The annual base salary listed does not include a company bonus, incentive for sales roles, equity and benefits which will be determined based on experience, skills, education, relevant training, geographic location and role. ID.me offers comprehensive medical, dental, vision, health savings account, flexible spending accounts (medical, limited purpose, dependent care, commuter benefit accounts), basic and voluntary life and AD&D insurance, 401(k) with company match, parental leave, ability to participate in unlimited paid time off subject to the terms and conditions of the PTO policy, including 8 company wide holidays, short and long-term disability insurance, accident and critical illness insurance, referral bonus policy, employee assistance program, pet insurance, travel assistant program, wellbeing and childcare discounts, benefit advocates, and a learning and development benefit. Final offers may vary from the amount listed based on qualifications, professional experiences, skills, education, relevant training, geographic location, and other job related factors.

Mountain View, CA Pay Range

$217,565—$271,000 USD

ID.me maintains a work environment free from discrimination, where employees are treated with dignity and respect. All ID.me employees share in the responsibility for fulfilling our commitment to equal employment opportunity. ID.me does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. ID.me adheres to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline.

In addition, ID.me's policy is to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works. Upon request we will provide you with more information about such accommodations.

Please review our Privacy Policy, including our CCPA policy, at id.me/privacy. If you provide ID.me with any personally identifiable information you confirm that you have read and agree to be bound by the terms and conditions set out in our Privacy Policy. ID.me participates in E-Verify.

Vacancy posted 3 days ago