Staff Software Engineer- AI Agent Evaluations

$217.57k - $271k
Full-time

ID.me

Company Overview

ID.me is the next-generation digital identity wallet that simplifies how individuals securely prove their identity online. Consumers can verify their identity with ID.me once and seamlessly log in across websites without having to create a new login and verify their identity again. Over 152 million users experience streamlined login and identity verification with ID.me at 20 federal agencies, 45 state government agencies, and 70+ healthcare organizations. More than 600 consumer brands use ID.me to verify communities and user segments to honor service and build more authentic relationships. ID.me’s technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST 800-63-3 IAL2 / AAL2 credential service provider by the Kantara Initiative. ID.me is committed to “No Identity Left Behind” to enable all people to have a secure digital identity. To learn more, visit

About the Role

This Staff Engineer role sits at the intersection of engineering, applied AI, testing, and developer experience. You will define and lead the discipline of testing AI agents, evaluating LLM behavior, and ensuring the reliability of agentic systems operating in production. It requires deep engineering rigor, original thinking about what "correctness" means for non-deterministic systems, and the ability to build eval infrastructure and developer tooling that the entire engineering org depends on. The role calls for expertise in building and maintaining Retrieval-Augmented Generation (RAG) pipelines, with a deep focus on strategic data chunking and data quality enforcement, and for experience establishing pre-retrieval data quality gates to optimize vector search accuracy, minimize retrieval-induced noise, and significantly reduce LLM hallucination rates in production-deployed agent systems.
You will establish quality standards for how ID.me ships AI-powered features safely, mentor engineers across teams on AI testing best practices, and partner directly with product and platform teams to embed quality into every stage of agent development.

What You'll Do

  • Define AI Quality Standards: Own the framework for how ID.me evaluates, validates, and monitors AI agents, from prompt-based features to fully autonomous multi-step workflows.
  • Build Eval Infrastructure: Design and maintain evaluation pipelines for LLM outputs, agent behavior, tool use, and multi-turn interactions across development, staging, and production environments.
  • Production Observability for Agents: Instrument agentic systems for behavioral drift, regression, and failure modes that traditional metrics miss, covering latency, correctness, hallucination rate, tool misuse, and policy adherence.
  • Agentic Test Strategy: Lead the design of test suites that handle non-determinism: red-teaming agents, golden dataset construction, LLM-as-judge pipelines, and property-based testing for AI outputs.
  • Champion Developer Experience: Build the internal tooling, feedback loops, and testing workflows that make it fast and safe for engineers to develop and ship AI features with confidence. Reduce friction in the agent development inner loop: local testing, fast eval runs, and clear signal on regressions.
  • Drive AI-First Engineering Culture: Raise the quality bar across the engineering org by establishing patterns, tooling, and education for how teams write, test, and deploy AI features responsibly.
  • Cross-Team Collaboration: Partner with Security, Platform, Product, and AI/ML teams to embed quality gates into agent development workflows.
  • Mentorship: Guide senior and mid-level engineers through evaluation design, observability strategy, and testing approaches specific to AI systems.
Basic Qualifications

  • Bachelor's degree in Computer Science, Engineering, or equivalent experience
  • 8+ years building and operating production software systems
  • Demonstrated experience evaluating or testing LLM-powered features or autonomous agents in production
  • Proficiency with AI-assisted development tools (Claude Code, Cursor, or equivalent); you build with AI every day
  • Strong backend engineering fundamentals in Python, Java, Go, or equivalent
  • Experience designing test infrastructure, CI/CD quality gates, or evaluation pipelines at scale
  • Experience improving developer experience: building internal tooling, reducing toil, or accelerating engineering workflows
  • Proven ability to lead cross-team technical initiatives and influence engineering standards
  • Strong written and verbal communication across engineering, product, and leadership
  • Experience building eval frameworks for LLM agents (e.g., correctness graders, LLM-as-judge, human-in-the-loop evals, benchmark dataset curation)
  • Familiarity with agentic frameworks (Claude API / Anthropic SDK, BrainTrust, LangChain, LangGraph, CrewAI, or similar)
  • Production monitoring experience for AI systems: behavioral drift detection, output sampling, shadow scoring
  • Red-teaming or adversarial testing experience for AI models or agents

Preferred Qualifications

  • Background in identity verification, fraud detection, or regulated industries
  • Familiarity with Anthropic's model evaluation methodology or similar published eval research
  • Experience with observability tooling (Datadog, OpenTelemetry) applied to AI workloads
  • Track record of building developer tooling or platforms that other teams adopt widely

The annual base salary listed does not include a company bonus, incentive for sales roles, equity, and benefits, which will be determined based on experience, skills, education, relevant training, geographic location, and role.
ID.me offers comprehensive medical, dental, vision, health savings account, flexible spending accounts (medical, limited purpose, dependent care, commuter benefit accounts), basic and voluntary life and AD&D insurance, 401(k) with company match, parental leave, ability to participate in unlimited paid time off subject to the terms and conditions of the PTO policy, including 8 company-wide holidays, short and long-term disability insurance, accident and critical illness insurance, referral bonus policy, employee assistance program, pet insurance, travel assistant program, wellbeing and childcare discounts, benefit advocates, and a learning and development benefit. The above represents the anticipated total rewards package for this job requisition. Final offers may vary from the amount listed based on qualifications, professional experiences, skills, education, relevant training, geographic location, and other job-related factors.

Mountain View, CA Pay Range

$217,565 - $271,000 USD

ID.me is a full-time, in-office culture. Unless a specific job description explicitly states otherwise, all roles are on-site five days per week at one of our offices in McLean, VA; Mountain View, CA; New York City, NY; or Tampa, FL. Certain roles, such as field-based sales or other remote-by-design positions, may have different work arrangements as noted in their individual postings.

ID.me maintains a work environment free from discrimination, where employees are treated with dignity and respect. All ID.me employees share in the responsibility for fulfilling our commitment to equal employment opportunity. ID.me does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. ID.me adheres to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline.

In addition, ID.me's policy is to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works. Upon request we will provide you with more information about such accommodations.

Please review our Privacy Policy, including our CCPA policy, at id.me/privacy. If you provide ID.me with any personally identifiable information you confirm that you have read and agree to be bound by the terms and conditions set out in our Privacy Policy.

ID.me participates in E-Verify.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Staff Software Engineer- AI Agent Evaluations in Mountain View, CA vacancy
  • $238k - $302k

     ...Staff Software Engineer, Simulator Evaluation Waymo is an autonomous driving technology company with the mission...  ...dynamics, and state-of-the-art Generative AI to create a training ground for the...  .../or classical simulation systems (Agent-based modeling, heuristics).... 
    Suggested
    Full time
    Remote work
    Shift work

    Waymo

    Mountain View, CA
    3 days ago
  • $262k - $365k

    Senior Staff Software Engineer, ML Infrastructure, Agents Infrastructure Google Sunnyvale, CA, USA Qualifications Bachelor...  ...(e.g., model deployment, model evaluation, data processing, debugging, fine...  ...software solutions. Applied AI builds conversational agents deployed... 
    Suggested
    Full time

    Google Inc.

    Sunnyvale, CA
    1 day ago
  • $262k - $365k

    Senior Staff Software Engineer, Infrastructure, Agents Infra Advanced Experience owning outcomes and decision making...  ...infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine...  ...push technology forward. Applied AI builds conversational agents... 
    Suggested
    Full time

    Google Inc.

    Sunnyvale, CA
    4 days ago
  • $262k - $365k

     ...firm is seeking a Senior Staff Research Engineer to work on cutting-edge AI projects. Responsibilities include developing evaluation frameworks, optimizing...  ...data usage, and analyzing agent performance. Candidates...  ...have at least 8 years of software development experience,... 
    Suggested
    Full time

    Google Inc.

    Mountain View, CA
    4 days ago
  • $123.2k - $189.1k

     ...Overview As a member of the core AV software reliability team , you will be responsible...  ...by turning failures into actionable engineering insights at scale. This is a software...  ...intelligent triage, deep software debugging, and AI-assisted failure analysis across... 
    Suggested
    Local area
    Work from home
    Flexible hours

    General Motors

    Sunnyvale, CA
    20 hours ago
  • $204k - $259k

     ...of-the-art Generative AI to create a training ground...  ...Driver. The Simulator Evaluation team faces the...  ...looking for a Senior Software Engineer to build the metrics and...  ...will report to a Senior Staff Software Engineering Manager...  ...or experience with agent-based modeling and... 
    Full time
    Remote work

    Waymo

    Mountain View, CA
    19 hours ago
  • $170k - $216k

     ...powers the Waymo Driver. Our software allows the Waymo Driver to perceive...  ...sensors, enabling software engineers like you to develop multi-...  ...using an automated system Evaluate new hardware specifications...  ...of experience in industrial AI applications involving the creation... 
    Full time
    Remote work

    Waymo

    Mountain View, CA
    1 day ago
  •  ...Overview Come join Intuit as a Senior Staff Software Engineer and help us power prosperity around...  ...of ownership. Knowledge of building AI native applications Guides the...  ...limitations of AI technologies. Understands evaluation tools to validate and measure the... 
    Temporary work
    Work experience placement
    Relocation package

    Intuit

    Mountain View, CA
    19 hours ago
  • $238k - $302k

     ...An autonomous driving technology company in San Francisco is seeking a data-minded software engineer to improve the evaluation of its onboard software. The ideal candidate will have 5+ years of experience in coding and statistical analysis, with a BS/MS in a quantitative... 
    Full time

    Waymo

    Mountain View, CA
    2 days ago
  •  ...AI Native Staff Software Engineer At Nubank, AI is not a bolt-on feature — it is existential to our next...  ...AI development, promoting reusable agents, commands and context structures that...  ...safely build on. Establishing evaluation and monitoring practices for AI outputs... 
    Work at office
    Flexible hours

    Nubank

    Palo Alto, CA
    20 hours ago
  •  ...Staff Software Engineer - Machine Learning - Calibration Pittsburgh, PA, Palo Alto, CA, Detroit, MI Latitude AI develops automated driving technologies, including L3, for Ford vehicles...  ...on the state of the art and to evaluate the robustness and quality of solutions... 
    Work at office
    Immediate start

    Latitude AI

    Palo Alto, CA
    4 days ago
  • $180k - $260k

     ...solution that integrates advanced software and hardware powering the...  ...We are looking for talented Staff Engineers with expertise in classical and...  ...Train perception models, evaluate their performance, investigate...  ...profitable AVs Tech Brew: Gatik AI exec unpacks the regulations... 
    Odd job
    Work at office

    Gatik AI

    Mountain View, CA
    2 days ago
  • $229k - $343k

     ...glasses, Spectacles. Snap Engineering teams build fun and...  ...forefront. We're looking for a Staff Software Engineer to join the...  ...implementation and launch Evaluate technical tradeoffs of every...  ...guarantee code quality Utilize AI tools and high velocity... 
    Work experience placement
    Live in
    Work at office
    Local area

    Snap

    Palo Alto, CA
    20 hours ago
  • $218.8k - $335.3k

     ...intuitive design, intelligent software, and next-generation safety...  ...Controller team within Embodied AI. We formulate and solve...  ...technical reviews and drive software engineering best practices across the...  ...participate in a company vehicle evaluation program, through which you... 
    Work experience placement
    Local area
    Remote work
    Work from home
    Relocation package
    Flexible hours

    General Motors

    Mountain View, CA
    1 day ago
  • $180k - $260k

     ...solution that integrates advanced software and hardware powering the...  ...We are seeking senior or staff software engineers to join our planning team...  ...for trajectory generation, evaluation, and deployment in real-...  ...profitable AVs Tech Brew: Gatik AI exec unpacks the... 
    Odd job
    Work at office

    Gatik AI

    Mountain View, CA
    2 days ago
  • $251k - $310k

     ...Staff Software Engineer, Quantitative Evaluation Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver... 
    Full time
    Remote work

    Waymo

    Mountain View, CA
    2 days ago
  • $232k

     ...Staff Software Engineer – AV Labs Uber San Francisco, CA, US / Sunnyvale, CA, US About the...  ...will be at the forefront of Physical AI, building advanced autonomy algorithms...  ...management of upstream sensor dependencies and evaluation metrics for the stack. Technical... 
    Full time
    Work experience placement
    Work at office
    Remote work

    Softbank Investment Advisers

    Sunnyvale, CA
    20 hours ago
  • $161.71k - $234.33k

     ...Staff Software Engineer, Test Mountain View, CA We are CARIAD, an automotive software development team with the Volkswagen Group...  ...pipelines for efficiency, scalability, and reliability. Evaluate and implement AI-assisted testing techniques where applicable within... 
    Permanent employment
    Temporary work
    Early shift

    CARIAD, Inc.

    Mountain View, CA
    11 days ago
  • $220k - $250k

     ...Staff Software Engineer Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and...  ...documents. You will evaluate tools and frameworks, carefully... 
    Temporary work

    Crusoe

    Sunnyvale, CA
    4 days ago
  • $208.73k - $253k

     ...vertically integrated AI infrastructure company...  ...the future of Crusoe's engineering muscle. We're responsible...  ...testing and building AI agent platforms. You will...  ...state of Crusoe's evolving software development efforts and...  ...: Expert ability to evaluate technical tradeoffs and... 
    Full time
    Temporary work

    Crusoe

    Sunnyvale, CA
    2 days ago
  •  ...that can meet the demands that come with AI and Machine Learning requirements. What was...  ...Design, develop, and optimize scalable software systems across the full stack. Hands...  ...processes and maintain a consistent and fair evaluation of all applicants. Thank you for your... 

    ECL Services

    Mountain View, CA
    3 days ago
  • $181k - $226k

     ...Harness is the AI Software Delivery Platform company, led by technologist...  ..., 82M builds, 18T flag evaluations, 8M security scans, 9.1B optimized...  ...comes under the Platform Engineering charter, focused on...  ...SpiceDB) or OPA (Open Policy Agent) Familiarity with Identity... 
    Local area
    Immediate start
    Flexible hours
    Shift work

    Harness

    Mountain View, CA
    20 hours ago
  • $185k - $250k

    Workato, located in Palo Alto, California, is seeking a Staff Product Manager to define AI evaluations and improve the agent experience. This role requires strong product management skills, with over 7 years of experience, particularly in AI/ML systems. The position offers... 
    Flexible hours

    Workato

    Palo Alto, CA
    1 day ago
  • $207k - $340k

     ...Principal Staff Software Engineer, AI Advertiser Growth This role will be based in Sunnyvale. At LinkedIn, our approach to flexible work is...  ...drive relevance and optimize advertiser outcomes. Develop, evaluate, and fine-tune large LLMs to improve accuracy, alignment... 
    For contractors
    Work experience placement
    Work at office
    Flexible hours

    LinkedIn

    Sunnyvale, CA
    8 days ago
  • $235k - $352k

     ...combining cutting-edge AI with automotive-grade hardware...  ...the Role As a Staff Technical Lead on...  ...execution, and autonomy software performance. You will...  ...multiple stakeholders, mentor engineers, and deliver robust...  ...define requirements, evaluate and integrate next-generation... 

    Nuro

    Mountain View, CA
    4 days ago
  •  ...Principal Staff Software Engineer, AI Advertiser Growth Full‑time • Hybrid • Sunnyvale, CA LinkedIn’s AI and Machine Learning Engineers develop...  ...relevance and optimize advertiser outcomes. Develop, evaluate, and fine‑tune large language models to improve accuracy,... 
    Full time
    Work experience placement

    LinkedIn

    Mountain View, CA
    2 days ago
  • $208.73k - $279.57k

     ...Staff Software Engineer For The Ai Model Lifecycle Team Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only...  ...model, and experiment management: versioning, lineage, evaluation, and reproducible fine-tuning at scale. What You'll... 
    Temporary work

    Crusoe

    Sunnyvale, CA
    1 day ago
  • $193.93k - $352.29k

     ...Staff/Senior Software Engineer, Onboard Infrastructure Mountain View, California (HQ) Nuro is a self...  ...driver, combining cutting-edge AI with automotive-grade hardware. Nuro...  ...stakeholders and external suppliers to define, evaluate, and integrate the next-generation HW... 

    Nuro

    Mountain View, CA
    4 days ago
  • $207k - $340k

     ...needs of the team. LinkedIn's AI and Machine Learning Engineers are both data/research scientists and software engineers, who develop and...  .... As a Principal Staff Software Engineer you will be...  ...advertiser outcomes. Develop, evaluate, and fine-tune large LLMs to... 
    For contractors
    Work experience placement
    Work at office
    Flexible hours

    LinkedIn

    Mountain View, CA
    20 hours ago
  • $281k - $356k

     ...Senior Staff Software Engineer, Model Post Training Waymo is an autonomous driving technology company...  ...the next generation of frontier AI models. You will: Post-training...  ...the technical bar for how Waymo trains, evaluates, and deploys LLM models in the... 
    Full time
    Remote work

    Waymo

    Mountain View, CA
    2 days ago
