Staff Software Engineer - AI Agent Evaluations
$217.57k - $271k
Full-time
ID.me
COMPANY OVERVIEW
ID.me is the next-generation digital identity wallet that simplifies how individuals securely prove their identity online. Consumers can verify their identity with ID.me once and seamlessly log in across websites without having to create a new login and verify their identity again. Over 152 million users experience streamlined login and identity verification with ID.me at 20 federal agencies, 45 state government agencies, and 70+ healthcare organizations. More than 600 consumer brands use ID.me to verify communities and user segments to honor service and build more authentic relationships. ID.me’s technology meets the federal standards for consumer authentication set by the Commerce Department and is approved as a NIST 800-63-3 IAL2 / AAL2 credential service provider by the Kantara Initiative. ID.me is committed to “No Identity Left Behind” to enable all people to have a secure digital identity.
ID.me is a full-time, in-office culture. Unless a specific job description explicitly states otherwise, all roles are on-site five days per week at one of our offices in McLean, VA; Mountain View, CA; New York City, NY; or Tampa, FL. Certain roles, such as field-based sales or other remote-by-design positions, may have different work arrangements as noted in their individual postings.
At ID.me, we embrace the thoughtful use of AI tools in our daily work, and there are even occasions where we leverage AI in our hiring process. However, during the interview process, we want to understand your individual skills and experiences. Therefore, we have guidelines on how AI can be appropriately used during your application and interviews.
ABOUT THE ROLE
This Staff Engineer role sits at the intersection of engineering, applied AI, testing, and developer experience. You will define and lead the discipline of testing AI agents, evaluating LLM behavior, and ensuring the reliability of agentic systems operating in production. It requires deep engineering rigor, original thinking about what "correctness" means for non-deterministic systems, and the ability to build eval infrastructure and developer tooling that the entire engineering org depends on.
You are an expert in building and maintaining Retrieval-Augmented Generation (RAG) pipelines, with a deep focus on strategic data chunking and data quality enforcement. You have experience establishing pre-retrieval data quality gates to optimize vector search accuracy, minimize retrieval-induced noise, and significantly reduce LLM hallucination rates in production-deployed agent systems.
You will establish quality standards for how ID.me ships AI-powered features safely, mentor engineers across teams on AI testing best practices, and partner directly with product and platform teams to embed quality into every stage of agent development.
WHAT YOU'LL DO
- Define AI Quality Standards: Own the framework for how ID.me evaluates, validates, and monitors AI agents, from prompt-based features to fully autonomous multi-step workflows.
- Build Eval Infrastructure: Design and maintain evaluation pipelines for LLM outputs, agent behavior, tool use, and multi-turn interactions across development, staging, and production environments.
- Production Observability for Agents: Instrument agentic systems for behavioral drift, regression, and failure modes that traditional metrics miss: latency, correctness, hallucination rate, tool misuse, and policy adherence.
- Agentic Test Strategy: Lead the design of test suites that handle non-determinism: red-teaming agents, golden dataset construction, LLM-as-judge pipelines, and property-based testing for AI outputs.
- Champion Developer Experience: Build the internal tooling, feedback loops, and testing workflows that make it fast and safe for engineers to develop and ship AI features with confidence. Reduce friction in the agent development inner loop: local testing, fast eval runs, and clear signal on regressions.
- Drive AI-First Engineering Culture: Raise the quality bar across the engineering org by establishing patterns, tooling, and education for how teams write, test, and deploy AI features responsibly.
- Cross-Team Collaboration: Partner with Security, Platform, Product, and AI/ML teams to embed quality gates into agent development workflows.
- Mentorship: Guide senior and mid-level engineers through evaluation design, observability strategy, and testing approaches specific to AI systems.
BASIC QUALIFICATIONS
- Bachelor's degree in Computer Science, Engineering, or equivalent experience
- 8+ years building and operating production software systems
- Demonstrated experience evaluating or testing LLM-powered features or
- Strong backend engineering fundamentals in Python, Java, Go, or equivalent
- Experience designing test infrastructure, CI/CD quality gates, or evaluation
PREFERRED QUALIFICATIONS
- Background in identity verification, fraud detection, or regulated industries
- Familiarity with Anthropic's model evaluation methodology or similar
$217,565 - $271,000 USD
ID.me maintains a work environment free from discrimination, where employees are treated with dignity and respect. All ID.me employees share in the responsibility for fulfilling our commitment to equal employment opportunity. ID.me does not discriminate against any employee or applicant on the basis of age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. ID.me adheres to these principles in all aspects of employment, including recruitment, hiring, training, compensation, promotion, benefits, social and recreational programs, and discipline.
In addition, ID.me's policy is to provide reasonable accommodation to qualified employees who have protected disabilities to the extent required by applicable laws, regulations and ordinances where a particular employee works. Upon request we will provide you with more information about such accommodations.
Please review our Privacy Policy, including our CCPA policy, at id.me/privacy. If you provide ID.me with any personally identifiable information you confirm that you have read and agree to be bound by the terms and conditions set out in our Privacy Policy. ID.me participates in E-Verify.
Vacancy posted 3 days ago

