Staff Engineer, Evals Platform & Model Benchmarking
$200k · Magic
Magic, based in San Francisco, is seeking a Member of Technical Staff to build the internal evaluations platform that informs critical company decisions. You will design, implement, and validate evaluation tasks for large-scale systems, ensuring correctness and reproducibility. The role is central to research decisions and product quality. Compensation ranges from $200K to $550K, including equity, and benefits include unlimited paid time off and health insurance.
- Software Engineer (Model Evaluation & Benchmarking). About the Role: We are hiring engineers focused on AI model evaluation... ..., VLM, or Stable Diffusion model evals; image/video benchmarking techniques... ...of fashion, SPREEAI offers a platform to make your mark.
- ...Sciforium's Next-Generation Model Serving Platform Architect. Sciforium is an AI infrastructure... ...from AMD with hands-on support from AMD engineers, the team is scaling rapidly to build... Drive performance profiling, benchmarking, and observability across the inference...
Work at office · Flexible hours
$231k - $340k
Harvey is seeking a Senior AI Engineer in San Francisco, CA, to design and enhance its AI platform, focusing on model integration, evaluation, and shared infrastructure. Candidates should have 8+ years of backend systems experience, including AI/ML engineering, and a...
$217k - $303.9k
...information, visit... The Android Platform team sets the technical direction for... ...delightful Reddit experiences. As a Staff Android Engineer, you will be a technical leader for... ..., level, and country location, benchmarked against similar-stage growth companies...
For contractors · Work experience placement · Flexible hours
$192k - $260k
A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams...
- A leading AI research firm in San Francisco is seeking a Member of Technical Staff specializing in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have...
Remote work
- A leading AI solutions company in San Francisco is seeking an ML Eval Engineer to design evaluation benchmarks and improve model performance. This role involves working with unstructured enterprise data and collaborating closely with the ML and engineering teams. You will...
- Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use to evaluate coding and computer-use capabilities. Your role will require expertise in reinforcement learning...
Full time
- ...scale clients. Now, we’re assembling a founding core engineering team to build and train models that understand these systems, optimize operations, anticipate... ...from the ground up. Think in systems, not just benchmarks. Are excited to model the physical world and...
- ...ComfyUI. You'll be the person who takes the newest open-source models (image, video, 3D, audio, multimodal...) and brings them into ComfyUI... ...-the-art open-source models to run natively in the ComfyUI core engine. Design and build the native nodes that expose new model...
- Jaide Health is seeking an engineer for its Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while improving core performance metrics across model execution. You'll work with advanced performance techniques such as GPU/CUDA optimizations...
Remote job
- A leading AI research company in San Francisco is seeking a Staff Research Engineer to enhance the efficiency of large language models. In this role, you will develop and implement advanced techniques to optimize model performance in production. Ideal candidates will hold...
Remote work
- Xcede is looking for a Member of Technical Staff focused on AI Safety to lead red-teaming efforts and ensure the robustness of next-... ...Applicants should have deep expertise in LLM safety, strong software engineering skills, and relevant academic qualifications in AI or related...
- ...humanity. We’re training and deploying frontier models for developers and enterprises who are... .... Cohere is a team of researchers, engineers, designers, and more, who are passionate... ...these are our preferred locations. As a Staff Research Engineer, you will develop, prototype...
Full time · Work at office · Remote work · Flexible hours
$175k - $240k
...ubiquitous. We build the foundation for agent engineering in the real world, helping developers... ...tools and have grown to also offer a platform for building, evaluating, deploying,... ...raised at Series B from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we're...
Work at office · Flexible hours
- A fast-growing AI company seeks a Software Engineer to focus on Model Evaluation & Benchmarking. This role involves building evaluation systems for multimodal AI, ensuring reliable performance. The ideal candidate will possess strong Python programming skills, familiarity...
- A leading data and AI company is seeking a Staff Engineer to design and implement core systems for their Foundation Model Serving. The position focuses on large-scale distributed systems, optimizing GPU workloads, and collaborating across teams. Applicants should have...
- A leading data and AI company in San Francisco is seeking a Staff Engineer to design and implement systems for their AI/ML Model Serving platform. You will collaborate with product, infrastructure, and research teams to ensure high-performance system delivery. The ideal...
$98k - $140k
...work with product and engineering teams to build systems... ...ship prompt fixes, run evals and, in effect, shape... ...you'll shape Notion’s model strategy and work directly... ..., Google, and others. Benchmark across dimensions:... ...observability and eval platforms (e.g., Braintrust)...
Live in · Work at office · Local area
$160k - $250k
...infrastructure for mechanical engineering workflows is hiring a Staff Engineer — Agentic AI to... ...implementations, and benchmark against real workflows. Drive... ...stories into testable evals and close the loop between... ...management, error recovery, model routing, and context management...
For contractors · Work at office
$305k
Anthropic is looking for a Product Manager for Claude Code's model performance team in San Francisco. As a Product Manager, you will... ...end model launches, implement evaluations, and collaborate with engineers and researchers. The ideal candidate has an engineering...
$305k
...committed researchers, engineers, policy experts, and business... ...on Claude Code's model performance team, you will... ...end-to-end, build evals that measure what matters... ...developers, and competitive benchmarks into clear priorities... ..., we expect all staff to be in one of our offices...
Work at office · Visa sponsorship · Flexible hours
$216k - $270k
...As a Software Engineer on the ML Infrastructure team, you will design and build platforms for scalable, reliable, and efficient serving of LLMs. Our platform powers cutting... ...and engineers to integrate and optimize models for production and research use cases. Conduct...
Full time
$253k - $308k
Harper Group, based in San Francisco, is seeking a Staff Engineer to lead efforts in engineering productivity and AI quality. This role involves establishing CI/CD quality gates and integration test harnesses, and developing automated PR preflights that improve coding efficiency...
$176k - $253k
...in San Francisco, is looking for a Senior Member of Technical Staff to enhance developer experience by optimizing CI/CD processes... ...performance and involves building an efficient development platform that integrates closely with internal teams. The ideal applicant...
- ...to help build their open superintelligence infrastructure in San Francisco. You will lead efforts in developing a hosted training platform that enables users to launch LoRA and fine-tuning runs on managed GPU clusters. Ideal candidates will have strong Kubernetes operations...
Flexible hours
$224k - $315k
Rippling is seeking a Staff Software Engineer to join its Talent Products team in San Francisco. This role involves architecting product infrastructure... ...products. You will work closely with both product and platform teams, mentoring junior engineers while ensuring quality...
- A tech company specializing in identity management is looking for staff-level engineers in San Francisco, California. Candidates should have a strong background in scalable product development and proficiency in technologies such as Next.js, JavaScript, TypeScript, and Go...
- ...Francisco is seeking a Member of Technical Staff to build core systems and own product... ...and moving the mission from prototype to platform in a talent-dense team. The ideal... ...development, API design, and possesses a strong engineering culture. You will have the opportunity...
- ...Technical Individual Contributor to define and execute the long-term vision for the Trust Platform in San Francisco. With over 12 years of experience in backend and platform engineering, you will drive strategic architectural decisions and lead initiatives to enhance...

