Staff Engineer, Evals Platform & Model Benchmarking

$200k

Magic

Magic, located in San Francisco, is seeking a Member of Technical Staff to build the internal evaluations platform that supports critical company decisions. You will design, implement, and validate evaluation tasks for large-scale systems, ensuring correctness and reproducibility. The role is pivotal for research decisions and product quality, with a compensation range of $200K to $550K, including equity and benefits such as unlimited paid time off and health insurance.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Staff Engineer, Evals Platform & Model Benchmarking in San Francisco, CA vacancy
  • Software Engineer (Model Evaluation & Benchmarking) About the Role We are hiring Engineers focused on AI Model Evaluation...  ..., VLM, or Stable Diffusion model evals Image/Video benchmarking techniques...  ...of fashion, SPREEAI offers a platform to make your mark.
    Suggested

    SpreeAI

    San Francisco, CA
    1 day ago
  •  ...Sciforium's Next-Generation Model Serving Platform Architect Sciforium is an AI infrastructure...  ...from AMD with hands-on support from AMD engineers the team is scaling rapidly to build...  .... Drive performance profiling, benchmarking, and observability across the inference... 
    Suggested
    Work at office
    Flexible hours

    Sciforium

    San Francisco, CA
    3 days ago
  • $231k - $340k

    Harvey is seeking a Senior AI Engineer in San Francisco, CA, to design and enhance their AI platform, focusing on model integration, evaluation, and shared infrastructure. Candidates should have 8+ years of backend systems experience, including AI/ML engineering, and a... 
    Suggested

    Harvey

    San Francisco, CA
    1 day ago
  • $217k - $303.9k

     ...information, visit The Android Platform team sets the technical direction for...  ...delightful Reddit experiences. As a Staff Android Engineer , you will be a technical leader for...  ..., level, and country location, benchmarked against similar stage growth companies... 
    Suggested
    For contractors
    Work experience placement
    Flexible hours

    Reddit

    San Francisco, CA
    7 days ago
  • $192k - $260k

    A leading data and AI company is seeking a Staff Engineer to design and implement core systems for Foundation Model Serving. The ideal candidate will have over 10 years of experience in building large-scale distributed systems and will collaborate closely across teams... 
    Suggested

    Databricks Inc.

    San Francisco, CA
    1 day ago
  • A leading AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have... 
    Remote work

    Cohere

    San Francisco, CA
    1 day ago
  • A leading AI solutions company in San Francisco is seeking an ML Eval Engineer to design evaluation benchmarks and improve model performance. This role involves working with unstructured enterprise data and collaborating closely with the ML and engineering teams. You will... 

    Reducto

    San Francisco, CA
    4 days ago
  • Refresh AI is seeking a Research Engineer in San Francisco to push the boundaries of benchmarking technology. You will build benchmarks that labs use for evaluating coding abilities and computer-use capability. Your role will require expertise in reinforcement learning... 
    Full time

    Refresh AI

    San Francisco, CA
    4 days ago
  •  ...scale clients. Now, we’re assembling a founding core engineering team to build and train models that understand these systems, optimize operations, anticipate...  ...from the ground up. Think in systems, not just benchmarks. Are excited to model the physical world and... 

    Meter

    San Francisco, CA
    1 day ago
  •  ...ComfyUI. You'll be the person who takes the newest open-source models (image, video, 3D, audio, multimodal...) and brings them into ComfyUI...  ...-the-art open-source models to run natively in the ComfyUI core engine Design and build the native nodes that expose new model... 

    ComfyUI

    San Francisco, CA
    3 days ago
  • Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses on building reliable ML systems while enhancing core performance metrics across model execution. You'll work with advanced performance techniques such as GPU/CUDA optimizations... 
    Remote job

    Jaide Health

    San Francisco, CA
    4 days ago
  • A leading AI research company in San Francisco is seeking a Staff Research Engineer to enhance the efficiency of large language models. In this role, you will develop and implement advanced techniques to optimize model performance in production. Ideal candidates will hold... 
    Remote work

    Cohere

    San Francisco, CA
    1 day ago
  • Xcede is looking for a Member of Technical Staff focused on AI Safety to lead red-teaming efforts and ensure the robustness of next-...  ...Applicants should have deep expertise in LLM safety, strong software engineering skills, and relevant academic qualifications in AI or related... 

    Xcede

    San Francisco, CA
    23 hours ago
  •  ...humanity. We’re training and deploying frontier models for developers and enterprises who are...  .... Cohere is a team of researchers, engineers, designers, and more, who are passionate...  ...these are our preferred locations. As a Staff Research Engineer, you will develop, prototype... 
    Full time
    Work at office
    Remote work
    Flexible hours

    Cohere

    San Francisco, CA
    4 days ago
  • $175k - $240k

     ...ubiquitous. We build the foundation for agent engineering in the real world, helping developers...  ...tools and have grown to also offer a platform for building, evaluating, deploying,...  ...raised at Series B from IVP, Sequoia, Benchmark, CapitalG, and Sapphire Ventures, we're... 
    Work at office
    Flexible hours

    LangChain, Inc

    San Francisco, CA
    2 days ago
  • A fast-growing AI company seeks a Software Engineer to focus on Model Evaluation & Benchmarking. This role involves building evaluation systems for multimodal AI, ensuring reliable performance. The ideal candidate will possess strong Python programming skills, familiarity... 

    SpreeAI

    San Francisco, CA
    1 day ago
  • A leading data and AI company is seeking a Staff Engineer to design and implement core systems for their Foundation Model Serving. The position focuses on large-scale distributed systems, optimizing GPU workloads, and collaborating across teams. Applicants should have... 

    Menlo Ventures

    San Francisco, CA
    4 days ago
  • A leading data and AI company in San Francisco is seeking a Staff Engineer to design and implement systems for their AI/ML Model Serving platform. You will collaborate with product, infrastructure, and research teams to ensure high-performance system delivery. The ideal... 

    Menlo Ventures

    San Francisco, CA
    4 days ago
  • $98k - $140k

     ...work with product and engineering teams to build systems...  ...ship prompt fixes, run evals and, in effect, shape...  ...you'll shape Notion’s model strategy and work directly...  ..., Google, and others. Benchmark across dimensions:...  ...observability and eval platforms (e.g., Braintrust).... 
    Live in
    Work at office
    Local area

    Notion

    San Francisco, CA
    1 day ago
  • $160k - $250k

     ...infrastructure for mechanical engineering workflows is hiring a Staff Engineer — Agentic AI to...  ...implementations, and benchmark against real workflows. Drive...  ...stories into testable evals and close the loop between...  ...management, error recovery, model routing, and context management... 
    For contractors
    Work at office

    CLERA

    San Francisco, CA
    4 days ago
  • $305k

    Anthropic is looking for a Product Manager for Claude Code's model performance team in San Francisco. As a Product Manager, you will...  ...end model launches, implement evaluations, and collaborate with engineers and researchers. The ideal candidate has an engineering... 

    Anthropic

    San Francisco, CA
    23 hours ago
  • $305k

     ...committed researchers, engineers, policy experts, and business...  ...on Claude Code's model performance team, you will...  ...end-to-end, build evals that measure what matters...  ...developers, and competitive benchmarks into clear priorities...  ..., we expect all staff to be in one of our offices... 
    Work at office
    Visa sponsorship
    Flexible hours

    Colorwave Inc

    San Francisco, CA
    23 hours ago
  • $216k - $270k

     ...As a Software Engineer on the ML Infrastructure team, you will design and build platforms for scalable, reliable, and efficient serving of LLMs. Our platform powers cutting...  ...and engineers to integrate and optimize models for production and research use cases. Conduct... 
    Full time

    Scale AI

    San Francisco, CA
    18 days ago
  • $253k - $308k

    Harper Group, based in San Francisco, is seeking a Staff Engineer to lead efforts in engineering productivity and AI quality. This role involves establishing CI/CD quality gates, integration test harnesses, and developing automated PR preflights that enhance coding efficiency... 

    Harper Group

    San Francisco, CA
    4 days ago
  • $176k - $253k

     ...in San Francisco, is looking for a Senior Member of Technical Staff to enhance developer experience through optimizing CI/CD processes...  ...performance and involves building an efficient development platform that integrates closely with internal teams. The ideal applicant... 

    Harper Group

    San Francisco, CA
    3 days ago
  •  ...to help build their open superintelligence infrastructure in San Francisco. You will lead efforts in developing a hosted training platform that enables users to launch LoRA and fine-tuning runs on managed GPU clusters. Ideal candidates will have strong Kubernetes operations... 
    Flexible hours

    Prime-Intellect

    San Francisco, CA
    2 days ago
  • $224k - $315k

    Rippling is seeking a Staff Software Engineer to join their Talent Products team in San Francisco. This role involves architecting product infrastructure...  ...products. You will work closely with both product and platform teams, mentoring junior engineers while ensuring quality... 

    Rippling

    San Francisco, CA
    4 days ago
  • A tech company specialized in identity management is looking for staff-level engineers in San Francisco, California. Candidates should have a strong background in scalable product development and proficiency in technologies like Next.js, JavaScript, TypeScript, and Go.... 

    Clerk, Inc.

    San Francisco, CA
    2 days ago
  •  ...Francisco is seeking a Member of Technical Staff to build core systems and own product...  ...and moving the mission from prototype to platform in a talent-dense team. The ideal...  ...development, API design, and possesses a strong engineering culture. You will have the opportunity... 

    Getcatalog

    San Francisco, CA
    2 days ago
  •  ...Technical Individual Contributor to define and execute the long-term vision for the Trust Platform in San Francisco. With over 12 years of experience in backend and platform engineering, you will drive strategic architectural decisions and lead initiatives to enhance... 

    airbnb, Inc.

    San Francisco, CA
    1 day ago
