
AI Inference Performance Engineer

FATHOM

Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure. The ideal candidate will have extensive experience with LLM frameworks, quantization techniques, and strong Python skills. This role focuses on improving real systems that impact millions of meetings, ensuring efficient GPU utilization, and debugging production issues. Fathom offers competitive compensation and a supportive environment for personal and professional growth.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the AI Inference Performance Engineer in San Francisco, CA vacancy
  • FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative... 
    Performance

    FriendliAI

    San Francisco, CA
    4 days ago
  •  ...company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing...  ...models, focusing on high-performance delivery of audio and image inputs....  ...product teams to push the boundaries of AI technology, ensuring reliable production... 
    Performance

    OpenAI

    San Francisco, CA
    3 days ago
  • Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open... 
    Performance

    Anyscale

    San Francisco, CA
    4 days ago
  • Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming... 
    Performance
    Flexible hours

    Liquid AI

    San Francisco, CA
    1 day ago
  • $160k - $320k

    A leading AI computing firm is seeking a Systems Engineer in San Francisco or Los Angeles to scale AI inference. Candidates should have strong C++ skills, HPC experience, and knowledge...  ...designing GPU kernels, optimizing performance, and collaborating with technical leads... 
    Performance

    Vast.ai

    San Francisco, CA
    4 days ago
  • $200k - $280k

    A leading AI company in San Francisco is looking for a Staff Machine Learning Engineer to enhance inference systems at production scale. You will design algorithms, optimize performance, and collaborate on RL and post-training pipelines. Ideal candidates have 3+ years of... 
    Performance
    Full time

    AI Chopping Block, Inc.

    San Francisco, CA
    1 day ago
  • A tech startup focused on AI workloads is seeking a Member of...  ...Staff to design and optimize inference systems. The role involves managing...  ...and improving execution performance across various components....  ...should have strong software engineering skills and experience with ML... 
    Performance

    Gimlet Labs

    San Francisco, CA
    2 days ago
  • A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal... 
    Performance

    Baseten

    San Francisco, CA
    4 days ago
  • $325k

    A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate...  ...collaboration with researchers and focus on performance optimization. Compensation ranges from $325K to $4... 
    Performance

    OpenAI

    San Francisco, CA
    4 days ago
  •  ...Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience with distributed systems... 
    Performance

    Baseten

    San Francisco, CA
    4 days ago
  •  ...Technical Staff focused on ML systems and inference in San Francisco. You will design and...  ...fast, predictable, and scalable performance. Key responsibilities include optimizing...  ...should have strong foundations in software engineering, experience with ML inference systems,... 
    Performance

    Gimlet Labs, Inc.

    San Francisco, CA
    4 days ago
  •  ...Inference Engineer We’re partnered with an AI infrastructure company building next-generation systems for large-scale AI workloads. Their platform...  ...heterogeneous hardware to unlock major gains in performance, efficiency, and cost. The team is solving some of the... 
    Performance

    Acceler8 Talent

    San Francisco, CA
    3 days ago
  •  ...Job Description Machine Learning Engineer, Inference Want to solve realtime inference problems...  ...This role is with a fast-growing voice AI company building the realtime speech...  .... The team already operates beyond the performance of most publicly available realtime... 
    Performance
    Remote work
    Flexible hours

    techire ai

    San Francisco, CA
    4 days ago
  •  ...Our mission is to architect AI that learns from and interacts...  ...model innovation and systems engineering paired with a design-minded product...  ...Role We're hiring an Inference Engineer to advance our...  ...systems with high demands on performance, reliability, and observability... 
    Performance
    Work at office
    Visa sponsorship
    Flexible hours

    Cartesia, Inc.

    San Francisco, CA
    4 days ago
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together.ai, we are building state-of-the-art infrastructure to enable efficient...  ..., pushing the boundaries of performance, scalability, and cost-efficiency.... 
    Performance
    Full time

    Together AI

    San Francisco, CA
    1 day ago
  •  ...chats with other countries. AI then powers the world's response...  ...-shelf options with the same inference budget. Some fun...  ...Improving closed source models' performance by training tuned endpoints....  ...stage startup or as a founding engineer. This job listing is for... 
    Performance
    Full time
    Visa sponsorship

    Pax Historia

    San Francisco, CA
    4 days ago
  • $220k - $320k

    inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques... 
    Performance

    inference.net

    San Francisco, CA
    4 days ago
  • $180k - $270k

    Plaud is seeking talented individuals for AI infrastructure roles in San Francisco, focusing on building high-performance inference engines for speech AI. Ideal candidates will have substantial experience in GPU architecture and real-time systems. This position offers a... 
    Performance

    Plaud

    San Francisco, CA
    2 days ago
  • $225k

    Magic is hiring a Kernel Engineer in San Francisco to design and maintain high-performance kernels that optimize throughput and latency during AI training and inference. The ideal candidate has low-level programming expertise, particularly for AI accelerators like NVIDIA... 
    Performance

    Magic

    San Francisco, CA
    4 days ago
  • AI Chopping Block, Inc. is looking for a specialized role to model inference performance across application, model, and fleet layers. Responsibilities include building performance models and analyzing inference workloads to identify and optimize bottlenecks. Ideal candidates... 
    Performance

    AI Chopping Block, Inc.

    San Francisco, CA
    4 days ago
  • $220k - $320k

    A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation... 
    Performance
    Local area

    Inference

    San Francisco, CA
    3 days ago
  • Acceler8 Talent is partnering with an AI infrastructure startup building the platform...  ...M Series A, and is scaling a world-class engineering team across inference, distributed systems, compiler infrastructure, and high-performance AI compute. Their platform automatically... 
    Performance

    Acceler8 Talent

    San Francisco, CA
    2 days ago
  •  ...focused team of YC and unicorn founders and senior engineers with deep expertise in 3D, generative video, developer...  ...the Role We're looking for a Founding Engineer, ML Inference with deep expertise in high-performance ML engineering. This is a highly technical, high-impact... 
    Performance
    Relocation
    Visa sponsorship
    Relocation package

    Reactor

    San Francisco, CA
    5 days ago
  •  ...tech stacks to accelerate the progress of AI applications out into the real world....  ...date. About the role As a Distributed LLM Inference Engineer, you will help with systems and optimizations that push the boundaries of performance for inference at large scale. This is an... 
    Performance
    Work at office

    Anyscale

    San Francisco, CA
    4 days ago
  • A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT.... 
    Performance

    Baseten

    San Francisco, CA
    4 days ago
  • Acceler8 Talent is looking for a Software Engineer in San Francisco to focus on building and optimizing inference systems for next-generation AI at scale. You will design production inference pipelines and improve system performance under real production constraints. The... 
    Performance

    Acceler8 Talent

    San Francisco, CA
    2 days ago
  •  ...role unique is that our platform delivers AI inference. Validating whether inference works...  .... We are looking for a dedicated QA engineer who can own the product's quality, ensuring...  ...using Locust to validate platform performance under real-world conditions. Own frontend... 
    Performance
    Worldwide
    Flexible hours

    FriendliAI Corp

    San Francisco, CA
    2 days ago
  •  ...massive investment in commercial AI, organizations often find...  ...reliable, controllable, and performant in practice. Our mission is...  ...CTGT's Senior Machine Learning Engineer will operate deep within the...  ...policy enforcement at inference time. Who You Are Strong... 
    Performance

    CTGT

    San Francisco, CA
    2 days ago
  • Cartesia is looking for an Inference Engineer in San Francisco to enhance real-time multimodal intelligence. You will design and build scalable, low-latency model inference systems while collaborating with researchers. The ideal candidate has strong engineering skills... 
    Flexible hours

    Cartesia

    San Francisco, CA
    2 days ago
  •  ...Role We are looking for a well-rounded AI/ML Systems Engineer to join our team and build LM Studio...  ...you will build and evolve on-device inference engines and integrations for local LLMs...  ...support for new models, and optimize performance across a wide range of hardware... 
    Performance
    Local area

    GrabJobs

    San Francisco, CA
    5 days ago
