AI Inference Performance Engineer

FATHOM

Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure. The ideal candidate will have extensive experience with LLM frameworks, quantization techniques, and strong Python skills. This role focuses on improving real systems that impact millions of meetings, ensuring efficient GPU utilization, and debugging production issues. Fathom offers competitive compensation and a supportive environment for personal and professional growth.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the AI Inference Performance Engineer in San Francisco, CA vacancy
  •  ...FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative... 
    Performance

    FriendliAI

    San Francisco, CA
    5 days ago
  • $167.2k - $209k

     ...world. DigitalOcean is expanding its AI Infrastructure layer to support the...  ...applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role, you...  ...their models with industry-leading performance and reliability. This is a hands‑on... 
    Performance
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    4 days ago
  •  ...company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing...  ...models, focusing on high-performance delivery of audio and image inputs....  ...product teams to push the boundaries of AI technology, ensuring reliable production... 
    Performance

    OpenAI

    San Francisco, CA
    4 days ago
  • Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming... 
    Performance
    Flexible hours

    Liquid AI

    San Francisco, CA
    2 days ago
  • Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open... 
    Performance

    Anyscale

    San Francisco, CA
    5 days ago
  •  ...MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and reliability. The ideal candidate will have 3+ years of experience in production... 
    Performance

    MakerMaker.AI

    San Francisco, CA
    4 days ago
  • A tech startup focused on AI workloads is seeking a Member of...  ...Staff to design and optimize inference systems. The role involves managing...  ...and improving execution performance across various components....  ...should have strong software engineering skills and experience with ML... 
    Performance

    Gimlet Labs

    San Francisco, CA
    3 days ago
  •  ...on building and optimizing ML inference systems in San Francisco. The...  ...pipelines and enhancing performance under real-world workloads. Candidates...  ...should have strong software engineering skills, experience with ML...  ...to work at the forefront of AI infrastructure technology.
    Performance

    Acceler8 Talent

    San Francisco, CA
    4 days ago
  •  ...A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal... 
    Performance

    Baseten

    San Francisco, CA
    5 days ago
  •  ...Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience with distributed systems... 
    Performance

    Baseten

    San Francisco, CA
    4 days ago
  • $325k

     ...A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments. The ideal candidate...  ...collaboration with researchers and focus on performance optimization. Compensation ranges from $325K to $4... 
    Performance

    OpenAI

    San Francisco, CA
    4 days ago
  •  ...Technical Staff focused on ML systems and inference in San Francisco. You will design and...  ...fast, predictable, and scalable performance. Key responsibilities include optimizing...  ...should have strong foundations in software engineering, experience with ML inference systems,... 
    Performance

    Gimlet Labs, Inc.

    San Francisco, CA
    5 days ago
  • $300k

     ...technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal candidate will possess strong...  ...of GPU execution, and a knack for optimizing inference latency for large generative models. With a competitive... 
    Performance
    Visa sponsorship
    Relocation package

    Trades Workforce Solutions

    San Francisco, CA
    5 days ago
  •  ...Job Description Machine Learning Engineer, Inference Want to solve realtime inference problems...  ...This role is with a fast-growing voice AI company building the realtime speech...  .... The team already operates beyond the performance of most publicly available realtime... 
    Performance
    Remote work
    Flexible hours

    techire ai

    San Francisco, CA
    5 days ago
  •  ...ABOUT THE ROLE You build and operate the inference systems that serve our models in...  ...with running real workloads. This is an engineering role, not a research role. You'll measure...  ...large models at high throughput Own the performance characteristics of those systems end‑to... 
    Performance

    MakerMaker.AI

    San Francisco, CA
    5 days ago
  •  ...A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT.... 
    Performance

    Baseten

    San Francisco, CA
    5 days ago
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together.ai, we are building state-of-the-art infrastructure to enable efficient...  ..., pushing the boundaries of performance, scalability, and cost-efficiency.... 
    Performance
    Full time

    Together AI

    San Francisco, CA
    22 days ago
  •  ...Our mission is to architect AI that learns from and interacts...  ...model innovation and systems engineering paired with a design-minded product...  ...Role We're hiring an Inference Engineer to advance our...  ...systems with high demands on performance, reliability, and observability... 
    Performance
    Work at office
    Visa sponsorship
    Flexible hours

    Cartesia, Inc.

    San Francisco, CA
    5 days ago
  • Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed inference engines that support multimodal models, optimizing frameworks for low... 
    Performance

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    2 days ago
  • $225k

    Magic is hiring a Kernel Engineer in San Francisco to design and maintain high-performance kernels that optimize throughput and latency during AI training and inference. The ideal candidate has low-level programming expertise, particularly for AI accelerators like NVIDIA... 
    Performance

    Magic

    San Francisco, CA
    5 days ago
  •  ...OpenAI is seeking a GPU Inference Engineer based in San Francisco, CA. In this high-impact role, you'll optimize inference performance and scalability for Robotics research, driving engineering efforts to enhance model serving and system efficiency. The ideal candidate... 
    Performance
    Work at office
    Relocation
    Relocation package

    OpenAI

    San Francisco, CA
    21 hours ago
  • $220k - $320k

    inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques... 
    Performance

    inference.net

    San Francisco, CA
    5 days ago
  •  ...MakerMaker, based in San Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work to close the significant performance gap that exists in modern... 
    Performance

    MakerMaker

    San Francisco, CA
    4 days ago
  • $220k - $320k

    A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation... 
    Performance
    Local area

    Inference

    San Francisco, CA
    4 days ago
  •  ...tech stacks to accelerate the progress of AI applications out into the real world....  ...expert. About the Role As a Distributed LLM Inference Engineer, you will help systems and optimizations that push the boundaries of performance for inference at large scale. This is an... 
    Performance

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    1 day ago
  • $220k - $320k

     ...ML Model Serving Engineer Want to build the layer that actually makes AI usable in real time? You’ll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models... 
    Performance
    3 days per week

    Trades Workforce Solutions

    San Francisco, CA
    5 days ago
  •  ...focused team of YC and unicorn founders and senior engineers with deep expertise in 3D, generative video, developer...  ...the Role We're looking for a Founding Engineer, ML Inference with deep expertise in high-performance ML engineering. This is a highly technical, high-impact... 
    Performance
    Relocation
    Visa sponsorship
    Relocation package

    Reactor

    San Francisco, CA
    1 day ago
  •  ...strive to seamlessly blend high‑level AI capabilities with the constraints of physical...  ...About the Role We’re looking for a GPU Inference Engineer to contribute to improvements in model...  ...initiatives to optimize inference performance and scalability. You’ll also be engaged... 
    Performance
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    23 hours ago
  • Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients... 
    Performance

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    1 day ago
  • $225k

     ...domain‑specific RL, ultra‑long context, and inference‑time compute to achieve this goal. About The Role As a Software Engineer on the Inference & RL Systems team, you...  ...What you’ll work on Design and scale high‑performance inference serving systems Optimize KV‑cache... 
    Performance
    Relocation
    Visa sponsorship

    Magic

    San Francisco, CA
    1 day ago
