LLM Inference & Model-Performance Engineer

Baseten

A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT. The ideal candidate will have a relevant degree and strong programming skills in Python or C++. Competitive compensation, generous PTO, and full medical benefits are offered. Join us in shaping the future of AI!

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the LLM Inference & Model-Performance Engineer in San Francisco, CA vacancy
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together...  ...inference for large language models (LLMs). Our mission is to...  ...infrastructure, pushing the boundaries of performance, scalability, and cost-efficiency... 
    Performance
    Full time

    Together AI

    San Francisco, CA
    24 days ago
  • Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open... 
    Performance

    Anyscale

    San Francisco, CA
    2 days ago
  •  ...without needing to be a distributed systems expert. About the Role As a Distributed LLM Inference Engineer, you will help systems and optimizations that push the boundaries of performance for inference at large scale. This is an incredibly critical role to Anyscale as it... 
    Performance

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    3 days ago
  • Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients... 
    Performance

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    3 days ago
  •  ...Francisco is seeking a skilled individual to enhance the API infrastructure supporting AI models. The role involves designing and optimizing backend services, focusing on performance and reliability. Candidates should have over 3 years of experience with distributed... 
    Performance

    Baseten

    San Francisco, CA
    2 days ago
  • $180k - $270k

     ...for productivity. The role involves collaborating with machine learning researchers and engineering teams to define metrics, improve model capabilities, and ensure effective performance tracking. Candidates should bring strong software engineering skills, particularly in... 
    Performance

    Plaud

    San Francisco, CA
    4 days ago
  • Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed...  ...inference engines that support multimodal models, optimizing frameworks for low-... 
    Performance

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    4 days ago
  •  ...AI research firm in San Francisco is seeking a Member of Technical Staff specialized in Model Efficiency. In this role, you will enhance LLM inference systems by tackling performance issues and collaborating with cross-functional teams. Ideal candidates have over 5... 
    Performance
    Remote work

    Cohere

    San Francisco, CA
    2 days ago
  •  ...inventive research, design, and engineering. Our organization is very...  ...the Role You will lead the Model Routing & Inference team at Cursor, owning the...  ..., optimizing for cost and performance. Designing routing...  ...frameworks (vLLM, TensorRT‑LLM, TGI), load balancing, or building... 
    Performance

    Anysphere

    San Francisco, CA
    5 days ago
  •  ...practice of medicine—and the inference systems that power them...  ...class. We’re looking for an Engineering Manager to lead and grow our Model Inference team. The...  ...to pushing the frontier of LLM serving techniques. You’ll lead a high-performing team of AI inference engineers... 
    Performance
    Hourly pay
    Full time
    Flexible hours

    AI Chopping Block, Inc.

    San Francisco, CA
    2 days ago
  • $74.38 - $83.80 per hour

     ...Specialty Software Engineer - API Developer - GenAI...  ...Rather than building models, you'll be responsible...  ...Exposure to model serving or inference gateways...  ...Kubernetes GenAI, LLM, or agent-based frameworks...  ...and optimize service performance Contribute to architecture... 
    Performance
    Full time
    Contract work
    Temporary work
    Flexible hours

    Motion Recruitment

    San Francisco, CA
    3 days ago
  • Jaide Health is seeking an engineer for their Model Efficiency team in San Francisco. The role focuses...  ...ML systems while enhancing core performance metrics across model execution. You'...  ...C++ or Python and insights into the LLM inference ecosystem. A commitment to diversity... 
    Performance
    Remote job

    Jaide Health

    San Francisco, CA
    5 days ago
  •  ...Model Implementation Engineer Sciforium is an AI infrastructure company developing next-generation...  ...models across modalities, ensuring high performance and reliability from day one. You...  ...with large-scale model training or inference systems. Contributions to open-... 
    Performance
    Flexible hours

    Sciforium

    San Francisco, CA
    9 days ago
  • $220k - $320k

    A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2...  ...in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation of $220,000... 
    Performance
    Local area

    Inference

    San Francisco, CA
    1 day ago
  • $167.2k - $209k

     ...applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role,...  ...can deploy and scale their models with industry-leading performance and reliability. This is a...  ...inference serving frameworks such as llm‑d, NVIDIA Dynamo, or Ray Serve.... 
    Performance
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    6 days ago
  •  ...looking for an experienced leader for the Model Routing & Inference team in San Francisco. This role...  ...routing, cluster management, and performance. The ideal candidate has a strong background...  ...-throughput systems and software engineering fundamentals, combined with... 
    Performance

    Anysphere

    San Francisco, CA
    5 days ago
  • $350k

     ...committed researchers, engineers, policy experts, and business...  ...Role Anthropic's inference fleet serves Claude to...  ...: accelerator kernels, model servers, distributed...  ...measure how the fleet performs against its theoretical...  ...infrastructure or general LLM serving stacks. Direct... 
    Performance
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    5 days ago
  •  ...developing open weight models for individuals,...  ...As a Forward Deployed Engineer Lead, Post-Training, you...  ...that measure real-world performance. Own the data pipeline...  ...infrastructure teams to ensure inference performance, cost...  ...2+ years focused on LLM post-training in a... 
    Performance
    Relocation package

    Reflection

    San Francisco, CA
    2 days ago
  • $295k

     ...About the Team Our Inference team brings OpenAI's most capable...  ...access our state-of-the-art AI models, allowing them to do things...  ...able to before. We focus on performant and efficient model inference...  ...Role We are looking for an engineer who wants to take the world's... 
    Performance

    OpenAI

    San Francisco, CA
    1 day ago
  • $180k - $270k

     ...to elevate productivity and performance through note‑taking solutions...  ...on. Possess strong software engineering skills (especially in Python)...  ...can run at scale against live model checkpoints. Can deeply partner...  ..." looks like for a Speech LLM, translating capabilities (like... 
    Performance
    Full time
    Work at office
    Worldwide

    Plaud

    San Francisco, CA
    4 days ago
  •  ...inherent to generative models. CTGT is the...  ...reliable, controllable, and performant in practice. Our mission...  ...the gap between LLM capabilities and domain...  ...Senior Machine Learning Engineer will operate deep within...  ...policy enforcement at inference time. Who You Are... 
    Performance

    CTGT

    San Francisco, CA
    5 days ago
  • Slope is seeking a Founding Compiler Engineer in San Francisco, responsible for designing core compiler infrastructure and optimizing AI models. You will write CUDA kernels and conduct performance reviews, contributing to Luminal's mission of making AI workloads portable... 
    Performance
    Full time

    Slope

    San Francisco, CA
    1 day ago
  •  ...Job Description Machine Learning Engineer, Inference Want to solve realtime inference problems...  ..., and making state-of-the-art speech models actually behave correctly in realtime...  ...The team already operates beyond the performance of most publicly available realtime... 
    Performance
    Remote work
    Flexible hours

    techire ai

    San Francisco, CA
    2 days ago
  •  ...do. We're pioneering the model architectures that will make...  ...model innovation and systems engineering paired with a design-minded...  ...the Role We're hiring an Inference Engineer to advance our...  ...systems with high demands on performance, reliability, and observability... 
    Performance
    Work at office
    Visa sponsorship
    Flexible hours

    Cartesia, Inc.

    San Francisco, CA
    2 days ago
  • $220k - $320k

    inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques... 
    Performance

    inference.net

    San Francisco, CA
    2 days ago
  • PetsApp is looking for strong engineers to evaluate and benchmark LLM models at their San Francisco office. The role involves analyzing model performance and working closely with both open-source and closed-source model labs. Candidates should have significant Python experience... 
    Performance
    Work at office
    Relocation package

    PetsApp

    San Francisco, CA
    3 days ago
  • $300k

     ...firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal candidate...  ...of GPU execution, and a knack for optimizing inference latency for large generative models. With a competitive base salary of up to ~$300... 
    Performance
    Visa sponsorship
    Relocation package

    Trades Workforce Solutions

    San Francisco, CA
    5 days ago
  • FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative... 
    Performance

    FriendliAI

    San Francisco, CA
    2 days ago
  • Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize...  ...kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming experience... 
    Performance
    Flexible hours

    Liquid AI

    San Francisco, CA
    4 days ago
  •  ...THE ROLE You build and operate the inference systems that serve our models in production. The work spans...  ...running real workloads. This is an engineering role, not a research role. You'll...  ...models at high throughput Own the performance characteristics of those systems end... 
    Performance

    MakerMaker.AI

    San Francisco, CA
    4 days ago
