LLM Inference Engineer: Scalable Serving (SF Onsite)

Gravity Engineering Services Pvt Ltd.

Gravity Engineering Services Pvt Ltd. is seeking a talented individual in San Francisco to architect and implement robust, scalable inference systems for AI models. This in-person role focuses on optimizing model-serving infrastructure for high throughput and low latency. The ideal candidate has deep expertise in Python, PyTorch, and modern inference systems. You'll work closely with the research team to bring innovative capabilities into production and build tools that support rapid experimentation.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the LLM Inference Engineer: Scalable Serving (SF Onsite) in San Francisco, CA vacancy
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together...  ...to enable efficient and scalable inference for large language models...  ...parallelism for high-performance serving. Apply CUDA graph optimizations... 
    Suggested
    Full time

    Together AI

    San Francisco, CA
    24 days ago
  • $167.2k - $209k

     ...their drive to build the simplest scalable cloud. If you have a growth...  ...applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role,...  ...with distributed inference serving frameworks such as llm‑d, NVIDIA Dynamo, or Ray Serve.... 
    Suggested
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    6 days ago
  •  ...source project that is creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify,...  ...0+ million raised to date. About the role As a Distributed LLM Inference Engineer, you will help with systems and optimizations that push the... 
    Suggested
    Work at office

    Anyscale

    San Francisco, CA
    2 days ago
  • Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients... 
    Suggested

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    3 days ago
  •  ...Member of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves designing end-to-...  ...real-world workloads. Candidates should have strong software engineering skills, experience with ML inference systems, and proficiency... 
    Suggested

    Acceler8 Talent

    San Francisco, CA
    1 day ago
  • Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open... 

    Anyscale

    San Francisco, CA
    2 days ago
  • $350k

     ...growing group of committed researchers, engineers, policy experts, and business leaders...  .... About the Role Anthropic's inference fleet serves Claude to millions of users across our...  ...or inference infrastructure or general LLM serving stacks. Direct large-scale... 
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    16 hours ago
  • Location NYC, SF Employment Type Full time Location...  ...our Forward Deployed Engineering function. This team...  ...orchestrating complex LLM workflows, integrating...  ...internal codebase for inference, fine‑tuning, and...  ...on‑premises), ensuring scalability, performance, and reliability... 
    Full time
    Relocation package

    B Capital

    San Francisco, CA
    3 days ago
  •  ...backend developers, security engineers, data engineers, data scientists...  ...team's mandate is to build scalable bottoms-up growth motions across...  ...engineering to content creation Serving as a technical bridge between...  ...and attend events in the SF Bay Area (and around the US when... 

    Marimo Inc.

    San Francisco, CA
    2 days ago
  •  ...lives. About the Role We’re looking for a GPU Inference Engineer to contribute to improvements in model serving efficiency for our Robotics research. This is a...  ...initiatives to optimize inference performance and scalability. You’ll also be engaged in model design, to help... 
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    16 hours ago
  • OpenAI is seeking a GPU Inference Engineer based in San Francisco, CA. In this high-impact role, you'll optimize inference performance and scalability for Robotics research, driving engineering efforts to enhance model serving and system efficiency. The ideal candidate... 
    Work at office
    Relocation
    Relocation package

    OpenAI

    San Francisco, CA
    16 hours ago
  •  ...ecosystem of libraries for scalable machine learning. Companies...  ...the role The Customer Engineer will play a crucial role in...  ...and passionate about ML/AI, LLM, vLLM and the role of AI in...  ...for training, fine-tuning and inference/serving of LLMs ~ Experience running... 

    Anyscale, Inc

    San Francisco, CA
    1 day ago
  • $74.38 - $83.8 per hour

     ...Specialty Software Engineer - API Developer - GenAI...  ...responsible for creating the scalable service layer that...  ...Area (Hybrid – 3 days onsite) Required Skills &...  ...Exposure to model serving or inference gateways Microservices...  ...Kubernetes GenAI, LLM, or agent-based frameworks... 
    Full time
    Contract work
    Temporary work
    Flexible hours

    Motion Recruitment

    San Francisco, CA
    3 days ago
  • $180k - $250k

     ...looking for a skilled individual to help maintain generative media models' performance. You will design and implement innovative model serving architectures while working with the Applied ML team and customers. The ideal candidate has expertise in systems programming and... 

    fal

    San Francisco, CA
    16 hours ago
  •  ...Job Description Machine Learning Engineer, Inference Want to solve realtime inference problems where milliseconds genuinely matter?...  ...bottlenecks Optimising TensorRT, Triton, ONNX Runtime, and custom serving systems Managing KV cache systems, speculative decoding,... 
    Remote work
    Flexible hours

    techire ai

    San Francisco, CA
    16 hours ago
  •  ...Sciforium Gpu Kernel Engineer Sciforium is an AI...  ...proprietary, high-efficiency serving platform. Backed by...  ...-scale training and inference. This role is ideal...  ...to the efficiency and scalability of our ML platform....  ...focus on large-scale LLM training and inference... 
    Flexible hours

    Sciforium

    San Francisco, CA
    4 days ago
  •  ...team of YC and unicorn founders and senior engineers with deep expertise in 3D, generative...  ...'re looking for a Founding Engineer, ML Inference with deep expertise in high-performance...  ...media models. You'll work across the model-serving stack, designing novel inference... 
    Relocation
    Visa sponsorship
    Relocation package

    Reactor

    San Francisco, CA
    3 days ago
  •  ...Francisco, on‑site ABOUT THE ROLE You build and operate the inference systems that serve our models in production. The work spans serving...  ...infrastructure that come with running real workloads. This is an engineering role, not a research role. You'll measure, profile, debug... 

    MakerMaker.AI

    San Francisco, CA
    4 days ago
  • $160k - $230k

    Together AI is seeking an Inference Frameworks and Optimization Engineer in San Francisco, California. The role focuses on designing and optimizing distributed inference engines, ensuring efficient deployment of large language models and vision models. The ideal candidate... 

    Together AI

    San Francisco, CA
    1 day ago
  •  ...to accelerate software engineering. Today, engineers spend...  ...where he led AI model serving and AI agent initiatives...  ...automatically evaluate LLM outputs and improve agentic...  ...tradeoffs Build scalable infra to support long-running...  ...work permit Work in SF in person 3 days a week... 
    Work at office
    3 days per week

    Embedding VC

    San Francisco, CA
    16 hours ago
  • Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed inference engines that support multimodal models, optimizing frameworks for low... 

    Gravity Engineering Services Pvt Ltd.

    San Francisco, CA
    4 days ago
  • A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT.... 

    Baseten

    San Francisco, CA
    2 days ago
  •  ...platform delivers AI inference. Validating whether inference...  ...for a dedicated QA engineer who can own the...  ...Develop and run load and scalability tests using Locust to...  ...strategies that account for LLM inference. Work...  ...Working knowledge of LLM serving. ~ Strong experience... 
    Worldwide
    Flexible hours

    FriendliAI Corp

    San Francisco, CA
    16 hours ago
  •  ...compute into useful intelligence - the inference services that serve LLMs at scale and the data pipelines...  ...about both. Researchers and ML engineers will hand you workloads that barely run...  ...~ Bonus: hands-on experience with LLM inference engines (vLLM, SGLang, TensorRT... 
    Flexible hours

    Adaption

    San Francisco, CA
    22 days ago
  •  ...looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and...  ...3+ years of experience in production-grade serving infrastructure, be fluent in Python, and have... 

    MakerMaker.AI

    San Francisco, CA
    4 days ago
  • $116k - $195k

     ...build the foundation for agent engineering in the real world, helping...  ...team. In this role, you will serve as a trusted technical advisor...  ...engineering best practices Develop scalable enablement frameworks...  ...of the modern AI/LLM stack ~ Passion for educating... 
    Work at office
    Flexible hours

    LangChain, Inc

    San Francisco, CA
    4 days ago
  • $110k - $140k

     ...Role The Implementation Engineer is a key bridge between the Business...  ...efficiently translated into scalable technical solutions. Deeply...  .... Hands-on exposure to LLM orchestration frameworks and...  ...reflects the communities we serve. We welcome and encourage qualified... 
    Work at office

    Monks Limited

    San Francisco, CA
    16 hours ago
  • $269.1k - $307.2k

     ...Distinguished Engineer (Messaging & Marketing Technology) As a Distinguished...  ..., and Turbopuffer to power LLM-driven tooling. This role...  ...design and implementation Serve as an authoritative expert on...  ..., such as performance, scalability and operability Continue learning... 
    Full time
    Part time
    Local area

    Capital One

    San Francisco, CA
    16 days ago
  • $150k - $250k

     ...Full Stack Software Engineer United States (Remote)...  ...Backend: Python, FastAPI, LLM agents, Postgres,...  ...and Athens, Greece. Our SF team works in-person in...  ...platform is performant, scalable, and secure Provide...  ...environment. Humility: Serve a greater goal, stay... 
    Full time
    Remote work
    2 days per week
    3 days per week

    Active Impact Investments

    San Francisco, CA
    4 days ago
  •  ...of 3 looking to expand to 5. Serving hundreds of requests for clinics...  ...and scaling our Voice AI LLM and orchestration system. Specific...  ...looking for Frontend‑focused engineers - the majority of our work optimizes...  ...work - we will be in the SF office 3-4 days a week. What We... 
    Work at office
    Remote work
    Flexible hours
    3 days per week

    Health Harbor

    San Francisco, CA
    1 day ago
