LLM Inference Engineer: Scalable Serving (SF Onsite)

Gravity Engineering Services Pvt Ltd.

Gravity Engineering Services Pvt Ltd. is seeking a talented individual in San Francisco to architect and implement robust, scalable inference systems for AI models. This in-person role focuses on optimizing model serving infrastructures for high throughput and low latency. The ideal candidate has deep expertise in Python, PyTorch, and modern inference systems. You'll work closely with the research team to bring innovative capabilities into production, along with building tools to support rapid experimentation.

Vacancy posted 3 days ago

Similar jobs that could be interesting for you

Based on the LLM Inference Engineer: Scalable Serving (SF Onsite) in San Francisco, CA vacancy

LLM Inference Frameworks and Optimization Engineer
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together... ...to enable efficient and scalable inference for large language models... ...parallelism for high-performance serving. Apply CUDA graph optimizations...
Suggested
Full time
Together AI
San Francisco, CA
24 days ago
Senior Engineer 2: AI Inference Engine Systems
$167.2k - $209k
...their drive to build the simplest scalable cloud. If you have a growth... ...applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role,... ...with distributed inference serving frameworks such as llm‑d, NVIDIA Dynamo, or Ray Serve....
Suggested
Local area
Remote work
Worldwide
Flexible hours
DigitalOcean
San Francisco, CA
6 days ago
Distributed LLM Inference Engineer
...source project that is creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify,... ...0+ million raised to date. About the role As a Distributed LLM Inference Engineer, you will help with systems and optimizations that push the...
Suggested
Work at office
Anyscale
San Francisco, CA
2 days ago
System Engineering In
Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients...
Suggested
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
3 days ago
Staff ML Inference Systems Engineer - Scalable GPU Infra (SF)
...Member of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves designing end-to-... ...real-world workloads. Candidates should have strong software engineering skills, experience with ML inference systems, and proficiency...
Suggested
Acceler8 Talent
San Francisco, CA
1 day ago
Distributed LLM Inference Engineer - Scale AI at Speed
Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open...
Anyscale
San Francisco, CA
2 days ago
System Performance Engineering
$350k
...growing group of committed researchers, engineers, policy experts, and business leaders... .... About the Role Anthropic's inference fleet serves Claude to millions of users across our... ...or inference infrastructure or general LLM serving stacks. Direct large-scale...
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
16 hours ago
Forward Deployed Engineer, Lead - LLM Post-training
Location NYC, SF Employment Type Full time Location... ...our Forward Deployed Engineering function. This team... ...orchestrating complex LLM workflows, integrating... ...internal codebase for inference, fine‑tuning, and... ...on‑premises), ensuring scalability, performance, and reliability...
Full time
Relocation package
B Capital
San Francisco, CA
3 days ago
Senior Developer Relations Engineer — Growth (SF only)
...backend developers, security engineers, data engineers, data scientists... ...team's mandate is to build scalable bottoms-up growth motions across... ...engineering to content creation Serving as a technical bridge between... ...and attend events in the SF Bay Area (and around the US when...
Marimo Inc.
San Francisco, CA
2 days ago
Robotic Engineering M
...lives. About the Role We’re looking for a GPU Inference Engineer to contribute to improvements in model serving efficiency for our Robotics research. This is a... ...initiatives to optimize inference performance and scalability. You’ll also be engaged in model design, to help...
Work at office
Relocation package
OpenAI
San Francisco, CA
16 hours ago
Robotics GPU Inference Engineer — Hybrid (Relocation)
OpenAI is seeking a GPU Inference Engineer based in San Francisco, CA. In this high-impact role, you'll optimize inference performance and scalability for Robotics research, driving engineering efforts to enhance model serving and system efficiency. The ideal candidate...
Work at office
Relocation
Relocation package
OpenAI
San Francisco, CA
16 hours ago
Customer Engineer
...ecosystem of libraries for scalable machine learning. Companies... ...the role The Customer Engineer will play a crucial role in... ...and passionate about ML/AI, LLM, vLLM and the role of AI in... ...for training, fine-tuning and inference/serving of LLMs ~ Experience running...
Anyscale, Inc
San Francisco, CA
1 day ago
Backend Integration Engineer (AI/Model Services)
$74.38 - $83.8 per hour
...Specialty Software Engineer - API Developer - GenAI... ...responsible for creating the scalable service layer that... ...Area (Hybrid – 3 days onsite) Required Skills &... ...Exposure to model serving or inference gateways Microservices... ...Kubernetes GenAI, LLM, or agent-based frameworks...
Full time
Contract work
Temporary work
Flexible hours
Motion Recruitment
San Francisco, CA
3 days ago
Staff ML Performance & Systems Engineer — Scalable Inference
$180k - $250k
...looking for a skilled individual to help maintain generative media models' performance. You will design and implement innovative model serving architectures while working with the Applied ML team and customers. The ideal candidate has expertise in systems programming and...
fal
San Francisco, CA
16 hours ago
Inference Engineer
...Job Description Machine Learning Engineer, Inference Want to solve realtime inference problems where milliseconds genuinely matter?... ...bottlenecks Optimising TensorRT, Triton, ONNX Runtime, and custom serving systems Managing KV cache systems, speculative decoding,...
Remote work
Flexible hours
techire ai
San Francisco, CA
16 hours ago
GPU Kernel Engineer
...Sciforium Gpu Kernel Engineer Sciforium is an AI... ...proprietary, high-efficiency serving platform. Backed by... ...-scale training and inference. This role is ideal... ...to the efficiency and scalability of our ML platform.... ...focus on large-scale LLM training and inference...
Flexible hours
Sciforium
San Francisco, CA
4 days ago
Founding Engineer, ML Inference
...team of YC and unicorn founders and senior engineers with deep expertise in 3D, generative... ...'re looking for a Founding Engineer, ML Inference with deep expertise in high-performance... ...media models. You'll work across the model-serving stack, designing novel inference...
Relocation
Visa sponsorship
Relocation package
Reactor
San Francisco, CA
3 days ago
INFERENCE ENGINEER
...Francisco, on‑site ABOUT THE ROLE You build and operate the inference systems that serve our models in production. The work spans serving... ...infrastructure that come with running real workloads. This is an engineering role, not a research role. You'll measure, profile, debug...
MakerMaker.AI
San Francisco, CA
4 days ago
LLM Inference Engineer: Frameworks & Optimizations
$160k - $230k
Together AI is seeking an Inference Frameworks and Optimization Engineer in San Francisco, California. The role focuses on designing and optimizing distributed inference engines, ensuring efficient deployment of large language models and vision models. The ideal candidate...
Together AI
San Francisco, CA
1 day ago
Founding Engineer
...to accelerate software engineering. Today, engineers spend... ...where he led AI model serving and AI agent initiatives... ...automatically evaluate LLM outputs and improve agentic... ...tradeoffs Build scalable infra to support long-running... ...work permit Work in SF in person 3 days a week...
Work at office
3 days per week
Embedding VC
San Francisco, CA
16 hours ago
LLM Inference & Optimization Engineer
Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed inference engines that support multimodal models, optimizing frameworks for low...
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
4 days ago
LLM Inference & Model-Performance Engineer
A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT....
Baseten
San Francisco, CA
2 days ago
QA Engineering Tech
...platform delivers AI inference. Validating whether inference... ...for a dedicated QA engineer who can own the... ...Develop and run load and scalability tests using Locust to... ...strategies that account for LLM inference. Work... ...Working knowledge of LLM serving. ~ Strong experience...
Worldwide
Flexible hours
FriendliAI Corp
San Francisco, CA
16 hours ago
Distributed Systems Engineer, Data & Inference Platform
...compute into useful intelligence - the inference services that serve LLMs at scale and the data pipelines... ...about both. Researchers and ML engineers will hand you workloads that barely run... ...~ Bonus: hands-on experience with LLM inference engines (vLLM, SGLang, TensorRT...
Flexible hours
Adaption
San Francisco, CA
22 days ago
Senior ML Inference Engineer Production Systems
...looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and... ...3+ years of experience in production-grade serving infrastructure, be fluent in Python, and have...
MakerMaker.AI
San Francisco, CA
4 days ago
Customer Engineer (West Coast)
$116k - $195k
...build the foundation for agent engineering in the real world, helping... ...team. In this role, you will serve as a trusted technical advisor... ...engineering best practices Develop scalable enablement frameworks... ...of the modern AI/LLM stack ~ Passion for educating...
Work at office
Flexible hours
LangChain, Inc
San Francisco, CA
4 days ago
Implementation Engineer
$110k - $140k
...Role The Implementation Engineer is a key bridge between the Business... ...efficiently translated into scalable technical solutions. Deeply... .... Hands-on exposure to LLM orchestration frameworks and... ...reflects the communities we serve. We welcome and encourage qualified...
Work at office
Monks Limited
San Francisco, CA
16 hours ago
Distinguished Engineer (Messaging & Marketing Technology)
$269.1k - $307.2k
...Distinguished Engineer (Messaging & Marketing Technology) As a Distinguished... ..., and Turbopuffer to power LLM-driven tooling. This role... ...design and implementation Serve as an authoritative expert on... ..., such as performance, scalability and operability Continue learning...
Full time
Part time
Local area
Capital One
San Francisco, CA
16 days ago
Full Stack Software Engineer (Piq Energy / Remote/Hybrid US)
$150k - $250k
...Full Stack Software Engineer United States (Remote)... ...Backend: Python, FastAPI, LLM agents, Postgres,... ...and Athens, Greece. Our SF team works in-person in... ...platform is performant, scalable, and secure Provide... ...environment. Humility: Serve a greater goal, stay...
Full time
Remote work
2 days per week
3 days per week
Active Impact Investments
San Francisco, CA
4 days ago
Founding Engineer
...of 3 looking to expand to 5. Serving hundreds of requests for clinics... ...and scaling our Voice AI LLM and orchestration system. Specific... ...looking for Frontend‑focused engineers - the majority of our work optimizes... ...work - we will be in the SF office 3-4 days a week. What We...
Work at office
Remote work
Flexible hours
3 days per week
Health Harbor
San Francisco, CA
1 day ago
