LLM Inference Engineer: Scalable Serving (SF Onsite)
Gravity Engineering Services Pvt Ltd.
Gravity Engineering Services Pvt Ltd. is seeking a talented individual in San Francisco to architect and implement robust, scalable inference systems for AI models. This in-person role focuses on optimizing model serving infrastructures for high throughput and low latency. The ideal candidate has deep expertise in Python, PyTorch, and modern inference systems. You'll work closely with the research team to bring innovative capabilities into production, along with building tools to support rapid experimentation.
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together... ...to enable efficient and scalable inference for large language models... ...parallelism for high-performance serving. Apply CUDA graph optimizations... Suggested, Full time
$167.2k - $209k
...their drive to build the simplest scalable cloud. If you have a growth... ...applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role,... ...with distributed inference serving frameworks such as llm‑d, NVIDIA Dynamo, or Ray Serve.... Suggested, Local area, Remote work, Worldwide, Flexible hours
- ...source project that is creating an ecosystem of libraries for scalable machine learning. Companies like OpenAI, Uber, Spotify,... ...0+ million raised to date. About the role As a Distributed LLM Inference Engineer, you will help with systems and optimizations that push the... Suggested, Work at office
- Gravity Engineering Services Pvt Ltd. is looking for a Distributed LLM Inference Engineer to join their team. This critical role focuses on enhancing performance for ML inference, ensuring scalability and efficiency in solutions used by both open-source and corporate clients... Suggested
- ...Member of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves designing end-to-... ...real-world workloads. Candidates should have strong software engineering skills, experience with ML inference systems, and proficiency... Suggested
- Anyscale is seeking a Distributed LLM Inference Engineer in San Francisco, California. This pivotal role involves pushing the boundaries of performance for ML inference at scale. You'll work closely with product teams to deliver end-to-end solutions while leveraging open...
$350k
...growing group of committed researchers, engineers, policy experts, and business leaders... .... About the Role Anthropic's inference fleet serves Claude to millions of users across our... ...or inference infrastructure or general LLM serving stacks. Direct large-scale... Work at office, Visa sponsorship, Flexible hours
- Location NYC, SF Employment Type Full time Location... ...our Forward Deployed Engineering function. This team... ...orchestrating complex LLM workflows, integrating... ...internal codebase for inference, fine‑tuning, and... ...on‑premises), ensuring scalability, performance, and reliability... Full time, Relocation package
- ...backend developers, security engineers, data engineers, data scientists... ...team's mandate is to build scalable bottoms-up growth motions across... ...engineering to content creation Serving as a technical bridge between... ...and attend events in the SF Bay Area (and around the US when...
- ...lives. About the Role We’re looking for a GPU Inference Engineer to contribute to improvements in model serving efficiency for our Robotics research. This is a... ...initiatives to optimize inference performance and scalability. You’ll also be engaged in model design, to help... Work at office, Relocation package
- OpenAI is seeking a GPU Inference Engineer based in San Francisco, CA. In this high-impact role, you'll optimize inference performance and scalability for Robotics research, driving engineering efforts to enhance model serving and system efficiency. The ideal candidate... Work at office, Relocation, Relocation package
- ...ecosystem of libraries for scalable machine learning. Companies... ...the role The Customer Engineer will play a crucial role in... ...and passionate about ML/AI, LLM, vLLM and the role of AI in... ...for training, fine-tuning and inference/serving of LLMs ~ Experience running...
$74.38 - $83.8 per hour
...Specialty Software Engineer - API Developer - GenAI... ...responsible for creating the scalable service layer that... ...Area (Hybrid – 3 days onsite) Required Skills &... ...Exposure to model serving or inference gateways Microservices... ...Kubernetes GenAI, LLM, or agent-based frameworks... Full time, Contract work, Temporary work, Flexible hours
$180k - $250k
...looking for a skilled individual to help maintain generative media models' performance. You will design and implement innovative model serving architectures while working with the Applied ML team and customers. The ideal candidate has expertise in systems programming and...
- ...Job Description Machine Learning Engineer, Inference Want to solve realtime inference problems where milliseconds genuinely matter?... ...bottlenecks Optimising TensorRT, Triton, ONNX Runtime, and custom serving systems Managing KV cache systems, speculative decoding,... Remote work, Flexible hours
- ...Sciforium Gpu Kernel Engineer Sciforium is an AI... ...proprietary, high-efficiency serving platform. Backed by... ...-scale training and inference. This role is ideal... ...to the efficiency and scalability of our ML platform.... ...focus on large-scale LLM training and inference... Flexible hours
- ...team of YC and unicorn founders and senior engineers with deep expertise in 3D, generative... ...'re looking for a Founding Engineer, ML Inference with deep expertise in high-performance... ...media models. You'll work across the model-serving stack, designing novel inference... Relocation, Visa sponsorship, Relocation package
- ...Francisco, on‑site ABOUT THE ROLE You build and operate the inference systems that serve our models in production. The work spans serving... ...infrastructure that come with running real workloads. This is an engineering role, not a research role. You'll measure, profile, debug...
$160k - $230k
Together AI is seeking an Inference Frameworks and Optimization Engineer in San Francisco, California. The role focuses on designing and optimizing distributed inference engines, ensuring efficient deployment of large language models and vision models. The ideal candidate...
- ...to accelerate software engineering. Today, engineers spend... ...where he led AI model serving and AI agent initiatives... ...automatically evaluate LLM outputs and improve agentic... ...tradeoffs Build scalable infra to support long-running... ...work permit Work in SF in person 3 days a week... Work at office, 3 days per week
- Gravity Engineering Services Pvt Ltd. is looking for an Inference Frameworks and Optimization Engineer to enhance the performance of AI infrastructure. This role involves designing distributed inference engines that support multimodal models, optimizing frameworks for low...
- A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT....
- ...platform delivers AI inference. Validating whether inference... ...for a dedicated QA engineer who can own the... ...Develop and run load and scalability tests using Locust to... ...strategies that account for LLM inference. Work... ...Working knowledge of LLM serving. ~ Strong experience... Worldwide, Flexible hours
- ...compute into useful intelligence - the inference services that serve LLMs at scale and the data pipelines... ...about both. Researchers and ML engineers will hand you workloads that barely run... ...~ Bonus: hands-on experience with LLM inference engines (vLLM, SGLang, TensorRT... Flexible hours
- ...looking for a Senior Machine Learning Systems Engineer in San Francisco. In this role, you will build and operate production inference systems, optimizing for performance and... ...3+ years of experience in production-grade serving infrastructure, be fluent in Python, and have...
$116k - $195k
...build the foundation for agent engineering in the real world, helping... ...team. In this role, you will serve as a trusted technical advisor... ...engineering best practices Develop scalable enablement frameworks... ...of the modern AI/LLM stack ~ Passion for educating... Work at office, Flexible hours
$110k - $140k
...Role The Implementation Engineer is a key bridge between the Business... ...efficiently translated into scalable technical solutions. Deeply... .... Hands-on exposure to LLM orchestration frameworks and... ...reflects the communities we serve. We welcome and encourage qualified... Work at office
$269.1k - $307.2k
...Distinguished Engineer (Messaging & Marketing Technology) As a Distinguished... ..., and Turbopuffer to power LLM-driven tooling. This role... ...design and implementation Serve as an authoritative expert on... ..., such as performance, scalability and operability Continue learning... Full time, Part time, Local area
$150k - $250k
...Full Stack Software Engineer United States (Remote)... ...Backend: Python, FastAPI, LLM agents, Postgres,... ...and Athens, Greece. Our SF team works in-person in... ...platform is performant, scalable, and secure Provide... ...environment. Humility: Serve a greater goal, stay... Full time, Remote work, 2 days per week, 3 days per week
- ...of 3 looking to expand to 5. Serving hundreds of requests for clinics... ...and scaling our Voice AI LLM and orchestration system. Specific... ...looking for Frontend‑focused engineers - the majority of our work optimizes... ...work - we will be in the SF office 3-4 days a week. What We... Work at office, Remote work, Flexible hours, 3 days per week