Cloud Inference Engineer

SupportFinity

Qualifications
CUDA + GPU inference optimization
vLLM, SGLang, or TensorRT-LLM experience
KV caching, paged attention, batching, token streaming, etc.
Distributed compute (with GPUs is a super plus)
No degree required

Company
Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production ready with one line.

Role
Founding, on site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day To Day Responsibilities
Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
Conduct model performance reviews
Improve scheduler, batcher, autoscaling; profile latency, cost, utilization
Sometimes write kernels

Vacancy posted 1 day ago

Similar jobs that could be interesting for you
Based on the Cloud Inference Engineer in San Francisco, CA vacancy

Distributed Systems Engineer, Data & Inference Platform
...systems that turn raw compute into useful intelligence - the inference services that serve LLMs at scale and the data pipelines that... ...call pager that keeps you honest about both. Researchers and ML engineers will hand you workloads that barely run; you'll hand them back...
Suggested
Flexible hours
Adaption
San Francisco, CA
12 days ago
Senior ML Platform Engineer - Remote, Scalable Inference
$230k - $265k
...Parafin is seeking a Software Engineer to lead the evolution of their ML Platform, ensuring robust and scalable systems for data scientists... ...and maintain core platform functionalities, enhance real-time inference processes, and collaborate across teams to ensure quality. A...
Suggested
Remote work
Parafin Inc
San Francisco, CA
2 days ago
Senior Backend Engineer, Inference Platform
$160k - $250k
...Senior Backend Engineer, Inference Platform San Francisco About the Role Together AI is building the Inference Platform that brings the most advanced generative AI models to the world. Our platform powers multi-tenant serverless workloads and dedicated endpoints...
Suggested
Full time
Local area
Together AI
San Francisco, CA
4 days ago
Group Product Manager, Managed Inference & AI Cloud
...You will manage the complete product lifecycle for our Managed Inference offerings and work on launching scalable AI services. The ideal... ...management, excellent communication skills, and a strong background in cloud infrastructures like AWS or Azure. This role offers competitive...
Suggested
Crusoe Energy Systems LLC
San Francisco, CA
1 day ago
Senior Backend Architect - Real-Time AI Inference
...URun in San Francisco is seeking an experienced backend engineer to build services for its stateful real-time inference platform. You will work closely with product, infrastructure, and AI teams in an early-stage environment that values innovation and efficiency. This...
Suggested
U-Run
San Francisco, CA
2 days ago
Technical Program Manager, Cloud Inference
$290k
...growing group of committed researchers, engineers, policy experts, and business leaders working... ...Program Manager to support our critical cloud deployments. In this role you will be an... ...customers on it. Background in ML inference, model serving infrastructure, or...
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
5 days ago
Founding Cloud Inference Engineer (Low-Latency AI Serving)
...A pioneering AI technology firm in San Francisco is seeking a founding member to optimize and serve models on Luminal Cloud. The role involves deploying models with advanced optimization techniques, conducting performance reviews, and enhancing scheduling processes. Ideal...
SupportFinity
San Francisco, CA
2 days ago
Cloud-Scale AI Inference Architect
...role involves designing large-scale deployment architectures, solving AI inference challenges, and collaborating closely with customers' DevOps teams. Ideal candidates will have 3+ years in cloud infrastructure or DevOps, strong skills in Kubernetes, Docker, Terraform,...
Flexible hours
FriendliAI
San Francisco, CA
1 day ago
Senior Engineer, AI Inference Platform
$139.2k - $174k
A leading cloud services provider is looking for a Senior Engineer 2 to join their AI Infrastructure Control Plane team. This role involves architecting high-quality software solutions for AI workloads while driving design and operational excellence. Candidates should...
Remote work
DigitalOcean
San Francisco, CA
2 days ago
Staff + Sr. Software Engineer, Cloud Inference
$320k
...Staff + Sr. Software Engineer, Cloud Inference San Francisco, CA About Anthropic Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly...
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
3 days ago
Platform Workload Engineer: AI Inference & Benchmarking
A leading AI technology firm in San Francisco seeks an SW Engineer to enable production workloads and performance testing on advanced platforms. Responsibilities include validating key machine learning models, developing benchmarks, and collaborating with engineering teams...
AI Chopping Block, Inc.
San Francisco, CA
2 days ago
Realtime Inference Engineer Scalable AI Serving
...Cartesia is looking for an Inference Engineer in San Francisco to enhance real-time multimodal intelligence. You will design and build scalable, low-latency model inference systems while collaborating with researchers. The ideal candidate has strong engineering skills...
Flexible hours
Cartesia, Inc.
San Francisco, CA
2 days ago
Senior Engineer 2: AI Inference Engine Systems
$167.2k - $209k
...relentless in their drive to build the simplest scalable cloud. If you have a growth mindset, naturally like to think big... ...of AI-driven applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role, you will be a key technical...
Local area
Remote work
Worldwide
Flexible hours
DigitalOcean
San Francisco, CA
6 days ago
Senior Inference Performance Engineer GPU & CUDA
$220k - $320k
...A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation...
Local area
Inference
San Francisco, CA
3 days ago
GPU Kernel Engineer for AI Inference & Performance
...FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
FriendliAI
San Francisco, CA
1 day ago
Real-Time GPU Inference Optimization Engineer
$300k
...leading technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal... ...understanding of GPU execution, and a knack for optimizing inference latency for large generative models. With a competitive base...
Visa sponsorship
Relocation package
Trades Workforce Solutions
San Francisco, CA
2 days ago
Real-Time LLM Inference & Speech Serving Engineer
$180k - $270k
...Plaud is seeking talented individuals for AI infrastructure roles in San Francisco, focusing on building high-performance inference engines for speech AI. Ideal candidates will have substantial experience in GPU architecture and real-time systems. This position offers...
Plaud
San Francisco, CA
2 days ago
Inference Engineer
...team combines deep expertise in model innovation and systems engineering paired with a design-minded product engineering team to build... ...world's foremost experts in AI. About the Role We're hiring an Inference Engineer to advance our mission of building real-time multimodal...
Work at office
Visa sponsorship
Flexible hours
Cartesia, Inc.
San Francisco, CA
1 day ago
INFERENCE ENGINEER
...ABOUT THE ROLE You build and operate the inference systems that serve our models in... ...with running real workloads. This is an engineering role, not a research role. You'll measure... ...serving frameworks Experience with mixed cloud and on‑premises deployments Familiarity...
MakerMaker.AI
San Francisco, CA
1 day ago
LLM Inference Frameworks and Optimization Engineer
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together.ai, we are building state-of-the-art infrastructure to enable efficient and scalable inference for large language models (LLMs). Our mission is to...
Full time
Together AI
San Francisco, CA
14 days ago
Engineering Manager, Cloud Platform
...BASETEN Baseten powers mission-critical inference for the world's most dynamic AI... ...Conviction. Join us and help build the platform engineers turn to to ship AI products. THE ROLE... ...the Engineering Manager for Baseten's Cloud Platform team, you will directly manage...
Flexible hours
Baseten
San Francisco, CA
5 days ago
Distributed LLM Inference Engineer
...to be backed by Andreessen Horowitz, NEA, and Addition with $250+ million raised to date. About the role As a Distributed LLM Inference Engineer, you will help with systems and optimizations that push the boundaries of performance for inference at large scale. This is...
Work at office
Anyscale
San Francisco, CA
2 days ago
LLM Inference Engineer: Frameworks & Optimizations
$160k - $230k
Together AI is seeking an Inference Frameworks and Optimization Engineer in San Francisco, California. The role focuses on designing and optimizing distributed inference engines, ensuring efficient deployment of large language models and vision models. The ideal candidate...
Together AI
San Francisco, CA
1 day ago
Multimodal Inference Engineer — Scale GPU AI Models
An innovative company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing and implementing infrastructure for large-scale multimodal models, focusing on high-performance delivery of audio and image inputs. You'll collaborate...
OpenAI
San Francisco, CA
1 day ago
Edge Inference Engineer: Optimize On-Device AI Kernels
Liquid AI is seeking a Systems Programmer to join their Edge Inference team in San Francisco. In this role, you will implement and optimize inference kernels on various hardware, ensuring efficiency and performance. Ideal candidates have over 5 years of systems programming...
Flexible hours
Liquid AI
San Francisco, CA
4 days ago
Senior Inference Performance Engineer - GPU & CUDA
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques...
inference.net
San Francisco, CA
2 days ago
Founding Engineer, ML Inference
...Join a small, focused team of YC and unicorn founders and senior engineers with deep expertise in 3D, generative video, developer... ...possible. About the Role We're looking for a Founding Engineer, ML Inference with deep expertise in high-performance ML engineering. This...
Relocation
Visa sponsorship
Relocation package
Reactor
San Francisco, CA
3 days ago
Inference Engineer
...stealth, the company has already reached eight-figure revenue, raised an $80M Series A, and is scaling a world-class engineering team across inference, distributed systems, compiler infrastructure, and high-performance AI compute. Their platform automatically maps complex...
Acceler8 Talent
San Francisco, CA
22 hours ago
LLM Inference & Model-Performance Engineer
...A leading AI platform company in San Francisco is seeking a Software Engineer focused on machine learning performance. This role involves implementing advanced techniques for ML model inference and debugging performance issues with frameworks like PyTorch and TensorRT...
BaseTen
San Francisco, CA
1 day ago
Remote Realtime Speech Inference Engineer
Machine Learning Engineer, Inference Want to solve realtime inference problems where milliseconds genuinely matter? This role is with a fast-growing voice AI company building the realtime speech infrastructure layer behind hundreds of millions of production conversations...
Remote job
Flexible hours
Trades Workforce Solutions
San Francisco, CA
12 hours ago
