Senior AI Inference Engineer - GPU, Rust & CUDA

$220k

Perplexity

Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in software engineering with a focus on ML inference, familiarity with deep learning frameworks, and a strong understanding of GPU architectures. Compensation ranges from $220K to $485K.

Vacancy posted 1 day ago

Similar jobs that could be interesting for you
Based on the Senior AI Inference Engineer - GPU, Rust & CUDA in San Francisco, CA vacancy

Senior AI Inference Infra Engineer—GPU Cluster Ops
Acceler8 Talent in San Francisco is partnering with a rapidly growing AI infrastructure company to own the core cluster infrastructure powering a heterogeneous AI cloud. You will manage large CPU/GPU/accelerator clusters, bare‑metal provisioning, and production readiness...
Senior
Acceler8 Talent
San Francisco, CA
18 hours ago
Senior Inference Performance Engineer - GPU & CUDA
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques...
Senior
inference.net
San Francisco, CA
2 days ago
Rust AI Inference Engineer for Local LLMs
OpenInfer is looking for a Full-Time AI Software Engineer specialized in Rust to enhance our local AI inference systems. You will work on developing and maintaining the LLM inference engine and collaborating with product teams to integrate new features across major client...
Suggested
Full time
Local area
Flexible hours
OpenInfer
San Francisco, CA
18 hours ago
Senior AI Inference Performance Engineer (Remote)
A leading cloud infrastructure company is seeking a Senior Engineer 2 to join their AI Inference Optimization team. The role involves leading the technical... ...high-performance computing and a strong understanding of GPU architectures. The position offers a competitive salary...
Senior
Remote job
DigitalOcean
San Francisco, CA
2 days ago
Member of Technical Staff (AI Inference Engineer)
$220k
We build and run the inference engine behind every Perplexity query and deploy dozens of model... ...and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer... ...management to support in API Gateway. GPU kernels migration to CuTe DSL. Port...
Suggested
Perplexity
San Francisco, CA
2 days ago
Senior Inference Performance Engineer — GPU & CUDA
$220k - $320k
A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation...
Senior
Local area
Inference
San Francisco, CA
1 day ago
Senior Backend Engineer, Inference Platform
$160k - $250k
Together AI is building the Inference Platform that brings the most advanced generative... ...data centers and model engine pods. Develop auto‑... ...in one or more of: Rust, Go, Python, or TypeScript... ...plus. Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies...
Senior
Full time
Local area
Together AI
San Francisco, CA
18 hours ago
AI Infrastructure Engineer: Scalable GPU Inference, On-Site
An innovative studio is seeking an AI Infrastructure Engineer to enhance their ML infrastructure for groundbreaking anime games. This role involves designing and implementing cutting-edge inference architectures to support various platforms. As part of a small, agile team...
Worldwide
Spellbrush
San Francisco, CA
18 hours ago
Senior AI Runtime Engineer: Large-Scale GPU Training
Gravity Engineering Services Pvt Ltd. is seeking a Senior Software Engineer for AI Runtime in San Francisco, California. The role involves driving the architecture of a managed GPU training platform, solving complex training challenges, and enhancing performance and reliability...
Senior
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
18 hours ago
Senior AI Runtime Engineer: Scale GPU Training
$160k - $225k
Cacheflow is seeking a Senior Software Engineer for AI Runtime at Databricks, located in San Francisco. You will be instrumental in building and scaling systems for large-scale GPU training, ensuring high throughput and resilience in training across expansive fleets of...
Senior
Cacheflow
San Francisco, CA
4 days ago
Senior AI Infrastructure Engineer - HPC & GPU Clusters
Accenture is seeking a seasoned AI Infrastructure Architect in San Francisco to design and implement scalable AI infrastructure, including... ...for enterprise clients. The role requires deep experience with GPU/DPUs, networking, storage, and orchestration tools, plus strong...
Senior
Accenture
San Francisco, CA
18 hours ago
Senior AI Inference Data Plane Engineer — Remote
$167.2k - $209k
A leading cloud service provider is seeking a Senior Engineer 2 for their AI Inference Data Plane team. This remote role focuses on designing and developing high-scale, resilient data plane services that enhance AI-driven applications. The ideal candidate will have strong...
Senior
Remote job
DigitalOcean
San Francisco, CA
1 day ago
Senior AI Inference Systems Engineer
Acceler8 Talent is seeking a Senior Software Engineer to design and optimize AI inference systems for production workloads. You’ll work across runtime behavior, scheduling, memory management and system performance to deliver faster, more scalable AI inference. You’ll collaborate...
Senior
Acceler8 Talent
San Francisco, CA
18 hours ago
Senior ML Infra Engineer: GPU Inference & 99.99% Uptime
A technology company in San Francisco is seeking an experienced Infrastructure Engineer to ensure 99.99% uptime while working with custom inference stacks and managing GPU loads. The ideal candidate will have deep infrastructure expertise and a passion for tackling complex...
Senior
Morph Inc.
San Francisco, CA
18 hours ago
Senior AI Security Engineer - Platform & Inference
Lightning AI in San Francisco or Seattle is seeking a Senior Application Security Engineer to secure our AI/ML platforms and inference services. You will work with platform, ML, and infrastructure teams to identify risks and implement secure architectures. The role emphasizes...
Senior
Lightning AI
San Francisco, CA
5 days ago
Senior Inference Platform Engineer - Data Center
$300k
...startup building an AI and cloud platform,... ...model training, or inference. Our client... ...operates high-performance GPU clusters powering... ...operate inference engines such as vLLM, SGLang... ...in Python, Go, Rust, or a comparable language... ...software stacks (CUDA, Triton, NCCL) and...
Senior
Permanent employment
Worldwide
San Francisco, CA
more than 2 months ago
Senior GPU ML Infra Engineer — Mid-Training & Inference
A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate...
Senior
Reflection AI
San Francisco, CA
2 days ago
Staff + Senior Software Engineer, Inference Deployment
$320k
...interpretable, and steerable AI systems. We want AI to... ...committed researchers, engineers, policy experts, and... ...’s mandate is to make inference deployment boring and... ...into production across GPU, TPU, and Trainium fleets... ...with Python and/or Rust in production systems...
Senior
Work at office
Visa sponsorship
Flexible hours
Shift work
Anthropic
San Francisco, CA
18 hours ago
Senior AI Infrastructure Engineer - Training Platform
$216k - $270k
As a Software Engineer on the Machine Learning Infrastructure... ...” for our large-scale GPU clusters. You will... ...compute into breakthrough AI. You will: Architect and... ...languages (e.g., Python, Go, Rust, C++). Experience with... ...and hardware stack (CUDA, NCCL). Experience with...
Senior
Full time
For contractors
Scale AI
San Francisco, CA
18 hours ago
Senior Backend Engineer - GPU Inference & Real-time Systems
...leading design technology company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure). You will architect... ...demand grows. This role involves optimizing APIs, managing GPU workloads, and collaborating with cross-functional teams....
Senior
Vizcom
San Francisco, CA
18 hours ago
Staff+ Software Engineer, Inference Runtime
...role Anthropic's Inference organization... ...efficiency that frontier AI demands. We... ...for a Staff Engineer to be a technical... ...builds on. This is a senior IC role with... ...performance‑sensitive Rust and Python... ...management - across GPU, TPU, and... ...accelerator ecosystem (CUDA/GPU, TPU, or...
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
18 hours ago
AI Inference Support Engineer - GPU & HPC
Together AI is looking for a Customer Support Engineer in San Francisco, CA. In this role, you will assist customers with complex technical challenges involving cutting-edge AI solutions. The position requires a strong technical background and excellent communication skills...
Remote work
Together AI
San Francisco, CA
18 hours ago
Staff AI Inference Kernel Engineer
Sail builds the world’s most efficient software for inference and agent hosting. In this role, you’ll own token processing down to the lowest... ...design and implement exotic parallelism schemes, write custom GPU kernels for regimes like cascade attention, and understand every...
Sail
San Francisco, CA
1 day ago
Lead AI Infrastructure Engineer: GPU Clusters & Reliability
Luma AI in San Francisco is seeking a leader to define reliability for a frontier AI infrastructure. You will architect and operate large GPU environments, pushing the limits of training and inference while partnering with research and product to scale systems and improve...
Luma AI
San Francisco, CA
2 days ago
Senior AI Inference Deployment Engineer
$320k
Anthropic in San Francisco is seeking a Software Engineer for their Launch Engineering team. You'll design infrastructure to deploy AI models efficiently and manage resource constraints while ensuring high reliability. Ideal candidates should have robust software engineering...
Senior
Anthropic
San Francisco, CA
18 hours ago
Senior Principal AI Agent / ML Software Engineer (OCI)
$96.8k - $306.4k
...Job Description The Senior Principal AI Agent / ML Software Engineer is a Senior Staff-level, hands‑on technical leadership... ..., autonomous workflows, scalable inference infrastructure, and enterprise AI... ...for low latency, high throughput, GPU efficiency, reliability, cost,...
Senior
Temporary work
Flexible hours
Oracle
San Francisco, CA
4 days ago
Staff AI Research Infra Engineer for GPU Clusters
Gravity Engineering Services Pvt Ltd. is seeking a Staff Software Engineer for AI Research Infrastructure. You will develop and run a... ...orchestrate large-scale training and inference workloads across thousands of... ...languages like C++, Rust, and Go. Join us to influence...
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
18 hours ago
Staff/Senior Software AI engineer: Predictive Analytics
...and scaling predictive analytics and AI systems on top of Amperesand’s IoT platform... ...including data ingestion, feature engineering, model training, inference, deployment, and monitoring. Develop... ...skills in Python; Go, Java, or Rust is a plus. Experience with ML frameworks...
Senior
Work experience placement
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
18 hours ago
Senior Staff AI Engineer
...and scale revolutionary AI‑powered enterprise... ...seeking an experienced AI Engineer with deep expertise in... ...to join our team as a Senior Staff Architect. In this... ...techniques, including inference‑time search, chain‑of‑thought... ...(Kubernetes, GPU/TPU clusters, and cloud...
Senior
Flexible hours
JazzX AI
San Francisco, CA
18 hours ago
Platform Engineer, GPU Inference & Training Orchestration
Perplexity is seeking an experienced platform engineer to own a unified, self-serve compute platform for training and inference workloads. You will design systems that launch... ...training jobs and operate inference services without GPU provisioning burdens. You will manage the GPU...
San Francisco, CA
2 days ago
