Senior AI Inference Engineer - GPU, Rust & CUDA

$220k

Perplexity

Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in software engineering with a focus on ML inference, familiarity with deep learning frameworks, and a strong understanding of GPU architectures. Compensation ranges from $220K to $485K.

Vacancy posted 2 days ago

Similar jobs that could be interesting for you
Based on the Senior AI Inference Engineer - GPU, Rust & CUDA in San Francisco, CA vacancy

Software Engineer, Inference GPU Enablement
...About the Team OpenAI’s Inference team ensures that our... ...Role We’re hiring engineers to scale and optimize OpenAI... ...across emerging GPU platforms. You’ll work... ...GPU kernels using HIP, CUDA, or Triton, and care deeply... ...OpenAI OpenAI is an AI research and deployment...
Suggested
Full time
OpenAI
San Francisco, CA
1 day ago
Software Engineer, Inference - CUDA / Kernels
...scale. As part of the inference team, you’ll be responsible... ...for a kernel-focused engineer to lead efforts in... ...porting, and optimizing GPU kernels used in inference... ...deep familiarity with CUDA or equivalent kernel programming... ...OpenAI OpenAI is an AI research and deployment...
Suggested
Full time
OpenAI
San Francisco, CA
1 day ago
Senior Inference Platform Engineer - Data Center
$300k
...startup building an AI and cloud platform,... ...model training, or inference. Our client... ...operates high-performance GPU clusters powering... ...operate inference engines such as vLLM, SGLang... ...in Python, Go, Rust, or a comparable language... ...software stacks (CUDA, Triton, NCCL) and...
Senior
Permanent employment
Worldwide
San Francisco, CA
more than 2 months ago
Senior AI/ML Engineer
...Our client is a well-funded AI startup building production-... ...customers. They are looking for a Senior AI/ML Engineer to own model training... ...pipelines, evaluation systems, and inference serving at scale. Full-time,... ...with distributed training, GPU optimization, or inference...
Senior
Full time
Clera
San Francisco, CA
1 day ago
Senior Staff AI Engineer
...The role: SoFi’s Staff AI Engineer is a hands-on AI engineering... ...organization. This is a critical, senior role responsible for setting... ...high-throughput, low-latency inference across diverse hardware... ...managing the underlying Kubernetes/GPU orchestration for custom...
Senior
Full time
Sofi
San Francisco, CA
1 day ago
Sr. Lead AI Engineer (Inference Optimization, FM hosting, AI Platform)
$229.9k - $262.4k
...Sr. Lead AI Engineer (Inference Optimization, FM hosting, AI Platform) Overview: At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital One has been an industry leader in using machine learning to create...
Senior
Full time
Part time
Local area
Capital One
San Francisco, CA
6 days ago
Staff + Senior Software Engineer, Cloud Inference
$300k
...interpretable, and steerable AI systems. We want AI to be safe... ...group of committed researchers, engineers, policy experts, and business... ...About the Role The Cloud Inference team scales and optimizes Claude... ...Proficiency in Python or Rust The annual compensation...
Senior
Full time
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
1 day ago
Software Engineer, Inference - TL
...access state-of-the-art AI models - unlocking new... ...high-performance model inference and accelerating research... ...this role, you’ll lead engineering efforts to ensure our... ...responsible for shaping our CUDA strategy, driving... ...Mentor engineers on GPU performance, CUDA development...
Full time
OpenAI
San Francisco, CA
1 day ago
Software Engineer - GPU Kernels
...Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor,... ...help build the platform engineers turn to to ship AI... ...ROLE We’re seeking a GPU Kernel Engineer to join our... ...Write and optimize code using CUDA, PTX assembly, and architecture...
Full time
Flexible hours
Baseten
San Francisco, CA
1 day ago
Software Engineer, Inference Deployment
$320k
...interpretable, and steerable AI systems. We want AI to... ...committed researchers, engineers, policy experts, and... ...Our mandate is to make inference deployment boring and... ...into production across GPU, TPU, and Trainium fleets... ...with Python and/or Rust in production systems...
Full time
Work at office
Visa sponsorship
Flexible hours
Shift work
Anthropic
San Francisco, CA
1 day ago
Software Engineer, Inference
...are a small, fast-growing team of engineers in San Francisco powering Fortune 1... ...Specialize in low-latency, high-throughput inference for OCR and multimodal models. Own... ...~ Strong Python, plus C++ or CUDA exposure ~ Experience with GPU profiling and model serving Nice...
Full time
Work at office
Visa sponsorship
Relocation package
Pulse
San Francisco, CA
1 day ago
AI Engineer — LLM Infra
...with the web by building AI agents that can... ...Scale infra for agentic inference (throughput and latency... ...Work closely with product engineers to translate cutting‑edge... ...with ML infrastructure (GPU clusters) and... ...systems experience (Triton, CUDA) High IQ, high EQ, high...
Work at office
Relocation
Visa sponsorship
Yutori
San Francisco, CA
1 day ago
Software Engineer, Model Inference
...About the Team Our Inference team brings OpenAI’s most... ...access our state-of-the-art AI models, allowing them... ...We are looking for an engineer who wants to take the... ...every FLOP and every GB of GPU RAM of our hardware.... ...optimize them (e.g. NCCL, CUDA), as well as HPC...
Full time
OpenAI
San Francisco, CA
1 day ago
Senior AI Engineer
$250k - $300k
...About Us At You.com, we are building the AI Search Infrastructure that powers modern... ..., and useful. Our team includes engineers, researchers, product builders, and operators... ...APIs — improving agentic results, cutting inference cost and token usage, and getting strong...
Senior
Full time
Immediate start
Remote work
Work from home
Flexible hours
You.com
San Francisco, CA
1 day ago
Senior Software Engineer (Rust) at Symbolica - San Francisco, US
...About Us Symbolica is an AI research lab pioneering the application of category theory to enable logical reasoning in machines...
Senior
Work at office
Shift work
Victrays
San Francisco, CA
3 days ago
Senior AI / Machine Learning Engineer
...that sit at the intersection of AI, biology, chemistry, and large-scale engineering. Our goal is to translate complex... ...those systems. The Role As a Senior AI/ML Engineer, you will lead the... ...decisions around model serving, inference efficiency, and lifecycle...
Senior
Full time
Remote work
Flexible hours
Absentia Labs
San Francisco, CA
1 day ago
Staff/Senior Software Engineer (Rust)
$2,000 per month
.... Our vision is to build a world where AI/ML and analytics are powered by decentralized... ...Role As a Principal/Staff Software Engineer , you will help build out the next generation... ...- 8+ years experience working with Rust - Experience with DevOps / CD Experience...
Senior
Full time
Nextdata
San Francisco, CA
1 day ago
Senior AI Software Engineer
$149k - $240k
...Who We Are HP IQ is HP’s new AI innovation lab. Combining startup... ...a diverse, world‑class team—engineers, designers, researchers, and product... .... We are looking for a Senior Software Engineer to design and... ...storage solutions for real‑time AI inference and processing. Implement...
Senior
Full time
Temporary work
Local area
Flexible hours
HP IQ
San Francisco, CA
5 days ago
Senior Software Engineer - Kernel/Security
$255k - $405k
...Lambda is the #1 GPU Cloud for ML/AI teams training, fine-tuning and inferencing AI models, where engineers can easily, securely and affordably... ...private clouds and managed inference services – servicing government... ...code. Familiarity with Rust, Python, and Go is a plus....
Senior
Full time
Work at office
Local area
Work from home
Flexible hours
Lambda
San Francisco, CA
1 day ago
Applied AI Inference Engineer
$250k - $300k
...the only vertically integrated AI infrastructure company built... ...production. That means owning the inference stack end to end: profiling... ...also work directly with customer engineering teams to tailor deployments to... ...like vLLM and SGLang to the CUDA kernels underneath, profiling...
Temporary work
Crusoe
San Francisco, CA
10 days ago
Senior Lead AI Engineer
$229.9k - $262.4k
...Overview Senior Lead AI Engineer At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years... ...including foundation model training, large language model inference, similarity search, guardrails, model evaluation,...
Senior
Full time
Part time
Local area
Capital One
San Francisco, CA
3 days ago
AI Research Engineer
...Felicis, Figma Ventures, AI Grant, and more, and we're... ...accomplished a lot as just one engineer and one designer!) to a... ...with modeling / inference tools such as Pytorch and CUDA. Pragmatic and product-focused... ...must, and enthusiasm for Rust is highly encouraged. An...
Poly
San Francisco, CA
5 days ago
Senior. Distinguished AI Engineer - Agentic AI Platform (Remote Eligible)
$286.2k - $326.7k
Senior. Distinguished AI Engineer - Agentic AI Platform (Remote Eligible) At Capital One, we are creating responsible and reliable AI systems... ...AI and ML algorithms or technologies (e.g. LLM Inference, Similarity Search and VectorDBs, Guardrails, Memory) using...
Senior
Full time
Part time
Work at office
Local area
Remote work
Capital One Financial Corporation
San Francisco, CA
1 day ago
Senior Software Engineer - Model Performance
$220k - $320k
...Help us make inference blazingly fast. If you love squeezing... ...GPUs, diving deep into CUDA kernels, and turning... ...need frontier-quality AI at a fraction of the cost... ...ten-person team of engineers who work in-person in downtown... ...CUDA kernels and GPU utilization across our...
Senior
Work at office
Inference
San Francisco, CA
4 days ago
Software Engineer, Supercomputing
$350k
...and tools to make AI work for their unique... ...We are scientists, engineers, and builders who’... ...build, and operate the GPU supercomputing... ...scale training and inference. You will deliver high... ...(we use Python or Rust). Experience... ...Familiarity with CUDA/NCCL and performance...
Full time
Immediate start
Visa sponsorship
Work visa
Relocation package
Thinking Machines Lab
San Francisco, CA
1 day ago
Software Engineer, Productivity - Inference Runtime
...We’re hiring a Developer Productivity engineer to support OpenAI’s Inference Runtime teams. These teams own the... ...caused by infrastructure instability, GPU scheduling, or test environment issues... ...About OpenAI OpenAI is an AI research and deployment company dedicated...
Full time
OpenAI
San Francisco, CA
1 day ago
Software Engineer- BIS (Baseten Inference Stack)
...BASETEN Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion,... ...us and help build the platform engineers turn to to ship AI products. THE... ...distributed runtimes, networking, and GPU workloads Make thoughtful...
Full time
Flexible hours
Baseten
San Francisco, CA
1 day ago
Software Engineer, Inference - Multi Modal
...About the Team OpenAI’s Inference team powers the deployment of our... ...a small, fast-moving team of engineers focused on delivering a world-... ...pushing the boundaries of what AI can do. We’re expanding into... ...-level improvements including GPU utilization, tensor parallelism...
Full time
OpenAI
San Francisco, CA
1 day ago
Software Engineer - Voice AI (Inference Runtime)
...BASETEN Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion,... ...us and help build the platform engineers turn to to ship AI products. THE... ..., increase throughput, and improve GPU efficiency via profiling, runtime tuning...
Full time
Flexible hours
Baseten
San Francisco, CA
1 day ago
Software Engineer, Inference
$300k
...interpretable, and steerable AI systems. We want AI to be safe... ...group of committed researchers, engineers, policy experts, and business... ...About the role Our Inference team is responsible for building... ...infrastructure (AWS, GCP) Python or Rust Representative projects:...
Full time
Work at office
Worldwide
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
1 day ago
