Senior AI Inference Engineer - GPU, Rust & CUDA
$220k
Perplexity
Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in software engineering with a focus on ML inference, familiarity with deep learning frameworks, and a strong understanding of GPU architectures. Compensation ranges from $220K to $485K.
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques...
Senior
- A leading cloud infrastructure company is seeking a Senior Engineer 2 to join their AI Inference Optimization team. The role involves leading the technical... ...high-performance computing and a strong understanding of GPU architectures. The position offers a competitive salary...
Senior, Remote job
$220k
...We build and run the inference engine behind every Perplexity query and deploy dozens of model... ...and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer... ...management to support in API Gateway. GPU kernels migration to CuTe DSL. Port our...
Suggested
- ...California is hiring a Backend / Infrastructure Engineer to develop the backbone of their video... ...work on cloud ingestion, distributed GPU inference pipelines, and collaborate with ML... ...offering a dynamic environment for impactful AI development.
PassFort
Suggested
$325k
...About the Team Our Inference team brings OpenAI's most... ...access our state-of-the-art AI models, allowing them... ...the Role We're hiring engineers to scale and optimize OpenAI... ...across emerging GPU platforms. You'll work... ...GPU kernels using HIP, CUDA, or Triton, and care deeply...
Suggested
$220k - $320k
A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation...
Senior, Local area
- ...Asari AI in San Francisco is seeking individuals to optimize high-performance, mission-critical computing systems. You'll work with... ...performance and design complex systems. The ideal candidate has strong CUDA C experience and fluency in Python and C/C++. We offer...
Flexible hours
$175k - $225k
...led by veteran operators and engineers, alumni of Sonos, Paypal, Tesla... ...We're looking for an AI Inference Engineer who lives at the boundary... ...country. If you are obsessed with CUDA kernels, TensorRT... ...kernels and perform low-level GPU tuning to maximize throughput...
Local area, Remote work
$160k - $250k
...Senior Backend Engineer, Inference Platform San Francisco About the Role Together AI is building the Inference Platform that brings... ...programming in one or more of: Rust, Go, Python, or... ...plus. ~ Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies...
Senior, Full time, Local area
$160k - $225k
...Cacheflow is seeking a Senior Software Engineer for AI Runtime at Databricks, located in San Francisco. You will be instrumental in building and scaling systems for large-scale GPU training, ensuring high throughput and resilience in training across expansive fleets of...
Senior
- ...Hamilton Barnes Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing... ...scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience...
Senior, Remote work
- Accellor is seeking an AI Engineer in San Francisco to develop and optimize AI/ML models. The ideal candidate should have strong Python skills... ...include building training pipelines and debugging models on CUDA-enabled GPUs. This position is an excellent opportunity for...
- Quadric in San Francisco is looking for an experienced AI Kernel Engineer to develop and optimize AI kernels for their innovative neural processing... ...and more than 5 years of relevant experience. Knowledge of CUDA, DSP, and C/C++ is essential. Benefits include life insurance...
Senior
$167.2k - $209k
A leading cloud service provider is seeking a Senior Engineer 2 for their AI Inference Data Plane team. This remote role focuses on designing and developing high-scale, resilient data plane services that enhance AI-driven applications. The ideal candidate will have strong...
Senior, Remote job
$216k - $270k
...As a Software Engineer on the Machine Learning Infrastructure... ..." for our large-scale GPU clusters. You will... ...compute into breakthrough AI. You will: Architect... ...languages (e.g. Python, Go, Rust, C++) ~ Experience with... ...and hardware stack (CUDA, NCCL) Experience with...
Senior, Full time
- A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate...
Senior
- Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience...
Senior, Remote work, Relocation package
$300k
...startup building an AI and cloud platform,... ...model training, or inference. Our client... ...operates high-performance GPU clusters powering... ...operate inference engines such as vLLM, SGLang... ...in Python, Go, Rust, or a comparable language... ...software stacks (CUDA, Triton, NCCL) and...
Senior, Permanent employment, Worldwide
- ...Linuxcareers is seeking an Infrastructure/Cluster Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses on managing diverse... ...systems for cluster health. Experience with GPU infrastructure is a plus...
- ...leading design technology company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure). You will architect... ...demand grows. This role involves optimizing APIs, managing GPU workloads, and collaborating with cross-functional teams...
Senior
$405k
...Staff Engineer, Inference Runtime Anthropic's mission... ...interpretable, and steerable AI systems. We want... ...on. This is a senior IC role with broad... ...-sensitive Rust and Python codebase... ...management – across GPU, TPU, and Trainium... ...ecosystem (CUDA/GPU, TPU, or Trainium...
Work at office, Visa sponsorship, Flexible hours
- ...An innovative AI company is seeking a Software Engineer to develop infrastructure that supports AI training and inference workflows. This role requires strong object-oriented programming... ...scalable model serving, optimize multi-GPU infrastructure, and enhance system reliability...
- Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure.... ...impact millions of meetings, ensuring efficient GPU utilization, and debugging production...
$190k - $270k
AI Chopping Block, Inc. is seeking an AI Infrastructure Engineer in San Francisco. This role requires maintaining user-facing services and production systems, specializing in systems while ensuring their reliability and scalability. Candidates should have 5+ years of experience...
Senior
$150k - $250k
...Senior/Staff AI Engineer Job Locations US-CA-San Francisco - Remote | US-NC-Raleigh... ...infrastructure behind real-world model serving and inference. This is the role for engineers who... ...Improve performance across GPU and CPU pathways Work on KV cache,...
Senior, Full time, Remote work
- ...A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal...
Senior
- ...financial world. The role: SoFi's Senior Staff AI Engineer is a hands-on AI engineering role in... ...ensure high-throughput, low-latency inference across diverse hardware footprints.... ...and managing the underlying Kubernetes/GPU orchestration for custom model deployments...
Senior, Remote work
- ...About Us Most AI is frozen in place... ...intelligence - the inference services that serve... ...Researchers and ML engineers will hand you workloads... ...heterogeneous GPU fleets. Batching, scheduling... ...language (Go, Rust, C++). ~ Working... ...accelerator stack: CUDA fundamentals, NCCL,...
Flexible hours
- ...Senior Software Engineer We're hiring a Senior Software Engineer onto our Applied AI team to build and extend the backend systems that power... ...layer that connects them to our GPU-resident compute. A note... ...Familiarity with causal inference or graph-based systems...
Senior, Work at office
- ...Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work... ...will have a strong background in CUDA or similar, with proven experience in kernel...
Senior

