Senior Inference Performance Engineer — GPU & CUDA

$220k - $320k

Inference

A tech startup specializing in AI inference seeks a skilled professional to optimize its inference stack. Candidates should have more than 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role focuses on accelerating AI inference and offers competitive compensation of $220,000 to $320,000 plus equity and benefits. Join an innovative team in downtown San Francisco; hybrid options are available for local candidates.

Vacancy posted 4 days ago

Similar jobs that could be interesting for you
Based on the Senior Inference Performance Engineer — GPU & CUDA in San Francisco, CA vacancy

Senior CUDA Kernel Engineer - GPU Performance Lead
Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience...
Senior
Performance
Remote work
Relocation package
Pragmatike
San Francisco, CA
3 days ago
Senior GPU Kernel Engineer - Accelerate AI Training Systems
...Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work... ...will have a strong background in CUDA or similar, with proven experience in kernel...
Senior
Performance
MakerMaker
San Francisco, CA
1 day ago
Senior Engineer 2: GPU Kernel and Performance
$167.2k - $209k
...DigitalOcean is seeking a Senior Engineer 2 to play a key technical role in our AI Inference Optimization team.... ...the industry-leading performance for our inference services... ...inference engine and GPU kernel layers,... ...their software stacks (CUDA, ROCm, TensorRT, OpenAI...
Senior
Performance
Local area
Remote work
Worldwide
Flexible hours
DigitalOcean
San Francisco, CA
3 days ago
GPU Kernel Engineer for AI Inference & Performance
FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
Performance
FriendliAI
San Francisco, CA
19 hours ago
Real-Time GPU Inference Optimization Engineer
$300k
...technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The... ...possess strong experience with CUDA/Triton, a deep understanding of... ...execution, and a knack for optimizing inference latency for large generative...
Performance
Visa sponsorship
Relocation package
Trades Workforce Solutions
San Francisco, CA
3 days ago
Senior GPU Kernel Engineer for Autonomous Driving
$128.7k - $261.3k
...autonomous driving technology. The role focuses on designing high-performance GPU kernels, optimizing ML performance, and collaborating cross-... .... The ideal candidate will have a strong background in CUDA programming and C++, with a minimum of 3 years of industry experience...
Senior
Performance
Israelvcforum
San Francisco, CA
1 day ago
Senior GPU Kernel Engineer for AI Acceleration
...Francisco is seeking a highly skilled GPU Kernel Engineer to design and optimize custom GPU kernels... ...vendors, with a focus on enhancing performance for AI workloads. The ideal candidate... ...and Python, and a deep understanding of CUDA and performance strategies. Benefits...
Senior
Performance
Flexible hours
Sciforium
San Francisco, CA
1 day ago
Senior ML Inference Engineer Production Systems
MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this... ...build and operate production inference systems, optimizing for performance and reliability. The ideal candidate... ...and have strong knowledge in GPU-accelerated inference....
Senior
Performance
MakerMaker.AI
San Francisco, CA
2 days ago
GPU Kernel Engineer: Build Fast AI Inference at Scale
...AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for... ...computation efficiency. Ideal candidates have 1–5 years of CUDA development experience and a strong understanding of...
Performance
Baseten
San Francisco, CA
4 days ago
Senior GPU ML Infra Engineer — Mid-Training & Inference
...San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate will have hands-on experience with modern...
Senior
Performance
Reflection AI
San Francisco, CA
19 hours ago
GPU Inference Engineer - Robotics (Hybrid SF)
Centaur Labs is seeking a GPU Inference Engineer to enhance model serving efficiency for Robotics research in San Francisco, CA. This high-impact role involves optimizing inference performance and collaborating with research teams on inference-friendly models. Key responsibilities...
Performance
Centaur Labs
San Francisco, CA
2 days ago
Senior Compiler Engineer
...achieve this. As demand for inference grows, teams will need to... ...Help shape Luminal’s engineering culture from the ground up... .../RDNA assembly, or other GPU ISAs Familiarity with ML... ...tasks will include writing CUDA kernels, conducting model performance reviews...
Senior
Performance
Full time
Slope
San Francisco, CA
4 days ago
Senior AI Inference Engineer - GPU, Rust & CUDA
$220k
Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience...
Senior
Perplexity
San Francisco, CA
4 days ago
Senior ML Inference Systems Engineer
...of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and improving execution performance across various components. Ideal candidates should have strong software engineering skills and experience with ML inference systems...
Senior
Performance
Gimlet Labs
San Francisco, CA
3 days ago
Senior Robot Perception Engineer - Smart Robotics
ABOUT THE ROLE As a senior Robot Perception Engineer on the Smart Robotics team at Bright... ...deep learning. Optimize model inference for GPU deployment, leveraging CUDA, TensorRT, and related... ...CoaXPress). C/C++ experience for performance‑critical components. Experience...
Senior
Performance
Dormont Manufacturing Co
San Francisco, CA
1 day ago
Multimodal Inference Engineer — Scale GPU AI Models
An innovative company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing and implementing infrastructure... ...for large-scale multimodal models, focusing on high-performance delivery of audio and image inputs. You'll collaborate...
Performance
Jobleads-US
San Francisco, CA
19 hours ago
Senior Backend Engineer - GPU Inference & Real-time Systems
...company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure)... ...role involves optimizing APIs, managing GPU workloads, and collaborating with... ...with TypeScript/Node, strong skills in performance tuning and distributed systems. This position...
Senior
Performance
Vizcom
San Francisco, CA
3 days ago
GPU Kernel Engineer
$100k - $120k
...foundation models. As training and inference workloads grow, we need kernel‑level... ...Responsibilities Lead a team of kernel and system engineers focused on performance-critical code Design, implement, and... ...kernels for CPU (AVX/ARM NEON), GPU (CUDA/ROCm), and hardware accelerators...
Performance
Coda Robotics
San Francisco, CA
3 days ago
GPU Optimization Engineer
$300k
GPU Optimisation Engineer — Real-Time Inference Want to push GPU performance to its limits — not in theory, but in production systems handling real-time speech and multimodal... ..., and scheduling Writing and tuning custom CUDA / Triton kernels for performance-critical paths...
Performance
Relocation
Visa sponsorship
Free visa
Techire Ai
San Francisco, CA
2 days ago
Remote Realtime Speech Inference Engineer
Machine Learning Engineer, Inference Want to solve realtime inference problems... ..., scheduler design, GPU utilisation, concurrency optimisation... ...already operates beyond the performance of most publicly available... ...TensorRT, Triton, vLLM, CUDA Graphs, ONNX Runtime, or custom...
Performance
Remote job
Flexible hours
Trades Workforce Solutions
San Francisco, CA
3 days ago
LLM Inference Frameworks and Optimization Engineer
...efficient and scalable inference for large language models... ...the boundaries of performance, scalability, and cost-... ...Frameworks and Optimization Engineer to design, develop, and... ...-throughput inference, GPU/accelerator... ...performance serving. Apply CUDA graph optimizations, TensorRT...
Performance
Gravity Engineering Services Pvt Ltd.
San Francisco, CA
2 days ago
Founding Engineer, ML Inference
...and unicorn founders and senior engineers with deep expertise in 3... ...a Founding Engineer, ML Inference with deep expertise in high-performance ML engineering. This is... ...using torch.compile, custom CUDA kernels, and specialized... ...Working knowledge of GPU hardware (NVIDIA) and...
Performance
Relocation
Visa sponsorship
Relocation package
Reactor
San Francisco, CA
1 day ago
GPU Kernel Engineer
...hands‑on support from AMD engineers the team is scaling... ...a highly skilled GPU Kernel Engineer who is... ...pushing the limits of performance on modern accelerators... ...large‑scale training and inference. This role is ideal for... ...using C++, PTX, CUDA, ROCm, Triton, and/or...
Performance
Flexible hours
Sciforium
San Francisco, CA
1 day ago
Senior Model Inference Engineer for Production-Scale AI
$325k
A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments... ...role involves collaboration with researchers and focus on performance optimization. Compensation ranges from $325K to $490K.
Senior
Performance
Jobleads-US
San Francisco, CA
19 hours ago
Senior Site Reliability Engineer AI Infrastructure
Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote... ...routes training and inference jobs across global... ...debug large‑scale GPU infrastructure used... ...and can reason about performance from network fabric... ...actually run - NCCL, CUDA, PyTorch distributed...
Senior
Performance
Full time
Remote work
Cortes 23
San Francisco, CA
3 days ago
Senior Site Reliability Engineer (GPU Clusters) - Hosting
$250k
...building a next-generation GPU platform designed for AI... ..., experimentation, and inference at scale. The company is... ...company is looking for a Senior / Staff Site Reliability Engineer to support and scale large... ...reliability, scalability, and performance of HPC and cloud...
Senior
Performance
Permanent employment
Remote work
San Francisco, CA
a month ago
Engineer, Inference & Model serving
$220k - $320k
ML Model Serving Engineer Want to build the layer that actually... ...’ll join a team focused on inference, where performance is the product. This is about... ...problems around batching, GPU efficiency, memory... ...infrastructure Exposure to CUDA, GPU profiling tools, or systems...
Performance
3 days per week
Trades Workforce Solutions
San Francisco, CA
3 days ago
Senior GPU Infrastructure Engineer
...we offer an innovative GPU marketplace and AI inference service that promise affordability... ...Role We're seeking a Senior Infrastructure Engineer to help build and scale... ...of GPU architecture, CUDA, and GPU compute... ...Familiarity with high-performance networking technologies...
Senior
Performance
Remote work
Hyperbolic Labs
San Francisco, CA
3 days ago
Senior C++ Systems Engineer - GPU/NIC Performance
Thunder Compute is looking for a core C++ Systems Developer in San Francisco. Your role involves performance optimization and debugging in critical systems. The ideal candidate has top-tier C++ skills and a strong responsibility ethic from day one. This is a full-time in...
Senior
Performance
Full time
Thunder Compute
San Francisco, CA
19 hours ago
Senior GPU HPC Platform Reliability Engineer
A leading AI research company in San Francisco is seeking a software engineer for its Fleet High Performance Computing team. In this role, you'll ensure the reliability and uptime of the compute fleet, working with automation systems and monitoring tools. Ideal candidates...
Senior
Performance
Jobleads-US
San Francisco, CA
4 days ago
