Senior Inference Performance Engineer GPU & CUDA
$220k - $320k
Inference
A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role focuses on accelerating AI inference and offers compensation of $220,000 to $320,000 plus equity and benefits. Join an innovative team in downtown San Francisco, with hybrid options available for local candidates.
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques... Senior, Performance
- Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience... Senior, Performance, Remote work, Relocation package
- ...Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work... ...will have a strong background in CUDA or similar, with proven experience in kernel... Senior, Performance
$167.2k - $209k
...DigitalOcean is seeking a Senior Engineer 2 to play a key technical role in our AI Inference Optimization team.... ...the industry-leading performance for our inference... ...inference engine and GPU kernel layers, ensuring... ...their software stacks (CUDA, ROCm, TensorRT, OpenAI... Senior, Performance, Local area, Remote work, Worldwide, Flexible hours
$300k
...technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The... ...possess strong experience with CUDA/Triton, a deep understanding of... ...execution, and a knack for optimizing inference latency for large generative... Performance, Visa sponsorship, Relocation package
- FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative... Performance
$128.7k - $261.3k
...autonomous driving technology. The role focuses on designing high-performance GPU kernels, optimizing ML performance, and collaborating cross-... .... The ideal candidate will have a strong background in CUDA programming and C++, with a minimum of 3 years of industry experience... Senior, Performance
$167.2k - $209k
...driven applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role, you... ...their models with industry-leading performance and reliability. This is a hands-on... ...& Interconnects: Understanding of GPU-level optimisation and experience... Senior, Performance, Local area, Remote work, Worldwide, Flexible hours
$225k
...RL, ultra‑long context, and inference‑time compute to achieve this... ...About The Role As a Software Engineer on the Inference & RL Systems... ...work on Design and scale high‑performance inference serving systems Optimize... ...bottlenecks across GPU, networking, and storage layers... Senior, Performance, Relocation, Visa sponsorship
- Requirements Deep experience with GPU programming and performance work (CUDA, Triton, CUTLASS, or similar). Any other deep systems... ...you, 3+ years of professional software engineering experience with meaningful work on ML inference or high-performance systems, Familiarity... Performance
- ...San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate will have hands-on experience with modern... Senior, Performance
- ...AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for... ...computation efficiency. Ideal candidates have 1-5 years of CUDA development experience and a strong understanding of... Performance
$160k - $320k
...leading AI computing firm is seeking a Systems Engineer in San Francisco or Los Angeles to scale AI inference. Candidates should have strong C++ skills,... .... Responsibilities include designing GPU kernels, optimizing performance, and collaborating with technical leads to... Performance
- Pragmatike is seeking a CUDA Kernel Engineer to work remotely for a rapidly growing AI startup. The ideal candidate will have extensive... ...NVIDIA CUDA kernels, with a strong understanding of GPU architecture and performance optimization. Responsibilities include designing CUDA... Performance, Remote job, Relocation package
$180k - $250k
A leading technology company in San Francisco is seeking a skilled engineer to build custom compute environments, enhancing GPU performance for customer workloads. Candidates should have deep expertise in Linux virtualization and networking fundamentals, along with experience... Senior, Performance, Relocation package
- ...achieve this. As demand for inference grows, teams will need to... ...Help shape Luminal’s engineering culture from the ground up... .../RDNA assembly, or other GPU ISAs Familiarity with ML... ...tasks will include writing CUDA kernels, conducting model performance reviews... Senior, Performance, Full time
- Acceler8 Talent is looking for a Software Engineer in San Francisco to focus on building and optimizing inference systems for next-generation AI at scale. You will... ...production inference pipelines and improve system performance under real production constraints. The ideal... Senior, Performance
$220k
Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience... Senior
- ...of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and improving execution performance across various components. Ideal candidates should have strong software engineering skills and experience with ML inference systems... Senior, Performance
- A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong... Senior, Performance
- ...you. ABOUT THE ROLE As a senior Robot Perception Engineer on the Smart Robotics team at... ...learning Optimize model inference for GPU deployment, leveraging CUDA, TensorRT, and related acceleration... ...) C/C++ experience for performance-critical components... Senior, Performance
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About... ...the boundaries of performance, scalability, and cost-efficiency... ...-throughput inference, GPU/accelerator optimizations... ...performance serving. Apply CUDA graph optimizations,... Performance, Full time
- ...GPU Kernel Engineer Sciforium is an AI infrastructure company developing... ...pushing the limits of performance on modern accelerators. In... ...for large-scale training and inference. This role is ideal for... ...GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas... Performance, Flexible hours
$500 per month
...re a small team of engineers, former US... ...architecture and performance of our full software... ...computer vision, ML inference, controls, fire control... ...across CPU, GPU, memory, and I/O;... ...against. This is a senior IC role with subteam... ...Develop and optimize CUDA kernels for high-throughput... Senior, Performance, Permanent employment, Work at office, Monday to Friday, Night shift, Weekend work
- An innovative company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing and implementing infrastructure... ...for large-scale multimodal models, focusing on high-performance delivery of audio and image inputs. You'll collaborate... Performance
- ...company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure)... ...role involves optimizing APIs, managing GPU workloads, and collaborating with... ...with TypeScript/Node, strong skills in performance tuning and distributed systems. This position... Senior, Performance
$100k - $120k
...foundation models. As training and inference workloads grow, we need kernel‑level... ...Responsibilities Lead a team of kernel and system engineers focused on performance-critical code Design, implement, and... ...kernels for CPU (AVX/ARM NEON), GPU (CUDA/ROCm), and hardware accelerators... Performance
- ...is scaling a world-class engineering team across inference, distributed systems, compiler... ...infrastructure, and high-performance AI compute. Their... ...custom inference runtimes CUDA, kernel optimization, or compiler... ...Experience optimizing GPU utilization at scale Background... Performance
$300k
GPU Optimisation Engineer - Real-Time Inference Want to push GPU performance to its limits - not in theory, but in production systems handling real-time speech and multimodal... ..., and scheduling Writing and tuning custom CUDA / Triton kernels for performance-critical paths... Performance, Relocation, Visa sponsorship, Free visa
$250k
...building a next-generation GPU platform designed for AI... ..., experimentation, and inference at scale. The company is... ...company is looking for a Senior / Staff Site Reliability Engineer to support and scale large... ...reliability, scalability, and performance of HPC and cloud... Senior, Performance, Permanent employment, Remote work


