Real-Time GPU Inference Optimization Engineer
$300k
Trades Workforce Solutions
A leading technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal candidate will possess strong experience with CUDA/Triton, a deep understanding of GPU execution, and a knack for optimizing inference latency for large generative models. The role offers a competitive base salary of up to ~$300,000 plus meaningful equity, and it is a growth hire rather than a backfill for a previous role. Relocation and visa support are available.
$180k - $270k
...infrastructure roles in San Francisco, focusing on building high-performance inference engines for speech AI. Ideal candidates will have substantial experience in GPU architecture and real-time systems. This position offers a competitive salary range of $180K - $270K,...
- A defense tech startup is looking for a Robotics Software Engineer in San Francisco, CA. You will optimize real-time systems performance and ensure subsystem integration of various components. Candidates should have 3-6+ years in robotics engineering and expert-level C++...
- ...Opportunity The company is looking for a Robotics Software Engineer to own and optimize the real-time systems that power a humanoid robot fleet. This is not... ...across at least two to three of: networking, GPU/CPU workloads, video streaming, drivers, kernel scheduling...
Work experience placement
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together... ..., high-throughput inference, GPU/accelerator optimizations, and software... ...US base salary range for this full-time position is: $160,000 - $230,000 +...
Full time
$200k - $280k
...Francisco is looking for a Staff Machine Learning Engineer to enhance inference systems at production scale. You will design algorithms, optimize performance, and collaborate on RL and... ...systems and algorithms. This is a full-time role offering a competitive salary between...
Full time
$350k
...committed researchers, engineers, policy experts,... ...Role Anthropic's inference fleet serves... ...regression from request timing down through... ...the highest-impact optimizations your analysis surfaces... ...Familiarity with GPU/TPU/accelerator... ...signals reliably catch real model-output...
Work at office, Visa sponsorship, Flexible hours
$300k
GPU Optimisation Engineer: Real-Time Inference. Want to push GPU performance to its limits, not in theory, but in production systems handling real-time speech and multimodal workloads? This team is building low-latency AI systems where milliseconds actually matter. The...
Relocation, Visa sponsorship, Free visa
- Genesis AI is seeking an experienced individual to develop low-latency inference pipelines for on-device deployment in robotics. The role involves designing and optimizing distributed systems on GPU clusters, implementing efficient low-level code such as CUDA and Triton...
- ...intelligence that evolves in real-time. Our vision is AI... ...intelligence - the inference services that serve LLMs... ...Researchers and ML engineers will hand you workloads... ...systems for LLMs, optimizing throughput, latency, and... ...across heterogeneous GPU fleets. Batching, scheduling...
Flexible hours
- ...of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves... ...pipelines and enhancing performance under real-world workloads. Candidates should have strong software engineering skills, experience with ML inference systems...
$160k - $320k
...leading AI computing firm is seeking a Systems Engineer in San Francisco or Los Angeles to scale AI inference. Candidates should have strong C++ skills,... ...techniques. Responsibilities include designing GPU kernels, optimizing performance, and collaborating with technical...
- ...computer vision seeks a Senior State Estimation Engineer in San Francisco to develop algorithms for real-time pose estimation and mapping. The ideal candidate... ...while contributing to impactful projects aimed at optimizing transit systems. Join us to advance safety and sustainability...
- ...technology company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure). You will... ...maintain scalability as demand grows. This role involves optimizing APIs, managing GPU workloads, and collaborating with cross-functional teams....
- Dimensional Inc. is seeking an experienced engineer with deep expertise in Rust to enhance performance-critical systems for real-time robotic perception and control. You will... ...and implement high-performance components, optimize algorithms, and work closely with robotics...
- ...recruit an exceptional Embedded Software Engineer - Real-Time Linux to help build the foundational... ...role for developing high-performance, GPU-accelerated compute platforms tailored... ...Machine Learning engineers to develop and optimize high performance autonomous systems....
$280k
...committed researchers, engineers, policy experts,... ...innovations in GPU performance and systems... ...cutting-edge optimizations that directly enable... ...improve inference efficiency. Working... ...language models with real-world impact Care... ...least 25% of the time. However, some roles...
Work at office, Visa sponsorship, Flexible hours
- ...technology company in San Francisco is seeking a Software Engineer with strong Rust experience to build and optimize software for autonomous robots. You will work on... ...inception to completion and have a passion for real-time software and embedded systems, this is the role...
- ...technology company in San Francisco is seeking a Software Engineer to develop and optimize autonomous defense systems. The role requires expertise... ...Rust, and Python, along with a strong understanding of real-time performance and embedded systems. Candidates should have...
$160k - $320k
...deliver excellence. We seek engineers/researchers with strong... ...programming experience to help scale AI inference. You’ll leverage your... ...of high-performance systems to optimize GPU performance at the bleeding edge of AI. Full-Time On-site at either our SF or...
Full time, Work at office
- ...GPU Kernel Engineer Sciforium is an AI infrastructure company developing... ...frontier AI models and real-time applications. About the... ...role, you will design and optimize custom GPU kernels that power... ...for large-scale training and inference. This role is ideal for...
Flexible hours
- ...platforms. You will design and integrate control systems, working on real hardware alongside a small, dedicated team. Applicants should... ...a strong background in robotics with hands-on experience in real-time control system design. The position offers competitive salary, meaningful...
Relocation package
- A leading navigation technology firm is seeking a Staff Embedded Software Engineer to develop high-performance real-time software that integrates various sensors. The ideal candidate has over 7 years of experience in embedded systems, with strong expertise in modern C++...
- FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
- ...kind of platform for real-time generative media, enabling... ...founders and senior engineers with deep expertise in... ...Founding Engineer, ML Inference with deep expertise in... ...inference frameworks, optimizing inference performance,... ...Working knowledge of GPU hardware (NVIDIA) and...
Relocation, Visa sponsorship, Relocation package
- An innovative AI solutions company in San Francisco seeks a Perception Engineer to develop and optimize monocular SLAM algorithms for real-time localization and 3D mapping. The ideal candidate will have strong expertise in C++ and Python, with a solid background in computer...
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques...
- ...leading technology firm in San Francisco is seeking a skilled Perception Engineer to develop SLAM systems using monocular cameras. The ideal candidate will design and optimize algorithms for robust real-time localization and mapping in dynamic environments. Candidates should...
$160k - $230k
Together AI is seeking an Inference Frameworks and Optimization Engineer in San Francisco, California. The role focuses on designing and optimizing distributed... ...in deep learning inference frameworks, proficiency in GPU programming, and strong collaboration skills....
- etc. is hiring a Vision Systems Engineer in San Francisco to develop detection and tracking algorithms for space-based IR sensing programs. This role involves deploying real-time software solutions on embedded hardware for US national security missions. Candidates should...
$200k - $300k
...starved. Technology gave people more time for the things they wanted to do... ...About the Role As a System Engineer, GPU Fleet, you will manage, operate, and optimize hyperscale GPU compute... ...infrastructure supporting AI/ML training and inference workloads. Ensure high...
Local area

