Real-Time GPU Inference Optimization Engineer

$300k

Trades Workforce Solutions

A leading technology firm in San Francisco seeks a GPU Optimization Engineer to maximize GPU performance in real-time AI systems. The ideal candidate will have strong experience with CUDA/Triton, a deep understanding of GPU execution, and a knack for optimizing inference latency for large generative models. The role offers a competitive base salary of up to approximately $300,000 plus meaningful equity, and it is a growth hire rather than a backfill. Relocation and visa support are available.

Vacancy posted 4 days ago

Similar jobs that could be interesting for you
Based on the Real-Time GPU Inference Optimization Engineer in San Francisco, CA vacancy

Real-Time LLM Inference & Speech Serving Engineer
$180k - $270k
...infrastructure roles in San Francisco, focusing on building high-performance inference engines for speech AI. Ideal candidates will have substantial experience in GPU architecture and real-time systems. This position offers a competitive salary range of $180K - $270K,...
Suggested
Plaud
San Francisco, CA
2 days ago
Latency-Focused Real-Time Robotics Software Engineer (C++, GPU)
A defense tech startup is looking for a Robotics Software Engineer in San Francisco, CA. You will optimize real-time systems performance and ensure subsystem integration of various components. Candidates should have 3-6+ years in robotics engineering and expert-level C++...
Suggested
Aurelius Systems, Inc
San Francisco, CA
2 days ago
Robotics Software Engineer - Real-Time Systems
...Opportunity The company is looking for a Robotics Software Engineer to own and optimize the real-time systems that power a humanoid robot fleet. This is not... ...across at least two to three of: networking, GPU/CPU workloads, video streaming, drivers, kernel scheduling...
Suggested
Work experience placement
Rethink recruit
San Francisco, CA
13 hours ago
LLM Inference Frameworks and Optimization Engineer
$160k - $230k
...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together... ..., high-throughput inference, GPU/accelerator optimizations, and software... ...US base salary range for this full-time position is: $160,000 - $230,000 +...
Suggested
Full time
Together AI
San Francisco, CA
6 days ago
GPU Systems Support Engineer for High-Performance Inference
$200k - $280k
...Francisco is looking for a Staff Machine Learning Engineer to enhance inference systems at production scale. You will design algorithms, optimize performance, and collaborate on RL and... ...systems and algorithms. This is a full-time role offering a competitive salary between...
Suggested
Full time
AI Chopping Block, Inc.
San Francisco, CA
1 day ago
Performance Engineer, Inference Systems
$350k
...committed researchers, engineers, policy experts,... ...Role Anthropic's inference fleet serves... ...regression from request timing down through... ...the highest-impact optimizations your analysis surfaces... ...Familiarity with GPU/TPU/accelerator... ...signals reliably catch real model-output...
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
2 days ago
GPU Optimization Engineer
$300k
GPU Optimisation Engineer — Real-Time Inference Want to push GPU performance to its limits — not in theory, but in production systems handling real-time speech and multimodal workloads? This team is building low-latency AI systems where milliseconds actually matter. The...
Relocation
Visa sponsorship
Free visa
Techire Ai
San Francisco, CA
1 day ago
Low-Latency Inference Systems Engineer - On-Device & GPU
Genesis AI is seeking an experienced individual to develop low-latency inference pipelines for on-device deployment in robotics. The role involves designing and optimizing distributed systems on GPU clusters, implementing efficient low-level code such as CUDA and Triton...
Genesis AI
San Francisco, CA
4 days ago
Distributed Systems Engineer, Data & Inference Platform
...intelligence that evolves in real-time. Our vision is AI... ...intelligence - the inference services that serve LLMs... .... Researchers and ML engineers will hand you workloads... ...systems for LLMs, optimizing throughput, latency, and... ...across heterogeneous GPU fleets. Batching, scheduling...
Flexible hours
Adaption
San Francisco, CA
4 days ago
Staff ML Inference Systems Engineer - Scalable GPU Infra (SF)
...of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves... ...pipelines and enhancing performance under real-world workloads. Candidates should have strong software engineering skills, experience with ML inference systems...
Acceler8 Talent
San Francisco, CA
3 days ago
GPU HPC Systems Engineer for AI Inference & Equity
$160k - $320k
...leading AI computing firm is seeking a Systems Engineer in San Francisco or Los Angeles to scale AI inference. Candidates should have strong C++ skills,... ...techniques. Responsibilities include designing GPU kernels, optimizing performance, and collaborating with technical...
Vast.ai
San Francisco, CA
4 days ago
Senior Real-Time Pose Estimation Engineer (SLAM & Sensors)
...computer vision seeks a Senior State Estimation Engineer in San Francisco to develop algorithms for real-time pose estimation and mapping. The ideal candidate... ...while contributing to impactful projects aimed at optimizing transit systems. Join us to advance safety and sustainability...
Hayden AI
San Francisco, CA
4 days ago
Senior Backend Engineer - GPU Inference & Real-time Systems
...technology company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure). You will... ...maintain scalability as demand grows. This role involves optimizing APIs, managing GPU workloads, and collaborating with cross-functional teams....
Vizcom
San Francisco, CA
2 days ago
Rust Systems Engineer - Real-Time Robotics Infrastructure
Dimensional Inc. is seeking an experienced engineer with deep expertise in Rust to enhance performance-critical systems for real-time robotic perception and control. You will... ...and implement high-performance components, optimize algorithms, and work closely with robotics...
Dimensional Inc.
San Francisco, CA
2 days ago
Embedded Software Engineer - Real-Time Linux
...recruit an exceptional Embedded Software Engineer - Real-Time Linux to help build the foundational... ...role for developing high-performance, GPU-accelerated compute platforms tailored... ...Machine Learning engineers to develop and optimize high performance autonomous systems....
Maven Robotics
San Francisco, CA
13 hours ago
Performance Engineer, GPU
$280k
...committed researchers, engineers, policy experts,... ...innovations in GPU performance and systems... ...cutting-edge optimizations that directly enable... ...improve inference efficiency. Working... ...language models with real-world impact Care... ...least 25% of the time. However, some roles...
Work at office
Visa sponsorship
Flexible hours
Anthropic
San Francisco, CA
3 days ago
Robotics Software Engineer (Rust): Real-Time, Scalable
...technology company in San Francisco is seeking a Software Engineer with strong Rust experience to build and optimize software for autonomous robots. You will work on... ...inception to completion and have a passion for real-time software and embedded systems, this is the role...
Pantograph
San Francisco, CA
3 days ago
Autonomous Defense Software Engineer — Real-Time, Equity
...technology company in San Francisco is seeking a Software Engineer to develop and optimize autonomous defense systems. The role requires expertise... ...Rust, and Python, along with a strong understanding of real-time performance and embedded systems. Candidates should have...
Mach Industries
San Francisco, CA
13 hours ago
GPU Systems Engineer - HPC / Parallel Computing
$160k - $320k
...deliver excellence. We seek engineers/researchers with strong... ...programming experience to help scale AI inference. You’ll leverage your... ...of high-performance systems to optimize GPU performance at the bleeding edge of AI. Full-Time On-site at either our SF or...
Full time
Work at office
Vast
San Francisco, CA
4 days ago
GPU Kernel Engineer
...GPU Kernel Engineer Sciforium is an AI infrastructure company developing... ...frontier AI models and real-time applications. About the... ...role, you will design and optimize custom GPU kernels that power... ...for large-scale training and inference. This role is ideal for...
Flexible hours
Sciforium
San Francisco, CA
1 day ago
Senior Robotics Controls Engineer — Real-Time & Hardware-In-Loop
...platforms. You will design and integrate control systems, working on real hardware alongside a small, dedicated team. Applicants should... ...a strong background in robotics with hands-on experience in real-time control system design. The position offers competitive salary, meaningful...
Relocation package
Industrial Next (YC W22)
San Francisco, CA
3 days ago
Staff Embedded C++ Engineer - Real-Time Sensor Fusion
A leading navigation technology firm is seeking a Staff Embedded Software Engineer to develop high-performance real-time software that integrates various sensors. The ideal candidate has over 7 years of experience in embedded systems, with strong expertise in modern C++...
Point One Navigation
San Francisco, CA
4 days ago
GPU Kernel Engineer for AI Inference & Performance
FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative...
FriendliAI
San Francisco, CA
4 days ago
Founding Engineer, ML Inference
...kind of platform for real-time generative media, enabling... ...founders and senior engineers with deep expertise in... ...Founding Engineer, ML Inference with deep expertise in... ...inference frameworks, optimizing inference performance,... ...Working knowledge of GPU hardware (NVIDIA) and...
Relocation
Visa sponsorship
Relocation package
Reactor
San Francisco, CA
13 hours ago
Monocular SLAM Engineer for Real-Time 3D Mapping
An innovative AI solutions company in San Francisco seeks a Perception Engineer to develop and optimize monocular SLAM algorithms for real-time localization and 3D mapping. The ideal candidate will have strong expertise in C++ and Python, with a solid background in computer...
EchoTwin AI
San Francisco, CA
2 days ago
Senior Inference Performance Engineer - GPU & CUDA
$220k - $320k
inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques...
inference.net
San Francisco, CA
4 days ago
Monocular SLAM Engineer for Real-Time 3D Mapping
...leading technology firm in San Francisco is seeking a skilled Perception Engineer to develop SLAM systems using monocular cameras. The ideal candidate will design and optimize algorithms for robust real-time localization and mapping in dynamic environments. Candidates should...
EchoTwin AI, Inc.
San Francisco, CA
13 hours ago
LLM Inference Engineer: Frameworks & Optimizations
$160k - $230k
Together AI is seeking an Inference Frameworks and Optimization Engineer in San Francisco, California. The role focuses on designing and optimizing distributed... ...in deep learning inference frameworks, proficiency in GPU programming, and strong collaboration skills....
Together AI
San Francisco, CA
3 days ago
IR Vision Systems Engineer - Real-Time Space Embedded
etc. is hiring a Vision Systems Engineer in San Francisco to develop detection and tracking algorithms for space-based IR sensing programs. This role involves deploying real-time software solutions on embedded hardware for US national security missions. Candidates should...
etc.
San Francisco, CA
4 days ago
System Engineer, GPU Fleet
$200k - $300k
...starved. Technology gave people more time for the things they wanted to do... ...About the Role As a System Engineer, GPU Fleet, you will manage, operate, and optimize hyperscale GPU compute... ...infrastructure supporting AI/ML training and inference workloads. Ensure high...
Local area
Fluidstack
San Francisco, CA
19 hours ago