Real-Time GPU Inference Optimization Engineer

$300k

Trades Workforce Solutions

A leading technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The ideal candidate will have strong experience with CUDA/Triton, a deep understanding of GPU execution, and a knack for optimizing inference latency for large generative models. The role offers a competitive base salary of up to ~$300,000 plus meaningful equity, and it is a growth hire rather than a backfill. Relocation and visa support are available.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Real-Time GPU Inference Optimization Engineer in San Francisco, CA vacancy
  • $180k - $270k

     ...infrastructure roles in San Francisco, focusing on building high-performance inference engines for speech AI. Ideal candidates will have substantial experience in GPU architecture and real-time systems. This position offers a competitive salary range of $180K - $270K,... 
    Suggested

    Plaud

    San Francisco, CA
    2 days ago
  • A defense tech startup is looking for a Robotics Software Engineer in San Francisco, CA. You will optimize real-time systems performance and ensure subsystem integration of various components. Candidates should have 3-6+ years in robotics engineering and expert-level C++... 
    Suggested

    Aurelius Systems, Inc

    San Francisco, CA
    2 days ago
  •  ...Opportunity The company is looking for a Robotics Software Engineer to own and optimize the real-time systems that power a humanoid robot fleet. This is not...  ...across at least two to three of: networking, GPU/CPU workloads, video streaming, drivers, kernel scheduling... 
    Suggested
    Work experience placement

    Rethink recruit

    San Francisco, CA
    13 hours ago
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About the Role At Together...  ..., high-throughput inference, GPU/accelerator optimizations, and software...  ...US base salary range for this full-time position is: $160,000 - $230,000 +... 
    Suggested
    Full time

    Together AI

    San Francisco, CA
    6 days ago
  • $200k - $280k

     ...Francisco is looking for a Staff Machine Learning Engineer to enhance inference systems at production scale. You will design algorithms, optimize performance, and collaborate on RL and...  ...systems and algorithms. This is a full-time role offering a competitive salary between... 
    Suggested
    Full time

    AI Chopping Block, Inc.

    San Francisco, CA
    1 day ago
  • $350k

     ...committed researchers, engineers, policy experts,...  ...Role Anthropic's inference fleet serves...  ...regression from request timing down through...  ...the highest-impact optimizations your analysis surfaces...  ...Familiarity with GPU/TPU/accelerator...  ...signals reliably catch real model-output... 
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    2 days ago
  • $300k

    GPU Optimisation Engineer — Real-Time Inference Want to push GPU performance to its limits — not in theory, but in production systems handling real-time speech and multimodal workloads? This team is building low-latency AI systems where milliseconds actually matter. The... 
    Relocation
    Visa sponsorship
    Free visa

    Techire Ai

    San Francisco, CA
    1 day ago
  • Genesis AI is seeking an experienced individual to develop low-latency inference pipelines for on-device deployment in robotics. The role involves designing and optimizing distributed systems on GPU clusters, implementing efficient low-level code such as CUDA and Triton... 

    Genesis AI

    San Francisco, CA
    4 days ago
  •  ...intelligence that evolves in real-time. Our vision is AI...  ...intelligence - the inference services that serve LLMs...  .... Researchers and ML engineers will hand you workloads...  ...systems for LLMs, optimizing throughput, latency, and...  ...across heterogeneous GPU fleets. Batching, scheduling... 
    Flexible hours

    Adaption

    San Francisco, CA
    4 days ago
  •  ...of Technical Staff focused on building and optimizing ML inference systems in San Francisco. The role involves...  ...pipelines and enhancing performance under real-world workloads. Candidates should have strong software engineering skills, experience with ML inference systems... 

    Acceler8 Talent

    San Francisco, CA
    3 days ago
  • $160k - $320k

     ...leading AI computing firm is seeking a Systems Engineer in San Francisco or Los Angeles to scale AI inference. Candidates should have strong C++ skills,...  ...techniques. Responsibilities include designing GPU kernels, optimizing performance, and collaborating with technical... 

    Vast.ai

    San Francisco, CA
    4 days ago
  •  ...computer vision seeks a Senior State Estimation Engineer in San Francisco to develop algorithms for real-time pose estimation and mapping. The ideal candidate...  ...while contributing to impactful projects aimed at optimizing transit systems. Join us to advance safety and sustainability... 

    Hayden AI

    San Francisco, CA
    4 days ago
  •  ...technology company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure). You will...  ...maintain scalability as demand grows. This role involves optimizing APIs, managing GPU workloads, and collaborating with cross-functional teams.... 

    Vizcom

    San Francisco, CA
    2 days ago
  • Dimensional Inc. is seeking an experienced engineer with deep expertise in Rust to enhance performance-critical systems for real-time robotic perception and control. You will...  ...and implement high-performance components, optimize algorithms, and work closely with robotics... 

    Dimensional Inc.

    San Francisco, CA
    2 days ago
  •  ...recruit an exceptional Embedded Software Engineer - Real-Time Linux to help build the foundational...  ...role for developing high-performance, GPU-accelerated compute platforms tailored...  ...Machine Learning engineers to develop and optimize high performance autonomous systems.... 

    Maven Robotics

    San Francisco, CA
    13 hours ago
  • $280k

     ...committed researchers, engineers, policy experts,...  ...innovations in GPU performance and systems...  ...cutting-edge optimizations that directly enable...  ...improve inference efficiency. Working...  ...language models with real-world impact Care...  ...least 25% of the time. However, some roles... 
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    3 days ago
  •  ...technology company in San Francisco is seeking a Software Engineer with strong Rust experience to build and optimize software for autonomous robots. You will work on...  ...inception to completion and have a passion for real-time software and embedded systems, this is the role... 

    Pantograph

    San Francisco, CA
    3 days ago
  •  ...technology company in San Francisco is seeking a Software Engineer to develop and optimize autonomous defense systems. The role requires expertise...  ...Rust, and Python, along with a strong understanding of real-time performance and embedded systems. Candidates should have... 

    Mach Industries

    San Francisco, CA
    13 hours ago
  • $160k - $320k

     ...deliver excellence.  We seek engineers/researchers with strong...  ...programming experience to help scale AI inference. You’ll leverage your...  ...of high-performance systems to optimize GPU performance at the bleeding edge of AI. Full-Time On-site at either our SF or... 
    Full time
    Work at office

    Vast

    San Francisco, CA
    4 days ago
  •  ...GPU Kernel Engineer Sciforium is an AI infrastructure company developing...  ...frontier AI models and real-time applications. About the...  ...role, you will design and optimize custom GPU kernels that power...  ...for large-scale training and inference. This role is ideal for... 
    Flexible hours

    Sciforium

    San Francisco, CA
    1 day ago
  •  ...platforms. You will design and integrate control systems, working on real hardware alongside a small, dedicated team. Applicants should...  ...a strong background in robotics with hands-on experience in real-time control system design. The position offers competitive salary, meaningful... 
    Relocation package

    Industrial Next (YC W22)

    San Francisco, CA
    3 days ago
  • A leading navigation technology firm is seeking a Staff Embedded Software Engineer to develop high-performance real-time software that integrates various sensors. The ideal candidate has over 7 years of experience in embedded systems, with strong expertise in modern C++... 

    Point One Navigation

    San Francisco, CA
    4 days ago
  • FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative... 

    FriendliAI

    San Francisco, CA
    4 days ago
  •  ...kind of platform for real-time generative media, enabling...  ...founders and senior engineers with deep expertise in...  ...Founding Engineer, ML Inference with deep expertise in...  ...inference frameworks, optimizing inference performance,...  ...Working knowledge of GPU hardware (NVIDIA) and... 
    Relocation
    Visa sponsorship
    Relocation package

    Reactor

    San Francisco, CA
    13 hours ago
  • An innovative AI solutions company in San Francisco seeks a Perception Engineer to develop and optimize monocular SLAM algorithms for real-time localization and 3D mapping. The ideal candidate will have strong expertise in C++ and Python, with a solid background in computer... 

    EchoTwin AI

    San Francisco, CA
    2 days ago
  • $220k - $320k

    inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques... 

    inference.net

    San Francisco, CA
    4 days ago
  •  ...leading technology firm in San Francisco is seeking a skilled Perception Engineer to develop SLAM systems using monocular cameras. The ideal candidate will design and optimize algorithms for robust real-time localization and mapping in dynamic environments. Candidates should... 

    EchoTwin AI, Inc.

    San Francisco, CA
    13 hours ago
  • $160k - $230k

    Together AI is seeking an Inference Frameworks and Optimization Engineer in San Francisco, California. The role focuses on designing and optimizing distributed...  ...in deep learning inference frameworks, proficiency in GPU programming, and strong collaboration skills.... 

    Together AI

    San Francisco, CA
    3 days ago
  • etc. is hiring a Vision Systems Engineer in San Francisco to develop detection and tracking algorithms for space-based IR sensing programs. This role involves deploying real-time software solutions on embedded hardware for US national security missions. Candidates should... 

    etc.

    San Francisco, CA
    4 days ago
  • $200k - $300k

     ...starved. Technology gave people more time for the things they wanted to do...  ...About the Role As a System Engineer, GPU Fleet, you will manage, operate, and optimize hyperscale GPU compute...  ...infrastructure supporting AI/ML training and inference workloads. Ensure high... 
    Local area

    Fluidstack

    San Francisco, CA
    19 hours ago
