Senior Inference Performance Engineer - GPU & CUDA

$220k - $320k

inference.net

inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques and debugging inference frameworks. The role offers a competitive salary of $220,000 - $320,000 plus equity, and values curiosity and fast learning. You will join a team devoted to innovative AI solutions.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Senior Inference Performance Engineer - GPU & CUDA in San Francisco, CA vacancy
  • $220k - $320k

    A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation... 
    Senior
    Performance
    Local area

    Inference

    San Francisco, CA
    3 days ago
  • Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience... 
    Senior
    Performance
    Remote work
    Relocation package

    Pragmatike

    San Francisco, CA
    2 days ago
  • FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative... 
    Performance

    FriendliAI

    San Francisco, CA
    4 days ago
  •  ...San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate will have hands-on experience with modern... 
    Senior
    Performance

    Reflection AI

    San Francisco, CA
    4 days ago
  •  ...AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for...  ...computation efficiency. Ideal candidates have 1-5 years of CUDA development experience and a strong understanding of... 
    Performance

    Baseten

    San Francisco, CA
    4 days ago
  • $160k - $320k

     ...leading AI computing firm is seeking a Systems Engineer in San Francisco or Los Angeles to scale AI inference. Candidates should have strong C++ skills,...  .... Responsibilities include designing GPU kernels, optimizing performance, and collaborating with technical leads to... 
    Performance

    Vast.ai

    San Francisco, CA
    4 days ago
  • Pragmatike is seeking a CUDA Kernel Engineer to work remotely for a rapidly growing AI startup. The ideal candidate will have extensive...  ...NVIDIA CUDA kernels, with a strong understanding of GPU architecture and performance optimization. Responsibilities include designing CUDA... 
    Performance
    Remote job
    Relocation package

    Pragmatike

    San Francisco, CA
    2 days ago
  • $180k - $250k

    A leading technology company in San Francisco is seeking a skilled engineer to build custom compute environments, enhancing GPU performance for customer workloads. Candidates should have deep expertise in Linux virtualization and networking fundamentals, along with experience... 
    Senior
    Performance
    Relocation package

    Fal

    San Francisco, CA
    4 days ago
  • Acceler8 Talent is looking for a Software Engineer in San Francisco to focus on building and optimizing inference systems for next-generation AI at scale. You will...  ...production inference pipelines and improve system performance under real production constraints. The ideal... 
    Senior
    Performance

    Acceler8 Talent

    San Francisco, CA
    2 days ago
  • $220k

    Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience... 
    Senior

    Perplexity

    San Francisco, CA
    3 days ago
  •  ...you.  ABOUT THE ROLE As a senior Robot Perception Engineer on the Smart Robotics team at...  ...learning Optimize model inference for GPU deployment, leveraging CUDA, TensorRT, and related acceleration...  ...) C/C++ experience for performance-critical components... 
    Senior
    Performance
    Full time

    Bright Machines

    San Francisco, CA
    a month ago
  •  ...of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and improving execution performance across various components. Ideal candidates should have strong software engineering skills and experience with ML inference systems... 
    Senior
    Performance

    Gimlet Labs

    San Francisco, CA
    2 days ago
  • A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong... 
    Senior
    Performance

    Hyperbolic Labs

    San Francisco, CA
    4 days ago
  •  ...Description Machine Learning Engineer, Inference Want to solve realtime...  ..., scheduler design, GPU utilisation, concurrency optimisation...  ...already operates beyond the performance of most publicly available...  ...TensorRT, Triton, vLLM, CUDA Graphs, ONNX Runtime, or custom... 
    Performance
    Remote work
    Flexible hours

    techire ai

    San Francisco, CA
    4 days ago
  •  ...GPU Kernel Engineer Sciforium is an AI infrastructure company developing...  ...pushing the limits of performance on modern accelerators. In...  ...for large-scale training and inference. This role is ideal for...  ...GPU kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas... 
    Performance
    Flexible hours

    Sciforium

    San Francisco, CA
    1 day ago
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About...  ...the boundaries of performance, scalability, and cost-efficiency...  ...-throughput inference, GPU/accelerator optimizations...  ...performance serving. Apply CUDA graph optimizations,... 
    Performance
    Full time

    Together AI

    San Francisco, CA
    1 day ago
  • An innovative company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing and implementing infrastructure...  ...for large-scale multimodal models, focusing on high-performance delivery of audio and image inputs. You'll collaborate... 
    Performance

    OpenAI

    San Francisco, CA
    3 days ago
  •  ...is scaling a world-class engineering team across inference, distributed systems, compiler...  ...infrastructure, and high-performance AI compute. Their...  ...custom inference runtimes CUDA, kernel optimization, or compiler...  ...Experience optimizing GPU utilization at scale Background... 
    Performance

    Acceler8 Talent

    San Francisco, CA
    2 days ago
  • $100k - $120k

     ...foundation models. As training and inference workloads grow, we need kernel‑level...  ...Responsibilities Lead a team of kernel and system engineers focused on performance-critical code Design, implement, and...  ...kernels for CPU (AVX/ARM NEON), GPU (CUDA/ROCm), and hardware accelerators... 
    Performance

    Coda Robotics

    San Francisco, CA
    2 days ago
  •  ...company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure)...  ...role involves optimizing APIs, managing GPU workloads, and collaborating with...  ...with TypeScript/Node, strong skills in performance tuning and distributed systems. This position... 
    Senior
    Performance

    Vizcom

    San Francisco, CA
    2 days ago
  • $300k

    GPU Optimisation Engineer — Real-Time Inference Want to push GPU performance to its limits — not in theory, but in production systems handling real-time speech and multimodal...  ..., and scheduling Writing and tuning custom CUDA / Triton kernels for performance-critical paths... 
    Performance
    Relocation
    Visa sponsorship
    Free visa

    Techire Ai

    San Francisco, CA
    1 day ago
  •  ...and unicorn founders and senior engineers with deep expertise in 3...  ...a Founding Engineer, ML Inference with deep expertise in high-performance ML engineering. This is...  ...using torch.compile, custom CUDA kernels, and specialized...  ...Working knowledge of GPU hardware (NVIDIA) and... 
    Performance
    Relocation
    Visa sponsorship
    Relocation package

    Reactor

    San Francisco, CA
    9 hours ago
  •  ...powers mission‑critical inference for the world's most...  ...help build the platform engineers turn to to ship AI...  ...ROLE We’re seeking a GPU Kernel Engineer to join...  ...directly impacts the performance of state‑of‑the‑art machine...  ...optimize code using CUDA, PTX assembly, and architecture... 
    Performance
    Flexible hours

    Baseten

    San Francisco, CA
    1 day ago
  •  ...English (required) We are searching for a CUDA Kernel Engineer who has hands‑on experience...  ...kernels from scratch. You will work on the GPU performance layer powering large-scale, high-...  ...or HPC workloads. Knowledge of model inference optimization (TensorRT, CUDA Graphs,... 
    Performance
    Remote job
    Local area
    Immediate start
    Relocation package

    Pragmatike

    San Francisco, CA
    2 days ago
  • $160k - $320k

     ...deliver excellence. We seek engineers/researchers with strong...  ...experience to help scale AI inference. You'll leverage your knowledge of high-performance systems to optimize GPU performance at the bleeding...  ...SF or LA offices Tech Stack CUDA/C++, GPGPU, Python, Linux.... 
    Performance
    Full time
    Work at office

    Vast.ai

    San Francisco, CA
    4 days ago
  • $325k

    A leading AI research company in San Francisco seeks an engineer to optimize their powerful AI models for high-volume production environments...  ...role involves collaboration with researchers and focus on performance optimization. Compensation ranges from $325K to $490K.
    Senior
    Performance

    OpenAI

    San Francisco, CA
    4 days ago
  • $200k - $280k

    A leading AI company in San Francisco is looking for a Staff Machine Learning Engineer to enhance inference systems at production scale. You will design algorithms, optimize performance, and collaborate on RL and post-training pipelines. Ideal candidates have 3+ years of... 
    Performance
    Full time

    AI Chopping Block, Inc.

    San Francisco, CA
    1 day ago
  •  ...superintelligence. One person, one GPU. If you'd like to build...  ...Tuesday. Product Engineering at Lambda is responsible for...  ...power is only one factor. High-performance networking and storage are essential...  ...supporting AI training and inference at scale. Lambda's Infrastructure... 
    Senior
    Performance
    Work at office
    Local area
    Work from home
    Flexible hours

    Lambda

    San Francisco, CA
    1 day ago
  •  ...Senior HPC & GPU Infrastructure Engineer Sciforium is an AI infrastructure company developing...  ...health, reliability, and performance of our GPU compute...  ...maintaining the ML software stack (CUDA/ROCm, PyTorch, JAX, vLLM)...  ...optimizations, or inference systems. Hands-on... 
    Senior
    Performance
    Flexible hours

    Sciforium

    San Francisco, CA
    4 days ago
  • $160k - $250k

     ...Senior Backend Engineer, Inference Platform San Francisco About the Role Together AI is building...  ...the boundaries of inference performance and efficiency. Shape the...  ...plus. ~ Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies... 
    Senior
    Performance
    Full time
    Local area

    Together AI

    San Francisco, CA
    1 day ago
