
Senior Inference Performance Engineer - GPU & CUDA

$220k - $320k

inference.net

inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques and debugging inference frameworks. The role offers a competitive salary of $220,000 - $320,000 plus equity, and values curiosity and fast learning. You will join a team devoted to innovative AI solutions.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Senior Inference Performance Engineer - GPU & CUDA in San Francisco, CA vacancy
  •  ...Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work...  ...will have a strong background in CUDA or similar, with proven experience in kernel... 
    Senior
    Performance

    MakerMaker

    San Francisco, CA
    3 days ago
  • $167.2k - $209k

     ...DigitalOcean is seeking a Senior Engineer 2 to play a key technical role in our AI Inference Optimization team....  ...the industry-leading performance for our inference services...  ...inference engine and GPU kernel layers,...  ...their software stacks (CUDA, ROCm, TensorRT, OpenAI... 
    Senior
    Performance
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    17 hours ago
  •  ...FriendliAI is seeking a GPU Kernel Engineer in San Francisco to design and optimize GPU kernels for AI inference. This role requires expertise in CUDA, C++, and performance-critical systems. You will work on cutting-edge GPU technology and contribute to a highly collaborative... 
    Performance

    FriendliAI

    San Francisco, CA
    4 days ago
  • $300k

     ...technology firm in San Francisco seeks a GPU Optimisation Engineer to maximize GPU performance in real-time AI systems. The...  ...possess strong experience with CUDA/Triton, a deep understanding of...  ...execution, and a knack for optimizing inference latency for large generative... 
    Performance
    Visa sponsorship
    Relocation package

    Trades Workforce Solutions

    San Francisco, CA
    4 days ago
  • $128.7k - $261.3k

     ...autonomous driving technology. The role focuses on designing high-performance GPU kernels, optimizing ML performance, and collaborating cross-...  .... The ideal candidate will have a strong background in CUDA programming and C++, with a minimum of 3 years of industry experience... 
    Senior
    Performance

    Israelvcforum

    San Francisco, CA
    4 days ago
  •  ...MakerMaker.AI is looking for a Senior Machine Learning Systems Engineer in San Francisco. In this...  ...build and operate production inference systems, optimizing for performance and reliability. The ideal candidate...  ...and have strong knowledge in GPU-accelerated inference.... 
    Senior
    Performance

    MakerMaker.AI

    San Francisco, CA
    3 days ago
  • $167.2k - $209k

     ...driven applications. We are seeking a Senior Engineer 2 to join our AI Inference Data Plane team. In this role, you...  ...their models with industry-leading performance and reliability. This is a hands‑on...  ...& Interconnects: Understanding of GPU‑level optimisation and experience... 
    Senior
    Performance
    Local area
    Remote work
    Worldwide
    Flexible hours

    DigitalOcean

    San Francisco, CA
    3 days ago
  • $225k

     ...RL, ultra‑long context, and inference‑time compute to achieve this...  ...About The Role As a Software Engineer on the Inference & RL Systems...  ...work on Design and scale high‑performance inference serving systems Optimize...  ...bottlenecks across GPU, networking, and storage layers... 
    Senior
    Performance
    Relocation
    Visa sponsorship

    Magic

    San Francisco, CA
    23 hours ago
  •  ...AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for...  ...computation efficiency. Ideal candidates have 1–5 years of CUDA development experience and a strong understanding of... 
    Performance

    Baseten

    San Francisco, CA
    4 days ago
  •  ...San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate will have hands-on experience with modern... 
    Senior
    Performance

    Reflection AI

    San Francisco, CA
    4 days ago
  •  ...OpenAI is seeking a GPU Inference Engineer based in San Francisco, CA. In this high-impact role, you'll optimize inference performance and scalability for Robotics research, driving engineering efforts to enhance model serving and system efficiency. The ideal candidate... 
    Performance
    Work at office
    Relocation
    Relocation package

    OpenAI

    San Francisco, CA
    16 hours ago
  • $180k - $250k

     ...A leading technology company in San Francisco is seeking a skilled engineer to build custom compute environments, enhancing GPU performance for customer workloads. Candidates should have deep expertise in Linux virtualization and networking fundamentals, along with experience... 
    Senior
    Performance
    Relocation package

    Fal

    San Francisco, CA
    4 days ago
  •  ...achieve this. As demand for inference grows, teams will need to...  ...Help shape Luminal’s engineering culture from the ground up...  .../RDNA assembly, or other GPU ISAs Familiarity with ML...  ...tasks will include writing CUDA kernels, conducting model performance reviews... 
    Senior
    Performance
    Full time

    Slope

    San Francisco, CA
    3 days ago
  •  ...A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong... 
    Senior
    Performance

    Hyperbolic Labs

    San Francisco, CA
    3 days ago
  • $220k

    Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience... 
    Senior

    Perplexity

    San Francisco, CA
    3 days ago
  •  ...you. ABOUT THE ROLE As a senior Robot Perception Engineer on the Smart Robotics team at...  ...learning Optimize model inference for GPU deployment, leveraging CUDA, TensorRT, and related acceleration...  ...) C/C++ experience for performance-critical components... 
    Senior
    Performance

    Bright Machines

    San Francisco, CA
    26 days ago
  •  ...of Technical Staff to design and optimize inference systems. The role involves managing KV cache allocation and improving execution performance across various components. Ideal candidates should have strong software engineering skills and experience with ML inference systems... 
    Senior
    Performance

    Gimlet Labs

    San Francisco, CA
    2 days ago
  •  ...company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure)...  ...role involves optimizing APIs, managing GPU workloads, and collaborating with...  ...with TypeScript/Node, strong skills in performance tuning and distributed systems. This position... 
    Senior
    Performance

    Vizcom

    San Francisco, CA
    3 days ago
  •  ...build and operate the inference systems that serve...  .... This is an engineering role, not a research...  ...Own the performance characteristics of...  ...WE'RE LOOKING FOR Senior ML systems engineer...  ...change Experience with GPU‑accelerated inference...  ...following languages: C++, CUDA, ROCm or Triton... 
    Performance

    MakerMaker.AI

    San Francisco, CA
    4 days ago
  •  ...Description Machine Learning Engineer, Inference Want to solve realtime...  ..., scheduler design, GPU utilisation, concurrency optimisation...  ...already operates beyond the performance of most publicly available...  ...TensorRT, Triton, vLLM, CUDA Graphs, ONNX Runtime, or custom... 
    Performance
    Remote work
    Flexible hours

    techire ai

    San Francisco, CA
    4 days ago
  • $100k - $120k

     ...foundation models. As training and inference workloads grow, we need kernel‑level...  ...Responsibilities Lead a team of kernel and system engineers focused on performance-critical code Design, implement, and...  ...kernels for CPU (AVX/ARM NEON), GPU (CUDA/ROCm), and hardware accelerators... 
    Performance

    Coda Robotics

    San Francisco, CA
    3 days ago
  • $160k - $230k

     ...LLM Inference Frameworks and Optimization Engineer San Francisco, Singapore, Amsterdam About...  ...the boundaries of performance, scalability, and cost-efficiency...  ...-throughput inference, GPU/accelerator optimizations...  ...performance serving. Apply CUDA graph optimizations,... 
    Performance
    Full time

    Together AI

    San Francisco, CA
    21 days ago
  •  ...powers mission‑critical inference for the world's most...  ...help build the platform engineers turn to to ship AI...  ...ROLE We’re seeking a GPU Kernel Engineer to join...  ...directly impacts the performance of state‑of‑the‑art machine...  ...optimize code using CUDA, PTX assembly, and architecture... 
    Performance
    Flexible hours

    Baseten

    San Francisco, CA
    3 days ago
  •  ...Sciforium Gpu Kernel Engineer Sciforium is an AI infrastructure company...  ...about pushing the limits of performance on modern accelerators. In...  ...for large-scale training and inference. This role is ideal for...  ...kernels using C++, PTX, CUDA, ROCm, Triton, and/or JAX Pallas... 
    Performance
    Flexible hours

    Sciforium

    San Francisco, CA
    1 day ago
  • $500 per month

     ...re a small team of engineers, former US...  ...architecture and performance of our full software...  ...computer vision, ML inference, controls, fire control...  ...across CPU, GPU, memory, and I/O;...  ...against. This is a senior IC role with subteam...  ...Develop and optimize CUDA kernels for high-throughput... 
    Senior
    Performance
    Permanent employment
    Work at office
    Monday to Friday
    Night shift
    Weekend work

    Aurelius Systems

    San Francisco, CA
    3 days ago
  • An innovative company is seeking a talented software engineer to join their dynamic Inference team. This role involves designing and implementing infrastructure...  ...for large-scale multimodal models, focusing on high-performance delivery of audio and image inputs. You'll collaborate... 
    Performance

    OpenAI

    San Francisco, CA
    3 days ago
  • $280k

     ...group of committed researchers, engineers, policy experts, and...  ...breakthrough innovations in GPU performance and systems engineering. As...  ...capabilities and dramatically improve inference efficiency. Working at...  ...GPU Kernel Development: CUDA, Triton, CUTLASS, Flash Attention... 
    Performance
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    3 days ago
  •  ...Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote...  ...routes training and inference jobs across global...  ...debug large-scale GPU infrastructure used...  ...and can reason about performance from network fabric...  ...actually run — NCCL, CUDA, PyTorch distributed... 
    Senior
    Performance
    Full time
    Remote work

    Andromeda

    San Francisco, CA
    3 days ago
  • $300k

    GPU Optimisation Engineer — Real-Time Inference Want to push GPU performance to its limits — not in theory, but in production systems handling real-time speech and multimodal...  ..., and scheduling Writing and tuning custom CUDA / Triton kernels for performance-critical paths... 
    Performance
    Relocation
    Visa sponsorship
    Free visa

    Techire Ai

    San Francisco, CA
    1 day ago
  • $220k - $320k

     ...ML Model Serving Engineer Want to build the layer that actually...  ...’ll join a team focused on inference, where performance is the product. This is about...  ...problems around batching, GPU efficiency, memory...  ...infrastructure Exposure to CUDA, GPU profiling tools, or systems... 
    Performance
    3 days per week

    Trades Workforce Solutions

    San Francisco, CA
    4 days ago
