Senior AI Inference Engineer - GPU, Rust & CUDA

$220k

Perplexity

Perplexity is looking for an engineer to join its team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of software engineering experience with a focus on ML inference, familiarity with deep learning frameworks, and a strong understanding of GPU architectures. Compensation ranges from $220K to $485K.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you, based on the Senior AI Inference Engineer - GPU, Rust & CUDA vacancy in San Francisco, CA
  • $220k - $320k

    inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques... 
    Senior

    inference.net

    San Francisco, CA
    4 days ago
  • A leading cloud infrastructure company is seeking a Senior Engineer 2 to join their AI Inference Optimization team. The role involves leading the technical...  ...high-performance computing and a strong understanding of GPU architectures. The position offers a competitive salary... 
    Senior
    Remote job

    DigitalOcean

    San Francisco, CA
    4 days ago
  • $220k

     ...We build and run the inference engine behind every Perplexity query and deploy dozens of model...  ...and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer...  ...management to support in API Gateway. GPU kernels migration to CuTe DSL. Port our... 
    Suggested

    Perplexity

    San Francisco, CA
    3 days ago
  •  ...California is hiring a Backend / Infrastructure Engineer to develop the backbone of their video...  ...work on cloud ingestion, distributed GPU inference pipelines, and collaborate with ML...  ...offering a dynamic environment for impactful AI development.
    Suggested

    PassFort

    San Francisco, CA
    1 day ago
  • $325k

     ...About the Team Our Inference team brings OpenAI's most...  ...access our state-of-the-art AI models, allowing them...  ...the Role We're hiring engineers to scale and optimize OpenAI...  ...across emerging GPU platforms. You'll work...  ...GPU kernels using HIP, CUDA, or Triton, and care deeply... 
    Suggested

    Centaur Labs

    San Francisco, CA
    1 day ago
  • $220k - $320k

    A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation... 
    Senior
    Local area

    Inference

    San Francisco, CA
    3 days ago
  •  ...Asari AI in San Francisco is seeking individuals to optimize high-performance, mission-critical computing systems. You'll work with...  ...performance and design complex systems. The ideal candidate has strong CUDA C experience and fluency in Python and C/C++. We offer... 
    Flexible hours

    Asari AI

    San Francisco, CA
    4 days ago
  • $175k - $225k

     ...led by veteran operators and engineers, alumni of Sonos, PayPal, Tesla...  ...We're looking for an AI Inference Engineer who lives at the boundary...  ...country. If you are obsessed with CUDA kernels, TensorRT...  ...kernels and perform low-level GPU tuning to maximize throughput... 
    Local area
    Remote work

    Sauron

    San Francisco, CA
    18 hours ago
  • $160k - $250k

     ...Senior Backend Engineer, Inference Platform (San Francisco). About the Role: Together AI is building the Inference Platform that brings...  ...programming in one or more of: Rust, Go, Python, or...  ...plus. Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies... 
    Senior
    Full time
    Local area

    Together AI

    San Francisco, CA
    1 day ago
  • $160k - $225k

     ...Cacheflow is seeking a Senior Software Engineer for AI Runtime at Databricks, located in San Francisco. You will be instrumental in building and scaling systems for large-scale GPU training, ensuring high throughput and resilience in training across expansive fleets of... 
    Senior

    Cacheflow

    San Francisco, CA
    7 hours ago
  •  ...Hamilton Barnes Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing...  ...scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience... 
    Senior
    Remote work

    Hamilton Barnes Associates Limited

    San Francisco, CA
    4 days ago
  • Accellor is seeking an AI Engineer in San Francisco to develop and optimize AI/ML models. The ideal candidate should have strong Python skills...  ...include building training pipelines and debugging models on CUDA-enabled GPUs. This position is an excellent opportunity for... 

    Accellor

    San Francisco, CA
    2 days ago
  • Quadric in San Francisco is looking for an experienced AI Kernel Engineer to develop and optimize AI kernels for their innovative neural processing...  ...and more than 5 years of relevant experience. Knowledge of CUDA, DSP, and C/C++ is essential. Benefits include life insurance... 
    Senior

    Quadric

    San Francisco, CA
    4 days ago
  • $167.2k - $209k

    A leading cloud service provider is seeking a Senior Engineer 2 for their AI Inference Data Plane team. This remote role focuses on designing and developing high-scale, resilient data plane services that enhance AI-driven applications. The ideal candidate will have strong... 
    Senior
    Remote job

    DigitalOcean

    San Francisco, CA
    8 days ago
  • $216k - $270k

     ...As a Software Engineer on the Machine Learning Infrastructure...  ..." for our large-scale GPU clusters. You will...  ...compute into breakthrough AI. You will: Architect...  ...languages (e.g. Python, Go, Rust, C++). Experience with...  ...and hardware stack (CUDA, NCCL). Experience with... 
    Senior
    Full time

    Scale AI

    San Francisco, CA
    21 days ago
  • A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate... 
    Senior

    Reflection AI

    San Francisco, CA
    4 days ago
  • Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience... 
    Senior
    Remote work
    Relocation package

    Pragmatike

    San Francisco, CA
    2 days ago
  • $300k

     ...startup building an AI and cloud platform,...  ...model training, or inference.  Our client...  ...operates high-performance GPU clusters powering...  ...operate inference engines such as vLLM, SGLang...  ...in Python, Go, Rust, or a comparable language...  ...software stacks (CUDA, Triton, NCCL) and... 
    Senior
    Permanent employment
    Worldwide
    San Francisco, CA
    more than 2 months ago
  •  ...Linuxcareers is seeking an Infrastructure/Cluster Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses on managing diverse...  ...systems for cluster health. Experience with GPU infrastructure is a plus.

    Linuxcareers

    San Francisco, CA
    3 days ago
  •  ...leading design technology company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure). You will architect...  ...demand grows. This role involves optimizing APIs, managing GPU workloads, and collaborating with cross-functional teams.... 
    Senior

    Vizcom

    San Francisco, CA
    3 days ago
  • $405k

     ...Staff Engineer, Inference Runtime Anthropic's mission...  ...interpretable, and steerable AI systems. We want...  ...on. This is a senior IC role with broad...  ...-sensitive Rust and Python codebase...  ...management – across GPU, TPU, and Trainium...  ...ecosystem (CUDA/GPU, TPU, or Trainium... 
    Work at office
    Visa sponsorship
    Flexible hours

    Colorwave Inc

    San Francisco, CA
    1 day ago
  •  ...An innovative AI company is seeking a Software Engineer to develop infrastructure that supports AI training and inference workflows. This role requires strong object-oriented programming...  ...scalable model serving, optimize multi-GPU infrastructure, and enhance system reliability... 

    SpreeAI

    San Francisco, CA
    3 days ago
  • Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure....  ...impact millions of meetings, ensuring efficient GPU utilization, and debugging production... 

    Fathom

    San Francisco, CA
    1 day ago
  • $190k - $270k

    AI Chopping Block, Inc. is seeking an AI Infrastructure Engineer in San Francisco. This systems-focused role involves maintaining user-facing services and production systems and ensuring their reliability and scalability. Candidates should have 5+ years of experience... 
    Senior

    AI Chopping Block, Inc.

    San Francisco, CA
    2 days ago
  • $150k - $250k

     ...Senior/Staff AI Engineer Job Locations US-CA-San Francisco - Remote | US-NC-Raleigh...  ...infrastructure behind real-world model serving and inference. This is the role for engineers who...  ...Improve performance across GPU and CPU pathways Work on KV cache,... 
    Senior
    Full time
    Remote work

    DataDirect Networks Inc

    San Francisco, CA
    1 day ago
  •  ...A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal... 
    Senior

    Jobleads-US

    San Francisco, CA
    3 days ago
  •  ...financial world. The role: SoFi's Senior Staff AI Engineer is a hands-on AI engineering role in...  ...ensure high-throughput, low-latency inference across diverse hardware footprints....  ...and managing the underlying Kubernetes/GPU orchestration for custom model deployments... 
    Senior
    Remote work

    SoFi

    San Francisco, CA
    3 days ago
  •  ...About Us: Most AI is frozen in place...  ...intelligence - the inference services that serve...  ...Researchers and ML engineers will hand you workloads...  ...heterogeneous GPU fleets. Batching, scheduling...  ...language (Go, Rust, C++). Working...  ...accelerator stack: CUDA fundamentals, NCCL,... 
    Flexible hours

    Adaption

    San Francisco, CA
    24 days ago
  •  ...Senior Software Engineer We're hiring a Senior Software Engineer onto our Applied AI team to build and extend the backend systems that power...  ...layer that connects them to our GPU-resident compute. A note...  ...Familiarity with causal inference or graph-based systems... 
    Senior
    Work at office

    Alembic Technologies

    San Francisco, CA
    1 day ago
  •  ...Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work...  ...will have a strong background in CUDA or similar, with proven experience in kernel... 
    Senior

    MakerMaker

    San Francisco, CA
    3 days ago