Senior AI Inference Engineer - GPU, Rust & CUDA

$220k

Perplexity

Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in software engineering with a focus on ML inference, familiarity with deep learning frameworks, and a strong understanding of GPU architectures. Compensation ranges from $220K to $485K.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Senior AI Inference Engineer - GPU, Rust & CUDA vacancy in San Francisco, CA
  • $220k - $320k

    inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques... 
    Senior

    inference.net

    San Francisco, CA
    4 days ago
  • A leading cloud infrastructure company is seeking a Senior Engineer 2 to join their AI Inference Optimization team. The role involves leading the technical...  ...high-performance computing and a strong understanding of GPU architectures. The position offers a competitive salary... 
    Senior
    Remote job

    DigitalOcean

    San Francisco, CA
    4 days ago
  •  ...Inference Engine Engineer We build and run the inference engine behind every Perplexity query...  ...and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer...  ...through continuous batching and GPU kernel interleaving. Build dashboards... 
    Suggested

    Perplexity AI

    San Francisco, CA
    1 day ago
  •  ...Sciforium AI Infrastructure Role Sciforium...  ...support from AMD engineers the team is scaling...  ..., and distributed inference features....  ...runtime, service, and GPU layers, working closely...  ...proficiency in C++/Python/Go/Rust ~ Experience...  ...Proficiency in CUDA or ROCm and... 
    Senior
    Work at office
    Flexible hours

    Sciforium

    San Francisco, CA
    5 days ago
  • $220k - $320k

    A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation... 
    Senior
    Local area

    Inference

    San Francisco, CA
    3 days ago
  • $175k - $225k

     ...led by veteran operators and engineers, alumni of Sonos, PayPal, Tesla...  ...We're looking for an AI Inference Engineer who lives at the boundary...  ...country. If you are obsessed with CUDA kernels, TensorRT...  ...kernels and perform low-level GPU tuning to maximize throughput... 
    Local area
    Remote work

    Sauron

    San Francisco, CA
    19 hours ago
  • $160k - $250k

     ...Senior Backend Engineer, Inference Platform San Francisco About the Role Together AI is building the Inference Platform that brings...  ...programming in one or more of: Rust, Go, Python, or...  ...plus. ~ Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies... 
    Senior
    Full time
    Local area

    Together AI

    San Francisco, CA
    1 day ago
  • Asari AI in San Francisco is seeking individuals to optimize high-performance, mission-critical computing systems. You'll work with AI...  ...and design complex systems. The ideal candidate has strong CUDA C experience and fluency in Python and C/C++. We offer competitive... 
    Flexible hours

    Asari AI

    San Francisco, CA
    3 days ago
  • $167.2k - $209k

    A leading cloud service provider is seeking a Senior Engineer 2 for their AI Inference Data Plane team. This remote role focuses on designing and developing high-scale, resilient data plane services that enhance AI-driven applications. The ideal candidate will have strong... 
    Senior
    Remote work

    DigitalOcean

    San Francisco, CA
    11 hours ago
  • Hamilton Barnes Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience... 
    Senior
    Remote job

    Hamilton Barnes Associates Limited

    San Francisco, CA
    4 days ago
  • Quadric in San Francisco is looking for an experienced AI Kernel Engineer to develop and optimize AI kernels for their innovative neural processing...  ...and more than 5 years of relevant experience. Knowledge of CUDA, DSP, and C/C++ is essential. Benefits include life insurance... 
    Senior

    Quadric

    San Francisco, CA
    4 days ago
  • A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate... 
    Senior

    Reflection AI

    San Francisco, CA
    4 days ago
  • $216k - $270k

     ...As a Software Engineer on the Machine Learning Infrastructure...  ..." for our large-scale GPU clusters. You will...  ...compute into breakthrough AI. You will: Architect...  ...languages (e.g. Python, Go, Rust, C++) ~ Experience with...  ...and hardware stack (CUDA, NCCL) Experience with... 
    Senior
    Full time

    Scale AI

    San Francisco, CA
    16 days ago
  • Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience... 
    Senior
    Remote work
    Relocation package

    Pragmatike

    San Francisco, CA
    2 days ago
  • $300k

     ...startup building an AI and cloud platform,...  ...model training, or inference.  Our client...  ...operates high-performance GPU clusters powering...  ...operate inference engines such as vLLM, SGLang...  ...in Python, Go, Rust, or a comparable language...  ...software stacks (CUDA, Triton, NCCL) and... 
    Senior
    Permanent employment
    Worldwide
    San Francisco, CA
    more than 2 months ago
  •  ...leading design technology company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure). You will architect...  ...demand grows. This role involves optimizing APIs, managing GPU workloads, and collaborating with cross-functional teams.... 
    Senior

    Vizcom

    San Francisco, CA
    3 days ago
  • $405k

     ...Staff Engineer, Inference Runtime Anthropic's mission...  ...interpretable, and steerable AI systems. We want...  ...on. This is a senior IC role with broad...  ...-sensitive Rust and Python codebase...  ...management – across GPU, TPU, and Trainium...  ...ecosystem (CUDA/GPU, TPU, or Trainium... 
    Work at office
    Visa sponsorship
    Flexible hours

    Colorwave Inc

    San Francisco, CA
    1 day ago
  • $190k - $270k

     ...AI Chopping Block, Inc. is seeking an AI Infrastructure Engineer in San Francisco. This role requires maintaining user-facing services and production systems, specializing in systems while ensuring their reliability and scalability. Candidates should have 5+ years of... 
    Senior

    AI Chopping Block, Inc.

    San Francisco, CA
    3 days ago
  • Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure....  ...impact millions of meetings, ensuring efficient GPU utilization, and debugging production... 

    Fathom

    San Francisco, CA
    1 day ago
  •  ...An innovative AI company is seeking a Software Engineer to develop infrastructure that supports AI training and inference workflows. This role requires strong object-oriented programming...  ...scalable model serving, optimize multi-GPU infrastructure, and enhance system reliability... 

    SpreeAI

    San Francisco, CA
    3 days ago
  •  ...Senior Principal AI Agent / ML Software Engineer The Senior Principal AI Agent / ML Software Engineer is a Senior...  ...platforms, autonomous workflows, scalable inference infrastructure, and enterprise AI...  ...for low latency, high throughput, GPU efficiency, reliability, cost,... 
    Senior

    Oracle

    San Francisco, CA
    19 hours ago
  •  ...A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal... 
    Senior

    Jobleads-US

    San Francisco, CA
    3 days ago
  •  ...About Us Most AI is frozen in place...  ...intelligence - the inference services that serve...  ...Researchers and ML engineers will hand you workloads...  ...heterogeneous GPU fleets. Batching, scheduling...  ...language (Go, Rust, C++). ~ Working...  ...accelerator stack: CUDA fundamentals, NCCL,... 
    Flexible hours

    Adaption

    San Francisco, CA
    19 days ago
  • $150k - $250k

     ...Senior/Staff AI Engineer Job Locations US-CA-San Francisco - Remote | US-NC-Raleigh...  ...infrastructure behind real-world model serving and inference. This is the role for engineers who...  ...Improve performance across GPU and CPU pathways Work on KV cache,... 
    Senior
    Full time
    Remote work

    DataDirect Networks Inc

    San Francisco, CA
    1 day ago
  •  ...Senior Software Engineer We're hiring a Senior Software Engineer onto our Applied AI team to build and extend the backend systems that power...  ...layer that connects them to our GPU-resident compute. A note...  ...Familiarity with causal inference or graph-based systems... 
    Senior
    Work at office

    Alembic Technologies

    San Francisco, CA
    1 day ago
  • $207k - $290k

     ...Description About JazzX AI: Vision:...  ...seeking an experienced AI Engineer with deep expertise in...  ...to join our team as a Senior Staff Architect. In this...  ...techniques , including inference-time search, chain-of-thought...  ...(Kubernetes, GPU/TPU clusters, and cloud... 
    Senior
    Worldwide
    Flexible hours

    JazzX AI

    San Francisco, CA
    26 days ago
  •  ...Our client is a well-funded AI startup building production-...  ...customers. They are looking for a Senior AI/ML Engineer to own model training...  ...pipelines, evaluation systems, and inference serving at scale. Full-time,...  ...with distributed training, GPU optimization, or inference... 
    Senior
    Full time

    Clera

    San Francisco, CA
    14 days ago
  •  ...financial world. The role: SoFi's Senior Staff AI Engineer is a hands-on AI engineering role in...  ...ensure high-throughput, low-latency inference across diverse hardware footprints....  ...and managing the underlying Kubernetes/GPU orchestration for custom model deployments... 
    Senior
    Remote work

    SoFi

    San Francisco, CA
    3 days ago
  •  ...Francisco, is seeking a highly skilled kernel engineer to write and optimize GPU kernels that enhance performance for training and inference. This role involves deep, low-level work...  ...will have a strong background in CUDA or similar, with proven experience in kernel... 
    Senior

    MakerMaker

    San Francisco, CA
    3 days ago
  •  ...AI Platform Engineer – Training & Inference Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization...  ...distributed object store, and configure RayData for GPU-direct streaming from GCS/S3. Operate distributed training... 

    Saviynt

    San Francisco, CA
    13 hours ago
