Senior AI Inference Engineer - GPU, Rust & CUDA

$220k

Perplexity

Perplexity is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in software engineering with a focus on ML inference, familiarity with deep learning frameworks, and a strong understanding of GPU architectures. Compensation ranges from $220K to $485K.

Vacancy posted 3 days ago
Similar jobs that could be interesting for you
Based on the Senior AI Inference Engineer - GPU, Rust & CUDA in San Francisco, CA vacancy
  • $220k - $320k

    inference.net, a growing company in San Francisco, seeks an experienced engineer to optimize AI inference performance. The ideal candidate will have over 2 years of experience in ML systems and GPU programming. Key responsibilities include implementing optimization techniques... 
    Senior

    inference.net

    San Francisco, CA
    4 days ago
  •  ...Sciforium AI Infrastructure Role Sciforium...  ...support from AMD engineers the team is scaling...  ..., and distributed inference features....  ...runtime, service, and GPU layers, working closely...  ...proficiency in C++/Python/Go/Rust ~ Experience...  ...Proficiency in CUDA or ROCm and... 
    Senior
    Work at office
    Flexible hours

    Sciforium

    San Francisco, CA
    5 days ago
  • $220k

    We build and run the inference engine behind every Perplexity query and deploy dozens of model...  ...and cost budgets. Our stack is Rust, Python, CUDA, and CuTe DSL - and we need another engineer...  ...management to support in API Gateway. GPU kernels migration to CuTe DSL. Port... 
    Suggested

    Perplexity

    San Francisco, CA
    4 days ago
  • $220k - $320k

    A tech startup specializing in AI inference seeks a skilled professional to optimize their inference stack. Candidates should have over 2 years of experience in ML systems, fluency in Python, and hands-on experience with LLM frameworks. The role offers competitive compensation... 
    Senior
    Local area

    Inference

    San Francisco, CA
    3 days ago
  • $175k - $225k

     ...led by veteran operators and engineers, alumni of Sonos, Paypal, Tesla...  ...We're looking for an AI Inference Engineer who lives at the boundary...  ...country. If you are obsessed with CUDA kernels, TensorRT...  ...kernels and perform low-level GPU tuning to maximize throughput... 
    Suggested
    Local area
    Remote work

    Sauron

    San Francisco, CA
    5 days ago
  • $160k - $250k

     ...Senior Backend Engineer, Inference Platform San Francisco About the Role Together AI is building the Inference Platform that brings...  ...programming in one or more of: Rust, Go, Python, or...  ...plus. ~ Familiarity with GPU software stacks (CUDA, Triton, NCCL) and HPC technologies... 
    Senior
    Full time
    Local area

    Together AI

    San Francisco, CA
    1 day ago
  • $167.2k - $209k

    A leading cloud service provider is seeking a Senior Engineer 2 for their AI Inference Data Plane team. This remote role focuses on designing and developing high-scale, resilient data plane services that enhance AI-driven applications. The ideal candidate will have strong... 
    Senior
    Remote work

    DigitalOcean

    San Francisco, CA
    4 days ago
  • Asari AI in San Francisco is seeking individuals to optimize high-performance, mission-critical computing systems. You'll work with AI...  ...and design complex systems. The ideal candidate has strong CUDA C experience and fluency in Python and C/C++. We offer competitive... 
    Flexible hours

    Asari AI

    San Francisco, CA
    3 days ago
  • Quadric in San Francisco is looking for an experienced AI Kernel Engineer to develop and optimize AI kernels for their innovative neural processing...  ...and more than 5 years of relevant experience. Knowledge of CUDA, DSP, and C/C++ is essential. Benefits include life insurance... 
    Senior

    Quadric

    San Francisco, CA
    4 days ago
  • $216k - $270k

     ...As a Software Engineer on the Machine Learning Infrastructure...  ..." for our large-scale GPU clusters. You will...  ...compute into breakthrough AI. You will: Architect...  ...languages (e.g. Python, Go, Rust, C++) ~ Experience...  ...software and hardware stack (CUDA, NCCL) Experience... 
    Senior
    Full time

    Scale AI

    San Francisco, CA
    3 days ago
  • A cutting-edge AI technology company based in San Francisco is seeking a specialist to design and operate large-scale GPU infrastructure. This role requires expertise in deploying GPU systems for high-throughput inference and model performance optimization. The ideal candidate... 
    Senior

    Reflection AI

    San Francisco, CA
    4 days ago
  •  ...frontier of distributed and decentralised AI agents. Our research spans vector-...  ...RAG techniques. Comfortable with CUDA tooling for debugging and optimising GPU workloads. Able to design and train...  ...DeepSpeed or vLLM for efficient inference serving. Familiarity with LangChain... 
    Senior

    Synagi

    San Francisco, CA
    5 days ago
  • Pragmatike is seeking a CUDA Kernel Engineer for a remote position to develop and optimize NVIDIA CUDA kernels for high-performance AI systems. The ideal candidate will have a deep understanding of GPU architecture, performance optimization strategies, and hands-on experience... 
    Senior
    Remote work
    Relocation package

    Pragmatike

    San Francisco, CA
    2 days ago
  •  ...leading design technology company in San Francisco is seeking a Senior Software Engineer for Backend (Systems / Infrastructure). You will architect...  ...demand grows. This role involves optimizing APIs, managing GPU workloads, and collaborating with cross-functional teams.... 
    Senior

    Vizcom

    San Francisco, CA
    2 days ago
  • Fathom is seeking a Model Performance Engineer in San Francisco to optimize the speed, cost, and reliability of its model inference stack while building fine-tuning infrastructure....  ...impact millions of meetings, ensuring efficient GPU utilization, and debugging production... 

    Fathom

    San Francisco, CA
    1 day ago
  •  ...About Us Most AI is frozen in place...  ...intelligence - the inference services that serve...  ...Researchers and ML engineers will hand you workloads...  ...heterogeneous GPU fleets. Batching, scheduling...  ...language (Go, Rust, C++). ~ Working...  ...accelerator stack: CUDA fundamentals, NCCL,... 
    Flexible hours

    Adaption

    San Francisco, CA
    3 days ago
  • $150k - $250k

     ...Senior/Staff AI Engineer Job Locations US-CA-San Francisco - Remote | US-NC-Raleigh...  ...infrastructure behind real-world model serving and inference. This is the role for engineers who...  ...Improve performance across GPU and CPU pathways Work on KV cache,... 
    Senior
    Full time
    Remote work

    DataDirect Networks Inc

    San Francisco, CA
    1 day ago
  •  ...Senior Software Engineer We're hiring a Senior Software Engineer onto our Applied AI team to build and extend the backend systems that power...  ...layer that connects them to our GPU-resident compute. A note...  ...Familiarity with causal inference or graph-based systems... 
    Senior
    Work at office

    Alembic Technologies

    San Francisco, CA
    1 day ago
  • $175k - $220k

     ...building the next generation of AI perception systems for...  .... We are seeking a Senior Applied AI & Machine Learning Engineer to design, optimize, and...  ...and maintain training and inference pipelines that support...  ...stacks (ONNX, TensorRT, CUDA) Experience working on... 
    Senior
    Shift work

    Volt

    San Francisco, CA
    3 days ago
  • An innovative AI company is seeking a Software Engineer to develop infrastructure that supports AI training and inference workflows. This role requires strong object-oriented programming...  ...scalable model serving, optimize multi-GPU infrastructure, and enhance system reliability... 

    SpreeAI

    San Francisco, CA
    4 days ago
  • A leading data and AI company in San Francisco is seeking a Senior Engineer to enhance their Model Serving platform. This role requires expertise in building large-scale distributed systems and collaboration across teams to optimize performance and reliability. Ideal candidates... 
    Senior

    Jobleads-US

    San Francisco, CA
    2 days ago
  •  ...The role: SoFi's Staff AI Engineer is a hands-on AI engineering...  ...organization. This is a critical, senior role responsible for setting...  ...high-throughput, low-latency inference across diverse hardware...  ...managing the underlying Kubernetes/GPU orchestration for custom... 
    Senior
    Remote work

    SoFi

    San Francisco, CA
    3 days ago
  • $150k - $200k

     ...Electricity demand is skyrocketing, driven by AI factories, electric vehicles, and...  ...including data ingestion, feature engineering, model training, inference, deployment, and monitoring....  ...programming skills in Python; Go, Java, or Rust is a plus. ~ Experience with... 
    Senior
    Temporary work
    Work experience placement
    Local area
    Shift work

    Amperesand

    San Francisco, CA
    3 days ago
  • AI Platform Engineer - Training & Inference Saviynt's AI-powered identity platform manages and governs human and non-human access to all of an organization...  ...distributed object store, and configure RayData for GPU-direct streaming from GCS/S3. Operate distributed training... 

    Saviynt Inc.

    San Francisco, CA
    2 days ago
  • $207k - $290k

     ...Description About JazzX AI: Vision:...  ...seeking an experienced AI Engineer with deep expertise in...  ...to join our team as a Senior Staff Architect. In this...  ...techniques, including inference-time search, chain-of-thought...  ...(Kubernetes, GPU/TPU clusters, and cloud... 
    Senior
    Worldwide
    Flexible hours

    JazzX AI

    San Francisco, CA
    6 days ago
  •  ...San Francisco is seeking an experienced engineer for its Inference Platform team. This role involves...  ...inference deployments, driving improvements in AI performance, and utilizing Kubernetes...  ...LLM serving frameworks and deploying GPU workloads. The position offers a competitive... 
    Senior

    Fluidstack

    San Francisco, CA
    5 days ago
  • $300k

     ...startup building an AI and cloud platform,...  ...model training, or inference.  Our client...  ...operates high-performance GPU clusters powering...  ...operate inference engines such as vLLM, SGLang...  ...in Python, Go, Rust, or a comparable language...  ...software stacks (CUDA, Triton, NCCL) and... 
    Senior
    Permanent employment
    Worldwide
    San Francisco, CA
    more than 2 months ago
  • $216k - $270k

     ...As a Software Engineer on the ML Infrastructure team, you will design...  ...languages (e.g., Python, Go, Rust, C++). ~ Experience with LLM...  ...TensorRT-LLM, or text-generation-inference. Compensation packages...  ...is to develop reliable AI systems for the world's most important... 
    Senior
    Full time

    Scale AI

    San Francisco, CA
    1 day ago
  • $187.5k - $395k

     ...Software Engineer, Inference Luma's mission is to build multimodal AI to expand human imagination and capabilities. We believe...  ...leverage our expensive GPU resources while meeting internal...  ...similar) Nice to have CUDA FFmpeg Compensation The... 

    Luma AI

    San Francisco, CA
    1 day ago
  • $142.2k - $204.6k

     ...About This Role As a software engineer for GenAI inference, you will help design, develop, and optimize...  ..., etc. Hands-on experience with CUDA, GPU programming, and key libraries (cuBLAS...  ...Databricks Databricks is the data and AI company. More than 10,000... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    3 days ago
