Technical Staff Lead, AI Inference & GPU Infra

Wafer

A tech company specializing in AI infrastructure is seeking a skilled professional to build scalable infrastructure for AI model training and inference. You will lead architectural decisions and work with core systems that power their GPU optimization platform. Candidates should have expertise in GPU fundamentals, deep learning frameworks like PyTorch and TensorFlow, along with experience in C++ and Python. Join a team at the forefront of AI technology in the heart of San Francisco.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the Technical Staff Lead, AI Inference & GPU Infra in San Francisco, CA vacancy
  •  ...About the Team Our Inference team brings OpenAI’s most capable research...  ...access our state-of-the-art AI models, allowing them to do...  ...infrastructure across emerging GPU platforms. You’ll work across...  ...collaborate closely with research, infra, and performance teams to... 
    Suggested
    Full time

    OpenAI

    San Francisco, CA
    13 hours ago
  • $150k - $300k

     ...agentic models to the infra that enables anyone to...  ...Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao...  ...cloud LLM serving, LLM inference optimization and RL systems...  ...training stack. Core Technical Responsibilities LLM...  ...operates across our cloud GPU fleets. GPU‑Aware... 
    Suggested
    Work at office
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours
    Shift work

    Prime Intellect

    San Francisco, CA
    16 hours ago
  • $380k

     ...integrating multimodal functionalities into our AI products, ensuring they are reliable,...  ...About the Role We're looking for a GPU Inference Engineer to contribute to improvements...  ...initiatives by building a stronger technical foundation. In this role you will:... 
    Suggested
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    3 days ago
  • $220k

     ...is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in... 
    Suggested

    Perplexity

    San Francisco, CA
    16 hours ago
  • A cutting-edge AI infrastructure startup is seeking a Kubernetes DevOps Engineer to join their innovative team in San Francisco. The...  ...clusters across various environments, focusing on high-performance GPU workloads. Ideal candidates will have deep Kubernetes expertise... 
    Suggested

    Jack & Jill/External ATS

    San Francisco, CA
    4 days ago
  •  ...agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind,...  ...the Role Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads. Develop systems that power... 
    Relocation package

    Reflection

    San Francisco, CA
    4 days ago
  • Member of Technical Staff, ML Infrastructure & Inference Overview We are a cutting-edge AI infrastructure company building a scalable cloud platform designed for next-generation...  ..., Model Serving, Distributed Systems, GPU Infrastructure, AI Infrastructure, Inference Runtime... 

    Acceler8 Talent

    San Francisco, CA
    16 hours ago
  • Overview Build low-latency inference pipelines for on-device deployment, enabling real-time next...  ...optimize distributed inference systems on GPU clusters, pushing throughput with large-...  ...for maximum efficiency, throughput, and responsiveness
    Remote job

    Genesis AI

    San Francisco, CA
    1 day ago
  • $220k

    We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and...  ...scheduling and KV-cache management to support in API Gateway. GPU kernels migration to CuTe DSL. Port our in-house CUDA kernels to... 

    Perplexity

    San Francisco, CA
    1 day ago
  • Overview About Liquid AI Spun out of MIT CSAIL, we build general...  .... The Opportunity Our Edge Inference team compiles Liquid...  ...will work directly with the technical lead on problems that require deep...  ...inference kernels for CPU, NPU, and GPU architectures across diverse... 

    Liquid AI

    San Francisco, CA
    3 days ago
  •  ...of humanity. About the Role As a Technical Lead on the Future of Computing Research team...  ...responsible for implementing the low-level inference stack, including kernel development and...  .... About OpenAI OpenAI is an AI research and deployment company dedicated... 
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    1 day ago
  • $200k - $280k

     ...intersection of efficient inference (algorithms,...  ...engines, or similar), GPU performance,...  ...collaborating with infra, research, and product...  ...experience owning complex technical projects end‑to‑end...  ...leadership (Staff level) Set technical...  ...Together AI is an Equal Opportunity... 
    Full time

    AI Chopping Block, Inc.

    San Francisco, CA
    3 days ago
  •  ...with the web by building AI agents that can...  ...for a member of the AI technical staff to join the founding team...  ...Responsibilities: Scale infra for post-training of...  ...Scale infra for agentic inference (throughput and latency...  ...ML infrastructure (GPU clusters) and supporting... 
    Work at office
    Relocation
    Visa sponsorship

    Yutori

    San Francisco, CA
    13 days ago
  • $150k - $200k

    A tech startup specializing in AI infrastructure seeks an AI Infrastructure Specialist to lead technical deployments for GPU neocloud and AI Factory customers. Ideal candidates have over 5 years of Kubernetes experience, practical GPU skills, and networking knowledge. Offering... 
    Flexible hours

    vCluster

    San Francisco, CA
    1 day ago
  •  ...Staff Technical Lead for Inference & ML Performance San Francisco fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at... 

    Fal

    San Francisco, CA
    2 days ago
  • $320k - $405k

     ..., and steerable AI systems. We want...  ...AI systems. Staff Infrastructure Engineer, Node Infra About the role...  ...that keep every GPU, TPU and...  ...Own the technical strategy and roadmap...  ...internal research/inference/product teams to...  ...Track record of leading complex, multi-quarter... 
    Work at office
    Visa sponsorship
    Flexible hours

    Anthropic

    San Francisco, CA
    9 days ago
  • A leading AI technology firm in San Francisco is seeking an AI Infra Engineer to enhance their infrastructure. The successful candidate will design and maintain Kubernetes clusters and manage Slurm for distributed training. Important skills include extensive experience... 

    Perplexity

    San Francisco, CA
    2 days ago
  • vCluster Labs is seeking an AI Infrastructure Specialist to engage directly with customers in deploying GPU solutions. You will drive technical deployments, optimize infrastructure, validate Kubernetes, and build self-sufficiency with customer teams. Ideal candidates will... 

    vCluster Labs

    San Francisco, CA
    4 days ago
  • $209k - $253k

    A leading AI infrastructure company in San Francisco seeks a Staff Software Engineer to design and develop control systems for GPU node management. The candidate will be critical in building foundational cloud infrastructure and achieving business goals. This role requires... 

    Crusoe Energy Systems LLC

    San Francisco, CA
    2 days ago
  •  ...Introducing Moonlake, AI for creating world simulations....  ...'re looking for a Member of Technical Staff - Data & ML Infrastructure Engineer...  ...'s model training and inference infrastructure. This role...  ...regressions. You'll work across GPU kernels, inference systems, distributed... 

    Moonlake AI

    San Francisco, CA
    1 day ago
  • $200k - $280k

    A leading AI company in San Francisco is looking for a Staff Machine Learning Engineer to enhance inference systems at production scale. You will design algorithms, optimize performance, and collaborate on RL and post-training pipelines. Ideal candidates have 3+ years of... 
    Full time

    AI Chopping Block, Inc.

    San Francisco, CA
    3 days ago
  •  ...based platform and solving complex systems challenges, focusing on GPU infrastructures and multi-cloud environments. The ideal candidate...  ...solutions. Join a team dedicated to building open superintelligence and make an impact in the AI space.

    B Capital

    San Francisco, CA
    2 days ago
  •  ...combines a foundation model for physics with GPU-native solvers to deliver unprecedented...  .... You will implement parsers, simulation/inference pipelines, and distributed execution...  ...collaborating on internal UIs. Cloud and infra experience (GCP/AWS, Terraform) and operating... 
    Remote work
    Flexible hours

    Vinci4D.ai

    San Francisco, CA
    1 day ago
  • $170k - $220k

    Member of Technical Staff - Infrastructure & LLMs Location: San...  ...next-generation inference infrastructure for LLMs...  ...problems like: Scaling multi-GPU inference workloads...  ...Ownership: Drive core infra design with zero red tape...  ...GPU orchestration, or AI infra Strong technical... 
    Full time
    Temporary work
    Immediate start
    Visa sponsorship
    Work visa

    Amadeus Search

    San Francisco, CA
    2 days ago
  • Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting...  ...AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators... 

    Gimlet Labs

    San Francisco, CA
    4 days ago
  •  ...Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is...  ...class AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference... 

    Gimlet Labs

    San Francisco, CA
    4 days ago
  • $150k - $300k

     ...Chief Scientist, Together AI), Dylan Patel (...  ...tuning runs on managed GPU clusters with a single...  ...that runs the jobs. Core Technical Responsibilities Hosted...  ...Kubernetes-based training and inference orchestration across...  ..., training methods, infra patterns - and the ability... 
    Work at office
    Local area
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours

    Prime Intellect

    San Francisco, CA
    12 days ago
  • Introducing Moonlake, AI for creating world simulations. Scope...  ...tensor+pipeline parallel; NCCL tuning. GPU + kernel performance Nsight...  ...packing, KV-cache tricks. Inference optimization Low-latency...  ...AWQ), distillation, pruning. Infra + reliability SLURM/K8s multi... 

    Embedding VC

    San Francisco, CA
    16 hours ago
  •  ...individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind, OpenAI,...  ...workloads through optimization of communication, memory usage, and GPU utilization. Build and maintain training pipelines that support... 
    Relocation package

    Reflection

    San Francisco, CA
    4 days ago
  • $142.2k - $204.6k

     ...Role As a software engineer for GenAI inference, you will help design, develop, and...  ...operations, etc. Hands-on experience with CUDA, GPU programming, and key libraries (cuBLAS,...  ...Databricks Databricks is the data and AI company. More than 10,000 organizations... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    16 hours ago
