Technical Staff Lead, AI Inference & GPU Infra

WAFER INC

A tech company specializing in AI infrastructure is seeking a skilled professional to build scalable infrastructure for AI model training and inference. You will lead architectural decisions and work with core systems that power their GPU optimization platform. Candidates should have expertise in GPU fundamentals, deep learning frameworks like PyTorch and TensorFlow, along with experience in C++ and Python. Join a team at the forefront of AI technology in the heart of San Francisco.

Vacancy posted 5 days ago
Similar jobs that could be interesting for you
Based on the Technical Staff Lead, AI Inference & GPU Infra in San Francisco, CA vacancy
  • $150k - $300k

     ...agentic models to the infra that enables anyone to...  ...Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao...  ...cloud LLM serving, LLM inference optimization and RL systems...  ...training stack. Core Technical Responsibilities LLM...  ...operates across our cloud GPU fleets. GPU‑Aware... 
    Suggested
    Work at office
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours
    Shift work

    Prime Intellect

    San Francisco, CA
    6 days ago
  • $150k - $300k

     ...frontier agentic models to the infra that enables anyone to...  ...Solutions Architect for GPU Infrastructure, you'll be the technical expert who transforms...  ...the world’s most advanced AI models. We recently raised...  ...for LLM training, inference, and HPC workloads Present... 
    Suggested

    Prime Intellect

    San Francisco, CA
    6 days ago
  •  ...Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses on managing diverse hardware...  ...and designing observability systems for cluster health. Experience with GPU infrastructure is a plus.
    Suggested

    Linuxcareers

    San Francisco, CA
    3 days ago
  • $220k

     ...is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in... 
    Suggested

    Perplexity

    San Francisco, CA
    6 days ago
  • A cutting-edge AI infrastructure startup is seeking a Kubernetes DevOps Engineer to join their innovative team in San Francisco. The...  ...clusters across various environments, focusing on high-performance GPU workloads. Ideal candidates will have deep Kubernetes expertise... 
    Suggested

    Jack & Jill/External ATS

    San Francisco, CA
    5 days ago
  •  ...with the web by building AI agents that can...  ...for a member of the AI technical staff to join the founding team...  ...Responsibilities: Scale infra for post-training of multimodal...  ...infra for agentic inference (throughput and latency...  ...ML infrastructure (GPU clusters) and supporting... 
    Work at office
    Relocation
    Visa sponsorship

    Yutori

    San Francisco, CA
    1 hour ago
  • $350k

     ...Our first goal is to democratize frontier AI R&D across scientific disciplines. We believe...  ...We are looking for an engineer to own the inference systems that power our models in...  ...deployment Optimize inference performance across GPU and accelerator hardware - maximizing... 

    Mirendil

    San Francisco, CA
    2 days ago
  •  ...maintain large distributed ML training and inference clusters Develop efficient, scalable end-...  ...Analyze, profile and debug low-level GPU operations to optimize performance Stay up...  ...platforms (GCP, AWS, or Azure) and their ML/AI service offerings Familiarity with containerization... 

    Kindredventures

    San Francisco, CA
    4 days ago
  •  ...agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind,...  ...the Role Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads. Develop systems that power... 
    Full time
    Relocation package

    B Capital

    San Francisco, CA
    2 days ago
  •  ...role focuses on workload orchestration, GPU scheduling, and ensuring system reliability, working with highly technical teams in the AI space. The ideal candidate will have a strong...  ...-on experience with both training and inference infrastructure. The position offers a competitive... 

    Hamilton Barnes Associates Limited

    San Francisco, CA
    5 days ago
  • $220k

    We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and...  ...scheduling and KV-cache management to support in API Gateway. GPU kernels migration to CuTe DSL. Port our in-house CUDA kernels to... 

    Perplexity

    San Francisco, CA
    2 days ago
  • Overview About Liquid AI Spun out of MIT CSAIL, we build general...  .... The Opportunity Our Edge Inference team compiles Liquid...  ...will work directly with the technical lead on problems that require deep...  ...inference kernels for CPU, NPU, and GPU architectures across diverse... 

    Liquid AI

    San Francisco, CA
    4 days ago
  • $209k - $253k

    A leading AI infrastructure company in San Francisco seeks a Staff Software Engineer to design and develop control systems for GPU node management. The candidate will be critical in building foundational cloud infrastructure and achieving business goals. This role requires... 

    Crusoe Energy Systems LLC

    San Francisco, CA
    3 days ago
  •  ...Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience in storage... 
    Remote job

    Hamilton Barnes Associates Limited

    San Francisco, CA
    2 days ago
  • A leading AI technology firm in San Francisco is seeking an AI Infra Engineer to enhance their infrastructure. The successful candidate will design and maintain Kubernetes clusters and manage Slurm for distributed training. Important skills include extensive experience... 

    Perplexity

    San Francisco, CA
    3 days ago
  • $320k - $405k

     ..., and steerable AI systems. We want...  ...beneficial AI systems. Staff Infrastructure Engineer, Node Infra About the role...  ...that keep every GPU, TPU and...  ...responsibilities Own the technical strategy and...  ...research/inference/product teams to...  ...Track record of leading complex, multi-quarter... 
    Visa sponsorship

    Menlo Ventures

    San Francisco, CA
    3 days ago
  •  ...based platform and solving complex systems challenges, focusing on GPU infrastructures and multi-cloud environments. The ideal candidate...  ...solutions. Join a team dedicated to building open superintelligence and make an impact in the AI space.

    B Capital

    San Francisco, CA
    3 days ago
  • Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting...  ...AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators... 

    Gimlet Labs

    San Francisco, CA
    5 days ago
  • Member of Technical Staff — ML Systems & Inference Employment Type: Full-time Workplace: On-site About the Company...  ...layer for the next generation of AI infrastructure. As AI workloads scale...  ...low-level optimization. We work with leading AI labs, hyperscalers, and AI-native... 
    Full time

    Acceler8 Talent

    San Francisco, CA
    4 days ago
  •  ...Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is...  ...class AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference... 

    Gimlet Labs

    San Francisco, CA
    5 days ago
  •  ...individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind, OpenAI,...  ...workloads through optimization of communication, memory usage, and GPU utilization. Build and maintain training pipelines that support... 
    Full time
    Relocation package

    B Capital

    San Francisco, CA
    5 days ago
  • $170k - $220k

    Member of Technical Staff - Infrastructure & LLMs Location: San...  ...next-generation inference infrastructure for LLMs...  ...problems like: Scaling multi-GPU inference workloads...  ...Ownership: Drive core infra design with zero red tape...  ...GPU orchestration, or AI infra Strong technical... 
    Full time
    Temporary work
    Immediate start
    Visa sponsorship
    Work visa

    Amadeus Search

    San Francisco, CA
    3 days ago
  • Introducing Moonlake, AI for creating world simulations. Scope...  ...tensor+pipeline parallel; NCCL tuning. GPU + kernel performance Nsight...  ...packing, KV-cache tricks. Inference optimization Low-latency...  ...AWQ), distillation, pruning. Infra + reliability SLURM/K8s multi... 

    Embedding VC

    San Francisco, CA
    6 days ago
  • $200k - $400k

    About The Role We're looking for an inference runtime engineer to push the boundaries of what...  ...will directly impact how the world runs AI inference. Skills And Qualifications Minimum...  ..., etc). Written widely-shared technical blogs or side projects on vLLM or LLM inference... 
    Remote work
    Visa sponsorship
    Shift work

    Inferact

    San Francisco, CA
    5 days ago
  • We are looking for an AI Infra engineer to join our growing team. We work with Kubernetes,...  ...you will be partnering closely with our Inference and Research teams to build, deploy, and...  ...training strategies) Experience managing GPU clusters and optimizing compute resource... 

    Perplexity

    San Francisco, CA
    3 days ago
  •  ...BASETEN Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion,...  ...cutting‑edge LLM models with industry-leading performance, scalability,...  ...distributed runtimes, networking, and GPU workloads Make thoughtful engineering... 
    Flexible hours

    The Consensus

    San Francisco, CA
    2 days ago
  • About the job FriendliAI is looking for a GPU Kernel Engineer to design, build, and...  ...power our large-scale, GPU-accelerated AI inference platform. You will be delivering world-class...  ...meet market demand. This is a deeply technical, high-impact role where you will write GPU... 
    Flexible hours

    FriendliAI

    San Francisco, CA
    2 days ago
  •  ...very large numbers of the latest generation GPU hardware and infrastructure (currently...  ...and custom solutions. You will also own inference infrastructure. For our robots this is a fleet...  .... The company embraces both large‑scale AI and robotics as core to its DNA. Our team... 

    Generalist

    San Francisco, CA
    3 days ago
  • Software Engineer Intern (AI Infrastructure / Training / Inference) About the Role We are hiring Software Engineers focused on AI Infrastructure to build...  ...infrastructure beyond traditional backend engineering — including GPU orchestration, large‑scale inference systems,... 
    Internship
    Immediate start

    SpreeAI

    San Francisco, CA
    2 days ago
  • $320k

     ...interpretable, and steerable AI systems. We want AI to be safe...  ...Engineering team's mandate is to make inference deployment boring and...  ...builds into production across GPU, TPU, and Trainium fleets, unattended...  ...: Currently, we expect all staff to be in one of our offices at... 
    Visa sponsorship
    Shift work

    United States Digital Space LLC

    San Francisco, CA
    2 days ago
