Technical Staff Lead, AI Inference & GPU Infra

WAFER INC

A tech company specializing in AI infrastructure is seeking a skilled professional to build scalable infrastructure for AI model training and inference. You will lead architectural decisions and work with core systems that power its GPU optimization platform. Candidates should have expertise in GPU fundamentals and deep learning frameworks such as PyTorch and TensorFlow, along with experience in C++ and Python. Join a team at the forefront of AI technology in the heart of San Francisco.

Vacancy posted 5 days ago

Similar jobs that could be interesting for you
Based on the Technical Staff Lead, AI Inference & GPU Infra in San Francisco, CA vacancy

Member of Technical Staff - Inference
$150k - $300k
...agentic models to the infra that enables anyone to... ...Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao... ...cloud LLM serving, LLM inference optimization and RL systems... ...training stack. Core Technical Responsibilities LLM... ...operates across our cloud GPU fleets. GPU‑Aware...
Suggested
Work at office
Remote work
Visa sponsorship
Relocation package
Flexible hours
Shift work
Prime Intellect
San Francisco, CA
6 days ago
Member of Technical Staff - GPU Infrastructure
$150k - $300k
...frontier agentic models to the infra that enables anyone to... ...Solutions Architect for GPU Infrastructure, you'll be the technical expert who transforms... ...the world’s most advanced AI models. We recently raised... ...for LLM training, inference, and HPC workloads Present...
Suggested
Prime Intellect
San Francisco, CA
6 days ago
AI Infra & Cluster Engineer — Scale GPU/CPU Orchestration
...Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses on managing diverse hardware... ...and designing observability systems for cluster health. Experience with GPU infrastructure is a plus.
Suggested
Linuxcareers
San Francisco, CA
3 days ago
Senior AI Inference Engineer - GPU, Rust & CUDA
$220k
...is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in...
Suggested
Perplexity
San Francisco, CA
6 days ago
Lead Kubernetes & GitOps Engineer for GPU Inference
A cutting-edge AI infrastructure startup is seeking a Kubernetes DevOps Engineer to join their innovative team in San Francisco. The... ...clusters across various environments, focusing on high-performance GPU workloads. Ideal candidates will have deep Kubernetes expertise...
Suggested
Jack & Jill/External ATS
San Francisco, CA
5 days ago
AI Engineer — LLM Infra
...with the web by building AI agents that can... ...for a member of the AI technical staff to join the founding team... ...Responsibilities: Scale infra for post-training of multimodal... ...infra for agentic inference (throughput and latency... ...ML infrastructure (GPU clusters) and supporting...
Work at office
Relocation
Visa sponsorship
Yutori
San Francisco, CA
1 hour ago
Member of Technical Staff, Inference
$350k
...Our first goal is to democratize frontier AI R&D across scientific disciplines. We believe... ...We are looking for an engineer to own the inference systems that power our models in... ...deployment Optimize inference performance across GPU and accelerator hardware - maximizing...
Mirendil
San Francisco, CA
2 days ago
Member of Technical Staff - ML Infra
...maintain large distributed ML training and inference clusters Develop efficient, scalable end-... ...Analyze, profile and debug low-level GPU operations to optimize performance Stay up... ...platforms (GCP, AWS, or Azure) and their ML/AI service offerings Familiarity with containerization...
Kindredventures
San Francisco, CA
4 days ago
Member of Technical Staff - Mid-Training Infra
...agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind,... ...the Role Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads. Develop systems that power...
Full time
Relocation package
B Capital
San Francisco, CA
2 days ago
Senior ML Infra Engineer: GPU-Optimized Kubernetes Platform
...role focuses on workload orchestration, GPU scheduling, and ensuring system reliability, working with highly technical teams in the AI space. The ideal candidate will have a strong... ...-on experience with both training and inference infrastructure. The position offers a competitive...
Hamilton Barnes Associates Limited
San Francisco, CA
5 days ago
Member of Technical Staff (AI Inference Engineer)
$220k
We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and... ...scheduling and KV-cache management to support in API Gateway. GPU kernels migration to CuTe DSL. Port our in-house CUDA kernels to...
Perplexity
San Francisco, CA
2 days ago
Member of Technical Staff - Edge Inference Engineer
Overview About Liquid AI Spun out of MIT CSAIL, we build general... .... The Opportunity Our Edge Inference team compiles Liquid... ...will work directly with the technical lead on problems that require deep... ...inference kernels for CPU, NPU, and GPU architectures across diverse...
Liquid AI
San Francisco, CA
4 days ago
Staff Software Engineer — AI Infra Architect (GPU Fleet)
$209k - $253k
A leading AI infrastructure company in San Francisco seeks a Staff Software Engineer to design and develop control systems for GPU node management. The candidate will be critical in building foundational cloud infrastructure and achieving business goals. This role requires...
Crusoe Energy Systems LLC
San Francisco, CA
3 days ago
Senior AI Storage Engineer - Remote GPU HPC Infra
...Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience in storage...
Remote job
Hamilton Barnes Associates Limited
San Francisco, CA
2 days ago
AI Infra Engineer: Scale ML Training & Inference
A leading AI technology firm in San Francisco is seeking an AI Infra Engineer to enhance their infrastructure. The successful candidate will design and maintain Kubernetes clusters and manage Slurm for distributed training. Important skills include extensive experience...
Perplexity
San Francisco, CA
3 days ago
Senior Staff+ Software Engineer, Node Infra
$320k - $405k
..., and steerable AI systems. We want... ...beneficial AI systems. Staff Infrastructure Engineer, Node Infra About the role... ...that keep every GPU, TPU and... ...responsibilities Own the technical strategy and... ...research/inference/product teams to... ...Track record of leading complex, multi-quarter...
Visa sponsorship
Menlo Ventures
San Francisco, CA
3 days ago
Compute Platform Engineer - GPU & Multi-Cloud Infra
...based platform and solving complex systems challenges, focusing on GPU infrastructures and multi-cloud environments. The ideal candidate... ...solutions. Join a team dedicated to building open superintelligence and make an impact in the AI space.
B Capital
San Francisco, CA
3 days ago
Member of Technical Staff - Kernels & GPU Performance
Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting... ...AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators...
Gimlet Labs
San Francisco, CA
5 days ago
Member of Technical Staff, Inference
Member of Technical Staff — ML Systems & Inference Employment Type: Full-time Workplace: On-site About the Company... ...layer for the next generation of AI infrastructure. As AI workloads scale... ...low-level optimization. We work with leading AI labs, hyperscalers, and AI-native...
Full time
Acceler8 Talent
San Francisco, CA
4 days ago
Member of Technical Staff - ML Systems & Inference
...Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is... ...class AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference...
Gimlet Labs
San Francisco, CA
5 days ago
Member of Technical Staff - Pre-Training Infra
...individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind, OpenAI,... ...workloads through optimization of communication, memory usage, and GPU utilization. Build and maintain training pipelines that support...
Full time
Relocation package
B Capital
San Francisco, CA
5 days ago
Member of the Technical Staff - LLMs
$170k - $220k
Member of Technical Staff - Infrastructure & LLMs Location: San... ...next-generation inference infrastructure for LLMs... ...problems like: Scaling multi-GPU inference workloads... ...Ownership: Drive core infra design with zero red tape... ...GPU orchestration, or AI infra Strong technical...
Full time
Temporary work
Immediate start
Visa sponsorship
Work visa
Amadeus Search
San Francisco, CA
3 days ago
Member of Technical Staff - Efficient ML
Introducing Moonlake, AI for creating world simulations. Scope... ...tensor+pipeline parallel; NCCL tuning. GPU + kernel performance Nsight... ...packing, KV-cache tricks. Inference optimization Low-latency... ...AWQ), distillation, pruning. Infra + reliability SLURM/K8s multi...
Embedding VC
San Francisco, CA
6 days ago
Member of Technical Staff, Inference
$200k - $400k
About The Role We're looking for an inference runtime engineer to push the boundaries of what... ...will directly impact how the world runs AI inference. Skills And Qualifications Minimum... ..., etc). Written widely-shared technical blogs or side projects on vLLM or LLM inference...
Remote work
Visa sponsorship
Shift work
Inferact
San Francisco, CA
5 days ago
Member of Technical Staff (AI Infrastructure Engineer)
We are looking for an AI Infra engineer to join our growing team. We work with Kubernetes,... ...you will be partnering closely with our Inference and Research teams to build, deploy, and... ...training strategies) Experience managing GPU clusters and optimizing compute resource...
Perplexity
San Francisco, CA
3 days ago
Software Engineer - BIS (Baseten Inference Stack)
...BASETEN Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion,... ...cutting‑edge LLM models with industry-leading performance, scalability,... ...distributed runtimes, networking, and GPU workloads Make thoughtful engineering...
Flexible hours
The Consensus
San Francisco, CA
2 days ago
Software Engineer - GPU Kernel
About the job FriendliAI is looking for a GPU Kernel Engineer to design, build, and... ...power our large-scale, GPU-accelerated AI inference platform. You will be delivering world-class... ...meet market demand. This is a deeply technical, high-impact role where you will write GPU...
Flexible hours
FriendliAI
San Francisco, CA
2 days ago
Software Engineer: ML Infra
...very large numbers of the latest generation GPU hardware and infrastructure (currently... ...and custom solutions. You will also own inference infrastructure. For our robots this is a fleet... .... The company embraces both large‑scale AI and robotics as core to its DNA. Our team...
Generalist
San Francisco, CA
3 days ago
Software Engineer Intern (AI Infrastructure / Training / Inference)
Software Engineer Intern (AI Infrastructure / Training / Inference) About the Role We are hiring Software Engineers focused on AI Infrastructure to build... ...infrastructure beyond traditional backend engineering — including GPU orchestration, large‑scale inference systems,...
Internship
Immediate start
SpreeAI
San Francisco, CA
2 days ago
Staff + Senior Software Engineer, Inference Deployment
$320k
...interpretable, and steerable AI systems. We want AI to be safe... ...Engineering team's mandate is to make inference deployment boring and... ...builds into production across GPU, TPU, and Trainium fleets, unattended... ...: Currently, we expect all staff to be in one of our offices at...
Visa sponsorship
Shift work
United States Digital Space LLC
San Francisco, CA
2 days ago
