Technical Staff Lead, AI Inference & GPU Infra

Wafer

A tech company specializing in AI infrastructure is seeking a skilled professional to build scalable infrastructure for AI model training and inference. You will lead architectural decisions and work with core systems that power their GPU optimization platform. Candidates should have expertise in GPU fundamentals, deep learning frameworks like PyTorch and TensorFlow, along with experience in C++ and Python. Join a team at the forefront of AI technology in the heart of San Francisco.

Vacancy posted 3 days ago

Similar jobs that could be interesting for you
Based on the Technical Staff Lead, AI Inference & GPU Infra in San Francisco, CA vacancy

Member of Technical Staff - Inference
$150k - $300k
...agentic models to the infra that enables anyone to... ...Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao... ...cloud LLM serving, LLM inference optimization and RL systems... ...training stack. Core Technical Responsibilities LLM... ...operates across our cloud GPU fleets. GPU‑Aware...
Suggested
Work at office
Remote work
Visa sponsorship
Relocation package
Flexible hours
Shift work
Prime-Intellect
San Francisco, CA
4 days ago
Member of Technical Staff - GPU Infrastructure
$150k - $300k
...frontier agentic models to the infra that enables anyone to... ...Solutions Architect for GPU Infrastructure, you'll be the technical expert who transforms... ...the world’s most advanced AI models. We recently raised... ...for LLM training, inference, and HPC workloads Present...
Suggested
Prime Intellect
San Francisco, CA
4 days ago
Senior AI Inference Engineer - GPU, Rust & CUDA
$220k
...is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in...
Suggested
Perplexity
San Francisco, CA
4 days ago
Member of Technical Staff - ML Infra
...maintain large distributed ML training and inference clusters Develop efficient, scalable end-... ...Analyze, profile and debug low-level GPU operations to optimize performance Stay up... ...platforms (GCP, AWS, or Azure) and their ML/AI service offerings Familiarity with containerization...
Suggested
Kindredventures
San Francisco, CA
2 days ago
Member of Technical Staff (AI Inference Engineer)
$220k
We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and... ...scheduling and KV-cache management to support in API Gateway. GPU kernels migration to CuTe DSL. Port our in-house CUDA kernels to...
Suggested
Perplexity
San Francisco, CA
15 hours ago
Member of Technical Staff, ML Infrastructure & Inference
Member of Technical Staff, ML Infrastructure & Inference Overview We are a cutting-edge AI infrastructure company building a scalable cloud platform designed for next-generation... ..., Model Serving, Distributed Systems, GPU Infrastructure, AI Infrastructure, Inference Runtime...
Acceler8 Talent
San Francisco, CA
15 hours ago
Member of Technical Staff, Inference
$350k
...Our first goal is to democratize frontier AI R&D across scientific disciplines. We believe... ...We are looking for an engineer to own the inference systems that power our models in... ...deployment Optimize inference performance across GPU and accelerator hardware - maximizing...
Mirendil
San Francisco, CA
15 hours ago
Member of Technical Staff - Mid-Training Infra
...agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind,... ...the Role Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads. Develop systems that power...
Relocation package
Reflection
San Francisco, CA
3 days ago
AI Engineer LLM Infra
...with the web by building AI agents that can... ...for a member of the AI technical staff to join the founding team... ...Responsibilities: Scale infra for post-training of multimodal... ...infra for agentic inference (throughput and latency... ...ML infrastructure (GPU clusters) and supporting...
Work at office
Relocation
Visa sponsorship
Yutori
San Francisco, CA
3 days ago
Compute Platform Engineer - GPU & Multi-Cloud Infra
...based platform and solving complex systems challenges, focusing on GPU infrastructures and multi-cloud environments. The ideal candidate... ...solutions. Join a team dedicated to building open superintelligence and make an impact in the AI space.
B Capital
San Francisco, CA
1 day ago
Software Engineer - GPU Kernels
...GPU Kernel Engineer Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion, OpenEvidence, Abridge, Clay, Gamma and Writer.... ...intellectually stimulating environment where technical excellence is paramount and your...
Flexible hours
Baseten
San Francisco, CA
3 days ago
Member of Technical Staff - Pre-Training Infra
...individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind, OpenAI,... ...workloads through optimization of communication, memory usage, and GPU utilization. Build and maintain training pipelines that support...
Full time
Relocation package
B Capital
San Francisco, CA
3 days ago
Member of Technical Staff - Kernels & GPU Performance
Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting... ...AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators...
Gimlet Labs
San Francisco, CA
3 days ago
Member of Technical Staff - ML Systems & Inference
...Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is... ...datacenters. Mission Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build inference systems...
Gimlet Labs, Inc.
San Francisco, CA
15 hours ago
Member of Technical Staff, Inference
$200k - $400k
About The Role We're looking for an inference runtime engineer to push the boundaries of what... ...will directly impact how the world runs AI inference. Skills And Qualifications Minimum... ..., etc). Written widely-shared technical blogs or side projects on vLLM or LLM inference...
Remote work
Visa sponsorship
Shift work
Inferact
San Francisco, CA
3 days ago
Member of Technical Staff - Efficient ML
Introducing Moonlake, AI for creating world simulations. Scope... ...tensor+pipeline parallel; NCCL tuning. GPU + kernel performance Nsight... ...packing, KV-cache tricks. Inference optimization Low-latency... ...AWQ), distillation, pruning. Infra + reliability SLURM/K8s multi...
Embedding VC
San Francisco, CA
4 days ago
Member of Technical Staff, Inference
Member of Technical Staff — ML Systems & Inference Employment Type: Full-time Workplace: On-site About the Company... ...layer for the next generation of AI infrastructure. As AI workloads scale... ...low-level optimization. We work with leading AI labs, hyperscalers, and AI-native...
Full time
Acceler8 Talent
San Francisco, CA
2 days ago
Member of Technical Staff - Training Platform
$150k - $300k
...Chief Scientist, Together AI), Dylan Patel (... ...tuning runs on managed GPU clusters with a single... ...that runs the jobs. Core Technical Responsibilities Hosted... ...Kubernetes-based training and inference orchestration across... ..., training methods, infra patterns - and the ability...
Work at office
Local area
Remote work
Visa sponsorship
Relocation package
Flexible hours
Prime-Intellect
San Francisco, CA
1 day ago
Customer Support Engineer (GPU Cluster), India
...Customer Support Engineer (GPU Cluster), India... ...Engineer at a pioneering AI company, you'll be the... ...training, fine tuning, and inference solutions with Together... ...dive deep into complex technical challenges, providing... ...We have contributed to leading open-source research, models...
Remote work
Flexible hours
Night shift
Weekend work
Together AI
San Francisco, CA
6 hours ago
Member of Technical Staff (Robotics Lead)
About Artificial Analysis Artificial Analysis is the leading independent AI benchmarking company. We support labs, engineers and enterprises... ...coverage of AI-driven robotics and hiring a Member of Technical Staff to lead it. We are very excited about the coming wave of...
Artificial Analysis, Inc.
San Francisco, CA
4 days ago
Member of Technical Staff
...Pixeltable Inc. Member of Technical Staff San Francisco, CA·Full... ...revolutionizing the AI development landscape... ...resources (CPU and GPU) efficiently? What data... ...training/fine-tuning, and inference? You will also: Find... ...up Enjoy industry-leading compensation and benefits...
Full time
Part time
Work at office
Work from home
Flexible hours
2 days per week
Pixeltable, Inc.
San Francisco, CA
15 hours ago
Member of Technical Staff
...intelligence per watt, by building AI that optimizes AI itself. Our journey starts with GPU kernels, but will expand into... ...a strong fit if you: Have deep technical intuition and can learn new domains... ...some of the most interesting AI infra problems at a small company with...
Remote work
WAFER INC
San Francisco, CA
15 hours ago
Member of Technical Staff (Engineering Lead, Developer Experience)
...innovates at the frontier of AI infrastructure, search, and orchestration... ...APIs. You’ll combine strong technical expertise with a high bar for... .... The initiatives you lead will bolster the company as the... ...Partner with fellow technical staff, GTM, support, and other cross...
United States Digital Space LLC
San Francisco, CA
4 days ago
Member of Technical Staff, Developer Relations
$200k - $400k
...mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making... ...inference systems project, teach hard technical concepts clearly, and create public artifacts... ..., prefill and decode, quantization, GPU serving, latency versus throughput, and...
Remote work
Visa sponsorship
Inferact
San Francisco, CA
2 days ago
Member of Technical Staff - Infrastructure
...Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters... ...infrastructure behind Gimlet’s heterogeneous inference cloud. Unlike traditional cloud platforms... ..., deploy, and operate large‑scale CPU, GPU, and accelerator clusters powering production...
Gimlet Labs
San Francisco, CA
4 days ago
Member of Technical Staff, Kernels
Member of Technical Staff — Kernels & GPU Performance Employment Type: Full-time Workplace... ...layer for the next era of AI infrastructure. As AI... ...optimization. We work with leading AI labs, hyperscalers, and... ...characteristics across the inference stack Partner with compiler...
Full time
Acceler8 Talent
San Francisco, CA
3 days ago
Member of Technical Staff - Post-Training
...enterprises, and even nation states. Our team of AI researchers and company builders come... ..., reinforcement learning algorithms, and inference-time scaling techniques. Collaborate... ...Able to work fluidly across research and infra boundaries Strong communication capabilities...
Full time
Relocation package
B Capital
San Francisco, CA
15 hours ago
Member of Technical Staff, Kernels
$200k - $350k
...large-scale language model training and inference. You will develop high-performance ML kernels... ...and normalization, optimized for modern GPU architectures. Design compute... ...with the inventors of diffusion models and leading AI researchers. Shape Foundational Technology...
Immediate start
Flexible hours
Inception
San Francisco, CA
3 days ago
Member of Technical Staff, Performance and Scale
$200k - $400k
Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded... ...and disaggregated inference architecture. Familiarity with GPU programming models and memory hierarchies. Knowledge of GPU...
Remote work
Visa sponsorship
Inferact
San Francisco, CA
1 day ago
Member of Technical Staff, Post-Training, RL Infra
$350k
...solving core bottlenecks that unlock step-change acceleration across science and technology. Our first goal is to democratize frontier AI R&D across scientific disciplines. We believe accelerating scientific discovery is one of the most powerful ways to improve the...
Mirendil
San Francisco, CA
15 hours ago
