Technical Staff Lead, AI Inference & GPU Infra
WAFER INC
A tech company specializing in AI infrastructure is seeking a skilled professional to build scalable infrastructure for AI model training and inference. You will lead architectural decisions and work with core systems that power their GPU optimization platform. Candidates should have expertise in GPU fundamentals, deep learning frameworks like PyTorch and TensorFlow, along with experience in C++ and Python. Join a team at the forefront of AI technology in the heart of San Francisco.
$150k - $300k
...agentic models to the infra that enables anyone to... ...Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao... ...cloud LLM serving, LLM inference optimization and RL systems... ...training stack. Core Technical Responsibilities LLM... ...operates across our cloud GPU fleets. GPU‑Aware... (Work at office, Remote work, Visa sponsorship, Relocation package, Flexible hours, Shift work)
$150k - $300k
...frontier agentic models to the infra that enables anyone to... ...Solutions Architect for GPU Infrastructure, you'll be the technical expert who transforms... ...the world’s most advanced AI models. We recently raised... ...for LLM training, inference, and HPC workloads Present...
- ...Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses on managing diverse hardware... ...and designing observability systems for cluster health. Experience with GPU infrastructure is a plus. Linuxcareers
$220k
...is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in...
- A cutting-edge AI infrastructure startup is seeking a Kubernetes DevOps Engineer to join their innovative team in San Francisco. The... ...clusters across various environments, focusing on high-performance GPU workloads. Ideal candidates will have deep Kubernetes expertise...
- ...with the web by building AI agents that can... ...for a member of the AI technical staff to join the founding team... ...Responsibilities: Scale infra for post-training of multimodal... ...infra for agentic inference (throughput and latency... ...ML infrastructure (GPU clusters) and supporting... (Work at office, Relocation, Visa sponsorship)
$350k
...Our first goal is to democratize frontier AI R&D across scientific disciplines. We believe... ...We are looking for an engineer to own the inference systems that power our models in... ...deployment Optimize inference performance across GPU and accelerator hardware - maximizing...
- ...maintain large distributed ML training and inference clusters Develop efficient, scalable end-... ...Analyze, profile and debug low-level GPU operations to optimize performance Stay up... ...platforms (GCP, AWS, or Azure) and their ML/AI service offerings Familiarity with containerization...
- ...agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind,... ...the Role Design, build, and operate large-scale GPU infrastructure for high-throughput model inference and mid-training workloads. Develop systems that power... (Full time, Relocation package)
- ...role focuses on workload orchestration, GPU scheduling, and ensuring system reliability, working with highly technical teams in the AI space. The ideal candidate will have a strong... ...-on experience with both training and inference infrastructure. The position offers a competitive...
$220k
We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures at scale with tight latency and... ...scheduling and KV-cache management to support in API Gateway. GPU kernels migration to CuTe DSL. Port our in-house CUDA kernels to...
- Overview About Liquid AI Spun out of MIT CSAIL, we build general... .... The Opportunity Our Edge Inference team compiles Liquid... ...will work directly with the technical lead on problems that require deep... ...inference kernels for CPU, NPU, and GPU architectures across diverse...
$209k - $253k
A leading AI infrastructure company in San Francisco seeks a Staff Software Engineer to design and develop control systems for GPU node management. The candidate will be critical in building foundational cloud infrastructure and achieving business goals. This role requires...
- ...Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience in storage... (Remote job)
- A leading AI technology firm in San Francisco is seeking an AI Infra Engineer to enhance their infrastructure. The successful candidate will design and maintain Kubernetes clusters and manage Slurm for distributed training. Important skills include extensive experience...
$320k - $405k
..., and steerable AI systems. We want... ...beneficial AI systems. Staff Infrastructure Engineer, Node Infra About the role... ...that keep every GPU, TPU and... ...responsibilities Own the technical strategy and... ...research/inference/product teams to... ...Track record of leading complex, multi-quarter... (Visa sponsorship)
- ...based platform and solving complex systems challenges, focusing on GPU infrastructures and multi-cloud environments. The ideal candidate... ...solutions. Join a team dedicated to building open superintelligence and make an impact in the AI space. B Capital
- Gimlet Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is hitting... ...AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to accelerators...
- Member of Technical Staff — ML Systems & Inference Employment Type: Full-time Workplace: On-site About the Company... ...layer for the next generation of AI infrastructure. As AI workloads scale... ...low-level optimization. We work with leading AI labs, hyperscalers, and AI-native... (Full time)
- ...Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is... ...class AI datacenters. Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference...
- ...individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind, OpenAI,... ...workloads through optimization of communication, memory usage, and GPU utilization. Build and maintain training pipelines that support... (Full time, Relocation package)
$170k - $220k
Member of Technical Staff - Infrastructure & LLMs Location: San... ...next-generation inference infrastructure for LLMs... ...problems like: Scaling multi-GPU inference workloads... ...Ownership: Drive core infra design with zero red tape... ...GPU orchestration, or AI infra Strong technical... (Full time, Temporary work, Immediate start, Visa sponsorship, Work visa)
- Introducing Moonlake, AI for creating world simulations. Scope... ...tensor+pipeline parallel; NCCL tuning. GPU + kernel performance Nsight... ...packing, KV-cache tricks. Inference optimization Low-latency... ...AWQ), distillation, pruning. Infra + reliability SLURM/K8s multi...
$200k - $400k
About The Role We're looking for an inference runtime engineer to push the boundaries of what... ...will directly impact how the world runs AI inference. Skills And Qualifications Minimum... ..., etc). Written widely-shared technical blogs or side projects on vLLM or LLM inference... (Remote work, Visa sponsorship, Shift work)
- We are looking for an AI Infra engineer to join our growing team. We work with Kubernetes,... ...you will be partnering closely with our Inference and Research teams to build, deploy, and... ...training strategies) Experience managing GPU clusters and optimizing compute resource...
- ...BASETEN Baseten powers mission-critical inference for the world's most dynamic AI companies, like Cursor, Notion,... ...cutting‑edge LLM models with industry-leading performance, scalability,... ...distributed runtimes, networking, and GPU workloads Make thoughtful engineering... (Flexible hours)
- About the job FriendliAI is looking for a GPU Kernel Engineer to design, build, and... ...power our large-scale, GPU-accelerated AI inference platform. You will be delivering world-class... ...meet market demand. This is a deeply technical, high-impact role where you will write GPU... (Flexible hours)
- ...very large numbers of the latest generation GPU hardware and infrastructure (currently... ...and custom solutions. You will also own inference infrastructure. For our robots this is a fleet... .... The company embraces both large‑scale AI and robotics as core to its DNA. Our team...
- Software Engineer Intern (AI Infrastructure / Training / Inference) About the Role We are hiring Software Engineers focused on AI Infrastructure to build... ...infrastructure beyond traditional backend engineering — including GPU orchestration, large‑scale inference systems,... (Internship, Immediate start)
$320k
...interpretable, and steerable AI systems. We want AI to be safe... ...Engineering team's mandate is to make inference deployment boring and... ...builds into production across GPU, TPU, and Trainium fleets, unattended... ...: Currently, we expect all staff to be in one of our offices at... (Visa sponsorship, Shift work)