Technical Staff Lead, AI Inference & GPU Infra

Wafer

A tech company specializing in AI infrastructure is seeking a skilled professional to build scalable infrastructure for AI model training and inference. You will lead architectural decisions and work with core systems that power their GPU optimization platform. Candidates should have expertise in GPU fundamentals, deep learning frameworks like PyTorch and TensorFlow, along with experience in C++ and Python. Join a team at the forefront of AI technology in the heart of San Francisco.

Vacancy posted 5 days ago
Similar jobs that could be interesting for you
Based on the Technical Staff Lead, AI Inference & GPU Infra vacancy in San Francisco, CA
  • $150k - $300k

     ...agentic models to the infra that enables anyone to...  ...Andrej Karpathy (Eureka AI, Tesla, OpenAI), Tri Dao...  ...cloud LLM serving, LLM inference optimization and RL systems...  ...training stack. Core Technical Responsibilities LLM...  ...operates across our cloud GPU fleets. GPU‑Aware... 
    Suggested
    Work at office
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours
    Shift work

    Prime-Intellect

    San Francisco, CA
    1 day ago
  • $220k

     ...is looking for an engineer to join their team in San Francisco. You will work on building and operating the inference engine, supporting new models, migrating GPU kernels, and developing a Rust-based serving runtime. The ideal candidate has 3+ years of experience in... 
    Suggested

    Perplexity

    San Francisco, CA
    1 day ago
  • A cutting-edge AI infrastructure startup is seeking a Kubernetes DevOps Engineer to join their innovative team in San Francisco. The...  ...clusters across various environments, focusing on high-performance GPU workloads. Ideal candidates will have deep Kubernetes expertise... 
    Suggested

    Jack & Jill/External ATS

    San Francisco, CA
    5 days ago
  • $200k - $350k

     ...Member Of Technical Staff, Inference & Serving Inception creates the world's fastest, most efficient AI models. Our Mercury model is the world's...  ...performance computing and GPU programming (CUDA). Experience...  ...of diffusion models and leading AI researchers Shape... 
    Suggested
    Immediate start
    Flexible hours

    Inception LLC

    San Francisco, CA
    2 days ago
  • $180k

     ...Member Of Technical Staff - Inference Palo Alto, CA About Xai Xai's mission is to create AI systems that can accurately understand the universe and aid humanity in its...  ...-scaling) to deep low-level optimizations (GPU kernels, quantization, speculative decoding... 
    Suggested
    Temporary work

    Xai

    San Francisco, CA
    4 days ago
  •  ...maintain large distributed ML training and inference clusters Develop efficient, scalable...  ...scales Analyze, profile and debug low-level GPU operations to optimize performance Stay...  ...(GCP, AWS, or Azure) and their ML/AI service offerings Familiarity with containerization... 

    Causal Labs

    San Francisco, CA
    4 days ago
  • Member of Technical Staff, ML Infrastructure & Inference Overview We are a cutting-edge AI infrastructure company building a scalable cloud platform designed for next-generation...  ..., Model Serving, Distributed Systems, GPU Infrastructure, AI Infrastructure, Inference Runtime... 

    Acceler8 Talent

    San Francisco, CA
    1 day ago
  •  ...Inference Engine Engineer We build and run the inference engine behind every Perplexity query and deploy dozens of model architectures...  ...bottlenecks from network ingress through continuous batching and GPU kernel interleaving. Build dashboards, alerts, and automated... 

    Perplexity AI

    San Francisco, CA
    4 days ago
  •  ...with the web by building AI agents that can...  ...for a member of the AI technical staff to join the founding team...  ...Responsibilities: Scale infra for post-training of multimodal...  ...infra for agentic inference (throughput and latency...  ...ML infrastructure (GPU clusters) and supporting... 
    Work at office
    Relocation
    Visa sponsorship

    Yutori

    San Francisco, CA
    9 hours ago
  • Overview About Liquid AI Spun out of MIT CSAIL, we build general...  .... The Opportunity Our Edge Inference team compiles Liquid...  ...will work directly with the technical lead on problems that require deep...  ...inference kernels for CPU, NPU, and GPU architectures across diverse... 

    Liquid AI

    San Francisco, CA
    4 days ago
  •  ...of humanity. About the Role As a Technical Lead on the Future of Computing Research team...  ...responsible for implementing the low-level inference stack, including kernel development and...  .... About OpenAI OpenAI is an AI research and deployment company dedicated... 
    Work at office
    Relocation package

    OpenAI

    San Francisco, CA
    2 days ago
  •  ...Staff Technical Lead for Inference & ML Performance San Francisco fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at... 

    Fal

    San Francisco, CA
    1 day ago
  • About the Team Our Inference team brings OpenAI’s most capable research...  ...access our state-of-the-art AI models, allowing them to do things...  ...across emerging GPU platforms. You’ll work across...  ...collaborate closely with research, infra, and performance teams to ensure... 

    OpenAI

    San Francisco, CA
    11 hours ago
  • A leading AI technology firm in San Francisco is seeking an AI Infra Engineer to enhance their infrastructure. The successful candidate will design and maintain Kubernetes clusters and manage Slurm for distributed training. Important skills include extensive experience... 

    Perplexity

    San Francisco, CA
    3 days ago
  •  ...Associates Limited is looking for a Senior Storage Engineer to support large-scale AI infrastructure in San Francisco. This role involves designing scalable storage solutions for high-performance GPU platforms. The ideal candidate has extensive experience in storage... 
    Remote job

    Hamilton Barnes Associates Limited

    San Francisco, CA
    2 days ago
  • $209k - $253k

    A leading AI infrastructure company in San Francisco seeks a Staff Software Engineer to design and develop control systems for GPU node management. The candidate will be critical in building foundational cloud infrastructure and achieving business goals. This role requires... 

    Crusoe Energy Systems LLC

    San Francisco, CA
    3 days ago
  •  ...large-scale driver navigation AI models and one of the top chess...  ..., SDK design, and large-scale inference infrastructure. You'll...  ...cloud ingestion to distributed GPU inference pipelines that run our...  ...orchestration frameworks or ML infra tools (e.g., DeepSpeed, Triton... 
    Worldwide

    Pear VC

    San Francisco, CA
    4 days ago
  • $200k - $350k

     ...Member Of Technical Staff, Training Infra Bay Area Ai Systems Inception creates the world's fastest, most efficient AI models. Our Mercury model is...  ...Collaborate with the inventors of diffusion models and leading AI researchers Shape Foundational Technology : Your... 
    Immediate start
    Flexible hours

    Inception LLC

    San Francisco, CA
    4 days ago
  • $150k - $300k

     ...Chief Scientist, Together AI), Dylan Patel (...  ...tuning runs on managed GPU clusters with a single...  ...the jobs. Core Technical Responsibilities Hosted...  ...Kubernetes-based training and inference orchestration across...  ..., training methods, infra patterns - and the ability... 
    Work at office
    Local area
    Remote work
    Visa sponsorship
    Relocation package
    Flexible hours

    Prime Intellect

    San Francisco, CA
    4 days ago
  •  ...About Us Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration...  ...About the role Gimlet Labs is seeking a Member of Technical Staff focused on kernels and GPU performance. In this role, you will work close to... 

    Gimlet Labs

    San Francisco, CA
    5 days ago
  •  ...based platform and solving complex systems challenges, focusing on GPU infrastructures and multi-cloud environments. The ideal candidate...  ...solutions. Join a team dedicated to building open superintelligence and make an impact in the AI space.

    B Capital

    San Francisco, CA
    3 days ago
  •  ...Labs is building the first heterogeneous neocloud for AI workloads. As AI systems scale, the industry is...  ...datacenters. Mission Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build inference systems... 

    Gimlet Labs, Inc.

    San Francisco, CA
    2 days ago
  • $170k - $220k

    Member of Technical Staff - Infrastructure & LLMs Location: San...  ...next-generation inference infrastructure for LLMs...  ...problems like: Scaling multi-GPU inference workloads...  ...Ownership: Drive core infra design with zero red tape...  ...GPU orchestration, or AI infra Strong technical... 
    Full time
    Temporary work
    Immediate start
    Visa sponsorship
    Work visa

    Amadeus Search

    San Francisco, CA
    3 days ago
  • Introducing Moonlake, AI for creating world simulations. Scope...  ...tensor+pipeline parallel; NCCL tuning. GPU + kernel performance Nsight...  ...packing, KV-cache tricks. Inference optimization Low-latency...  ...AWQ), distillation, pruning. Infra + reliability SLURM/K8s multi... 

    Embedding VC

    San Francisco, CA
    1 day ago
  •  ...About the Job We are seeking a highly technical Inference Engine Engineer to optimize the...  ...designing, implementing, and optimizing GPU kernels and supporting infrastructure for...  ...next-generation generative and agentic AI workloads. Your work will directly power... 
    Worldwide
    Flexible hours

    FriendliAI Corp

    San Francisco, CA
    9 hours ago
  • $142.2k - $204.6k

     ...Role As a software engineer for GenAI inference, you will help design, develop, and...  ...operations, etc. Hands-on experience with CUDA, GPU programming, and key libraries (cuBLAS,...  ...Databricks Databricks is the data and AI company. More than 10,000 organizations... 
    Local area
    Worldwide

    Databricks

    San Francisco, CA
    1 day ago
  • $187.5k - $395k

     ...Software Engineer, Inference Luma's mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical...  ...scheduling systems to optimally leverage our expensive GPU resources while meeting internal SLOs Build and... 

    Luma AI

    San Francisco, CA
    4 days ago
  •  ...AI Infra Engineer We are looking for an AI Infra engineer to join our growing team. We...  ...you will be partnering closely with our Inference and Research teams to build, deploy, and...  ...training strategies) Experience managing GPU clusters and optimizing compute resource... 

    Perplexity AI

    San Francisco, CA
    4 days ago
  •  ...Customer Support Engineer (GPU Cluster) San...  ...Engineer at a pioneering AI company, you'll be the...  ...training, fine tuning, and inference solutions with Together...  ...dive deep into complex technical challenges, providing...  ...We have contributed to leading open-source research, models... 
    Full time
    Remote work
    Flexible hours
    Night shift
    Weekend work

    Together AI

    San Francisco, CA
    1 day ago
  • A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level...  ...standards, design monitoring systems and lead incident response. Join a forward-thinking... 

    Hyperbolic Labs

    San Francisco, CA
    2 days ago
