GPU Network Engineer: RDMA/NVLink at Scale

$350k

Thinking Machines Lab

Thinking Machines Lab is seeking a Network Engineer in San Francisco to manage and improve our GPU network fabric. The role requires in-depth knowledge of large-scale deployments and the ability to debug complex network issues. A collaborative environment is emphasized, where initiative and effective communication with cloud providers are key. The position offers a competitive salary ranging from $350,000 to $475,000 per year, depending on skills and experience. Benefits include unlimited PTO and health coverage, alongside visa sponsorship.

Vacancy posted 4 days ago
Similar jobs that could be interesting for you
Based on the GPU Network Engineer: RDMA/NVLink at Scale in San Francisco, CA vacancy
  •  ...cutting-edge AI infrastructure company in San Francisco seeks an experienced network engineer to optimize high-performance networking protocols for AI models. The ideal candidate will integrate RDMA and InfiniBand into the inference stack, ensuring efficient communication... 
    Suggested

    Baseten

    San Francisco, CA
    4 days ago
  •  ...help build the platform engineers turn to to ship AI products...  ...multi‑modal workloads scale, the network is the computer. We are...  ...foundational engineers to lead our GPU Networking efforts, making RDMA a first‑class building...  ...communication across NVLink and InfiniBand for our... 
    Suggested
    Flexible hours

    Baseten

    San Francisco, CA
    4 days ago
  • $350k

     ...Network Engineer, Supercomputing San Francisco Thinking Machines Lab...  ...network stack that our large-scale training and inference depend...  ...reliability at scale, across large GPU fabrics — both the RDMA/RoCE fabric between nodes and the NVLink/NVSwitch domains within them.... 
    Suggested
    Local area
    Immediate start
    Visa sponsorship
    Work visa
    Relocation package

    Thinking Machines Lab

    San Francisco, CA
    2 days ago
  • $150k - $300k

    Prime Intellect in San Francisco seeks a Solutions Architect for GPU Infrastructure who will transform client requirements into robust systems capable of training advanced AI models. Responsibilities include designing GPU cluster architectures, deploying orchestration systems... 
    Suggested

    Prime Intellect

    San Francisco, CA
    3 days ago
  • $250k - $320k

     ...of AI infrastructure: large-scale AI datacenters and the orchestration...  ...Gimlet Labs is seeking a Network Engineer to design, build, and scale...  ...have Experience with AI/HPC, GPU, or large‑scale distributed infrastructure...  ...tooling. Familiarity with RDMA, RoCE, InfiniBand, or other... 
    Suggested

    Gimlet Labs, Inc.

    San Francisco, CA
    4 days ago
  •  ...superintelligence. One person, one GPU. If you'd like to...  ...Help to build and scale Lambda's high performance cloud network Work on deploying and...  ...call rotation for Network Engineering team You Have...  ..., partitions), GPUDirect RDMA concepts. Experience with... 
    Work at office
    Local area
    Work from home
    Flexible hours

    Lambda

    San Francisco, CA
    16 hours ago
  •  ...based in California is looking for an experienced GPU Infrastructure Manager to join their team. The...  ...clusters globally, alongside mentoring junior engineers. Candidates should have 10+ years of experience with GPU networks and a strong understanding of Ethernet and... 

    The San Francisco Compute Company

    San Francisco, CA
    2 days ago
  •  ...will design, deploy, and operate the network infrastructure underpinning Sesterce's GPU AI factories across Europe —...  ...physical cabling to BGP policies and RDMA fabric tuning. What you will do Design...  ...or 400G/800G Ethernet at scale Deep familiarity with RDMA, RoCE v... 

    Sesterce Group

    San Francisco, CA
    5 hours ago
  •  ...Network Engineer (Data Centers) Baseten powers mission-critical inference...  ...that powers our GPU clusters—from cluster fabric...  ...helping define how we build and scale data center networks. Your...  ...familiarity with InfiniBand, RDMA, or high-performance Ethernet... 
    Flexible hours

    Baseten

    San Francisco, CA
    16 hours ago
  • United States Digital Space LLC seeks an experienced Engineer for the Hardware Health and Observability team, responsible for maintaining...  ..., with a strong command of Python and experience in large-scale systems. We prioritize continuous availability for our research... 

    United States Digital Space LLC

    San Francisco, CA
    2 days ago
  •  ...hardware and software. Speed and scale are our key differentiators....  ..., non-blocking backend networks for clusters of 100k+ accelerators...  ...lifecycle from customer requirements (GPU shape, workload, scale,...  ...lossless Ethernet fabrics for RDMA (RoCEv2): PFC, ECN tuning, traffic... 
    Local area

    Fluidstack

    San Francisco, CA
    2 days ago
  •  ...driven energy solutions firm based in San Francisco is seeking a Staff Network Deployment Engineer. The candidate will lead the deployment of advanced network systems that support high-performance GPU compute clusters. The role requires a minimum of 8 years of network... 

    Crusoe Energy Systems LLC

    San Francisco, CA
    16 hours ago
  • $147k - $211k

    Google Inc. is searching for a Software Engineer III to join their Serverless Networking team in San Francisco. This role involves building scalable customer...  ...have extensive experience in programming and large-scale system design. The ideal applicant will demonstrate leadership... 

    Google Inc.

    San Francisco, CA
    16 hours ago
  • A technology solutions provider is looking for a Network Engineer to enhance and maintain a large-scale network. This role involves managing both wired and wireless infrastructures, conducting assessments, and ensuring network security. Candidates should have a degree... 

    CGS Federal (Contact Government Services)

    San Francisco, CA
    4 days ago
  • A cutting-edge tech company in San Francisco seeks infrastructure engineers to enhance the tooling and systems that power its AI applications. Responsibilities include building GPU orchestration, scaling cloud batch job systems, and designing efficient scheduling software...
    Visa sponsorship

    Exa

    San Francisco, CA
    4 days ago
  • Baseten is hiring a Network Engineer (Data Centers) in San Francisco to design and own the high-performance network infrastructure for their GPU clusters. This senior role collaborates closely with hardware and platform teams, directly impacting model performance and inference... 
    Flexible hours

    Baseten

    San Francisco, CA
    9 days ago
  • A cutting-edge AI video platform is seeking a Senior Software Engineer (Infrastructure) to manage its GPU deployments and maintain a reliable AWS backbone. You will collaborate with specialized providers to ensure high availability and architect scalable systems, impacting... 

    Jack & Jill/External ATS

    San Francisco, CA
    4 days ago
  • $225k - $275k

     ...urgency, who believe in the scale of our ambition and thrive on...  ...is seeking a Senior Staff Network Operations Engineer to own production reliability...  ...backbone, data center fabric, and GPU cluster interconnects. You...  ...and Arbor. GPU Cluster and RDMA Networking: Hands-on... 
    Temporary work

    ProducePay

    San Francisco, CA
    1 day ago
  •  ...globe, we offer an innovative GPU marketplace and AI inference...  ...seeking a Senior Infrastructure Engineer to help build and scale Hyperbolic's GPU Cloud...  ...Familiarity with high-performance networking technologies such as InfiniBand and RoCE (RDMA over Converged Ethernet)... 
    Remote work

    Hyperbolic Labs

    San Francisco, CA
    2 days ago
  • $342k

    OpenAI is currently looking for an experienced Optical Network Engineer based in San Francisco, California. This role involves leading laser-related efforts in optical interconnect projects for large-scale compute systems, ensuring performance and manufacturability of... 

    OpenAI

    San Francisco, CA
    16 hours ago
  • $190k - $270k

    AI Chopping Block, Inc. is seeking an experienced AI Infrastructure Engineer to manage user-facing services and production systems. The role encompasses participating in on-call rotations, building infrastructure with tools like Ansible, Terraform, and Kubernetes, and... 
    Full time
    Internship

    AI Chopping Block, Inc.

    San Francisco, CA
    2 days ago
  • $150k - $215k

    Principal Back-End Network Engineer - AI Infrastructure Operations US About Nscale Nscale is the GPU cloud engineered for AI. We provide cost...  ...of our InfiniBand and RDMA-based network fabrics. You will...  ...reviewing, and evolving large-scale InfiniBand and RoCE fabric architectures...
    Flexible hours

    Nscale

    San Francisco, CA
    3 days ago
  • About the Team The Core Network Engineering team owns the end-to-end networking...  ...xPU networking used for large‑scale training and inference workloads...  ...across technologies such as RDMA, RoCE, InfiniBand, Ethernet, and high‑performance GPU interconnects Define and operationalize... 

    United States Digital Space LLC

    San Francisco, CA
    4 days ago
  • A tech startup in AI is seeking a Senior Infrastructure Engineer in San Francisco, CA. This role involves building and scaling a GPU Cloud Marketplace, transforming raw GPUs into a programmable pool for AI developers. Successful candidates will have deep knowledge in infrastructure... 

    Hyperbolic Labs

    San Francisco, CA
    4 days ago
  • $250k - $325k

    Electric Capital is looking for an experienced engineer to join their team in San Francisco. You...  ...a key role in architecting and deploying GPU clusters while participating in on-call...  ...role requires a deep understanding of networking, including Ethernet and InfiniBand. The position... 

    Electric Capital

    San Francisco, CA
    3 days ago
  • Google Inc. is looking for a Software Engineer in Serverless Networking to join its San Francisco team. In this role, you will develop next-generation...  ...problem-solving abilities, and experience in large-scale systems. The position offers a competitive salary and opportunities... 

    Google Inc.

    San Francisco, CA
    16 hours ago
  • Linuxcareers is seeking an Infrastructure/Cluster Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses...  ...observability systems for cluster health. Experience with GPU infrastructure is a plus.

    Linuxcareers

    San Francisco, CA
    16 hours ago
  •  ...history. When people finance GPU clusters, the datacenters housing...  ...the market? Otherwise, as AI scales, compute only becomes...  ...shape culture, mentor junior engineers, and learn from our customers....  ...management or architecture with network for at least one GPU cluster in... 
    Long term contract
    Contract work
    Fixed term contract
    Work at office
    Local area
    Visa sponsorship
    Shift work
    3 days per week

    The San Francisco Compute Company

    San Francisco, CA
    2 days ago
  • Sesterce Group is seeking a skilled networking engineer to design, deploy, and operate the network infrastructure for their GPU AI factories. The role includes working with InfiniBand...  ...experience, including familiarity with RDMA and strong Linux networking skills.... 

    Sesterce Group

    San Francisco, CA
    5 hours ago
  • $150k - $250k

     ...About the Role Fluidstack is seeking a Network Engineer to join our Deployment & Integration team...  ...AI datacenter network infrastructure at scale. You'll be in the field turning up modern...  ...to AI/ML networking environments with RDMA (RoCEv2), lossless Ethernet (PFC, ECN),... 
    Local area

    Fluidstack

    San Francisco, CA
    3 days ago
