Senior HPC Network Engineer: RDMA, GPU Clusters

$200k - $400k

Institute of Foundation Models

A dedicated research lab is seeking a Network Engineer to design and optimize low-latency, high-bandwidth networking solutions for AI supercomputing clusters. You will work on cutting-edge technologies in collaboration with world-class researchers. The ideal candidate has strong experience with NVIDIA RDMA technologies, networking protocols, and Kubernetes. This role offers a salary range of $200,000 - $400,000 annually, depending on level, and includes comprehensive benefits such as medical plans and a 401(k).

Vacancy posted 1 day ago

Similar jobs that could be interesting for you
Based on the Senior HPC Network Engineer: RDMA, GPU Clusters in Sunnyvale, CA vacancy

Senior AI/HPC GPU Cluster Architect (Equity)
NVIDIA Gruppe in Santa Clara is seeking a technical leader for the GPU AI/HPC Infrastructure team. You will design and implement cutting-edge GPU compute clusters, focusing on deep learning and high-performance computing. The ideal candidate will have at least 5 years...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior GPU Cluster Architect for AI/HPC Deployments
$184k - $356.5k
NVIDIA Gruppe is seeking an experienced engineer to lead GPU cluster design and support for AI and HPC deployments in Santa Clara, California. The ideal candidate will have over 8 years of experience with large-scale GPU infrastructure and a strong ability to communicate...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior GPU HPC Cluster Engineer — Equity Eligible
NVIDIA Gruppe seeks a skilled HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for high-performance computing workloads. This role involves collaboration with various teams to ensure effective and reliable cluster performance. Key responsibilities...
Senior
NVIDIA Gruppe
Santa Clara, CA
1 day ago
Principal AI/ML Infra Engineer — GPU Clusters & HPC
$272k - $431.25k
...seeks a Principal AI and ML Infra Software Engineer in Santa Clara, California, to enhance the efficiency of AI/ML research on GPU Clusters. The role involves collaboration with... ...candidates should have extensive experience in HPC systems, programming, and a strong...
Suggested
NVIDIA Corporation
Santa Clara, CA
2 days ago
Senior GPU Performance Engineer - HPC & Networking Equity
$152k - $241.5k
NVIDIA Gruppe is seeking a motivated Performance Engineer to influence the roadmap of our communication libraries. The role involves... ...in-depth performance characterization on large multi-GPU and multi-node clusters and studying the interaction of our libraries with...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior GPU Clusters Platform & EngOps Engineer
NVIDIA Gruppe is seeking highly motivated EngOps and Platform Engineers to develop automated tools for managing large GPU clusters. This position requires strong expertise in high-performance computing and deep learning. The ideal applicants have a BS or MS in a relevant...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior Network Engineer - Supercomputing
$200k - $400k
...data scientists, and engineers, tackling the most fundamental... ..., high‑bandwidth networking solutions that power... ...the world’s largest GPU supercomputing clusters. You’ll work on both... ...such as NVIDIA’s RDMA‑capable solutions, InfiniBand... ...GPUDirect RDMA AI & HPC Communication...
Senior
Visa sponsorship
Institute of Foundation Models
Sunnyvale, CA
1 day ago
Sr. Staff Software Engineer - HPC Network Engineering
$181k - $297k
...'s largest professional network, built to create economic... .... We are seeking an HPC Network Engineer to design, deploy, and operate... ...fabrics for large-scale GPU clusters. The role focuses on... ...networks optimized for RDMA traffic. As a Senior Staff Software Engineer,...
Senior
For contractors
Work at office
Flexible hours
LinkedIn
Mountain View, CA
4 days ago
Senior AI Platform Engineer - GPU Research Clusters
$152k - $287.5k
A leading technology company is seeking a Senior Software Engineer to develop solutions for GPU clusters aimed at enhancing machine learning innovation. The ideal candidate will have over 5 years of experience in software engineering with significant involvement in ML...
Senior
NVIDIA Corporation
Santa Clara, CA
2 days ago
Senior AI and ML HPC Cluster Engineer
As a member of the GPU AI/HPC Infrastructure team, you... ...-breaking GPU compute clusters that run demanding deep... ...encounter including: compute, networking, and storage design... ...Science, Electrical Engineering or related field or... ...InfiniBand with IPoIB and RDMA. Understanding of fast...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior HPC Cluster Engineer
$152k - $241.5k
...computing. An era in which our GPU acts as the brains of... ...highly skilled and experienced HPC Cluster Engineer to design, deploy, and operate... ...the deployment of compute, networking, and storage. Foster strong... ...HPC including InfiniBand, RDMA and RoCE. Understanding of...
Senior
NVIDIA
Santa Clara, CA
1 day ago
Senior HPC Cluster Engineer
...is searching for a highly skilled HPC Cluster Engineer to design, deploy, and operate GPU Compute Clusters for Electronic... ...including deployment of compute, networking, and storage. Foster strong customer... ...to HPC, including InfiniBand, RDMA and RoCE. Understanding of fast,...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior System Software Engineer - GPU Performance
$152k - $241.5k
...Visualization. The GPU, our invention, serves... ...Libraries and Networking team at NVIDIA. We... ...Deep Learning and HPC. We are looking for... ...motivated Performance engineer to influence the roadmap... ...GPU and multi-node clusters. Study the... ...networks in areas like RDMA, topologies,...
Senior
Remote work
NVIDIA
Santa Clara, CA
7 days ago
Senior System Software Engineer - GPU Performance
$152k - $241.5k
We are the GPU Communications Libraries and Networking team at NVIDIA and are looking... ...Performance Engineer to influence the roadmap... .... The DL and HPC applications of today... ...GPU and multi‑node clusters. Study the interaction... ...networks in areas such as RDMA, topologies, and...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior HPC Performance Engineer: Multi-GPU & Networked Systems
$148k - $287.5k
...Labs in Santa Clara, California, is seeking a motivated Performance Engineer to advance communication libraries for deep learning and HPC. You will conduct in-depth performance analysis on multi-GPU clusters, collaborate with dynamic teams, and evaluate proof-of-concepts....
Senior
NVIDIA
Santa Clara, CA
4 days ago
Senior HPC Architect: At-Scale GPU Deployments & Automation
NVIDIA Gruppe in Santa Clara is looking for a Senior HPC Architect to support the deployment of large-scale GPU compute clusters. You will provide engineering solutions for GPU computing products, ensuring technical relationships with teams and assisting in creative solutions...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior HPC Architect: Scalable GPU Compute & AI Platforms
NVIDIA Corporation is seeking a Senior HPC Architect to enhance GPU compute clusters. This role involves designing solutions for operationalizing NVIDIA products and collaborating closely with engineering teams. Ideal candidates should have over 8 years of experience in...
Senior
NVIDIA Corporation
Santa Clara, CA
2 days ago
Senior Cloud Platform Engineer — AWS & GPU HPC
black.ai is looking for a skilled platform engineer in Palo Alto to enhance our AWS... ...platform engineering, DevOps practices, and GPU workloads. As a platform engineer, you will... ...workflows, ensure the reliability of GPU clusters, and own CI/CD pipelines, facilitating researchers...
Senior
black.ai
Palo Alto, CA
2 days ago
Principal AI/ML Infra Engineer for GPU Clusters
...seeking a Principal AI and ML Infra Software Engineer to join our Hardware Infrastructure team... ...infrastructure deficiencies for GPU Clusters, fostering innovations in AI/ML research... ...over 15 years of experience in AI/ML and HPC, with a deep understanding of relevant technologies...
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior Multi-GPU System Architect for AI & HPC
$184k - $356.5k
NVIDIA Corporation in Santa Clara is seeking a Senior GPU System Architect to design multi-GPU scale-up and scale-out systems for AI and HPC. Responsibilities include architecting system topologies, collaborating to optimize transport layers, and contributing to hardware...
Senior
NVIDIA Corporation
Santa Clara, CA
4 days ago
Principal AI and ML Infra Software Engineer, GPU Clusters
$272k - $431.25k
...Principal AI and ML Infra Software Engineer, GPU Clusters. We are seeking a Principal AI and ML... ...of demonstrated expertise in AI/ML and HPC tasks and systems. ~ Hands-on experience... ..., Slurm, Kubernetes, LSF), high-speed networking (e.g., InfiniBand, RoCE, Amazon EFA),...
NVIDIA
Santa Clara, CA
4 days ago
Senior ML Infra Engineer - GPU Clusters, Reliability & Ops
$152k - $287.5k
NVIDIA Gruppe, based in Santa Clara, is seeking a Senior Software Engineer to accelerate the development of machine learning innovations. In this role, you'll design and implement solutions for GPU clusters, enabling researchers to optimize their work. Strong expertise...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior GPU Performance Engineer - MPI/NCCL & HPC
$152k - $287.5k
NVIDIA Corporation is seeking a motivated Performance Engineer to enhance the roadmap of communication libraries. In this role, you will conduct in-depth performance characterization on multi-GPU clusters and analyze the interaction of libraries with hardware and software...
Senior
NVIDIA Corporation
Santa Clara, CA
2 days ago
Senior GPU Fabric Networking Engineer - Remote + Equity
A leading tech company seeks a Senior Software Engineer for its Fabric Networking team in Santa Clara, CA. You will design and maintain software enabling GPU communication and participate in architecture definition. Ideal candidates have a B.S/M.S/Ph.D. in a related field...
Senior
Remote job
NVIDIA Corporation
Santa Clara, CA
3 days ago
Senior Software Engineer, AI Networking
$152k - $241.5k
...NVIDIA seeks a senior software engineer to join the AI Networking co-design and benchmark R&D team... ...workloads across large GPU and CPU clusters, thereby ensuring the... ...of the following areas: HPC, networking, and AI applications... ...(such as RoCE and RDMA). ~ Strong...
Senior
NVIDIA
Santa Clara, CA
16 hours ago
Senior AI & HPC Engineer for Finance — GPU/CUDA
$152k - $287.5k
NVIDIA Gruppe is seeking a Senior AI Developer Technology Engineer for the Financial Sector to design and optimize parallel algorithms... ...AI workloads. The role involves researching GPU acceleration techniques for AI and HPC workloads, collaborating with experts, and influencing...
Senior
NVIDIA Gruppe
Santa Clara, CA
2 days ago
Senior Software Engineer, DGX Cloud AI Infrastructure
$184k - $287.5k
...are looking for a Senior Software Engineer to lead the bring... ...across NVIDIA GPU platforms at the... ...that keep large clusters productive. This... ...compute, memory, networking, and communication... ...large-scale AI or HPC systems, including... ...familiarity with the RDMA software stack (...
Senior
NVIDIA
Santa Clara, CA
1 day ago
Senior Platform and EngOps Engineer - Cluster Operations
$176k - $276k
...Computing and Visualization. The GPU, our invention, serves as the... ...Join our team of innovative engineers who develop and maintain... ...managing and maintaining large GPU clusters interconnected via NVLink and... ...operating systems, computer networks, and high-performance...
Senior
NVIDIA
Santa Clara, CA
1 day ago
Senior GPU Cloud Infra Engineer - Kubernetes & Automation
NVIDIA Gruppe is seeking experienced Senior Software Engineers to join their production engineering team in Santa Clara, California. The role... ...involves building automation and operational systems for GPU clusters, with a focus on Kubernetes and reliability practices....
Senior
NVIDIA Gruppe
Santa Clara, CA
4 days ago
Network Engineer - AI/HPC
...Network Engineer - AI/HPC. Memphis, TN; Palo Alto, CA. About xAI: xAI's mission is to create AI systems that can accurately understand... ...About The Role: xAI was first in the world to build a 100k GPU cluster on an Ethernet network and then did it again in 92 days,...
xAI
Palo Alto, CA
2 days ago
