RDMA Ops Engineer - Computing Infrastructure Networking
$104.4k - $171k
Alibaba Cloud
Overview
We're seeking a skilled RDMA Ops Engineer to optimize and maintain high-performance networking infrastructure for our computing clusters. This role focuses on building and operating ultra-low latency, high-throughput networks using RDMA technologies to power next-generation computing workloads.
Responsibilities
- Deploy, operate, and maintain RDMA-based network architectures (RoCE/InfiniBand) for clusters with thousands of nodes
- Optimize network performance for distributed collective communication workloads (NCCL, MPI, etc.)
- Solve complex network issues in distributed collective communication (e.g., NCCL/MPI communication bottlenecks)
- Use automation tools for network provisioning, monitoring, diagnostics, and network performance profiling (latency/throughput analysis)
- Implement CI/CD pipelines for network infrastructure-as-code
- Manage end-to-end network lifecycle: deployment, configuration, monitoring, upgrades
- Collaborate with computing algorithm engineers to troubleshoot network-related bottlenecks in training/inference pipelines
- Bridge computing framework requirements with underlying network infrastructure capabilities
- Ensure compliance with security and scalability requirements
Qualifications
- Strong scripting skills (Python/Go/Bash) for operational automation
- Expert-level RDMA operational experience (RoCEv2/InfiniBand)
- Understanding of Linux internals (kernel bypass, syscall optimization, etc.) and proficiency in Linux network stack tuning (irqbalance, NUMA, hugepages)
- Hands-on experience with RDMA/DPDK performance tuning
- Strong knowledge of network protocols (TCP/IP, RoCEv2) and NIC architecture principles
- Ability to abstract complex technical concepts into architectural diagrams
- Proven track record of translating R&D innovations into production solutions
- Strong communication skills for cross-functional collaboration with computing researchers and SRE teams
- Experience managing production computing networks
- Familiarity with Kubernetes networking (CNI, Multus, SR-IOV) and GPU-aware scheduling
- Background in computing system optimization (NVIDIA collective libraries, MPI tuning)
- Deep understanding of computing workload patterns and their network implications
Compensation and Employment
The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. If hired, employee will be in an “at-will position” and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.


