Senior Network Reliability Engineer - DGX Cloud
$136k - $224.25k
NVIDIA
NVIDIA is looking for a Senior Network Reliability Engineer to support and maintain our cloud and datacenter network infrastructures. This network serves needs across NVIDIA’s entire software stack, from graphics drivers to autonomous vehicles and artificial intelligence. In this role, the Senior Network Reliability Engineer will remediate critical alerts within defined SLAs, triage production-impacting network incidents, and work with internal customers on network-related issues. They will also engage with external vendors to remediate hardware and software issues and take part in project work such as network device upgrades and capacity augmentations. An ideal candidate will have a wide range of skills, including alert monitoring and resolution in large-scale networks and CSP environments, outstanding troubleshooting skills, an understanding of L3 underlay networks, and network protocol knowledge in large multi-vendor infrastructures.

What you will be doing:
- Engage in 24/7 global shift rotations to provide remote support for network repairs and changes, collaborating across teams and updating customers on status and ticket information.
- Follow established procedures and drive operational improvements in change management and daily operations.
- Manage and operate large-scale IP network technologies and infrastructures.
- Apply your skills in peering and datacenter interconnect technologies: PNI, transit, exchange, passive DWDM, and wave circuits.
- Monitor and support the network health of on-premises and cloud infrastructures.
- Collaborate on and develop workflow enhancements while documenting best practices.

What we need to see:
- Deep knowledge of and experience with TCP/IP, BGP, OSPF, MPLS, IS-IS, VXLAN, EVPN, QoS, GRE, IPsec, DNS, and MACsec.
- 5+ years of experience in network operations.
- Skilled in network troubleshooting techniques, with demonstrated creative problem-solving abilities.
- Strong track record of alert response within defined SLAs and of incident management.
- Experience with one or more of the following CSP environments: AWS, Azure, GCP, OCI.
- Familiarity with Arista, Fortinet, and Juniper.
- Hands-on experience contributing to tooling and automation for provisioning, monitoring, and managing complex network infrastructures.
- Bachelor’s degree in Computer Science or a related technical field, or equivalent experience.
- Excellent verbal and written communication skills.

Ways To Stand Out From The Crowd:
- Solid understanding of Mellanox/Cumulus OS and InfiniBand technology.
- Skilled in Unix/Linux system administration, with the ability to write and understand Python/shell scripts to improve efficiency in hyperscale environments.
- Familiarity with tools such as NetBox/Nautobot, Prometheus, Grafana, and Panoptes to monitor and manage a global network.
- Passionate about innovating and investing in groundbreaking technologies.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hard-working people in the world working for us. Are you creative and autonomous? Do you love a challenge? If so, we want to hear from you. NVIDIA’s deep learning platforms have made a major impact across many fields and are broadly used by leading academic institutions, start-ups, and industry, including the world’s largest Internet companies. We need passionate, hard-working, and creative people to help us take on more of these outstanding opportunities in deep learning cloud solutions.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 136,000 USD - 224,250 USD for Level 3 and 168,000 USD - 264,500 USD for Level 4. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until July 4, 2026. This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes. NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law. NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry. Learn more about NVIDIA.
$224k - $356.5k
...on the world. As part of the DGX Cloud organization, the... ...security, silicon, and cloud engineering teams to turn embedded hardware... ...attestation standards into reliable, self-service cloud capabilities... ...across Data Center, Automotive, Networking, and AI ecosystems. Define... (Senior, Network, Remote work)
$184k - $287.5k
...that can understand the world. Responsibilities: Build org‑wide reliability strategy, guiding how NVIDIA matures its operational... ...enhancing our data platform and related tooling. Implement chaos engineering, failure injection, and resilience testing to elevate our team... (Senior)
$168k - $264.5k
Senior Network Engineer - Cloud Network Infrastructure: NVIDIA is seeking an experienced Senior Network Engineer to develop and manage a robust cloud network infrastructure that supports NVIDIA's software development workflows and tools. The role focuses on designing, implementing... (Senior, Network)
- ...advanced large language model workloads. We are looking for a Senior Software Engineer to lead the bring-up, triage, benchmarking, analysis, and... ...end-to-end workload performance across compute, memory, networking, and communication layers using tools such as Nsight... (Senior, Network)
$168k - $270.25k
...DSX organization is looking for software engineering talent to build NVIDIA’s NICo technology... ...of computer hardware and networking equipment. As a software engineer, you will... ...end software solutions to manage complex cloud infrastructure deployments. You will write... (Senior, Network)
$156k - $190k
...in Sunnyvale, CA, is seeking a Staff Cloud Support Engineer to provide technical leadership in cloud... ...will lead incident responses, design reliability architecture, and mentor team members... ...expertise in Linux, Kubernetes, and networking. We offer a competitive salary range... (Senior, Network)
- Crusoe is seeking a Staff Cloud Support Engineer to serve as a technical authority across Customer Experience, SRE, Networking, and Product teams. This role requires deep expertise... ...as strong customer focus to design reliability guardrails and mentor engineers. The successful... (Senior, Network)
$176k - $276k
Production engineering is a field that involves crafting, building, and... ..., data management, systems, networking, coding, database management,... ...deployment, along with open-source cloud-enabling technologies such as... ...storage architectures are reliable, scalable, and efficient... (Senior, Network, Flexible hours)
$176k - $276k
Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain... ...knowledge across different systems, networking, coding, database, capacity management... ...and deployment and open source cloud enabling technologies like Kubernetes... (Senior, Network)
$184k - $287.5k
Senior Software Engineer, Cloud-Native Stack - CSP Engagements: Senior Software Engineer, Cloud... ...-scale, cloud-native stacks across networking (RDMA/RoCE), storage, and control... (Senior, Network, Full time)
- NVIDIA Gruppe is seeking a Senior Software Engineer specializing in Resilience Engineering for DGX Cloud. This role emphasizes building and maintaining high reliability standards across teams, implementing chaos engineering practices, and leading incident responses in a... (Senior)
$356.5k
NVIDIA Gruppe is seeking an experienced AI infrastructure software engineer to join its DGX Cloud AI Efficiency Team in Santa Clara, California. This role focuses on developing the infrastructure for optimizing AI workloads and ensuring high availability and efficiency... (Senior)
$272k - $431.25k
NVIDIA DGX Cloud is scaling GPU infrastructure across... ...for Principal Software Engineers to help shape the technical... ..., automation, and reliability across large‑scale GPU... .... This role is for senior technical leaders who... ...infrastructure, storage, networking, security, and workload... (Network)
$320k
As a Site Reliability and Software Engineering leader in the DGXC Cloud Reliability organization, you will manage the software, automation, and operations of the multi... ...infrastructure, and modern practices in the NVIDIA DGX Cloud Computing environment. Drive technical...
- A leading technology firm is in search of a Senior Wireless Network Site Reliability Engineer to manage and enhance their wireless network infrastructure. The ideal candidate has over 8 years of experience in wireless network operations and a strong background in wireless... (Senior, Network)
$120k - $243k
Hewlett Packard Enterprise Development LP is seeking a Senior Competitive Technical Marketing Engineer in Sunnyvale, California. This is an onsite role focused on providing competitive analysis of the HPE Networking portfolio. The ideal candidate will assist in the... (Senior, Network)
- Palo Alto Networks, Inc. seeks a Software Engineer to work on next-generation security platforms, delivering cutting-edge cybersecurity solutions. As part of the WildFire Team, you will collaborate with developers and researchers to tackle emerging threats in an innovative... (Senior, Network)
$144k - $209k
Senior Hardware Reliability Engineer, Global Hardware Reliability Engineering, Google, Sunnyvale, CA, USA. Qualifications Bachelor... ...hardware reliability of new machine learning, server, networking, and storage products. You will also perform early... (Senior, Network, Contract work)
- Palo Alto Networks, Inc. is seeking a Senior Staff Engineer to contribute to their innovative cloud security product, Data Loss Prevention (DLP). This role involves utilizing backend Java cloud engineering skills to develop a cutting-edge, industry-leading service aimed... (Senior, Network, Work at office 3 days per week)
- NVIDIA Corporation is looking for a Senior Storage Production Engineer to design and support large-scale storage clusters that ensure scalability and... ...background in distributed storage solutions and storage networking protocols is essential. NVIDIA... (Senior, Network)
$126k - $203.5k
Palo Alto Networks, Inc. is seeking a Senior Staff Production Engineer to design and build foundational cloud platform capabilities. This role involves working with infrastructure... ..., software engineering, and production reliability to improve developer productivity and... (Senior, Network)
- Google Inc. is seeking a Senior Software Engineer for their TPU supercomputer team in Sunnyvale, CA. The role involves designing and maintaining software across various layers, implementing network routing directly in TPU hardware, and building distributed solutions on... (Senior, Network)
- ...platform services across multiple clouds using Kubernetes, ensuring high security and reliability. The ideal candidate has over 8... ...in DevOps, SRE, or platform engineering with a strong focus on automation... ..., observability, and security. Network policies and cost management... (Network)
- donato technologies is seeking a Senior SRE / DevOps Engineer in Sunnyvale, CA. The successful candidate will focus on ensuring system reliability and scalability while automating operations across all teams. Candidates should have over 8 years of experience in DevOps,... (Senior)
- Oracle is seeking experienced Linux Kernel Developers to advance the Linux operating system for large-scale cloud environments. This role involves contributing to the Linux kernel and collaborating on projects across various subsystems. Candidates should have several years... (Senior, Network)
$120.3k - $194.53k
Our Mission: At Palo Alto Networks®, we’re united by a shared mission, to protect our digital way of life. We thrive at the... ...a large hybrid infrastructure across multiple public clouds. As a Site Reliability Engineer on the Internet Security Platform team, you will be part... (Senior, Network, Full time, Work at office, Visa sponsorship, Work visa)
- ...leading cybersecurity company is seeking a Senior Backend Software Engineer to join their team in Sunnyvale, CA. In... ...ecosystem using Go / Golang. Experience with cloud environments, particularly Azure or AWS, as well as networking and API development, is essential. Join a... (Senior, Network)
- ...cybersecurity company in Sunnyvale is seeking a Senior Backend Software Engineer with strong Go/Golang coding skills and cloud experience in Azure or AWS. In this hybrid... ...firewall management security frameworks and work on networking aspects of innovative products. Join a... (Senior, Network)
- ...that turns raw, high‑volume telemetry into reliable, job‑centric insights and automation for... ...GPU fleets. Join our team of innovative engineers who are building this platform and... ...reliability. Ways to Stand Out Strong Linux and networking fundamentals, distributed systems... (Senior, Network)
$133.1k - $306.4k
Senior Manager, Network Reliability Engineering. Job Identification: 336557. Job Category: Product Development. Posting Date: 06/09/2026, 04:54 PM. Role People... ...physical network in a broadly distributed, multi‑tenant cloud environment. This is a highly collaborative, full‑... (Senior, Network, Temporary work, Flexible hours)
- ip network engineer Santa Clara, CA
- network software engineer Santa Clara, CA
- core network engineer Santa Clara, CA
- network reliability engineer Santa Clara, CA
- senior network engineer Santa Clara, CA
- production network engineer Santa Clara, CA
- network engineer Santa Clara, CA
- network engineer - transport Santa Clara, CA
- network engineer contract Santa Clara, CA
- data center network engineer Santa Clara, CA
