Senior Network Reliability Engineer - DGX Cloud

$136k - $224.25k

NVIDIA

NVIDIA is looking for a Senior Network Reliability Engineer to support and maintain our cloud and datacenter network infrastructures. This network serves the needs across the whole software stack for NVIDIA, from Graphics Drivers to Autonomous Vehicles and Artificial Intelligence.

In this role, the Senior Network Reliability Engineer will remediate critical alerts within defined SLAs, triage production-impacting network incidents, and work with internal customers on network-related issues. They will also engage with external vendors to remediate hardware and software issues, and participate in project work such as network device upgrades and capacity augmentations. An ideal candidate will possess a wide range of skills, including alert monitoring and resolution in large-scale networks and CSP environments, outstanding troubleshooting skills, an understanding of L3 underlay networks, and network protocol knowledge in large multi-vendor infrastructures.

What you will be doing:

  • Engage in 24/7 global shift rotations to provide remote support for network repairs and changes while collaborating across teams and updating customers on status and ticket information.

  • Drive operational improvements in change management and daily operations by following procedures.

  • Manage and operate large-scale IP network technologies and infrastructures.

  • Utilize your skills in Peering and Datacenter interconnect technologies: PNI, Transit, Exchange, Passive DWDM, Wave circuits.

  • Monitor and support the network health of on-premises and cloud infrastructures.

  • Collaborate and develop workflow enhancements while documenting best practices.

What we need to see:

  • Deep knowledge of and experience with TCP/IP, BGP, OSPF, MPLS, IS-IS, VXLAN, EVPN, QoS, GRE, IPsec, DNS, and MACsec.

  • 5+ years of experience in network operations.

  • Skilled in network troubleshooting techniques, with demonstrated creative problem-solving abilities.

  • Strong track record of alert response within defined SLAs and of incident management.

  • Experience with one or more of the following CSP environments: AWS, Azure, GCP, OCI.

  • Familiarity with Arista, Fortinet and Juniper.

  • Hands-on experience contributing to tooling and automation for provisioning, monitoring, and managing complex network infrastructures.

  • Bachelor’s degree in Computer Science, related technical field, or equivalent experience.

  • Excellent verbal and written communication skills.

Ways To Stand Out From The Crowd:

  • Solid understanding of Mellanox/Cumulus OS and InfiniBand technology.

  • Skilled in Unix/Linux system administration, with the ability to write and understand Python/Shell scripts to improve efficiency in hyperscale environments.

  • Familiarity with tools such as Netbox/Nautobot, Prometheus, Grafana, and Panoptes to monitor and manage a global network. Passionate about innovating and investing in groundbreaking technologies.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hard-working people in the world working for us. Are you creative and autonomous? Do you love a challenge? If so, we want to hear from you. NVIDIA’s deep learning platforms have had a major impact on many fields and are broadly used across leading academic institutions, start-ups, and industry, including the world’s largest Internet companies. We need passionate, hard-working, and creative people to help us take on more of these outstanding opportunities in deep learning cloud solutions.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 136,000 USD - 224,250 USD for Level 3, and 168,000 USD - 264,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 29, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 3 days ago