Principal Software Engineer, DGX Cloud Production Engineering

$272k - $431.25k

NVIDIA

NVIDIA DGX Cloud is scaling GPU infrastructure across internal, partner, and cloud environments. We are looking for Principal Software Engineers to help shape the technical direction for production engineering, Kubernetes-based operations, automation, and reliability across large-scale GPU clusters.

This role is for senior technical leaders who can define architecture, lead through influence, build critical systems, and turn ambiguous infrastructure problems into durable software and operating models.

What you’ll be doing:

  • Define and execute the technical strategy for DGX Cloud cluster operations, building the automation, GitOps, and Day 2 reliability needed to operate large-scale GPU clusters across NVIDIA Cloud Partners (NCPs) and on-prem environments.

  • Lead design and implementation of systems for cluster lifecycle, validation, repair, upgrades, observability, and readiness.

  • Establish patterns for Kubernetes-based GPU cluster operations across partner and on-prem environments.

  • Identify and eliminate operational toil through software, APIs, automation, and agent-assisted workflows.

  • Set technical standards for production readiness, SLOs, incident response, handoff gates, and operational acceptance.

  • Mentor engineers and influence platform, infrastructure, storage, networking, security, and workload teams.

What we need to see:

  • 15+ years of experience building and operating large-scale distributed systems or cloud infrastructure.

  • Deep experience with Kubernetes, Linux, infrastructure automation, and production operations.

  • Strong programming experience in Go, Python, or similar.

  • Proven ability to lead complex cross-org technical initiatives.

  • Experience designing reliable systems with clear SLOs, observability, incident response, and automation.

  • BS/MS in Computer Science or equivalent experience.

Ways to stand out from the crowd:

  • Experience with GPU clusters, AI/ML infrastructure, Kubernetes operators, GitOps, BMaaS/VMaaS, managed Kubernetes, or multi-cloud fleet operations.

  • Experience building internal platforms, control planes, lifecycle automation, or production readiness frameworks.

  • Track record of turning operational pain into reusable software, APIs, and engineering standards.

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We have some of the most forward-thinking and hard-working people on the planet working for us. If you're creative, hard-working and self-motivated, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 22, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 4 days ago