Principal Software Engineer, DGX Cloud Production Engineering
$272k - $431.25k
NVIDIA
NVIDIA DGX Cloud is scaling GPU infrastructure across internal, partner, and cloud environments. We are looking for Principal Software Engineers to help shape the technical direction for production engineering, Kubernetes-based operations, automation, and reliability across large-scale GPU clusters.
This role is for senior technical leaders who can define architecture, lead through influence, build critical systems, and turn ambiguous infrastructure problems into durable software and operating models.
What you’ll be doing:
Define and execute the technical strategy for DGX Cloud cluster operations, building the automation, GitOps, and Day 2 reliability needed to operate large-scale GPU clusters across NVIDIA Cloud Partners (NCPs) and on-prem environments.
Lead design and implementation of systems for cluster lifecycle, validation, repair, upgrades, observability, and readiness.
Establish patterns for Kubernetes-based GPU cluster operations across partner and on-prem environments.
Identify and eliminate operational toil through software, APIs, automation, and agent-assisted workflows.
Set technical standards for production readiness, SLOs, incident response, handoff gates, and operational acceptance.
Mentor engineers and influence platform, infrastructure, storage, networking, security, and workload teams.
What we need to see:
15+ years of experience building and operating large-scale distributed systems or cloud infrastructure.
Deep experience with Kubernetes, Linux, infrastructure automation, and production operations.
Strong programming experience in Go, Python, or similar.
Proven ability to lead complex cross-org technical initiatives.
Experience designing reliable systems with clear SLOs, observability, incident response, and automation.
BS/MS in Computer Science or equivalent experience.
Ways to stand out from the crowd:
Experience with GPU clusters, AI/ML infrastructure, Kubernetes operators, GitOps, BMaaS/VMaaS, managed Kubernetes, or multi-cloud fleet operations.
Experience building internal platforms, control planes, lifecycle automation, or production readiness frameworks.
Track record of turning operational pain into reusable software, APIs, and engineering standards.
NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We have some of the most forward-thinking and hard-working people on the planet working for us. If you're creative, hard-working and self-motivated, we want to hear from you!
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until May 22, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

