Senior Software Engineer, DGX Cloud Production Engineering
$184k - $287.5k
NVIDIA
NVIDIA DGX Cloud is building and operating large-scale GPU infrastructure for AI research and production workloads. We are looking for Senior Software Engineers to help build the automation, tooling, and operational systems that make GPU clusters reliable, scalable, and safe to run. This role is part of a production engineering team focused on Kubernetes-based infrastructure, GPU cluster operations, reliability, automation, GitOps, and Day 2 operability across DGX Cloud environments.
What you'll be doing:
- Build and operate automation for large-scale GPU clusters across NVIDIA Cloud Partners (NCP) and on-prem environments.
- Develop tools and services for provisioning, validation, upgrades, monitoring, repair, and cluster lifecycle operations.
- Improve Day 0 / Day 1 / Day 2 workflows for cluster bring-up, handoff, and production operations.
- Reduce manual production touches through APIs, GitOps, automation, and agent-assisted workflows.
- Participate in on-call, incident response, debugging, and durable follow-up work.
- Partner with platform, storage, networking, security, and workload teams to make infrastructure production-ready.
- 8+ years of experience building or operating production infrastructure.
- Strong programming skills in Python, Go, or similar.
- Experience with Linux, Kubernetes, containers, cloud infrastructure, or infrastructure automation.
- Ability to troubleshoot distributed systems in production.
- Clear communication and ability to work across teams.
- BS/MS in Computer Science or equivalent experience.
- Experience with GPU infrastructure, Kubernetes operators, GitOps, Terraform, ArgoCD, or fleet automation.
- Experience with SLOs, on-call, incident response, observability, and reliability practices.
- Exposure to BMaaS, VMaaS, managed Kubernetes, or multi-cloud infrastructure.
NVIDIA uses AI tools in its recruiting processes. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

