Senior Systems Software Engineer, Kubernetes Scale - DGX Cloud
$184k - $287.5k
NVIDIA
The DGX Cloud organization at NVIDIA brings together cutting-edge hardware and software innovation to deliver industry-leading accelerated computing for the world's most adventurous AI workloads. We're a team of innovative engineers dedicated to solving some of the world's biggest challenges, constantly driving advancements, and impacting millions of lives worldwide!

We are looking for an outstanding Senior Systems Software Engineer with deep experience in distributed systems, open-source technologies such as Kubernetes and containers, and a strong background in systems performance and scalability. The ideal candidate brings broad, end-to-end experience across the stack - from GPU operator and device plugins to distributed inference serving and cloud platforms - along with the technical depth to investigate and address exciting, real-world problems at scale. In this pivotal role, you will take on the challenge of scaling AI infrastructure while optimizing total cost of ownership, driving down cost per token to unlock the next generation of AI innovation and AI factories!

What you'll be doing:
- Drive end-to-end performance and scale characterization for the NVIDIA DGX Cloud software stack, from Kubernetes control and data planes through NVIDIA components such as GPU Operator, Network Operator, DCGM, NIM, and distributed inference serving, following issues from orchestration down to the metal.
- Collaborate with AI researchers, developers and customers to develop innovative, automated tests that simulate real user workloads using custom-built and leading open-source tools and frameworks.
- Deep dive into performance and scale issues in complex distributed systems, including interactions between Kubernetes and the NVIDIA software stack, to identify and resolve root causes.
- Design and develop monitoring, reporting and analysis tools for performance and scale testing across software, GPU and CPU resources.
- Triage, debug and root-cause issues related to operating Kubernetes clusters at ultra-large scale, ensuring reliability and efficiency.
- Build and maintain a high-velocity framework that enables continuous, always-on performance and scale testing via a modern CI/CD pipeline.
- Document research, methodologies and results clearly and concisely, and present findings at internal and external venues, including community conferences such as KubeCon and GTC.
- Engage efficiently with upstream communities, including Kubernetes, CNCF and NVIDIA open-source projects, to validate performance and scalability of AI workloads early and help shape design and development decisions.

What we need to see:
- 8+ years of experience in computer architecture, networking, storage systems, and accelerators
- Bachelor's or Master's degree in Engineering (preferably Electrical Engineering, Computer Engineering, or Computer Science) or equivalent experience
- Expertise in Kubernetes and familiarity with related CNCF projects
- Background in working with large-scale parallel and distributed accelerator-based systems
- Expertise in optimizing performance and AI workloads on large-scale systems
- Experience with performance modeling and benchmarking at scale
- Proficiency in Golang/Python
- Background with the NVIDIA software ecosystem in both training and inference domains
- Expertise with at least one public CSP infrastructure (for example GCP, AWS, Azure, OCI)

Ways to stand out from the crowd:
- Strong operational experience with any one of the Kubernetes distributions
- Prior experience scaling Kubernetes clusters to ultra-large node and object counts
- Demonstrated history of working in the open-source community
- Excellent communication and interpersonal abilities
- PhD in relevant areas

#LI-Hybrid

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 14, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry.