Senior Systems Software Engineer, Kubernetes Scale - DGX Cloud

$184k - $287.5k
Full-time

NVIDIA

The DGX Cloud organization at NVIDIA brings together cutting-edge hardware and software innovation to deliver industry-leading accelerated computing for the world's most adventurous AI workloads. We're a team of innovative engineers dedicated to solving some of the world's biggest challenges, constantly driving advancements, and impacting millions of lives worldwide!

We are looking for an outstanding Senior Systems Software Engineer with deep experience in distributed systems, open-source technologies such as Kubernetes and containers, and a strong background in systems performance and scalability. The ideal candidate brings broad, end-to-end experience across the stack - from GPU operator and device plugins to distributed inference serving and cloud platforms - along with the technical depth to investigate and address exciting, real-world problems at scale. In this pivotal role, you will take on the challenge of scaling AI infrastructure while optimizing total cost of ownership, driving down cost per token to unlock the next generation of AI innovation and AI factories!

What you'll be doing:
  • Drive end-to-end performance and scale characterization for the NVIDIA DGX Cloud software stack, from Kubernetes control and data planes through NVIDIA components such as GPU Operator, Network Operator, DCGM, NIM, and distributed inference serving, following issues from orchestration down to the metal.
  • Collaborate with AI researchers, developers and customers to develop innovative, automated tests that simulate real user workloads using custom-built and leading open-source tools and frameworks.
  • Deep dive into performance and scale issues in complex distributed systems, including interactions between Kubernetes and the NVIDIA software stack, to identify and resolve root causes.
  • Design and develop monitoring, reporting and analysis tools for performance and scale testing across software, GPU and CPU resources.
  • Triage, debug and root cause issues related to operating Kubernetes clusters at ultra-large scale, ensuring reliability and efficiency.
  • Build and maintain a high-velocity framework that enables continuous, always-on performance and scale testing via a modern CI/CD pipeline.
  • Document research, methodologies and results clearly and concisely, and present findings at internal and external venues, including community conferences such as KubeCon and GTC.
  • Engage efficiently with upstream communities, including Kubernetes, CNCF and NVIDIA open-source projects, to validate performance and scalability of AI workloads early and help shape design and development decisions.

What we need to see:
  • 8+ years of experience in computer architecture, networking, storage systems, and accelerators, and a Bachelor's/Master's in Engineering (preferably Electrical Engineering, Computer Engineering, or Computer Science) or equivalent experience
  • Expertise in Kubernetes and familiarity with related CNCF projects
  • Background in working with large scale parallel and distributed accelerator-based systems
  • Expertise optimizing performance and AI workloads on large scale systems
  • Experience with performance modeling and benchmarking at scale
  • Proficiency in Golang/Python
  • Background with the NVIDIA software ecosystem in both training and inference domains
  • Expertise with at least one public CSP infrastructure (for example GCP, AWS, Azure, or OCI)

Ways to stand out from the crowd:
  • Strong operational experience with any one of the Kubernetes distributions
  • Prior experience scaling Kubernetes clusters to ultra-large node and object counts
  • Demonstrated history of working in the open-source community
  • Excellent communication and interpersonal abilities
  • PhD in relevant areas

#LI-Hybrid

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 14, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry. Learn more about NVIDIA.

Vacancy posted 8 hours ago
