Senior AI Infrastructure Software Engineer - DGX Cloud

$184k - $287.5k

NVIDIA

Joining NVIDIA's DGX Cloud Lepton Team means contributing to the leading cloud product that powers innovative AI research and developers. We focus on building the AI/ML platform for improving productivity, optimizing efficiency and resiliency of AI workloads, as well as developing scalable AI infrastructure services globally. We are seeking an AI infrastructure software engineer to join our team. You'll be instrumental in designing, building, and maintaining AI platforms that enable large-scale AI training, inferencing, fine-tuning, and Agentic AI in production.

As a senior DGX Cloud AI Infrastructure software engineer at NVIDIA, you will have the opportunity to work on innovative technologies that power the future of AI and be part of a dynamic and supportive team that values learning and growth. The role provides the autonomy to work on meaningful projects with the support and mentorship needed to succeed, and contributes to a culture of blameless postmortems, iterative improvement, and risk-taking. If you are seeking an exciting and rewarding career that makes a difference, we invite you to apply now!

What you’ll be doing:

  • Develop platforms and tools for large-scale AI, LLM, and GenAI infrastructure.

  • Develop and optimize tools to improve AI/ML workload efficiency and resiliency.

  • Root-cause, analyze, and triage failures from the application level to the hardware level.

  • Enhance infrastructure and products underpinning NVIDIA's AI platforms.

  • Co-design and implement APIs for integration with NVIDIA's resiliency stacks on the platform.

  • Define meaningful and actionable reliability metrics to track and improve system and service reliability.

  • Apply skills in problem-solving, root cause analysis, and optimization.

What we need to see:

  • 8+ years of experience developing software infrastructure for large-scale AI systems.

  • Bachelor's degree or higher in Computer Science or a related technical field (or equivalent experience).

  • Strong debugging skills and experience in analyzing and triaging AI applications from the application level to the hardware level.

  • Proven track record in building and scaling large-scale distributed systems.

  • Experience with AI training and inferencing and data infrastructure services.

  • Familiarity with Kubernetes and with operating large-scale observability platforms for monitoring and logging (e.g., ELK, Prometheus, Loki).

  • Proficiency in programming languages such as Python and C/C++, as well as scripting languages.

  • Excellent communication and collaboration skills, and a culture of diversity, intellectual curiosity, problem solving, and openness are essential.

Ways to stand out from the crowd:

  • Experience working with large-scale AI clusters and cloud-native infrastructure.

  • Strong understanding of NVIDIA GPUs and network technologies (RDMA, IB, NCCL).

  • Good understanding of the internals of DL frameworks and related systems (PyTorch, TensorFlow, JAX, Dynamo, and Ray).

  • Experience with root cause analysis of failures at datacenter scale.

  • Strong background in software design and development.

NVIDIA leads the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is looking for exceptional people like you to help us accelerate the next wave of artificial intelligence.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 16, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 4 days ago
