Senior DGX Cloud AI Infrastructure Software Engineer
$184k - $287.5k
Dormont Manufacturing Co
Joining NVIDIA’s DGX Cloud AI Efficiency Team means contributing to the infrastructure that powers our innovative AI research. This team focuses on developing tools for optimizing the efficiency and resiliency of AI workloads: pre-training, post-training, and inference. Our objective is to deliver a stable, scalable environment for AI researchers, providing them with the necessary resources and scale to foster innovation.

We are seeking an AI infrastructure software engineer to join our team. You’ll be instrumental in designing, building, and maintaining AI infrastructure that enables large-scale AI training and inferencing. The responsibilities include implementing software and systems engineering practices to ensure high efficiency and availability of AI systems.

As a senior DGX Cloud AI Infrastructure software engineer at NVIDIA, you will have the opportunity to work on innovative technologies that power the future of AI and data science and be part of a dynamic, diverse, and supportive team that values learning and growth. The role provides the autonomy to work on meaningful projects with the support and mentorship needed to succeed, and contributes to a culture of blameless postmortems, iterative improvement, and risk-taking. If you are seeking an exciting and rewarding career that makes a difference, we invite you to apply now!

What you’ll be doing:
- Develop infrastructure software and tools for large-scale pre-training, post-training, and inference.
- Develop and optimize tools and libraries to improve infrastructure efficiency and resiliency.
- Co-design and implement APIs for integration with NVIDIA’s resiliency stacks.
- Enhance infrastructure and products underpinning NVIDIA’s AI platforms.
- Define meaningful and actionable reliability metrics to track and improve system and service reliability.
- Apply strong problem-solving, root cause analysis, and optimization skills.
- Root-cause, analyze, and triage failures from the application level to the hardware level.
What we need to see:
- 8+ years of experience in developing software infrastructure for large-scale AI systems.
- Bachelor’s degree or higher in Computer Science or a related technical field (or equivalent experience).
- Strong debugging skills and experience in analyzing and triaging AI applications from the application level to the hardware level.
- Experience with observability platforms for monitoring and logging (e.g., ELK, Prometheus, Loki).
- Proven track record in building and scaling large-scale distributed systems.
- Experience with AI training and inferencing infrastructure services.
- Proficiency in programming languages such as Python and C/C++, and in scripting languages.
- Experience in quality software engineering practices, including test development, defensive programming, version control, and CI.
- Excellent communication and collaboration skills; a culture of diversity, intellectual curiosity, problem solving, and openness is essential.

Ways to stand out from the crowd:
- Background in working with large-scale clusters.
- Experience in defining and building an observability and telemetry software stack.
- Experience with the RDMA software stack (NCCL, IB verbs, UCX, libfabric).
- Experience with root cause analysis of failures at datacenter scale.
- Good understanding of the internals of DL frameworks such as PyTorch, TensorFlow, JAX, and Ray.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD – 287,500 USD for Level 4, and 224,000 USD – 356,500 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until February 16, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.
As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
