Principal Software Engineer, Rack-Scale System Software — CSP Engagements
$272k - $431.25k
NVIDIA
We're looking for a Principal Software Engineer to join our CSP Engagements team as the technical focal point for rack-scale system SW/FW, working with CSP engineering teams to ensure they can deploy, monitor, and operate these systems reliably at fleet scale. In this role, you will collaborate with NVIDIA's cross-functional rack-scale system SW/FW engineering teams, providing dedicated CSP-facing technical leadership. Your focus is on the system-level software that manages, monitors, and recovers the rack as a whole: fabric management, GPU/NVSwitch error handling and recovery, health telemetry APIs, firmware update orchestration, and SW-driven serviceability. You will drive work streams with CSP engineering teams to build shared understanding of the architecture, incorporate their operational feedback, and ensure integration readiness.

What you'll be doing:
- Drive rack-scale SW/FW architecture alignment across CSP engagements, including fabric management software, link health monitoring, GPU/NVSwitch error handling, SW/FW serviceability features (e.g., hot-plug support, component isolation, firmware-driven recovery), and multi-component firmware orchestration
- Drive technical work streams with CSP engineering teams on rack-scale system software, ensuring they deeply understand fabric management, NVSwitch behavior, error handling and recovery policies, health telemetry APIs, and SW/FW-controlled recovery operations
- Capture and synthesize CSP engineering feedback on rack-scale system software (health monitoring APIs, SW-driven serviceability workflows, firmware update orchestration, and error recovery behavior) and champion that feedback into NVIDIA's architecture decisions
- Collaborate with multi-functional teams to ensure customer operational requirements are reflected in system software and firmware development
- Identify cross-CSP patterns in rack-scale SW/FW issues, error handling behavior, and system configuration practices, and drive documentation, tooling, and test strategy improvements as a result
- Collaborate with execution teams on left-shift strategy, ensuring customer-side SW/FW integration work is identified early and completed ahead of hardware availability
- Make critical technical decisions on rack-scale system SW/FW tradeoffs and mitigate execution risks through early engagement with CSP engineering teams

What we need to see:
- 15+ years of experience in system software, platform firmware, or large-scale distributed systems engineering
- BS or MS in Computer Science, Electrical Engineering, or related field (or equivalent experience)
- Deep understanding of rack-scale system software challenges: multi-component coordination, error propagation, health monitoring, and serviceability / reliability
- Experience with fabric management software, cluster management, or system-level orchestration frameworks
- Familiarity with firmware architectures and update lifecycle management (multi-component update sequencing, rollback, recovery)
- Understanding of error handling and recovery design patterns in distributed systems: fault isolation, retry policies, graceful degradation
- Experience with health monitoring and telemetry systems: health scoring, event correlation, API design for fleet-level observability
- Understanding of GPU or accelerator system software (drivers, device management, power management) is a strong plus
- Customer obsession: genuine passion for understanding how CSPs operate sophisticated systems at fleet scale and simplifying their experience
- Proven success providing technical leadership across organizational boundaries and influencing system software design without direct authority
- Strong communication: ability to translate complex system software architecture into actionable mentorship for customer engineering teams

Ways to stand out from the crowd:
- Experience with NVIDIA NVSwitch, NVOS, or GPU fabric management software
- Background in system software for large-scale clusters at a hyperscaler (cluster management, fleet orchestration, health platforms)
- Experience crafting error handling and recovery frameworks for multi-component systems (hundreds or thousands of coordinating devices)
- Familiarity with GPU or accelerator fleet operations: driver lifecycle, firmware rollout strategies, health-based scheduling
- Understanding of how system software decisions impact serviceability, availability, and operational cost at fleet scale

NVIDIA’s invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and establish teams with the most thoughtful people in the world. Are you ready to change the next generation of computing? Join us at the forefront of technological advancement.

NVIDIA data center systems, such as DGX and HGX, have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms bring together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD. You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until June 30, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry. Learn more about NVIDIA.

