Principal Software Engineer, Rack-Scale System Software — CSP Engagements

$272k - $431.25k
Full-time

NVIDIA

We're looking for a Principal Software Engineer to join our CSP Engagements team as the technical focal point for rack-scale system SW/FW, working with CSP engineering teams to ensure they can deploy, monitor, and operate these systems reliably at fleet scale. In this role, you will provide dedicated CSP-facing technical leadership, collaborating with NVIDIA's cross-functional rack-scale system SW/FW engineering teams. Your focus is on the system-level software that manages, monitors, and recovers the rack as a whole: fabric management, GPU/NVSwitch error handling and recovery, health telemetry APIs, firmware update orchestration, and SW-driven serviceability. You will drive work streams with CSP engineering teams to build a shared understanding of the architecture, incorporate their operational feedback, and ensure integration readiness.

What you'll be doing:
  • Drive rack-scale SW/FW architecture alignment across CSP engagements, including fabric management software, link health monitoring, GPU/NVSwitch error handling, SW/FW serviceability features (e.g., hot-plug support, component isolation, firmware-driven recovery), and multi-component firmware orchestration.
  • Drive technical work streams with CSP engineering teams on rack-scale system software, ensuring they deeply understand fabric management, NVSwitch behavior, error handling and recovery policies, health telemetry APIs, and SW/FW-controlled recovery operations.
  • Capture and synthesize CSP engineering feedback on rack-scale system software (health monitoring APIs, SW-driven serviceability workflows, firmware update orchestration, and error recovery behavior) and champion that feedback in NVIDIA's architecture decisions.
  • Collaborate with multi-functional teams to ensure customer operational requirements are reflected in system software and firmware development.
  • Identify cross-CSP patterns in rack-scale SW/FW issues, error handling behavior, and system configuration practices, and drive the resulting improvements to documentation, tooling, and test strategy.
  • Collaborate with execution teams on left-shift strategy, ensuring customer-side SW/FW integration work is identified early and completed ahead of hardware availability.
  • Make critical technical decisions on rack-scale system SW/FW tradeoffs and mitigate execution risks through early engagement with CSP engineering teams.

What we need to see:
  • 15+ years of experience in system software, platform firmware, or large-scale distributed systems engineering.
  • BS or MS in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
  • Deep understanding of rack-scale system software challenges: multi-component coordination, error propagation, health monitoring, and serviceability/reliability.
  • Experience with fabric management software, cluster management, or system-level orchestration frameworks.
  • Familiarity with firmware architectures and update lifecycle management (multi-component update sequencing, rollback, recovery).
  • Understanding of error handling and recovery design patterns in distributed systems: fault isolation, retry policies, graceful degradation.
  • Experience with health monitoring and telemetry systems: health scoring, event correlation, API design for fleet-level observability.
  • Understanding of GPU or accelerator system software (drivers, device management, power management) is a strong plus.
  • Customer obsession: genuine passion for understanding how CSPs operate sophisticated systems at fleet scale and for simplifying their experience.
  • Proven success providing technical leadership across organizational boundaries and influencing system software design without direct authority.
  • Strong communication: the ability to translate complex system software architecture into actionable guidance for customer engineering teams.

Ways to stand out from the crowd:
  • Experience with NVIDIA NVSwitch, NVOS, or GPU fabric management software.
  • Background in system software for large-scale clusters at a hyperscaler (cluster management, fleet orchestration, health platforms).
  • Experience crafting error handling and recovery frameworks for multi-component systems (hundreds or thousands of coordinating devices).
  • Familiarity with GPU or accelerator fleet operations: driver lifecycle, firmware rollout strategies, health-based scheduling.
  • Understanding of how system software decisions impact serviceability, availability, and operational cost at fleet scale.

NVIDIA’s invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We're looking to grow our company and build teams with the most thoughtful people in the world. Are you ready to change the next generation of computing? Join us at the forefront of technological advancement.

NVIDIA data center systems, such as DGX and HGX, have become core to NVIDIA's rapidly growing enterprise and cloud provider businesses. These platforms bring together the full power of NVIDIA GPUs, NVIDIA NVLink, NVIDIA InfiniBand networking, NVIDIA Grace CPUs, and a fully optimized NVIDIA AI and HPC software stack.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 30, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and is proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.

NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry. Learn more about NVIDIA.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Principal Software Engineer, Rack-Scale System Software — CSP Engagements vacancy in Austin, TX
  • $272k - $431.25k

    We're looking for a Principal Software Engineer to join our CSP Engagements team as the technical focal...  ...for GPU firmware and GPU system software, working directly...  ...GPU firmware at fleet scale. You will drive work streams...  ...hundreds of GPUs per rack Serve as the technical... 
    Suggested
    Full time

    NVIDIA

    Austin, TX
    2 days ago
  • $272k - $431.25k

    We're looking for a Principal Engineer to join our CSP Engagements team as the technical focal...  ...identify patterns and drive systemic improvements in...  ...validated for the latest NVIDIA rack-scale systems, GPU...  ...identify configuration, software, or workload differences... 
    Suggested
    Full time

    NVIDIA

    Austin, TX
    2 days ago
  • $184k - $287.5k

    Senior Software Engineer, NCCL and CUDA - CSP Engagements  ...libraries layer for deployment at scale. The role combines deep technical...  ...and CUDA libraries, frameworks, and system software interaction to solve the... 
    Suggested
    Remote work

    NVIDIA Corporation

    Austin, TX
    3 days ago
  • $249.9k - $374.9k

     ...generation AI inference rack‑scale solutions anchored by...  ...chassis and power systems to high‑speed interconnects...  ..., and ODM engagement. This leader will define...  ...power, and mechanical engineers across multiple sites....  ...Partner with ASIC, software, product management, and... 
    Suggested
    Work experience placement
    Work from home

    Jobleads-US

    Austin, TX
    3 days ago
  • $130k - $160k

    Director of Business Systems & Automation Join to apply for the Director...  ...Systems That Help Businesses Scale? Are you a hands‑on systems...  ...implement new technologies and software solutions to support evolving...  ...Oracle, Microsoft) and selectively engage technology consulting partners... 
    Suggested
    Full time
    Work at office
    Remote work
    Flexible hours

    A La C.A.R.T.E. Solutions - ALC

    Austin, TX
    1 day ago
  •  ..., gaming and embedded systems. Grounded in a culture...  ...advance your career. Rack Scale Serviceability & Telemetry...  ...firmware, management software, security, validation,...  ...ROLE AMD is seeking a Principal Member of Technical...  ...manufacturing, and customer engineering to translate... 
    Remote work

    Advanced Micro Devices

    Austin, TX
    4 days ago
  • Aurigo Software Technologies Inc. is seeking a Senior AI Developer specializing in Generative...  ...and build production-grade GenAI systems. This role focuses on engineering robust platforms for natural-language interactions in large-scale enterprise document ecosystems. The... 

    Aurigo Software Technologies Inc.

    Austin, TX
    1 day ago
  • Software Engineer, IS&T Enterprise Systems Austin Metro Area, Texas, United States Software and Services Imagine...  ...customer experience. Each customer engagement is an opportunity to delight, engage...  ...enterprise-level technology at a global scale. 2 yrs experience in securing... 
    Work experience placement

    Apple Inc.

    Austin, TX
    4 days ago
  • $218.8k - $335.3k

     ...automotive company in Austin is seeking an experienced Staff AI/ML Engineer to join the AV ML Infra team. The role involves designing...  ...and requires over 8 years of experience in large-scale distributed systems. Candidates should be proficient in cloud technologies such... 

    General Motors

    Austin, TX
    3 days ago
  •  ...bringing Apollo to market at scale, tackling the complex challenges...  ...seeking an experienced IT Systems Administrator to own the day‑...  ...and compute environments that engineering teams rely on to design,...  ...lifecycle management end‑to‑end: racking, provisioning, imaging, patching... 

    Jobr

    Austin, TX
    1 day ago
  • $170k - $210k

     ...design implications Lead large-scale, complex technical...  ...Search, SDLC, DevOps/Platform Engineering, Performance & Scalability/Reliability...  ...and/or Linux Operating Systems 5+ years’ experience leading...  ...our 1,800+ employees, pioneer engagement and growth solutions that fuel... 
    Work experience placement

    TTEC Digital

    Austin, TX
    5 days ago
  •  ...Insight Global is looking for a System Administrator to work on site...  ...hardware, including initial rack/stack, cabling coordination,...  ...Implement and manage software installations and upgrades across...  ...environments Collaborate with security, engineering, and operations teams to... 
    Flexible hours

    Insight Global

    Austin, TX
    2 days ago
  • $170k - $200k

     ...to apply for the .NET System Architect role at InCommodities...  ...benefiting from the scale and stability of our...  ...evolution of our software ecosystem, aligning with...  ...alongside IT and cloud engineers to ensure a seamless, scalable...  ...fundamental to how we engage with each other.... 
    Full time
    Temporary work
    Work at office
    Remote work
    Relocation
    Flexible hours

    InCommodities

    Austin, TX
    1 day ago
  • $35 per hour

    Job Posting Title: Temporary Systems Maintenance Specialist - Center for Community College Student Engagement - (UTEMPS); Hiring Department: UTemp Pool; Position Open To: All Applicants; Weekly Scheduled Hours: 10; FLSA Status: Non... 
    Hourly pay
    Temporary work
    Part time
    Casual work
    Work at office
    Immediate start
    10 hours per week
    Shift work

    University of Texas

    Austin, TX
    4 days ago
  •  ...frictionless points of engagement with our users....  ...History & Progress: Own the systems that surface where learners...  ...engage their teams at scale. Account Settings:...  ...teams comprised of Engineers and Product Designers....  ...Product Manager for SaaS or software products. Experience... 
    Summer work
    Work at office
    Home office

    Pluralsight

    Austin, TX
    3 days ago
  •  ...the email communication. Principal Software Engineer - Next-Generation API & AI...  ...intersection of distributed systems, data platforms, and agentic...  ...distributed systems, APIs, or large‑scale data platforms. Deep...  ...goals. In addition to our engaging workspace in South Austin,... 
    Remote job
    Full time
    Contract work
    Temporary work
    Local area
    Worldwide
    Visa sponsorship
    Flexible hours

    SpyCloud, Inc.

    Austin, TX
    5 days ago
  •  ...professionals. Our core search engine sits at the heart of...  ..., and self‑driven Principal Software Engineer (PSE) to join...  ...Intelligence system with the cutting‑edge...  ...experience with large‑scale Search, Recommendation...  ...them to life. Actively engage in tracking and reducing... 
    Work at office
    Local area
    Immediate start
    Flexible hours

    Fairygodboss

    Austin, TX
    5 days ago
  •  ...developers save time by accelerating software builds and tests. Our cloud-...  ...we build tools that empower engineering teams—from startups to...  ...Engineer with a focus on build systems, compilers, and languages...  ...standards for software delivery at scale and ensure operational... 
    Remote work
    Worldwide

    EngFlow

    Austin, TX
    2 days ago
  •  ...Lansweeper is growing its engineering capability with a newly created Principal Software Engineer role focused...  ...intelligent platforms, systems, and frameworks that...  ...infrastructure software at scale Strong understanding...  ...week in Austin, Texas Engaging company culture with team... 
    Full time
    Local area
    2 days per week

    Lansweeper NV

    Austin, TX
    1 day ago
  •  ...We're looking for a technically solid and people‑first Systems Administrator to join our IT team. You'll be the backbone of day‑to‑...  ...with a healthy mix of hands‑on technical work and direct user engagement. No ticket gets closed without the person on the other end actually... 
    Remote work

    Integreon

    Austin, TX
    21 hours ago
  • Software Engineer (Java DevOps Administrator), IS&T Enterprise Systems Austin Metro Area, Texas, United States Software and Services The people here at Apple don’t just...  ...root cause analysis of critical issues in large‑scale distributed systems. Enable zero‑downtime... 

    Apple Inc.

    Austin, TX
    4 days ago
  •  ...cloud computing. As a Staff Systems Engineer, you will be the bridge between...  ...workflows, and agent software in Go that connects diverse...  ...with automotive hardware at scale. If you are excited about turning...  ...and automotive protocols. Engage with embedded engineering teams... 
    Work experience placement
    Relocation package
    Flexible hours

    General Motors

    Austin, TX
    4 days ago
  • $169k

    Job Title: Director of Digital Engagement Hiring Department: Dell Medical School Position Open...  ...a truly integrated academic health system - the Director of Digital Engagement plays...  ...digital strategy, teams and platforms at scale. Proficiency with Adobe Experience Manager... 
    Work at office
    Local area
    Immediate start

    Phase2 Technology

    Austin, TX
    2 days ago
  •  ...Technologies is seeking a Full Stack Software Engineer to join the Program Systems team within the Programs...  ...demand forecasting, and customer engagement into a unified digital thread. Our...  ...internal development frameworks that scale the team’s capacity. Support citizen... 
    Permanent employment
    Temporary work
    Work at office
    Shift work

    Saronic Technologies Inc.

    Austin, TX
    5 days ago
  • Voice Platform Software Engineer, Customer Systems Austin, Texas, United States Software and Services Join a...  ...building the next generation of large-scale voice and real-time communication...  ...are foundational to future customer engagement experiences. Description As a Voice... 

    Apple

    Austin, TX
    1 day ago
  • Salesforce in Austin, Texas is seeking talented software and platform engineers to join our AI team. In this role, you'll be responsible for building and deploying cutting-edge AI services aimed at improving customer interactions across our CRM platform. The ideal candidate... 

    Salesforce

    Austin, TX
    1 day ago
  •  ..., gaming and embedded systems. Grounded in a culture...  ...Platform Application Engineering team as a System Application...  ...of hardware and software. You enjoy collaborating...  ...energized by customer engagement and technical...  ...ensure reliability at scale. Understand Partner requirements... 

    Advanced Micro Devices

    Austin, TX
    3 days ago
  • $120k

    Senior Secure Research Systems Engineer Purpose The Senior Secure Research Systems Engineer will lead secure research computing initiatives...  ...and collect required artifacts for CUI assessments. Engage in ongoing risk assessment across the college research environment... 
    Work at office

    The University of Texas at Austin

    Austin, TX
    4 days ago
  •  ...Infrastructure (OCI) is building the next generation of AI native engineering systems powering cloud operations, infrastructure automation, and developer productivity at scale. We are looking for a Principal Software Development Engineer (IC4) who operates as an AI native... 

    Ll Oefentherapie

    Austin, TX
    2 days ago
  • Sr Site Reliability Engineer, Customer Systems Austin, Texas, United States Software and Services Imagine what you could do here. Apple is a place where extraordinary...  ...in designing and building resilient, large-scale, low latency, cloud and on-prem Infrastructure including... 

    Apple Inc.

    Austin, TX
    5 days ago
