Principal Software Engineer, E2E Performance and Goodput — CSP Engagements

$272k - $431.25k
Full-time

NVIDIA

We're looking for a Principal Engineer to join our CSP Engagements team as the technical focal point for end-to-end performance, working directly with engineering teams of key CSP/hyperscale customers to ensure they achieve various performance targets on NVIDIA platforms. In this role, you will augment NVIDIA's performance and benchmark teams with a dedicated CSP-facing focus. You will drive work streams with CSP engineering teams to build shared understanding of platform performance characteristics, gather and incorporate their workload-specific feedback into NVIDIA's optimization priorities, and validate that performance targets are met in customer-representative configurations. Your cross-CSP visibility enables you to identify patterns and drive systemic improvements in documentation, configuration guidance, and tooling.

What you'll be doing:
  • Drive performance characterization work streams with engineering teams of key CSP/hyperscale customers, ensuring they understand platform performance expectations, profiling methodology, and tuning options for their specific workloads
  • Gather and synthesize CSP performance feedback: identify gaps between expected and actual throughput, and champion optimization priorities back into NVIDIA's CUDA, NCCL, driver, and firmware teams
  • Ensure key open-source performance and stress tools (e.g., STREAM, GPU Burn, GPU BLAST) are updated and validated for the latest NVIDIA rack-scale systems, GPU architectures, and CPU platforms, so customers and internal teams have reliable baseline measurements from day one
  • Work closely with CSPs to ensure their own performance and validation tooling reflects the latest GPU capabilities, memory hierarchy changes, and platform-specific tuning parameters
  • Conduct cross-CSP performance comparison and pattern analysis: identify configuration, software, or workload differences that explain performance gaps between deployments
  • Collaborate with CSPs to ensure performance-related integration work (profiling infrastructure, benchmark harnesses, config validation) is ready ahead of deployment milestones
  • Define test strategies and tooling requirements for performance validation, both for NVIDIA internal certification and customer acceptance

What we need to see:
  • 15+ years of experience in systems performance engineering, ideally in GPU/HPC/ML infrastructure
  • BS or MS in Computer Science, Computer Engineering, or related field (or equivalent experience)
  • Proficiency in GPU workload profiling: Nsight Systems, Nsight Compute, DCGM metrics, or equivalent instrumentation
  • Understanding of distributed training performance dynamics: computation/communication overlap, pipeline bubbles, memory bandwidth utilization, collective efficiency
  • Statistical methods for performance analysis: regression detection, confidence intervals, A/B comparison at scale
  • Understanding of how the full software stack impacts performance: driver overhead, collective algorithm selection, memory allocation, scheduling, firmware power management
  • Strong data analysis and visualization skills (Python, pandas, dashboards)
  • Customer obsession: genuine passion for understanding why customers aren't achieving expected performance and driving solutions
  • Ability to communicate performance findings to both deep technical audiences and executive leadership
  • Demonstrated success influencing multiple engineering teams to prioritize performance improvements

Ways to stand out from the crowd:
  • Experience profiling and optimizing distributed training at 1000+ GPU scale (Megatron-LM, DeepSpeed, FSDP)
  • Background in ML infrastructure performance at a CSP/hyperscaler
  • Familiarity with NVIDIA platforms (DGX, HGX, NVLink topology) and profiling tools
  • Experience building automated performance regression detection systems for production environments
  • Understanding of inference workload performance dynamics (vLLM, TensorRT-LLM, SGLang, continuous batching)

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative, hardworking and self-motivated, we want to hear from you!

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 30, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry. Learn more about NVIDIA.

Vacancy posted 1 hour ago
