
Senior Debug & Failure Analysis Engineer - Datacenter GPUs

NVIDIA

NVIDIA is seeking a Senior System Debug Engineer to join its datacenter product engineering team in Santa Clara, California. The role involves driving failure analysis and debugging efforts during the New Product Introduction phase while collaborating with industry experts. The ideal candidate will have over 12 years of experience and a degree in Electrical Engineering. This position offers a competitive salary, equity, and a comprehensive benefits package. The successful candidate will perform rigorous failure analysis and engage with internal teams to ensure product quality and timely delivery. This job presents a unique opportunity to work with innovative technology that shapes the future of datacenters.

Vacancy posted 14 hours ago
Similar jobs that could be interesting for you
Based on the Senior Debug & Failure Analysis Engineer - Datacenter GPUs in Santa Clara, CA vacancy
  • $200k - $322k

    Join NVIDIA's datacenter product engineering team in our Operations organization and be at the forefront of technological advancement! As a Senior System Debug Engineer, you will drive failure analysis and debug efforts during our New Product Introduction (NPI) phase. You... 
    Senior
    Work experience placement
    Overseas

    NVIDIA

    Santa Clara, CA
    14 hours ago
  • An established industry player is seeking a Sr. Failure Analysis Engineer to join their team. In this role, you will be responsible for driving...  ...dynamic work environment where your expertise in hardware debugging, Linux system administration, and effective communication... 
    Senior

    The Fountain Group

    Santa Clara, CA
    4 days ago
  • $152k - $241.5k

     ...and Python), analytical, and debugging Good understanding of Deep Learning...  ...Experience with NVIDIA GPUs, CUDA Programming, and Networking...  ...stand out Proven SW engineering experience experience in deploying...  ...in large AI job performance analysis for training/inference workload... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • $130k - $200k

     ...connections among thousands of GPUs and memory units. The...  ...and test systems engineer to design, build, and...  ...automation and analysis frameworks, and closing...  ...limitations and failure modes Debug complex issues across...  ...Familiarity with telecom, datacenter optics, or silicon... 
    Senior
    Contract work

    nEye.ai

    Santa Clara, CA
    27 days ago
  • $200k

     ...AI. As AI moves beyond the datacenter into robots, autonomous mobile...  ...exceptional architects and engineers to rethink how AI, sensing,...  ...bug investigation, root-cause analysis, and debug across hardware and software...  .... Relevant Backgrounds CPUs GPUs AI accelerators Networking... 
    Senior
    Flexible hours

    Velaura

    Santa Clara, CA
    2 days ago
  •  ...your career. THE ROLE We are seeking a highly skilled Senior Hardware Validation & Failure Analysis Engineer to lead hardware validation, system bring-up,...  ...candidate combines deep technical expertise in hardware debugging, root cause analysis, and system‑level validation... 
    Senior
    Contract work

    Advanced Micro Devices

    Santa Clara, CA
    14 hours ago
  • $46.3 per hour

     ...national staffing firm and are currently seeking an Sr. Failure Analysis Engineer for a prominent client of ours. This position is located...  ...the test environment. Demonstrated ability at hardware debugging to component level and fault isolation is mandatory.... 
    Senior
    Monday to Friday
    Shift work
    Day shift

    The Fountain Group

    Santa Clara, CA
    2 days ago
  • $200k - $351k

     Senior Principal, Design Engineering, Power Design...  ...specification, design, and debug of complex power delivery systems for datacenter products. Skills &...  ...relevant simulations, analysis, test vehicles, and...  ...CPU, GPU components Failure mode analysis Good understanding... 
    Senior
    Local area

    Celestica

    San Jose, CA
    2 days ago
  • $132k - $207k

    Senior System Power Validation and Applications Engineer...  ...Engineer in the Datacenter System Engineering...  ...development, and debug power system issues...  ...feasibility analysis, present options...  ...reliability, and failure analysis Experience... 
    Senior
    Full time

    NVIDIA Corporation

    Santa Clara, CA
    14 hours ago
  • $200k - $400k

     ...for a deeply technical engineer to co-design and...  ...engineering, distributed debugging, and communication-runtime...  ...across thousands of GPUs • Architect fault-tolerant...  ...real-world cluster failures Core Technical...  ...concepts • Congestion analysis and routing optimization... 
    Senior
    Visa sponsorship

    Institute of Foundation Models

    Sunnyvale, CA
    3 days ago
  • $116k - $184k

    NVIDIA Gruppe in Santa Clara is seeking a Product Quality Engineer to lead failure analysis for system products. The ideal candidate will have a strong analytical skillset and experience in problem-solving, focusing on quality improvement initiatives. Your responsibilities... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    14 hours ago
  • $116k - $184k

     ...hear from you! We are looking for a Product Quality Engineer to join our team leading all aspects of failure analysis for NVIDIA’s system product segment throughout...  ...on experience in PCB level hardware verification/debug and signal integrity measurement. Experience with... 
    Senior
    Work experience placement

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  •  ...Overview Product Quality Engineer for the Systems Product Quality team,...  ...serving as the system‑level power debug domain expert for customer returns and field failures on NVIDIA data center systems, compute...  ...Lead system‑level power failure analysis for customer returns and field... 
    Senior

    NVIDIA

    Santa Clara, CA
    1 day ago
  •  ...connections among thousands of GPUs and memory units. The...  ...Overview We are looking for a senior individual contributor with...  ...Applied Physics, Electrical Engineering, or related field 7 - 15+ years...  ...Hands-on experience with failure analysis and materials characterization... 
    Senior

    nEye Systems, Inc.

    Santa Clara, CA
    3 days ago
  • $168k - $258.75k

    The Senior Datacenter Systems Modeling Engineer leads cross‑functional execution and modeling for data‑center power‑delivery programs. This role owns...  ...and optimize the design; drive fast and effective failure analysis with solid root cause and correlation actions. Own... 
    Senior

    NVIDIA

    Santa Clara, CA
    3 days ago
  • $132k - $207k

     ...looking to hire a System Test Engineer who will work in the test...  ...manufacturing test solutions for Datacenter products. Through HW and SW...  ...in the early design stages. Debug complex hardware and software...  ...scripting for automation and data analysis. Strong problem‑solving... 
    Senior
    Local area
    Overseas

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • $168k - $258.75k

    Senior Hardware Validation Engineer - Datacenter Systems Engineering Senior Hardware Validation Engineer will lead validation activities...  ...into NVIDIA’s datacenter products. Debug, triage issues, conduct root‑cause analysis, verify fixes, define new tests, and improve... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  •  ...over 40 years experience in engineering and materials testing services...  ...everything we do. Every analysis, every test, and every question...  ...Laboratories is seeking a Senior Failure Analysis Engineer to lead hands...  ...who enjoys deep technical debugging, operates lab tools independently... 
    Senior
    Permanent employment
    Full time
    Casual work
    Local area
    Monday to Friday

    Eurofins USA Material Sciences

    Sunnyvale, CA
    26 days ago
  •  ...looking for curious, collaborative, and motivated hardware engineers to take our GPU memory subsystem from first silicon...  ...on leadership of HBM/GDDR enablement, validation, and failure analysis for NVIDIA’s datacenter and graphics products. Drive review of memory partner’... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • Senior Electrical Engineer (Data Center)- [Hybrid] Houston, Texas Unfortunately...  ...delivers end-to-end AI datacenter infrastructure built around...  ...and design reviews covering failure modes such as loose...  ...equivalent power systems analysis software Proficiency in ECAD... 
    Senior

    Submer

    Sunnyvale, CA
    3 days ago
  • $184k - $287.5k

    Senior Fortran Compiler Engineer...  ...language parallelism for GPUs and Multicore CPUs...  ...with teamwork throughout debugging, prototyping, and...  ...experience with semantic analysis Knowledge of programming...  ...growth, our exclusive engineering teams are rapidly... 
    Senior
    Remote work

    NVIDIA

    Santa Clara, CA
    1 day ago
  • $184k - $287.5k

     ...looking for a dedicated engineer for the Senior Systems Software...  ...and improve multiple datacenter products concurrently...  ...datacenter builds for NVIDIA GPUs, CPUs, and networking...  ...and tools. Analyze, debug, and resolve critical...  ...profiler to systems analysis. Linux systems... 
    Senior

    NVIDIA

    Santa Clara, CA
    3 days ago
  • $210k - $265k

     ...execution mode and has a world-class engineering team with decades of...  ...conversion systems, including circuit analysis, simulation, control loop...  ...testing, validation, and debugging power supply designs to ensure...  ...for AI accelerators, GPUs, or high‑performance ASICs. Knowledge... 
    Senior

    Eridu AI

    Saratoga, CA
    1 day ago
  • $75 - $85 per hour

     ...development and hardware engineering company, offering end-...  ...com/careers Title: Senior Electrical Engineer...  ...electrical measurements and analysis across power rails,...  .... ~Execute lab debugging and troubleshooting using...  .... ~ Conduct failure analysis of electrical... 
    Senior
    Flexible hours

    Fresh Consulting

    Sunnyvale, CA
    1 day ago
  • $152k - $241.5k

     ...based programming model for our GPUs. CUDA Tile shipped with CUDA1...  ..., and other general software engineering work. What we need to see:...  ...compiler optimization, performance analysis and IR design. Ability to...  ...design skills, including debugging, performance analysis, and test... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • $152k - $241.5k

     ...looking for versatile software engineers for our XLA team. NVIDIA is...  ...the OpenXLA compiler on NVIDIA GPUs at scale. You’ll collaborate...  .... Performance tuning and analysis. Code‑generation for NVIDIA GPU...  ...software design skills, including debugging, performance analysis, and... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    14 hours ago
  •  ...and experienced HPC Cluster Engineer to design, deploy and operate...  ...workloads, including performance analysis and optimizations. Conduct...  ...accelerate researchers’ velocity, debugging and software performance at...  ...: Background with NVIDIA GPUs, CUDA programming, NCCL and MLPerf... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    14 hours ago
  •  ...Senior Systems Engineer US - Milpitas About Us Graphcore is one of the...  .... Diagnose system-level failures involving thermal behavior,...  ...to perform root cause analysis and propose corrective actions...  ...architectures and board-level debugging. Experience analyzing... 
    Senior
    Flexible hours

    Graphcore

    Milpitas, CA
    14 hours ago
  • $200k

     ...As AI moves beyond the datacenter into robots,...  ...exceptional architects and engineers to rethink how AI, sensing...  ...We are looking for a Senior RTL Engineer to help build...  ...optimization. Strong debugging and problem-solving...  ...and memory subsystems GPUs AI accelerators... 
    Senior
    Flexible hours
    Night shift

    Velaura

    Santa Clara, CA
    2 days ago
  •  ...develop software infrastructure and improve datacenter architectures designed for deep...  ...have a Bachelor’s degree in Electrical Engineering or Computer Science and 8 years of experience...  ...with CUDA, PyTorch, and performance analysis is essential.
    Senior

    NVIDIA

    Santa Clara, CA
    14 hours ago
