
Senior System Reliability Engineer for AI & HPC

NVIDIA

NVIDIA Corporation is seeking a Senior System Reliability Engineer to join its Reliability Engineering team, located in Santa Clara, California. In this role, you will be responsible for leading reliability testing and collaborating cross-functionally with engineering teams to enhance product reliability. The ideal candidate will possess a Bachelor’s or Master’s degree in Electrical or Mechanical Engineering, with over 8 years of experience in hardware reliability. NVIDIA is known for offering competitive salaries and a comprehensive benefits package.

Vacancy posted 14 hours ago
Similar jobs that could be interesting for you
Based on the Senior System Reliability Engineer for AI & HPC in Santa Clara, CA vacancy
  • $152k - $241.5k

     ...’re looking for a Senior SRE to join our Compute...  ...important systems running while working...  ...harness the power of AI to deliver...  ...integrate cleanly with HPC schedulers, storage...  ...management, fleet reliability/auto‑healing, E2E...  ...Ruby. Mentored other engineers and influenced... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $184k - $356.5k

    NVIDIA Corporation in Santa Clara is seeking a Senior GPU System Architect to design multi-GPU scale-up and scale-out systems for AI and HPC. Responsibilities include architecting system topologies, collaborating to optimize transport layers, and contributing to hardware... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    3 days ago
  • $168k - $264.5k

    Senior System Reliability Engineer. Locations: US, CA, Santa Clara. Time type: Full time. Posted on:...  ...Today, we are increasingly known as “the AI computing company.” We're looking to...  ...are looking for a System Reliability Engineer to join NVIDIA's existing Reliability...
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    14 hours ago
  • $152k - $241.5k

     ...NVIDIA Gruppe in Santa Clara is seeking a Senior Software Engineer to enhance their HPC infrastructure. The role involves applying distributed systems patterns, automation, and building scalable services in a hybrid multi-cloud environment. Candidates should have strong... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • $136k - $218.5k

    NVIDIA in Santa Clara is seeking a Silicon Speed Features Engineer to co-design system-level speed features across Gaming, Datacenter, Automotive,...  ...The role involves collaborating cross-functionally and using AI to enhance automation tools for performance validation.... 
    Suggested

    NVIDIA

    Santa Clara, CA
    3 days ago
  • $152k - $241.5k

     ...looking for outstanding software engineers to help us expand our...  ...supporting NVIDIA products across HPC, cloud and enterprise on both...  ...will span many aspects of GPU system integration, including telemetry...  ...future of accelerated compute and AI. What you'll be doing: Develop...
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • $32 per hour

    Sterling Engineering is seeking a Sr. Electronics Technician in Milpitas, CA, to support production of advanced computer systems. This hands-on role focuses on system-level testing, troubleshooting...  ...skills, and familiarity with AI hardware. The position offers a full... 
    Senior
    Full time

    Sterling Engineering

    Milpitas, CA
    1 day ago
  •  ...Santa Clara is hiring for a role in their Hardware Infrastructure EDA Compute team to optimize workload scheduling systems and improve overall service reliability. The successful candidate will manage and scale job scheduling systems while driving measurable improvements in... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • NVIDIA Corporation in Santa Clara is seeking an experienced hardware engineer to collaborate cross-functionally on system-level features. Responsibilities include defining specifications, performing validation, and leading complex debug efforts to ensure timely product... 

    NVIDIA Corporation

    Santa Clara, CA
    3 days ago
  • $136k - $218.5k

     ...markets. As a Silicon Speed Features Engineer, you will co‑design system‑level speed features, build the...  ...time using modern tooling—including AI—without losing rigor. What You’ll Be...  ...hardware, firmware/software, process/reliability, and operations teams to co‑design system... 

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • $184k - $356.5k

    NVIDIA Gruppe is seeking an experienced engineer to lead GPU cluster design and support for AI and HPC deployments in Santa Clara, California. The ideal candidate...  ...degree and demonstrate expertise in distributed systems. NVIDIA offers a competitive salary, equity, and... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $129k - $161.27k

     ...institution in Santa Clara seeks a skilled IT professional to enhance HPC capabilities through training, develop infrastructure solutions,...  .... Ideal candidates will have experience with Linux and Windows systems, and SAN storage environments. The role offers a salary range of... 
    Senior

    Santa Clara University

    Santa Clara, CA
    2 days ago
  • $156k - $190k

    Crusoe Energy Systems in Sunnyvale, CA, is seeking a Staff Cloud Support Engineer to provide technical leadership in cloud infrastructure. You will lead incident responses, design reliability architecture, and mentor team members. The ideal candidate will have over 8 years... 
    Senior

    Crusoe Energy Systems

    Sunnyvale, CA
    1 day ago
  • NVIDIA Corporation is seeking a Systems Quality and Reliability Engineer to join their LPU team. This role is crucial for ensuring the reliability of NVIDIA's AI/ML products through in-depth root-cause analysis and failure investigations. The ideal candidate will have a... 

    NVIDIA Corporation

    Santa Clara, CA
    3 days ago
  •  ...into the unlimited potential of AI to define the next era of...  ...raw, high‑volume telemetry into reliable, job‑centric insights and automation...  .... Join our team of innovative engineers who are building this platform...  ...the Software Engineering and Systems Engineering team to translate... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • $180k - $200k

    Uber is hiring a Senior Staff Engineer to architect and scale an autonomous support agent, enhancing customer experience using GenAI tools....  ...have over 10 years of experience in building production ML/AI systems and will lead voice agent initiatives. This role offers a salary... 
    Senior

    Uber

    Sunnyvale, CA
    3 days ago
  • $200k - $322k

     ...leveraging the immense potential of AI to usher in the next era of...  ...global impact. NVIDIA is seeking a Senior Manager of Site Reliability Engineering to lead and reshape how IT operations...  ...service management to build AI‑powered systems that enhance reliability, speed, and... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $184k - $287.5k

     ...unlimited potential of AI to define the next...  ...and improve software systems for rack, networking,...  ...and management. As a Senior Software Engineer - Datacenter Systems,...  ...run today's fastest HPC and AI workloads. This...  ...and Site Reliability Engineering (SRE) practices... 
    Senior
    Full time

    NVIDIA

    Santa Clara, CA
    4 days ago
  • $136k - $212.75k

    NVIDIA Gruppe in Santa Clara is hiring a Senior Power System Engineer to lead the development of efficient, high-current power systems for advanced AI accelerators. The role involves architecting power delivery systems for data center platforms and collaborating with cross... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • A leading AI technology company located in Sunnyvale, California, is looking for an experienced engineer to join its SOTA Training Platform team. The ideal candidate will have over...  ...ML models to life on Cerebras CSX systems, performance tuning, and contributing to... 
    Senior

    Cerebras

    Sunnyvale, CA
    4 days ago
  • NVIDIA Corporation is looking for a Senior System Software Engineer to join the NvSci team and help maintain its leadership in AI. This role involves building next-generation software, enhancing system architecture, and collaborating for performance optimization. Candidates... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • $200k - $322k

    NVIDIA AI in Santa Clara is seeking a Senior System Debug Engineer to drive failure analysis during the New Product Introduction phase. You will collaborate with industry experts to ensure quality in GPU Server products while working in a diverse and supportive environment... 
    Senior

    NVIDIA AI

    Santa Clara, CA
    2 days ago
  •  ...crucial for scaling Deep Learning and HPC. We're seeking a Senior Software Architect to help co‑design next...  ...increasing scale of next generation systems. This is an outstanding opportunity to...  ...technologies to accelerate AI and HPC workloads. Explore innovative... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • $181.1k - $318.4k

    A leading technology firm located in Santa Clara, California is seeking an experienced Machine Learning Data Engineer to design and implement AI robotic systems. This role requires strong software engineering skills and at least 3 years of relevant experience. Candidates... 
    Senior

    Apple Inc.

    Santa Clara, CA
    3 days ago
  • NVIDIA in Santa Clara is seeking an experienced engineer to design and optimize AI systems for the CUDA ecosystem. Ideal candidates will have strong C/C++ and Python skills, with a solid background in AI systems development. The position offers competitive salaries, equitably... 
    Senior

    NVIDIA

    Santa Clara, CA
    1 day ago
  • We are now looking for a Senior System Software Engineer to work in our Tegra system software group. The best candidates will have excellent C/C++...  ...power sophisticated server products used in groundbreaking AI, HPC, and accelerated computing workloads. We have some of the... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  •  ...company based in Santa Clara, California, is seeking a Senior Software Engineer to focus on the cloud-native stack for their AI/ML datacenters. This role entails deep technical work including debugging complex systems and gathering customer requirements. Ideal candidates... 
    Senior

    NVIDIA

    Santa Clara, CA
    1 day ago
  • NVIDIA Corporation is seeking a Senior Reliability Engineer to join a dynamic team focused on developing innovative products. The role involves building reliability test plans and methodologies for advanced technologies like GPUs and automotive products. Ideal candidates... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    3 days ago
  • $184k - $287.5k

    NVIDIA Gruppe is seeking talented AI systems engineers to advance innovative technologies in AI inference systems software. This role involves developing cutting-edge libraries, code generators, and kernel technologies for NVIDIA's architecture, emphasizing high-impact... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $224k - $356.5k

    NVIDIA Corporation is seeking a Senior Software Engineer in Santa Clara to define runtime...  ...Your role involves integrating AI with vehicle dynamics and safety systems, tackling complex problems in real...  ...AI and systems teams to build reliable solutions ensuring safety and performance... 
    Senior

    NVIDIA

    Santa Clara, CA
    1 day ago
