Site Reliability Engineer - Hardware Infrastructure

NVIDIA Gruppe

At NVIDIA, Site Reliability Engineering provides a rare chance to define, develop, and support large-scale production systems with high efficiency and availability. This demanding position merges software and systems engineering efforts to guarantee flawless service operation with consistent reliability and uptime. As an SRE here, you will be part of a welcoming team that values collaboration and creativity, empowering developers to make significant updates while sustaining efficient system function.

What you'll be doing:
  • Develop and support guidelines for incident management, planned maintenance, and blameless postmortems.
  • Assist teams in responding to high severity incidents, driving root cause analysis, crafting high-quality postmortems, and developing post-incident corrective actions.
  • Define reliability and supportability metrics, Service Level Objectives, and error budgets.
  • Develop and drive the adoption of actionable, customer‑centric monitoring and alerting.
  • Apply automation and Generative AI/Agentic solutions to minimize manual and tedious activities and boost customer support.
  • Guide teams on establishing sustainable on‑call and operational standards.

What we need to see:
  • Degree in Computer Science or a related technical field involving coding, or equivalent experience.
  • 8+ years of experience in SRE, DevOps, or Production Engineering.
  • Strong understanding of SRE principles, including incident management, error budgets, SLOs, and SLAs.
  • Experience crafting and deploying systems that are fault‑tolerant, performant, and supportable.
  • Background with infrastructure automation.
  • Experience running critical services in production.
  • Experience in one or more of the following: Python, Go, Perl, or Ruby.
  • Hands‑on experience with observability platforms (e.g., Prometheus, Grafana).
  • Strong communication skills with the ability to convey technical concepts effectively to diverse audiences.
  • Flexibility and adaptability working in a fast‑paced environment with evolving requirements.

Ways to stand out from the crowd:
  • Expertise in establishing incident management and postmortem processes.
  • Experience driving adoption of common tools and processes across diverse groups.
  • Experience working with LLM/Generative AI/Agentic solutions to shorten mitigation time, lessen toil, and ensure Service Level Objectives are met.
  • Hands‑on expertise operating and scaling distributed systems with tight SLAs, ensuring high availability and performance.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you
Based on the Site Reliability Engineer - Hardware Infrastructure vacancy in Santa Clara, CA
  • A prominent tech company in Sunnyvale is seeking a Senior Signal Integrity Engineer to work on cutting-edge data center hardware. The role involves engaging with multiple teams to ensure signal integrity across various systems. Ideal candidates should have a Bachelor's... 
    Suggested

    Google Inc.

    Sunnyvale, CA
    3 days ago
  • $132k - $190k

    Google Inc. in Sunnyvale is seeking a hardware engineer to link hardware engineering and laboratory operations, focusing on ensuring precision and successful hardware deployments. This role requires expertise in electronics calibration, test automation, and collaboration... 
    Suggested

    Google Inc.

    Sunnyvale, CA
    4 days ago
  • $159k - $231k

    Senior Hardware Systems Design Engineer, Platforms Infrastructure Google Sunnyvale, CA, USA Apply Requirements Bachelor’s degree in Electrical Engineering, Computer Engineering, Physics, a related field, or equivalent practical experience. 4 years of experience working... 
    Suggested
    Worldwide

    Google Inc.

    Sunnyvale, CA
    3 days ago
  • $184k - $287.5k

     ...push the boundaries of innovation and engineering? At NVIDIA, we lead the world in accelerated...  ...high‑performance systems. As a Senior Hardware Systems Engineer, you will help build...  ...with hyperscale data center infrastructure, including cooling methods, facility power... 
    Suggested

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • NVIDIA Gruppe in Santa Clara is seeking an experienced validation engineer to define testing strategies and collaborate with cross-...  .... The ideal candidate will have strong debugging skills across hardware and software and be proficient in Python for test automation. The... 
    Suggested

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • $132k - $190k

    Hardware Performance Test Engineer, Engineering Labs, Platforms Location: Sunnyvale, CA, USA. Requirements: Bachelor’s degree in Electrical Engineering...  ...and failure reproduction. Develop software and infrastructure required for managing a fleet of test systems at scale,... 

    Google Inc.

    Sunnyvale, CA
    3 days ago
  • $159k - $231k

    Senior Hardware Power Test Engineer, Platforms Infrastructure Google, Sunnyvale, CA, USA Overview As a Senior Hardware Power Test Engineer, you will be supporting a team of hardware board and power designers responsible for prototype bring‑up, validation, qualification... 
    Full time

    Google Inc.

    Sunnyvale, CA
    20 hours ago
  • $120k - $172k

    A leading technology company in California seeks a Product Quality Engineer for hardware within Google Cloud. This role involves owning the product quality process, utilizing advanced statistical methods, and collaborating with cross-functional teams to ensure exceptional... 

    Google Inc.

    Sunnyvale, CA
    1 day ago
  • Google Inc. is seeking an SoC Design Engineer in Sunnyvale, California, to shape the future of AI and ML hardware acceleration. This role involves driving TPU technology and working on SoC-level RTL design for advanced AI applications. The ideal candidate will have a Bachelor... 

    Google Inc.

    Sunnyvale, CA
    2 days ago
  • $108k - $153k

    Hardware Validation Engineer, Cloud Platforms Google, Sunnyvale, CA, USA This is...  ...hardware must be performed on site. Qualifications Bachelor...  ...‑scale). The AI and Infrastructure team is redefining what’s...  ...unparalleled scale, efficiency, reliability and velocity. Our... 
    Full time
    Work at office
    Worldwide

    Google Inc.

    Sunnyvale, CA
    2 days ago
  •  ...algorithms and RTL Design. Understanding of both software and hardware is required. Key Responsibilities Architect, design, develop and...  ...or MS degree (preferred) or equivalent experience in Computer Engineering or Electrical Engineering. At least 3+ years of work... 
    Work experience placement

    NVIDIA Corporation

    Santa Clara, CA
    3 days ago
  • $116k - $189.75k

    Software Engineer, Hardware Tools and Methodology - New College Grad 2026. Locations: US, CA, Santa Clara. Time type: Full time. Posted on: Posted Yesterday. Job requisition ID: JR2018659. Our work at NVIDIA is dedicated towards a computing model focused on visual and AI... 

    NVIDIA

    Santa Clara, CA
    20 hours ago
  • Hardware Qualification Engineer, ML Products, Google Cloud. Google, Sunnyvale, CA,...  ...largest and most powerful computing infrastructures in the world. The Hardware Qualification...  ...ensures that this equipment is reliable. In the Research and Development... 

    Google Inc.

    Sunnyvale, CA
    4 days ago
  • $120k - $172k

    Product Quality Engineer, Hardware, Google Cloud. Google, Sunnyvale, CA, USA. Bachelor's degree in Mechanical Engineering...  ...discipline, or equivalent practical experience. Certified Reliability/Quality Engineer (CRE/CQE) certification or equivalent... 
    Full time
    Worldwide

    Google Inc.

    Sunnyvale, CA
    1 day ago
  • $144k - $209k

    Google Inc. is seeking a Senior Hardware Reliability Engineer in Sunnyvale, CA, to manage hardware reliability for innovative machine learning and server products. You will lead reliability tests and define optimal design choices to mitigate risks early in the development... 

    Google Inc.

    Sunnyvale, CA
    2 days ago
  •  ...leading technology firm in Sunnyvale is seeking a hands-on engineer to oversee hardware quality management for their products. The ideal...  ...cross-functional teams to drive improvements in product reliability and customer satisfaction. This role is critical in ensuring... 

    Synopsys, Inc.

    Sunnyvale, CA
    1 day ago
  • $144k - $209k

    Senior Hardware Reliability Engineer, Global Hardware Reliability Engineering. Google, Sunnyvale, CA, USA. Qualifications: Bachelor...  ...mission profiles for chassis, rack from integration sites to field (data centers) that help predict field reliability... 
    Contract work

    Google Inc.

    Sunnyvale, CA
    2 days ago
  • $170k - $215k

    Atomic Machines in Santa Clara, California, is looking for a Staff Systems Engineer to develop cutting-edge manufacturing hardware. This role entails collaboration with cross-functional teams to create software that links platform architecture to novel hardware. The ideal... 

    Energy Jobline ZR

    Santa Clara, CA
    3 days ago
  • Palo Alto Networks, Inc. is looking for a skilled Senior to Principal EDVT Engineer to enhance our hardware engineering team in Santa Clara, CA. The role focuses on validating and verifying cutting-edge electronic systems while ensuring they meet the highest standards... 

    Palo Alto Networks, Inc.

    Santa Clara, CA
    3 days ago
  • $147.4k - $272.1k

    Site Reliability Engineer (Edge Services), Infrastructure Services Sunnyvale, California, United States Software and Services We are seeking a proactive Site Reliability Engineer to champion the evolution of our production ecosystems. In this role, you will help drive... 
    Relocation
    Shift work

    Apple Inc.

    Sunnyvale, CA
    1 day ago
  • Front-End Deployment Engineer (Hardware / EDA) Recruited by Abotts Inc. | abottstech.com Abotts is proud to be recruiting on behalf of a high-growth, venture-backed technology startup that is redefining the intersection of Artificial Intelligence and Semiconductor/Hardware... 

    Abotts Inc

    Santa Clara, CA
    20 hours ago
  •  ...US technology consulting firm in Santa Clara is looking for Hardware Engineers to work on board validation and hardware testing. Ideal candidates...  ...tests, and maintaining lab organization. This is an on-site role and offers a unique chance to work in a fast-paced lab with... 

    NewsNowGh

    Santa Clara, CA
    1 day ago
  • $136k - $218.5k

    NVIDIA is seeking capable customer-facing hardware engineers to work directly with Cloud Scale Providers (CSPs) deploying next generation...  ...as AI Factories, are vital to scale compute and networking infrastructure needed for agentic AI processing. The CSP HW Systems... 

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $176k - $276k

    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high...  ...), or equivalent experience. 8+ years of experience with infrastructure automation, distributed systems design, experience with... 

    NVIDIA Corporation

    Santa Clara, CA
    4 days ago
  • Google Inc. is hiring a Systems Development Engineer in Sunnyvale, CA. This position involves working with hardware test engineers to validate and qualify server, networking, and optical platforms. The ideal candidate will manage lab resources and coordinate between engineering... 

    Google Inc.

    Sunnyvale, CA
    1 day ago
  • $108k - $153k

    Google Inc. in Sunnyvale, CA is seeking a Hardware Validation Engineer to contribute to medium-scale design projects. This role involves designing, testing, and maintaining hardware systems, including system-level testbeds. The ideal candidate will hold a Bachelor's degree... 

    Google Inc.

    Sunnyvale, CA
    2 days ago
  • $250k

     ...single source of truth—explainable, reliable, and maintainable—that serves as the...  ...knowledge management as mission-critical infrastructure for the AI-powered enterprise. We’re...  ...Position Overview As Director of Site Reliability Engineering, you will ensure that eGain’s AI... 
    Work at office

    eGain Corporation

    Sunnyvale, CA
    1 day ago
  • $147k - $211k

    Google Inc. is seeking a Software Engineer in Mountain View to develop low-level software for their Tensor SoC and Pixel devices. The role involves performance analysis, hardware-software interface design, and collaboration with multiple teams. Applicants should have a... 

    Google Inc.

    Mountain View, CA
    3 days ago
  •  ...leadership in managing the product lifecycle and customer engagement. The ideal candidate will possess over 8 years of experience in hardware product management, excellent communication skills, and a strong technical background in networking. Join a team that's making... 

    Hobbsnews

    Sunnyvale, CA
    1 day ago
  • $120.3k - $194.53k

     ...kind of precision that drives great outcomes. Job Summary Palo Alto Networks runs a large hybrid infrastructure across multiple public clouds. As a Site Reliability Engineer on the Internet Security Platform team, you will be part of a team supporting Advanced DNS... 
    Full time
    Work at office
    Visa sponsorship
    Work visa

    Palo Alto Networks, Inc.

    Santa Clara, CA
    4 days ago
