Site Reliability Engineer - Hardware Infrastructure
NVIDIA Gruppe
At NVIDIA, Site Reliability Engineering provides a rare chance to define, develop, and support large-scale production systems with high efficiency and availability. This demanding position merges software and systems engineering efforts to guarantee flawless service operation with consistent reliability and uptime. As an SRE here, you will be part of a welcoming team that values collaboration and creativity, empowering developers to make significant updates while sustaining efficient system function.

What you'll be doing:
- Develop and support guidelines for incident management, planned maintenance, and blameless postmortems.
- Assist teams in responding to high severity incidents, driving root cause analysis, crafting high-quality postmortems, and developing post-incident corrective actions.
- Define reliability and supportability metrics, Service Level Objectives, and error budgets.
- Develop and drive the adoption of actionable, customer‑centric monitoring and alerting.
- Apply automation and Generative AI/Agentic solutions to minimize manual and tedious activities and boost customer support.
- Guide teams on establishing sustainable on‑call and operational standards.

What we need to see:
- Degree in Computer Science or a related technical field involving coding, or equivalent experience.
- 8+ years of experience in SRE, DevOps, or Production Engineering.
- Strong understanding of SRE principles, including incident management, error budgets, SLOs, and SLAs.
- Experience crafting and deploying systems that are fault‑tolerant, performant, and supportable.
- Background with infrastructure automation.
- Experience running critical services in production.
- Experience in one or more of the following: Python, Go, Perl, or Ruby.
- Hands‑on experience with observability platforms (e.g., Prometheus, Grafana).
- Strong communication skills with the ability to convey technical concepts effectively to diverse audiences.
- Flexibility and adaptability working in a fast‑paced environment with evolving requirements.

Ways to stand out from the crowd:
- Expertise in establishing incident management and postmortem processes.
- Experience driving adoption of common tools and processes across diverse groups.
- Experience working with LLM/Generative AI/Agentic solutions to shorten mitigation time, lessen toil, and ensure Service Level Objectives are met.
- Hands‑on expertise operating and scaling distributed systems with tight SLAs, ensuring high availability and performance.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

