
Senior Manager, Site Reliability Engineering

$200k - $322k

NVIDIA

* Locations: US, CA, Santa Clara
* Time type: Full time
* Posted on: Posted Yesterday
* Job requisition ID: JR2016119

For over 25 years, NVIDIA has been at the forefront of transforming computer graphics, PC gaming, and accelerated computing, driven by a legacy of continuous innovation and exceptional talent. We are now leveraging the immense potential of AI to usher in the next era of computing, where our GPUs power the "brains" of computers, robots, and autonomous vehicles that can comprehend the world. This pioneering work demands vision, innovation, and the world's best talent. Join our diverse and supportive environment, where NVIDIANs are inspired to excel and make a profound global impact.

NVIDIA is seeking a Senior Manager of Site Reliability Engineering to lead and reshape how IT operations function at scale. This role goes beyond traditional service management to build AI-powered systems that enhance reliability, speed, and employee experience. We offer an outstanding opportunity to lead and refine Incident, Problem, and Change Management into an intelligent, automated operating model using observability, AI insights, and orchestration.

This leader will apply strong operational execution with an SRE attitude, facilitating the move from reactive processes to predictive and autonomous operations.

**What you’ll be doing**

* Manage the full lifecycle of Incident, Problem, and CM as a 24×7 operational function, ensuring high reliability and minimal business disruption.
* Transform incident response by bringing to bear AI detection, correlation, and guided remediation, reducing time to detect, respond, and resolve.
* Build and scale intelligent incident workflows that integrate monitoring, telemetry, and service context to enable faster and more consistent response.
* Evolve Problem Management into a data-driven field, using AI and analytics to identify patterns, eliminate recurring issues, and drive systemic fixes.
* Modernize CM by introducing risk-aware, data-driven decisioning, improving change success rates, and reducing blast radius.
* Drive the adoption of observability as a foundation, ensuring service-level visibility, signal quality, and actionable insights across the IT ecosystem.
* Lead the development of automation and orchestration platforms that reduce manual effort across the outage lifecycle, including detection, triage, communication, and RCA or equivalent experience.
* Partner closely with engineering, infrastructure, and business teams to align operations with service reliability goals and SLOs.

**What we need to see:**

* BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, other Engineering or related fields (or equivalent experience).
* 5+ years of experience leading and managing global IT operations or service management teams, with growing scope and complexity.
* 12+ overall years of experience in Site Reliability Engineering, IT Service Management, with a focus on Incident Management, Problem Management, and Configuration Management.
* Proven proficiency in Incident, Problem, and CM with a consistent record of delivering measurable gains in reliability and efficiency.
* Demonstrated experience applying AI, automation, or advanced analytics to improve operational outcomes.
* Solid understanding of observability, monitoring ecosystems, and modern reliability practices (SRE principles, SLOs, error budgets).
* Demonstrated ability to move organizations from process-heavy to technology-focused operating models.
* Strong leadership capability with experience building and scaling engineering-focused teams (SRE, SWE, or equivalent).
* Ability to deliver executive-level communication and insights, translating operational signals into clear, actionable narratives for leadership.
* Ability to build and lead a high-performing team of SREs and engineers, encouraging a culture of ownership, innovation, and continuous improvement.

**Ways to stand out from the crowd:**

* ITIL knowledge and/or certification.
* Experience building or scaling AI-powered operational platforms.
* Ability to challenge traditional ITSM models and introduce innovative, scalable approaches.
* A mentality passionate about automation first, prevention over reaction, and systems over process.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 200,000 USD - 322,000 USD.

You will also be eligible for equity.

Applications for this job will be accepted at least until April 17, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Vacancy posted 4 days ago
