Senior Manager, Site Reliability Engineering

$200k - $322k

NVIDIA

Locations: US, CA, Santa Clara
Time type: Full time
Posted on: Posted Yesterday
Job requisition ID: JR2016119

For over 25 years, NVIDIA has been at the forefront of transforming computer graphics, PC gaming, and accelerated computing, driven by a legacy of continuous innovation and exceptional talent. We are now leveraging the immense potential of AI to usher in the next era of computing, where our GPUs power the "brains" of computers, robots, and autonomous vehicles that can comprehend the world. This pioneering work demands vision, innovation, and the world's best talent. Join our diverse and supportive environment, where NVIDIANs are inspired to excel and make a profound global impact.

NVIDIA is seeking a Senior Manager of Site Reliability Engineering to lead and reshape how IT operations function at scale. This role goes beyond traditional service management to build AI-powered systems that enhance reliability, speed, and employee experience. We offer an outstanding opportunity to lead and refine Incident, Problem, and Change Management into an intelligent, automated operating model using observability, AI insights, and orchestration. This leader will combine strong operational execution with an SRE mindset, facilitating the move from reactive processes to predictive and autonomous operations.

**What you’ll be doing:**

* Manage the full lifecycle of Incident, Problem, and Change Management (CM) as a 24×7 operational function, ensuring high reliability and minimal business disruption.
* Transform incident response by applying AI detection, correlation, and guided remediation, reducing time to detect, respond, and resolve.
* Build and scale intelligent incident workflows that integrate monitoring, telemetry, and service context to enable faster and more consistent response.
* Evolve Problem Management into a data-driven field, using AI and analytics to identify patterns, eliminate recurring issues, and drive systemic fixes.
* Modernize CM by introducing risk-aware, data-driven decisioning, improving change success rates, and reducing blast radius.
* Drive the adoption of observability as a foundation, ensuring service-level visibility, signal quality, and actionable insights across the IT ecosystem.
* Lead the development of automation and orchestration platforms that reduce manual effort across the outage lifecycle, including detection, triage, communication, and root cause analysis (RCA).
* Partner closely with engineering, infrastructure, and business teams to align operations with service reliability goals and SLOs.

**What we need to see:**

* BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, other Engineering or related fields (or equivalent experience).
* 5+ years of experience leading and managing global IT operations or service management teams, with growing scope and complexity.
* 12+ overall years of experience in Site Reliability Engineering or IT Service Management, with a focus on Incident Management, Problem Management, and Configuration Management.
* Proven proficiency in Incident, Problem, and CM with a consistent record of delivering measurable gains in reliability and efficiency.
* Demonstrated experience applying AI, automation, or advanced analytics to improve operational outcomes.
* Solid understanding of observability, monitoring ecosystems, and modern reliability practices (SRE principles, SLOs, error budgets).
* Demonstrated ability to move organizations from process-heavy to technology-focused operating models.
* Strong leadership capability with experience building and scaling engineering-focused teams (SRE, SWE, or equivalent).
* Ability to deliver executive-level communication and insights, translating operational signals into clear, actionable narratives for leadership.
* Ability to build and lead a high-performing team of SREs and engineers, encouraging a culture of ownership, innovation, and continuous improvement.

**Ways to stand out from the crowd:**

* ITIL knowledge and/or certification.
* Experience building or scaling AI-powered operational platforms.
* Ability to challenge traditional ITSM models and introduce innovative, scalable approaches.
* A mindset that puts automation first, prevention over reaction, and systems over process.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 200,000 USD - 322,000 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until April 17, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

NVIDIA Corporation

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the Senior Manager, Site Reliability Engineering in Santa Clara, CA vacancy
  • $168k - $270.25k

    Senior Site Reliability Engineer. Locations: US, CA, Santa Clara. Time type: Full time. Posted on: Posted...  ...Engineering, Production Engineering, or Incident Management roles; Bachelor’s or Master’s degree in Computer Science... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • $148k - $235.75k

     ...on the world.Join our team of innovative engineers who are building an AI Data Center AIOps...  ...turns raw, high-volume telemetry into reliable, job-centric insights and automation for...  ...performance, data integrity, and safe change management. You’ll own SLOs/SLIs, incident response... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    3 days ago
  • $174k - $252k

    Senior Software Engineer, Site Reliability Engineering X Applicants in San Francisco: Qualified applications with arrest or conviction records will be considered for employment in accordance with the San Francisco Fair Chance Ordinance for Employers and the California... 
    Senior
    Full time

    Google Inc.

    Sunnyvale, CA
    5 days ago
  • $145k - $165k

    A technology solutions firm in Sunnyvale, CA is looking for a highly experienced Site Reliability Engineer (SRE). This role involves maintaining uptime and performance across systems. Exceptional Linux expertise and automation skills in Bash and Python are crucial. Key... 
    Senior

    Bolt Graphics, Inc.

    Sunnyvale, CA
    2 days ago
  • $176k - $276k

    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high...  ...systems, networking, coding, database, capacity management, continuous delivery and deployment and open source cloud... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • $181.69k - $213.75k

     ...funds and SPVs, representing nearly $185B in assets under management, with tools designed to enhance the strategic impact of...  ...solve today unlock the opportunities of tomorrow. As a Senior Site Reliability Engineer, you’ll work to: Build and scale our internal platform... 
    Senior
    Full time
    Work at office

    Carta

    Santa Clara, CA
    1 day ago
  •  ...Sr. Manager API Platform Make Next Happen Now. For more than 30 years, the Bank has helped innovative companies and their investors...  ...Management platform. You will work cross-functionally with Architects, Engineers, Business Analysts, and Service Managers across multiple teams... 
    Senior

    Professional Recruiters

    Santa Clara, CA
    8 hours ago
  • $129.3k - $193.9k

    Northrop Grumman Corp. (JP) is seeking a Deputy Operations Program Manager in Sunnyvale, CA. This role involves leading project teams, managing manufacturing operations, and ensuring program delivery meets schedule and budget. Ideal candidates bring extensive experience... 
    Senior

    Northrop Grumman Corp. (JP)

    Sunnyvale, CA
    2 days ago
  • $272k - $431.25k

    A leading technology company in Santa Clara is looking for a Senior Manager in Systems Software Engineering to drive the development of cloud services. The ideal candidate will have over 10 years of software development experience, including 5 years in leadership roles... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  •  ...A global healthcare leader is seeking a Senior Product Manager to drive marketing strategies for innovative coronary therapies, including Intravascular Lithotripsy (IVL). This fully remote role focuses on increasing product penetration and launching new campaigns while... 
    Senior
    Remote work

    Johnson & Johnson

    Santa Clara, CA
    8 hours ago
  • Robotics Process Automation, LLC is looking for an experienced iOS Engineer based in Sunnyvale, California. The ideal candidate will have over 8 years of experience in iOS development and a passion for delivering high-quality mobile applications. Responsibilities include... 
    Senior

    Robotics Process Automation, LLC

    Sunnyvale, CA
    3 days ago
  • $171.54k - $276.8k

    Palo Alto Networks is accepting resumes for the following positions in SANTA CLARA, CA: Principal Product Manager (REF9485204) Investigate and understand customer business goals, architecture, scale, and service level objectives for Public Cloud security use cases, as... 
    Senior
    Remote work

    Stryker Corporation

    Santa Clara, CA
    3 days ago
  • $224k - $356.5k

     ...how you can make a lasting impact on the world. As a Senior Developer Relations Manager for Data Platforms, you’ll work with our most strategic...  ...offerings and alignment at the business, product, and engineering levels. Build relationships with executive and technical... 
    Senior

    NVIDIA

    Santa Clara, CA
    3 days ago
  • $272k - $431.25k

     ...developers in the semiconductor ecosystem. The Industrial Engineering organization is a strong, growing, and visible group both inside...  ..., SK Hynix, TSMC, Qualcomm, Intel, etc. with 5+ years people management experience. ~ MS/PhD in Electrical Engineering, Computer... 
    Senior

    NVIDIA

    Santa Clara, CA
    8 hours ago
  •  ...A leading construction recruitment firm is looking for a Senior Project Manager for doors, frames, and hardware projects. The role entails leading multiple commercial projects, ensuring client satisfaction, and maintaining profitability. Candidates should have over 7... 
    Senior
    Full time
    Remote work

    Solid Rock Recruiting LLC

    Santa Clara, CA
    8 hours ago
  •  ...A leading global supply chain services provider is seeking a Key Account Executive to manage and grow accounts for top companies. This fully remote position requires strong consultative sales skills, a deep understanding of client needs, and the ability to build relationships... 
    Senior
    Remote work

    Arrow Electronics

    Santa Clara, CA
    8 hours ago
  • $196k - $310.5k

     ...accelerated computing technology across various industries Develop key messages that resonate with external audiences Pitch and manage stories with reporters to secure impactful press coverage Nurture strong long-term relations with business, technology and trade... 
    Senior

    NVIDIA

    Santa Clara, CA
    1 day ago
  • A leading tech company is seeking an SAP Test Manager to oversee comprehensive testing activities for an SAP upgrade project. The ideal...  ..., and ensuring compliance with industry standards. This senior role offers competitive compensation in a dynamic and collaborative... 
    Senior

    TechDigital Group

    Santa Clara, CA
    2 days ago
  • $210k - $270k

    Zocdoc is seeking a Senior Site Reliability Engineer to develop and maintain distributed production systems. The ideal candidate will have over 5 years of experience in site reliability or production engineering, particularly in cloud environments like AWS. Responsibilities... 
    Senior

    GoTo Meeting

    Palo Alto, CA
    5 days ago
  • $232k - $368k

    NVIDIA AI is looking for a Senior Manager to lead the Silicon Co-Design Group in Santa Clara, California. This role includes planning post-silicon feature integration, leading technical teams, and building operational rigor. The ideal candidate will have over 12 years... 
    Senior

    NVIDIA AI

    Santa Clara, CA
    5 days ago
  • $232k - $368k

    NVIDIA Corporation is seeking a Senior Manager for the System Integration in the Silicon Co-Design Group based in Santa Clara, CA. The...  ...excellent leadership skills, and a strong background in electrical engineering. Competitive salary between $232,000 - $368,000 annually,... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    3 days ago
  • $188k - $275k

     ...What You'll Do: The Observability Engineering organization at CoreWeave is responsible...  ...telemetry pipelines, and observability reliability, enabling teams to detect issues quickly...  ...the role: CoreWeave is seeking a Senior Manager, Observability Engineering to lead a team... 
    Senior
    Permanent employment
    Temporary work
    Casual work
    Work at office
    Remote work
    Flexible hours

    CoreWeave

    Sunnyvale, CA
    29 days ago
  • $184k - $287.5k

     ...We are looking for a Senior Developer Relations Manager to drive strategic technical teamwork with leading Agentic AI companies building the next...  ...building agents, model fine-tuning, tool calling, and context engineering, combined with a strategic understanding of the rapidly... 
    Senior
    Work experience placement

    NVIDIA

    Santa Clara, CA
    8 hours ago
  • $207k - $300k

    Site Reliability Engineering Manager, Google Distributed Cloud Google Sunnyvale, CA, USA Bachelor’s degree in Computer Science, a related field, or equivalent practical experience. 8 years of experience building or managing distributed systems or cloud infrastructure... 
    Full time

    Google Inc.

    Sunnyvale, CA
    5 days ago
  • $160k - $220k

    A leading vehicle intelligence company in California is seeking a Solutions Engineering Manager to lead technical pre-sales for global customers. This role demands expertise in automotive software development and customer engagement, with a focus on closing deals. Ideal... 
    Senior
    Flexible hours

    Applied Intuition

    Sunnyvale, CA
    3 days ago
  • $130k - $160k

    DeWinter Group is seeking a Senior Financial Reporting & Technical Accounting Lead in Sunnyvale, CA. This role involves architecting...  ...expansion, leading GAAP-compliant financial statement preparation, and managing audit relationships. The ideal candidate has 3-6 years of... 
    Senior

    DeWinter Group

    Sunnyvale, CA
    1 day ago
  • $232k - $368k

    Nvidia Corporation in Santa Clara is seeking a System Integration Lead to manage and resolve critical silicon issues before production. The role involves leading a team focused on delivering high-quality silicon, developing strategies to keep programs on schedule, and... 
    Senior

    Nvidia Corporation

    Santa Clara, CA
    4 days ago
  • $207k - $300k

    Google Inc. is looking for a Staff Software Engineer specializing in Site Reliability Engineering in Sunnyvale, CA. This role combines software and systems engineering to build and manage distributed systems, ensuring high reliability and uptime. The ideal candidate should... 
    Senior

    Google Inc.

    Sunnyvale, CA
    2 days ago
  • $200k - $322k

    A leading technology company is looking for a Senior Manager of Site Reliability Engineering in California. The role involves managing the full lifecycle of IT operations, transforming incident response through AI, and leading a high-performing team. The ideal candidate... 
    Senior
    Full time

    NVIDIA Corporation

    Santa Clara, CA
    2 days ago
  • Lockheed Martin in Sunnyvale, California is looking for a Senior Manager to lead technologies that support national defense, particularly in optics and electro-optics. This role entails overseeing R&D programs, managing budgets, and collaborating with various teams within... 
    Senior

    Lockheed Martin

    Sunnyvale, CA
    2 days ago
