Senior SRE: Scalable Systems & Observability Engineer

NVIDIA Corporation

NVIDIA Corporation is looking for a Senior Systems Software Engineer (SRE) in Santa Clara, California. This role focuses on designing, building, and maintaining large-scale production systems using various engineering practices. The ideal candidate should have extensive experience in infrastructure automation and distributed systems. Key responsibilities include ensuring GPU cloud services run with maximum reliability, participating in service lifecycles, and leveraging automation for efficiency. Join NVIDIA to work in a diverse environment promoting collaboration and continuous learning.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you, based on the Senior SRE: Scalable Systems & Observability Engineer vacancy in Santa Clara, CA:
  • $126k - $204.5k

    Palo Alto Networks, Inc. is seeking a skilled DevOps/SRE engineer to join their Cortex team in Santa Clara, California. This role involves...  ...large-scale GCP environments and requires expertise in observability tools such as Thanos, Prometheus, and Grafana. The ideal candidate... 
    Senior

    Palo Alto Networks, Inc.

    Santa Clara, CA
    3 days ago
  • $176k - $333.5k

    NVIDIA Corporation in Santa Clara is seeking a Site Reliability Engineer (SRE) to design and maintain large-scale production systems focusing on reliability and observability. Candidates should have a BS in Computer Science or related field and 8+ years' experience in... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • A leading technology company is seeking a Senior System Software Engineer for Cloud in Santa Clara, CA. This role involves designing and deploying scalable cloud-based solutions for a cloud gaming service. The ideal candidate will have extensive experience with programming... 
    Senior

    NVIDIA

    Santa Clara, CA
    1 day ago
  • $224k - $431.25k

    NVIDIA Gruppe is seeking a Senior System Software Engineer for Cloud in Santa Clara, California. The role involves designing and building scalable cloud solutions for GeForce NOW. Candidates should have extensive experience with Java, Golang, and Kubernetes, along with... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
  • donato technologies is seeking a Senior SRE / DevOps Engineer in Sunnyvale, CA. The successful candidate will focus on ensuring system reliability and scalability while automating operations across all teams. Candidates should have over 8 years of experience in DevOps,... 
    Senior

    donato technologies

    Sunnyvale, CA
    4 days ago
  • Title: Sr. SRE / DevOps Engineer. Location: Sunnyvale, CA. Job Summary: For this role,...  ...vital role in ensuring that the systems are reliable, scalable, and high performing. Responsibilities...  ...knowledge of monitoring and observability tools: Splunk. Knowledge... 
    Senior
    Local area

    donato technologies

    Sunnyvale, CA
    4 days ago
  • $176k - $276k

    Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and availability using the combination...  ...aspects of large scale Observability & Telemetry collection platform with... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • $224k - $356.5k

    At NVIDIA, our Financial Systems Engineering team is at the heart of ensuring that our massive...  ...Design: Design, deploy, and maintain scalable software services that ensure transactional...  ...including Kubernetes, Docker, CI/CD, observability, and reliability engineering. Your... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • $152k - $241.5k

    NVIDIA Gruppe in Santa Clara is seeking a Senior Software Engineer to enhance their HPC infrastructure. The role involves applying distributed systems patterns, automation, and building scalable services in a hybrid multi-cloud environment. Candidates should have strong... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    2 days ago
  • A leading technology company is looking for a Java SRE Engineer to support large-scale cloud migrations and production systems on AWS and Kubernetes. You will lead migrations, design robust AWS EKS platforms, and implement deployment strategies. The ideal candidate has... 
    Senior

    EITACIES Inc.

    Santa Clara, CA
    14 hours ago
  • Nuro, based in Mountain View, is seeking senior engineers to build and scale its large-scale computing infrastructure. The role involves designing scalable frameworks and collaborating closely with teams to develop tools and APIs for business-critical applications. The... 
    Senior

    Nuro

    Mountain View, CA
    3 days ago
  •  ...our team of innovative engineers who are building this...  ...Software Engineering and Systems Engineering team to...  ...to keep environments scalable, consistent, and reproducible...  ...systems as SRE/DevOps/Platform Ops. Proven...  ...of reliability for an observability/AIOps platform: SLOs/SLIs... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    14 hours ago
  • $200k - $322k

    Senior Manager, Site Reliability Engineering...  ...to build AI-powered systems that enhance reliability...  ...model using observability, AI insights, and orchestration...  ...execution with an SRE attitude,...  ...introduce innovative, scalable approaches. A... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    2 days ago
  • $208k - $333.5k

    Systems Engineering is an engineering discipline focused on building, automating, and operating...  ...containerized platforms, storage, telemetry, and observability. Systems engineers are highly...  .... A core part of this work is an SRE mindset: eliminating manual toil through... 
    Senior
    Flexible hours

    Nvidia Corporation

    Santa Clara, CA
    4 days ago
  • $184k - $356.5k

    NVIDIA Corporation is seeking a Senior Systems Software Engineer based in Santa Clara, California. The ideal candidate will have deep experience...  ...distributed systems and a strong background in performance and scalability. This role involves driving performance characterization... 
    Senior

    NVIDIA Corporation

    Santa Clara, CA
    1 day ago
  • Netflix, Inc. is seeking a Senior Systems Development Engineer to build and maintain high-performance storage systems supporting creative teams globally. The role requires at least 5 years of experience with enterprise storage platforms like Dell PowerScale and NetApp,... 
    Senior

    Netflix, Inc.

    Los Gatos, CA
    1 day ago
  • $181.1k - $318.4k

    Senior Site Reliability Engineer, Storage SRE / Apple Services Engineering Cupertino, California, United States...  ...development of platform-wide tooling, observability, and operational practices that...  ...Identify and eliminate systemic sources of toil and instability across... 
    Senior
    Relocation

    Apple Inc.

    Cupertino, CA
    2 days ago
  • $128.7k - $261.3k

    Senior System Performance Engineer on GM’s AV System Performance Team - responsible for designing, building, and optimizing reliable, high-performance...  ..., and methodologies that support efficient and scalable AV software development. Evaluate and prototype new tools... 
    Senior
    Local area
    Remote work
    Flexible hours

    General Motors

    Sunnyvale, CA
    3 days ago
  • Cohesity, a leader in AI-powered data security, seeks a Senior Engineering Manager in Santa Clara, CA. You will lead teams to design and build large-scale systems while mentoring developers and driving product vision. The ideal candidate has over 12 years of software engineering... 
    Senior
    Full time
    Remote work

    Madrona Venture Labs

    Santa Clara, CA
    1 day ago
  • Java SRE Engineer Onsite San Francisco Bay Area Infrastructure Engineer...  ...migrations and production systems on AWS and Kubernetes...  ...ensure system reliability and scalability Drive architectural...  ...and Kafka Familiarity with observability tools (Prometheus, Grafana,... 

    EITACIES Inc.

    Santa Clara, CA
    14 hours ago
  •  ...We are looking for a Senior Software Engineer to help build NeMo Platform, NVIDIA...  ..., and operating AI systems at scale. This role will focus...  ...infrastructure for observing behavior, measuring progress...  ...understanding of reliability, scalability, security, and performance... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    4 days ago
  • Moveworks is seeking a Machine Learning Engineer in Mountain View, California, to design and optimize scalable ML infrastructure for large language models. This pivotal role requires collaboration with cross-functional teams to enhance AI product scalability. The ideal... 
    Senior

    Moveworks

    Mountain View, CA
    4 days ago
  • $207k - $300k

    Google Inc. is seeking a Software Engineer in Sunnyvale, CA, to develop cutting-edge technologies for serving Large Language Models. This critical role focuses on performance, scalability, and resource efficiency. The ideal candidate will have extensive experience in software... 
    Senior
    Full time

    Google Inc.

    Sunnyvale, CA
    4 days ago
  • $147.4k - $272.1k

     ...powerful, usable features and systems. We work with the latest AI...  ...capabilities Create robust, scalable architectures for systems that...  ...help make complex AI systems observable, understandable and debuggable...  ...with API design, both for other engineers to use, but also for AI... 
    Senior
    Relocation

    Apple Inc.

    Cupertino, CA
    1 day ago
  •  ...company is seeking passionate developers to join their dynamic Engineering teams. In this role, you will design and implement...  ...world-class APIs and contributing to large-scale distributed systems. If you are eager to make a difference and thrive in a collaborative... 
    Senior

    TechDigital Group

    Sunnyvale, CA
    1 day ago
  •  ...obsessed, and results-oriented Senior Product Manager to drive the...  ...services that empower our engineering teams. This role, as part of...  ...make a measurable impact on system reliability and developer...  ..., Platform Engineering, SRE, Observability, or a related technical field... 
    Local area
    Flexible hours

    GEICO

    Palo Alto, CA
    1 day ago
  • $200k - $322k

    We are seeking a self‑motivated senior engineer for the Aerial Omniverse Digital Twin team. This...  ...numbers of emulated devices, across systems of potentially thousands of interconnected...  ...physical fidelity, latency, and system scalability. What we need to see: PhD in high‑... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    1 day ago
  • Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel...  ...are seeking a versatile and experienced engineer to join our SOTA Training Platform team....  ...levels of performance, efficiency, and scalability for AI applications. Responsibilities... 
    Senior
    Internship

    Cerebras

    Sunnyvale, CA
    1 day ago
  • $171k - $231.5k

     ...looking for a creative and enthusiastic Senior Design System Engineer to join our Design Technology group....  ...be expected to architect highly scalable, performant component libraries while...  ...adoption (including testing and observability) to rigorously optimize component rendering... 
    Senior

    Intuit Inc.

    Mountain View, CA
    2 days ago
  •  ...programming skills, and a passion for mentoring team members. This role emphasizes collaboration across teams, and entails designing scalable components while utilizing advanced cloud technologies and containerization tools. Competitive compensation and benefits are... 
    Senior

    NVIDIA Gruppe

    Santa Clara, CA
    3 days ago
