Senior SRE: Scalable Systems & Observability Engineer
NVIDIA Corporation
NVIDIA Corporation is looking for a Senior Systems Software Engineer (SRE) in Santa Clara, California. This role focuses on designing, building, and maintaining large-scale production systems using various engineering practices. The ideal candidate should have extensive experience in infrastructure automation and distributed systems. Key responsibilities include ensuring GPU cloud services run with maximum reliability, participating in service lifecycles, and leveraging automation for efficiency. Join NVIDIA to work in a diverse environment promoting collaboration and continuous learning.
$126k - $204.5k
Palo Alto Networks, Inc. is seeking a skilled DevOps/SRE engineer to join their Cortex team in Santa Clara, California. This role involves... ...large-scale GCP environments and requires expertise in observability tools such as Thanos, Prometheus, and Grafana. The ideal candidate... Senior
$176k - $333.5k
NVIDIA Corporation in Santa Clara is seeking a Site Reliability Engineer (SRE) to design and maintain large-scale production systems focusing on reliability and observability. Candidates should have a BS in Computer Science or related field and 8+ years' experience in... Senior
- A leading technology company is seeking a Senior System Software Engineer for Cloud in Santa Clara, CA. This role involves designing and deploying scalable cloud-based solutions for a cloud gaming service. The ideal candidate will have extensive experience with programming... Senior
$224k - $431.25k
NVIDIA Gruppe is seeking a Senior System Software Engineer for Cloud in Santa Clara, California. The role involves designing and building scalable cloud solutions for GeForce NOW. Candidates should have extensive experience with Java, Golang, and Kubernetes, along with... Senior
- Donato Technologies is seeking a Senior SRE / DevOps Engineer in Sunnyvale, CA. The successful candidate will focus on ensuring system reliability and scalability while automating operations across all teams. Candidates should have over 8 years of experience in DevOps,... Senior
- Title: Sr. SRE / DevOps Engineer Location: Sunnyvale, CA Job Summary - For this role,... ...vital role in ensuring that the systems are reliable, scalable, and high performing. Responsibilities... ...knowledge of monitoring and observability tools: Apache Splunk. Knowledge... Senior, Local area
$176k - $276k
Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline to design, build and maintain large scale production systems with high efficiency and availability using the combination... ...aspects of large scale Observability & Telemetry collection platform with... Senior
$224k - $356.5k
At NVIDIA, our Financial Systems Engineering team is at the heart of ensuring that our massive... ...Design: Design, deploy, and maintain scalable software services that ensure transactional... ...including Kubernetes, Docker, CI/CD, observability, and reliability engineering. Your... Senior
$152k - $241.5k
NVIDIA Gruppe in Santa Clara is seeking a Senior Software Engineer to enhance their HPC infrastructure. The role involves applying distributed systems patterns, automation, and building scalable services in a hybrid multi-cloud environment. Candidates should have strong... Senior
- A leading technology company is looking for a Java SRE Engineer to support large-scale cloud migrations and production systems on AWS and Kubernetes. You will lead migrations, design robust AWS EKS platforms, and implement deployment strategies. The ideal candidate has... Senior
- Nuro, based in Mountain View, is seeking senior engineers to build and scale its large-scale computing infrastructure. The role involves designing scalable frameworks and collaborating closely with teams to develop tools and APIs for business-critical applications. The... Senior
- ...our team of innovative engineers who are building this... ...Software Engineering and Systems Engineering team to... ...to keep environments scalable, consistent, and reproducible... ...systems as SRE/DevOps/Platform Ops. Proven... ...of reliability for an observability/AIOps platform: SLOs/SLIs... Senior
$200k - $322k
Senior Manager, Site Reliability Engineering... ...to build AI-powered systems that enhance reliability... ...model using observability, AI insights, and orchestration... ...execution with an SRE attitude,... ...introduce innovative, scalable approaches. A... Senior
$208k - $333.5k
Systems Engineering is an engineering discipline focused on building, automating, and operating... ...containerized platforms, storage, telemetry, and observability. Systems engineers are highly... ...A core part of this work is an SRE mindset: eliminating manual toil through... Senior, Flexible hours
$184k - $356.5k
NVIDIA Corporation is seeking a Senior Systems Software Engineer based in Santa Clara, California. The ideal candidate will have deep experience... ...distributed systems and a strong background in performance and scalability. This role involves driving performance characterization... Senior
- Netflix, Inc. is seeking a Senior Systems Development Engineer to build and maintain high-performance storage systems supporting creative teams globally. The role requires at least 5 years of experience with enterprise storage platforms like Dell PowerScale and NetApp,... Senior
$181.1k - $318.4k
Senior Site Reliability Engineer, Storage SRE / Apple Services Engineering Cupertino, California, United States... ...development of platform-wide tooling, observability, and operational practices that... ...Identify and eliminate systemic sources of toil and instability across... Senior, Relocation
$128.7k - $261.3k
Senior System Performance Engineer on GM’s AV System Performance Team - responsible for designing, building, and optimizing reliable, high-performance... ..., and methodologies that support efficient and scalable AV software development. Evaluate and prototype new tools... Senior, Local area, Remote work, Flexible hours
- Cohesity, a leader in AI-powered data security, seeks a Senior Engineering Manager in Santa Clara, CA. You will lead teams to design and build large-scale systems while mentoring developers and driving product vision. The ideal candidate has over 12 years of software engineering... Senior, Full time, Remote work
- Java SRE Engineer Onsite San Francisco Bay Area Infrastructure Engineer... ...migrations and production systems on AWS and Kubernetes... ...ensure system reliability and scalability Drive architectural... ...and Kafka Familiarity with observability tools (Prometheus, Grafana,...
- ...We are looking for a Senior Software Engineer to help build NeMo Platform, NVIDIA... ..., and operating AI systems at scale. This role will focus... ...infrastructure for observing behavior, measuring progress... ...understanding of reliability, scalability, security, and performance... Senior
- Moveworks is seeking a Machine Learning Engineer in Mountain View, California, to design and optimize scalable ML infrastructure for large language models. This pivotal role requires collaboration with cross-functional teams to enhance AI product scalability. The ideal... Senior
$207k - $300k
Google Inc. is seeking a Software Engineer in Sunnyvale, CA, to develop cutting-edge technologies for serving Large Language Models. This critical role focuses on performance, scalability, and resource efficiency. The ideal candidate will have extensive experience in software... Senior, Full time
$147.4k - $272.1k
...powerful, usable features and systems. We work with the latest AI... ...capabilities Create robust, scalable architectures for systems that... ...help make complex AI systems observable, understandable and debuggable... ...with API design, both for other engineers to use, but also for AI... Senior, Relocation
- ...company is seeking passionate developers to join their dynamic Engineering teams. In this role, you will design and implement... ...world-class APIs and contributing to large-scale distributed systems. If you are eager to make a difference and thrive in a collaborative... Senior
- ...obsessed, and results-oriented Senior Product Manager to drive the... ...services that empower our engineering teams. This role, as part of... ...make a measurable impact on system reliability and developer... ..., Platform Engineering, SRE, Observability, or a related technical field... Local area, Flexible hours
$200k - $322k
We are seeking a self‑motivated senior engineer for the Aerial Omniverse Digital Twin team. This... ...numbers of emulated devices, across systems of potentially thousands of interconnected... ...physical fidelity, latency, and system scalability. What we need to see: PhD in high‑... Senior
- Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel... ...are seeking a versatile and experienced engineer to join our SOTA Training Platform team.... ...levels of performance, efficiency, and scalability for AI applications. Responsibilities... Senior, Internship
$171k - $231.5k
...looking for a creative and enthusiastic Senior Design System Engineer to join our Design Technology group.... ...be expected to architect highly scalable, performant component libraries while... ...adoption (including testing and observability) to rigorously optimize component rendering... Senior
- ...programming skills, and a passion for mentoring team members. This role emphasizes collaboration across teams, and entails designing scalable components while utilizing advanced cloud technologies and containerization tools. Competitive compensation and benefits are... Senior

