Senior DevOps Engineer, AIOPs
$148k - $235.75k
NVIDIA
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.
Join our team of innovative engineers who are building an AI Data Center AIOps platform that turns raw, high-volume telemetry into reliable, job-centric insights and automation for GPU fleets. We’re hiring a DevOps Engineer to operate the platform itself (not the compute cluster): uptime, performance, data integrity, and safe change management. You’ll own SLOs/SLIs, incident response, and postmortems for the telemetry ingestion, processing, storage, and APIs/dashboards that operators depend on. You’ll partner with the Software Engineering and Systems Engineering teams to translate platform signals into actionable, trustworthy alerts and automation.
What you'll be doing:
Continuously monitor platform health via dashboards/logs/metrics, automate recurring checks, and keep reliability + resource efficiency on track.
Own Kubernetes deployments end-to-end (runbooks, canary checks, post-deploy validation), and lead rollbacks/remediations when needed.
Lead first-level incident triage: collect diagnostics, identify likely root causes, and hand off clear, actionable findings to engineering.
Build and maintain runbooks/SOPs/checklists, pushing continuous improvement through automation.
Manage deployment infrastructure and packaging (Helm + Terraform/IaC) to keep environments scalable, consistent, and reproducible.
Contribute in adjacent functional areas to grow and help your team members!
What we need to see:
BS/MS in CS/CE (or equivalent experience) and 5+ years operating production distributed systems as SRE/DevOps/Platform Ops.
Proven ownership of reliability for an observability/AIOps platform: SLOs/SLIs, on-call, addressing incidents, and follow-up evaluations that drive measurable improvements.
Deep Kubernetes + containers experience (deploying, debugging, scaling) for telemetry-heavy microservices—ingestion, processing, storage, APIs, and UI.
Automation-first approach: solid scripting (Python/Bash), CI/CD, and infrastructure-as-code (Terraform + Helm) to deliver safe rollouts (canaries/rollbacks), reproducible environments, and minimal toil.
Clear communicator who writes excellent runbooks/docs and can translate ambiguous requirements into concrete operational practices and dependable customer-facing reliability.
Ways to stand out from the crowd:
Strong Linux + networking fundamentals, distributed systems instincts, and hands-on ops for Kubernetes/services/streaming stacks are ideal; bonus for experience with observability platforms at scale.
Experience building safe automation that operators trust: canary releases, automated rollback criteria, “monitoring for the monitoring” (lag/drop/error budgets), and replay/backfill pipelines with correctness checks.
Strong in distributed/streaming systems operations (Kafka/Pulsar, Flink/Spark, ClickHouse/Elastic/TSDBs, object storage)—and can reason about backpressure, hotspots, and failure domains end-to-end.
Proven programming experience building automation tools or services — ideally in Python, or similar languages — to simplify operations and scale recurring processes.
Proven experience running large‑scale production deployments and multiple Kubernetes environments or clusters across teams or customers, coordinating changes and rollouts with minimal disruption.
Hands‑on experience with observability tools: you know your way around dashboards, metrics, logs, and traces using platforms like Prometheus, Grafana, or similar.
With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 176,000 USD - 276,000 USD for Level 4. You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until May 16, 2026. This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

