Senior DevOps Engineer, AIOPs

$148k - $235.75k

NVIDIA

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

Join our team of innovative engineers who are building an AI Data Center AIOps platform that turns raw, high-volume telemetry into reliable, job-centric insights and automation for GPU fleets. We’re hiring a DevOps Engineer to operate the platform itself (not the compute cluster): uptime, performance, data integrity, and safe change management. You’ll own SLOs/SLIs, incident response, and postmortems for the telemetry ingestion, processing, storage, and APIs/dashboards that operators depend on. You’ll partner with the Software Engineering and Systems Engineering teams to translate platform signals into actionable, trustworthy alerts and automation.

What you'll be doing:

  • Continuously monitor platform health via dashboards/logs/metrics, automate recurring checks, and keep reliability + resource efficiency on track.

  • Own Kubernetes deployments end-to-end (runbooks, canary checks, post-deploy validation), and lead rollbacks/remediations when needed.

  • Lead first-level incident triage: collect diagnostics, identify likely root causes, and hand off clear, actionable findings to engineering.

  • Build and maintain runbooks/SOPs/checklists, pushing continuous improvement through automation.

  • Manage deployment infrastructure and packaging (Helm + Terraform/IaC) to keep environments scalable, consistent, and reproducible.

  • Contribute in adjacent functional areas to grow and help your team members!

What we need to see:

  • BS/MS in CS/CE (or equivalent experience) and 5+ years operating production distributed systems as SRE/DevOps/Platform Ops.

  • Proven ownership of reliability for an observability/AIOps platform: SLOs/SLIs, on-call, addressing incidents, and follow-up evaluations that drive measurable improvements.

  • Deep Kubernetes + containers experience (deploying, debugging, scaling) for telemetry-heavy microservices—ingestion, processing, storage, APIs, and UI.

  • Automation-first approach: solid scripting (Python/Bash), CI/CD, and infrastructure-as-code (Terraform + Helm) to deliver safe rollouts (canaries/rollbacks), reproducible environments, and minimal toil.

  • Clear communicator who writes excellent runbooks/docs and can translate ambiguous requirements into concrete operational practices and dependable customer-facing reliability.

Ways to stand out from the crowd:

  • Strong Linux + networking fundamentals, distributed systems instincts, and hands-on ops for Kubernetes/services/streaming stacks are ideal; bonus for experience with observability platforms at scale.

  • Experience building safe automation that operators trust: canary releases, automated rollback criteria, “monitoring for the monitoring” (lag/drop/error budgets), and replay/backfill pipelines with correctness checks.

  • Strong in distributed/streaming systems operations (Kafka/Pulsar, Flink/Spark, ClickHouse/Elastic/TSDBs, object storage)—and can reason about backpressure, hotspots, and failure domains end-to-end.

  • Proven programming experience building automation tools or services — ideally in Python, or similar languages — to simplify operations and scale recurring processes.

  • Proven experience running large‑scale production deployments and multiple Kubernetes environments or clusters across teams or customers, coordinating changes and rollouts with minimal disruption. Hands‑on experience with observability tools: you know your way around dashboards, metrics, logs, and traces using platforms like Prometheus, Grafana, or similar.

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us and, due to unprecedented growth, our exclusive engineering teams are rapidly growing. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 176,000 USD - 276,000 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 16, 2026.

This posting is for an existing vacancy. 

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Vacancy posted 2 days ago