Senior Site Reliability Engineer / DevOps Engineer

Full-time

Prophet Town

Location: Onsite - Mountain View, CA

Experience Required: 5+ years

Infrastructure Footprint: Global production infrastructure across AWS regions and physical data centers in South America and Europe

Role Type: Hands-on engineering role

Role Overview

We are seeking a Senior Site Reliability Engineer / DevOps Engineer to design, scale, and operate highly available global infrastructure supporting production systems across multiple international regions.

This role is for an engineer with 5+ years of experience building and running production-grade cloud infrastructure. The right person understands where distributed systems fail and has learned the hard lessons that come from operating Kubernetes and cloud platforms at scale.

The ideal candidate has deep hands-on experience with Kubernetes, ArgoCD, Terraform, CI/CD pipelines, AWS infrastructure, and multi-region platform reliability. They should understand the limitations, sharp edges, and operational failure modes of these tools.

This is an onsite role working closely with platform engineering and leadership to build resilient global infrastructure.

What You’ll Do

Global Infrastructure Architecture

  • Design and operate globally distributed production infrastructure across AWS regions and physical data center environments in South America and Europe
  • Build highly available multi-region systems with strong disaster recovery and failover strategies
  • Solve cross-region networking, latency, DNS routing, replication, and reliability challenges

Kubernetes Platform Engineering

  • Build, scale, secure, and troubleshoot production Kubernetes clusters
  • Handle cluster lifecycle management, upgrades, node failures, networking issues, storage problems, and control-plane troubleshooting
  • Tune workloads for resiliency, scheduling efficiency, autoscaling behavior, and resource optimization
  • Debug real-world Kubernetes issues, including:
    • etcd instability
    • networking overlays and CNI failures
    • ingress/controller edge cases
    • persistent volume failures
    • node pressure and eviction behavior
    • cluster upgrade regressions

GitOps / ArgoCD Operations

  • Design and maintain GitOps workflows using ArgoCD
  • Manage promotion pipelines across environments and regions
  • Resolve drift detection issues, sync conflicts, reconciliation failures, and deployment ordering challenges
  • Build safe rollback and progressive deployment strategies

Candidates should know why ArgoCD breaks, not just how to click “Sync.”

Infrastructure as Code

  • Build and maintain reusable Terraform modules for multi-region infrastructure
  • Manage state strategy, workspace isolation, secrets handling, and provider complexity
  • Solve real-world Terraform pain points, including:
    • state corruption and locking conflicts
    • module version drift
    • provider upgrade regressions
    • dependency graph surprises
    • cross-account provisioning complexity

CI/CD Engineering

  • Build and optimize production CI/CD pipelines
  • Improve deployment speed, safety, and repeatability
  • Troubleshoot flaky pipelines, artifact inconsistencies, race conditions, environment drift, and rollback failures

Reliability & Observability

  • Establish SLIs/SLOs and production health standards
  • Build alerting, monitoring, tracing, and incident response workflows
  • Lead root cause analysis and postmortem improvements
  • Reduce operational toil through automation

Why This Role

You’ll own foundational infrastructure decisions for globally distributed systems and help build resilient platform capabilities at international scale.

This is a hands-on engineering role for someone who wants meaningful ownership and complex technical problems.

Requirements

Required Experience

  • 5+ years in Site Reliability Engineering, DevOps, or Platform Engineering
  • Deep production experience with:
    • Kubernetes
    • ArgoCD
    • Terraform
    • AWS
    • CI/CD systems
    • Linux systems administration
    • Infrastructure automation

Preferred Experience

  • Experience operating infrastructure across multiple continents
  • Experience with hybrid cloud or physical data center integration
  • Strong networking knowledge, including BGP, VPNs, routing, DNS, and load balancing
  • Experience with security hardening and compliance in production systems
  • Software engineering background with Go, Python, or Bash

What “Senior” Means Here

You have enough production experience to have strong opinions because you have seen failures firsthand.

You know:

  • why Terraform plans sometimes lie
  • why ArgoCD syncs can fail for non-obvious reasons
  • why Kubernetes upgrades can ruin your week
  • why “works in staging” means very little
  • why multi-region failover diagrams often fail in production
  • why observability usually breaks exactly when needed most

You’ve solved these problems repeatedly and improved systems because of those lessons.

Vacancy posted 20 days ago