DevOps Manager: AWS, Kubernetes, SRE Lead
AppZen
Requirements
- 8+ years of experience in DevOps, SRE, infrastructure, or platform engineering, with at least 2 years leading or managing engineers (formal or tech-lead capacity)
- Deep, hands-on AWS experience across compute, networking, IAM, data, and observability services; comfortable designing for multi-account, multi-region SaaS
- Strong production experience with Kubernetes (preferably EKS), including upgrades, autoscaling, and securing multi-tenant clusters
- Demonstrated hands-on operations experience with PostgreSQL at scale (query and index tuning, replication, HA/failover, backups, and version upgrades) and with Elasticsearch / OpenSearch (cluster sizing, shard strategy, ingest tuning, and incident response)
- Working knowledge of additional datastores commonly used in SaaS: Redis, Kafka or other message brokers, and object storage; comfortable evaluating tradeoffs between managed services (RDS, Aurora, ElastiCache, MSK, OpenSearch Service) and self-managed options
- Proficient with Terraform and modern IaC patterns; clear opinions on module design, state management, and PR-driven workflows
- Solid scripting and automation skills in at least one of Python, Go, or Bash
- Track record of designing and operating CI/CD pipelines at scale (GitHub Actions, Jenkins, ArgoCD, or similar)
- Experience running production workloads under SOC 2 or comparable compliance frameworks; comfortable partnering with Security on audits and remediation
- Excellent communication and stakeholder-management skills; able to translate infrastructure tradeoffs into language that product, finance, and customer teams understand
- (Desirable) Experience supporting AI/ML or data-heavy SaaS workloads (GPU fleets, vector stores, large async pipelines)
- (Desirable) Familiarity with service mesh (Istio, Linkerd) and progressive delivery (Argo Rollouts, feature flags)
- (Desirable) Background scaling FinOps practices and managing cloud spend at $5M+ annual run-rate
- (Desirable) Experience operating multi-tenant SaaS with strict data isolation requirements for enterprise finance customers
- (Desirable) Exposure to multi-cloud or hybrid-cloud environments (Azure, GCP)
- You'll set technical direction, coach engineers, partner closely with Product Engineering and Security, and stay close enough to the work to tune a slow Postgres query, debug an Elasticsearch cluster under load, write Terraform, or review a Helm chart yourself
- This is a builder-manager role. We expect roughly 60% leadership and delivery management, and 40% hands-on technical contribution
- Manage, coach, and grow a team of 3-6 DevOps and platform engineers; own hiring, performance, growth plans, and 1:1s
- Set quarterly priorities aligned to engineering and business goals; communicate progress and risk clearly to leadership
- Build a healthy on-call culture: balanced rotations, blameless postmortems, and continuous reduction of toil
- Own the architecture, cost, and reliability of AppZen's AWS footprint across multiple regions and accounts
- Drive infrastructure-as-code standards using Terraform; champion modular, reviewable, version-controlled infrastructure
- Partner with Security and Compliance on SOC 2, ISO 27001, GDPR, and customer audit requirements; harden IAM, network, and secrets management
- Manage cloud spend: visibility, forecasting, and ongoing optimization (Savings Plans, rightsizing, multi-tenant efficiency)
- Hands-on ownership of PostgreSQL in production: schema reviews, index and query tuning, vacuum/bloat management, replication, failover, point-in-time recovery, and major-version upgrades (RDS / Aurora)
- Run and scale Elasticsearch / OpenSearch clusters: shard and index design, JVM and heap tuning, snapshot strategy, hot-warm tiers, and incident response under heavy ingest or query load
- Operate supporting datastores such as Redis (caching, queues), Kafka or SQS/SNS (streaming and async), and S3-backed data lakes; define patterns for high availability, durability, and disaster recovery
- Partner with engineering on capacity planning, performance benchmarking, data tier cost optimization, backup/restore drills, and customer data isolation for multi-tenant workloads
- Operate and improve our EKS-based Kubernetes platform: cluster lifecycle, autoscaling, multi-tenancy, and workload isolation
- Define golden paths for service teams using Helm, Kustomize, and GitOps tooling such as ArgoCD or Flux
- Set patterns for service mesh, ingress, and zero-downtime deployments
- Lead the design of internal developer platform capabilities so product teams can ship safely and quickly without infra friction
- Maintain and improve build, test, and deploy pipelines (e.g., GitHub Actions, Jenkins, ArgoCD); enforce supply-chain security and artifact provenance
- Drive measurable improvements in DORA metrics: lead time, deploy frequency, change failure rate, and MTTR
- Own the observability stack (e.g., Datadog, Prometheus, Grafana, OpenTelemetry); ensure consistent metrics, logs, and traces across services
- Define and operationalize SLOs and error budgets in partnership with service owners
- Lead incident command for high-severity events and convert learnings into durable systemic fixes
Vacancy posted 21 hours ago


