Sr DevOps Engineer
Omni Inclusive
Job Summary
We are seeking a highly capable Senior DevOps Engineer / Platform Engineer to build, operationalize, and scale the infrastructure and deployment foundation for a strategic site-builder / network automation platform. This role will focus on creating reliable CI/CD pipelines, production-grade Kubernetes deployment patterns, managed database services, observability, environment reproducibility, secrets management, and Infrastructure as Code across development, testing, staging, and production environments.
This engineer will play a critical role in moving the platform from an early-stage, partially manual operating model into a repeatable, supportable, and production-ready DevOps model. The environment includes Kubernetes-hosted services, AWS managed services, workflow orchestration with Temporal, integration with Nautobot, Argo-based promotion flows, and the supporting tooling required for debugging, snapshotting, local development, and production support.
This is a hands-on engineering role for someone who can design the right platform patterns, implement them directly, and establish a durable operating model between development and DevOps teams.
Key Responsibilities
Platform Deployment & CI/CD
• Design, implement, and maintain CI/CD pipelines for testing, staging, and production environments.
• Build and maintain deployment workflows that support safe and seamless promotion across environments.
• Improve and maintain Argo-based deployment workflows to enable controlled release progression from test to staging to production.
• Establish baseline deployment mechanisms for the site-builder application and related services.
• Standardize Kubernetes application packaging and deployment patterns, with a strong preference for Helm-based lifecycle management of complex services and third-party components.
• Migrate existing deployments to Helm charts where appropriate.
Kubernetes & Runtime Platform Engineering
• Support the deployment and ongoing operation of services running in Kubernetes.
• Improve runtime reliability, resiliency, and troubleshooting for distributed services operating inside shared Kubernetes clusters.
• Investigate and harden service-to-service connectivity patterns, especially for workflow components such as workers connecting to the Temporal engine.
• Partner with development teams to define production-grade runtime requirements, resource sizing, restart policies, and platform support boundaries.
Infrastructure as Code & Cloud Services
• Design and implement fully declarative Infrastructure as Code for managed cloud services, especially in AWS.
• Provision and maintain managed data services such as RDS/PostgreSQL and MongoDB-compatible document databases across all environments.
• Eliminate manual infrastructure setup where possible and replace it with reproducible, version-controlled deployment patterns.
• Prepare the platform for future scale across multiple environments and regions through repeatable IaC and GitOps-aligned practices.
Data Services, Snapshots & Developer Enablement
• Set up and maintain RDS, MongoDB, Redis/cache services, and related dependencies for all environments.
• Build tooling and operational processes for:
  • production and staging database snapshots,
  • restoring snapshots into development environments,
  • enabling local debugging and development from realistic data states.
• Support creation of local and development environments, including Minikube-based environment-as-code approaches that mirror production behavior as closely as practical.
• Improve platform reproducibility so engineers can quickly stand up close-to-production development environments.
Workflow Orchestration & Temporal Support
• Lead the setup, deployment, and operational support of Temporal for workflow orchestration.
• Support production operations for Temporal, including troubleshooting performance issues, restarts, scaling concerns, and resource shortages.
• Establish maintainable deployment patterns for Temporal using supported packaging and lifecycle management approaches.
• Partner with engineering teams to ensure workflow platform reliability and upgradeability over time.
Observability, Reliability & Incident Readiness
• Design and maintain observability across testing, staging, and production using tools such as Prometheus and Grafana.
• Define and implement monitoring for:
  • service and cluster utilization,
  • CPU, memory, storage,
  • IOPS / throughput metrics,
  • database connections and session counts,
  • cache hit / miss / coverage metrics,
  • RDS and MongoDB utilization,
  • service health and alerting.
• Build and maintain logging, tracing, and correlation capabilities, separated appropriately by environment.
• Create tools to support deep debugging and operational inspection, including raw database reads, cleanup of unused volumes, and emergency cache invalidation.
Security, Access & Secrets Management
• Maintain secrets management processes across environments.
• Build tooling for short-lived internal token generation and long-lived secret rotation.
• Support secure access from deployed services to active production devices and southbound systems.
• Help establish credential management patterns for southbound integrations and device-facing access.
• Partner with related teams to define safe operational limits and controls for service integrations.
External Integrations & Platform Support
• Support integration patterns with Nautobot and help define safe client-side behaviors such as rate limiting, retry/backoff, and service protection mechanisms.
• Partner with application teams to understand and mitigate integration issues such as rate limiting or request rejection.
• Support staging and testing by enabling virtual device environments where needed.
• Contribute to end-to-end acceptance testing and production readiness activities.
Operating Model & Cross-Functional Execution
• Help define an effective operating model between Development and DevOps, whether via RACI, embedded Agile delivery, or a hybrid support model.
• Support deployment readiness, incident management, environment ownership boundaries, and lifecycle responsibilities.
• Work closely with software engineering, infrastructure, application owners, and partner teams to drive production readiness and sustainable operations.
Required Qualifications
• Bachelor's degree in Computer Science, Engineering, Information Systems, or equivalent practical experience.
• 7+ years of experience in DevOps, Platform Engineering, SRE, or Infrastructure Engineering roles.
• Strong hands-on experience with Kubernetes in production environments.
• Strong experience building and maintaining CI/CD pipelines for multi-environment software delivery.
• Strong experience with ArgoCD, GitOps workflows, or equivalent deployment tooling.
• Strong experience with Helm and Kubernetes package/deployment lifecycle management.
• Experience with AWS managed services, especially RDS/PostgreSQL, document databases, and related infrastructure.
• Strong experience with Infrastructure as Code, such as Terraform and/or similar declarative tooling.
• Experience with Prometheus, Grafana, and modern observability practices.
• Experience with Redis/cache services, secrets management, and operational debugging.
• Strong Linux, networking, and distributed systems troubleshooting skills.
• Strong scripting and automation skills in one or more languages such as Python, Bash, or Go.
• Proven ability to work cross-functionally and operate effectively in environments where ownership boundaries are still evolving.
Preferred Qualifications
• Experience with Temporal deployment and production operations.
• Experience supporting developer platforms with local environment reproducibility using Minikube, kind, or similar tools.
• Experience with MongoDB / DocumentDB operations and restore workflows.
• Experience integrating with Nautobot, NetBox, or similar infrastructure source-of-truth platforms.
• Experience operating in shared-cluster environments with multi-team tenancy and constrained access models.
• Experience designing platform patterns for internal products that must scale across regions or multiple deployment footprints.
• Familiarity with network automation or infrastructure orchestration platforms is a plus.
What Success Looks Like
• CI/CD pipelines are reliable, repeatable, and support safe promotion across all environments.
• Kubernetes deployments are standardized, maintainable, and production-ready.
• Managed infrastructure is defined as code rather than through manual setup.
• Temporal, databases, cache layers, and observability tooling are stable and supportable.
• Development teams can reproduce realistic environments locally for faster debugging and delivery.
• Secrets, access patterns, and operational tooling are mature enough to support production-scale operations.
• The DevOps operating model is clearly defined and enables faster deployments with less operational risk.
Scope Notes
In scope
• CI/CD and deployment foundations
• Kubernetes packaging and release management
• RDS, MongoDB, Redis/cache services
• Temporal platform setup and operational support
• Observability, alerting, and debugging tooling
• Secrets management and access enablement
• Infrastructure as Code and environment reproducibility
• DevOps / Development operational model definition
Candidate Profile
The ideal candidate is a builder-operator: someone who can establish engineering discipline where manual patterns currently exist, create durable automation for platform operations, and raise the overall maturity of the product's deployment and runtime ecosystem. This person should be equally comfortable discussing deployment architecture, writing IaC and Helm code, troubleshooting Kubernetes runtime issues, and defining how DevOps and software engineering teams work together over the full product lifecycle.
Vacancy posted 1 day ago


