Principal Cloud and Production Operations Engineer

Incedo Inc.

Position: Principal Cloud Engineer/Architect

Location: San Jose, CA (Hybrid)

Type: Full-Time/W2

Company Overview

Incedo is a US-based consulting, data science, and technology services firm with over 4,000 professionals across the US, Mexico, and India. We help clients achieve competitive advantage through end‑to‑end digital transformation. Our strength lies in combining engineering, data science, and design capabilities with deep domain expertise. We support clients across telecom, banking, wealth management, product engineering, and life sciences & healthcare.

Job Description:

The Principal Cloud and Production Operations Engineer serves as the senior technical authority responsible for architecting, automating, and optimizing hybrid and cloud-native production environments that power critical customer-facing services and enterprise applications.

This role combines deep cloud infrastructure expertise with strong production reliability and operational engineering skills. The Principal Engineer acts as both architect and hands-on builder, ensuring scalability, resilience, and security across multi-cloud and on-prem environments.

Reporting to the Associate Director of IT and Infrastructure, this position will collaborate closely with Engineering, DevOps, Security, and IT Operations to drive a culture of automation, observability, and continuous improvement across the production ecosystem.

Key Responsibilities:

Cloud Architecture and Engineering

• Design, implement, and maintain cloud and hybrid infrastructure supporting production workloads, enterprise systems, and CI/CD pipelines

• Lead the adoption of infrastructure-as-code (IaC) using Terraform, CloudFormation, or similar tools to enable repeatable, auditable, and secure deployments

• Architect scalable and fault-tolerant solutions across OCI, AWS, Azure, and on-prem data centers, ensuring high availability and cost efficiency

• Evaluate emerging cloud services and technologies for applicability to business needs and long-term scalability goals

Production Operations and Reliability

• Serve as the technical lead for production operations, ensuring uptime, performance, and reliability of customer-facing and internal systems

• Develop and maintain observability frameworks leveraging metrics, logs, and traces to ensure proactive detection and rapid response

• Partner with engineering teams to implement SRE-inspired practices, including service level objectives (SLOs), error budgets, and post-incident reviews

• Drive root cause analysis, performance tuning, and continuous improvement of production services

Automation and CI/CD Enablement

• Collaborate with DevOps and application engineering teams to build and optimize automated deployment pipelines supporting frequent, low-risk releases

• Integrate security and compliance checks into CI/CD workflows to ensure production readiness and alignment with internal standards

• Design self-healing infrastructure and automated rollback mechanisms to reduce operational risk

• Ensure secure and reliable configuration management and environment orchestration using tools such as Ansible, Chef, or Puppet

Operational Governance and Collaboration

• Establish and enforce operational best practices for monitoring, patching, and change management across production systems

• Lead production readiness reviews for new releases and large-scale changes

• Collaborate with the Security and Compliance teams to ensure systems adhere to policy, hardening standards, and regulatory requirements

• Participate in and occasionally lead on-call rotations for critical production systems, ensuring rapid triage and resolution

Leadership and Mentorship

• Act as a technical mentor to cloud and infrastructure engineers, fostering a culture of knowledge sharing and engineering excellence

• Lead architectural reviews, design sessions, and capacity planning discussions

• Serve as a trusted advisor to management on cloud modernization, resilience engineering, and cost optimization strategies

Qualifications:

• Bachelor’s degree in Computer Science, Information Systems, or related field; Master’s preferred

• 10+ years of experience in cloud and infrastructure engineering, including 3+ years in a senior or principal role

• Expertise with OCI (preferred), AWS and/or Azure cloud services, including networking, compute, storage, and identity management

• Proven experience managing production-scale environments supporting mission-critical applications and services

• Strong proficiency in:

  - Infrastructure-as-code (Terraform, CloudFormation)

  - CI/CD and DevOps toolchains (Jenkins, GitLab, ArgoCD)

  - Container orchestration (Kubernetes, Docker)

  - Monitoring and observability platforms (Prometheus, Grafana, Datadog, ELK)

  - Scripting and automation (Python, Bash, PowerShell)

• Solid understanding of security, compliance, and networking principles in hybrid environments

• Exceptional analytical, problem-solving, and incident management skills

• Demonstrated ability to lead complex, cross-functional initiatives from concept to execution

Preferred Experience:

• Experience in high-availability SaaS or networking environments

• Knowledge of FinOps, cost optimization, and multi-cloud governance frameworks

• Familiarity with Zero Trust, identity federation, and cloud access security models

• Exposure to AI/ML infrastructure or data-driven pipelines is a plus
Vacancy posted 8 hours ago