Principal Cloud and Production Operations Engineer
Incedo Inc.
Position: Principal Cloud Engineer/Architect
Location: San Jose, CA (Hybrid)
Type: Full-Time/W2
Company Overview
Incedo is a US-based consulting, data science, and technology services firm with over 4,000 professionals across the US, Mexico, and India. We help clients achieve competitive advantage through end‑to‑end digital transformation. Our strength lies in combining engineering, data science, and design capabilities with deep domain expertise. We support clients across telecom, banking, wealth management, product engineering, and life sciences & healthcare.
Job Description:
The Principal Cloud and Production Operations Engineer serves as the senior technical authority responsible for architecting, automating, and optimizing hybrid and cloud-native production environments that power critical customer-facing services and enterprise applications.
This role combines deep cloud infrastructure expertise with strong production reliability and operational engineering skills. The Principal Engineer acts as both architect and hands-on builder, ensuring scalability, resilience, and security across multi-cloud and on-prem environments.
Reporting to the Associate Director of IT and Infrastructure, this position will collaborate closely with Engineering, DevOps, Security, and IT Operations to drive a culture of automation, observability, and continuous improvement across the production ecosystem.
Key Responsibilities:
Cloud Architecture and Engineering
•Design, implement, and maintain cloud and hybrid infrastructure supporting production workloads, enterprise systems, and CI/CD pipelines
•Lead the adoption of infrastructure-as-code (IaC) using Terraform, CloudFormation, or similar tools to enable repeatable, auditable, and secure deployments
•Architect scalable and fault-tolerant solutions across OCI, AWS, Azure, and on-prem data centers, ensuring high availability and cost efficiency
•Evaluate emerging cloud services and technologies for applicability to business needs and long-term scalability goals
Production Operations and Reliability
•Serve as the technical lead for production operations, ensuring uptime, performance, and reliability of customer-facing and internal systems
•Develop and maintain observability frameworks leveraging metrics, logs, and traces to enable proactive detection of, and rapid response to, production issues
•Partner with engineering teams to implement SRE-inspired practices, including service level objectives (SLOs), error budgets, and post-incident reviews
•Drive root cause analysis, performance tuning, and continuous improvement of production services
Automation and CI/CD Enablement
•Collaborate with DevOps and application engineering teams to build and optimize automated deployment pipelines supporting frequent, low-risk releases
•Integrate security and compliance checks into CI/CD workflows to ensure production readiness and alignment with internal standards
•Design self-healing infrastructure and automated rollback mechanisms to reduce operational risk
•Ensure secure and reliable configuration management and environment orchestration using tools such as Ansible, Chef, or Puppet
Operational Governance and Collaboration
•Establish and enforce operational best practices for monitoring, patching, and change management across production systems
•Lead production readiness reviews for new releases and large-scale changes
•Collaborate with the Security and Compliance teams to ensure systems adhere to policy, hardening standards, and regulatory requirements
•Participate in and occasionally lead on-call rotations for critical production systems, ensuring rapid triage and resolution
Leadership and Mentorship
•Act as a technical mentor to cloud and infrastructure engineers, fostering a culture of knowledge sharing and engineering excellence
•Lead architectural reviews, design sessions, and capacity planning discussions
•Serve as a trusted advisor to management on cloud modernization, resilience engineering, and cost optimization strategies
Qualifications:
•Bachelor’s degree in Computer Science, Information Systems, or related field; Master’s preferred
•10+ years of experience in cloud and infrastructure engineering, including 3+ years in a senior or principal role
•Expertise with OCI (preferred), AWS, and/or Azure cloud services, including networking, compute, storage, and identity management
•Proven experience managing production-scale environments supporting mission-critical applications and services
•Strong proficiency in:
-Infrastructure-as-code (Terraform, CloudFormation)
-CI/CD and DevOps toolchains (Jenkins, GitLab, ArgoCD)
-Containerization and orchestration (Docker, Kubernetes)
-Monitoring and observability platforms (Prometheus, Grafana, Datadog, ELK)
-Scripting and automation (Python, Bash, PowerShell)
•Solid understanding of security, compliance, and networking principles in hybrid environments
•Exceptional analytical, problem-solving, and incident management skills
•Demonstrated ability to lead complex, cross-functional initiatives from concept to execution
Preferred Experience:
•Experience in high-availability SaaS or networking environments
•Knowledge of FinOps, cost optimization, and multi-cloud governance frameworks
•Familiarity with Zero Trust, identity federation, and cloud access security models
•Exposure to AI/ML infrastructure or data-driven pipelines is a plus