Lead Cloud Engineering and Production Operations Engineer
Qode
Job Description
About Incedo:
Incedo is a global AI and data transformation specialist empowering companies to realize sustainable business impact from their digital investments by delivering ROI. As a long-term partner for strategy to execution, we operate at the intersection of business and technology. Our integrated services and platforms are built on the foundation of AI & Data, digital engineering, and operations transformation, bringing deep domain expertise and full stack capabilities together. With over 4,000 people in the US, Canada, Latin America and India and a large, diverse portfolio of Fortune 500 enterprises and fast-growing clients worldwide, we work across banking & payments, wealth management, telecom, hi-tech and life sciences.
Please visit the link to learn more about Incedo.
Location: San Jose, CA
Title: Lead Cloud Engineering and Production Operations Engineer
Job Description:
This role acts as a hands-on technical lead, driving cloud engineering initiatives, automating infrastructure, and ensuring high availability and performance across customer-facing systems. The Lead Engineer will collaborate with IT, DevOps, and Software Engineering teams to build secure, scalable environments that support continuous delivery and rapid innovation.
Reporting to the Associate Director of IT and Infrastructure, this position combines deep technical execution with mentoring responsibilities—balancing architectural vision with day-to-day operational excellence.
Key Responsibilities:
Cloud Infrastructure and Engineering
- Design, deploy, and manage hybrid and cloud infrastructures (OCI, AWS, Azure, on-prem) to support production and enterprise systems
- Implement infrastructure-as-code (IaC) using Terraform or CloudFormation to ensure repeatable, secure, and automated deployments
- Develop and maintain CI/CD-ready environments that support rapid build, test, and release cycles for engineering teams
- Partner with network and security teams to implement resilient, compliant architectures
Production Operations and Reliability
- Serve as technical lead for production systems, ensuring stability, performance, and scalability
- Establish monitoring, logging, and alerting frameworks to improve visibility and reduce mean time to detection (MTTD) and resolution (MTTR)
- Participate in incident response, root cause analysis, and reliability improvement efforts
- Collaborate with Engineering and SRE teams to define SLIs, SLOs, and performance metrics for critical services
Automation and CI/CD Enablement
- Develop and enhance deployment pipelines (e.g., Jenkins, GitLab, ArgoCD) to automate software delivery and environment provisioning
- Embed security, compliance, and testing gates into CI/CD workflows
- Implement configuration management and orchestration tools such as Ansible, Chef, or Puppet to manage infrastructure at scale
- Drive efficiency through self-healing systems, auto-scaling, and infrastructure automation
Operational Leadership and Collaboration
- Lead day-to-day production operations activities, mentoring junior engineers on cloud and reliability best practices
- Act as a technical bridge between Infrastructure, Security, and Application Engineering teams
- Contribute to capacity planning, cost optimization, and production readiness reviews
- Maintain documentation, runbooks, and standard operating procedures for production systems
Qualifications:
- Bachelor’s degree in Computer Science, Information Systems, or equivalent experience
- 7+ years of experience in cloud and infrastructure engineering, with at least 2–3 years in a lead or senior engineer capacity
- Deep expertise in OCI (preferred), AWS, or Azure, including networking, compute, storage, IAM, and monitoring
- Proven experience with production-scale operations and hybrid cloud deployments
- Proficiency in:
  - Infrastructure-as-code (Terraform, CloudFormation)
  - CI/CD and DevOps pipelines (Jenkins, GitLab, ArgoCD)
  - Containers and orchestration (Kubernetes, Docker)
  - Observability tools (Datadog, Prometheus, Grafana, ELK)
  - Scripting languages (Python, Bash, PowerShell)
- Strong troubleshooting skills and the ability to lead through high-impact incidents
- Excellent communication and collaboration skills across cross-functional teams
Preferred Experience:
- Experience supporting high-availability SaaS or production environments
- Knowledge of FinOps, cloud governance, and cost optimization practices
- Familiarity with DevSecOps principles, Zero Trust, and automated compliance frameworks
- Exposure to AI/ML pipeline infrastructure or high-throughput data systems
AI Use Guidelines for Interviews: Our interviews are designed to reflect your own skills and thinking. The use of AI or recording tools during live interviews is not permitted unless explicitly invited by the interviewer or approved in advance as part of a reasonable accommodation. If these tools are used inappropriately or in a way that misrepresents your work, your application may not move forward in the process.
Hybrid
Targeted compensation guideline: Compensation will vary based on a number of factors, including market demand for specific skills, role type, job level, and individual qualifications. Final salary offers are determined by considerations including, but not limited to, subject matter expertise, demonstrated skill level, relevant experience, geographic location, education, certifications, and training.