Senior Kubernetes Platform Engineer - AI/ML Infrastructure
$137k - $200.5k
Cisco
The application window is expected to close on: 06/12/2026
The job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.
Senior Kubernetes Platform Engineer - AI/ML Infrastructure - hybrid (2013054)
*** This hybrid role requires some on-site work at the Research Triangle Park, NC; Dallas, TX; or Allen, TX office.
Join our Platform Engineering team to design, build, and operate large-scale, on-prem Kubernetes infrastructure powering next-generation AI/ML platforms, including GPU-enabled environments for both traditional ML and state-of-the-art LLM workloads.
You will be pivotal in defining and evolving a highly scalable Kubernetes platform that serves as the foundation for AI/ML workloads. This role combines deep Kubernetes platform engineering with AI/ML infrastructure enablement, ensuring performance, reliability, and scalability across distributed systems.
You will lead technical direction across Kubernetes control plane operations, cluster lifecycle management, and platform extensibility, while working closely with data scientists, ML engineers, and infrastructure teams to support production AI workloads at scale.
This is a senior individual contributor role focused on platform ownership, engineering excellence, and driving reliability and automation across complex distributed environments.
Your Impact / Core Responsibilities
Architect, build, and operate large-scale on-prem Kubernetes platforms (OpenShift/Anthos), including control plane and etcd lifecycle management
Define and evolve scalable, multi-tenant platform architecture supporting AI/ML and GPU-based workloads
Enable and optimize ML workloads including training, inference, and LLM deployment pipelines on Kubernetes
Build platform extensions using Kubernetes controllers, operators, CRDs, and Golang-based services
Implement Infrastructure as Code and automation to improve scalability, consistency, and operational efficiency
Drive AIOps capabilities using telemetry, automation, anomaly detection, and self-healing systems for platform reliability
Improve observability (metrics, logs, traces) and optimize resource utilization, scheduling, and cluster performance
Partner with ML engineers and data scientists to operationalize ML workflows and ensure platform readiness for AI workloads
Participate in on-call rotations, owning incident response, reliability, and continuous operational improvement
Mentor engineers and contribute to defining platform engineering standards and best practices
Minimum Qualifications
8+ years of software engineering experience
4+ years of hands-on Kubernetes production experience with control plane ownership
Strong experience operating on-prem or self-managed Kubernetes environments
Deep expertise in etcd management (backup, restore, recovery, upgrades)
Strong proficiency in Go with experience building Kubernetes controllers, operators, CRDs, and webhooks
Deep understanding of Kubernetes internals (API server, scheduler, controller loops, reconciliation)
Experience supporting AI/ML or GPU-based workloads on Kubernetes platforms
Proven experience operating and debugging large-scale distributed systems
Experience participating in on-call rotations and production incident management
Preferred Qualifications
Experience with bare-metal or enterprise on-prem infrastructure at scale
Exposure to AI/ML platforms and tooling (e.g., Kubeflow, MLflow, distributed training systems)
Experience building internal developer platforms or platform-as-a-service (PaaS) systems
Familiarity with AIOps concepts such as automated remediation and predictive operations
Experience applying data-driven or ML-based techniques for system reliability or capacity planning
Contributions to Kubernetes, CNCF, or other open-source ecosystems
Why Cisco?
At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era - and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.
Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.
We are Cisco, and our power starts with you.
Message to applicants applying to work in the U.S. and/or Canada:
The starting salary range posted for this position is $137,000.00 to $200,500.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits.
Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process.
U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time.
U.S. employees are eligible for paid time away as described below, subject to Cisco's policies:
10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees
1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco
Non-exempt employees** receive 16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees
Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations)
80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next
Additional paid time away may be requested to deal with critical or emergency issues for family members
Optional 10 paid days per full calendar year to volunteer
For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies.
Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows:
0.75% of incentive target for each 1% of revenue attainment up to 50% of quota;
1.5% of incentive target for each 1% of attainment between 50% and 75%;
1% of incentive target for each 1% of attainment between 75% and 100%; and
Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation.
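As a rough illustration of how the quota-based tiers above add up, the following sketch computes the payout as a percentage of incentive target for a given attainment. The function name and structure are hypothetical and not part of any Cisco plan document. Above 100% attainment the posting only says rates are "at or above 1%", so the sketch assumes exactly 1% there, which gives a lower bound rather than an actual figure.

```python
def quota_payout_pct(attainment_pct: float) -> float:
    """Payout as a % of incentive target for a given % of quota attainment.

    Tiers follow the list in the posting. Above 100% the rate is assumed
    to be exactly 1% per 1% of attainment (the posting says "at or above
    1%"), so results past 100% are a lower bound, not a quoted figure.
    """
    if attainment_pct < 0:
        raise ValueError("attainment must be non-negative")

    # (upper bound of tier in % attainment, payout rate per 1% attainment)
    tiers = [(50.0, 0.75), (75.0, 1.5), (100.0, 1.0)]

    payout = 0.0
    lower = 0.0
    for upper, rate in tiers:
        if attainment_pct <= lower:
            break
        # Only the part of the attainment that falls inside this tier counts.
        payout += (min(attainment_pct, upper) - lower) * rate
        lower = upper

    if attainment_pct > 100.0:
        # Assumed minimum rate of 1% per 1% beyond quota, with no cap.
        payout += (attainment_pct - 100.0) * 1.0

    return payout


if __name__ == "__main__":
    for pct in (50, 75, 100):
        print(f"{pct}% attainment -> {quota_payout_pct(pct)}% of target")
```

Under these tiers, 50% attainment yields 37.5% of target, 75% yields 75%, and 100% yields the full 100% of target.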
For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid.
The applicable full salary ranges for this position, by specific state, are listed below:
New York City Metro Area:
$165,000.00 - $277,600.00
Non-Metro New York state & Washington state:
$146,700.00 - $247,000.00
* For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined.
** Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements.
Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis.
Cisco will consider for employment, on a case by case basis, qualified applicants with arrest and conviction records.



