Staff Software Engineer - Managed Kubernetes

AI Chopping Block, Inc.

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence. One person, one GPU. If you'd like to build the world's best AI cloud, join us.

*Note: This position requires presence in our San Francisco, San Jose, or Bellevue office location 4 days per week; Lambda's designated work from home day is currently Tuesday.

About the Role

Lambda is building the AI Cloud of the future. We are seeking a Staff Engineer to help drive the development of our Managed Kubernetes platform. Think GKE, but purpose-built for AI workloads and running on bare metal. This is a foundational technical leadership role where you will shape the infrastructure that powers the next generation of AI training and inference at scale.

As a Staff Engineer on our Orchestration team, you will collaborate to help drive the technical vision for Lambda's managed orchestration services, including Managed Kubernetes, Managed Slurm on Kubernetes, and higher-level platform services for inference and AIOps. You'll work at the intersection of distributed systems, GPU-accelerated computing, and Cloud Native infrastructure to build systems that are reliable, performant, and elegantly simple for our customers.

This is not a role for someone who just operates Kubernetes; it is a technical leadership role for an engineer who has synthesized the core domains of infrastructure (compute, network, storage, security) and can design holistic solutions across all of them. You'll be working closely with NVIDIA's open-source ecosystem, and partnering with internal teams across the stack to deliver a world-class managed platform.

What You'll Do

Product Engineering
  • Drive technical vision for Lambda's Managed Kubernetes bare-metal platform, including control plane scalability, multi-tenancy, cluster lifecycle management, and high availability
  • Integrate and extend NVIDIA's open-source ecosystem: GPU Operator, Network Operator, DCGM, NCCL, and emerging projects like AICR and Topograph for topology-aware scheduling and placement
  • Design GPU-aware orchestration systems
  • Lead development of services that power our managed services
  • Inform and help with networking solutions for AI workloads: CNI integration (Cilium, Multus), high-performance fabrics (InfiniBand, RoCE), RDMA, and GPUDirect. You will work closely with our Network team to define and drive requirements
  • Inform and help with storage architecture requirements for AI workloads. You will partner with Storage teams on what managed K8s, Slurm, and future services need
  • Build the foundation for Managed Slurm on Kubernetes, enabling traditional HPC workloads to run seamlessly alongside Kubernetes workloads
  • Design higher-level platform services for inference, including model serving infrastructure, autoscaling based on inference load, and multi-model deployment patterns
  • Design self-healing systems and automation for incident response, root cause analysis, and platform resilience
  • Lead chaos engineering efforts to validate system behavior under failure conditions at scale
  • Establish operational excellence for a managed service: upgrade automation, security patching, and zero-downtime maintenance

Cross-Functional Infrastructure Leadership
  • Serve as the technical bridge between Orchestration and other infrastructure teams (Network, Storage, Security), translating platform requirements into actionable specifications
  • Drive infrastructure-wide decisions that enable successful managed services. You're someone who understands what's needed end-to-end, not just at the Kubernetes layer
  • Provide input on bare-metal provisioning, network topology, and storage systems to ensure they meet the needs of the managed services being built by the Orchestration organization
  • Champion consistency and standardization across Lambda's infrastructure stack
  • Work directly with customers and internal teams to understand existing deployments and chart a path to the managed platform

Technical Leadership
  • Set technical direction for Kubernetes services across the Orchestration team, influencing roadmap and prioritization
  • Drive reviews and design sessions, ensuring we build systems that are scalable, maintainable, and aligned with customer needs
  • Mentor and grow engineers, establishing best practices for Kubernetes development, distributed systems, and Cloud Native engineering
  • Collaborate cross-functionally with Network, Storage, Security, and Customer Success teams
  • Engage with NVIDIA and the open-source community to stay current on GPU orchestration technologies and contribute back where appropriate
  • Represent Lambda externally through technical blog posts, conference talks, and strategic customer engagements
  • Shape our AIOps vision: design intelligent systems for automated capacity planning, anomaly detection, and predictive maintenance of cloud infrastructure

Who You Are

You are a creative, innovative engineer who operates at high velocity. You don't just solve problems. You find elegant solutions and ship them quickly. You embrace modern tools and AI-assisted development (like Claude Code) to accelerate your productivity and multiply your impact. You're energized by building new things, not maintaining the status quo.

Required Qualifications
  • 10+ years of experience in software engineering, platform engineering, or SRE, with at least 5 years focused on Kubernetes at scale
  • Expert-level understanding of Kubernetes internals: API machinery, controllers, schedulers, operators, CRDs, CSI, CNI, and the extension patterns that make Kubernetes powerful
  • Holistic infrastructure expertise: you've synthesized knowledge across compute, networking, storage, and security, not just Kubernetes in isolation. You can build solutions that span the full stack
  • Strong software engineering skills in Go (required) and Python; you write production-quality code, not just scripts
  • Deep experience with GPU orchestration in Kubernetes: NVIDIA GPU Operator, device plugins, DCGM, MIG, time-slicing, and GPU-aware scheduling. Familiarity with NVIDIA Network Operator and GPUDirect is strongly preferred
  • Proven track record of technical leadership: driving design decisions across teams, mentoring engineers, and influencing infrastructure direction beyond your immediate scope
  • Deep experience designing and operating managed services or multi-tenant platforms. You understand what it takes to run infrastructure for external customers
  • Strong understanding of distributed systems principles: consensus, fault tolerance, consistency models, and graceful degradation
  • Experience with observability at scale: Prometheus, Grafana, distributed tracing, and building actionable alerting systems
  • Solid knowledge of Linux systems and networking (L2-L7), including high-performance networking concepts (RDMA, InfiniBand, RoCE)
  • Experience with infrastructure-as-code and GitOps workflows

Preferred Qualifications
  • Experience building and operating managed Kubernetes services (GKE, EKS, AKS, or similar) or working on Kubernetes control plane components
  • Hands-on experience with NVIDIA's open-source ecosystem beyond GPU Operator: Network Operator, NCCL tuning, Topograph, AICR, or similar emerging projects
  • Familiarity with HPC and traditional job schedulers (Slurm) and Kubernetes-native batch scheduling (KAI, Volcano, Kueue)
  • Background in confidential computing
  • Experience migrating customers or workloads from legacy/bespoke infrastructure to standardized platforms
  • Contributions to CNCF projects, Kubernetes SIGs, or NVIDIA open-source projects
  • Familiarity with security and compliance in multi-tenant environments: RBAC, Pod Security Standards, network policies, workload isolation
  • Background in ML infrastructure: training clusters, inference serving, simulation

Why Lambda

Lambda is building the essential infrastructure for the AI era. We're not just another cloud provider: we're a company founded by ML practitioners, for ML practitioners. Our customers include leading AI research labs and enterprises pushing the boundaries of what's possible with artificial intelligence.

What makes this role special

You'll be building core platform services the world's largest AI companies will consume.
  • NVIDIA partnership: Deep integration with NVIDIA's GPU and networking stack, working with cutting-edge open-source tooling
  • Real technical challenges: Massive scale GPU clusters and the unique demands of AI workloads
  • Cross-stack influence: Shape not just Kubernetes, but the network, storage, and compute infrastructure that supports it
  • Direct impact: Your work enables AI breakthroughs. Every model trained on Lambda benefits from systems you build
  • World-class team: Work alongside engineers with deep expertise in ML, systems, and infrastructure

Salary Range Information

The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda
  • Founded in 2012, with 500+ employees, and growing fast
  • Our investors notably include TWG Global, US Innovative Technology Fund (USIT), Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, Gradient Ventures, Mercato Partners, SVB, 1517, and Crescent Cove
  • We have research papers accepted at top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • Our values are publicly available
  • We offer generous cash & equity compensation
  • Health, dental, and vision coverage for you and your dependents
  • Wellness and commuter stipends for select roles
  • 401k Plan with 2% company match (USA employees)
  • Flexible paid time off plan that we all actually use

A Final Note

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

Vacancy posted 3 days ago