
Kubernetes Infra Ops Engineer — AI Fleet & Capacity

Baseten

A leading AI infrastructure company in San Francisco is seeking an Infrastructure Ops Engineer to manage the operational health of its GPU fleet. The role involves working closely with customer success and engineering teams to execute complex hardware lifecycles while keeping the platform performing reliably. Candidates should have strong Kubernetes skills and a solid background in cloud infrastructure management. The position offers competitive pay, equity, and comprehensive benefits, including medical coverage and a generous PTO policy.

Vacancy posted 8 days ago
Similar jobs that could be interesting for you, based on the Kubernetes Infra Ops Engineer — AI Fleet & Capacity vacancy in San Francisco, CA
  •  ...Consensus in San Francisco is seeking an Infrastructure Ops Engineer to manage daily operations of our GPU fleet and maintain infrastructure excellence. This hands-...  ...teams to ensure timely fulfillment of customer capacity requests. Candidates should have a Bachelor's or... 
    Fleet

    The Consensus

    San Francisco, CA
    5 days ago
  • A high-growth AI startup in San Francisco is seeking a Software Engineer (Infrastructure) to design and scale Kubernetes systems for a rapidly expanding platform. You will be responsible for leading technical deployments for enterprise clients and developing secure execution... 
    Suggested

    Jack & Jill/External ATS

    San Francisco, CA
    2 days ago
  • $150k - $300k

     ...We are looking for an AI Cloud Infra Engineer to join our infrastructure...  ...relies heavily on Kubernetes (K8s), Terraform, and...  ...managing GPU server capacity, observability, and...  ...provider APIs to manage our fleet of GPU workers across...  ...: dev, arts, prod, ops, etc (and no, there... 
    Fleet

    Algora Public Benefit Corporation

    San Francisco, CA
    4 days ago
  •  ...s most dynamic AI companies, like...  ...build the platform engineers turn to to ship...  ...Infrastructure Ops Engineer at...  ...that power our fleet This role is designed...  ...high-level capacity strategies are...  ...hands-on with Kubernetes and cloud-...  ...between SRE and Infra teams, executing... 
    Fleet
    Work experience placement
    Work at office
    Flexible hours

    Baseten

    San Francisco, CA
    8 days ago
  • $180k - $250k

    A tech innovation company is looking for a hands-on engineer in San Francisco to manage a vast fleet of GPU servers. You will build systems for tracking server lifecycle, automate provisioning and health checks, and ensure OS-level security. The role requires 5+ years... 
    Fleet

    Fal

    San Francisco, CA
    5 days ago
  • $320k - $405k

     ...interpretable, and steerable AI systems. We want...  ...researchers, engineers, policy experts,...  ...Engineer, Node Infra About the role the...  ...lifecycle of accelerator capacity at the company. We...  ...node in the fleet usable and ready to...  ...platforms (e.g., Kubernetes, IaC, AWS/GCP/Azure... 
    Fleet
    Visa sponsorship

    United States Digital Space LLC

    San Francisco, CA
    5 days ago
  • $127k - $249k

     ...The Team Platform Engineering is the department within...  ...our multi-cloud-provider Kubernetes infrastructure,...  ...alerting systems. The Fleet Management team provides...  ...processes ("allergic to ops work") We are a small...  ...data platform for the AI era, enabling builders... 
    Fleet
    Work at office
    Local area
    Remote work
    Worldwide
    Flexible hours

    MongoDB

    San Francisco, CA
    4 days ago
  • $293k

     ...the architectural and engineering backbone of OpenAI’s infrastructure...  ...of cutting-edge AI models. Our work spans...  ...architecture, fleet-level monitoring, and...  ..., joined to the right Kubernetes control plane, registered...  ...turning new SKUs into capacity that is usable by... 
    Fleet
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    2 days ago
  •  ...the pioneering Causal AI platform. We help the world...  ...we're looking for an engineer who wants to own the...  ...reliability across the fleet. Improve system and network...  ..., and proactive capacity planning. Implement and...  ...peering. Familiarity with Kubernetes networking (CNI plugins... 
    Fleet

    Mxv

    San Francisco, CA
    3 days ago
  • $140.6k - $173.1k

     ...seasoned Staff Software Engineer in the North America...  ...that focuses on building AI Platform to support the...  ...Platform. Within this capacity, you will be responsible...  ...strategic credit issuance to fleet organizations and their...  ...such as Docker and Kubernetes) Awareness of API... 
    Fleet
    Remote work
    Flexible hours

    WEX

    San Francisco, CA
    1 day ago
  • Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote /...  ...degradation, NCCL timeouts). Own capacity planning across heterogeneous GPU fleets optimized for training...  ...the syscall and hardware level. Kubernetes & Orchestration: Strong experience... 
    Fleet
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    3 days ago
  •  ...Site Reliability Engineer to ensure the reliability...  ...scalability of our AI infrastructure...  ...performance tuning, incident ops, infrastructure...  ...the founders, the infra team, and the dev...  ..., debugging, capacity planning, and failure...  ...~ Experience with Kubernetes or similar orchestrators... 

    Blaxel, Inc

    San Francisco, CA
    5 days ago
  •  ...that increase the speed and capacity of buildout, starting with data...  ...foundational members of our Engineering team, you will design components...  ...for our fast-growing fleet of production robots. We like...  ...with Hardware Engineers and Ops team to support overall product... 
    Fleet

    Watney

    San Francisco, CA
    10 days ago
  •  ...America's manufacturing base. Our AI-powered robots automate food prep...  ...is looking for a Senior Software Engineer, Robotics Platform, to help us scale our fleet of robots. You will make a large...  ...people in a tech lead or similar capacity. Chef Robotics is solving one of... 
    Fleet

    Israelvcforum

    San Francisco, CA
    5 days ago
  • A leading AI research company in San Francisco is seeking engineers to operate next-gen compute clusters. The role requires scaling Kubernetes, automating infrastructure, and ensuring system reliability. Ideal candidates have strong Kubernetes and scripting skills with... 

    Slope

    San Francisco, CA
    3 days ago
  • $200k - $260k

     ...Inc. is seeking a founding infrastructure engineer based in San Francisco to rebuild their...  ...core systems from scratch using AWS and Kubernetes. The ideal candidate will have extensive...  ...revolutionizing the physical goods supply chain with AI-native solutions.
    Full time

    Matterhaul Inc.

    San Francisco, CA
    5 days ago
  •  ...the world's most dynamic AI companies, like Cursor,...  ...build the platform engineers turn to ship AI products...  ...THE ROLE We’re hiring a Capacity and Infrastructure Analytics...  ...across Baseten’s fleet. You’ll create reliable...  ...Accounting, Product, and Ops stakeholders. Strong... 
    Fleet
    Flexible hours

    Baseten

    San Francisco, CA
    4 days ago
  •  ...the world's most dynamic AI companies, like Cursor,...  ...build the platform engineers turn to to ship AI products...  ...how Baseten operates. Capacity helps unlock revenue by...  ...Partner with SRE and Infra teams to ensure Capacity...  ...operational reality of the fleet, not just the desired... 
    Fleet
    Flexible hours

    The Consensus

    San Francisco, CA
    5 days ago
  • AI Systems Engineer - Codex Core Agents The Codex Core Agents team builds the agent harness that turns...  ...envelope around tokens, latency, cost, capacity, and quality. The harness is open...  ...behavior, inference/runtime stack, GPU fleet, and product surface. You’ll work with... 
    Fleet

    AI Chopping Block, Inc.

    San Francisco, CA
    3 days ago
  • $200k - $260k

     ...from zero — AWS, Kubernetes, Golang, and...  ...that the rest of engineering will build on...  ...years. Founding infra seat with the architectural...  ...building the AI-native...  ....ts service fleet (API gateway, GraphQL...  ...Reliability & ops — Backups, DR,...  ...for tomorrow. Capacity planning. Cost... 
    Fleet
    Full time
    Work at office
    Local area

    Matterhaul Inc.

    San Francisco, CA
    5 days ago
  • Generalist is seeking a candidate to manage GPU fleets for training large-scale AI models. You will optimize ML data loading, storage, and orchestration...  ...candidates have deep experience with GPUs, Slurm or Kubernetes, and a strong understanding of the ML hardware stack.... 
    Fleet

    Generalist

    San Francisco, CA
    1 day ago
  •  ...Hyphen Connect in San Francisco is seeking a Robotics Software Engineer to design advanced algorithms and systems for optimizing robotic fleet efficiency. Your expertise will be critical in developing low-latency communication protocols and cloud dashboards for fleet monitoring... 
    Fleet

    Hyphen Connect

    San Francisco, CA
    5 days ago
  • $202.5k - $247.5k

     ...sharing localhost or running AI workloads in production...  ..., AI inference, device fleets, and site‑to‑site...  ...your time. About the Infra Platform Team The Infra...  ...builds the systems ngrok engineers rely on to build,...  ...Go, PostgreSQL, gRPC, Kubernetes, Terraform, Protobuf, nix... 
    Fleet
    Permanent employment
    Full time
    Work at office
    Local area
    Remote work
    Home office
    Flexible hours

    jobr.pro

    San Francisco, CA
    3 days ago
  •  ...build the foundation for agent engineering in the real world, helping...  ...prototypes to production‑ready AI agents that teams can rely on...  ..., Evaluation, Deployment, Fleet, and Sandboxes), our open‑source...  ...), containers, and basic Kubernetes concepts Have shipped and operated... 
    Fleet
    Work at office
    Flexible hours

    LangChain

    San Francisco, CA
    3 days ago
  •  ...individual to build a high-performance macOS virtualization platform. This role involves managing the VM lifecycle and integrating with the fleet scheduler to optimize performance on Apple Silicon. The ideal candidate is curious and has hands-on experience in virtualization... 
    Fleet
    Flexible hours

    Name.Space

    San Francisco, CA
    3 days ago
  • AI Systems Engineer - Codex Core Agents Location San Francisco Employment Type Full time Department Applied AI Compensation 230K-385...  ...agent stack, from backend systems to inference, GPUs, and fleet capacity. Work closely with research to make the harness trainable... 
    Fleet
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    OpenAI

    San Francisco, CA
    3 days ago
  •  ...are a non-hierarchical team seeking a highly experienced DevOps Engineer to collaborate with the technical team who specializes in blockchain...  ..., utilizing containerisation technologies (e.g. Docker, Kubernetes) to automate the deployment process, ensuring seamless and... 
    Remote work
    Flexible hours

    Blockchain Works

    San Francisco, CA
    5 days ago
  • $125k - $195k

     ...exceptional, hands-on engineers to make this happen. Mechanical...  ...philosophy towards infra is minimal,...  ...Docker, cloud services, or Kubernetes. Instead, there is a lot...  ...deploy and manage our fleet of on-prem servers,...  ...upon the applicant’s capacity to serve in compliance... 
    Fleet
    Work at office
    Visa sponsorship
    Night shift

    Atomicsemi

    San Francisco, CA
    5 days ago
  • $255k - $405k

     ...Join the engineering teams that bring OpenAI's ideas...  ...the benefits of AI, while ensuring that...  ...metrics across our fleet. We're now layering...  ...engineers across the stack: infra, backend, and...  ...researchers, user ops, and other teams...  ..., and cloud infra (Kubernetes, AWS, etc). Bonus... 
    Fleet

    OpenAI

    San Francisco, CA
    5 days ago
  •  ...world At Bedrock, we’re moving AI out of the lab and into the...  ...construction veterans and world-class engineers to solve physical-world...  ...hardware development and robotic fleet operations. This role sits at...  ...Meeting with field ops on triage trends Reviewing top... 
    Fleet
    Permanent employment
    Contract work
    Work at office
    Flexible hours
    Shift work
    Night shift
    Day shift

    Bedrock Robotics Inc

    San Francisco, CA
    1 day ago
