Kubernetes Infra Ops Engineer — AI Fleet & Capacity

Baseten

A leading AI infrastructure company in San Francisco is seeking an Infrastructure Ops Engineer to manage the operational health of its GPU fleet. The role involves working closely with customer success and engineering teams to execute complex hardware lifecycles while ensuring reliable platform performance. Candidates should have strong Kubernetes skills and a solid background in cloud infrastructure management. The position offers competitive pay, equity, and comprehensive benefits, including medical coverage and a generous PTO policy.

Vacancy posted 1 day ago
Similar jobs that could be interesting for you, based on the Kubernetes Infra Ops Engineer — AI Fleet & Capacity vacancy in San Francisco, CA:
  • Baseten is seeking a Capacity Operations Associate in...  ...support their global AI infrastructure. This role...  ...involves managing GPU fleet maintenance and...  ...in Computer Science or Engineering with over 2 years of experience...  ...-facing roles, strong Kubernetes knowledge, and a... 
    Fleet

    aijoblist

    San Francisco, CA
    3 days ago
  •  ...s most dynamic AI companies, like...  ...build the platform engineers turn to to ship...  ...Infrastructure Ops Engineer at...  ...that power our fleet This role is...  ...that high-level capacity strategies are...  ...hands-on with Kubernetes and cloud-native...  ...between SRE and Infra teams,... 
    Fleet
    Work experience placement
    Work at office
    Flexible hours

    Baseten

    San Francisco, CA
    2 days ago
  • $200k - $300k

     ...A technology infrastructure company is seeking a System Engineer, GPU Fleet, to manage and optimize GPU compute infrastructure for AI/ML workloads. You will ensure high availability and performance of the GPU server fleet through monitoring, troubleshooting, and automation... 
    Fleet

    Fluidstack

    San Francisco, CA
    2 days ago
  • A high-growth AI startup in San Francisco is seeking a Software Engineer (Infrastructure) to design and scale Kubernetes systems for a rapidly expanding platform. You will be responsible for leading technical deployments for enterprise clients and developing secure execution... 
    Suggested

    Jack & Jill/External ATS

    San Francisco, CA
    14 hours ago
  •  ...builds general-purpose AI for the physical...  ...a heterogeneous fleet of GPU and TPU...  ...work closely with ML Infra (training systems)...  ...Strong software engineering fundamentals Experience...  ...systems (SLURM, Kubernetes, GKE, K3S, or...  ...Experience with capacity planning and cloud... 
    Fleet

    Physical Intelligence

    San Francisco, CA
    1 day ago
  • $180k - $250k

    A tech innovation company is looking for a hands-on engineer in San Francisco to manage a vast fleet of GPU servers. You will build systems for tracking server lifecycle, automate provisioning and health checks, and ensure OS-level security. The role requires 5+ years... 
    Fleet

    Fal

    San Francisco, CA
    3 days ago
  • $293k - $385k

     ...The Infrastructure Engineering function sits within IT...  ...infrastructure provisioned through Infra Terraform, ensuring...  ...platforms, and fleet systems, driving durable...  ...platform services, including Kubernetes and Docker-based...  ...OpenAI OpenAI is an AI research and deployment... 
    Fleet
    Work at office

    OpenAI

    San Francisco, CA
    2 days ago
  • $140.6k - $173.1k

     ...seasoned Staff Software Engineer in the North America...  ...that focuses on building AI Platform to support the...  ...Platform. Within this capacity, you will be responsible...  ...strategic credit issuance to fleet organizations and their...  ...such as Docker and Kubernetes) ~ Awareness of API... 
    Fleet
    Remote work
    Flexible hours

    WEX

    San Francisco, CA
    2 days ago
  • Baseten is seeking a Capacity Operations Associate to support their global AI infrastructure in San Francisco...  ...and infrastructure engineering, collaborating closely...  ...include: Managing GPU fleet maintenance Coordinating...  ...engineering. Strong Kubernetes knowledge, attention to... 
    Fleet

    aijoblist

    San Francisco, CA
    3 days ago
  • Senior Site Reliability Engineer - AI Infrastructure Location: Global Remote /...  ...degradation, NCCL timeouts). Own capacity planning across heterogeneous GPU fleets optimized for training...  ...the syscall and hardware level. Kubernetes & Orchestration: Strong experience... 
    Fleet
    Full time
    Remote work

    Andromeda

    San Francisco, CA
    1 day ago
  • $293k

     ...the architectural and engineering backbone of OpenAI’s infrastructure...  ...of cutting-edge AI models. Our work spans...  ...architecture, fleet-level monitoring, and...  ..., joined to the right Kubernetes control plane, registered...  ...turning new SKUs into capacity that is usable by... 
    Fleet
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    14 hours ago
  • TRM Labs is looking for a Senior or Staff ML Systems Engineer to focus on building and scaling the technical infrastructure for AI/ML systems in San Francisco. This position involves developing reusable CI/CD workflows and automating model versioning to ensure compliance... 

    TRM Labs

    San Francisco, CA
    3 days ago
  • $202.5k - $247.5k

     ...sharing localhost or running AI workloads in production...  ..., AI inference, device fleets, and site‑to‑site...  ...worth your time. About the Infra Platform Team The Infra...  ...the systems ngrok engineers rely on to build, deploy...  ...Go, PostgreSQL, gRPC, Kubernetes, Terraform, Protobuf, nix... 
    Fleet
    Permanent employment
    Full time
    Work at office
    Local area
    Remote work
    Home office
    Flexible hours

    Jobr

    San Francisco, CA
    2 days ago
  •  ...the world's most dynamic AI companies, like Cursor,...  ...build the platform engineers turn to ship AI products...  ...THE ROLE We’re hiring a Capacity and Infrastructure Analytics...  ...across Baseten’s fleet. You’ll create reliable...  ...Accounting, Product, and Ops stakeholders. Strong... 
    Fleet
    Flexible hours

    Baseten

    San Francisco, CA
    2 days ago
  • $200k - $260k

     ...from zero — AWS, Kubernetes, Golang, and...  ...that the rest of engineering will build on...  ...years. Founding infra seat with the architectural...  ...building the AI-native...  ....ts service fleet (API gateway, GraphQL...  ...Reliability & ops — Backups, DR,...  ...for tomorrow. Capacity planning. Cost... 
    Fleet
    Full time
    Work at office
    Local area

    Matterhaul Inc.

    San Francisco, CA
    3 days ago
  •  ...AI Systems Engineer - Codex Core Agents About The Team The Codex Core Agents team builds...  ...envelope around tokens, latency, cost, capacity, and quality. The harness is open source...  ...behavior, inference/runtime stack, GPU fleet, and product surface. You'll work with... 
    Fleet

    OpenAI

    San Francisco, CA
    2 days ago
  • $145k - $195k

     ...A tech-driven company in San Francisco seeks a Software Engineer for the Infra team. The ideal candidate will have 3+ years of experience and a strong focus on Python and CI/CD processes. Responsibilities include developing testing strategies, enhancing developer productivity... 

    LangChain

    San Francisco, CA
    2 days ago
  •  ...America's manufacturing base. Our AI-powered robots automate food prep...  ...is looking for a Senior Software Engineer, Robotics Platform, to help us scale our fleet of robots. You will make a large...  ...people in a tech lead or similar capacity. Chef Robotics is solving one of... 
    Fleet

    Israelvcforum

    San Francisco, CA
    2 days ago
  •  ...About Us Most AI is frozen in place - it doesn't adapt...  ...about both. Researchers and ML engineers will hand you workloads that...  ...cost across heterogeneous GPU fleets. Batching, scheduling, KV cache...  ...by. ~ Experience operating Kubernetes-based infrastructure,... 
    Fleet
    Flexible hours

    Adaption

    San Francisco, CA
    13 days ago
  • A leading AI research company in San Francisco is seeking engineers to operate next-gen compute clusters. The role requires scaling Kubernetes, automating infrastructure, and ensuring system reliability. Ideal candidates have strong Kubernetes and scripting skills with... 

    Slope

    San Francisco, CA
    1 day ago
  • $200k - $260k

     ...Inc. is seeking a founding infrastructure engineer based in San Francisco to rebuild their...  ...core systems from scratch using AWS and Kubernetes. The ideal candidate will have extensive...  ...revolutionizing the physical goods supply chain with AI-native solutions... 
    Full time

    Matterhaul Inc.

    San Francisco, CA
    3 days ago
  • $179k - $218k

     ...Senior Staff Data Center Operations Engineer, GPU Hardware Architecture...  ...the only vertically integrated AI infrastructure company built from...  ...AI/ML methodologies to analyze fleet-wide telemetry (power draws,...  ...diagnostic tooling that allows Site Ops to identify NVLink flapping,... 
    Fleet
    Temporary work

    Crusoe

    San Francisco, CA
    6 days ago
  •  ...build the foundation for agent engineering in the real world, helping...  ...prototypes to production-ready AI agents that teams can rely on...  ..., Evaluation, Deployment, Fleet, and Sandboxes), our open source...  ...), containers, and basic Kubernetes concepts Have shipped and operated... 
    Fleet
    Work at office
    Flexible hours

    LangChain

    San Francisco, CA
    1 day ago
  •  ...class Site Reliability Engineer to ensure the...  ...scalability of our AI infrastructure platform...  ...tuning, incident ops, infrastructure health...  ...the founders, the infra team, and the dev...  ...operations, debugging, capacity planning, and...  ...) Experience with Kubernetes or similar orchestrators... 

    Blaxel

    San Francisco, CA
    2 days ago
  •  ...world At Bedrock, we’re moving AI out of the lab and into the...  ...construction veterans and world-class engineers to solve physical-world...  ...We’re building out our first fleet of retrofitted autonomous construction...  ...to week, supporting relevant ops as needed, with potential for... 
    Fleet
    Temporary work
    Work at office
    Remote work
    Flexible hours
    Night shift
    Weekend work

    Bedrock Robotics Inc

    San Francisco, CA
    14 hours ago
  • AI Systems Engineer - Codex Core Agents Location San Francisco Employment Type Full time Department Applied AI Compensation 230K-385...  ...agent stack, from backend systems to inference, GPUs, and fleet capacity. Work closely with research to make the harness trainable... 
    Fleet
    Full time
    Work at office
    Local area
    Relocation package
    Flexible hours

    Slope

    San Francisco, CA
    1 day ago
  •  ...individual to build a high-performance macOS virtualization platform. This role involves managing the VM lifecycle and integrating with the fleet scheduler to optimize performance on Apple Silicon. The ideal candidate is curious and has hands-on experience in virtualization... 
    Fleet
    Flexible hours

    Namespace

    San Francisco, CA
    1 day ago
  • $125k - $195k

     ...exceptional, hands-on engineers to make this happen. Mechanical...  ...philosophy towards infra is minimal,...  ...docker, cloud services, or kubernetes. Instead, there is a lot...  ...deploy and manage our fleet of on-prem servers,...  ...upon the applicant’s capacity to serve in compliance... 
    Fleet
    Work at office
    Visa sponsorship
    Night shift

    Atomicsemi

    San Francisco, CA
    3 days ago
  • $114k - $144k

    Who we are Pronto AI is a global leader in commercializing autonomous...  ...across our autonomous truck fleet. This role focuses on...  ...immediate problems, but building our capacity to diagnose and resolve issues...  ...closely with experienced engineers and industry operators. What... 
    Fleet
    Full time
    Work experience placement
    Internship
    Work at office
    Immediate start
    Worldwide
    Flexible hours

    ATOMS Careers page

    San Francisco, CA
    6 hours ago
  •  ...company in San Francisco is seeking a Backend Software Engineer to own core parts of the software stack for robotic arms. Responsibilities include architecting systems for AI models, maintaining tooling for fleet management, and supporting deployment on real hardware.... 
    Fleet

    Jobleads-US

    San Francisco, CA
    3 days ago
