AI Infra & Cluster Engineer — Scale GPU/CPU Orchestration

Linuxcareers

Linuxcareers is seeking an Infrastructure/Cluster Engineer to design and operate large-scale clusters that enable AI inference at scale. The role focuses on managing diverse hardware architectures and building robust infrastructure. The ideal candidate will possess deep expertise in Linux systems, automation tools, and orchestration technologies. Responsibilities include debugging performance issues and designing observability systems for cluster health. Experience with GPU infrastructure is a plus.

Vacancy posted 2 days ago
Similar jobs that could be interesting for you
Based on the AI Infra & Cluster Engineer — Scale GPU/CPU Orchestration vacancy in San Francisco, CA
  • A cutting-edge tech company in San Francisco seeks infrastructure engineers to enhance the tooling and systems that power its AI applications. Responsibilities include building GPU orchestration, scaling cloud batch job systems, and designing efficient scheduling software... 
    Suggested
    Visa sponsorship

    Exa Corporation

    San Francisco, CA
    1 day ago
  • $300k

    Albert Bow is seeking a Founding Engineer to design and scale their distributed systems for autonomous AI agents. With a salary of up to $300,000 and equity, you will have the opportunity to join an experienced founding team at a rapidly growing venture-backed AI startup... 
    Suggested

    Albert Bow

    San Francisco, CA
    4 days ago
  • Sciforium is looking for a Senior HPC & GPU Infrastructure Engineer to oversee our GPU compute cluster’s health, reliability, and performance. This role involves hands-on Linux systems engineering, GPU driver management, and maintaining machine learning software stacks... 
    Suggested
    Flexible hours

    Sciforium

    San Francisco, CA
    1 day ago
  • A leading AI technology company in San Francisco is looking for a Senior Software Engineer to build scalable infrastructure for large‑scale training and fine-tuning of foundation models. You will design...  ...training systems and optimize GPU utilization while collaborating with... 
    Suggested

    Baseten

    San Francisco, CA
    1 day ago
  •  ...A leading AI acceleration company in San Francisco is seeking a GPU Kernel Engineer to optimize performance for machine learning models. You will be responsible for designing high-performance GPU kernels and using advanced techniques to boost computation efficiency. Ideal... 
    Suggested

    Baseten

    San Francisco, CA
    1 day ago
  •  ...innovative company is seeking a talented software engineer to join their dynamic Inference team. This...  ...and implementing infrastructure for large-scale multimodal models, focusing on high-...  ...product teams to push the boundaries of AI technology, ensuring reliable production services... 

    OpenAI

    San Francisco, CA
    17 hours ago
  • Nooks in San Francisco is seeking a Senior Engineer to build infrastructure that enhances the efficiency of multiple product teams. The...  ...engineering experience, particularly in distributed systems and scaling production environments. Candidates should be comfortable... 
    Work at office
    3 days per week

    Nooks

    San Francisco, CA
    1 day ago
  • A high-growth AI startup in San Francisco is seeking a Software Engineer (Infrastructure) to design and scale Kubernetes systems for a rapidly expanding platform. You will be responsible for leading technical deployments for enterprise clients and developing secure execution... 

    Jack & Jill/External ATS

    San Francisco, CA
    3 days ago
  • $250k

     ...opportunities? Join a rapidly scaling AI cloud infrastructure provider building a next-generation GPU platform designed for AI...  ...Senior / Staff Site Reliability Engineer to support and scale large-scale...  ...monitoring frameworks for GPU compute clusters Collaborate with ML, data,... 
    Permanent employment
    Remote work
    San Francisco, CA
    27 days ago
  • $190k - $270k

    AI Chopping Block, Inc. is seeking an experienced AI Infrastructure Engineer to manage user-facing services and production systems. The role encompasses participating in on-call rotations, building infrastructure with tools like Ansible, Terraform, and Kubernetes, and... 
    Full time
    Internship

    AI Chopping Block, Inc.

    San Francisco, CA
    4 days ago
  • $180k - $250k

    A tech innovation company is looking for a hands-on engineer in San Francisco to manage a vast fleet of GPU servers. You will build systems for tracking server lifecycle, automate provisioning and health checks, and ensure OS-level security. The role requires 5+ years of... 

    Fal

    San Francisco, CA
    1 day ago
  • $180k - $250k

     ...next generation of AI products. We build...  ...production, and do it at scale without compromise....  ...inference, orchestration, and observability...  ...experienced software engineer who thrives on building...  ..., scheduling, GPU autoscaling, large...  ...and tune low level CPU and memory performance... 
    Currently hiring
    Remote work
    Relocation package

    Fal

    San Francisco, CA
    1 day ago
  • $170k - $230k

     ...Site Reliability Engineer (SRE) Palo Alto...  ...Mithril is an AI infrastructure platform...  ...platform built to make GPU compute more...  ...shape how Mithril scales its platform across...  ...Mithril's global GPU orchestration platform. This is...  ...managing clusters, deployments, and... 
    Work at office
    Local area
    1 day per week

    Mithril

    San Francisco, CA
    3 days ago
  • $120k - $290k

    Somi AI is looking for a Software Engineer to join their team in San Francisco. In this role, you will design and build systems that provision and scale Neki clusters, ensuring high availability and data protection. The ideal candidate will have 5+ years of software engineering... 

    Somi AI

    San Francisco, CA
    4 days ago
  • A leading AI research company in San Francisco is seeking engineers to operate next-gen compute clusters. The role requires scaling Kubernetes, automating infrastructure, and ensuring system reliability. Ideal candidates have strong Kubernetes and scripting skills with... 

    Slope

    San Francisco, CA
    4 days ago
  • $200k - $400k

    Inferact is seeking a dedicated cluster administration engineer to manage high-performance GPU compute infrastructure in San Francisco. This hands-on role focuses on optimizing system health and availability for engineering productivity. Ideal candidates will have substantial... 
    Remote job

    Inferact

    San Francisco, CA
    7 hours ago
  • $300k

    Aionia Group in San Francisco is seeking a Systems Infrastructure Engineer to build scalable infrastructure for RL experiments. This role...  ...on innovative projects with leading researchers in a well-funded AI company. The ideal candidate has over 2 years of experience in... 

    Aionia Group

    San Francisco, CA
    4 days ago
  •  ...the world's most dynamic AI companies, like Cursor,...  ...build the platform engineers turn to to ship AI products...  ...multi‑modal workloads scale, the network is the...  ...engineers to lead our GPU Networking efforts, making...  ...performance on bleeding‑edge clusters (H100/H200, B200/B300,... 
    Flexible hours

    Baseten

    San Francisco, CA
    1 day ago
  • $300k

     ...building out their AI and cloud platform...  ..., full-scale model training, or...  ...inference. As a Platform Engineer/Senior Site Reliability...  ...of this GPU-powered infrastructure...  ...ensuring seamless orchestration across environments...  ...of the largest GPU clusters in private deployment... 

    Hamilton Barnes Associates Limited

    San Francisco, CA
    1 day ago
  • Senior Site Reliability Engineer - AI Infrastructure...  ...About Andromeda Andromeda Cluster was founded by Nat Friedman...  ...access to the kind of scaled AI infrastructure once...  ...systems, network, and orchestration layer that makes the...  ...and debug large‑scale GPU infrastructure used... 
    Full time
    Remote work

    Cortes 23

    San Francisco, CA
    4 days ago
  • Site Reliability Engineer - AI Infrastructure Location: Global...  ...Andromeda Andromeda Cluster was founded by Nat...  ...access to the kind of scaled AI infrastructure once...  ...systems, network, and orchestration layer that makes the world...  .../AI infrastructure or GPU-based systems (CUDA,... 
    Full time
    Remote work

    Andromeda Cluster

    San Francisco, CA
    1 day ago
  • $179k - $218k

     ...the only vertically integrated AI infrastructure company built...  ...urgency, who believe in the scale of our ambition and thrive on...  ...Staff Data Center Operations Engineer, GPU Hardware Architecture to be the...  ...needed to maintain peak cluster health.   The Strategic Bridge... 
    Temporary work

    Crusoe

    San Francisco, CA
    3 days ago
  • Mistral in San Francisco is seeking a Systems Engineer/System Administrator to manage and scale its AI infrastructure. This hybrid role demands skills in Linux...  ...in systems administration and experience with HPC clusters or cloud infrastructure. Join Mistral for a high-impact... 

    Mistral

    San Francisco, CA
    3 days ago
  • $172.5k - $210k

    A cutting-edge AI infrastructure firm located in San Francisco is seeking a Senior Systems Performance Engineer. This role involves leading hardware evaluations and optimizing AI systems for performance. Candidates should have over 5 years of experience, proficiency in... 

    Epoch Biodesign

    San Francisco, CA
    3 days ago
  • A tech company focused on AI is seeking a Site Reliability Engineer to ensure the reliability and performance of its GPU marketplace. This role involves maintaining service level objectives, managing capacity, and implementing secure systems. The ideal candidate has strong... 

    Hyperbolic Labs

    San Francisco, CA
    1 day ago
  • Senior Infrastructure Engineer - Bland As a Senior Infrastructure...  ...anticipating and solving scaling challenges related to...  ...industries. Lead - AI/ML Stack Infrastructure Lead...  ...operating production Kubernetes clusters optimized for AI/ML workloads with GPU support, implementing... 
    Temporary work

    AI Chopping Block, Inc.

    San Francisco, CA
    1 day ago
  • $300k

     ...Join a seed-stage AI infrastructure company building large-scale training and inference platforms...  ...with a single managed GPU cluster that reached capacity...  ..., networking, and orchestration. You lead technical...  ...with both executives and engineers, and help create a repeatable... 
    Permanent employment
    Immediate start
    San Francisco, CA
    more than 2 months ago
  • $250k - $400k

    A leading AI research firm in San Francisco seeks experienced professionals to build and scale systems for AI-driven scientific discovery. The role involves developing...  ...base plus equity, with opportunities for ML Engineers, ML Infra, Research Engineers, and Research... 
    Remote job

    Trades Workforce Solutions

    San Francisco, CA
    1 day ago
  • $335k

    OpenAI in San Francisco seeks a System Engineer to architect and operationalize essential infrastructure for AI systems. The role demands 7+ years in systems engineering...  ...experience debugging and a solid grasp of clustering and scaling in production environments. Offers a hybrid... 
    Relocation package

    OpenAI

    San Francisco, CA
    1 day ago
  •  ...history. When people finance GPU clusters, the datacenters housing...  ...to the market? Otherwise, as AI scales, compute only becomes available...  ...metal servers with our VM orchestration software all the way to coordinating...  ...assembly Understanding of CPU interrupts Networking... 
    Long term contract
    Contract work
    Fixed term contract
    Work at office
    Local area
    Visa sponsorship
    Shift work

    The San Francisco Compute Company

    San Francisco, CA
    1 day ago